What makes a .changes file source-only?


Sean Whitton
Hello,

(1)

I am teaching dgit to verify whether a .changes file is source-only.
This is for a new 'push-source' subcommand.

My first attempt simply looked for any .deb or .udeb entries in the
Files: field.  However, dgit's maintainer would prefer a strict
whitelist: check that each entry in Files: is a .dsc, or an .orig.tar.*,
or a .debian.tar.*, etc.

Is this the preferred way to confirm whether a .changes file is
source-only?
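
To make the difference between the two checks concrete, here is a minimal Python sketch. It is not dgit's code: the glob lists and function names are hypothetical, and the sample Files: value (including the "byhand" entry) is invented for illustration. The first function only looks for known binary extensions; the second accepts only known source patterns.

```python
from fnmatch import fnmatchcase

# Illustrative glob lists only; neither is complete and neither comes from dgit.
BINARY_GLOBS = ["*.deb", "*.udeb"]
SOURCE_GLOBS = ["*.dsc", "*.orig.tar.*", "*.debian.tar.*"]


def file_names(files_value):
    """Extract filenames from the value of a .changes Files: field.

    Each non-empty line has the form: md5sum size section priority filename.
    """
    return [line.split()[-1] for line in files_value.splitlines() if line.strip()]


def no_known_binaries(names):
    """First attempt: reject only if a .deb or .udeb is listed."""
    return not any(fnmatchcase(n, g) for n in names for g in BINARY_GLOBS)


def all_known_sources(names):
    """Whitelist: accept only if every entry matches a known source pattern."""
    return all(any(fnmatchcase(n, g) for g in SOURCE_GLOBS) for n in names)


# Invented sample data: three source files plus one unexpected extra file.
files_value = """
 0123456789abcdef0123456789abcdef 1234 devel optional hello_1.0-1.dsc
 0123456789abcdef0123456789abcdef 5678 devel optional hello_1.0.orig.tar.gz
 0123456789abcdef0123456789abcdef 910 devel optional hello_1.0-1.debian.tar.xz
 0123456789abcdef0123456789abcdef 42 byhand - hello-images_1.0-1_amd64.tar.gz
"""
names = file_names(files_value)
print(no_known_binaries(names))  # True: the extra file slips past the first check
print(all_known_sources(names))  # False: the whitelist rejects it
```

The whitelist fails closed: a file type nobody anticipated is rejected rather than silently accepted, at the cost of needing an update whenever a new source file type appears.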

(2)

We are also thinking about a strict whitelist for all .changes files --
the whitelist mentioned above, plus *.deb, *.udeb etc.

Are there currently any plans to add new categories of binary files to
uploads, that we should include in our whitelist?

(3)

We observed that .buildinfo files are included in purportedly
source-only changes files by `dpkg-buildpackage -S`.

Is this correct?  Why are they included in source-only uploads?

Thanks!

(please keep me CCed; I am not subscribed to this list)

--
Sean Whitton


Re: What makes a .changes file source-only?

Guillem Jover
Hi!

On Tue, 2017-05-23 at 18:17:18 +0100, Sean Whitton wrote:
> (1)
>
> I am teaching dgit to verify whether a .changes file is source-only.
> This is for a new 'push-source' subcommand.

Ah, interesting question. :)

IMO, in theory, a source-only .changes is primarily defined by its
Architecture field containing only "source" as its value. That in turn
is a consequence of it only containing references to a .dsc and any
further files referenced within.

Even though this might seem backwards, my reasoning is that the
Architecture field values are extremely well defined, while going
by filenames requires extension scraping, which, while also
well defined, always seems a bit icky to me.
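
As a rough illustration of the Architecture-field test, a minimal Python sketch follows. The parser and function names are hypothetical, and the sketch assumes an unsigned, single-paragraph .changes file (an OpenPGP-signed file would need its armor stripped first).

```python
def parse_fields(text):
    """Parse a single unsigned deb822 paragraph into {field name: value}."""
    fields, current = {}, None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0].isspace():
            # Continuation line: it belongs to the most recent field.
            if current is not None:
                fields[current] += "\n" + line.strip()
        else:
            name, _, value = line.partition(":")
            current = name.strip()
            fields[current] = value.strip()
    return fields


def architecture_is_source_only(changes_text):
    """True if the Architecture field lists exactly one value, "source"."""
    return parse_fields(changes_text).get("Architecture", "").split() == ["source"]


# Invented sample data.
source_changes = "Format: 1.8\nSource: hello\nArchitecture: source\nVersion: 1.0-1\n"
mixed_changes = "Format: 1.8\nSource: hello\nArchitecture: source amd64\nVersion: 1.0-1\n"
print(architecture_is_source_only(source_changes))  # True
print(architecture_is_source_only(mixed_changes))   # False
```

This check says nothing about the files actually listed, so it only helps when the tool that generated the .changes is trusted.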

Of course, in practice, if going just by the Architecture field, you
need to trust that the software generating the .changes (and the .dsc)
is not buggy, and the entity that commissioned its creation is not
trying to bypass the checks. But for BY-HAND artifacts that do not
follow the well defined name_version_arch.type filename form, then
this will not be represented in the Architecture field, which is
something that should probably be fixed by annotating the field with
some value (probably the host architecture to be conservative).

Also, even though I could imagine someone injecting non-source artifacts
from within the debian/rules clean target even for source only builds,
I'd consider that to be just broken.

But, if your intention is to verify the .changes file, you might as
well perform more extensive sanity checks to be extra sure.

> My first attempt simply looked for any .deb or .udeb entries in the
> Files: field.  However, dgit's maintainer would prefer a strict
> whitelist: check that each entry in Files: is a .dsc, or an .orig.tar.*,
> or a .debian.tar.*, etc.

> Is this the preferred way to confirm whether a .changes file is
> source-only?

Yeah, the former would miss .ddebs (used only on Ubuntu and
derivatives), the old proposed .tdebs (although I don't think this got
much buy-in), or probably other custom Package-Type(s). It would also
miss BY-HAND artifacts, as injected by dpkg-distaddfile.

For source-only, I think going by a whitelist is indeed more sensible,
but I'd just check whether there is a .dsc, and whether the rest of the
references in the .changes are just the files referenced in the .dsc.
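
The comparison suggested here could be sketched in Python roughly as follows. All names are hypothetical and the sample data is invented; the sketch assumes unsigned files and that the caller has already read the .dsc named in the .changes. It returns whatever is left over, so the caller can decide how to treat it.

```python
def parse_fields(text):
    """Parse a single unsigned deb822 paragraph into {field name: value}."""
    fields, current = {}, None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0].isspace():
            if current is not None:
                fields[current] += "\n" + line.strip()
        else:
            name, _, value = line.partition(":")
            current = name.strip()
            fields[current] = value.strip()
    return fields


def listed_files(fields):
    """Filenames from a Files: field (the filename is the last column)."""
    return {l.split()[-1] for l in fields.get("Files", "").splitlines() if l.strip()}


def unaccounted_files(changes_text, dsc_text):
    """Names in the .changes that are neither the .dsc nor listed in the .dsc.

    Returns None if the .changes does not reference exactly one .dsc.
    """
    changes_files = listed_files(parse_fields(changes_text))
    dscs = {n for n in changes_files if n.endswith(".dsc")}
    if len(dscs) != 1:
        return None
    return changes_files - dscs - listed_files(parse_fields(dsc_text))


DSC = """\
Format: 3.0 (quilt)
Source: hello
Version: 1.0-1
Files:
 0123456789abcdef0123456789abcdef 5678 hello_1.0.orig.tar.gz
 0123456789abcdef0123456789abcdef 910 hello_1.0-1.debian.tar.xz
"""

CHANGES = """\
Format: 1.8
Source: hello
Architecture: source
Version: 1.0-1
Files:
 0123456789abcdef0123456789abcdef 1234 devel optional hello_1.0-1.dsc
 0123456789abcdef0123456789abcdef 910 devel optional hello_1.0-1.debian.tar.xz
"""

BYHAND_LINE = " 0123456789abcdef0123456789abcdef 42 byhand - hello-images_1.0-1_amd64.tar.gz\n"

print(unaccounted_files(CHANGES, DSC))                # set(): nothing left over
print(unaccounted_files(CHANGES + BYHAND_LINE, DSC))  # the extra file is reported
```

The sketch uses a subset test rather than equality because a .changes need not list every file that the .dsc references (the upstream tarball may be omitted from an upload).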

> (2)
>
> We are also thinking about a strict whitelist for all .changes files --
> the whitelist mentioned above, plus *.deb, *.udeb etc.
>
> Are there currently any plans to add new categories of binary files to
> uploads, that we should include in our whitelist?

As mentioned above, this would not cover BY-HAND artifacts, and other
current or future Package-Type(s).

OOC, what would be the purpose of checking what is shipped on a binary
upload?

> (3)
>
> We observed that .buildinfo files are included in purportedly
> source-only changes files by `dpkg-buildpackage -S`.
>
> Is this correct?  Why are they included in source-only uploads?

Yes, this is correct, although I've noticed again (as I did in the
past but seem to have forgotten) that the dpkg-buildpackage man page
is out-of-sync regarding this, which I'll be fixing. In any case the
other day I just added a FAQ entry, given that this seems a recurring
question. :)

  <https://wiki.debian.org/Teams/Dpkg/FAQ#Q:_Why_are_.buildinfo_files_always_generated_with_dpkg-buildpackage.3F>

If there's anything that does not seem to hold, or is unclear I'm happy
to clarify it further.

Thanks,
Guillem


Re: What makes a .changes file source-only?

Sean Whitton
Hello Guillem and Ian,

Thank you for such a quick reply!

On Wed, May 24, 2017 at 01:57:54AM +0200, Guillem Jover wrote:

> IMO, in theory, a source-only .changes is primarily defined by its
> Architecture field containing only "source" as its value. That in turn
> is a consequence of it only containing references to a .dsc and any
> further files referenced within.
>
> Even though this might seem backwards, my reasoning is that the
> Architecture field values are extremely well defined, while going
> by filenames requires extension scraping, which, while also
> well defined, always seems a bit icky to me.
>
> Of course, in practice, if going just by the Architecture field, you
> need to trust that the software generating the .changes (and the .dsc)
> is not buggy, and the entity that commissioned its creation is not
> trying to bypass the checks. But for BY-HAND artifacts that do not
> follow the well defined name_version_arch.type filename form, then
> this will not be represented in the Architecture field, which is
> something that should probably be fixed by annotating the field with
> some value (probably the host architecture to be conservative).

So BY-HAND .changes could have "Architecture: source" but contain binary
files?  If so, we would definitely need to check all file extensions,
instead of relying on the Architecture field.

> For source-only, I think going by a whitelist is indeed more sensible,
> but I'd just check whether there is a .dsc, and whether the rest of the
> references in the .changes are just the files referenced in the .dsc.

That's a great idea.  By checking against the .dsc, we can avoid having
to update dgit if a new source format is added.  Though, we would also
need to permit the .buildinfo, which is not part of the .dsc.

> OOC, what would be the purpose of checking what is shipped on a binary
> upload?

I think Ian just saw the opportunity to add a new sanity check to dgit.

Ian: given that BY-HAND uploads could contain a lot of different kinds
of file, maybe we should stick to the option of only checking purported
source-only .changes files, abandoning this extra check?

> > (3)
> >
> > We observed that .buildinfo files are included in purportedly
> > source-only changes files by `dpkg-buildpackage -S`.
> >
> > Is this correct?  Why are they included in source-only uploads?
>
> Yes, this is correct, although I've noticed again (as I did in the
> past but seem to have forgotten) that the dpkg-buildpackage man page
> is out-of-sync regarding this, which I'll be fixing. In any case the
> other day I just added a FAQ entry, given that this seems a recurring
> question. :)
>
>   <https://wiki.debian.org/Teams/Dpkg/FAQ#Q:_Why_are_.buildinfo_files_always_generated_with_dpkg-buildpackage.3F>

Thank you for confirming this, and for the FAQ link.

--
Sean Whitton


Re: What makes a .changes file source-only? [and 1 more messages]

Ian Jackson-2
In reply to this post by Guillem Jover
Guillem Jover writes ("Re: What makes a .changes file source-only?"):
> IMO, in theory, a source-only .changes is primarily defined by its
> Architecture field containing only "source" as its value. That in turn
> is a consequence of it only containing references to a .dsc and any
> further files referenced within.
>
> Even though this might seem backwards, my reasoning is that the
> Architecture field values are extremely well defined, while going
> by filenames requires extension scraping, which, while also
> well defined, always seems a bit icky to me.

That's why I thought we should ask.

> Of course, in practice, if going just by the Architecture field, you
> need to trust that the software generating the .changes (and the .dsc)
> is not buggy, and the entity that commissioned its creation is not
> trying to bypass the checks. But for BY-HAND artifacts that do not
> follow the well defined name_version_arch.type filename form, then
> this will not be represented in the Architecture field, which is
> something that should probably be fixed by annotating the field with
> some value (probably the host architecture to be conservative).

I think this is a bug, then, in dpkg.  If Architecture is `source',
then there should not be any by-hand artifacts.

> Also, even though I could imagine someone injecting non-source artifacts
> from within the debian/rules clean target even for source only builds,
> I'd consider that to be just broken.

It's rather weird that a source package build still looks at the
debian/files.

> For source-only, I think going by a whitelist is indeed more sensible,
> but I'd just check whether there is a .dsc, and whether the rest of the
> references in the .changes are just the files referenced in the .dsc.

I like this suggestion.  Thanks.

But:

> > We observed that .buildinfo files are included in purportedly
> > source-only changes files by `dpkg-buildpackage -S`.
> >
> > Is this correct?  Why are they included in source-only uploads?
>
> Yes, this is correct, although I've noticed again (as I did in the
> past but seem to have forgotten) that the dpkg-buildpackage man page
> is out-of-sync regarding this, which I'll be fixing. In any case the
> other day I just added a FAQ entry, given that this seems a recurring
> question. :)
>
>   <https://wiki.debian.org/Teams/Dpkg/FAQ#Q:_Why_are_.buildinfo_files_always_generated_with_dpkg-buildpackage.3F>
>
> If there's anything that does not seem to hold, or is unclear I'm happy
> to clarify it further.

Well, of course, your suggestion to check the .dsc against the
.changes will trip over the .buildinfo.

Also, as you write in your FAQ entry:

  By default dpkg-buildpackage does active tasks such as cleaning via
  debian/rules, and makes sure that the dependencies from Build-Depends
  are satisfied as these are needed by the clean target. In addition the
  clean target can perform any kind of action that will affect the
  source package, which is also part of the build, and should be by
  default also reproducible

I think what you mean here is that one might have a source package
which is not a fixed point under `debian/rules clean' for all
reasonable combinations of the build-deps.  I think this is a buggy
package but in practice it seems that many packages are buggy in this
way.

Indeed IMO it is a defect of our overall design that the concept of
a `non-reproducible source package' even exists.  Sources are the
input to builds, not an output, so the question of reproducing them
does not arise.  That our system as a whole can sometimes mutate the
source package all by itself is a bug.

However, these are not considerations for dgit in this context, since
what dgit uploads is always guaranteed to be equal to the input.
Often the user will have dgit use `git clean' rather than rules clean;
and even if they don't, dgit will check that the results were the
same.

That is, even with the .buildinfo, someone who gets the .dsc cannot
know whether the rules clean target is correct (or to put it another
way, under what conditions the source tree is a fixed point under
rules clean), because dgit has not necessarily run rules clean at all.
I'm sure there are other vcsish tools which have the same property.

(Also, and I hesitate to make this argument because of course I
support reproducible builds, but: if the .buildinfo is not useful,
then it's an unwarranted privacy violation.)

So I think for `dgit push-source', there should be no .buildinfo ?
At least, unless dgit ran the clean target.

This suggests to me that dgit push-source should use dpkg-source
rather than dpkg-buildpackage, as you note later in the FAQ entry:

  If the intention is to just produce a source package instead of an
  actual build to upload, then using dpkg-source is always the better
  option.

This wording is a bit unclear.  It conflates `build' and `for upload'.
I think regarding `dgit push-source' as a build is perverse.

dgit would have to run dpkg-genchanges.

Alternatively dgit could strip out the .buildinfo, depending on
whether it ran rules clean.


Sean Whitton writes ("Re: What makes a .changes file source-only?"):
> Hello Guillem and Ian,
> > OOC, what would be the purpose of checking what is shipped on a binary
> > upload?
>
> I think Ian just saw the opportunity to add a new sanity check to dgit.

I'm not sure what I was thinking.  This now obviously seems a bad
idea.

> Ian: given that BY-HAND uploads could contain a lot of different kinds
> of file, maybe we should stick to the option of only checking purported
> source-only .changes files, abandoning this extra check?

Indeed.

Ian.


source-only builds and .buildinfo

Ian Jackson-2
Hi.  I'm widening the scope of this thread because I think the
reproducible builds folks might have an opinion.  (Holger said on IRC
that they'd welcome a CC.)  So, I'm going to recap.


dpkg-buildpackage -S (which is the conventional way to build a
source-only upload) generates a .buildinfo file, which ends up
appearing in the .changes.  (I don't know what dak does with it.)

Sean tripped over this in the context of developing a new dgit
operation mode `dgit push-source'.  I found the generation of a
.buildinfo questionable so we asked debian-dpkg.  Guillem pointed me
to this FAQ entry:

  https://wiki.debian.org/Teams/Dpkg/FAQ#Q:_Why_are_.buildinfo_files_always_generated_with_dpkg-buildpackage.3F 

As I wrote to Guillem, quoting the FAQ:

>   By default dpkg-buildpackage does active tasks such as cleaning via
>   debian/rules, and makes sure that the dependencies from Build-Depends
>   are satisfied as these are needed by the clean target. In addition the
>   clean target can perform any kind of action that will affect the
>   source package, which is also part of the build, and should be by
>   default also reproducible
>
> I think what you mean here is that one might have a source package
> which is not a fixed point under `debian/rules clean' for all
> reasonable combinations of the build-deps.  I think this is a buggy
> package but in practice it seems that many packages are buggy in this
> way.
>
> Indeed IMO it is a defect of our overall design that the concept of
> a `non-reproducible source package' even exists.  Sources are the
> input to builds, not an output, so the question of reproducing them
> does not arise.  That our system as a whole can sometimes mutate the
> source package all by itself is a bug.
>
> However, these are not considerations for dgit in this context, since
> what dgit uploads is always guaranteed to be equal to the input.
> Often the user will have dgit use `git clean' rather than rules clean;
> and even if they don't, dgit will check that the results were the
> same.
>
> That is, even with the .buildinfo, someone who gets the .dsc cannot
> know whether the rules clean target is correct (or to put it another
> way, under what conditions the source tree is a fixed point under
> rules clean), because dgit has not necessarily run rules clean at all.
> I'm sure there are other vcsish tools which have the same property.
>
> (Also, and I hesitate to make this argument because of course I
> support reproducible builds, but: if the .buildinfo is not useful,
> then it's an unwarranted privacy violation.)
>
> So I think for `dgit push-source', there should be no .buildinfo ?
> At least, unless dgit ran the clean target.
>
> This suggests to me that dgit push-source should use dpkg-source
> rather than dpkg-buildpackage, as you note later in the FAQ entry:
>
>   If the intention is to just produce a source package instead of an
>   actual build to upload, then using dpkg-source is always the better
>   option.
>
> This wording is a bit unclear.  It conflates `build' and `for upload'.
> I think regarding `dgit push-source' as a build is perverse.
>
> dgit would have to run dpkg-genchanges.
>
> Alternatively dgit could strip out the .buildinfo, depending on
> whether it ran rules clean.

What do you think ?

(The background here is that `dgit push-source' wants to verify for
itself that the .changes file it is uploading is really source-only.
Because of the possible presence of extraneous (eg BY-HAND) build
artefacts in .changes, Guillem suggested comparing the .changes to the
.dsc.  But of course the .changes contains not only the .dsc and the
files named in it, but also the .buildinfo.)

Thanks,
Ian.


Re: source-only builds and .buildinfo

Sean Whitton
On Wed, May 24, 2017 at 11:59:55AM +0100, Ian Jackson wrote:

> > So I think for `dgit push-source', there should be no .buildinfo ?
> > At least, unless dgit ran the clean target.
> >
> > This suggests to me that dgit push-source should use dpkg-source
> > rather than dpkg-buildpackage, as you note later in the FAQ entry:
> >
> >   If the intention is to just produce a source package instead of an
> >   actual build to upload, then using dpkg-source is always the better
> >   option.
> >
> > This wording is a bit unclear.  It conflates `build' and `for upload'.
> > I think regarding `dgit push-source' as a build is perverse.
> >
> > dgit would have to run dpkg-genchanges.
> >
> > Alternatively dgit could strip out the .buildinfo, depending on
> > whether it ran rules clean.

While a plain `dgit push-source` will prepare a fresh .dsc and .changes,
we also want it to work with -C, which allows the user to supply an
existing .dsc and .changes.  So even if we use dpkg-source and
dpkg-genchanges directly, we still need a validation function that says
whether a .changes is source-only.

Alternatively we could have dgit not accept -C with push-source.  This
would be to think of push-source as a command to /do/ a source-only
upload, rather than a variant on `dgit push` that /ensures/ a
source-only upload.  This is probably fine.

--
Sean Whitton


Re: source-only builds and .buildinfo

Ian Jackson-2
Sean Whitton writes ("Re: source-only builds and .buildinfo"):

> On Wed, May 24, 2017 at 11:59:55AM +0100, Ian Jackson wrote:
> > [Ian:]
> > > Alternatively dgit could strip out the .buildinfo, depending on
> > > whether it ran rules clean.
>
> While a plain `dgit push-source` will prepare a fresh .dsc and .changes,
> we also want it to work with -C, which allows the user to supply an
> existing .dsc and .changes.  So even if we use dpkg-source and
> dpkg-genchanges directly, we still need a validation function that says
> whether a .changes is source-only.

Ah, yes.

> Alternatively we could have dgit not accept -C with push-source.  This
> would be to think of push-source as a command to /do/ a source-only
> upload, rather than a variant on `dgit push` that /ensures/ a
> source-only upload.  This is probably fine.

(For others reading: -C is the dgit option to specify an existing
changes file.  Normally `dgit push-source' would generate one.)

I think that would be suboptimal, though.  If you say -C you should
get the .buildinfo that's in the .changes, I guess.  So that means
that dgit needs a validator, and it needs to accept .buildinfo at
least in this case.

I still think `dgit push-source' (without -C) probably shouldn't
include a buildinfo in the upload unless it ran (or caused
dpkg-buildpackage to run) `debian/rules clean'.

Ian.


Re: source-only builds and .buildinfo

Ximin Luo-5
In reply to this post by Ian Jackson-2
Ian Jackson:

> [..]
>
>   https://wiki.debian.org/Teams/Dpkg/FAQ#Q:_Why_are_.buildinfo_files_always_generated_with_dpkg-buildpackage.3F 
>
> As I wrote to Guillem, quoting the FAQ:
>
>>   By default dpkg-buildpackage does active tasks such as cleaning via
>>   debian/rules, and makes sure that the dependencies from Build-Depends
>>   are satisfied as these are needed by the clean target. In addition the
>>   clean target can perform any kind of action that will affect the
>>   source package, which is also part of the build, and should be by
>>   default also reproducible
>>
>> I think what you mean here is that one might have a source package
>> which is not a fixed point under `debian/rules clean' for all
>> reasonable combinations of the build-deps.  I think this is a buggy
>> package but in practice it seems that many packages are buggy in this
>> way.
>>
>> Indeed IMO it is a defect of our overall design that the concept of
>> a `non-reproducible source package' even exists.  Sources are the
>> input to builds, not an output, so the question of reproducing them
>> does not arise.  That our system as a whole can sometimes mutate the
>> source package all by itself is a bug.
>>

I would actually like to see this fixed. Then we could put source and binary hashes in /var/lib/dpkg/status for every binary package and add these to .buildinfo files, which is more secure than adding the version number (as we do now).

I agree this is a separate issue, but I have some concrete suggestions that I could go into in another thread, if anyone is interested.

Also the man page for dpkg-buildpackage is out-of-date:

       6. Unless a source-only build has been requested, it runs the buildinfo hook and calls dpkg-genbuildinfo to generate a .buildinfo file.  Several dpkg-buildpackage options are forwarded to dpkg-genbuildinfo.

and also later:

              The current hook-name supported are:
              init preclean source build binary changes postclean check sign done

missing out "buildinfo", and indeed if I run "dpkg-buildpackage --hook-buildinfo=true" the buildinfo file still gets generated.

>> However, these are not considerations for dgit in this context, since
>> what dgit uploads is always guaranteed to be equal to the input.
>> Often the user will have dgit use `git clean' rather than rules clean;
>> and even if they don't, dgit will check that the results were the
>> same.
>>
>> That is, even with the .buildinfo, someone who gets the .dsc cannot
>> know whether the rules clean target is correct (or to put it another
>> way, under what conditions the source tree is a fixed point under
>> rules clean), because dgit has not necessarily run rules clean at all.
>> I'm sure there are other vcsish tools which have the same property.
>>
>> (Also, and I hesitate to make this argument because of course I
>> support reproducible builds, but: if the .buildinfo is not useful,
>> then it's an unwarranted privacy violation.)
>>
>> So I think for `dgit push-source', there should be no .buildinfo ?
>> At least, unless dgit ran the clean target.
>>
>> This suggests to me that dgit push-source should use dpkg-source
>> rather than dpkg-buildpackage, as you note later in the FAQ entry:
>>
>>   If the intention is to just produce a source package instead of an
>>   actual build to upload, then using dpkg-source is always the better
>>   option.
>>
>> This wording is a bit unclear.  It conflates `build' and `for upload'.
>> I think regarding `dgit push-source' as a build is perverse.
>>
>> dgit would have to run dpkg-genchanges.
>>
>> Alternatively dgit could strip out the .buildinfo, depending on
>> whether it ran rules clean.
>
> What do you think ?
>
> (The background here is that `dgit push-source' wants to verify for
> itself that the .changes file it is uploading is really source-only.
> Because of the possible presence of extraneous (eg BY-HAND) build
> artefacts in .changes, Guillem suggested comparing the .changes to the
> .dsc.  But of course the .changes contains not only the .dsc and the
> files named in it, but also the .buildinfo.)
>

There are a few other options for you:

- Add a --no-buildinfo flag to dpkg-genchanges, then call dpkg-buildpackage --changes-option=--no-buildinfo
- Ignore the buildinfo entry in the .changes file.
- Verify that the buildinfo file contains only ".dsc" entries and that they match up with the ones in the changes file.
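
The third option above could look roughly like the following Python sketch. The function names are hypothetical, the sample data and placeholder digests are invented, and the sketch assumes unsigned files and compares only the Checksums-Sha256 field. Treating an empty checksum list as a failure is a choice made for this sketch.

```python
def parse_fields(text):
    """Parse a single unsigned deb822 paragraph into {field name: value}."""
    fields, current = {}, None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0].isspace():
            if current is not None:
                fields[current] += "\n" + line.strip()
        else:
            name, _, value = line.partition(":")
            current = name.strip()
            fields[current] = value.strip()
    return fields


def sha256_entries(fields):
    """Map filename -> (sha256, size) from a Checksums-Sha256 field."""
    entries = {}
    for line in fields.get("Checksums-Sha256", "").splitlines():
        if line.strip():
            digest, size, name = line.split()
            entries[name] = (digest, size)
    return entries


def buildinfo_covers_only_dsc(buildinfo_text, changes_text):
    """True if every file in the .buildinfo is a .dsc that the .changes
    lists with the same digest and size."""
    in_buildinfo = sha256_entries(parse_fields(buildinfo_text))
    in_changes = sha256_entries(parse_fields(changes_text))
    if not in_buildinfo:
        return False  # sketch-level choice: an empty list is suspicious
    return all(
        name.endswith(".dsc") and in_changes.get(name) == entry
        for name, entry in in_buildinfo.items()
    )


# Placeholder digests, not real hashes.
DSC_SHA, TAR_SHA, BI_SHA = "a" * 64, "b" * 64, "c" * 64

BUILDINFO = f"""\
Format: 1.0
Source: hello
Architecture: source
Version: 1.0-1
Checksums-Sha256:
 {DSC_SHA} 1234 hello_1.0-1.dsc
"""

CHANGES = f"""\
Format: 1.8
Source: hello
Architecture: source
Version: 1.0-1
Checksums-Sha256:
 {DSC_SHA} 1234 hello_1.0-1.dsc
 {TAR_SHA} 910 hello_1.0-1.debian.tar.xz
 {BI_SHA} 5000 hello_1.0-1_source.buildinfo
"""

print(buildinfo_covers_only_dsc(BUILDINFO, CHANGES))  # True
```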

I'm actually not sure what your main problem is. Does dgit by default check out a previously built .dsc from git? And are you worried that if "dpkg-buildpackage -S" is run, causing "debian/rules clean" to be run, the second-built .dsc would differ from the one that is checked in?

If this is the case, you have this problem regardless of whether the .changes file contains a .buildinfo file; these are two separate issues.

X

--
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git


Re: source-only builds and .buildinfo

Ian Jackson-2
Hi, Ximin.  Thanks for your attention.

Ximin Luo writes ("Re: source-only builds and .buildinfo"):
> Also the man page for dpkg-buildpackage is out-of-date:

I think maybe you should file a bug about these ?

> >> So I think for `dgit push-source', there should be no .buildinfo ?
> >> At least, unless dgit ran the clean target.
...

> >> Alternatively dgit could strip out the .buildinfo, depending on
> >> whether it ran rules clean.
> >
> > What do you think ?
> >
> > (The background here is that `dgit push-source' wants to verify for
> > itself that the .changes file it is uploading is really source-only.
> > Because of the possible presence of extraneous (eg BY-HAND) build
> > artefacts in .changes, Guillem suggested comparing the .changes to the
> > .dsc.  But of course the .changes contains not only the .dsc and the
> > files named in it, but also the .buildinfo.)
>
> There are a few other options for you:
>
> - Add a --no-buildinfo flag to dpkg-genchanges, then call dpkg-buildpackage --changes-option=--no-buildinfo

dgit would have to work around the lack of the flag anyway.

> - Ignore the buildinfo entry in the .changes file.
> - Verify that the buildinfo file contains only ".dsc" entries and that they match up with the ones in the changes file.

I did an experimental dpkg-buildpackage -S and I got a .buildinfo
containing the following fields:

  Format
  Source
  Binary
  Architecture
  Version
  Checksums-Md5
  Checksums-Sha1
  Checksums-Sha256
  Build-Origin
  Build-Architecture
  Build-Date
  Installed-Build-Depends
  Environment

Because of the weirdness with `debian/rules clean', it is logically
possible for things like the build depends and the environment to
affect the generated source package.

But, I'm not sure what this buildinfo means in the context of
reproducible builds.  Is it an assertion that if the b-deps etc. are
as specified, this source package will reproduce itself (ie, will be a
fixed point) ?

That doesn't seem very useful.  Sane build machinery which consumes
Debian sources will transport (and, if necessary, modify) those
sources without invoking them to regenerate themselves, so will not
mind source packages which are not a fixed point under
dpkg-buildpackage -S.  (By this definition of `sane' many of our
normal tools are not; but I think any tool that is trying to do build
reproduction must be sane by this definition because otherwise it will
be constantly tripping over buggy packages.)

And of course only pretty bad packages are not a fixed point with any
reasonable combination of the build-deps.  In practice bugs where the
package is simply broken will far outweigh situations where rules
clean works properly only with certain versions of the dependencies.
Nothing normally actually verifies the fixed-point-ness.  So if the
.buildinfo is such an assertion, it will be a lie in any situation
where the information in it might be useful.

Finally in the context of dgit, the information seems even less likely
to be useful.  Much of the time the person generating the source
package will have avoided the use of rules clean at all.  In such a
situation the build-deps were not involved in generating the source
package.  And dgit does check that the .dsc being uploaded corresponds
to the source the maintainer intended; so with dgit a situation cannot
arise where what is Uploaded = S(Intended) != Intended (where S is the
transformation "unpack, run dpkg-buildpackage -S, grab out the
resulting source package").  With dgit, if S(Intended) != Intended,
either dgit will upload Intended, oblivious to the bug because it
never runs rules clean; or it will run rules clean, discover the
discrepancy, and bomb out.

> I'm actually not sure what your main problem is.

Well, we tripped over this anomaly while trying to decide what dgit
push-source should do.

dgit push-source definitely needs to verify that the .changes file it
is uploading is a source-only upload.  That is a safety catch against
unintended binaryful uploads (for example, caused by some
miscommunication in the stacks of build machinery, or the user
manually specifying the wrong .changes file).  That means dgit
push-source needs to account for every file in the .changes.

The obvious algorithm is to check that every file in the .changes is
either the .dsc itself, or named in the .dsc.  But we discover that
there's a .buildinfo there too.  So we need to decide what to do about
it.
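A minimal sketch of that check, assuming an unsigned, single-paragraph
.changes and .dsc.  The parser is a simplified stand-in rather than
dgit's actual code, and the names parse_control, listed_files,
unaccounted_files and the allow_buildinfo switch are hypothetical:

```python
def parse_control(text):
    """Parse one deb822-style paragraph into {field: value}."""
    fields = {}
    current = None
    for line in text.splitlines():
        if line.startswith((" ", "\t")) and current is not None:
            # Continuation line: append to the field being read.
            fields[current] += "\n" + line.strip()
        elif ":" in line:
            current, _, value = line.partition(":")
            fields[current] = value.strip()
    return fields


def listed_files(fields, field="Files"):
    """Return the filenames (last column) listed in a Files-style field."""
    entries = fields.get(field, "").splitlines()
    return {entry.split()[-1] for entry in entries if entry.strip()}


def unaccounted_files(changes_text, dsc_name, dsc_text, allow_buildinfo=False):
    """Return files in .changes that are neither the .dsc nor named in it."""
    in_changes = listed_files(parse_control(changes_text))
    allowed = {dsc_name} | listed_files(parse_control(dsc_text))
    extra = in_changes - allowed
    if allow_buildinfo:
        # Hypothetical policy switch: tolerate a .buildinfo entry.
        extra = {name for name in extra if not name.endswith(".buildinfo")}
    return extra
```

An empty result would mean every file is accounted for; with
allow_buildinfo left off, the .buildinfo from dpkg-buildpackage -S shows
up as the one unexplained entry.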

Ignoring the .buildinfo seems like an easy workaround but 1. I don't
understand the implications 2. this seems like it's leaving a bug (the
.buildinfo generation) unfixed and unreported 3. the .buildinfo
contains information which ought not to be disseminated (and
published!) unless necessary (or at least, useful).

Particularly (3) means I'm leaning towards arranging for the
.buildinfo to be stripped out (or not generated).  But then I am
dismantling Chesterton's fence.

Is there a downside to having dgit make source-only uploads which do
not contain .buildinfo ?  Is, indeed, there any downside to having
dpkg-buildpackage not generate the .buildinfo in source-only builds ?

> Does dgit by default checkout a previously-built .dsc from git?

I'm not sure what you mean, but I think not.

dgit manipulates _source trees_ in git.  The .dsc is not represented
directly in git (and in general cannot be regenerated from git because
there may be missing origs etc., and also there may be deviations in
behaviour of tools like dpkg-source).  dgit may need to construct a
source package, in which case the intent is that in similar
circumstances dgit will produce .dscs which are semantically
equivalent.  I don't think it's necessary to generate an identical
.dsc, because actual builds take (or can take) a source directory tree
as input, not a .dsc; and an "appropriate" .dsc by this definition
implies the same source tree (which is a property dgit does check - at
least, as far as the local dpkg-source is concerned).

Regards
Ian.


Re: source-only builds and .buildinfo

Guillem Jover
In reply to this post by Ximin Luo-5
Hi!

[ Just a very quick reply, will go over the other mails during the week. ]

On Wed, 2017-05-24 at 13:58:00 +0000, Ximin Luo wrote:

> Also the man page for dpkg-buildpackage is out-of-date:
>
>        6. Unless a source-only build has been requested, it runs the
> buildinfo hook and calls dpkg-genbuildinfo to generate a .buildinfo
> file.  Several dpkg-buildpackage options are forwarded to dpkg-genbuildinfo.
>
> and also later:
>
>               The current hook-name supported are:
>               init preclean source build binary changes postclean check
>               sign done
>
> missing out "buildinfo", and indeed if I run "dpkg-buildpackage
> --hook-buildinfo=true" the buildinfo file still gets generated.

Yes, as I mentioned on
<https://lists.debian.org/debian-dpkg/2017/05/msg00024.html> this is
something I've noticed now several times, but forgot to fix. I did so
the other day and have queued it for a future 1.18.25 release.

Thanks,
Guillem


Re: source-only builds and .buildinfo

Sean Whitton
In reply to this post by Ian Jackson-2
On Wed, May 24, 2017 at 09:47:09PM +0100, Ian Jackson wrote:

> dgit manipulates _source trees_ in git.  The .dsc is not represented
> directly in git (and in general cannot be regenerated from git because
> there may be missing origs etc., and also there may be deviations in
> behaviour of tools like dpkg-source).  dgit may need to construct a
> source package, in which case the intent is that in similar
> circumstances dgit will produce .dscs which are semantically
> equivalent.  I don't think it's necessary to generate an identical
> .dsc, because actual builds take (or can take) a source directory tree
> as input, not a .dsc; and an "appropriate" .dsc by this definition
> implies the same source tree (which is a property dgit does check - at
> least, as far as the local dpkg-source is concerned).

Not directly relevant, but there are some comments regarding 100%
reproducible builds of source packages in #756978.

--
Sean Whitton


Re: source-only builds and .buildinfo

Sean Whitton
In reply to this post by Guillem Jover
Dear Guillem,

On Thu, May 25, 2017 at 03:03:51AM +0200, Guillem Jover wrote:
> [ Just a very quick reply, will go over the other mails during the week. ]

Have you had more time to think about this one?  I'd like to make
progress on my patch series to dgit, if possible.  Thanks.

--
Sean Whitton


Re: source-only builds and .buildinfo

Ian Jackson-2
Sean Whitton writes ("Re: source-only builds and .buildinfo"):
> Dear Guillem,
>
> On Thu, May 25, 2017 at 03:03:51AM +0200, Guillem Jover wrote:
> > [ Just a very quick reply, will go over the other mails during the week. ]
>
> Have you had more time to think about this one?  I'd like to make
> progress on my patch series to dgit, if possible.  Thanks.

In the absence of objections to my analysis, I suggest we proceed on
the following basis:

A .buildinfo file is not useful for a source-only upload which is
verified to be identical to the intended source as present in the
uploader's version control (eg, by the use of dgit).

Therefore, dgit should not include .buildinfos in source-only uploads
it performs.  If dgit sees that a lower-layer tool like
dpkg-buildpackage provided a .buildinfo for a source-only upload, dgit
should strip it out of .changes.
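If dgit did go this way, the stripping step could look roughly like the
sketch below.  It assumes an unsigned .changes (a signed one would have
to be re-signed afterwards) and is not dgit's actual implementation;
strip_buildinfo and FILE_LIST_FIELDS are hypothetical names:

```python
# Fields of a .changes file whose continuation lines list files.
FILE_LIST_FIELDS = ("Files", "Checksums-Sha1", "Checksums-Sha256")


def strip_buildinfo(changes_text):
    """Return changes_text without .buildinfo entries in its file lists."""
    kept = []
    in_file_list = False
    for line in changes_text.splitlines():
        if line[:1] in (" ", "\t"):
            # Continuation line: drop it only inside a file-list field
            # and only when the filename column ends in .buildinfo.
            tokens = line.split()
            if in_file_list and tokens and tokens[-1].endswith(".buildinfo"):
                continue
        else:
            # A new field starts; note whether it is a file list.
            in_file_list = line.partition(":")[0] in FILE_LIST_FIELDS
        kept.append(line)
    return "\n".join(kept) + "\n"
```

Continuation lines of other fields (Description, Changes) are left
alone even if they happen to mention a .buildinfo file.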

Ian.

--
Ian Jackson <[hidden email]>   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.


Re: source-only builds and .buildinfo

Daniel Kahn Gillmor-3
Hi Ian--

On Tue 2017-06-20 18:10:49 +0100, Ian Jackson wrote:
> A .buildinfo file is not useful for a source-only upload which is
> verified to be identical to the intended source as present in the
> uploader's version control (eg, by the use of dgit).
>
> Therefore, dgit should not include .buildinfos in source-only uploads
> it performs.  If dgit sees that a lower-layer tool like
> dpkg-buildpackage provided a .buildinfo for a source-only upload, dgit
> should strip it out of .changes.

I often do source-only uploads which include the .buildinfo.

I do source-only uploads because i don't want the binaries built on my
own personal infrastructure to reach the public.  But i want to upload
the .buildinfo because i want to provide a corroboration of what i
*expect* the buildds to produce.

why wouldn't dgit take the same approach?  stripping the .buildinfo from
the .changes seems like a wasted shot at a potential corroboration.
or am i misunderstanding the question here?

    --dkg


Re: What makes a .changes file source-only? [and 1 more messages]

Guillem Jover
In reply to this post by Ian Jackson-2
[ Sorry for the delay, got caught in other stuff, release, real life, etc,
  and the thread looked like requiring some non-small blocks of time. ]

Hi!

On Wed, 2017-05-24 at 11:33:15 +0100, Ian Jackson wrote:

> Guillem Jover writes ("Re: What makes a .changes file source-only?"):
> > Of course, in practice, if going just by the Architecture field, you
> > need to trust that the software generating the .changes (and the .dsc)
> > is not buggy, and the entity that commissioned its creation is not
> > trying to bypass the checks. But for BY-HAND artifacts that do not
> > follow the well defined name_version_arch.type filename form, then
> > this will not be represented in the Architecture field, which is
> > something that should probably be fixed by annotating the field with
> > some value (probably the host architecture to be conservative).
>
> I think this is a bug, then, in dpkg.  If Architecture is `source',
> then there should not be any by-hand artifacts.

Actually, never mind, this might have just been a regression during the
1.18.x cycle, but starting with the 1.18.19 release, dpkg-genchanges
should not add any artifact that is not a source nor a buildinfo file
for source-only uploads.
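For reference, the Architecture-based test on its own is small.  A
minimal sketch, assuming the .changes text is already available as a
string (architecture_is_source_only is a hypothetical name); as noted
above, it trusts the tool that generated the .changes, so it would
complement a check of the file list rather than replace it:

```python
def architecture_is_source_only(changes_text):
    """True if the Architecture field contains exactly the value 'source'."""
    for line in changes_text.splitlines():
        name, sep, value = line.partition(":")
        # Continuation lines start with whitespace, so they never match.
        if sep and name == "Architecture":
            return value.split() == ["source"]
    # No Architecture field at all: do not treat this as source-only.
    return False
```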

> > Also, even though I could imagine someone injecting non-source artifacts
> > from within the debian/rules clean target even for source only builds,
> > I'd consider that to be just broken.
>
> It's rather weird that a source package build still looks at the
> debian/files.

The debian/files is just used to be able to communicate the .buildinfo
filename from dpkg-genbuildinfo to dpkg-genchanges.

> > For source-only, I think going by a whitelist is indeed more sensible,
> > but I'd just check whether there is a .dsc, and whether the rest of the
> > references in the .changes are just the files referenced in the .dsc.
>
> I like this suggestion.  Thanks.
>
> But:
>
> > > We observed that .buildinfo files are included in purportedly
> > > source-only changes files by `dpkg-buildpackage -S`.
> > >
> > > Is this correct?  Why are they included in source-only uploads?
> >
> > Yes, this is correct, although I've noticed again (as I did in the
> > past but seem to have forgotten) that the dpkg-buildpackage man page
> > is out-of-sync regarding this, which I'll be fixing. In any case the
> > other day I just added a FAQ entry, given that this seems a recurring
> > question. :)
> >
> >   <https://wiki.debian.org/Teams/Dpkg/FAQ#Q:_Why_are_.buildinfo_files_always_generated_with_dpkg-buildpackage.3F>
> >
> > If there's anything that does not seem to hold, or is unclear I'm happy
> > to clarify it further.
>
> Well, of course, your suggestion to check the .dsc against the
> .changes will trip over the .buildinfo.

Sorry for the confusing sentence, yes, I meant to say the .dsc and the
.buildinfo files here.

> Also, as you write in your FAQ entry:
>
>   By default dpkg-buildpackage does active tasks such as cleaning via
>   debian/rules, and makes sure that the dependencies from Build-Depends
>   are satisfied as these are needed by the clean target. In addition the
>   clean target can perform any kind of action that will affect the
>   source package, which is also part of the build, and should be by
>   default also reproducible

> I think what you mean here is that one might have a source package
> which is not a fixed point under `debian/rules clean' for all
> reasonable combinations of the build-deps.  I think this is a buggy
> package but in practice it seems that many packages are buggy in this
> way.

Most probably, we'd need to check specific instances I guess. In any
case that's one of the reasons for the .buildinfo file, so that you can
reproduce the source package from the specific set of Build-Depends.

Also, I don't remember where, but ISTR some of our documentation
recommending (or used to) that some update actions should be done in
the clean target (maybe that's old and no longer the case though).

> Indeed IMO it is a defect of our overall design that the concept of
> a `non-reproducible source package' even exists.  Sources are the
> input to builds, not an output, so the question of reproducing them
> does not arise.  That our system as a whole can sometimes mutate the
> source package all by itself is a bug.

Actually I don't think that's entirely accurate, at least from dpkg's PoV.
For non-native packages the input is the orig.tar(s) + the unpacked source
tree, for native packages the input is just the unpacked source tree.

In both cases the full Debian source package is part of the output.

I do agree that having non-declarative active actions when doing
source-only uploads is an issue. And it might have been better to
instead have a declarative one, that some dpkg-foo tool would use to
prepare the tree when building the source package.

> That is, even with the .buildinfo, someone who gets the .dsc cannot
> know whether the rules clean target is correct (or to put it another
> way, under what conditions the source tree is a fixed point under
> rules clean), because dgit has not necessarily run rules clean at all.
> I'm sure there are other vcsish tools which have the same property.
>
> (Also, and I hesitate to make this argument because of course I
> support reproducible builds, but: if the .buildinfo is not useful,
> then it's an unwarranted privacy violation.)

The .buildinfo file contains at most advisory information. If the
entity producing the upload is malicious then they have control over
its contents, so you'd need to build it and compare the only thing
you've possibly got in front of you, which are the actual artifacts.
If the entity producing the upload is not malicious it might still
be running a buggy toolchain, or on a broken/damaged hardware or
installation, etc. So no matter what, you can use the information
in the .buildinfo to ease tracking the environment supposedly used,
but you then really need to build the stuff to check if it matches.

> So I think for `dgit push-source', there should be no .buildinfo ?
> At least, unless dgit ran the clean target.

The .buildinfo file on source-only uploads serves several purposes,
one is for the reproducible source part, the other is to possibly
include references to binary packages built but not included in the
upload (f.ex. with «dpkg-buildpackage --changes-option=-S»).

Not including the .buildinfo file in all (new) source-only uploads
would, it seems to me, make them less uniform, and make it slightly
more difficult to attest whether they have been tampered with, as there's
then no common advisory base-line for the environment they were built on.
Of course dgit uploads are marked as such, but you could concoct that.

> This suggests to me that dgit push-source should use dpkg-source
> rather than dpkg-buildpackage, as you note later in the FAQ entry:
>
>   If the intention is to just produce a source package instead of an
>   actual build to upload, then using dpkg-source is always the better
>   option.
>
> This wording is a bit unclear.  It conflates `build' and `for upload'.
> I think regarding `dgit push-source' as a build is perverse.
>
> dgit would have to run dpkg-genchanges.

Hmm, I guess a problem might be with the overloaded meanings of build?

Of course you build a source with «dpkg-source --build», in the same
way you build a binary with «dpkg-deb --build», but doing “a build”
in my mind would be the equivalent of preparing a release, which might
include sources and/or binaries from «debian/rules binary» or similar,
from dpkg-buildpackage or an equivalent tool. And whether that is
intended as an upload would be determined by whether you have generated
the .changes file.

Does that clarify things? It's also very possible my mental model does
not match that of other people. :)

Anything that needs to produce a .changes file is preparing a possible
upload in my mind.

> Alternatively dgit could strip out the .buildinfo, depending on
> whether it ran rules clean.

I'm not sure why that would be desirable though?

Thanks,
Guillem


Re: source-only builds and .buildinfo

Adrian Bunk-3
In reply to this post by Daniel Kahn Gillmor-3
On Tue, Jun 20, 2017 at 02:47:20PM -0400, Daniel Kahn Gillmor wrote:

> Hi Ian--
>
> On Tue 2017-06-20 18:10:49 +0100, Ian Jackson wrote:
> > A .buildinfo file is not useful for a source-only upload which is
> > verified to be identical to the intended source as present in the
> > uploader's version control (eg, by the use of dgit).
> >
> > Therefore, dgit should not include .buildinfos in source-only uploads
> > it performs.  If dgit sees that a lower-layer tool like
> > dpkg-buildpackage provided a .buildinfo for a source-only upload, dgit
> > should strip it out of .changes.
>
> I often do source-only uploads which include the .buildinfo.
>
> I do source-only uploads because i don't want the binaries built on my
> own personal infrastructure to reach the public.  But i want to upload
> the .buildinfo because i want to provide a corroboration of what i
> *expect* the buildds to produce.
>...

If you expect that, then your expectation is incorrect.

If you upload a package right now, chances are the buildds will use both
older versions of some packages [1] and more recent versions of some
other packages [2] than what you used.

>     --dkg

cu
Adrian

[1] buildd chroots are regenerated twice per week and not updated
    prior to each build
[2] some packages might already have been updated compared to what
    you used

--

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


Re: source-only builds and .buildinfo

Ximin Luo-5
Adrian Bunk:

> On Tue, Jun 20, 2017 at 02:47:20PM -0400, Daniel Kahn Gillmor wrote:
>> Hi Ian--
>>
>> On Tue 2017-06-20 18:10:49 +0100, Ian Jackson wrote:
>>> A .buildinfo file is not useful for a source-only upload which is
>>> verified to be identical to the intended source as present in the
>>> uploader's version control (eg, by the use of dgit).
>>>
>>> Therefore, dgit should not include .buildinfos in source-only uploads
>>> it performs.  If dgit sees that a lower-layer tool like
>>> dpkg-buildpackage provided a .buildinfo for a source-only upload, dgit
>>> should strip it out of .changes.
>>
>> I often do source-only uploads which include the .buildinfo.
>>
>> I do source-only uploads because i don't want the binaries built on my
>> own personal infrastructure to reach the public.  But i want to upload
>> the .buildinfo because i want to provide a corroboration of what i
>> *expect* the buildds to produce.
>> ...
>
> If you expect that, then your expectation is incorrect.
>
> If you upload a package right now, chances are the buildds will use both
> older versions of some packages [1] and more recent versions of some
> other packages [2] than what you used.
>

I think what dkg means here (and what we, the R-B team, have wanted for ages and are working towards) is not that the buildds use the *versioned dependencies* listed in the buildinfo, but that they produce the same *output hashes* as what's in the buildinfo.

The point being specifically that the dependencies used could change, but if the output remains constant, we're more assured that the build was done properly and reproducibly.
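To make the comparison concrete, here is a minimal sketch of checking
rebuilt artifacts against the hashes recorded in a .buildinfo.  It
assumes an unsigned .buildinfo whose Checksums-Sha256 lines have the
form "hash size filename", and takes the rebuilt files as in-memory
bytes; recorded_sha256 and mismatches are hypothetical names, not part
of any existing tool:

```python
import hashlib


def recorded_sha256(buildinfo_text):
    """Map filename -> sha256 hex digest from the Checksums-Sha256 field."""
    hashes = {}
    in_field = False
    for line in buildinfo_text.splitlines():
        if line[:1] in (" ", "\t"):
            parts = line.split()
            if in_field and len(parts) == 3:
                digest, _size, name = parts
                hashes[name] = digest
        else:
            in_field = line.partition(":")[0] == "Checksums-Sha256"
    return hashes


def mismatches(buildinfo_text, rebuilt):
    """Return recorded filenames that are missing from, or differ in, rebuilt.

    rebuilt maps filename -> file contents as bytes.
    """
    bad = []
    for name, digest in sorted(recorded_sha256(buildinfo_text).items()):
        data = rebuilt.get(name)
        if data is None or hashlib.sha256(data).hexdigest() != digest:
            bad.append(name)
    return bad
```

The Installed-Build-Depends field is deliberately not consulted: only
the outputs are compared, whatever dependency versions the rebuild used.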

X

--
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git


Re: source-only builds and .buildinfo

Holger Levsen-2
In reply to this post by Adrian Bunk-3
Hi,

trigger warning: nitpicking.

On Wed, Jun 21, 2017 at 11:34:17AM +0300, Adrian Bunk wrote:
> > I do source-only uploads because i don't want the binaries built on my
> > own personal infrastructure to reach the public.  But i want to upload
> > the .buildinfo because i want to provide a corroboration of what i
> > *expect* the buildds to produce.
> If you expect that, then your expectation is incorrect.
 
I actually think that dkg's expectation is right, "just" that reality is wrong.

The design of the Debian buildd network is from times when machines were much
less powerful than what we have today and it shows.

I'd rather have deterministic builds than the current unpredictable mess.


--
cheers,
        Holger


Re: source-only builds and .buildinfo

Adrian Bunk-3
In reply to this post by Ximin Luo-5
On Wed, Jun 21, 2017 at 09:28:00AM +0000, Ximin Luo wrote:

> Adrian Bunk:
> > On Tue, Jun 20, 2017 at 02:47:20PM -0400, Daniel Kahn Gillmor wrote:
> >> Hi Ian--
> >>
> >> On Tue 2017-06-20 18:10:49 +0100, Ian Jackson wrote:
> >>> A .buildinfo file is not useful for a source-only upload which is
> >>> verified to be identical to the intended source as present in the
> >>> uploader's version control (eg, by the use of dgit).
> >>>
> >>> Therefore, dgit should not include .buildinfos in source-only uploads
> >>> it performs.  If dgit sees that a lower-layer tool like
> >>> dpkg-buildpackage provided a .buildinfo for a source-only upload, dgit
> >>> should strip it out of .changes.
> >>
> >> I often do source-only uploads which include the .buildinfo.
> >>
> >> I do source-only uploads because i don't want the binaries built on my
> >> own personal infrastructure to reach the public.  But i want to upload
> >> the .buildinfo because i want to provide a corroboration of what i
> >> *expect* the buildds to produce.
> >> ...
> >
> > If you expect that, then your expectation is incorrect.
> >
> > If you upload a package right now, chances are the buildds will use both
> > older versions of some packages [1] and more recent versions of some
> > other packages [2] than what you used.
> >
>
> I think what dkg means here (and what we, the R-B team, have wanted for ages and are working towards) is not that the buildds use the *versioned dependencies* listed in the buildinfo, but that they produce the same *output hashes* as what's in the buildinfo.
>
> The point being specifically that the dependencies used could change, but if the output remains constant, we're more assured that the build was done properly and reproducibly.

How is that supposed to work when the compiler is not exactly identical?

As an example, gcc-6 6.3.0-18 and gcc-6 6.3.0-19 will likely produce
different output for every non-trivial piece of software.

The reason is that every new gcc upload usually contains whatever
bugfixes are on the upstream branch.

> X

cu
Adrian

--

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed
