What makes a .changes file source-only?

Re: source-only builds and .buildinfo

Ximin Luo-5
Adrian Bunk:

> On Wed, Jun 21, 2017 at 09:28:00AM +0000, Ximin Luo wrote:
>> Adrian Bunk:
>>> On Tue, Jun 20, 2017 at 02:47:20PM -0400, Daniel Kahn Gillmor wrote:
>>>> Hi Ian--
>>>>
>>>> On Tue 2017-06-20 18:10:49 +0100, Ian Jackson wrote:
>>>>> A .buildinfo file is not useful for a source-only upload which is
>>>>> verified to be identical to the intended source as present in the
>>>>> uploader's version control (eg, by the use of dgit).
>>>>>
>>>>> Therefore, dgit should not include .buildinfos in source-only uploads
>>>>> it performs.  If dgit sees that a lower-layer tool like
>>>>> dpkg-buildpackage provided a .buildinfo for a source-only upload, dgit
>>>>> should strip it out of .changes.
>>>>
>>>> I often do source-only uploads which include the .buildinfo.
>>>>
>>>> I do source-only uploads because i don't want the binaries built on my
>>>> own personal infrastructure to reach the public.  But i want to upload
>>>> the .buildinfo because i want to provide a corroboration of what i
>>>> *expect* the buildds to produce.
>>>> ...
>>>
>>> If you expect that, then your expectation is incorrect.
>>>
>>> If you upload a package right now, chances are the buildds will use both
>>> older versions of some packages [1] and more recent versions of some
>>> other packages [2] than what you used.
>>>
>>
>> I think what dkg means here (and what we, the R-B team, have wanted for ages and are working towards), is not that the buildds use the *versioned dependencies* listed in the buildinfo, but that they produce the same *output hashes* as what's in the buildinfo.
>>
>> The point being specifically that the dependencies used could change, but if the output remains constant, we're more assured that the build was done properly and reproducibly.
>
> How is that supposed to work when the compiler is not exactly identical?
>
> As an example, gcc-6 6.3.0-18 and gcc-6 6.3.0-19 will likely produce
> different output for every non-trivial piece of software.
>
> The reason is that every new gcc upload usually contains whatever
> bugfixes are on the upstream branch.
>

Which dependencies should be "irrelevant" to the final output would depend on the situation, right. If the dependencies are different and the buildinfo is different, it does not necessarily mean there is a problem, and the upload does not need to be rejected. But it's a signal that other people (including the uploader) might want to re-try the build with the newer dependencies.

OTOH if the outputs match, we get more certainty, which is a good thing.

We also need to get some real data on this; it could be that a change from -18 to -19 would only affect a small number of packages, and most other ones would actually be compiled identically.

X

--
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git

Re: source-only builds and .buildinfo

Adrian Bunk-3
On Wed, Jun 21, 2017 at 10:09:00AM +0000, Ximin Luo wrote:

> Adrian Bunk:
> > On Wed, Jun 21, 2017 at 09:28:00AM +0000, Ximin Luo wrote:
> >> Adrian Bunk:
> >>> On Tue, Jun 20, 2017 at 02:47:20PM -0400, Daniel Kahn Gillmor wrote:
> >>>> Hi Ian--
> >>>>
> >>>> On Tue 2017-06-20 18:10:49 +0100, Ian Jackson wrote:
> >>>>> A .buildinfo file is not useful for a source-only upload which is
> >>>>> verified to be identical to the intended source as present in the
> >>>>> uploader's version control (eg, by the use of dgit).
> >>>>>
> >>>>> Therefore, dgit should not include .buildinfos in source-only uploads
> >>>>> it performs.  If dgit sees that a lower-layer tool like
> >>>>> dpkg-buildpackage provided a .buildinfo for a source-only upload, dgit
> >>>>> should strip it out of .changes.
> >>>>
> >>>> I often do source-only uploads which include the .buildinfo.
> >>>>
> >>>> I do source-only uploads because i don't want the binaries built on my
> >>>> own personal infrastructure to reach the public.  But i want to upload
> >>>> the .buildinfo because i want to provide a corroboration of what i
> >>>> *expect* the buildds to produce.
> >>>> ...
> >>>
> >>> If you expect that, then your expectation is incorrect.
> >>>
> >>> If you upload a package right now, chances are the buildds will use both
> >>> older versions of some packages [1] and more recent versions of some
> >>> other packages [2] than what you used.
> >>>
> >>
> >> I think what dkg means here (and what we, the R-B team, have wanted for ages and are working towards), is not that the buildds use the *versioned dependencies* listed in the buildinfo, but that they produce the same *output hashes* as what's in the buildinfo.
> >>
> >> The point being specifically that the dependencies used could change, but if the output remains constant, we're more assured that the build was done properly and reproducibly.
> >
> > How is that supposed to work when the compiler is not exactly identical?
> >
> > As an example, gcc-6 6.3.0-18 and gcc-6 6.3.0-19 will likely produce
> > different output for every non-trivial piece of software.
> >
> > The reason is that every new gcc upload usually contains whatever
> > bugfixes are on the upstream branch.
> >
>
> Which dependencies should be "irrelevant" to the final output would depend on the situation, right. If the dependencies are different and the buildinfo is different, it does not necessarily mean there is a problem, and the upload does not need to be rejected. But it's a signal that other people (including the uploader) might want to re-try the build with the newer dependencies.
>
> OTOH if the outputs match, we get more certainty, which is a good thing.
>...

"more certainty" on what exactly?

"signal that other people might want to" is quite vague,
what do you want to prove and how exactly should people
spend time proving it?

In the best case [1] we would know that the buildd on the one
architecture that happens to be used by the person doing the
source upload produced the same binaries.

Once you start verifying that all binaries in the archive were built
from the sources in the archive, this will automatically be covered.

> X

cu
Adrian

[1] excluding the binary-all special case

--

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

Re: source-only builds and .buildinfo

Adrian Bunk-3
In reply to this post by Holger Levsen-2
On Wed, Jun 21, 2017 at 09:30:14AM +0000, Holger Levsen wrote:

> Hi,
>
> trigger warning: nitpicking.
>
> On Wed, Jun 21, 2017 at 11:34:17AM +0300, Adrian Bunk wrote:
> > > I do source-only uploads because i don't want the binaries built on my
> > > own personal infrastructure to reach the public.  But i want to upload
> > > the .buildinfo because i want to provide a corroboration of what i
> > > *expect* the buildds to produce.
> > If you expect that, then your expectation is incorrect.
>  
> I actually think that dkg's expectation is right, "just" that reality is wrong.
>
> The design of the Debian buildd network is from times when machines were much
> less powerful than what we have today and it shows.
>
> I'd rather have deterministic builds than the current unpredictable mess.

I understand what you want, but using buildinfo is not a good idea here.

Based on how many broken binaries get uploaded from developers,
the environment of the person uploading the sources is not a good
basis for determining what package versions to install when building
on the buildds.

> cheers,
> Holger

cu
Adrian

--

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

Reply | Threaded
Open this post in threaded view
|

Re: source-only builds and .buildinfo

Holger Levsen-2
On Wed, Jun 21, 2017 at 02:16:00PM +0300, Adrian Bunk wrote:
> Based on how many broken binaries get uploaded from developers,

we should disallow binary uploads for everybody for all packages by default.
those porters who need it should get that enabled for those packages where
they need it, when they need it.


--
cheers,
        Holger

Re: source-only builds and .buildinfo

Ian Jackson-2
In reply to this post by Daniel Kahn Gillmor-3
Daniel Kahn Gillmor writes ("Re: source-only builds and .buildinfo"):

> On Tue 2017-06-20 18:10:49 +0100, Ian Jackson wrote:
> > A .buildinfo file is not useful for a source-only upload which is
> > verified to be identical to the intended source as present in the
> > uploader's version control (eg, by the use of dgit).
> >
> > Therefore, dgit should not include .buildinfos in source-only uploads
> > it performs.  If dgit sees that a lower-layer tool like
> > dpkg-buildpackage provided a .buildinfo for a source-only upload, dgit
> > should strip it out of .changes.
>
> I often do source-only uploads which include the .buildinfo.
>
> I do source-only uploads because i don't want the binaries built on my
> own personal infrastructure to reach the public.  But i want to upload
> the .buildinfo because i want to provide a corroboration of what i
> *expect* the buildds to produce.

This is an interesting use case which dgit should support.

But I think this is not what dgit push-source should do.  Sean's
proposed dgit push-source does not do any kind of binary package
build.  I think this is correct.  But this means there are no binaries
and nothing for the .buildinfo to talk about.

Do the "source-only uploads" that you are talking about mention the
hashes of these locally-built .debs in their .buildinfo, then ?

Certainly `dgit push' will not do anything to any .buildinfo you may
have.  I think maybe that your use case should be supported by having
a version of dgit push which drops the .debs from the .changes, but
leaves the .buildinfo ?  Is that how you construct these uploads now ?

(Also: is there anything right now that verifies your assertions about
the .debs?  Not that the lack of such a thing would make the
.buildinfos useless, but my experience is that without closing that
loop it is likely that the arrangements for generating the .buildinfo
are wrong somehow in a way we haven't spotted.)

> why wouldn't dgit take the same approach?  stripping the .buildinfo from
> the .changes seems like a wasted shot at a potential corroboration.
> or am i misunderstanding the question here?

I hope you're misunderstanding.  I'm open to being convinced I'm
wrong...

Regards,
Ian.

--
Ian Jackson <[hidden email]>   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.

Re: source-only builds and .buildinfo

Daniel Kahn Gillmor-3
In reply to this post by Adrian Bunk-3
On Wed 2017-06-21 14:16:00 +0300, Adrian Bunk wrote:

> On Wed, Jun 21, 2017 at 09:30:14AM +0000, Holger Levsen wrote:
>> Hi,
>>
>> trigger warning: nitpicking.
>>
>> On Wed, Jun 21, 2017 at 11:34:17AM +0300, Adrian Bunk wrote:
>> > > I do source-only uploads because i don't want the binaries built on my
>> > > own personal infrastructure to reach the public.  But i want to upload
>> > > the .buildinfo because i want to provide a corroboration of what i
>> > > *expect* the buildds to produce.
>> > If you expect that, then your expectation is incorrect.
>>  
>> I actually think that dkg's expectation is right, "just" that reality is wrong.
>>
>> The design of the Debian buildd network is from times when machines were much
>> less powerful than what we have today and it shows.
>>
>> I'd rather have deterministic builds than the current unpredictable mess.
>
> I understand what you want, but using buildinfo is not a good idea here.
>
> Based on how many broken binaries get uploaded from developers,
> the environment of the person uploading the sources is not a good
> basis for determining what package versions to install when building
> on the buildds.

lest there be any misunderstanding: i am *not* suggesting that i want
the build daemons to select their packages based on what's in my
.buildinfo.  Ximin's interpretation of my intent is the correct one: i
want to see whether we manage to reproduce the same output.

if the binary package outputs differ, and the installed build-deps
differ, fine.  that's data that someone tracking how things are built
can use in a future analysis.  if the binary package outputs do *not*
differ, and the build-deps differ, that's also interesting information.

my goal here isn't to use the build daemons as the r-b infrastructure --
we've already got the r-b infrastructure for that. :)  But i'm happy to
be able to see some corroborative (or anti-corroborative) .buildinfos
published so that people who want to analyze them can do so.
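
The four outcomes described above (outputs same or different, build-deps same or different) can be sketched as a small comparison. This is only an illustration: `BuildRecord` and `compare` are hypothetical names, the digest is a placeholder, and the gcc-6 versions are the ones Adrian mentioned earlier in the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildRecord:
    """What one builder reports (hypothetical structure for illustration)."""
    outputs: dict      # .deb file name -> digest
    build_deps: dict   # package name -> installed version


def compare(mine, theirs):
    """Classify a pair of build reports into one of the four cases."""
    same_out = mine.outputs == theirs.outputs
    same_deps = mine.build_deps == theirs.build_deps
    if same_out and same_deps:
        return "corroborated, same build-deps"
    if same_out:
        return "corroborated despite different build-deps"
    if same_deps:
        return "outputs differ with identical build-deps: investigate"
    return "outputs and build-deps both differ: data for later analysis"


# Placeholder digest; only the build-dep version differs between the two.
dev = BuildRecord({"libfoo_1.2-3_amd64.deb": "aaaa"}, {"gcc-6": "6.3.0-18"})
buildd = BuildRecord({"libfoo_1.2-3_amd64.deb": "aaaa"}, {"gcc-6": "6.3.0-19"})
print(compare(dev, buildd))
```

None of the four results is a rejection; each is just a label that someone analysing published .buildinfos could aggregate.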

          --dkg

PS I fully agree that the right outcome for debian overall is to not
   allow binary uploads from anyone, unless they're granted special
   dispensation (e.g. toolchain porters), but that's getting far afield
   from this thread.

Re: source-only builds and .buildinfo

Daniel Kahn Gillmor-3
In reply to this post by Ian Jackson-2
On Wed 2017-06-21 13:38:42 +0100, Ian Jackson wrote:
> This is an interesting use case which dgit should support.

cool!

> But I think this is not what dgit push-source should do.  Sean's
> proposed dgit push-source does not do any kind of binary package
> build.  I think this is correct.  But this means there are no binaries
> and nothing for the .buildinfo to talk about.

I don't know anything about "dgit push-source", so i defer to you on that.

> Do the "source-only uploads" that you are talking about mention the
> hashes of these locally-built .debs in their .buildinfo, then ?

yes.  when building foo version 1.2-3, the .changes file mentions only:

  - foo_1.2-3.dsc
  - foo_1.2.orig.tar.bz2
  - foo_1.2-3.debian.tar.bz2
  - foo_1.2-3_amd64.buildinfo

and the buildinfo file mentions:

  - foo_1.2-3.dsc
  - libfoo_1.2-3.deb
  - foo-tools_1.2-3.deb

I *do not* upload any of the *.deb files to the archive.
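
The distinction between the two lists can be checked mechanically. A minimal sketch, assuming the .changes text carries the usual multi-line `Files:` field in which the last whitespace-separated token of each entry is the file name; `listed_files` and `ships_binaries` are made-up helper names, and the checksums and sizes in the sample are placeholders.

```python
def listed_files(control_text, field="Files"):
    """Return the file names listed in a multi-line control field."""
    names = []
    in_field = False
    for line in control_text.splitlines():
        if line.startswith(field + ":"):
            in_field = True
            continue
        if in_field:
            # Continuation lines of a field are indented; anything else ends it.
            if line.startswith((" ", "\t")) and line.strip():
                names.append(line.split()[-1])
            else:
                in_field = False
    return names


def ships_binaries(names):
    """True if any listed file is a binary package."""
    return any(n.endswith((".deb", ".udeb")) for n in names)


# Placeholder checksums and sizes; file names follow the example above.
SAMPLE_CHANGES = """\
Source: foo
Version: 1.2-3
Files:
 00000000000000000000000000000000 0 utils optional foo_1.2-3.dsc
 00000000000000000000000000000000 0 utils optional foo_1.2.orig.tar.bz2
 00000000000000000000000000000000 0 utils optional foo_1.2-3.debian.tar.bz2
 00000000000000000000000000000000 0 utils optional foo_1.2-3_amd64.buildinfo
"""

names = listed_files(SAMPLE_CHANGES)
print(names)
print("ships binaries:", ships_binaries(names))
```

The same parser could be pointed at a checksum field of the .buildinfo to see which .debs it attests to, without those files ever being part of the upload.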

> Certainly `dgit push' will not do anything to any .buildinfo you may
> have.  I think maybe that your use case should be supported by having
> a version of dgit push which drops the .debs from the .changes, but
> leaves the .buildinfo ?  Is that how you construct these uploads now ?

I really don't have to do anything manually.  The standard
dpkg-buildpackage toolchain does it for me if i pass
--changes-option=-S  -- it all works as i expect, and kudos to the dpkg
developers for that :)

> (Also: is there anything right now that verifies your assertions about
> the .debs?  Not that the lack of such a thing would make the
> .buildinfos useless, but my experience is that without closing that
> loop it is likely that the arrangements for generating the .buildinfo
> are wrong somehow in a way we haven't spotted.)

In a standard upload of the type i'm describing i've asserted:

 a) I built version 1.2-3 on amd64, and it should be included in debian

 b) here are the digests of the source code (including debian packaging)

 c) given this explicit set of build dependencies, here are the digests
    of the binary packages that were produced on my system.

You say "verify my assertions about the .debs", i think you're talking
about part (c), but there's nothing specifically to verify there.  I'm
saying to the world what *i* found when i built them.  You want to tell
me you found something different?  fine!  Now we have something to
investigate.  You found the same thing?  Great, but that's a
corroboration, not a verification.

I agree with you that it'd be nice in the future to "close the loop" by
having infrastructure that monitors all of these developer-generated
.buildinfo files, compares them to the buildd-generated .buildinfo
files, and provides some sort of interface for easy reasoning about
them.  and such infrastructure could well show that something is wrong
with how we're generating .buildinfo files; that's fine, we'd then
modify how we generate buildinfo files going forward to correct it, if
necessary.

Imagine a fancy console that a debian developer could pull up which
shows a list of binary packages they submitted which differ from the one
being shipped by the archive, and which build-dependencies it noticed
were different (or, just shows a green light if it's the case all of
their current uploads have been corroboratively rebuilt). cool, eh?
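
One piece of such a console, showing which build-dependencies differed between two builds, might look like the sketch below. It assumes dependency entries of the form `name (= version)` separated by commas, as in a .buildinfo's Installed-Build-Depends field; `parse_deps` and `changed_deps` are hypothetical names, and the make version is invented for illustration.

```python
import re

# One entry looks like "gcc-6 (= 6.3.0-18)".
DEP_RE = re.compile(r"^\s*(\S+)\s*\(=\s*([^)\s]+)\s*\)\s*$")


def parse_deps(field_value):
    """Map package name -> exact version from a comma-separated list."""
    deps = {}
    for item in field_value.split(","):
        m = DEP_RE.match(item)
        if m:
            deps[m.group(1)] = m.group(2)
    return deps


def changed_deps(mine, theirs):
    """Return {name: (my_version, their_version)} for every difference.

    A package present on only one side shows up with None on the other.
    """
    return {
        name: (mine.get(name), theirs.get(name))
        for name in sorted(set(mine) | set(theirs))
        if mine.get(name) != theirs.get(name)
    }


developer = parse_deps("gcc-6 (= 6.3.0-18),\n make (= 4.1-9.1)")
buildd = parse_deps("gcc-6 (= 6.3.0-19),\n make (= 4.1-9.1)")
print(changed_deps(developer, buildd))
```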

Or some future, stricter debian variant might even want to only allow a
package to enter the archive if the binary packages created by the
buildd of the submitted architecture match the binaries claimed by the
submitting developer (i'm *not* proposing this for debian today.  it
could introduce hassle and delay because of the concerns about build-dep
synchronization that Adrian raises, and we don't have the workflow for
it smooth enough yet).

But i don't think that we need to officially "close the loop" in any
fancy (or strict) way to warrant shipping .buildinfo files from
developers.  The fancy console i propose above (or anything like it) can
only be built and used across the archive once we have shipped the
.buildinfo files.  Unnecessarily stripping .buildinfo files that we know
about only delays that process.

Regards,

           --dkg

Re: source-only builds and .buildinfo

Vagrant Cascadian-4
In reply to this post by Ian Jackson-2
On 2017-06-21, Ian Jackson wrote:

> Daniel Kahn Gillmor writes ("Re: source-only builds and .buildinfo"):
>> On Tue 2017-06-20 18:10:49 +0100, Ian Jackson wrote:
>> > A .buildinfo file is not useful for a source-only upload which is
>> > verified to be identical to the intended source as present in the
>> > uploader's version control (eg, by the use of dgit).
>> >
>> > Therefore, dgit should not include .buildinfos in source-only uploads
>> > it performs.  If dgit sees that a lower-layer tool like
>> > dpkg-buildpackage provided a .buildinfo for a source-only upload, dgit
>> > should strip it out of .changes.
>>
>> I often do source-only uploads which include the .buildinfo.
>>
>> I do source-only uploads because i don't want the binaries built on my
>> own personal infrastructure to reach the public.  But i want to upload
>> the .buildinfo because i want to provide a corroboration of what i
>> *expect* the buildds to produce.
>
> This is an interesting use case which dgit should support.

Agreed!


> But I think this is not what dgit push-source should do.  Sean's
> proposed dgit push-source does not do any kind of binary package
> build.  I think this is correct.  But this means there are no binaries
> and nothing for the .buildinfo to talk about.

Yes, this makes sense for the most part.


> Do the "source-only uploads" that you are talking about mention the
> hashes of these locally-built .debs in their .buildinfo, then ?

That's the goal, sure.

I've done this with all my recent source-only uploads, and then gone
back and verified that the buildd machines produced (in most cases) the
same hashes for the .deb files.

For example, this is the buildinfo for simple-cdd 0.6.5, which I
uploaded with a source-only changes file:

  https://buildinfo.debian.net/30f7000b0025b570c7ae2202fc6fd79e4ca27798/simple-cdd_0.6.5_all

And this is a buildinfo produced over a month later on the reproducible
builds build network, on a different architecture (i386), with a
different build environment, that produced the same hashes:

  https://buildinfo.debian.net/1d300b71445ac7d756e93546a7e6b36d3c1882c7/simple-cdd_0.6.5_all

And you can check that the .buildinfo in the build logs on the buildd
lists the same sha1 hashes:

  https://buildd.debian.org/status/fetch.php?pkg=simple-cdd&arch=all&ver=0.6.5&stamp=1494884527&raw=0

And then you can check that the hashes of the simple-cdd packages in the
archive are the same as the hashes listed.

Given that at least three machines, of differing architecture, with over
a month between the packages in the build toolchain, produced the same
binary packages... I have *some* confidence that this package is
reproducible.

It's not the most complicated package, but it demonstrates that it is
now possible, for a reasonable portion of the archive, to at least
manually verify many of the builds. Some of this could be automated...


> Certainly `dgit push' will not do anything to any .buildinfo you may
> have.  I think maybe that your use case should be supported by having
> a version of dgit push which drops the .debs from the .changes, but
> leaves the .buildinfo ?  Is that how you construct these uploads now ?

I use sbuild's --source-only-changes option, which creates two .changes
files, one with the debs (ARCH.changes), and one without
(source.changes). In both cases, the .buildinfo referenced in
.changes includes hashes of the .deb files.


> (Also: is there anything right now that verifies your assertions about
> the .debs?  Not that the lack of such a thing would make the
> .buildinfos useless, but my experience is that without closing that
> loop it is likely that the arrangements for generating the .buildinfo
> are wrong somehow in a way we haven't spotted.)

There's nothing corroborating the results of .deb files in the archive
against tests.reproducible-builds.org build results, but
tests.reproducible-builds.org does rebuild all packages in the archive
with permutations of the build environment, and logs when they aren't
reproducible.

The archive is keeping the .buildinfo files uploaded with packages,
though they aren't, to my knowledge, exposed yet. But it would allow for
retroactive verification of said packages once the .buildinfo files are
available. A few relevant bugs on ftp.debian.org regarding this:

  https://bugs.debian.org/763822
  https://bugs.debian.org/862073
  https://bugs.debian.org/862538
  https://bugs.debian.org/863470


live well,
  vagrant

Re: source-only builds and .buildinfo

Ian Jackson-2
In reply to this post by Daniel Kahn Gillmor-3
Daniel Kahn Gillmor writes ("Re: source-only builds and .buildinfo"):

> On Wed 2017-06-21 13:38:42 +0100, Ian Jackson wrote:
> > Certainly `dgit push' will not do anything to any .buildinfo you may
> > have.  I think maybe that your use case should be supported by having
> > a version of dgit push which drops the .debs from the .changes, but
> > leaves the .buildinfo ?  Is that how you construct these uploads now ?
>
> I really don't have to do anything manually.  The standard
> dpkg-buildpackage toolchain does it for me if i pass
> --changes-option=-S  -- it all works as i expect, and kudos to the dpkg
> developers for that :)

Then I think `dgit ... sbuild ...' (a binaryful build) followed by
`dgit --ch:-S push' (a binaryless upload) will probably do the same
thing.

Definitely in this case, dgit ought not to mess with the .buildinfo.
(Ie I think it will be included in the .changes, and dgit ought to
leave it there.)

>  c) given this explicit set of build dependencies, here are the digests
>     of the binary packages that were produced on my system.
>
> You say "verify my assertions about the .debs", i think you're talking
> about part (c), but there's nothing specifically to verify there.  I'm
> saying to the world what *i* found when i built them.  You want to tell
> me you found something different?  fine!  Now we have something to
> investigate.  You found the same thing?  Great, but that's a
> corroboration, not a verification.

Well, (c) is only useful if the build "is" reproducible.  (That is,
"is reproducible in some plausible scenario".)

> But i don't think that we need to officially "close the loop" in any
> fancy (or strict) way to warrant shipping .buildinfo files from
> developers.  The fancy console i propose above (or anything like it) can
> only be built and used across the archive once we have shipped the
> .buildinfo files.  Unnecessarily stripping .buildinfo files that we know
> about only delays that process.

My comments here are more of an aside.  I'm certainly not suggesting
that this line of reasoning implies any .buildinfos should be
stripped; merely that if I were you I would want to see about closing
this loop, because right now you are perhaps generating .buildinfos
which are going to be difficult to use this way in the future.

If some routine consumer of these .buildinfos comes into being, then
it would definitely be a good idea for dgit to gain convenient and
meaningful option(s) to generate such uploads.  More convenient than
`--ch:-S' (which is using an escape hatch, and hence undesirable for
routine use).


However, `dgit push-source' is a different case.  That is a command
where the dgit user asks dgit to upload their package source code to
Debian, but without doing any kind of binary package build at all.

(Probably the user has done some kind of pre-upload test to check that
the source does generate working binaries, but perhaps of a source
package with a different version in the changelog, or something.)

In that case, dpkg-buildpackage currently does still generate a
.buildinfo.  That .buildinfo does not contain any information about
binary package builds - since there were no binary package builds.

Nor is the build-dependency information in the .buildinfo particularly
useful even for figuring out in what circumstances the uploader was
able to successfully run `debian/rules clean'.  The experienced [d]git
user will probably be cleaning their working trees with git clean, not
rules clean.  And, regardless, even if the uploader did run rules
clean, this has no bearing on the source package that gets uploaded,
since dgit verifies that the source package is identical to the
uploader's git tree.


Part of the confusion in this thread is, I think, due to the
overloading of the term "source-only upload" for your hybrid upload
which did _build_ binaries, and describes them in the .buildinfo, but
does not actually _ship_ them.

This is a very useful concept but I suggest you give it a new name.
"binaries-attested upload" perhaps ?

To me "source-only upload" means that there were no binaries built,
and therefore no information about binaries included in the upload.
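
The three kinds of upload distinguished in this thread can be told apart from two file lists: what the .changes ships and what the .buildinfo describes. A minimal sketch; `classify_upload` is a hypothetical helper, and "binaries-attested" is only the name suggested above, not established terminology.

```python
BINARY_SUFFIXES = (".deb", ".udeb")


def has_binaries(names):
    """True if any of the file names is a binary package."""
    return any(n.endswith(BINARY_SUFFIXES) for n in names)


def classify_upload(changes_files, buildinfo_files=()):
    """Classify an upload by what it ships and what it merely describes."""
    if has_binaries(changes_files):
        return "binaryful"            # binaries are actually shipped
    if has_binaries(buildinfo_files):
        return "binaries-attested"    # built and described, but not shipped
    return "source-only"              # nothing was built


# dkg's example from earlier in the thread.
changes = ["foo_1.2-3.dsc", "foo_1.2.orig.tar.bz2",
           "foo_1.2-3.debian.tar.bz2", "foo_1.2-3_amd64.buildinfo"]
buildinfo = ["foo_1.2-3.dsc", "libfoo_1.2-3.deb", "foo-tools_1.2-3.deb"]
print(classify_upload(changes, buildinfo))
```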


Regards,
Ian.

Re: What makes a .changes file source-only? [and 1 more messages]

Ian Jackson-2
In reply to this post by Guillem Jover
Guillem Jover writes ("Re: What makes a .changes file source-only? [and 1 more messages]"):

> On Wed, 2017-05-24 at 11:33:15 +0100, Ian Jackson wrote:
> > Indeed IMO it is a defect of our overall design that the concept of
> > a `non-reproducible source package' even exists.  Sources are the
> > input to builds, not an output, so the question of reproducing them
> > does not arise.  That our system as a whole can sometimes mutate the
> > source package all by itself is a bug.
>
> Actually I don't think that's entirely accurate, at least for dpkg PoV.
> For non-native packages the input is the orig.tar(s) + the unpacked source
> tree, for native packages the input is just the unpacked source tree.
>
> In both cases the full Debian source package is part of the output.

When I said "Sources are the input to builds, not an output" I was
making a general assertion about fundamentals of software engineering.

That "sources are the input to builds" is part of the definition of
"sources".  Another part of the definition is that the source can be
transported conveniently from one place to another; and another is
that it is human-editable.

Insofar as any ecosystem or tool lacks something which
  * is the primary input to builds
  * can be easily transported
  * is human-editable (with appropriate tools)
then that ecosystem or tool is wrongheaded because software using
it does not properly have source code.  (Examples might include
some kinds of persistent VM systems, ancient and modern.)

In the Debian context, the source code can only mean the source
package.  The source package meets all of the above.  The form you
propose is not readily transportable.  (Indeed there are directory
trees which are unrepresentable as source packages.)

Insofar as dpkg's behaviour is incompatible with this view, dpkg is
buggy.  As the original designer I don't think, however, that dpkg's
fundamental design is at variance with the above principles.


> > I think what you mean here is that one might have a source package
> > which is not a fixed point under `debian/rules clean' for all
> > reasonable combinations of the build-deps.  I think this is a buggy
> > package but in practice it seems that many packages are buggy in this
> > way.
>
> Most probably, we'd need to check specific instances I guess. In any
> case that's one of the reasons for the .buildinfo file, so that you can
> reproduce the source package from the specific set of Build-Depends.

As I say, I think this is completely wrongheaded.

No-one cares whether a source package "can be reproduced".

Any tools which care about reproducibility (and indeed, almost all of
our actual build infrastructure etc.) treat the source package as the
input.

Certainly, actual source packages uploaded by dgit are affected by any
infelicities that may exist in the package clean target.

> > So I think for `dgit push-source', there should be no .buildinfo ?
> > At least, unless dgit ran the clean target.
>
> The .buildinfo file on source-only uploads serves several purposes,
> one is for the reproducible source part, the other is to possibly
> include references to binary packages built but not included in the
> upload (f.ex. with «dpkg-buildpackage --changes-option=-S»).

I am only talking about truly source-only uploads, where no binary
packages were generated.  See my responses to dkg.

> Not including the .buildinfo file in all (new) source-only uploads
> seems to me would make those then less uniform, and slightly more
> difficult to try to attest if they have been tampered with, as there's
> then no common advisory base-line for the environment it was built on.

Source packages are not built from objects that anyone other than the
uploader has.  So it does not make sense to talk of anyone "verifying"
whether the source package was "tampered" with as it was "built".

In Debian, source packages are transported unmodified by all of our
central software including the archive and buildds.

A .buildinfo is of no help against surreptitious modification of the
source on an uploader's machine since such modifications would be to
the package source code and the .buildinfo would not help detect them.
The defences are publication, review, debdiff, etc.

> > This suggests to me that dgit push-source should use dpkg-source
> > rather than dpkg-buildpackage, as you note later in the FAQ entry:
> >
> >   If the intention is to just produce a source package instead of an
> >   actual build to upload, then using dpkg-source is always the better
> >   option.
> >
> > This wording is a bit unclear.  It conflates `build' and `for upload'.
> > I think regarding `dgit push-source' as a build is perverse.
> >
> > dgit would have to run dpkg-genchanges.
>
> Hmm, I guess a problem might be with the overloaded meanings of build?
>
> Of course you build a source with «dpkg-source --build», in the same
> way you build a binary with «dpkg-deb --build», but doing “a build”
> in my mind would be the equivalent of preparing a release, which might
> include sources and/or binaries from «debian/rules binary» or similar,
> from dpkg-buildpackage or an equivalent tool. And whether that is
> intended as an upload would be determined by whether you have generated
> the .changes file.

Yes.  In my quote above I meant `build' in the second sense.  I read
`build' in your FAQ entry the same way.

My point is that uploads might not involve builds{2}.  Specifically, a
pure source-only upload does not build any binaries.  It is a copy
operation (with significant semantics associated with the destination
of the copy).

So it is wrong to oppose
   just produce a source package
with
   an actual build to upload

A pure source-only upload is "produce a source package, for upload,
and upload it".  It is neither of the above, because it is not "just
produce a source package" (the package gets uploaded) and it is not
"an actual build" (since nothing is built{2}).

> > Alternatively dgit could strip out the .buildinfo, depending on
> > whether it ran rules clean.
>
> I'm not sure why that would be desirable though?

For dgit push-source, the .buildinfo is at the very least misleading
(since it may well contain information about the developer's dirty
working environment, rather than any clean environment they use for
binary builds - if any) and useless (no-one benefits from it).

It is also a privacy leak.

And, dgit push-source wants to check that what it is uploading is
actually a binaryless upload.  Therefore it needs to check that
binaries are not included; therefore it needs to iterate over the
.changes, and explicitly make a decision about .buildinfo.
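
To illustrate the check described above, here is a minimal Python sketch of iterating over the Files field of a .changes file and sorting its entries into source files, binary packages and .buildinfo files. It is not dgit's actual code; the function names and the suffix list are assumptions made for the example, and it only handles the plain field layout (checksum, size, section, priority, filename).

```python
# Hypothetical helpers for illustration only; not part of dgit or dpkg.
BINARY_SUFFIXES = (".deb", ".udeb", ".ddeb")


def classify_changes_files(changes_text):
    """Sort the filenames in the Files field into source/binary/buildinfo."""
    result = {"source": [], "binary": [], "buildinfo": []}
    in_files = False
    for line in changes_text.splitlines():
        if line.startswith("Files:"):
            in_files = True
            continue
        if in_files:
            # Continuation lines of a field start with a space.
            if not line.startswith(" "):
                break
            filename = line.split()[-1]
            if filename.endswith(BINARY_SUFFIXES):
                result["binary"].append(filename)
            elif filename.endswith(".buildinfo"):
                result["buildinfo"].append(filename)
            else:
                result["source"].append(filename)
    return result


def is_binaryless(changes_text):
    """True if the .changes lists no binary packages at all."""
    return not classify_changes_files(changes_text)["binary"]
```

A tool could reject the upload when `is_binaryless` returns false, and separately decide what to do with anything listed under "buildinfo".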

My current view is that rejecting the upload would have been correct,
except that it is necessary to work around the bug that dpkg-source
and dpkg-genchanges generate a buildinfo when there has been no
build{2}.

Ian.

Re: What makes a .changes file source-only? [and 1 more messages]

Ian Jackson-2
Ian Jackson writes ("Re: What makes a .changes file source-only? [and 1 more messages]"):
> Certainly, actual source packages uploaded by dgit are affected by any
                                                         ^ NOT
> infelicities that may exist in the package clean target.

Ian.

Re: source-only builds and .buildinfo

Daniel Kahn Gillmor-3
In reply to this post by Ian Jackson-2
On Wed 2017-06-21 15:42:07 +0100, Ian Jackson wrote:
> This is a very useful concept but I suggest you give it a new name.
> "binaries-attested upload" perhaps ?

I like the idea that we should name this thing, but i'd call it
something like a "source-only upload with .buildinfo" or
"source+buildinfo upload" instead.

> To me "source-only upload" means that there were no binaries built,
> and therefore no information about binaries included in the upload.

i tend to think "source-only" in this phrase applies to "upload",
meaning that the upload doesn't include binaries, and what i'm uploading
doesn't include binaries.  i acknowledge that it also includes some
stuff that isn't actually sources, but this is true of normal
"source-only" uploads too -- for example, such uploads include
cryptographic signatures and selected elements of the changelogs, which
are also not sources.

</bikeshed>

☺,

        --dkg

Re: source-only builds and .buildinfo

Ximin Luo-5
In reply to this post by Adrian Bunk-3
Adrian Bunk:

> On Wed, Jun 21, 2017 at 10:09:00AM +0000, Ximin Luo wrote:
>> Adrian Bunk:
>>> [..]
>>>
>>> How is that supposed to work when the compiler is not exactly identical?
>>>
>>> As an example, gcc-6 6.3.0-18 and gcc-6 6.3.0-19 will likely produce
>>> different output for every non-trivial piece of software.
>>>
>>> The reason is that every new gcc upload usually contains whatever
>>> bugfixes are on the upstream branch.
>>>
>>
>> It would depend on the situation which dependencies should be "irrelevant" towards the final output, right. If the dependencies are different and the buildinfo is different, it does not necessarily mean there is a problem, the upload does not need to be rejected. But it's a signal that other people (including the uploader) might want to re-try the build with the newer dependencies.
>>
>> OTOH if the outputs match, we get more certainty, which is a good thing.
>> ...
>
> "more certainty" on what exactly?
>

More certainty that the binaries were actually produced from the source code, rather than by malicious or compromised machines.

> "signal that other people might want to" is quite vague,
> what do you want to prove and how exactly should people
> spend time proving it?
>

That the binaries uploaded were actually produced from the source code. People spend time proving it by running the build again and seeing if the binaries match, possibly also recreating various aspects of previous build environments recorded in other .buildinfo files.
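
The comparison step can be sketched as follows, assuming both the uploader's and the rebuilder's .buildinfo files carry a Checksums-Sha256 field with lines of the form "digest size filename". The helper names are hypothetical, and this only illustrates comparing claimed against rebuilt output hashes; it is not an existing tool.

```python
def parse_sha256_field(buildinfo_text):
    """Return {filename: sha256 digest} from the Checksums-Sha256 field."""
    checksums = {}
    in_field = False
    for line in buildinfo_text.splitlines():
        if line.startswith("Checksums-Sha256:"):
            in_field = True
            continue
        if in_field:
            # Continuation lines of a field start with a space.
            if not line.startswith(" "):
                break
            digest, _size, name = line.split()
            checksums[name] = digest
    return checksums


def compare_builds(claimed_text, rebuilt_text):
    """Map each artifact named in both files to True if the hashes match."""
    claimed = parse_sha256_field(claimed_text)
    rebuilt = parse_sha256_field(rebuilt_text)
    common = claimed.keys() & rebuilt.keys()
    return {name: claimed[name] == rebuilt[name] for name in sorted(common)}
```

A mismatch here does not by itself show tampering; as discussed in this thread, it may simply reflect different build dependencies or a build that is not yet reproducible.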

> In the best case [excluding the binary-all special case] we would know that the buildd on the one
> architecture that happens to be used by the person doing the
> source upload produced the same binaries.
>
> Once you start verifying that all binaries in the archive were built
> from the sources in the archive, this will automatically be covered.
>

What we'd like to aim for is to give users some security guarantee *independent of the distributor, i.e. Debian or DD uploaders* that the binaries they're using are actually produced from the source code.

One way to give security that is independent of third parties, is to provide some sort of mathematically-verifiable proof. However the world isn't at that stage yet for compiler technology.

Buildinfo files are more like claims than proofs. Whilst they can be used as proofs, i.e. by running the build yourself, this is an expensive process which we can't expect most users to perform, and it doesn't really fit the idea of a "proof" in a security system, which is supposed to be low-cost for verifiers.

For users that can't directly verify everything that they themselves run, one "next best thing" they can do is to check that different parties that they trust - or many parties that they don't trust, that they nevertheless believe are probably not all colluding to attack them - claimed to have performed the build or verified each others' proofs.

So, the more buildinfo files we have, from different parties (DDs, the Debian archive, etc) the better this is for users, because they have more sources of claims. How much they "trust" each individual source, is indeed not something that is concretely measurable and no existing security system tries to model this more precisely unfortunately; however I think we can all agree that "more is better" here.

Therefore, there is still value in using DDs' uploaded buildinfo files, even if the buildds are "likely" to use different dependencies and "likely" to produce different binaries. If they have identical output, great, they get a nice green tick somewhere. If not, people can run the builds again to try to get the identical output. And some builds are indeed not reproducible today, and these indicate bugs rather than builders being compromised.

Besides, I think "non-identical builds due to changed dependencies" won't actually be so likely in practice. For example GCC-6 -18 was there for 3-4 weeks, and plenty of uploads happened during that time. Most DDs would update, build and upload within several minutes or hours of each other.

X


--
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git

Re: source-only builds and .buildinfo

Adrian Bunk-3
On Thu, Jun 22, 2017 at 08:26:00AM +0000, Ximin Luo wrote:
>...
> One way to give security that is independent of third parties, is to provide some sort of mathematically-verifiable proof. However the world isn't at that stage yet for compiler technology.

What changes in compiler technology are you hoping for?

The main reason for fixing optimizer bugs in the compiler is to get
different (no longer buggy) output.

>...
> For users that can't directly verify everything that they themselves run, one "next best thing" they can do is to check that different parties that they trust - or many parties that they don't trust, that they nevertheless believe are probably not all colluding to attack them - claimed to have performed the build or verified each others' proofs.
>
> So, the more buildinfo files we have, from different parties (DDs, the Debian archive, etc) the better this is for users, because they have more sources of claims. How much they "trust" each individual source, is indeed not something that is concretely measurable and no existing security system tries to model this more precisely unfortunately; however I think we can all agree that "more is better" here.
>...

I don't see how more random information is helpful for users.

One or more trusted instances verifying that all packages in a release
were built from their sources is the information that would be useful
for users.

For some users it would also be important to be able to verify this for
the whole archive themselves.

> X

cu
Adrian

--

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

Re: source-only builds and .buildinfo

Ximin Luo-5
Adrian Bunk:

> On Thu, Jun 22, 2017 at 08:26:00AM +0000, Ximin Luo wrote:
>> ...
>> One way to give security that is independent of third parties, is to provide some sort of mathematically-verifiable proof. However the world isn't at that stage yet for compiler technology.
>
> What changes in compiler technology are you hoping for?
>
> The main reason for fixing optimizer bugs in the compiler is to get
> different (no longer buggy) output.
>
>> ...
>> For users that can't directly verify everything that they themselves run, one "next best thing" they can do is to check that different parties that they trust - or many parties that they don't trust, that they nevertheless believe are probably not all colluding to attack them - claimed to have performed the build or verified each others' proofs.
>>
>> So, the more buildinfo files we have, from different parties (DDs, the Debian archive, etc) the better this is for users, because they have more sources of claims. How much they "trust" each individual source, is indeed not something that is concretely measurable and no existing security system tries to model this more precisely unfortunately; however I think we can all agree that "more is better" here.
>> ...
>
> I don't see how more random information is helpful for users.
>
> One or more trusted instances verifying that all packages in a release
> were built from their sources is the information that would be useful
> for users.
>

Different users can choose who they want to trust. A DD signing a buildinfo and uploading this to the archive, is not "random information". Some users would be happy to trust 1 DD plus the buildd, but not either one individually; other users would want other third-party builders to re-perform the build and sign their own buildinfo files.

The point is that making the information available gives more choice for users. If specific users don't trust a DD, they can ignore this extra information. But if we don't provide this information, we're preventing people from getting assurance about the software they're running.

BTW this sort of trust-system I'm suggesting is not like the CA system where 1 trusted party can break your security. Instead, here all/most trusted parties would have to collude to publish bad buildinfo files, to break your security. The security dynamics are closer to Bitcoin and other blockchain tech. There are certain nuances to be made when doing the security logic, for example someone could sign 100 bad buildinfo files pretending to be from different people; but I think the success of blockchain tech shows that there is some demand from users to have these AND-trust systems where many trusted sources, even from "random strangers", can help make the system stronger, as opposed to OR-trust systems where 1 CA can go MITM everyone.
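
A minimal sketch of the AND-trust idea, assuming each buildinfo has already been reduced to a (signer, output hash) pair with its signature verified elsewhere. The function name, the signer labels and the threshold parameter are hypothetical. Counting distinct signers from a user-chosen set means that one party signing many files still counts only once, though it does not by itself address someone controlling several identities.

```python
from collections import defaultdict


def attested_hashes(claims, trusted_signers, threshold):
    """Return output hashes claimed by at least `threshold` distinct trusted signers.

    claims: iterable of (signer, output_hash) pairs, signatures assumed verified.
    trusted_signers: set of signer identities this particular user accepts.
    """
    signers_by_hash = defaultdict(set)
    for signer, output_hash in claims:
        if signer in trusted_signers:
            # A set records each signer once, however many files they sign.
            signers_by_hash[output_hash].add(signer)
    return {h for h, signers in signers_by_hash.items() if len(signers) >= threshold}
```

For example, a user who wants "1 DD plus the buildd" would put both identities in `trusted_signers` and set `threshold` to 2.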

> For some users it would also be important to be able to verify this for
> the whole archive themselves.
>

Yeah, we agree on this point. For many other users, who don't have the resources to rebuild everything, it's equally important to be able to see that {whatever they choose} other people have claimed to have done it.

X

--
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git

Re: source-only builds and .buildinfo

Sean Whitton
In reply to this post by Ian Jackson-2
Hello,

Thank you all for some interesting reading in this thread.

On Wed, Jun 21, 2017 at 03:42:07PM +0100, Ian Jackson wrote:
> However, `dgit push-source' is a different case.  That is a command
> where the dgit user asks dgit to upload their package source code to
> Debian, but without doing any kind of binary package build at all.
>
> (Probably the user has done some kind of pre-upload test to check that
> the source does generate working binaries, but **perhaps of a source
> package with a different version in the changelog, or something.)**

FWIW, this is precisely where I expect to be using `dgit build-source`
myself (though it wasn't me who came up with the feature).

`dgit sbuild`, test the binaries, `dch -r`, `dgit push-source` -- no
need to do a binary build to test the `dch -r`.

--
Sean Whitton
