Multiarch file overlap summary and proposal (was: Summary: dpkg shared / reference counted files and version match)

Russ Allbery-2
There's been a lot of discussion of this, but it seems to have been fairly
inconclusive.  We need to decide what we're doing, if anything, for wheezy
fairly soon, so I think we need to try to drive this discussion to some
concrete conclusions.

First, Steve's point here is very good:

Steve Langasek <[hidden email]> writes:

> I guess we're looking at the same data, yet we seem to have reached
> opposite conclusions.

>  - Riku reports that 33 out of 82k files have different compression when
>    using current gzip vs. 10-year-old gzip.  I'd be surprised if any of
>    those binary packages hadn't been superseded long ago.  It's not a
>    guarantee, but I think the risks, and ultimate cost, of relying on gzip
>    output to not change often and to just do sourceful rebuilds when it
>    isn't are a lot smaller than if we go about manually splitting our
>    packages further.

>  - The cases where gzip output has been reported to not be reproducible
>    seem to all boil down to a single issue with gzip being passed
>    different arguments due to the unreproducible nature of *find*'s
>    output.  A patch has been made available already on the bug, and this
>    patch seems to address the instances of the problem that we've hit so
>    far in the Ubuntu archive.

> Now, it's worth following up with gzip upstream about our concerns, but
> even without that, I just don't see this being problematic.

It isn't the end of the world if we have some conflicts provided that we
can detect them and can do something consistent to fix them.  I'm rather
nervous about relying on reproducibility of gzip because of Joey's
experience with pristine-tar, where he does find a lot of variation in
practice, but it is true that, for the purposes of multiarch, Debian *can*
possibly construct things such that we only need to worry about our own
gzip, which does simplify the situation.
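
To make the reproducibility concern concrete, here is a minimal Python sketch. It is an illustration only, not data from this thread; the sample text and variable names are assumptions. It shows that identical input compresses to identical bytes when the header timestamp is pinned (roughly what `gzip -n` does for the timestamp), and that changing only that header field changes the output bytes even though the payload is the same.

```python
import gzip

# Stand-in for an upstream changelog or README shipped by two architectures.
data = b"Upstream changelog text\n" * 100

# Same input, same level, header timestamp pinned to 0.
pinned_a = gzip.compress(data, compresslevel=9, mtime=0)
pinned_b = gzip.compress(data, compresslevel=9, mtime=0)

# Same input again, but with a different timestamp stored in the gzip header.
other_stamp = gzip.compress(data, compresslevel=9, mtime=1)

same_when_pinned = pinned_a == pinned_b
differs_with_mtime = pinned_a != other_stamp
same_payload = gzip.decompress(pinned_a) == gzip.decompress(other_stamp) == data
```

A byte-for-byte comparison of the compressed files is what refcounting would rely on, so anything that leaks into the header or changes the compressor's behaviour matters even when the decompressed content is unchanged.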

However, as we've subsequently discussed, those are not the only issues
with file overlaps between packages.  So I'm going to try to summarize and
propose some possible solutions for the different issues.  I'm going to
discuss these issues in order from the most consistent with a refcounting
solution to the least consistent.

1. Uncompressed files that we know are absolutely identical between
   different architectures.  These include arch-independent header files
   that are just copied verbatim from the upstream source and data files
   in textual formats or arch-independent binary formats that aren't
   compressed and whose generation doesn't vary.  (Symlinks are a special
   case of this.)  Reference counting works great for these.  These also
   resolve most of the file overlaps between -dev packages, and many of
   the harder cases for interpackage dependencies if we split everything
   out.  I think it makes a lot of sense to use refcounting for these
   files.

2. Files like the above but that are compressed.  This is most common in
   the doc directory for things like README or the upstream changelog.
   Upstream man pages written directly in *roff fall into this category as
   well, for -dev packages.  With Steve's point above about gzip, I think
   we're probably okay using refcounting for this as well.

3. Generated documentation.  Here's where I think refcounting starts
   failing.  Man pages generated from POD may change if the version of
   Perl used to generate them changes, if Pod::Simple or Pod::Man have had
   a new release.  Doxygen-generated HTML documentation is even more
   likely to change.  Many documentation generation systems will include
   timestamps or other information that changes, or (even more likely)
   will have minor changes in their output and formatting even if there is
   nothing as obvious as a version number or timestamp.

   I don't think we can use refcounting for generated documentation
   produced as part of the package build process.  If there is
   Doxygen-generated documentation, generated man pages, or the like, I
   think those have to be split into a separate arch: all package.  Even
   if it's just a couple of man pages.  This is rather annoying, but I
   think trying to use refcounting here is just too fragile.

4. Lintian overrides.  I believe these should be qualified with the
   architecture on any multiarch: same package so that the overrides can
   vary by architecture, since this is a semi-frequent use case for
   Lintian.

5. Data files that vary by architecture.  This includes big-endian
   vs. little-endian issues.  These are simply incompatible with multiarch
   as currently designed, and incompatible with the obvious variations
   that I can think of, and will have to either be moved into
   arch-qualified directories (with corresponding patches to the paths
   from which the libraries load the data) or these packages can't be made
   multiarch.

6. Debian changelogs.  The actual content of these files changes with
   binNMUs, so these obviously can't be refcounted at all right now.  We
   have to do something else here, probably by generating new
   binary-specific changelog files for binNMUs.
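
As a rough way to see which of these classes a package falls into, one could unpack the builds for two architectures and compare every path they have in common. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the two arguments are directories containing already-unpacked package contents, and symlinks to directories are ignored.

```python
import hashlib
import os


def file_digests(root):
    """Map every file under root (path relative to root) to a content digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            if os.path.islink(full):
                # Symlinks are compared by their target, not by what they point at.
                digests[rel] = "symlink:" + os.readlink(full)
            else:
                with open(full, "rb") as handle:
                    digests[rel] = hashlib.sha256(handle.read()).hexdigest()
    return digests


def conflicting_overlaps(root_a, root_b):
    """Return the paths present in both trees whose content differs."""
    a = file_digests(root_a)
    b = file_digests(root_b)
    return sorted(path for path in a.keys() & b.keys() if a[path] != b[path])
```

Paths that exist in only one tree (for example files under an arch-qualified library directory) are not overlaps and are skipped; anything the function returns would have to be split out, arch-qualified, or made reproducible.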

Does this seem comprehensive to everyone?  Am I missing any cases?

If this is comprehensive, then I propose the following path forward, which
is a mix of the various solutions that have been discussed:

* dpkg re-adds the refcounting implementation for multiarch, but along
  with a Policy requirement that packages that are multiarch must only
  contain files in classes 1 and 2 above.

* All packages that want to be multiarch: same have to move all generated
  documentation into a separate package unless the maintainer has very
  carefully checked that the generated documentation will be byte-for-byte
  identical even across minor updates of the documentation generation
  tools and when run at different times.

* Lintian should recognize arch-qualified override files, and multiarch:
  same packages must arch-qualify their override files.  debhelper
  assistance is desired for this.

* Policy prohibits arch-varying data files in multiarch: same packages
  except in arch-qualified paths.

* The binNMU process is changed to add the binNMU changelog entry to an
  arch-qualified file (changelog.Debian.arch, probably).  We need to
  figure out what this means if the package being binNMU'd has a
  /usr/share/doc/<package> symlink to another package, though; it's not
  obvious what to do here.
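
For readers who have not followed the earlier thread, the refcounting idea in the first bullet can be sketched as a small in-memory model. This is not dpkg's implementation; the class, its method names, and the use of a content digest are all assumptions made for illustration.

```python
class FileConflict(Exception):
    """Raised when two packages ship different content at the same path."""


class SharedFileDB:
    """Toy model of reference-counted shared files."""

    def __init__(self):
        self._entries = {}  # path -> [digest, set of owning packages]

    def install(self, owner, path, digest):
        entry = self._entries.get(path)
        if entry is None:
            self._entries[path] = [digest, {owner}]
            return
        known_digest, owners = entry
        other_owners = owners - {owner}
        if other_owners and digest != known_digest:
            # Another instance already owns this path with different content.
            raise FileConflict(f"{path} differs from copy owned by {sorted(other_owners)}")
        entry[0] = digest
        owners.add(owner)

    def remove(self, owner, path):
        """Drop one reference; return True if the file should leave the disk."""
        entry = self._entries.get(path)
        if entry is None:
            return False
        entry[1].discard(owner)
        if entry[1]:
            return False
        del self._entries[path]
        return True

    def refcount(self, path):
        entry = self._entries.get(path)
        return len(entry[1]) if entry else 0
```

In this model a second architecture may install a path that is already owned only if the content matches, and the file disappears only when the last owner is removed. That is why classes 1 and 2 work and the other classes do not.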

Please note that this is a bunch of work.  I think the Lintian work is a
good idea regardless, and it can start independently.  I think the same is
true of the binNMU changelog work, since this will address some
long-standing issues with changelog handling in some situations, including
resolving just how we're supposed to handle /usr/share/doc symlinks.  But
even with those aside, this is a lot of stuff that we need to agree on,
and in some cases implement, in a fairly short timeline if this is going
to make wheezy.

--
Russ Allbery ([hidden email])               <http://www.eyrie.org/~eagle/>


--
To UNSUBSCRIBE, email to [hidden email]
with a subject of "unsubscribe". Trouble? Contact [hidden email]
Archive: http://lists.debian.org/874nutncef.fsf_-_@...

Re: Multiarch file overlap summary and proposal (was: Summary: dpkg shared / reference counted files and version match)

Raphael Hertzog-3
On Mon, 13 Feb 2012, Russ Allbery wrote:
> There's been a lot of discussion of this, but it seems to have been fairly
> inconclusive.  We need to decide what we're doing, if anything, for wheezy
> fairly soon, so I think we need to try to drive this discussion to some
> concrete conclusions.

Thanks for this.

> 2. Files like the above but that are compressed.  This is most common in
>    the doc directory for things like README or the upstream changelog.
>    Upstream man pages written directly in *roff fall into this category as
>    well, for -dev packages.  With Steve's point above about gzip, I think
>    we're probably okay using refcounting for this as well.

Yes, but I would still document at the policy level that, when feasible
without downsides, it's best to move compressed files into a shared package.

Also it might be wise to relax the policy rules on compression for
multi-arch: same and to let dh_compress not compress (some) files in such
packages.

> Does this seem comprehensive to everyone?  Am I missing any cases?

It's a good summary, yes.

> If this is comprehensive, then I propose the following path forward, which
> is a mix of the various solutions that have been discussed:

I agree with this plan.

> * The binNMU process is changed to add the binNMU changelog entry to an
>   arch-qualified file (changelog.Debian.arch, probably).  We need to
>   figure out what this means if the package being binNMU'd has a
>   /usr/share/doc/<package> symlink to another package, though; it's not
>   obvious what to do here.

I wonder what's the proper way to handle this. In theory, it would be nice
to deal with that at the dpkg-dev level but dpkg-dev is not at all
involved in installing the changelog. And I believe that the bin-nmu
process just adds a top-level entry to debian/changelog.

So the code should go to dh_installchangelogs... but it doesn't seem to be
a good idea to put the bin-nmu logic there in particular since we might
extend it (see #440094).

Somehow my suggestion is then to extend dpkg-parsechangelog to provide
the required logic to split the changelog in its bin-nmu part and its
usual content.

dpkg-parsechangelog --split-binnmu <binnmu-part-file> <remaining-part-file>

Then dh_installchangelogs could try to use this (and if it fails, fallback
to the standard changelog installation).
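
The splitting logic itself could look roughly like the following Python sketch. Note that `--split-binnmu` above is only a proposed option, and everything here is an assumption for illustration: the function name, the regular expressions, and the rule that a binNMU entry is recognised by a version ending in `+b<number>` on the topmost entry.

```python
import re

# Start of a changelog entry: "source (version) distribution; ...".
HEADER = re.compile(r"^(?P<source>\S+) \((?P<version>[^)]+)\) ", re.MULTILINE)
# Binary-only uploads append "+b<number>" to the version.
BINNMU_VERSION = re.compile(r"\+b\d+$")


def split_binnmu(changelog_text):
    """Return (binnmu_part, remaining_part) for a changelog given as text."""
    headers = list(HEADER.finditer(changelog_text))
    if not headers or not BINNMU_VERSION.search(headers[0].group("version")):
        # No binNMU entry on top: everything is regular changelog content.
        return "", changelog_text
    if len(headers) == 1:
        return changelog_text, ""
    cut = headers[1].start()
    return changelog_text[:cut], changelog_text[cut:]
```

With this split, the remaining part stays identical across architectures and only the first part would need an arch-qualified file name.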

Does that sound sane? If yes, I can have a look at implementing this.

Cheers,
--
Raphaël Hertzog ◈ Debian Developer

Pre-order a copy of the Debian Administrator's Handbook and help
liberate it: http://debian-handbook.info/liberation/


--
Archive: http://lists.debian.org/20120214073923.GA866@...

Re: Multiarch file overlap summary and proposal (was: Summary: dpkg shared / reference counted files and version match)

Philipp Kern-4
On 2012-02-14, Raphael Hertzog <[hidden email]> wrote:

> I wonder what's the proper way to handle this. In theory, it would be nice
> to deal with that at the dpkg-dev level but dpkg-dev is not at all
> involved in installing the changelog. And I believe that the bin-nmu
> process just adds a top-level entry to debian/changelog.
>
> So the code should go to dh_installchangelogs... but it doesn't seem to be
> a good idea to put the bin-nmu logic there in particular since we might
> extend it (see #440094).
>
> Somehow my suggestion is then to extend dpkg-parsechangelog to provide
> the required logic to split the changelog in its bin-nmu part and its
> usual content.
>
> dpkg-parsechangelog --split-binnmu <binnmu-part-file> <remaining-part-file>
>
> Then dh_installchangelogs could try to use this (and if it fails, fallback
> to the standard changelog installation).
>
> Does that sound sane? If yes, I can have a look at implementing this.

In theory sbuild could also offload this to dpkg-buildpackage by passing
something like "--binnmu-version 2 --binnmu-changelog 'Rebuild for libfoo
transition'".  The only thing that would be annoying is checking if the old
style or the new style must be used.  (I.e. there must be some sort of feature
query first.)

Kind regards
Philipp Kern


--
Archive: http://lists.debian.org/slrnjjk9br.nqd.trash@...

Re: Multiarch file overlap summary and proposal

Niels Thykier
In reply to this post by Russ Allbery-2
On 2012-02-14 07:43, Russ Allbery wrote:
> [...]
>
> * Lintian should recognize arch-qualified override files, and multiarch:
>   same packages must arch-qualify their override files.  debhelper
>   assistance is desired for this.
>
> [...]

I have no problem with Lintian accepting arch-qualified override files,
but I do not see the "strict" (i.e. "must") requirement[1].  Lintian
already allows you to do arch-specific overrides and 2.5.5 will even
allow architecture wildcards as well[2].


~Niels

[1] Exception being compressed override files, but of the 161 override
files on my system not a single one is compressed.

[2] http://lintian.debian.org/manual/section-2.4.html#section-2.4.1

Admittedly the link above does not have an example with it, but Lintian
(git) has an entire "section" dedicated to architecture-specific
overrides.  So there will be several examples with the next release.  :)


--
Archive: http://lists.debian.org/4F3A4888.906@...

Re: Multiarch file overlap summary and proposal (was: Summary: dpkg shared / reference counted files and version match)

Raphael Hertzog-3
In reply to this post by Philipp Kern-4
On Tue, 14 Feb 2012, Philipp Kern wrote:

> On 2012-02-14, Raphael Hertzog <[hidden email]> wrote:
> > Somehow my suggestion is then to extend dpkg-parsechangelog to provide
> > the required logic to split the changelog in its bin-nmu part and its
> > usual content.
> >
> > dpkg-parsechangelog --split-binnmu <binnmu-part-file> <remaining-part-file>
> >
> > Then dh_installchangelogs could try to use this (and if it fails, fallback
> > to the standard changelog installation).
> >
> > Does that sound sane? If yes, I can have a look at implementing this.
>
> In theory sbuild could also offload this to dpkg-buildpackage by passing
> something like "--binnmu-version 2 --binnmu-changelog 'Rebuild for libfoo
> transition'".  The only thing that would be annoying is checking if the old
> style or the new style must be used.  (I.e. there must be some sort of feature
> query first.)

Yes, but that doesn't change the fact that dpkg-dev should not
install files in the generated .deb. So we still need some interaction
with dh_installchangelogs... but your suggestion led me to another
proposal.

dpkg-buildpackage --binary-version <ver> --binary-changelog 'foo'
could create debian/changelog.build with the given changelog version and
changelog entry.

dpkg-parsechangelog could be taught to read debian/changelog.build
before debian/changelog so that dpkg-parsechangelog continues to do the
right thing (when called from debian/rules).

And dh_installchangelogs can be taught to install debian/changelog.build
as /usr/share/doc/<foo>/changelog.Debian.build-$arch.

dpkg-buildpackage would clean up debian/changelog.build if it wasn't
passed the proper option. dpkg-source would learn to not include it in
generated source packages, too.
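
A minimal Python sketch of the two lookups described above, purely to illustrate the intended behaviour. The function names are hypothetical, and none of this exists in dpkg-dev or debhelper; the file name debian/changelog.build and the destination pattern are the ones proposed in this mail.

```python
import os


def changelog_sources(debian_dir):
    """Return the changelog files to read, build-specific entry first."""
    candidates = (
        os.path.join(debian_dir, "changelog.build"),  # proposed per-build entry
        os.path.join(debian_dir, "changelog"),        # regular changelog
    )
    return [path for path in candidates if os.path.exists(path)]


def build_changelog_destination(package, arch):
    """Where the per-build changelog would be installed in the binary package."""
    return f"/usr/share/doc/{package}/changelog.Debian.build-{arch}"
```

When debian/changelog.build is absent the first function returns only the regular changelog, so existing builds would behave as they do today.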

This looks rather appealing to me. What do you think?

Cheers,
--
Raphaël Hertzog ◈ Debian Developer

Pre-order a copy of the Debian Administrator's Handbook and help
liberate it: http://debian-handbook.info/liberation/


--
Archive: http://lists.debian.org/20120214131720.GD11824@...

Re: Multiarch file overlap summary and proposal (was: Summary: dpkg shared / reference counted files and version match)

Ian Jackson-2
In reply to this post by Russ Allbery-2
Russ Allbery writes ("Multiarch file overlap summary and proposal (was: Summary: dpkg shared / reference counted files and version match)"):
> There's been a lot of discussion of this, but it seems to have been fairly
> inconclusive.  We need to decide what we're doing, if anything, for wheezy
> fairly soon, so I think we need to try to drive this discussion to some
> concrete conclusions.

Yes.

> If this is comprehensive, then I propose the following path forward, which
> is a mix of the various solutions that have been discussed:

Thanks for this summary and analysis.  I agree with your conclusions.

> Does this seem comprehensive to everyone?  Am I missing any cases?

I think you have covered all of the cases that have been brought up on
this list, which I think are all of the important and frequent cases.

Thinking about other corner cases can be deferred for now because we
can put off converting affected packages until wheezy+1, or if we
really want to convert we can very probably add a "common" package.
So let us press on.

> * The binNMU process is changed to add the binNMU changelog entry to an
>   arch-qualified file (changelog.Debian.arch, probably).  We need to
>   figure out what this means if the package being binNMU'd has a
>   /usr/share/doc/<package> symlink to another package, though; it's not
>   obvious what to do here.

If we always put the binNMU changelog file in
 /usr/share/doc/<package>/changelog.Debian.<package>:<arch>
then in the symlink case we can put the file in
 /usr/share/doc/<symlink-target>/changelog.Debian.<original-package>:<arch>
and everything will work (apart from the fact that some minority of
changelog-reading tools will need to be taught to look at the new path).
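
The naming rule can be written down as a small function. This is only a sketch of the scheme described above; the function and parameter names are made up for illustration.

```python
def binnmu_changelog_path(package, arch, doc_symlink_target=None):
    """Path of the binNMU changelog under the scheme described above.

    doc_symlink_target is the package whose doc directory
    /usr/share/doc/<package> points at, or None if it is a real directory.
    """
    doc_dir = doc_symlink_target or package
    # The file name always carries the original package and architecture,
    # so several packages can share one doc directory without clashing.
    return f"/usr/share/doc/{doc_dir}/changelog.Debian.{package}:{arch}"
```

For example, libfoo-dev on i386 with a doc symlink to libfoo1 would get /usr/share/doc/libfoo1/changelog.Debian.libfoo-dev:i386, which cannot collide with libfoo1's own file.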

> Please note that this is a bunch of work.  I think the Lintian work is a
> good idea regardless, and it can start independently.  I think the same is
> true of the binNMU changelog work, since this will address some
> long-standing issues with changelog handling in some situations, including
> resolving just how we're supposed to handle /usr/share/doc symlinks.  But
> even with those aside, this is a lot of stuff that we need to agree on,
> and in some cases implement, in a fairly short timeline if this is going
> to make wheezy.

Yes.  The work that absolutely needs to be done ASAP seems to be:
 - put the refcounting back in dpkg
 - lintian support for arch-qualified overrides
 - update the binNMU machinery to write the new changelog file instead

Things that should be done but are not on the critical paths:
 - transpose the restrictions on use of refcounting into policy
   (for now they can go in a text file in dpkg-dev, or even just
   a reference to your email)
 - update changelog-reading tools to look for binNMU changelogs too

Things which we can do at our leisure:
 - convert individual libraries
 - think about whether to always arch-qualify the whole changelog
 - think about other refcounting corner cases (see my comments above)

> 5. Data files that vary by architecture.  This includes big-endian
>    vs. little-endian issues.  These are simply incompatible with
>    multiarch as currently designed, and incompatible with the obvious
>    variations that I can think of, and will have to either be moved
>    into arch-qualified directories (with corresponding patches to the
>    paths from which the libraries load the data) or these packages
>    can't be made multiarch.

Yes.  Of these, arch-qualifying the path seems to me to be obviously
the right answer.  Of course eg if the data files just come in big-
and little-endian, you can qualify the path with only the endianness
and use refcounting to allow the equal-endianness packages to share.
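
A small Python sketch of that idea. The directory layout and the function name are assumptions made for illustration; the endianness table covers only a handful of architectures.

```python
# Byte order of a few Debian architectures (not exhaustive).
ENDIANNESS = {
    "amd64": "little",
    "i386": "little",
    "armel": "little",
    "mipsel": "little",
    "mips": "big",
    "powerpc": "big",
    "s390": "big",
    "sparc": "big",
}


def data_dir(package, arch):
    """Hypothetical data directory qualified by endianness only."""
    return f"/usr/share/{package}/{ENDIANNESS[arch]}-endian"
```

Here amd64 and i386 resolve to the same directory, so their identical data files overlap and can be refcounted, while powerpc gets a separate directory and never conflicts with them.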

Ian.


--
Archive: http://lists.debian.org/20282.26861.126288.496227@...

Re: Multiarch file overlap summary and proposal (was: Summary: dpkg shared / reference counted files and version match)

Guillem Jover
In reply to this post by Russ Allbery-2
On Mon, 2012-02-13 at 22:43:04 -0800, Russ Allbery wrote:
> If this is comprehensive, then I propose the following path forward, which
> is a mix of the various solutions that have been discussed:

> * dpkg re-adds the refcounting implementation for multiarch, but along
>   with a Policy requirement that packages that are multiarch must only
>   contain files in classes 1 and 2 above.
>
> * All packages that want to be multiarch: same have to move all generated
>   documentation into a separate package unless the maintainer has very
>   carefully checked that the generated documentation will be byte-for-byte
>   identical even across minor updates of the documentation generation
>   tools and when run at different times.

If packages have to be split anyway to cope with the other cases, then
the number of new packages which might not be needed otherwise will be
even smaller than the predicted amount, at which point it makes even
less sense to support refcnt'ing.

It also requires maintainers to carefully consider if the (doc, etc)
toolchains will generate predictable output.

Your proposal still requires papering over the other corner-cases.

> * Policy prohibits arch-varying data files in multiarch: same packages
>   except in arch-qualified paths.

Well, there's no escape from this any way you look at it, regardless of
refcnt'ing or not.

> * The binNMU process is changed to add the binNMU changelog entry to an
>   arch-qualified file (changelog.Debian.arch, probably).  We need to
>   figure out what this means if the package being binNMU'd has a
>   /usr/share/doc/<package> symlink to another package, though; it's not
>   obvious what to do here.

This requires IMO a multitude of hacks when the simplest and obvious
arch-qualified pkgname solves this cleanly, and allows debhelper to
automatically deal with it. And for tools to just change where they
always look for those files in the M-A:same case regardless of the
package being binNMUed or not.

This still does not solve the other issues I listed, namely binNMUs
have to be performed in lock-step, more complicated transitions /
upgrades. And introduces different solutions for different problems,
while my proposal is generic for all cases.

So this is still pretty much unconvincing, and seems like clinging
to the refcnt'ing “solution” while it makes things overall more
complicated, will introduce inconsistency and uncertainty to
maintainers, needs way more global changes to keep it going, etc.

What I'd change to my proposal in the summary mail, is that arch-indep
files might be considered for splitting at maintainers discretion,
when it actually seems worth it, in the same way we've handled
splitting arch-indep files from arch:any up to now. So for example a
couple of headers could be kept on the -dev package, or Ian's case on
essential and data files could also be kept on the same lib package,
as long as their paths are arch-qualified either through a pkgname:arch
or the multiarch triplet. This would reduce even more the amount of
newly split packages.

regards,
guillem


--
Archive: http://lists.debian.org/20120214140138.GA23158@...

Re: Multiarch file overlap summary and proposal (was: Summary: dpkg shared / reference counted files and version match)

Ian Jackson-2
In reply to this post by Russ Allbery-2
Guillem Jover writes ("Re: Multiarch file overlap summary and proposal (was: Summary: dpkg shared / reference counted files and version match)"):

> On Mon, 2012-02-13 at 22:43:04 -0800, Russ Allbery wrote:
> > * The binNMU process is changed to add the binNMU changelog entry to an
> >   arch-qualified file (changelog.Debian.arch, probably).  We need to
> >   figure out what this means if the package being binNMU'd has a
> >   /usr/share/doc/<package> symlink to another package, though; it's not
> >   obvious what to do here.
>
> This requires IMO a multitude of hacks when the simplest and obvious
> arch-qualified pkgname solves this cleanly, and allows debhelper to
> automatically deal with it. And for tools to just change where they
> always look for those files in the M-A:same case regardless of the
> package being binNMUed or not.

I agree that it would be nice to always arch-qualify the changelog
filename.  But that would involve a lot of changes to
changelog-reading tools which we perhaps don't want to do right now.

Note that even if we decide to always arch-qualify, we will still have
lots of old packages so all changelog-reading tools will need to look
in both places.

For most changelog-reading tools it won't be very troublesome if they
accidentally don't spot a binNMU entry.  So Russ's proposal is a good
step towards your proposal.  And if we decide we don't need to go all
the way then it's good enough for now.

> This still does not solve the other issues I listed, namely binNMUs
> have to be performed in lock-step, more complicated transitions /
> upgrades.

I don't think I see where this is coming from.  Are you talking about
variation in gzip output ?  Given the evidence we've seen here, in
practice I think that is not going to be a problem.  Certainly it
won't demand that binNMUs be performed in lock-step.

> So this is still pretty much unconvincing, and seems like clinging
> to the refcnt'ing “solution” while it makes things overall more
> complicated, will introduce inconsistency and uncertainty to
> maintainers, needs way more global changes to keep it going, etc.

I think the refcounting approach is very worthwhile because it
eliminates unnecessary work (by human maintainers) in many simple
cases.

Ian.


--
Archive: http://lists.debian.org/20282.28586.577528.890135@...

Re: Multiarch file overlap summary and proposal (was: Summary: dpkg shared / reference counted files and version match)

Josselin Mouette
In reply to this post by Russ Allbery-2
On Monday, 13 February 2012 at 22:43 -0800, Russ Allbery wrote:
> There's been a lot of discussion of this, but it seems to have been fairly
> inconclusive.  We need to decide what we're doing, if anything, for wheezy
> fairly soon, so I think we need to try to drive this discussion to some
> concrete conclusions.

Thank you very much for your constructive work.

> 3. Generated documentation.  Here's where I think refcounting starts
>    failing.

So we need to move a lot of documentation generated with gtk-doc or
doxygen from -dev packages to -doc packages. But it really seems an
acceptable tradeoff between the amount of work required and the
cleanness of the solution.

> Does this seem comprehensive to everyone?  Am I missing any cases?

Are there any cases of configuration files in /etc that vary across
architectures? Think of stuff like ld.so.conf, where a plugin or
library path is hard-coded in a configuration file.

--
 .''`.      Josselin Mouette
: :' :
`. `'
  `-


--
Archive: http://lists.debian.org/1329230441.3297.378.camel@pi0307572

Re: Multiarch file overlap summary and proposal (was: Summary: dpkg shared / reference counted files and version match)

Jakub Wilk-4
In reply to this post by Raphael Hertzog-3
* Raphael Hertzog <[hidden email]>, 2012-02-14, 14:17:

>dpkg-buildpackage --binary-version <ver> --binary-changelog 'foo' could
>create debian/changelog.build with the given changelog version and
>changelog entry.
>
>dpkg-parsechangelog could be taught to read debian/changelog.build
>before debian/changelog so that dpkg-parsechangelog continues to do the
>right thing (when called from debian/rules).
>
>And dh_installchangelogs can be taught to install
>debian/changelog.build as
>/usr/share/doc/<foo>/changelog.Debian.build-$arch.
>
>dpkg-buildpackage would clean up debian/changelog.build if it wasn't
>passed the proper option. dpkg-source would learn to not include it in
>generated source packages, too.
>
>This looks rather appealing to me. What do you think?

Yes, it does look appealing. But...

Are we sure that no existing package uses debian/changelog.build for
their own purposes?

Are we sure that all existing packages (and helpers) that parse
debian/changelog use dpkg-parsechangelog?

--
Jakub Wilk


--
Archive: http://lists.debian.org/20120214144341.GA3346@...

Re: Multiarch file overlap summary and proposal

Sven Joachim
In reply to this post by Ian Jackson-2
On 2012-02-14 15:28 +0100, Ian Jackson wrote:

> Guillem Jover writes:
>
>> This still does not solve the other issues I listed, namely binNMUs
>> have to be performed in lock-step, more complicated transitions /
>> upgrades.
>
> I don't think I see where this is coming from.  Are you talking about
> variation in gzip output ?  Given the evidence we've seen here, in
> practice I think that is not going to be a problem.  Certainly it
> won't demand that binNMUs be performed in lock-step.

Guillem is referring to the need to keep package versions in sync
across architectures, pretty much a necessity if you permit shared
files.

Cheers,
       Sven


--
Archive: http://lists.debian.org/871upxcvk4.fsf@...

Re: Multiarch file overlap summary and proposal (was: Summary: dpkg shared / reference counted files and version match)

Raphael Hertzog-3
In reply to this post by Guillem Jover
Hi,

On Tue, 14 Feb 2012, Guillem Jover wrote:

> > * All packages that want to be multiarch: same have to move all generated
> >   documentation into a separate package unless the maintainer has very
> >   carefully checked that the generated documentation will be byte-for-byte
> >   identical even across minor updates of the documentation generation
> >   tools and when run at different times.
>
> If packages have to be split anyway to cope with the other cases, then
> the number of new packages which might not be needed otherwise will be
> even smaller than the predicted amount, at which point it makes even
> less sense to support refcnt'ing.

Why are you so opposed to the refcnt'ing?

It's not such a big deal to maintain this feature in dpkg. And even if the
current implementation is not perfect, it can be improved later, once dpkg
itself stores checksums of the files that packages provide.

To me it looks like you don't like refcnt'ing and you're trying to find
some reasons to make it unacceptable.

> It also requires maintainers to carefully consider if the (doc, etc)
> toolchains will generate predictable output.

If the maintainer has to install files in a non-standard path (because of
the need to arch-qualify it), the maintainer will also have to carefully
consider how to ensure that this move doesn't break anything.

It's not a black-and-white situation. You're trading one potential problem
for another. And the differing files are likely to be much easier to spot
than other behaviour changes that might be implied by the move of some
files to arch-qualified paths.

> Your proposal still requires papering over the other corner-cases.

Can you be explicit about which corner cases you're referring to ?

> This still does not solve the other issues I listed, namely binNMUs
> have to be performed in lock-step

Can you explain why? If the binnmu changelog is in an arch-specific file,
then we're free to bin-nmu packages separately.

dpkg must just ensure that all "M-A: same" packages have the same source
version (instead of the binary version as currently).
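
A minimal sketch of that check in Python, assuming the binary version
differs from the source version only by the usual "+bN" binNMU suffix
(a package whose binary version differs for other reasons would need the
real source version from its control data instead). The function names
are hypothetical, not part of dpkg or APT.

```python
import re

# binNMUs append "+b<number>" to the source version, e.g. 1.2-3 -> 1.2-3+b1
_BINNMU_SUFFIX = re.compile(r"\+b\d+$")


def source_version(binary_version):
    """Strip a trailing binNMU suffix, if any, from a binary version."""
    return _BINNMU_SUFFIX.sub("", binary_version)


def in_sync(versions_by_arch):
    """True if all installed instances share the same source version."""
    return len({source_version(v) for v in versions_by_arch.values()}) <= 1
```

Under this definition {"amd64": "1.2-3+b1", "i386": "1.2-3"} counts as in
sync, while it would not if binary versions were compared.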

>, more complicated transitions / upgrades.

We have no experience with this. It's a bit early to say whether those
constraints are going to be problematic or not.

> And introduces different solutions for different problems, while my
> proposal is generic for all cases.

There's no such thing as a generic solution. You still have to decide whether
you move files to a -common package or arch-qualify them and keep
them in the M-A: same package. And in both cases you have to evaluate the
implications: package installation ordering in one case, the modifications
needed to properly support the arch-qualified files in the other.

While it may sound "cleaner" from a theoretical point of view, I'm
not convinced that it's better than the approach outlined by Russ.

Also, you completely ignore the fact that what you're proposing is a
significant change for multi-arch packages that have already been converted,
both in Debian and in Ubuntu. You're pushing the work back onto package
maintainers when there's no reason not to deal with this at the build
infrastructure level.

To reduce some of the downsides associated with compressed files in M-A:
same packages, we could/should investigate how to avoid compressing files
in such packages instead of duplicating them needlessly.

> So this is still pretty much unconvincing, and seems like clinging
> to the refcnt'ing “solution” while it makes things overall more
> complicated, will introduce inconsistency and uncertainty to
> maintainers, needs way more global changes to keep it going, etc.

This is not a fair characterization of the situation. IMO "Global changes" are
better than "lots of maintainers having to do busy-work splitting their
packages".

You see inconsistency in Russ's proposal but you don't see
inconsistency/uncertainty when you change the standard location of
changelog files.

As for "more complicated": it might be true at the dpkg level, but I
don't believe that it's true from the maintainers' point of view.

Cheers,
--
Raphaël Hertzog ◈ Debian Developer

Pre-order a copy of the Debian Administrator's Handbook and help
liberate it: http://debian-handbook.info/liberation/


--
Archive: http://lists.debian.org/20120214151318.GA14915@...

Re: Multiarch file overlap summary and proposal

Marvin Renich
In reply to this post by Russ Allbery-2
* Russ Allbery <[hidden email]> [120214 01:48]:
> If this is comprehensive, then I propose the following path forward, which
> is a mix of the various solutions that have been discussed:

I thought Goswin's suggestion in [1] of having dpkg use implicit
diversions has merit and deserves further scrutiny.  It essentially
implements refcnt-like behavior by using the existing diversion
mechanism.  I did not see any message in the thread that even
acknowledged that part of his message.  Did I miss something?

...Marvin

[1] http://lists.debian.org/debian-devel/2012/02/msg00511.html


--
Archive: http://lists.debian.org/20120214151437.GA14820@...

Re: Multiarch file overlap summary and proposal

Raphael Hertzog-3
In reply to this post by Sven Joachim
Hi,

On Tue, 14 Feb 2012, Sven Joachim wrote:

> > Guillem Jover writes:
> >
> >> This still does not solve the other issues I listed, namely binNMUs
> >> have to be performed in lock-step, more complicated transitions /
> >> upgrades.
> >
> > I don't think I see where this is coming from.  Are you talking about
> > variation in gzip output?  Given the evidence we've seen here, in
> > practice I think that is not going to be a problem.  Certainly it
> > won't demand that binNMUs be performed in lock-step.
>
> Guillem is referring to the need to keep package versions in sync
> across architectures, pretty much a necessity if you permit shared
> files.

It's a matter of defining "in sync" as "having the same source version"
instead of "having the same binary version".

But such a change will require updates for APT too.

Cheers,
--
Raphaël Hertzog ◈ Debian Developer

Pre-order a copy of the Debian Administrator's Handbook and help
liberate it: http://debian-handbook.info/liberation/


--
Archive: http://lists.debian.org/20120214151637.GB14915@...

Re: Multiarch file overlap summary and proposal (was: Summary: dpkg shared / reference counted files and version match)

Raphael Hertzog-3
In reply to this post by Jakub Wilk-4
On Tue, 14 Feb 2012, Jakub Wilk wrote:
> Are we sure that no existing package uses debian/changelog.build for
> their own purposes?

No, but with debian/changelog.dpkg-build we should be safe.

> Are we sure that all existing packages (and helpers) that parse
> debian/changelog use dpkg-parsechangelog?

No, but I would consider anything else as a bug and we would notice
relatively quickly (we could even do a full rebuild to try to verify
pro-actively).

Cheers,
--
Raphaël Hertzog ◈ Debian Developer

Pre-order a copy of the Debian Administrator's Handbook and help
liberate it: http://debian-handbook.info/liberation/


--
Archive: http://lists.debian.org/20120214152757.GC14915@...

Re: Multiarch file overlap summary and proposal

Raphael Hertzog-3
In reply to this post by Marvin Renich
On Tue, 14 Feb 2012, Marvin Renich wrote:
> I thought Goswin's suggestion in [1] of having dpkg use implicit
> diversions has merit and deserves further scrutiny.

I don't. Diversions support only two packages, the "diverted" one and the
"diverting" one. Multi-Arch: same must support co-installation of
any number of packages.

So you can't reuse the existing logic without heavy modifications.

And changing the destination path at installation time is not a
good idea. What if the package really requires that specific version
of the file at the indicated path? We will have configured the package
as if everything went fine when in fact it did not.

Cheers,
--
Raphaël Hertzog ◈ Debian Developer

Pre-order a copy of the Debian Administrator's Handbook and help
liberate it: http://debian-handbook.info/liberation/


--
Archive: http://lists.debian.org/20120214153350.GD14915@...

Re: Multiarch file overlap summary and proposal (was: Summary: dpkg shared / reference counted files and version match)

Guillem Jover
In reply to this post by Ian Jackson-2
On Tue, 2012-02-14 at 14:28:58 +0000, Ian Jackson wrote:

> Guillem Jover writes ("Re: Multiarch file overlap summary and proposal (was: Summary: dpkg shared / reference counted files and version match)"):
> > On Mon, 2012-02-13 at 22:43:04 -0800, Russ Allbery wrote:
> > > * The binNMU process is changed to add the binNMU changelog entry to an
> > >   arch-qualified file (changelog.Debian.arch, probably).  We need to
> > >   figure out what this means if the package being binNMU'd has a
> > >   /usr/share/doc/<package> symlink to another package, though; it's not
> > >   obvious what to do here.
> >
> > This requires, IMO, a multitude of hacks, when the simple and obvious
> > arch-qualified pkgname solves this cleanly and allows debhelper to
> > deal with it automatically. And tools just change where they
> > always look for those files in the M-A:same case, regardless of whether
> > the package has been binNMUed or not.
>
> I agree that it would be nice to always arch-qualify the changelog
> filename.  But that would involve a lot of changes to
> changelog-reading tools which we perhaps don't want to do right now.

I've never proposed to arch-qualify the filename for the stuff under
/usr/share/doc/pkgname/, I've proposed to arch-qualify the pkgname in
the path (/usr/share/doc/pkgname:arch/), but only for M-A:same packages,
which are the only ones needing the disambiguation. This is how dpkg
handles pkgname output, or how it stores their data in the db too.

And it should be easy to ask a multiarch enabled dpkg-query for example
to normalize the pkgname output to be used on those paths, or otherwise
do it by hand:

  if M-A == same
    pkgname:arch
  else
    pkgname
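
The same rule as a small runnable sketch in Python. The function name
doc_dir and its parameters are hypothetical and only illustrate the path
layout proposed above; this is not an existing dpkg-query interface.

```python
def doc_dir(pkgname, arch, multi_arch):
    """Return the proposed doc directory for a package.

    Only packages marked "Multi-Arch: same" get an arch-qualified
    directory; everything else keeps the plain package name.
    """
    qualified = f"{pkgname}:{arch}" if multi_arch == "same" else pkgname
    return f"/usr/share/doc/{qualified}/"
```

For example, doc_dir("libfoo", "amd64", "same") gives
/usr/share/doc/libfoo:amd64/, while a package that is not M-A: same keeps
/usr/share/doc/<pkgname>/ as today.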

> Note that even if we decide to always arch-qualify, we will still have
> lots of old packages so all changelog-reading tools will need to look
> in both places.

> For most changelog-reading tools it won't be very troublesome if they
> accidentally don't spot a binNMU entry.  So Russ's proposal is a good
> step towards your proposal.  And if we decide we don't need to go all
> the way then it's good enough for now.

How many tools are there that actually read the binary package changelog
file anyway? I only know of packages.d.o. Any other tool reading from
the installed path cannot really rely on it being present at all
anyway, per policy.

In addition, binNMU split changelogs are going to be there forever,
and so are their two possible locations. The two possible locations
for M-A:same packages using pkgname:arch qualified pathnames, by contrast,
would only be temporary and would disappear once the packages have been
rebuilt with a new debhelper which automatically installs them in the
correct place.

> > So this is still pretty much unconvincing, and seems like clinging
> > to the refcnt'ing “solution” while it makes things overall more
> > complicated, will introduce inconsistency and uncertainty to
> > maintainers, needs way more global changes to keep it going, etc.
>
> I think the refcounting approach is very worthwhile because it
> eliminates unnecessary work (by human maintainers) in many simple
> cases.

As I mentioned in my reply to Riku, the number of packages that would need
splitting, and would otherwise not need it, should be even lower than
before (it was predicted at around 700). As I also mentioned there,
nothing prevents us from arch-qualifying paths (with the Debian arch
or the multiarch triplet, depending on the case) if that's more convenient
or safer (as per your essential data example); that is what we've been
doing all along anyway for arch-indep data shipped in arch:any packages.
Given the number of hacks and special cases piling up to make refcnt'ing
workable, when all that's really needed is a one-time change (plus a
possible additional change for already converted packages, for things
that debhelper might not be able to handle) to move files to qualified
paths or split them into new packages, it really does not seem worth it, no.

regards,
guillem


--
Archive: http://lists.debian.org/20120214164015.GA27571@...

Re: Multiarch file overlap summary and proposal

Russ Allbery-2
In reply to this post by Niels Thykier
Niels Thykier <[hidden email]> writes:
> On 2012-02-14 07:43, Russ Allbery wrote:
>> [...]
>>
>> * Lintian should recognize arch-qualified override files, and multiarch:
>>   same packages must arch-qualify their override files.  debhelper
>>   assistance is desired for this.
>>
>> [...]

> I have no problem with Lintian accepting arch-qualified override files,
> but I do not see the "strict" (i.e. "must") requirement[1].  Lintian
> already allows you to do arch-specific overrides and 2.5.5 will even
> allow architecture wildcards as well[2].

Ah, yes, you're right, that's a good point.  If you use architecture
restrictions in the overrides, then you can install the same override file
on all architectures, so this doesn't need to be dealt with immediately
(if at all; we only have to do this if we want to support installing
different override files per architecture, which isn't strictly
necessary).

> [1] Exception being compressed override files, but of the 161 override
> files on my system not a single one of them is compressed.

I don't think we ever told anyone Lintian could support compressed
override files.  In fact, I didn't know it could.  :)

--
Russ Allbery ([hidden email])               <http://www.eyrie.org/~eagle/>


--
Archive: http://lists.debian.org/878vk58dq9.fsf@...

Re: Multiarch file overlap summary and proposal (was: Summary: dpkg shared / reference counted files and version match)

Guillem Jover
In reply to this post by Ian Jackson-2
On Tue, 2012-02-14 at 14:28:58 +0000, Ian Jackson wrote:
> I think the refcounting approach is very worthwhile because it
> eliminates unnecessary work (by human maintainers) in many simple
> cases.

Aside from what I said in my other reply, I just wanted to note that
this seems to be a recurring point of tension in the project when it
comes to archive-wide source package changes, where supposed short-term
convenience (with its usually harmful long-term effects) appears
to initially seduce people over what seems to be the cleaner although
slightly more laborious solution.

Other recent-ish incarnations of this tension could be the build-arch
build-indep targets, or the build flag settings; where the former got
recently resolved so that the right thing to do is for *all* packages
needing to eventually support those targets, or for the latter which
got switched from the seemingly more convenient to the more laborious
but correct solution, that is, *all* packages need to set those build
flags by themselves.

This is a fundamental issue with how our source packages are handled,
and the freedom and power it gives to experiment and implement them
whatever way the maintainer wants, has the price that doing some
archive wide changes is sometimes more costly, than changing something
centrally and be done with it. But trying to workaround this by coming
up with stacks of hacked up solutions will not solve that fundamental
issue, and this kind of tension will keep coming up again and again,
as long as the foundation is not reworked. Either that, or the project
needs to accept that fact and learn to live with this kind of changes,
with patience.

regards,
guillem


--
Archive: http://lists.debian.org/20120215011510.GA15353@...

Re: Multiarch file overlap summary and proposal (was: Summary: dpkg shared / reference counted files and version match)

Joey Hess
Guillem Jover wrote:

> Aside from what I said in my other reply, I just wanted to note that
> this seems to be a recurring point of tension in the project when it
> comes to archive-wide source package changes, where supposed short-term
> convenience (with its usually harmful long-term effects) appears
> to initially seduce people over what seems to be the cleaner although
> slightly more laborious solution.
>
> Other recent-ish incarnations of this tension could be the build-arch
> build-indep targets, or the build flag settings; where the former got
> recently resolved so that the right thing to do is for *all* packages
> needing to eventually support those targets, or for the latter which
> got switched from the seemingly more convenient to the more laborious
> but correct solution, that is, *all* packages need to set those build
> flags by themselves.
>
> This is a fundamental issue with how our source packages are handled,
> and the freedom and power it gives to experiment and implement them
> whatever way the maintainer wants, has the price that doing some
> archive wide changes is sometimes more costly, than changing something
> centrally and be done with it. But trying to workaround this by coming
> up with stacks of hacked up solutions will not solve that fundamental
> issue, and this kind of tension will keep coming up again and again,
> as long as the foundation is not reworked. Either that, or the project
> needs to accept that fact and learn to live with this kind of changes,
> with patience.

Very interesting mail. While I certainly agree with your examples, it's
worth remembering the counterexample of the /usr/doc transition, which
took approximately 5 years to complete[1], and probably could have been
accomplished quickly and without pain with a simple hack to dpkg.

Anyway, my worry about the refcounting approach (or perhaps M-A: same in
general) is not the details of the implementation in dpkg, but the added
mental complexity of dpkg now being able to have multiple distinct
packages installed under the same name. I had a brief exposure to rpm,
which can install multiple versions of the same package, and that was
the main cause of much confusing behavior in rpm. While dpkg's invariant
that all co-installable package names be unique (and have unique files)
has certainly led to lots of ugly package names, it's kept the users'
and developers' mental models quite simple.

I worry that we have barely begun to scratch the surface of the added
complexity of losing this invariant.

--
see shy jo

[1] To the extent it was ever completed.. master.debian.org still has
    a vestigial /usr/doc/
