Re: .deb format: let's use 0.939, zstd, drop bzip2

Ian Jackson-2
(adding debian-dpkg)

Adam Borowski writes (".deb format: let's use 0.939, zstd, drop bzip2"):
> First, the 0.939 format, as described in "man deb-old".  While still being
> accepted by dpkg, it had been superseded before even the very first stable
> release.  Why?  It has at least two upsides over 2.0:

What an interesting proposal.  I don't think I agree, but:

> * there's no 10¹⁰ bytes (~9.31GB) limit
>   While no package this big is in the archive _yet_ (max being 1⎖652⎖244⎖360
>   bytes), both storage sizes and software bloat grow pretty fast, thus we'll
>   break this barrier in a few years.  And there's a world outside the
>   official archive -- I bet someone already has been burned by this limit.

This is a problem.
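
For reference, the limit follows from the ar(5) member header, which stores the member size as 10 ASCII decimal digits. A quick Python check of the arithmetic quoted above (the constant name is only illustrative):

```python
# The ar(5) member header stores the size as 10 ASCII decimal digits.
AR_SIZE_FIELD_DIGITS = 10  # illustrative name, not taken from dpkg

# Largest value that fits in the field.
max_member_size = 10**AR_SIZE_FIELD_DIGITS - 1

# The "~9.31GB" figure in the proposal is in binary units (GiB).
max_member_gib = max_member_size / 2**30

print(max_member_size)
print(round(max_member_gib, 2))
```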

> * it's faster by a small but non-negligible factor.  A task "unpack all
>   packages in default XFCE GUI install" gets done by stock dpkg (after
>   repacking everything as gzip) 3% faster.

I'm not sure why it should be faster.

As the person who deprecated deb-old in favour of the current format,
my motives were:
 - the old format was a real pain to unpack without a custom utility
    (this used to be a much more serious problem)
 - the old format was not very extensible.

Debian doesn't really use much of the extensibility.  Some people
invented a .deb signing system which put signatures in there too but I
don't think any such things are deployed.

We use the extensibility for compression format changes, but
compressors all have magic numbers and we could just use those.
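
As a rough illustration of what "just use the magic numbers" could look like, here is a minimal Python sketch. The function name and table are hypothetical and this is not how dpkg-deb selects a decompressor today; the byte sequences are the published magic numbers of each format.

```python
import bz2
import gzip
import lzma

# Leading bytes of each compressed stream format.
MAGIC = [
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bzip2"),
    (b"\xfd7zXZ\x00", "xz"),
    (b"\x28\xb5\x2f\xfd", "zstd"),
    (b"\x04\x22\x4d\x18", "lz4"),  # lz4 frame format
]


def sniff_compressor(head: bytes) -> str:
    """Guess the compressor from the first bytes of a stream (hypothetical helper)."""
    for magic, name in MAGIC:
        if head.startswith(magic):
            return name
    return "unknown"


# Small demonstration using the compressors in the Python standard library.
for blob in (gzip.compress(b"x"), bz2.compress(b"x"), lzma.compress(b"x")):
    print(sniff_compressor(blob[:6]))
```

An uncompressed member has no magic number of its own, so a real implementation would still need a fallback rule for that case.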

It would be much less convenient to change our archive format from tar
to something else, as proposed by Ansgar, without this extensibility.
(I don't necessarily think Ansgar's idea is a good one, but it makes
an example here.)

As for the size limit, this was discussed in May 2016:
  https://lists.debian.org/debian-dpkg/2016/05/msg00027.html

(I can't find a bug about it, though).  I made a proposal.
No decision was made and nothing was done, unfortunately.

Ian.


--
Ian Jackson <[hidden email]>   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.

Re: .deb format: let's use 0.939, zstd, drop bzip2

Guillem Jover
Hi!

On Wed, 2019-05-08 at 19:38:26 +0200, Adam Borowski wrote:
> First, the 0.939 format, as described in "man deb-old".  While still being
> accepted by dpkg, it had been superseded before even the very first stable
> release.  Why?  It has at least two upsides over 2.0:

I'll try to untangle the discussion and address this first. Some of
what I'm going to write has already been written in the thread, but
I'm just going to condense it, give it some additional context and
lay out the direction I'd like to go in.

To recap, format 0.93x has multiple problems:

  - Cannot be handled with stock tools.
  - Not easily extensible.
  - Bad data alignment.
  - Bad compression support.
  - Bad tool coverage (see below).

I don't think it's correct that most tools support that format. From the
list of programs that I've tracked that handle .deb directly, I'd even
say almost none do <https://wiki.debian.org/Teams/Dpkg/DebSupport>. This
list does not include many projects/programs outside Debian that handle
.deb archives directly.


The size limit is indeed a problem, and was already known and tracked
in deb(5) and <https://wiki.debian.org/Teams/Dpkg/TimeTravelFixes>, see
the “.deb size limit” item there, and then later discussed in
<https://lists.debian.org/debian-dpkg/2016/05/msg00027.html>. And
while I think the workarounds I listed there are probably still valid
in most cases, if this is affecting people then prioritizing fixing it
now would be good.

The crazy idea I came up with at the time was to use a dual-format PAX+ar
container (that would embed the ar(5) header in the first PAX name entry).
This would make old tools at least detect this is a .deb package, with a
higher major version.

  <https://lists.debian.org/debian-dpkg/2016/06/msg00005.html>

But I guess I was never sold on it either, and thinking about it, the
tradeoff does not really look very good. file(1) does not even recognize
it out-of-the-box as a .deb anyway, and we'd just get a nicer error
message from some of the tools handling .debs, but all of them need to
be updated anyway to support any new format. It also destroys some of the
nice properties of the 2.x format, namely:

  - Not requiring special tools to build/extract.
  - Not relying on a non-widespread format (such as PAX).

Getting rid of ar(5) also would make the format more portable, as the
ar(5) format does change depending on the Unix system! Even besides the
main common format and its BSD and GNU variants, there are other
wildly different layouts. It would also mean we do not need binutils
to analyze them when there is no dpkg-deb around.

For the same reason using PAX would probably be a bad idea, as it's a
format that has unfortunately not really caught on, and takes more
space due to the additional headers, and we do not really need xattr
in the container. I went for that for its unlimited length metadata, but
since dpkg 1.18.24 that should not be an issue as I implemented GNU
large file metadata support which means we have pretty much "unlimited"
length metadata, and I'd say its encoding is more widespread than PAX
(for example star supports it).
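
To illustrate why the GNU extension lifts the limit: a classic tar header stores the size as 11 octal digits, which caps a member at 8 GiB - 1, whereas the GNU base-256 encoding sets the top bit of the first byte and stores the value in binary. Below is a minimal Python sketch, assuming that this encoding is what the large file metadata support refers to. parse_tar_size is a hypothetical name and not dpkg's code, and negative base-256 values are deliberately not handled.

```python
TAR_SIZE_FIELD_LEN = 12  # width of the size field in a tar header


def parse_tar_size(field: bytes) -> int:
    """Decode a tar size field, octal or GNU base-256 (hypothetical helper)."""
    if len(field) != TAR_SIZE_FIELD_LEN:
        raise ValueError("size field must be 12 bytes")
    if field[0] & 0x80:
        # GNU base-256: clear the marker bit, the rest is big-endian binary.
        return int.from_bytes(bytes([field[0] & 0x7F]) + field[1:], "big")
    # Classic encoding: ASCII octal digits, padded with NULs or spaces.
    text = field.decode("ascii").strip("\0 ")
    return int(text, 8) if text else 0


# Largest size the classic octal encoding can express.
max_octal_size = parse_tar_size(b"77777777777\0")
print(max_octal_size)

# A 10**10 byte member only fits in the base-256 form.
big_field = b"\x80" + (10**10).to_bytes(11, "big")
print(parse_tar_size(big_field))
```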

So I think Andrej is spot on, and we should just switch from ar(5)
to tar(5) as the container, but not to PAX, just the GNU extensions we
already support, which would only be used when necessary. And ignore
any crazy idea of embedding an ar header inside the first member, as
that will just complicate matters and be cruft once we have switched.
So given that we'd need to modify any program handling .debs directly
anyway, I'd go for the most straightforward and simple of the options.

I'll propose an actual diff I've got here of deb(5) tomorrow, but
otherwise if there are no great concerns, I'd like to start adding
support for this for dpkg 1.20.x.

Thanks,
Guillem

Re: .deb format: let's use 0.939, zstd, drop bzip2

Adam Borowski-3
On Fri, May 10, 2019 at 05:18:18AM +0200, Guillem Jover wrote:
> On Wed, 2019-05-08 at 19:38:26 +0200, Adam Borowski wrote:
> > First, the 0.939 format, as described in "man deb-old".  While still being
> > accepted by dpkg, it had been superseded before even the very first stable
> > release.  Why?  It has at least two upsides over 2.0:

> To recap, format 0.93x has multiple problems:
>
>   - Cannot be handled with stock tools.

"ar" is an obscure historical thing, akin to "cpio" or such.  It is used
deep within the format, but I wouldn't call this part an upside at all.

>   - Not easily extensible.

Huh?  Seems exactly as extensible as 2.0 (where all deployed extensions went
to control.tar).

>   - Bad data alignment.

Yeah, but it's still faster than 2.0.  And I don't expect decompressors to
care about alignment.  Might matter for "cat", though -- but I don't imagine
many uncompressed archives.  Heck, if we'd start 3.0, I'd recommend lz4
instead.

>   - Bad compression support.

Trivial to add.

>   - Bad tool coverage (see below).
>
> I don't think it's correct that most tools support that format, from the
> list of programs that I've tracked that handle .deb directly, I'd even
> say almost none do <https://wiki.debian.org/Teams/Dpkg/DebSupport>.

Most of those have no business looking at the format's details, just the
payload.

> The crazy idea I came up with at the time was to use a dual-format PAX+ar
> container (that would embed the ar(5) header in the first PAX name entry).
> This would make old tools at least detect this is a .deb package, with a
> higher major version.
[...]
> So I think Andrej is spot on, and we should just switch from ar(5)
> to tar(5) as the container

I would heavily advise against archive-in-archive.  Especially not tar, with
its block madness.  The blocks disappear when compressed but you're not
going to compress the outer layer.  Also, you can't shed the outer layer of
tar without a filter.
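
To put rough numbers on the block overhead: tar stores a 512-byte header per member, pads each member's data to a multiple of 512 bytes, and ends the archive with two zero blocks; GNU tar additionally pads the whole archive to a record of 20 blocks by default. A small Python sketch with illustrative function names (the blocking factor is a parameter, since an uncompressed outer container would not have to use GNU tar's default):

```python
BLOCK = 512  # tar block size in bytes


def tar_member_footprint(size: int) -> int:
    """Bytes one member occupies: a header block plus data padded to blocks."""
    padded = -(-size // BLOCK) * BLOCK  # round up to a whole block
    return BLOCK + padded


def tar_archive_size(sizes, blocking_factor: int = 20) -> int:
    """Total archive size including end marker and record padding."""
    total = sum(tar_member_footprint(s) for s in sizes) + 2 * BLOCK
    record = BLOCK * blocking_factor
    return -(-total // record) * record


def ar_archive_size(sizes) -> int:
    """Same members in ar: 8-byte magic, 60-byte headers, even alignment."""
    return 8 + sum(60 + s + (s % 2) for s in sizes)


# Hypothetical member sizes: debian-binary, control.tar.xz, data.tar.xz.
sizes = [4, 1000, 50000]
print(tar_archive_size(sizes), ar_archive_size(sizes))
```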

According to the benchmarks I just posted, even a processor that is less
than 1/3 loaded is already bottlenecked on passing data from layer to
layer.  I tried a zero-copy implementation with libarchive's callbacks,
but it doesn't seem to help:

gzip, median of 101, libarchive implementation:
0.93: real 0.97 user 14.74
2.0:  real 0.99 user 15.89

> I'll propose an actual diff I've got here of deb(5) tomorrow, but
> otherwise if there are no great concerns, I'd like to start adding
> support for this for dpkg 1.20.x.

Let's not be hasty -- unlike 0.93, which has existing (if spotty) support,
a complete format break should be better researched.  Ansgar's concerns,
for example, should at least be considered.


Meow!
--
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Did ya know that typing "test -j8" instead of "ctest -j8"
⢿⡄⠘⠷⠚⠋⠀ will make your testsuite pass much faster, and fix bugs?
⠈⠳⣄⠀⠀⠀⠀

Re: .deb format: let's use 0.939, zstd, drop bzip2

Andrej Shadura-2
On Fri, 10 May 2019 at 06:47, Adam Borowski <[hidden email]> wrote:

> > The crazy idea I came up with at the time was to use a dual-format PAX+ar
> > container (that would embed the ar(5) header in the first PAX name entry).
> > This would make old tools at least detect this is a .deb package, with a
> > higher major version.
> [...]
> > So I think Andrej is spot on, and we should just switch from ar(5)
> > to tar(5) as the container
>
> I would heavily advise against archive-in-archive.  Especially not tar, with
> its block madness.  The blocks disappear when compressed but you're not
> going to compress the outer layer.  Also, you can't shed the outer layer of
> tar without a filter.

You may be amazed, but that’s actually what ipkg used to do (and opkg
still supports):

[1]: https://web.archive.org/web/20100823030002/http://www.handhelds.org/moin/moin.cgi/Ipkg#head-133c8da00cf5d277becb22540e75e6fbe5536902

[2]: https://git.yoctoproject.org/cgit/cgit.cgi/opkg/tree/libopkg/opkg_archive.c#n523

(I think it would be quite ironic if Debian switched to a package
format that was derived from Debian's own for use elsewhere.)

--
Cheers,
  Andrej

Re: .deb format: let's use 0.939, zstd, drop bzip2

Michael Stone-2
In reply to this post by Guillem Jover
On Fri, May 10, 2019 at 05:18:18AM +0200, Guillem Jover wrote:
>be updated anyway to support any new format. It also destroys some of the
>nice properties of the 2.x format, namely:
>
>  - Not requiring special tools to build/extract.

This is really not a property worth preserving. I think it would be
fairly easy to get significant performance improvements if we dropped
the archive nesting, and all it would cost is losing a bullet point that
nobody really cares about all that much. I remember when this was one of
the "reasons" to advocate .deb over .rpm but in the real world people
just apt install rpm and the anecdotes about this one time somebody
wanted to unpack a deb on an ancient sunos box aren't worth slowing down
every install until the end of time.

Re: .deb format: let's use 0.939, zstd, drop bzip2

Sam Hartman-3
>>>>> "Michael" == Michael Stone <[hidden email]> writes:

    Michael> On Fri, May 10, 2019 at 05:18:18AM +0200, Guillem Jover wrote:
    >> be updated anyway to support any new format. It also destroys
    >> some of the nice properties of the 2.x format, namely:
    >>
    >> - Not requiring special tools to build/extract.

    Michael> This is really not a property worth preserving. I think it
    Michael> would be fairly easy to get significant performance
    Michael> improvements if we dropped the archive nesting, and all it
    Michael> would cost is losing a bullet point that nobody really
    Michael> cares about all that much. I remember when this was one of
    Michael> the "reasons" to advocate .deb over .rpm but in the real
    Michael> world people just apt install rpm and the anecdotes about
    Michael> this one time somebody wanted to unpack a deb on an ancient
    Michael> sunos box aren't worth slowing down every install until the
    Michael> end of time.

I've certainly heard people describe our use of both ar and tar as an
architectural minus especially on embedded platforms just because the
dependency set of dpkg needed to be larger.

I don't know how big of a concern that still is, but it does seem
strange to use multiple different archiving technologies in the same
format today.

Re: .deb format: let's use 0.939, zstd, drop bzip2

W. Martin Borgert
Quoting Sam Hartman <[hidden email]>:
> I've certainly heard people describe our use of both ar and tar as an
> architectural minus especially on embedded platforms just because the
> dependency set of dpkg needed to be larger.

On my embedded systems, I don't have ar installed, only tar.
I assume that dpkg speaks ar natively?

> I don't know how big of a concern that still is, but it does seem
> strange to use multiple different archiving technologies in the same
> format today.

Anything wrong with one .zip, just like .jar or .odf?

Re: .deb format: let's use 0.939, zstd, drop bzip2

Adam Borowski-3
On Fri, May 10, 2019 at 02:49:01PM +0200, W. Martin Borgert wrote:
> Quoting Sam Hartman <[hidden email]>:
> > I've certainly heard people describe our use of both ar and tar as an
> > architectural minus especially on embedded platforms just because the
> > dependency set of dpkg needed to be larger.
>
> On my embedded systems, I don't have ar installed, only tar.
> I assume, that dpkg speaks ar natively?

Both yes and no, of course. :p

> > I don't know how big of a concern that still is, but it does seem
> > strange to use multiple different archiving technologies in the same
> > format today.
>
> Anything wrong with one .zip, just like .jar or .odf?

/usr on the box I'm sitting at:
* zip the program: dies horribly due to /usr/lib/llvm-7/build/ symlink
  loops.
* zip:
        1891345142 bytes
* zip-the-concept (individually compressed files), xz
        1516943024 bytes
* tar.xz
        1092591508 bytes

Linux source:
* zip:
        213820843 bytes
* individually compressed files, xz
        180997203 bytes
* tar.xz:
        104318396 bytes

So no, I don't want zip, nor even a randomly accessible format.


Meow!

Re: .deb format: let's use 0.939, zstd, drop bzip2

Ian Jackson-2
In reply to this post by Sam Hartman-3
W. Martin Borgert writes ("Re: .deb format: let's use 0.939, zstd, drop bzip2"):
> Quoting Sam Hartman <[hidden email]>:
> > I've certainly heard people describe our use of both ar and tar as an
> > architectural minus especially on embedded platforms just because the
> > dependency set of dpkg needed to be larger.
>
> On my embedded systems, I don't have ar installed, only tar.
> I assume, that dpkg speaks ar natively?

dpkg-deb has a built-in decoder for the subset of ar that is used for
deb(5).  One reason I chose ar rather than tar is that handwriting a
decoder for ar was much simpler than for tar.
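
To give a sense of how little is involved, here is a minimal Python sketch of a reader for the common ar layout: an 8-byte global magic, then per member a 60-byte ASCII header followed by the data, padded to an even offset. The function names are hypothetical and this is not dpkg-deb's implementation; it ignores the BSD and GNU long-name extensions.

```python
import io
from typing import BinaryIO, Iterator, Tuple

AR_MAGIC = b"!<arch>\n"
HEADER_LEN = 60  # name 16, mtime 12, uid 6, gid 6, mode 8, size 10, magic 2


def iter_ar_members(stream: BinaryIO) -> Iterator[Tuple[str, bytes]]:
    """Yield (name, data) for each member of a simple ar archive."""
    if stream.read(len(AR_MAGIC)) != AR_MAGIC:
        raise ValueError("not an ar archive")
    while True:
        header = stream.read(HEADER_LEN)
        if not header:
            return  # clean end of archive
        if len(header) != HEADER_LEN or header[58:60] != b"`\n":
            raise ValueError("corrupt ar member header")
        # Names are space padded; some variants also append a "/".
        name = header[0:16].decode("ascii").rstrip(" ").rstrip("/")
        size = int(header[48:58].decode("ascii").strip())
        data = stream.read(size)
        if len(data) != size:
            raise ValueError("truncated ar member")
        if size % 2:
            stream.read(1)  # skip the padding byte after odd-sized data
        yield name, data


def make_member(name: str, data: bytes) -> bytes:
    """Build one ar member, used here only to create a test archive."""
    header = "{:<16}{:<12}{:<6}{:<6}{:<8}{:<10}`\n".format(
        name, 0, 0, 0, "100644", len(data)
    ).encode("ascii")
    return header + data + (b"\n" if len(data) % 2 else b"")


archive = AR_MAGIC + make_member("debian-binary", b"2.0\n") + make_member(
    "control.tar.gz", b"abc"
)
members = list(iter_ar_members(io.BytesIO(archive)))
print([name for name, _ in members])
```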

Ian.


Re: .deb format: let's use 0.939, zstd, drop bzip2

John Goerzen-3

On Fri, May 10 2019, Ian Jackson wrote:

>> On my embedded systems, I don't have ar installed, only tar.
>> I assume, that dpkg speaks ar natively?
>
> dpkg-deb has a built-in decoder for the subset of ar that is used for
> deb(5).  One reason I chose ar rather than tar is that handwriting a
> decoder for ar was much simpler than for tar.

Plus, of course, when discussing tar, there is always the "which tar
format do you mean?" question.

https://manpages.debian.org/stretch/libarchive-dev/tar.5.en.html

I should note that dpkg does have a maximum file size limit that's
rather lower than the ar limit, due to its interpretation of tar
headers.  I believe I filed a bug on this but I'm not able to find it
right now, unfortunately.

John

Re: .deb format: let's use 0.939, zstd, drop bzip2

Ian Jackson-2
John Goerzen writes ("Re: .deb format: let's use 0.939, zstd, drop bzip2"):
> Plus, of course, when discussing tar, there is always the "which tar
> format do you mean?" question.
>
> https://manpages.debian.org/stretch/libarchive-dev/tar.5.en.html

Quite.

> I should note that dpkg does have a maximum file size limit that's
> rather lower than the ar limit, due to its interpretation of tar
> headers.  I believe I filed a bug on this but I'm not able to find it
> right now, unfortunately.

IDK if dpkg is still using this but back in the day, when I needed
dpkg to handle each file in a tarball individually, I found a tar
parsing library under a rock somewhere.  It wasn't a bad one but I'm
sure it could be improved...

Ian.


Re: .deb format: let's use 0.939, zstd, drop bzip2

Guillem Jover
On Sat, 2019-05-11 at 12:08:05 +0100, Ian Jackson wrote:
> John Goerzen writes ("Re: .deb format: let's use 0.939, zstd, drop bzip2"):
> > Plus, of course, when discussing tar, there is always the "which tar
> > format do you mean?" question.
> >
> > https://manpages.debian.org/stretch/libarchive-dev/tar.5.en.html

This should be documented already in deb(5); if it is not clear or you
feel it lacks detail I'd be glad to improve it.

> > I should note that dpkg does have a maximum file size limit that's
> > rather lower than the ar limit, due to its interpretation of tar
> > headers.  I believe I filed a bug on this but I'm not able to find it
> > right now, unfortunately.

Nope. This was already mentioned in this thread. dpkg-deb's tar
implementation has no practical limits anymore since 1.18.24 (#850834),
also documented in deb(5).

> IDK if dpkg is still using this but back in the day, when I needed
> dpkg to handle each file in a tarball individually, I found a tar
> parsing library under a rock somewhere.  It wasn't a bad one but I'm
> sure it could be improved...

The tar extraction code has seen substantial rework since its
introduction. I'd also probably say it's pretty robust compared to
many of its alternatives.

Thanks,
Guillem

Re: .deb format: let's use 0.939, zstd, drop bzip2

Guillem Jover
In reply to this post by Guillem Jover
On Fri, 2019-05-10 at 05:18:18 +0200, Guillem Jover wrote:
> I'll propose an actual diff I've got here of deb(5) tomorrow, but
> otherwise if there are no great concerns, I'd like to start adding
> support for this for dpkg 1.20.x.

Unfortunately I think I'll have to retract the above statements, and
will not be discussing several things I had pending on a few subthreads,
nor am I planning on starting any implementation work anymore. :(

With the prospect looming of the tech-ctte being morphed into
something like a steering-ctte, and random potentially contentious
topics being redirected into that ctte purely for the sake of
progress, the danger of that happening with a discussion like
this, which currently seems a bit contentious, is too distressing.

This is all extremely demotivating. :`(

Thanks,
Guillem