.deb format: let's use 0.939, zstd, drop bzip2

.deb format: let's use 0.939, zstd, drop bzip2

Adam Borowski-3
Hi!
I've recently done some research on how we can improve the speed of unpacking
packages.  There's a lot of other stages that can be improved, but let's
talk about the .deb format.

First, the 0.939 format, as described in "man deb-old".  While still being
accepted by dpkg, it had been superseded before even the very first stable
release.  Why?  It has at least two upsides over 2.0:

* there's no 10¹⁰ bytes (~9.31 GiB) limit
  While no package this big is in the archive _yet_ (max being 1⎖652⎖244⎖360
  bytes), both storage sizes and software bloat grow pretty fast, thus we'll
  break this barrier in a few years.  And there's a world outside the
  official archive -- I bet someone already has been burned by this limit.
* it's faster by a small but non-negligible factor.  A task "unpack all
  packages in default XFCE GUI install" gets done by stock dpkg (after
  repacking everything as gzip) 3% faster.
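
The size limit in the first bullet can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the limit comes from the ar member header used by format 2.0, which stores the member size in a 10-character decimal field; the constant name is illustrative only and not taken from dpkg.

```python
# Assumption: the ar member header stores the size in a 10-character
# decimal ASCII field, so the largest representable member is ten nines.
AR_SIZE_FIELD_WIDTH = 10  # illustrative name, not taken from dpkg

max_member_bytes = 10 ** AR_SIZE_FIELD_WIDTH - 1
largest_in_archive = 1_652_244_360  # figure quoted in the bullet above

print(max_member_bytes)                      # 9999999999
print(round(max_member_bytes / 2 ** 30, 2))  # 9.31 (GiB)
print(round(max_member_bytes / 10 ** 9, 2))  # 10.0 (GB)
# Fraction of the limit already used by the largest package:
print(round(largest_in_archive / max_member_bytes, 3))
```

The same number is about 9.31 when counted in binary units and about 10 in decimal gigabytes.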

Obviously, 3% is not worth fighting for, but as the size limit needs fixing
anyway...

Alas, while current dpkg handles 0.939 archives well, it supports only two
compressors: gzip and cat.  Neither of them is adequate these days.  Thus,
we'd need to enable others -- which means not being able to unpack new .debs
with old dpkg.  Barring ugly versioned pre-depends on dpkg, that'd require
waiting two release cycles.
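
For readers who have never met the old format, here is a rough sketch of how such an archive could be split. It assumes the layout I understand deb-old(5) to describe: a version line, a line holding the decimal length of the control tarball, then the two gzipped tarballs back to back. The function and helper names are hypothetical, and this is not how dpkg itself is written.

```python
import io
import tarfile


def parse_deb_old(blob: bytes):
    """Split an old-format archive into (version, control_tgz, data_tgz).

    Assumed layout: "0.939000\n", then "<control length>\n", then the
    control tarball, then the data tarball running to end of file.
    """
    stream = io.BytesIO(blob)
    version = stream.readline().rstrip(b"\n").decode("ascii")
    control_len = int(stream.readline().rstrip(b"\n"))
    control_tgz = stream.read(control_len)
    data_tgz = stream.read()  # no length field: everything that is left
    return version, control_tgz, data_tgz


def _tiny_tgz(name: str, payload: bytes) -> bytes:
    """Build a one-file .tar.gz in memory (demo helper only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


control = _tiny_tgz("control", b"Package: demo\n")
data = _tiny_tgz("usr/share/doc/demo/copyright", b"none\n")
blob = b"0.939000\n" + str(len(control)).encode() + b"\n" + control + data

version, control_part, data_part = parse_deb_old(blob)
print(version, control_part == control, data_part == data)
```

Under this reading, the data member carries no length field of its own, so no fixed-width size field caps it.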

So let's pick compressors to enable.  For compression ratio, xz still wins
(at least among popular compressors).  But there's something to be said for
zstd.  firefox.deb (the zstd variant compressed at -19) takes to unpack:
* 2.644s .xz, stock dpkg
* 2.532s .xz, my tool (libarchive based)
* 0.290s .zst, my tool
* 0.738s .gz, stock dpkg
* 0.729s .gz 0.939, stock dpkg
File sizes being 60628216 gz, 47959544 zstd, 44506304 xz.

XFCE install total: 723M xz, 773M zstd, 963M gzip.
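
To put the figures above side by side, here is a small worked calculation. It uses only numbers quoted in this mail; the dictionary keys are labels made up for the sketch.

```python
# Timings (seconds) and sizes (bytes) copied from the figures above.
unpack_s = {
    "xz, stock dpkg": 2.644,
    "xz, libarchive tool": 2.532,
    "zst, libarchive tool": 0.290,
    "gz, stock dpkg": 0.738,
}
size_b = {"gz": 60628216, "zstd": 47959544, "xz": 44506304}

# Compare like with like: both timings come from the same libarchive tool.
speedup = unpack_s["xz, libarchive tool"] / unpack_s["zst, libarchive tool"]
growth = size_b["zstd"] / size_b["xz"] - 1
xfce_growth = 773 / 723 - 1

print(f"zstd unpacks {speedup:.1f}x faster than xz")    # 8.7x
print(f"zstd firefox.deb is {growth:.1%} larger")       # 7.8%
print(f"zstd XFCE set is {xfce_growth:.1%} larger")     # 6.9%
```

So the trade is roughly 7-8% more bytes for a nearly ninefold faster unpack, on this one machine and package set.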

Thus, even though we'd want to stick with xz for the official archive, speed
gains from zstd are so massive that it's tempting to add support for it,
at least for non-official uses, possibly also for common Build-Depends.
The usual objection, "we don't want to bloat the Essential set" doesn't hold
water because 1. libzstd is already a part of the Required set in Buster,
2. a non-default compressor can be dlopened.

Thoughts?

But, the dlopen idea shows a potential victim: bzip2.  Let's kill it.

Stats for Buster's packages:

.deb format:
2.0:    100%

control:
gz      11671
xz      45210

data:
gz      966
xz      55915

With not a single package in the archive still using bz2, removing support
would be reasonable.  It'd be okay to give a clear error message telling the
user to install libbz2-1.0 (dlopen) or bzip2 (pipe) -- so folks can still
unpack historic .debs if need be.
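
A quick tally of the Buster statistics above, as a sanity check. The counts are copied from the tables; nothing else is assumed.

```python
# Counts copied from the Buster statistics above.
control = {"gz": 11671, "xz": 45210}
data = {"gz": 966, "xz": 55915}

for member, counts in (("control", control), ("data", data)):
    total = sum(counts.values())
    shares = {name: f"{n / total:.1%}" for name, n in counts.items()}
    # bz2 never appears in either table, so its count defaults to 0.
    print(member, total, shares, "bz2:", counts.get("bz2", 0))
```

Both members add up to the same package count, as expected when every package has exactly one control and one data tarball.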


Meow!
--
⢀⣴⠾⠻⢶⣦⠀ .globl _start↵.data↵rc: .ascii "/etc/init.d/rcS\0"↵.text↵_start
⣾⠁⢰⠒⠀⣿⡁ mov $57,%rax↵syscall↵cmp $0,%rax↵jne child↵parent:↵mov $61,%rax
⢿⡄⠘⠷⠚⠋⠀ mov $-1,%rdi↵xor %rsi,%rsi↵xor %rdx,%rdx↵syscall↵jmp parent↵child:
⠈⠳⣄⠀⠀⠀⠀ mov $59,%rax↵mov $rc,%rdi↵xor %rsi,%rsi↵xor %rdx,%rdx↵syscall


Re: .deb format: let's use 0.939, zstd, drop bzip2

Sam Hartman-3
>>>>> "Adam" == Adam Borowski <[hidden email]> writes:

    Adam> Hi!  I've recently did some research on how can we improve the
    Adam> * there's no 10¹⁰ bytes (~9.31GB) limit While no package this
    Adam> big is in the archive _yet_ (max being 1⎖652⎖244⎖360 bytes),
    Adam> both storage sizes and software bloat grow pretty fast, thus
    Adam> we'll break this barrier in a few years.  And there's a world
    Adam> outside the official archive -- I bet someone already has been
    Adam> burned by this limit.

I have.

I have not evaluated the rest of your proposal, but I can say that
outside the official archive the size limit is a real problem today.
It's not critical; I work around it, but it does hurt and will hurt more
as time increases.


Re: .deb format: let's use 0.939, zstd, drop bzip2

Xavier Guimard-3
In reply to this post by Adam Borowski-3
On 08/05/2019 at 19:38, Adam Borowski wrote:

> Hi!
> I've recently did some research on how can we improve the speed of unpacking
> packages.  There's a lot of other stages that can be improved, but let's
> talk about the .deb format.
>
> First, the 0.939 format, as described in "man deb-old".  While still being
> accepted by dpkg, it had been superseded before even the very first stable
> release.  Why?  It has at least two upsides over 2.0:
>
> * there's no 10¹⁰ bytes (~9.31GB) limit
>   While no package this big is in the archive _yet_ (max being 1⎖652⎖244⎖360
>   bytes), both storage sizes and software bloat grow pretty fast, thus we'll
>   break this barrier in a few years.  And there's a world outside the
>   official archive -- I bet someone already has been burned by this limit.
> * it's faster by a small but non-negligible factor.  A task "unpack all
>   packages in default XFCE GUI install" gets done by stock dpkg (after
>   repacking everything as gzip) 3% faster.
>
> Obviously, 3% is not worth fighting for, but as the size limit needs fixing
> anyway...
>
> Alas, while current dpkg handles 0.939 archives well, it supports only two
> compressors: gzip and cat.  Neither of them is adequate these days.  Thus,
> we'd need to enable others -- which means not being able to unpack new .debs
> with old dpkg.  Barring ugly versioned pre-depends on dpkg, that'd require
> waiting two release cycles.
>
> So let's pick compressors to enable.  For compression ratio, xz still wins
> (at least among popular compressors).  But there's a thing to say about
> zstd: firefox.deb zstd -19 takes to unpack:
> * 2.644s .xz, stock dpkg
> * 2.532s .xz, my tool (libarchive based)
> * 0.290s .zst, my tool
> * 0.738s .gz, stock dpkg
> * 0.729s .gz 0.939, stock dpkg
> File sizes being 60628216 gz, 47959544 zstd, 44506304 xz.
>
> XFCE install total: 723M xz, 773M zstd, 963M gzip.
>
> Thus, even though we'd want to stick with xz for the official archive, speed
> gains from zstd are so massive that it's tempting to add support for it,
> at least for non-official uses, possibly also for common Build-Depends.
> The usual objection, "we don't want to bloat the Essential set" doesn't hold
> water because 1. libzstd is already a part of the Required set in Buster,
> 2. a non-default compressor can be dlopened.
>
> Thoughts?
>
> But, the dlopen idea shows a potential victim: bzip2.  Let's kill it.
>
> Stats for Buster's packages:
>
> .deb format:
> 2.0:    100%
>
> control:
> gz      11671
> xz      45210
>
> data:
> gz      966
> xz      55915
>
> With not a single package in the archive still using bz2, removing support
> would be reasonable.  It'd be okay to give a clear error message telling the
> user to install libbz2-1.0 (dlopen) or bzip2 (pipe) -- so folks can still
> unpack historic .debs if need be.
>
> Meow!

Hi,

devscripts MR!122[1] proposes to add Zstandard support to uscan.

Cheers,
Xavier

[1] https://salsa.debian.org/debian/devscripts/merge_requests/122


Re: .deb format: let's use 0.939, zstd, drop bzip2

Adrian Bunk-3
In reply to this post by Adam Borowski-3
On Wed, May 08, 2019 at 07:38:26PM +0200, Adam Borowski wrote:

>...
> So let's pick compressors to enable.  For compression ratio, xz still wins
> (at least among popular compressors).  But there's a thing to say about
> zstd: firefox.deb zstd -19 takes to unpack:
> * 2.644s .xz, stock dpkg
> * 2.532s .xz, my tool (libarchive based)
> * 0.290s .zst, my tool
> * 0.738s .gz, stock dpkg
> * 0.729s .gz 0.939, stock dpkg
> File sizes being 60628216 gz, 47959544 zstd, 44506304 xz.
>
> XFCE install total: 723M xz, 773M zstd, 963M gzip.
>
> Thus, even though we'd want to stick with xz for the official archive, speed
> gains from zstd are so massive that it's tempting to add support for it,
> at least for non-official uses, possibly also for common Build-Depends.
>...

Is this single-threaded or parallel?

pbzip2 decompression speed scales nicely with the number of CPUs,
and in general for anyone interested in massive speed gains the
way forward would be towards parallel decompression.
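
The general idea behind parallel decompression can be sketched as follows. This assumes the data was compressed as independent blocks, which is my understanding of how pbzip2 gets its scaling; the function names and the block size are made up for the illustration.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor


def compress_in_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Compress each block as an independent bzip2 stream."""
    return [bz2.compress(data[i:i + block_size])
            for i in range(0, len(data), block_size)]


def decompress_parallel(blocks: list[bytes], workers: int = 4) -> bytes:
    """Decompress independent streams concurrently, rejoining them in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so the pieces line up again.
        return b"".join(pool.map(bz2.decompress, blocks))


payload = b"".join(b"file %d: some package contents\n" % i
                   for i in range(20000))
blocks = compress_in_blocks(payload, block_size=100_000)
restored = decompress_parallel(blocks)
print(len(blocks), "independent blocks, round trip ok:", restored == payload)
```

The catch is that a single ordinary compressed stream has no such block boundaries, so the compressor has to cooperate for this to work.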

> But, the dlopen idea shows a potential victim: bzip2.  Let's kill it.
>
> Stats for Buster's packages:
>
> .deb format:
> 2.0:    100%
>
> control:
> gz      11671
> xz      45210
>
> data:
> gz      966
> xz      55915
>
> With not a single package in the archive still using bz2,

You were only looking at binary packages,
for source packages bz2 is still pretty common.

> removing support
> would be reasonable.  It'd be okay to give a clear error message telling the
> user to install libbz2-1.0 (dlopen) or bzip2 (pipe) -- so folks can still
> unpack historic .debs if need be.

It would be neither reasonable nor okay to create such hassle for users
for no benefits at all.

And if the tiny 75 kB libbz2 would be considered a problem,
the huge 650 kB libzstd would obviously never be an option
for packages in the archive.

cu
Adrian

--

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


Re: .deb format: let's use 0.939, zstd, drop bzip2

Martin Steigerwald
Adrian Bunk - 08.05.19, 21:45:

> On Wed, May 08, 2019 at 07:38:26PM +0200, Adam Borowski wrote:
> >...
> >
> > So let's pick compressors to enable.  For compression ratio, xz
> > still wins (at least among popular compressors).  But there's a
> > thing to say about zstd: firefox.deb zstd -19 takes to unpack:
> > * 2.644s .xz, stock dpkg
> > * 2.532s .xz, my tool (libarchive based)
> > * 0.290s .zst, my tool
> > * 0.738s .gz, stock dpkg
> > * 0.729s .gz 0.939, stock dpkg
> > File sizes being 60628216 gz, 47959544 zstd, 44506304 xz.
> >
> > XFCE install total: 723M xz, 773M zstd, 963M gzip.
> >
> > Thus, even though we'd want to stick with xz for the official
> > archive, speed gains from zstd are so massive that it's tempting to
> > add support for it, at least for non-official uses, possibly also
> > for common Build-Depends.
> >...
>
> Is this single-threaded or parallel?
>
> pbzip2 decompression speed scales nicely with the number of CPUs,
> and in general for anyone interested in massive speed gains the
> way forward would be towards parallel decompression.

Or lbzip2: in quite old tests with my packbench Ruby script [1], lbzip2
scaled better than pbzip2 on an Intel hexacore system.  These were published
in an issue of the German Linux User magazine.  As I had no multicore laptop
back then, I was not able to do the measurements myself.

Or pxz.

[1] https://martin-steigerwald.de/computer/programme/packbench/index.html
(I have not yet re-uploaded the source repo to GitLab or similar, but
tarballs are available.  It's outdated as well.  I did not test whether it
works with the current Ruby version.)

Thanks,
--
Martin



Re: .deb format: let's use 0.939, zstd, drop bzip2

Ansgar Burchardt-8
In reply to this post by Adam Borowski-3
Adam Borowski writes:
> I've recently did some research on how can we improve the speed of unpacking
> packages.  There's a lot of other stages that can be improved, but let's
> talk about the .deb format.
>
> First, the 0.939 format, as described in "man deb-old".  While still being
> accepted by dpkg, it had been superseded before even the very first stable
> release.  Why?  It has at least two upsides over 2.0:

Switching to a different binary format will break various tools.  If we
want to do this, I wonder if we shouldn't take the chance to move away
from tar?

We have various applications that only want to extract single members of
the package (changelog, NEWS, copyright, ...); tar is a really bad
format for such an operation.  Other formats (zip, 7z, ...) are more
suited for them.
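
To illustrate why tar is awkward for this, here is a small sketch using Python's tarfile module in streaming mode. The archive is synthetic and the file names are invented; the point is only that, without an index, every member ahead of the wanted one has to be decompressed and skipped.

```python
import io
import tarfile


def build_tar_xz(names: list[str]) -> bytes:
    """Create an in-memory .tar.xz with one small file per name."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:xz") as tar:
        for name in names:
            payload = name.encode()
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def extract_one(archive: bytes, wanted: str):
    """Stream through the tarball until `wanted` shows up.

    Returns (contents, headers_read); contents is None if not found.
    """
    scanned = 0
    # "r|xz" is tarfile's forward-only streaming mode (no seeking).
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r|xz") as tar:
        for member in tar:
            scanned += 1
            if member.name == wanted:
                return tar.extractfile(member).read(), scanned
    return None, scanned


names = [f"usr/lib/demo/file{i}" for i in range(200)]
names.append("usr/share/doc/demo/copyright")
contents, scanned = extract_one(build_tar_xz(names), names[-1])
print(contents, scanned)
```

Here the wanted file happens to be last, so all 201 headers are read before it is found.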

Ansgar


Re: .deb format: let's use 0.939, zstd, drop bzip2

Ian Jackson-2
In reply to this post by Adam Borowski-3
(adding debian-dpkg)

Adam Borowski writes (".deb format: let's use 0.939, zstd, drop bzip2"):
> First, the 0.939 format, as described in "man deb-old".  While still being
> accepted by dpkg, it had been superseded before even the very first stable
> release.  Why?  It has at least two upsides over 2.0:

What an interesting proposal.  I don't think I agree, but:

> * there's no 10¹⁰ bytes (~9.31GB) limit
>   While no package this big is in the archive _yet_ (max being 1⎖652⎖244⎖360
>   bytes), both storage sizes and software bloat grow pretty fast, thus we'll
>   break this barrier in a few years.  And there's a world outside the
>   official archive -- I bet someone already has been burned by this limit.

This is a problem.

> * it's faster by a small but non-negligible factor.  A task "unpack all
>   packages in default XFCE GUI install" gets done by stock dpkg (after
>   repacking everything as gzip) 3% faster.

I'm not sure why it should be faster.

As the person who deprecated deb-old in favour of the current format,
my motives were:
 - the old format was a real pain to unpack without a custom utility
    (this used to be a much more serious problem)
 - the old format was not very extensible.

Debian doesn't really use much of the extensibility.  Some people
invented a .deb signing system which put signatures in there too but I
don't think any such things are deployed.

We use the extensibility for compression format changes, but
compressors all have magic numbers and we could just use those.
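
Detecting the compressor from its magic number could look roughly like this. The byte signatures are the well-known ones for each format; the function name and the fallback string are made up for the sketch.

```python
import bz2
import gzip
import lzma

# Leading bytes that identify each compressed stream.
MAGIC = {
    b"\x1f\x8b": "gzip",
    b"BZh": "bzip2",
    b"\xfd7zXZ\x00": "xz",
    b"\x28\xb5\x2f\xfd": "zstd",
}


def sniff_compressor(head: bytes) -> str:
    """Guess the compressor from the first few bytes of a member."""
    for magic, name in MAGIC.items():
        if head.startswith(magic):
            return name
    return "unknown (possibly uncompressed)"


print(sniff_compressor(gzip.compress(b"demo")))
print(sniff_compressor(bz2.compress(b"demo")))
print(sniff_compressor(lzma.compress(b"demo")))
print(sniff_compressor(b"\x28\xb5\x2f\xfd" + b"rest of a zstd frame"))
print(sniff_compressor(b"plain tar data"))
```

An uncompressed tarball has no signature at offset 0, so it falls through to the default case here.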

It would be much less convenient to change our archive format from tar
to something else, as proposed by Ansgar, without this extensibility.
(I don't necessarily think Ansgar's idea is a good one, but it makes
an example here.)

As for the size limit, this was discussed in May 2016:
  https://lists.debian.org/debian-dpkg/2016/05/msg00027.html

(I can't find a bug about it, though).  I made a proposal.
No decision was made and nothing was done, unfortunately.

Ian.


--
Ian Jackson <[hidden email]>   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.


Re: .deb format: let's use 0.939, zstd, drop bzip2

Jeremy Stanley
In reply to this post by Ansgar Burchardt-8
On 2019-05-08 22:35:58 +0200 (+0200), Ansgar wrote:

> Adam Borowski writes:
> > I've recently did some research on how can we improve the speed of unpacking
> > packages.  There's a lot of other stages that can be improved, but let's
> > talk about the .deb format.
> >
> > First, the 0.939 format, as described in "man deb-old".  While still being
> > accepted by dpkg, it had been superseded before even the very first stable
> > release.  Why?  It has at least two upsides over 2.0:
>
> Switching to a different binary format will break various tools.  If we
> want to do this, I wonder if we shouldn't take the chance to move away
> from tar?
>
> We have various applications that only want to extract single members of
> the package (changelog, NEWS, copyright, ...); tar is a really bad
> format for such an operation.  Other formats (zip, 7z, ...) are more
> > suited for them.

Are you talking about source packages or binary packages here? The
latter use ar, not tar.
--
Jeremy Stanley


Re: .deb format: let's use 0.939, zstd, drop bzip2

Mike Hommey
On Wed, May 08, 2019 at 09:04:49PM +0000, Jeremy Stanley wrote:

> On 2019-05-08 22:35:58 +0200 (+0200), Ansgar wrote:
> > Adam Borowski writes:
> > > I've recently did some research on how can we improve the speed of unpacking
> > > packages.  There's a lot of other stages that can be improved, but let's
> > > talk about the .deb format.
> > >
> > > First, the 0.939 format, as described in "man deb-old".  While still being
> > > accepted by dpkg, it had been superseded before even the very first stable
> > > release.  Why?  It has at least two upsides over 2.0:
> >
> > Switching to a different binary format will break various tools.  If we
> > want to do this, I wonder if we shouldn't take the chance to move away
> > from tar?
> >
> > We have various applications that only want to extract single members of
> > the package (changelog, NEWS, copyright, ...); tar is a really bad
> > format for such an operation.  Other formats (zip, 7z, ...) are more
> > suited for them.
>
> Are you talking about source packages or binary packages here? The
> latter use ar, not tar.

Binary packages use both.

$ ar t /var/cache/apt/archives/libgcc-9-dev_9.1.0-1_amd64.deb
debian-binary
control.tar.xz
data.tar.xz

Mike


Re: .deb format: let's use 0.939, zstd, drop bzip2

Ansgar Burchardt-8
In reply to this post by Jeremy Stanley
Jeremy Stanley writes:

> On 2019-05-08 22:35:58 +0200 (+0200), Ansgar wrote:
>> Switching to a different binary format will break various tools.  If we
>> want to do this, I wonder if we shouldn't take the chance to move away
>> from tar?
>>
>> We have various applications that only want to extract single members of
>> the package (changelog, NEWS, copyright, ...); tar is a really bad
>> format for such an operation.  Other formats (zip, 7z, ...) are more
>> suited for them.
>
> Are you talking about source packages or binary packages here? The
> latter use ar, not tar.

I'm talking about binary packages (*.deb).  They currently use tar
archives (control.tar.*, data.tar.*) packed in an ar archive.

Ansgar


Re: .deb format: let's use 0.939, zstd, drop bzip2

Adam Borowski-3
In reply to this post by Adrian Bunk-3
On Wed, May 08, 2019 at 10:45:21PM +0300, Adrian Bunk wrote:

> On Wed, May 08, 2019 at 07:38:26PM +0200, Adam Borowski wrote:
> > So let's pick compressors to enable.  For compression ratio, xz still wins
> > (at least among popular compressors).  But there's a thing to say about
> > zstd: firefox.deb zstd -19 takes to unpack:
> > * 2.644s .xz, stock dpkg
> > * 2.532s .xz, my tool (libarchive based)
> > * 0.290s .zst, my tool
> > * 0.738s .gz, stock dpkg
> > * 0.729s .gz 0.939, stock dpkg
> > File sizes being 60628216 gz, 47959544 zstd, 44506304 xz.
> >
> > XFCE install total: 723M xz, 773M zstd, 963M gzip.
> >
> > Thus, even though we'd want to stick with xz for the official archive, speed
> > gains from zstd are so massive that it's tempting to add support for it,
> > at least for non-official uses, possibly also for common Build-Depends.
> >...
>
> Is this single-threaded or parallel?

This one was single-threaded (AKA: all cores were available to the
decompressor, running dpkg-deb without any arguments).

Most other tests were on a fully loaded processor, with one (hyper-)thread
available per task.

> pbzip2 decompression speed scales nicely with the number of CPUs,
> and in general for anyone interested in massive speed gains the
> way forward would be towards parallel decompression.

Just tested pbzip2 on its own, without dpkg, commandline being:
    time (pbzip2 -cdfr <TARBALL|tar xf -)
* 5.018s
File size: 54841864 (without ar and control).

So it's incredibly slow for very weak compression.

> > But, the dlopen idea shows a potential victim: bzip2.  Let's kill it.
> >
> > Stats for Buster's packages:
> > .deb format:
> >
> > With not a single package in the archive still using bz2,
>
> You were only looking at binary packages,
> for source packages bz2 is still pretty common.

Well yeah, but that's dpkg-dev, where size of the toolchain matters little.
I don't think anyone is going to build packages on a machine without
adequate storage.  On the other hand, runtime often means a tiny router or a
massively oversubscribed container hosting.

But my main point was not to help bitty boxes, but to slow the growth of
bloat somehow.  When we add libraries, it's good to retire outdated ones
sometimes.

> > removing support
> > would be reasonable.  It'd be okay to give a clear error message telling the
> > user to install libbz2-1.0 (dlopen) or bzip2 (pipe) -- so folks can still
> > unpack historic .debs if need be.
>
> It would be neither reasonable nor okay to create such hassle for users
> for no benefits at all.
>
> And if the tiny 75 kB libbz2 would be considered a problem,
> the huge 650 kB libzstd would obviously never be an option
> for packages in the archive.

It's already in, thus the effective cost is not 650kB but 0.  On the other
hand, the utility of libbz2 is only unpacking very old .debs.  That's
something useful, but in no way needed on every machine.

I just checked Stretch: not a single .bz2, in either control or data.  I'm not
going to download all of Jessie just to check -- but even assuming something
was left by Jessie's time, by Bullseye trying to install such a .deb will
mean mixing packages 3 releases apart.

Also, many other tools keep depending on libbz2, so it'll likely remain
present on most systems (even if gpgv (transitively-Required) also drops the
dependency).  And if it declines in popularity -- it'll likely remain in
the archive for a long long time, just like ncompress and arj do.
Compressors are easy to keep on life support, and important enough that
none which have seen some real use would be dropped.


Meow!
--
⢀⣴⠾⠻⢶⣦⠀ I've read an article about how lively happy music boosts
⣾⠁⢰⠒⠀⣿⡁ productivity.  You can read it, too, you just need the
⢿⡄⠘⠷⠚⠋⠀ right music while doing so.  I recommend Skepticism
⠈⠳⣄⠀⠀⠀⠀ (funeral doom metal).


Re: .deb format: let's use 0.939, zstd, drop bzip2

Adam Borowski-3
In reply to this post by Ansgar Burchardt-8
On Wed, May 08, 2019 at 10:35:58PM +0200, Ansgar wrote:

> Adam Borowski writes:
> > I've recently did some research on how can we improve the speed of unpacking
> > packages.  There's a lot of other stages that can be improved, but let's
> > talk about the .deb format.
> >
> > First, the 0.939 format, as described in "man deb-old".  While still being
> > accepted by dpkg, it had been superseded before even the very first stable
> > release.  Why?  It has at least two upsides over 2.0:
>
> Switching to a different binary format will break various tools.

The 0.939 format is already supported by most tools.

No one sane digs into the insides of the format; everyone goes through a
small number of low-level tools, thus we can reuse it with little effort.

Of course, adding a new compressor _does_ break compat, but we added four
compressors to 2.0 over the years already, and the sky didn't fall.

> If we want to do this, I wonder if we shouldn't take the chance to move
> away from tar?

Any seekable format significantly reduces compression, although this can
be reduced by managing split points.
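
The effect is easy to demonstrate on toy data. The sketch below uses zlib as a stand-in (not xz, zip or 7z) and compares one solid stream against one independent stream per file, which is the extreme case of placing a split point at every file. The file contents are invented, and how much is lost on real packages would need measuring.

```python
import zlib

# 50 small "files" that share most of their content.
shared = b"This is free software; see the source for copying conditions.\n" * 20
files = [shared + b"package %d\n" % i for i in range(50)]

# One stream over everything: later files can reference earlier ones.
solid = len(zlib.compress(b"".join(files), 9))

# One stream per file: random access is trivial, redundancy is lost.
per_file = sum(len(zlib.compress(f, 9)) for f in files)

print("one stream:", solid, "bytes")
print("50 independent streams:", per_file, "bytes")
```

Grouping several files between split points sits somewhere between these two extremes.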

> We have various applications that only want to extract single members of
> the package (changelog, NEWS, copyright, ...); tar is a really bad
> format for such an operation.  Other formats (zip, 7z, ...) are more
> suited for them.

Perhaps such files could be considered metadata and moved to the control
tarball?  Or merely just moved forward -- remember that tarballs are
unordered.


Meow!
--
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢰⠒⠀⣿⡁
⢿⡄⠘⠷⠚⠋⠀ NOBODY expects the Spanish Inquisition!
⠈⠳⣄⠀⠀⠀⠀


Re: .deb format: let's use 0.939, zstd, drop bzip2

Paul Wise via nm
In reply to this post by Adam Borowski-3
On Thu, May 9, 2019 at 1:38 AM Adam Borowski wrote:

> Thus, even though we'd want to stick with xz for the official archive, speed
> gains from zstd are so massive that it's tempting to add support for it,
> at least for non-official uses, possibly also for common Build-Depends.

Could we use custom zstd dictionaries on a per-architecture basis to
further reduce the size of zstd packages, possibly allowing it to beat
xz?
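
The idea of a shared dictionary can be tried with the Python standard library, with one loud caveat: the sketch uses zlib's preset dictionary (zdict) as a stand-in, not zstd dictionaries, and the dictionary and sample below are invented. It only shows the mechanism, which tends to matter most for small inputs; whether it helps on multi-megabyte packages would need real measurements.

```python
import zlib

# Hypothetical dictionary: boilerplate that many small files share.
dictionary = (b"Package: \nVersion: \nArchitecture: amd64\n"
              b"Maintainer: Example Maintainer <maintainer@example.org>\n"
              b"Section: \nPriority: optional\nDescription: ")
sample = (b"Package: demo\nVersion: 1.0-1\nArchitecture: amd64\n"
          b"Maintainer: Example Maintainer <maintainer@example.org>\n"
          b"Section: misc\nPriority: optional\nDescription: demo package\n")


def deflate(data: bytes, zdict=None) -> bytes:
    """Compress, optionally priming the compressor with a dictionary."""
    if zdict is None:
        comp = zlib.compressobj(level=9)
    else:
        comp = zlib.compressobj(level=9, zdict=zdict)
    return comp.compress(data) + comp.flush()


def inflate(blob: bytes, zdict: bytes) -> bytes:
    """Decompress; the same dictionary must be available to the reader."""
    decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(blob) + decomp.flush()


plain = deflate(sample)
primed = deflate(sample, dictionary)
assert inflate(primed, dictionary) == sample
print(len(sample), len(plain), len(primed))
```

Note that the reader needs the exact same dictionary, so it would have to be versioned and shipped alongside the unpacker.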

--
bye,
pabs

https://wiki.debian.org/PaulWise


Re: .deb format: let's use 0.939, zstd, drop bzip2

Michael Stone-2
On Thu, May 09, 2019 at 07:37:55AM +0800, Paul Wise wrote:
>On Thu, May 9, 2019 at 1:38 AM Adam Borowski wrote:
>
>> Thus, even though we'd want to stick with xz for the official archive, speed
>> gains from zstd are so massive that it's tempting to add support for it,
>> at least for non-official uses, possibly also for common Build-Depends.
>
>Could we use custom zstd dictionaries on a per-architecture basis to
>further reduce the size of zstd packages, possibly allowing it to beat
>xz?

In theory, sure. Have any test results? My gut tells me that wouldn't
buy much but numbers matter more.


Re: .deb format: let's use 0.939, zstd, drop bzip2

Andrey Rahmatullin-3
In reply to this post by Adam Borowski-3
On Thu, May 09, 2019 at 12:02:26AM +0200, Adam Borowski wrote:
> > We have various applications that only want to extract single members of
> > the package (changelog, NEWS, copyright, ...); tar is a really bad
> > format for such an operation.  Other formats (zip, 7z, ...) are more
> > suited for them.
>
> Perhaps such files could be considered metadata
Yes please.
I think there are various proposals about this, though IIRC they are mostly
about not putting them into /usr/share but into a more suitable location.

--
WBR, wRAR


Re: .deb format: let's use 0.939, zstd, drop bzip2

Ansgar Burchardt-8
In reply to this post by Adam Borowski-3
Adam Borowski writes:

> On Wed, May 08, 2019 at 10:35:58PM +0200, Ansgar wrote:
>> Adam Borowski writes:
>> > I've recently did some research on how can we improve the speed of unpacking
>> > packages.  There's a lot of other stages that can be improved, but let's
>> > talk about the .deb format.
>> >
>> > First, the 0.939 format, as described in "man deb-old".  While still being
>> > accepted by dpkg, it had been superseded before even the very first stable
>> > release.  Why?  It has at least two upsides over 2.0:
>>
>> Switching to a different binary format will break various tools.
>
> The 0.939 format is already supported by most tools.
>
> No one sane digs into insides of the format, using a small number of
> low-level tools, thus we can reuse it with little effort.
>
> Of course, adding a new compressor _does_ break compat, but we added four
> compressors to 2.0 over the years already, and the sky didn't fall.

Well, it causes minor breakage which is fairly easy to fix.  A different
container format instead of tar would require more involved changes in
tools, so I'm not 100% convinced of my idea myself ;-)  The thread just
looked like the right time to consider such changes.

>> If we want to do this, I wonder if we shouldn't take the chance to move
>> away from tar?
>
> Any seekable format significantly reduces compression, although this can
> be reduced by managing split points.

Well, depending on how much splitting you do, the loss in compression
should be small enough to not care about?

>> We have various applications that only want to extract single members of
>> the package (changelog, NEWS, copyright, ...); tar is a really bad
>> format for such an operation.  Other formats (zip, 7z, ...) are more
>> suited for them.
>
> Perhaps such files could be considered metadata and moved to the control
> tarball?  Or merely just moved forward -- remember that tarballs are
> unordered.

I don't think that is a good idea: if someone wants to use another file
in a similar way, he couldn't and would have to fall back to the worse
solution.

As an example: I have a config-diff script which compares conffiles with
the pristine version included in the *.deb; it wants to access /etc/*.
(Though ideally dpkg would keep the pristine version accessible below
/usr; that would also be useful for other uses.)

Also dpkg keeps metadata in /var, but changelogs, NEWS, copyright
documentation isn't variable state data and should be below /usr...  The
same is really true for lists of files and maintainer scripts though.

Ansgar


Re: .deb format: let's use 0.939, zstd, drop bzip2

Mike Hommey
In reply to this post by Michael Stone-2
On Wed, May 08, 2019 at 09:01:27PM -0400, Michael Stone wrote:

> On Thu, May 09, 2019 at 07:37:55AM +0800, Paul Wise wrote:
> > On Thu, May 9, 2019 at 1:38 AM Adam Borowski wrote:
> >
> > > Thus, even though we'd want to stick with xz for the official archive, speed
> > > gains from zstd are so massive that it's tempting to add support for it,
> > > at least for non-official uses, possibly also for common Build-Depends.
> >
> > Could we use custom zstd dictionaries on a per-architecture basis to
> > further reduce the size of zstd packages, possibly allowing it to beat
> > xz?
>
> In theory, sure. Have any test results? My gut tells me that wouldn't buy
> much but numbers matter more.

Another option is to use filters like xz does.
e.g.
https://git.tukaani.org/?p=xz.git;a=blob;f=src/liblzma/simple/x86.c;h=0b14807e900cdf4a85dc513c281892c2309bb454;hb=4ed339606156bd313ed99237485cb8ed0362d64f
https://git.tukaani.org/?p=xz.git;a=blob;f=src/liblzma/simple/arm.c;h=181d0e3b223220fa24cb5feb638b231357326905;hb=4ed339606156bd313ed99237485cb8ed0362d64f
https://git.tukaani.org/?p=xz.git;a=blob;f=src/liblzma/simple/armthumb.c;h=eab4862dd76dd03d4c0a0d8e4af7385866d9197d;hb=4ed339606156bd313ed99237485cb8ed0362d64f
etc.
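
For anyone who wants to experiment, these filters are reachable from Python's lzma module. The sketch below builds an xz stream with the x86 BCJ filter in front of LZMA2; the payload is a meaningless stand-in, so it demonstrates the filter chain and the round trip only, not any size gain (that needs real x86 machine code).

```python
import lzma

# Filter chain: x86 branch/call/jump preprocessing, then LZMA2.
bcj_chain = [
    {"id": lzma.FILTER_X86},
    {"id": lzma.FILTER_LZMA2, "preset": 6},
]

# Stand-in payload; real gains would need actual x86 executables.
payload = bytes(range(256)) * 64

packed = lzma.compress(payload, format=lzma.FORMAT_XZ, filters=bcj_chain)
unpacked = lzma.decompress(packed)  # the chain is read from the xz header
print(len(payload), len(packed), unpacked == payload)
```

The filter chain is recorded in the .xz container, so the decompressor needs no extra hints.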

Mike


Re: .deb format: let's use 0.939, zstd, drop bzip2

Anthony DeRobertis
In reply to this post by Adam Borowski-3


On May 8, 2019 9:43:50 PM UTC, Adam Borowski <[hidden email]> wrote:

>I just checked Stretch: not a single .bz2, either control nor data.  I'm
>not going to download all of Jessie just to check -- but even assuming
>something was left by Jessie's time, by Bullseye trying to install such a
>.deb will mean mixing packages 3 releases apart.

dpkg-deb is used to examine debs too, and considering Jessie is still LTS and Wheezy is ELTS, you may well want to examine packages from several releases ago on a current system. I have a weird case at work where I need to examine packages from as far back as Sarge and Etch through Buster, but I'd fully understand not supporting that.

Some local packages can be long-lived, too. E.g., at work I have one that installs an internal CA. That package hasn't needed changing in a while, it drops a file and calls update-ca-certificates. Wouldn't be a huge deal to rebuild it, of course.


Re: .deb format: let's use 0.939, zstd, drop bzip2

Adam Borowski-3
On Thu, May 09, 2019 at 08:10:00AM +0000, Anthony DeRobertis wrote:

> On May 8, 2019 9:43:50 PM UTC, Adam Borowski <[hidden email]> wrote:
>
> >I just checked Stretch: not a single .bz2, either control nor data.  I'm
> >not going to download all of Jessie just to check -- but even assuming
> >something was left by Jessie's time, by Bullseye trying to install such a
> >.deb will mean mixing packages 3 releases apart.
>
> dpkg-deb is used to examine debs too, and considering Jessie is still LTS
> and Wheezy is ELTS, you may well want to examine packages from several
> releases ago on a current system.  I have a weird case at work where I
> need to examine packages from as far back as Sarge and Etch through
> Buster, but I'd fully understand not supporting that.

Yeah -- and on any non-minimal system, unpacking such debs would work
without any action (be it via dlopen or exec|pipe).  libbz2 has enough
dependencies that it won't get out of default installs anytime soon.  On
minimal systems you'd get an error message telling you to install the
optional library.

> Some local packages can be long-lived, too.  E.g., at work I have one that
> installs an internal CA.  That package hasn't needed changing in a while,
> it drops a file and calls update-ca-certificates.  Wouldn't be a huge deal
> to rebuild it, of course.

You can "ar t foobar.deb" to see what kind of compression it uses.  I don't
think bzip2 was ever the default, though -- thus it's likely to happen only
in largest or overoptimized packages.


Meow!
--
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Did ya know that typing "test -j8" instead of "ctest -j8"
⢿⡄⠘⠷⠚⠋⠀ will make your testsuite pass much faster, and fix bugs?
⠈⠳⣄⠀⠀⠀⠀


Re: .deb format: let's use 0.939, zstd, drop bzip2

Adam Borowski-3
In reply to this post by Ansgar Burchardt-8
On Thu, May 09, 2019 at 09:22:56AM +0200, Ansgar wrote:
> Also dpkg keeps metadata in /var, but changelogs, NEWS, copyright
> documentation isn't variable state data and should be below /usr...  The
> same is really true for lists of files and maintainer scripts though.

It's a mess:

* Most of the control tarball (to be exact: every file other than "control")
  goes to /var/lib/dpkg/info/$PACKAGE.$FILENAME; they're all (I've verified
  across all .debs in Buster) plain files of either mode 644 or 755.
* Except for "control" which is sort of concatenated and dumped into
  /var/lib/dpkg/status, with "Status:" added.
* On the other hand, /var/lib/dpkg/info/$PACKAGE.list is generated from
  the list of files in the data tarball.

Knowing $PACKAGE requires reading the control tarball: if Multi-Arch is
"same" (and no other value), $PACKAGE is "Package:Architecture", "Package"
otherwise.
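
The naming rule above can be written down as a short sketch. It only encodes what is described in this mail (it is not taken from dpkg's source), and the function names and the example package are hypothetical.

```python
def info_basename(fields: dict[str, str]) -> str:
    """$PACKAGE as described above: arch-qualified only for Multi-Arch: same."""
    if fields.get("Multi-Arch") == "same":
        return f"{fields['Package']}:{fields['Architecture']}"
    return fields["Package"]


def info_path(fields: dict[str, str], filename: str) -> str:
    """Where a control-tarball file other than "control" would end up."""
    return f"/var/lib/dpkg/info/{info_basename(fields)}.{filename}"


# Hypothetical control fields for two demo packages.
lib = {"Package": "libdemo1", "Architecture": "amd64", "Multi-Arch": "same"}
tool = {"Package": "demo-tools", "Architecture": "amd64",
        "Multi-Arch": "foreign"}

print(info_path(lib, "postinst"))
print(info_path(tool, "md5sums"))
```

Only the "same" value triggers the architecture suffix; "foreign", "allowed" or a missing field all yield the bare package name.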

The data tarball is unpacked into the filesystem mostly as-is, but you still
need to obey diverts, replaces, and symlinks.


Meow!

(I'm reverse-engineering dpkg instead of reading the specs on purpose: in
order to change the spec, what matters is current practice rather than the
letter of documentation, often written 25 years ago before we settled on a
subset of features.)
--
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Did ya know that typing "test -j8" instead of "ctest -j8"
⢿⡄⠘⠷⠚⠋⠀ will make your testsuite pass much faster, and fix bugs?
⠈⠳⣄⠀⠀⠀⠀
