Delta debs repository layout v2

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

Delta debs repository layout v2

Julian Andres Klode-4
This is an advanced proposal for integrating deltas into the archive. It has
been designed after some feedback and some more research.

Criteria
========
* We don't want to have Deltas indexes like Packages files
  -> we are unlikely to need most of the deltas
  -> about same size as Packages file (when both compressed)
* We don't want to try to fetch deltas for each upgrade. We'll only be generating
  deltas for about 2/3rd of all upgrades (as far as current results tell us.
* We don't want to have untrusted deltas/files
* We don't want to generate/verify too many signatures

Terminology
===========
* upgrade: A tuple of two .deb; an old one, and a new one.
* delta: A deb containing only ddelta files or similar instead of full files
* bloom filter: A bitarray of $n$ bytes indexed by $k$ hash functions. Has
  no false negatives, but might have false positives.

Layout
======
This approach tries to solve the problem by introducing per-source-package
Delta indices in the pool. This solves the problem of having to download meta
info about all deltas in the archive.

* pool/$component/$prefix/$sourcename/Deltas

  An inline signed index containing, for each delta for that source package,
  the following information:

    * Package   - the name of the package
    * Old-ID    - the id of the package being upgraded from
    * New-ID    - the id of the version being upgraded to
    * Size      - the size of the delta
    * SHA256    - the hash of the delta
    * Algorithm - algorithm used in deltas, currently ddelta

  Including the algorithm is essential, so we can switch algorithms at a later
  point and have old dpkg versions download a delta they understand or no delta.

  Deltas are stored with a fixed pathname to avoid duplicating information:

    pool/$component/$prefix/$sourcename/$binaryname_$oldid_$newid_$algorith.deltadeb

  The concrete layout of the Deltas file is unspecified so far. A minimal solution is:

    SHA256:
     8a09ba1ca92fe5405de7bd90fb06a1ce6cc502917df88f2990d3701a6cc75fa1 590012 apt_1e9bfd18a899677e70fda42f5192fd169c4a778d7c7426016d1bf0a0d9b5dbc7_1955901c3d7dd72c451274fa72f42539e573a9e883c4e05ad283e2a30a62f13b_ddelta.deltadeb

  this encodes all the information in a fairly minimal way (thus we don't need any compression),
  and is compatible with the existing Release file format. Note that we will know the name(s) of the file's we want
  to download beforehand, we just need to figure out if they exist and get signed hashes
  and sizes for them.

  The package name is in fact optional, but it might be worth including it to be able
  to figure out which package the delta belongs to.

  It might be worth including the distribution in the filename, and call it
  e.g. Deltas.bookworm-security instead of Deltas. That said, there might not be
  enough releases at a time for it to be worth it. NEEDSINVESTIGATION

However, if we only produce deltas for 2/3 of the upgrades, we still have 1/3 of
false positives for which we will needlessly download Deltas indices. Depending on
the connection's latency, this might be bad. To solve this, we can introduce a new
file:

* dists/$dist/$component/binary-$arch/Deltas.bloom

  A bloom filter of size $n$ bits. For each delta from $old_id to
  $new_id, construct a delta id $old_id | $new_id (the concatenation). Split
  up the delta_id into chunks of $b$ bits. Use each $chunk \mod n$ as an index
  into the bloom filter.

  It seems reasonable to expect us to achieve a bloom filter with a false
  positive rate of about 1%. Experimental results for xenial-updates/main/amd64 have
  shown a 0.75% false positive rate for a bloom filter of $n=98317$ bits
  and $b=32$ bit chunks for 9132 accepted out of 13500 total deltas (that's about
  12 KB for a Packages.xz of 800 KB).

  This does not include the delta algorithm, hence it's probably best not to have
  deltas with different algorithms for the same upgrade.

  The bloom filter is signed by the InRelease file.

Delta upgrade process
=====================

update
------
Acquire Deltas.bloom for each Packages file

upgrade/install - download
--------------------------
  for each upgrade in changes:
    if upgrade in Deltas.bloom and applicable(upgrade):
      queue upgrade's Deltas index
    else:
      queue full deb

  for each deltas index:
    if downloaded and delta in index:
      queue delta
    else:
      queue deb

  for each delta:
    if downloaded failed:
      queue deb

where applicable(upgrade):
  """Verify that an upgrade is applicable.

  For an upgrade to be applicable, the files on the file system must match the files
  shipped in the package. A reasonable approximation is to ensure that the file has
  not been modified later than the install time and that the size is the same."""

  for each file in installed_version:
    if file.mtime >= installed_version.install_time or file.mtime > file.ctime or file.size != expected_size:
      return False
  return True


Other decisions
===============
* ddelta should grow a CRC for data blocks it copies/diffs so it does not silently
  succeed given a broken "old" file

More TODO
=========
Gotta update the wiki page with this stuff.

--
debian developer - deb.li/jak | jak-linux.org - free software dev
ubuntu core developer                              i speak de, en