Bug#930484: e2fsck corrupts sparse files with -E bmap2extent

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

Bug#930484: e2fsck corrupts sparse files with -E bmap2extent

Florian Zumbiehl-2
Package: e2fsprogs
Version: 1.43.4-2
Severity: critical

e2fsck potentially moves blocks around in sparse files when running with
-E bmap2extent, in particular when the blocks are physically adjacent on
the underlying block device, but have a hole in between in the file.

This script reproduces the problem on my system (run as root in an empty
directory!):

#!/bin/bash -ex
rm -f image
mkdir -p m
umount m || true
dd if=/dev/zero bs=1M seek=99 count=1 of=image
mkfs.ext3 -E nodiscard -b 4096 image
mount -o loop image m
echo -n a > m/x
echo -n b | dd of=m/x bs=1 seek=8k
ls -i m/x
ino=$(stat -c%i m/x)
sha1sum m/x
umount m
debugfs image -R "stat <$ino>" | cat
tune2fs -O extent image
e2fsck -fE bmap2extent image || true
debugfs image -R "stat <$ino>" | cat
mount -o loop image m
sha1sum m/x
umount m
rm image
rmdir m

The e2fsck invocation turns this part of the debugfs output:

BLOCKS:
(0):24576, (2):24577

into this:

EXTENTS:
(0-1):24576-24577

With version 1.44.5-1~bpo9+1, the test above does not produce corruption on
my system, but I have not investigated whether that is just coincidence or
because the bug has been fixed. As this silently corrupts files, I would
think it should get some fix in stretch, and be it by disabling the
feature.

Reply | Threaded
Open this post in threaded view
|

Bug#930484: e2fsck corrupts sparse files with -E bmap2extent

Florian Zumbiehl-2
Hi,

> It's an unfortunate bug, but we've lived with it for about ten years,
> and it was only noticed in 2017.  The reality is very few people
> actually try to convert an indirect mapped file system the new
> extent-mapped system, because you get much more of the benefits of
> using a more modern version of ext4 by doing a backup, reformat, and
> restore of the file system.
>
> Given that so few people have noticed, I don't believe this qualifies
> as even "important", since doing the conversion is not a fundamental
> aspect of e2fsck, so it doesn't qualify as a significant impact on the
> usability of the package.  For similar reasons, I don't believe the
> release team would agree to a new release of e2fsprogs to stretch ---
> especially since (a) it's fixed in buster, (b) the fix is also
> available in stretch-backports, and (c) in three weeks, Stretch will
> become oldstable and Buster will become the new stable release of
> Debian.

Now, I would agree that the *functionality* isn't important, but what I
don't agree with is that a bug that silently corrupts files isn't.

For one, you write that the bug was only noticed in 2017. But really, the
bug was only *found* in 2017--I doubt you have any idea how many people
*noticed* the bug, that is to say, its effects, but weren't able to figure
out what the cause was? If a bug just makes fsck crash, that's bad, but at
least the data is still fine. If a bug destroys the filesystem, that's bad,
but at least you notice, do a restore, and everything is fine.

But this corrupts files in a way that you don't necessarily notice: fsck
doesn't complain, it often will corrupt few enough files that you don't
notice immediately, once you do notice, you may not make the connection
that that conversion that you did possibly months or years ago was the
reason, and you may not have old enough backups anymore to restore from or
to compare against. So, not only do you not know how many people noticed
effects of the bug but had no clue what the cause was, you know even less
how many people have corrupted and are corrupting their files and have yet
to find out.

That fixed versions are available in buster and as a backport is of no use
to anyone walking into this trap who doesn't know that they are about to
corrupt their files. As far as I am concerned, that is more than enough
reason to not fix the conversion in stretch--but please, at least make
e2fsck bail out with an error message when someone asks it to corrupt the
file system. That should be an easy and low-risk fix, and if you had done
that when the bug was discovered, you would at least have saved me a lot of
time, as I am still trying to figure out what's been corrupted--and as
noone should be using this version of the code that is almost guaranteed to
corrupt your data, replacing it with an error message is not a regression
either.

Using the backport for the conversion, doing a dump-and-restore instead, or
waiting until buster are all perfectly fine solutions for the problem of
converting the filesystem. But silently corrupting files and hoping the
user will notice is not a sensible solution for letting the user know that
they should not be using the e2fsck in stretch to do the conversion--IMO.

Regards, Florian

Reply | Threaded
Open this post in threaded view
|

Bug#930484: e2fsck corrupts sparse files with -E bmap2extent

Theodore Y. Ts'o
On Thu, Jun 13, 2019 at 11:19:13PM +0200, Florian Zumbiehl wrote:

>
> That fixed versions are available in buster and as a backport is of no use
> to anyone walking into this trap who doesn't know that they are about to
> corrupt their files. As far as I am concerned, that is more than enough
> reason to not fix the conversion in stretch--but please, at least make
> e2fsck bail out with an error message when someone asks it to corrupt the
> file system. That should be an easy and low-risk fix, and if you had done
> that when the bug was discovered, you would at least have saved me a lot of
> time, as I amh still trying to figure out what's been corrupted--and as
> noone should be using this version of the code that is almost guaranteed to
> corrupt your data, replacing it with an error message is not a regression
> either.

I'm very sorry that you lost data.

The problem is not the difficulty in backporting the change --- the
fix is really just a one-line change.  Most of the complexity in the
commit is in adding regression test.

The problem is actually making any kind of change to stretch at this
point.  The release team strongly discourages package updates, unless
the bug is ***really*** serious.  This is especially true for packages
like e2fsck which are built into the installer, since taking an
e2fsprogs package involves respinning the installer.

Given how niche the bmap2extent option is, and combine the judgement
that it's highly unlikely that people conservative enough to use
Debian Obsolete^H^H^H^H^H^H^H Stable would be using an exotic e2fsck
option with how many hoops the Debian Release Team makes people doing
stable updates (I feel actively discouraged from going down that
route) and I feel extremely demotatived towards trying to extract out
just the bug reports and backport them to ancient versions of
e2fsprogs.  Red Hat pays masochists vast amounts money to do that.
It's just not sort of thing that volunteers are likely to do, and if
you look at the very small number of packages that are updated to
Debian stretch (not even all security fixes), compared to what Red Hat
releases in its updates, it's just the way life is.

In my mind, this is why Debian Backports exists.  If you want the
latest bug fixes (including one whhich avoids data corruption when
doing off-line shrinks using resize2fs), and/or you are doing anything
at all exotic, my only recommendation is to use the Debian Backports
version of e2fsprogs.

Bottom line, for *this* particular bug (as opposed to the various
resize2fs shrink fixes) preparing an update to 1.43.4-2 is pretty
simple.  And if I had any optimism that the Release Team would be
willing to take a very late update to Debian Stretch at the point when
it's just about become oldstable, I might even be willing to do the
upload.  But it's not something where I'm feeling particularly
motivated to convince the release team that this is a change they
should be taking, and respinning the Stretch installer at this point.

Best regards,

                                - Ted

Reply | Threaded
Open this post in threaded view
|

Bug#930484: e2fsck corrupts sparse files with -E bmap2extent

Florian Zumbiehl-2
Hi,

[...]
> In my mind, this is why Debian Backports exists.  If you want the
> latest bug fixes (including one whhich avoids data corruption when
> doing off-line shrinks using resize2fs), and/or you are doing anything
> at all exotic, my only recommendation is to use the Debian Backports
> version of e2fsprogs.

Well, the big problem with that argument is that it presumes people know
what's exotic? The man page doesn't say it's exotic. The program doesn't
warn you that what you are doing is exotic. The only person who kinda knows
what's exotic is you. But you don't do what would be necessary to make
users aware of it. Plus, I don't even see how you would really know, given
that there probably is no telemetry in e2fsck.

Now that I had noticed by pure luck that a file had been corrupted during a
recent conversion, I went back and found files that were corrupted during
conversions that I did four years ago. Now, that was before you were aware
of the bug, so nothing you could have done to prevent that, but it shows
how long the damage caused by this bug can stay undiscovered/how difficult
it is for you to have an accurate view of the impact of this bug.

> Bottom line, for *this* particular bug (as opposed to the various
> resize2fs shrink fixes) preparing an update to 1.43.4-2 is pretty
> simple.

Now, I don't know any details about this other bug, but simply bailing out
in the risky case presumably shouldn't be that difficult either?! I mean, I
can see why you wouldn't want to spend lots of time porting fixes to old
versions that they don't readily apply on, and I wouldn't really see any
value in that anyway. But preventing damage does not require porting the
fix, bailing out does the job as well?!

>          And if I had any optimism that the Release Team would be
> willing to take a very late update to Debian Stretch at the point when
> it's just about become oldstable, I might even be willing to do the
> upload.  But it's not something where I'm feeling particularly
> motivated to convince the release team that this is a change they
> should be taking, and respinning the Stretch installer at this point.

That reads more like you think the bug actually should be fixed, but the
Release Team is likely to create tons of unnecessary work, and possibly
preventing it anyway, that you don't feel like wasting your time with that?

What is the problem with respinning the installer? That sounds like
something that should be automated?! But also, fixing this only in the
non-installer package sure would be better than nothing, if fixing the
installer is a big problem.

Regards, Florian

Reply | Threaded
Open this post in threaded view
|

Bug#930484: e2fsck corrupts sparse files with -E bmap2extent

Rhonda D'Vine-2
In reply to this post by Theodore Y. Ts'o
    Hi,

* Theodore Ts'o <[hidden email]> [2019-06-14 03:44:03 CEST]:

> On Thu, Jun 13, 2019 at 11:19:13PM +0200, Florian Zumbiehl wrote:
> > That fixed versions are available in buster and as a backport is of no use
> > to anyone walking into this trap who doesn't know that they are about to
> > corrupt their files. As far as I am concerned, that is more than enough
> > reason to not fix the conversion in stretch--but please, at least make
> > e2fsck bail out with an error message when someone asks it to corrupt the
> > file system. That should be an easy and low-risk fix, and if you had done
> > that when the bug was discovered, you would at least have saved me a lot of
> > time, as I amh still trying to figure out what's been corrupted--and as
> > noone should be using this version of the code that is almost guaranteed to
> > corrupt your data, replacing it with an error message is not a regression
> > either.
>
> I'm very sorry that you lost data.
>
> The problem is not the difficulty in backporting the change --- the
> fix is really just a one-line change.  Most of the complexity in the
> commit is in adding regression test.
>
> The problem is actually making any kind of change to stretch at this
> point.  The release team strongly discourages package updates, unless
> the bug is ***really*** serious.  This is especially true for packages
> like e2fsck which are built into the installer, since taking an
> e2fsprogs package involves respinning the installer.

 Bugs that appear in stable have to get fixed in stable.  The release
team doesn't block bugfixes for stable, you are exaggerating here - and
data loss is obviously a serious problem, especially when it is just a
one-line change.  Your statement makes it sound like the stable release
team is a nutcase -- which it clearly isn't.  Please don't imply that.  :/

> In my mind, this is why Debian Backports exists.  If you want the
> latest bug fixes (including one whhich avoids data corruption when
> doing off-line shrinks using resize2fs), and/or you are doing anything
> at all exotic, my only recommendation is to use the Debian Backports
> version of e2fsprogs.

 No.  Debian Backports does *not* exist to get bug fixes onto a stable
system.  Backports exists for offering newer features for a stable
system.  Backports is not the place to fix bugs in stable.  You are
exxagerating both against the purpose of backports and against the
stable release team here, which isn't helpful. :/


> Bottom line, for *this* particular bug (as opposed to the various
> resize2fs shrink fixes) preparing an update to 1.43.4-2 is pretty
> simple.  And if I had any optimism that the Release Team would be
> willing to take a very late update to Debian Stretch at the point when
> it's just about become oldstable, I might even be willing to do the
> upload.

 Oldstable is supposed to be supported for at least a year after the
release of the next stable release, so there is not so much convincing
needed, in my experience and opinion.  It's not like it's close to
end-of-life of stretch, or stretch turning into LTS.  That's still far
in the future.  Can you at least send a mail to debian-release about it
instead of giving up even before trying?  :/

 Thanks,
Rhonda
--
Fühlst du dich mutlos, fass endlich Mut, los      |
Fühlst du dich hilflos, geh raus und hilf, los    | Wir sind Helden
Fühlst du dich machtlos, geh raus und mach, los   | 23.55: Alles auf Anfang
Fühlst du dich haltlos, such Halt und lass los    |

Reply | Threaded
Open this post in threaded view
|

Bug#930484: e2fsck corrupts sparse files with -E bmap2extent

Theodore Y. Ts'o
On Fri, Jun 28, 2019 at 11:47:46AM +0200, Rhonda D'Vine wrote:
>
>  Bugs that appear in stable have to get fixed in stable.  The release
> team doesn't block bugfixes for stable, you are exaggerating here - and
> data loss is obviously a serious problem, especially when it is just a
> one-line change.  Your statement makes it sound like the stable release
> team is a nutcase -- which it clearly isn't.  Please don't imply that.  :/

There are *very*, *very*, *very* large number of bug-fixes for
e2fsprogs in version 1.43.[3456789].  This isn't actually the first
data corruption bug since 1.43.2.  And this doesn't include the data
corruption bugs found in 1.44.x and 1.45.x that apply to 1.43.2; that
would far more effort to track.

So there's actually a huge problem hiding here, which is that it's
just way too much effort to backport all of the fixes.  The problem is
that the release team isn't willing to take the maintenance branch
updates, but rather wants every single change to be logged as a Debian
bug, and individually extracted as a patch, and applied manually to
debian/patches.

And that's more effort than I as volunteer am willing to do.  Red Hat
charges a ***huge*** sum of money for crazy enterprise customers who
want that kind of loving care of each bug fix being manually curated.
And that's because it's a huge amount of work.

If someone wants to volunteer to do that work, that's great.  But
that's what I would really love to find.  And if they only want to do
it for *this* bug, as opposed to people using resize2fs to shrink file
systems, which is actually far more likely to happen --- although, I'm
not sure how many customers using Debian Obsolete do that sort of
thing --- that's fine.  But when there are a huge number of bug fixes,
it's very hard to bring the motivation to do what Red Hat charges
$$$$$ for its customers to do, when the rest of the open source world
just says, update to the latest !@#!@?! version, already!

            - Ted

P.S.  This isn't an issue just for e2fsprogs.  Even firefox-esr is
only supported for less than a year before the answer is, "Sorry, you
want the new fixes, you have to take some new features."  It's a huge
engineering effort to disentangle bug fixes and backport them to aging
code bases.

Reply | Threaded
Open this post in threaded view
|

Bug#930484: e2fsck corrupts sparse files with -E bmap2extent

Theodore Y. Ts'o
By the way, here are the official criteria from the Debian Developer's
Reference Manual[1]:

   Extra care should be taken when uploading to stable. Basically, a
   package should only be uploaded to stable if one of the following
   happens:

   * a truly critical functionality problem

   * the package becomes uninstallable

   * a released architecture lacks the package

[1] https://www.debian.org/doc/manuals/developers-reference/ch05.en.html#upload-stable

This bug just doesn't qualify.   "critical" is defined as:

    makes unrelated software on the system (or the whole system)
    break, or causes serious data loss, or introduces a security hole
    on systems where you install the package.[2]

Where as the lower-priority bug "grave" is defined as:

    makes the package in question unusable or mostly so, or causes
    data loss, or introduces a security hole allowing access to the
    accounts of users who use the package.[2]

[2] https://www.debian.org/Bugs/Developer


This bug is at *best* grave*, because it only corrupts a small number
of files (those that have holes --- hence, not "serious" data loss)
and only if the user uses a fairly exotic feature which you have to
manually specify, by hand, from a root shell.  Given how unlikely it
will be for people to run into this, it could easily be argued that
this would be "important" or "normal".

And per the developer's reference manual, the release team doesn't get
out of bed for stable updates for anything lower that "truly critical
functionality problems".  I will note that these are criteria more
strict than what Red Hat uses for RHEL, but <shrug>.  There's a reason
why I only use Debian Stable^H^H^H^H^H Obsolete for really simple
services, such as DNS, IMAP, Web servers, etc.  If I'm going to use
any kind of advanced or exotic features, the very restrictive
definition of allowable bug fixes that are currently defined for
Stable make it (IMHO) a bad idea.

If you want bug fixes that aren't quite at the "truly critical
functionality problem" level, the only answer we have today is Debian
Backports.

Personally, I think Debian as an organization has drawn the line for
Stable inclusion in completely the wrong place.  By being so concerned
about not letting any regressions through, to the point that debdeffs
have to be submitted to the Debian Release Team for their low-level,
detailed approval, and making Maintainers jump through hoops, bugs
aren't getting fixed --- even "important" or "grave" bugs.  But Debian
is a volunteer organization, and I've never cared enough to really
push on this.

Best regards,

                                        - Ted

P.S.  I actually go through a fair amount of work to make sure the
e2fsprogs Debian source package will continue to build on stable and
oldstable, although recent initiatives to force the use of newer
debhelper compat levels hav made this more difficult.  So if people
want to build the very latest version of e2fsprogs on older Debian
releases, this is not hard, and this is what I as the upstream
maintainer would recommend.  It's not what the Debian Release Team
considers acceptable, but: ¯\_(ツ)_/¯