Re: Bug#912087: openssh-server: Slow startup after the upgrade to 7.9p1


Sebastian Andrzej Siewior
On 2018-10-29 23:33:34 [+0100], Kurt Roeckx wrote:

> On Mon, Oct 29, 2018 at 09:58:20PM +0100, Sebastian Andrzej Siewior wrote:
> > On 2018-10-29 18:22:08 [+0100], Kurt Roeckx wrote:
> > > So I believe this is not an openssl issue, but something in the
> > > order that the kernel's RNG is initialized and openssh is started.
> > > Potentially the RNG isn't initialized at all and you actually
> > > have to wait for the kernel to get its random data the slow
> > > way.
> > >
> > > So I'm reassigning this to systemd and openssh-server, I have no
> > > idea where the problem really is.
> >
> > I see it, too. So during boot someone invokes "sshd -t" which invokes
>
> That's:
> ExecStartPre=/usr/sbin/sshd -t
>
> > getrandom(, 32, 0)
> > and this blocks.
>
> And did systemd-random-seed.service get run before that?

Yes, but it does not matter from what I can see in the code. On my
system this writes 512 bytes to /dev/urandom at timestamp 11.670639. But
sshd does this:

  sshd-2638  [004] .......    22.445819: __x64_sys_getrandom: 1| 32 0
sshd asks for 32 bytes (flags = 0)

  sshd-2638  [004] .......    22.445824: __x64_sys_getrandom: 2
-> crng_ready() is not true so we wait_for_random_bytes()

  sshd-3164  [004] .......   117.577454: __x64_sys_getrandom: 3
-> "crng init done", sshd's getrandom() resumed.

The problem is that the entropy is added but the entropy count is not
increased. So we wait.
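
To check from user space whether a getrandom() call with flags = 0, like
the one in the trace, would block, one can probe with GRND_NONBLOCK. A
minimal Python sketch (Linux only, Python 3.6 or later; the helper name
crng_ready is made up for illustration and is not part of sshd or the
kernel):

```python
import os


def crng_ready() -> bool:
    """Return True if getrandom() would return without blocking."""
    try:
        # GRND_NONBLOCK makes getrandom() fail with EAGAIN instead of
        # sleeping while the kernel CRNG is still uninitialized.
        os.getrandom(1, os.GRND_NONBLOCK)
    except BlockingIOError:
        return False
    return True
```

Calling crng_ready() early in boot should return False until the kernel
logs "crng init done", and True afterwards.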

Using ioctl(/dev/urandom, RNDADDENTROPY, ) instead of writing to
/dev/urandom would do the trick. Or using RNDADDTOENTCNT to increment
the entropy count after it was written. Those two are documented in
random(4). Or RNDRESEEDCRNG could be used to force the crng to be
reseeded. It does the job, too.
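
For illustration, the RNDADDENTROPY variant could look roughly like the
Python sketch below. It assumes the struct rand_pool_info layout from
random(4) (int entropy_count, int buf_size, then the data) and the ioctl
numbers as encoded on x86 and ARM; some architectures (e.g. MIPS,
PowerPC) encode ioctl numbers differently. The function names are
hypothetical, and the ioctl itself needs CAP_SYS_ADMIN.

```python
import fcntl
import struct

# ioctl request numbers from <linux/random.h>, as encoded on x86 and ARM.
RNDADDTOENTCNT = 0x40045201  # _IOW('R', 0x01, int)
RNDADDENTROPY = 0x40085203   # _IOW('R', 0x03, int[2])


def build_rand_pool_info(seed: bytes, entropy_bits: int) -> bytes:
    """Pack struct rand_pool_info: entropy_count (bits), buf_size (bytes), data."""
    if not 0 <= entropy_bits <= 8 * len(seed):
        raise ValueError("cannot credit more bits than the seed contains")
    return struct.pack("ii", entropy_bits, len(seed)) + seed


def credit_seed(seed: bytes, entropy_bits: int, dev: str = "/dev/urandom") -> None:
    """Mix seed into the pool and credit entropy_bits in one step (root only)."""
    # Keep the packed buffer at or below 1024 bytes; older Python versions
    # reject larger immutable ioctl arguments.
    with open(dev, "wb", buffering=0) as f:
        fcntl.ioctl(f, RNDADDENTROPY, build_rand_pool_info(seed, entropy_bits))
```

How many bits to credit for a given seed is a policy decision; the
sketch only refuses to credit more bits than the seed contains.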

Ted, is there any best practice for what to do with the seed which was
extracted from /dev/urandom on system shutdown? Using RNDADDTOENTCNT to
speed up init, or just write it back to urandom and issue RNDRESEEDCRNG?

> Kurt

Sebastian


Re: Bug#912087: openssh-server: Slow startup after the upgrade to 7.9p1

Theodore Y. Ts'o
On Tue, Oct 30, 2018 at 01:18:08AM +0100, Sebastian Andrzej Siewior wrote:
> Using ioctl(/dev/urandom, RNDADDENTROPY, ) instead of writing to
> /dev/urandom would do the trick. Or using RNDADDTOENTCNT to increment
> the entropy count after it was written. Those two are documented in
> random(4). Or RNDRESEEDCRNG could be used to force the crng to be
> reseeded. It does the job, too.
>
> Ted, is there any best practice for what to do with the seed which was
> extracted from /dev/urandom on system shutdown? Using RNDADDTOENTCNT to
> speed up init, or just write it back to urandom and issue RNDRESEEDCRNG?

The reason why writing to /dev/[u]random via something like:

    cat /var/lib/random/seed > /dev/random

doesn't bump the entropy counter is because it's possible that an
attacker could read /var/lib/random/seed.  Even if the seed file is
refreshed on shutdown, (a) the attacker could have read the file while
the system is down, or (b) the system could have crashed so the seed
file was not refreshed and the attacker could have read the file
before the crash.

If you are using a VM and the host has virtio-rng, using a kernel that
has virtio-rng support will solve the problem.  For qemu, this means
you can enable it via something like this:

         -object rng-random,filename=/dev/urandom,id=rng0 \
         -device virtio-rng-pci,rng=rng0

If you are using Google Compute Engine, I can't comment about future
product features, but I would encourage you to file a feature request
bug with Google requesting virtio-rng support ASAP.

On any VM (cloud or on-prem), since you have to trust the host
*anyway*, with v4.19, you can add random.trust_cpu=on to the boot
command-line, or build the kernel with CONFIG_RANDOM_TRUST_CPU.

For the Debian 4.18 kernel, this can be backported via commits
39a8883a2b98 and 9b25436662d5.
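
To see whether a given boot actually carried the parameter, the kernel
command line can be inspected. A small Python sketch; the function name
is hypothetical, and when the parameter is absent the kernel falls back
to its build-time default (CONFIG_RANDOM_TRUST_CPU):

```python
from typing import Optional


def trust_cpu_setting(cmdline: str) -> Optional[str]:
    """Return the last random.trust_cpu= value on a kernel command line.

    Returns None if the parameter is not present at all.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == "random.trust_cpu" and sep:
            value = val  # later occurrences override earlier ones
    return value
```

On a running system the input would be the contents of /proc/cmdline.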

                                        - Ted


Re: Bug#912087: openssh-server: Slow startup after the upgrade to 7.9p1

Kurt Roeckx
On Tue, Oct 30, 2018 at 10:15:44AM -0400, Theodore Y. Ts'o wrote:

> On Tue, Oct 30, 2018 at 01:18:08AM +0100, Sebastian Andrzej Siewior wrote:
> > Using ioctl(/dev/urandom, RNDADDENTROPY, ) instead of writing to
> > /dev/urandom would do the trick. Or using RNDADDTOENTCNT to increment
> > the entropy count after it was written. Those two are documented in
> > random(4). Or RNDRESEEDCRNG could be used to force the crng to be
> > reseeded. It does the job, too.
> >
> > Ted, is there any best practice for what to do with the seed which was
> > extracted from /dev/urandom on system shutdown? Using RNDADDTOENTCNT to
> > speed up init, or just write it back to urandom and issue RNDRESEEDCRNG?
>
> The reason why writing to /dev/[u]random via something like:
>
>     cat /var/lib/random/seed > /dev/random
>
> doesn't bump the entropy counter is because it's possible that an
> attacker could read /var/lib/random/seed.  Even if the seed file is
> refreshed on shutdown, (a) the attacker could have read the file while
> the system is down, or (b) the system could have crashed so the seed
> file was not refreshed and the attacker could have read the file
> before the crash.

So are you saying that the /var/lib/random/seed is untrusted, and
should never be used, and we should always wait for fresh entropy?

Anyway, I think if an attacker somehow has access to that file,
you have much more serious problems.


Kurt


Re: Bug#912087: openssh-server: Slow startup after the upgrade to 7.9p1

Theodore Y. Ts'o
On Tue, Oct 30, 2018 at 07:37:23PM +0100, Kurt Roeckx wrote:
>
> So are you saying that the /var/lib/random/seed is untrusted, and
> should never be used, and we should always wait for fresh entropy?
>
> Anyway, I think if an attacker somehow has access to that file,
> you have much more serious problems.

So it's complicated.  It's not a binary trusted/untrusted sort of
thing.  We should definitely use it, and the fact we have it saved us
(at least after the system is installed) when there is a kernel bug
such as CVE-2018-1108 where we screwed up and treated the DMI table as
100% random and counted it towards the 256 bits of entropy required
to consider the CRNG to be fully initialized.

If the attacker has access to the file, whether or not it matters
really depends on how the rest of the system is put together.  So for
example, if you have secure boot (via a secured bootloader and a
signed kernel), and the root file system is protected using dm-verity,
the fact that the seed file might be compromisable by an external
attacker is bad, but it's not necessarily catastrophic.  (This is
essentially the situation for ChromeOS and modern Android handsets, BTW.)

OTOH, there are definitely scenarios where you are correct, and if the
attacker has access to the files, you probably are toast, and so
relying on it makes sense.  Whether or not you think that is
more or less safe than relying on RDRAND is going to be a judgement
call, and very much depends on your assumptions of the threat
environment.

(Suppose in the future the Chinese come up with a 100% Chinese-made
CPU that has a RDRAND equivalent; the US military might not be
comfortable relying on that CPU or its RDRAND unit, but the Chinese
military might be perfectly comfortable relying on it; what a
Debian-provided kernel should do when we're trying to be a "Universal
Operating System" is a very interesting question --- and that's why
random.trust_cpu is a boot command line option.)

In any case, if Debian wants to ship a program which reads a seed file
and uses it to initialize the random pool, assuming that it's
trustworthy, via the RNDADDENTROPY ioctl, that's not an insane thing to
do.  My recommendation would be to make it configurable, however,
just as whether RDRAND should be trusted (in isolation) to
initialize the CRNG is configurable.

The point is that everyone is going to have a different opinion about
what entropy source is fully trusted, by itself, to initialize the
kernel's CRNG.  We should mix in everything; but what we should
consider as trustworthy enough to give entropy credit is going to vary
from one sysadmin/system designer/system security officer to another.

Personally, I'm comfortable running my personal kernel with
CONFIG_RANDOM_TRUST_CPU.  I'm not willing to impose my beliefs on
all Linux users, however.

Cheers,

                                        - Ted


Re: Bug#912087: openssh-server: Slow startup after the upgrade to 7.9p1

Sebastian Andrzej Siewior
On October 30, 2018 8:51:36 PM UTC, "Theodore Y. Ts'o" <[hidden email]> wrote:
>
>So it's complicated.  It's not a binary trusted/untrusted sort of
>thing.  

What about RNDRESEEDCRNG? Would it be reasonable to issue it after writing the seed as part of the boot process?

>Cheers,
>
> - Ted


--
Sebastian


Re: Bug#912087: openssh-server: Slow startup after the upgrade to 7.9p1

Theodore Y. Ts'o
On Wed, Oct 31, 2018 at 11:21:59AM +0000, Sebastian Andrzej Siewior wrote:
> On October 30, 2018 8:51:36 PM UTC, "Theodore Y. Ts'o" <[hidden email]> wrote:
> >
> >So it's complicated.  It's not a binary trusted/untrusted sort of
> >thing.  
>
> What about RNDRESEEDCRNG? Would it be reasonable to issue it after writing the seed as part of the boot process?

No, that's for debugging purposes only.

When there is sufficient entropy added (either through a hw_random
subsystem, or because RDRAND is trusted, or the RNDADDENTROPY ioctl),
the crng is automatically reseeded by credit_entropy_bits().  So it's
not needed to use RNDRESEEDCRNG.

                                        - Ted


Re: Bug#912087: openssh-server: Slow startup after the upgrade to 7.9p1

Sebastian Andrzej Siewior
On 2018-10-31 18:41:06 [-0400], Theodore Y. Ts'o wrote:
> On Wed, Oct 31, 2018 at 11:21:59AM +0000, Sebastian Andrzej Siewior wrote:
> > On October 30, 2018 8:51:36 PM UTC, "Theodore Y. Ts'o" <[hidden email]> wrote:
> > >
> > >So it's complicated.  It's not a binary trusted/untrusted sort of
> > >thing.  
> >
> > What about RNDRESEEDCRNG? Would it be reasonable to issue it after writing the seed as part of the boot process?
>
> No, that's for debugging purposes only.

Okay. I'm asking because it has been added to the kernel, marked stable
and the man page has not been updated. So it did not look like a
debugging interface :)

> When there is sufficient entropy added (either through a hw_random
> subsystem, or because RDRAND is trusted, or the RNDADDENTROPY ioctl),
> the crng is automatically reseeded by credit_entropy_bits().  So it's
> not needed to use RNDRESEEDCRNG.

Okay. So you wrote what can be done for a system with HW-RNG/kvm. On
bare metal with nothing fancy I have:
[    3.544985] systemd[1]: systemd 239 running in system mode. (+PAM…
[   10.363377] r8169 0000:05:00.0 eth0: link up
[   41.966375] random: crng init done

which means I have to wait about half a minute until I can ssh into. And
there is no way to speed it up?
You did not oppose RNDADDTOENTCNT/RNDADDENTROPY but you wanted to make
it configurable and not the default, correct?

>
> - Ted

Sebastian


Re: Bug#912087: openssh-server: Slow startup after the upgrade to 7.9p1

Theodore Y. Ts'o
On Thu, Nov 01, 2018 at 11:18:14PM +0100, Sebastian Andrzej Siewior wrote:
> Okay. So you wrote what can be done for a system with HW-RNG/kvm. On
> bare metal with nothing fancy I have:
> [    3.544985] systemd[1]: systemd 239 running in system mode. (+PAM…
> [   10.363377] r8169 0000:05:00.0 eth0: link up
> [   41.966375] random: crng init done
>
> which means I have to wait about half a minute until I can ssh into. And
> there is no way to speed it up?

So that surprises me.  Can you tell me more about the hardware?  Is it
something like a Raspberry Pi?  Or is it an x86 server or desktop?  In
my experience for most x86 platforms this isn't an issue.

The main reason why I've talked about VM systems is because this is
where most of the problems that people have reported to me occur.

Here's the problem: if we "speed it up" inappropriately, you're
risking the security of ssh.  If people who are making a print
server or WiFi router screw it up, they're the ones who are at
fault.  (And this isn't hypothetical.  See https://factorable.net)

So if I make a blanket recommendation, and it causes Debian to ship
some kind of default that causes Debian users to be insecure, I'm
going to feel really bad.  This is why I'm very cautious about what
I say.  If you want to do whatever you want on your own system, hey,
consenting adults can do whatever they want.  :-)

> You did not oppose RNDADDTOENTCNT/RNDADDENTROPY but you wanted to make
> it configurable and not the default, correct?

I'd want to see a full design doc, or a git repository, or set of
changes before I give it an unqualified endorsement, but there *are*
configurations where such a thing would be sane.

That's the problem with security recommendations.  It's much like a
lawyer giving legal advice.  They're very careful about doing that in
unstructured circumstances.  If it gets taken in the wrong way,
they could be legally liable and people might blame/sue them.

And then on top of that, there are the political considerations.
Suppose I told you, "just use RDRAND and be happy".  Some people who
are sure that RDRAND has been backdoored would claim that I'm in the
pocket of the NSA and/or Intel.  That's why all I'm going to say is,
"I'm comfortable turning RDRAND on my own systems; you can do what you
want."

Cheers,

                                                - Ted

P.S.  Although if I were going to generate a high-value key, I *would*
plug in my handy-dandy Chaos Key[1] first.  Keith gave a
presentation[2] about it at Debconf 16.

[1] https://keithp.com/blogs/chaoskey/
[2] https://debconf16.debconf.org/talks/94/

And certainly if you were doing something where you had millions of
dollars at risk, or where the EU might fine you into oblivion for
millions of Euros due to some privacy exposure of your users, I
certainly would recommend that you spend the $40 USD to get a Chaos
Key and just be *done* with it.


Re: Bug#912087: openssh-server: Slow startup after the upgrade to 7.9p1

Kurt Roeckx
On Thu, Nov 01, 2018 at 07:50:35PM -0400, Theodore Y. Ts'o wrote:

> On Thu, Nov 01, 2018 at 11:18:14PM +0100, Sebastian Andrzej Siewior wrote:
> > Okay. So you wrote what can be done for a system with HW-RNG/kvm. On
> > bare metal with nothing fancy I have:
> > [    3.544985] systemd[1]: systemd 239 running in system mode. (+PAM…
> > [   10.363377] r8169 0000:05:00.0 eth0: link up
> > [   41.966375] random: crng init done
> >
> > which means I have to wait about half a minute until I can ssh into. And
> > there is no way to speed it up?
>
> So that surprises me.  Can you tell me more about the hardware?  Is it
> something like a Raspberry Pi?  Or is it an x86 server or desktop?  In
> my experience for most x86 platforms this isn't an issue.

The original poster had:
Architecture: amd64 (x86_64)
Kernel: Linux 4.18.0-2-amd64 (SMP w/2 CPU cores)

But I'm not sure if that's a real machine or some virtual host,
I'm going to guess it's a virtual host.

Anyway, on my laptop I get:
[   12.675935] random: crng init done

If the TPM is enabled, I also have a /dev/hwrng, but rng-tools is
started later, after the init is done.

On my desktop (with a chaos key attached)
[    3.844484] random: crng init done
[    5.312406] systemd[1]: systemd 239 running in system mode.


Kurt


Re: Bug#912087: openssh-server: Slow startup after the upgrade to 7.9p1

Theodore Y. Ts'o
On Fri, Nov 02, 2018 at 01:24:25AM +0100, Kurt Roeckx wrote:
> Anyway, on my laptop I get:
> [   12.675935] random: crng init done
>
> If the TPM is enabled, I also have a /dev/hwrng, but rng-tools is
> started later, after the init is done.
>
> On my desktop (with a chaos key attached)
> [    3.844484] random: crng init done
> [    5.312406] systemd[1]: systemd 239 running in system mode.

Starting with the 3.17 kernel, the kernel will automatically pull from
hardware random number generators without needing to install a user
space daemon, such as rng-tools.  For most hardware devices, it is not
enabled by default, so you have to enable it by adding something like
"rng_core.default_quality=700" to the kernel boot line.
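
To find out which hardware RNG the kernel is currently pulling from, one
can read the hw_random sysfs attributes. The Python sketch below assumes
the usual /sys/class/misc/hw_random location with its rng_current and
rng_available files; the function name is made up for illustration.

```python
from pathlib import Path


def hwrng_status(base: str = "/sys/class/misc/hw_random") -> dict:
    """Report the active hwrng and the list of registered hwrng devices."""
    root = Path(base)

    def read(name: str) -> str:
        try:
            return (root / name).read_text().strip()
        except OSError:
            # Attribute missing, e.g. no hw_random support on this system.
            return ""

    return {
        "current": read("rng_current"),
        "available": read("rng_available").split(),
    }
```

An empty "current" entry suggests that no hardware RNG is registered, in
which case rng_core.default_quality has nothing to act on.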

There are *two* devices which are an exception to this rule.  The
first is virtio_rng, since the assumption is if you are using a VM,
you had better trust the host infrastructure or you have much worse
problems.  The second is the driver for the Chaos Key.  That appears
to be because the author of the driver for the Chaos Key wasn't aware
of the general policy that hardware rng's shouldn't be trusted by
default, and the driver was coded violating that policy.

This is why (with a chaos key attached) you see the "crng init done"
message so early, *before* the root file system is mounted.  (The root
file system gets mounted after the "systemd running in system mode"
message is logged.)

This is better than relying on rng-tools, since we can initialize the
CRNG much earlier in the boot process.  (It should have been the case
that this would only happen if you configured it by setting the
rng_core.default_quality parameter, but see above about how the Chaos
Key driver is currently violating policy.)

In the future I should change the kernel so you can explicitly specify
something like tpm.rng_quality=500 and chaos_key.rng_quality=1000 on
the boot command line.  That way the system administrator can be very
explicit about which hwrng they trust; right now what we have is not
ideal since it's not clear which hwrng the system administrator wanted
to configure as trusted, and if you have more than one hwrng in the
system (say, a closed source, proprietary tpm, and an open hardware
Chaos Key) you can't say which one you want to have trusted.

Cheers,

                                                - Ted


Re: Bug#912087: openssh-server: Slow startup after the upgrade to 7.9p1

Rasmus Villemoes
In reply to this post by Theodore Y. Ts'o
On 2018-10-30 21:51, Theodore Y. Ts'o wrote:
> On Tue, Oct 30, 2018 at 07:37:23PM +0100, Kurt Roeckx wrote:
>>
>> So are you saying that the /var/lib/random/seed is untrusted, and
>> should never be used, and we should always wait for fresh entropy?
>>
[...]
>
> In any case, if Debian wants to ship a program which reads a seed file
> and uses it to initialize the random pool, assuming that it's
> trustworthy, via the RNDADDENTROPY ioctl, that's not an insane thing to
> do.  My recommendation would be to make it configurable, however,
> just as whether RDRAND should be trusted (in isolation) to
> initialize the CRNG is configurable.

This thread finally prompted me to look into getting systemd to
optionally credit the seed file, and it seems like that might make it in
in some form:

https://github.com/systemd/systemd/pull/10621

Rasmus

Re: Bug#912087: openssh-server: Slow startup after the upgrade to 7.9p1

Sebastian Andrzej Siewior
In reply to this post by Theodore Y. Ts'o
On 2018-11-01 19:50:35 [-0400], Theodore Y. Ts'o wrote:

> On Thu, Nov 01, 2018 at 11:18:14PM +0100, Sebastian Andrzej Siewior wrote:
> > Okay. So you wrote what can be done for a system with HW-RNG/kvm. On
> > bare metal with nothing fancy I have:
> > [    3.544985] systemd[1]: systemd 239 running in system mode. (+PAM…
> > [   10.363377] r8169 0000:05:00.0 eth0: link up
> > [   41.966375] random: crng init done
> >
> > which means I have to wait about half a minute until I can ssh into. And
> > there is no way to speed it up?
>
> So that surprises me.  Can you tell me more about the hardware?  Is it
> something like a Raspberry Pi?  Or is it an x86 server or desktop?  In
> my experience for most x86 platforms this isn't an issue.

another boot on the same box:
|  dmesg |grep -i random
| [    0.000000] random: get_random_bytes called from start_kernel+0x94/0x52e with crng_init=0
| [    1.774332] random: fast init done
| [    7.318640] random: systemd: uninitialized urandom read (16 bytes read)
| [    7.318925] random: systemd: uninitialized urandom read (16 bytes read)
| [    7.338074] random: systemd: uninitialized urandom read (16 bytes read)
| [   68.791389] random: crng init done
| [   68.791397] random: 7 urandom warning(s) missed due to ratelimiting

This is a headless i7 Sandy Bridge. It has a small rootfs partition and
hardly any daemons coming up. It waits for a remote login. Running
Debian unstable (incl. kernel).

> The main reason why I've talked about VM systems is because this is
> where most of the problems that people have reported to me occur.
Yes. Thanks for that. I have another box which I use as a desktop
machine (basically a terminal). It is older than the i7 but I unlock the
encrypted root disk as part of the boot process and I assume that due to
this it initializes in less than 10 seconds. Same goes for my notebook.
But the i7 has just two cables…

> So if I make a blanket recommendation, and it causes Debian to ship
> some kind of default that causes Debian users to be insecure, I'm
> going to feel really bad.  This is why I'm very cautious about what
> I say.  If you want to do whatever you want on your own system, hey,
> consenting adults can do whatever they want.  :-)

I have a few other headless boxes but those are newer and support
RDRAND. I assume that this makes a difference because otherwise I don't
see a difference (and they don't take long to init).

> > You did not oppose RNDADDTOENTCNT/RNDADDENTROPY but you wanted to make
> > it configurable and not the default, correct?
>
> I'd want to see a full design doc, or a git repository, or set of
> changes before I give it an unqualified endorsement, but there *are*
> configurations where such a thing would be sane.
>
> That's the problem with security recommendations.  It's much like a
> lawyer giving legal advice.  They're very careful about doing that in
> unstructured circumstances.  If it gets taken in the wrong way,
> they could be legally liable and people might blame/sue them.
>
> And then on top of that, there are the political considerations.
> Suppose I told you, "just use RDRAND and be happy".  Some people who
> are sure that RDRAND has been backdoored would claim that I'm in the
> pocket of the NSA and/or Intel.  That's why all I'm going to say is,
> "I'm comfortable turning RDRAND on my own systems; you can do what you
> want."

Okay, okay. Let me sum that up:
- openssh uses openssl's random number generator which now uses
  getrandom().
- getrandom() blocks until the random pool is initialized. This can be
  checked in dmesg:
  [  TIME.STAMP] random: crng init done
  This wasn't the case earlier, when /dev/urandom was used.
- entropy sources like interrupts or HW random support (<ad> chaos
  key</ad>) will speed the initialization process up.
- emulated hardware / KVM can take long to init but it helps if a hw-rng
  device is added as part of the qemu setup.
- it is possible to manually increase the entropy count and/or tell the
  random pool to init asap but it shouldn't be done because it will
  probably lead to a weak random pool and will probably be used in the
  wrong setups.
> Cheers,
>
> - Ted
>
> P.S…
> I
> certainly would recommend that you spend the $40 USD to get a Chaos
> Key and just be *done* with it.

I do own a Nitrokey which can generate random data. That is not the
problem. I just have one devel box which requires me to wait a minute
before I can log in and I have to figure out how to deal with it.

Sebastian