What belongs in the Debian cloud kernel?

Noah Meyerhans-3
For buster, we generate a cloud kernel for amd64.  For sid/bullseye,
we'll also support a cloud kernel for arm64.  At the moment, the cloud
kernel is the only one used in the images we generate for Microsoft
Azure and Amazon EC2.  It's used in the GCE images we generate as well,
but I'm not sure anybody actually uses those.  We generate two OpenStack
images, one that uses the cloud kernel and another that uses the generic
kernel.

There are open bugs against the cloud kernel requesting that
configuration options be turned on there. [1][2][3]  These, IMO,
highlight a need for some documentation around what is in scope for the
cloud kernel, and what is not.  This will help us answer requests such
as these more consistently, and it will also help our users better
understand whether they can expect the cloud kernel to meet their needs
or not.

At the moment, the primary optimization applied to the cloud kernel
focuses on disk space consumed.  We disable compilation of drivers that
we feel are unlikely to ever appear in a cloud environment.  By doing
so, we reduce the installed size of the kernel package by roughly 70%.
There are other optimizations we may apply (see [4] for examples), but
we don't yet.

Should we simply say "yes" to any request to add functionality to the
cloud kernel?  None of the drivers will add *that* much to the size of
the image, and if people are asking for them, then they've obviously got
a use case for them.  Or is this a slippery slope that diminishes the
value of the cloud kernel?  I can see both sides of the argument, so I'd
like to hear what others have to say.

If we're not going to say "yes" to all requests, what criteria should we
use to determine whether or not to enable a feature?  I'd rather not
leave it as a judgement call.

noah

1. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=952108
2. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=955366
3. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=955232
4. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=947759

Re: What belongs in the Debian cloud kernel?

Tom Ladd
Hi!

I'd be happy to work on creating documentation for the Debian cloud kernel, especially for OpenStack.

I've worked with Debian as my primary OS for several years now, and have also been exploring OpenStack.

I'm also honing my Technical Writer skills through an online course. Writing this documentation would give me an opportunity to combine all my interests in a single project.

How/when do I start?

Thank you,

Tom Ladd

Re: What belongs in the Debian cloud kernel?

Ross Vandegrift-2
In reply to this post by Noah Meyerhans-3
On Wed, Apr 01, 2020 at 03:15:37PM -0400, Noah Meyerhans wrote:
> Should we simply say "yes" to any request to add functionality to the
> cloud kernel?  None of the drivers will add *that* much to the size of
> the image, and if people are asking for them, then they've obviously got
> a use case for them.  Or is this a slippery slope that diminishes the
> value of the cloud kernel?  I can see both sides of the argument, so I'd
> like to hear what others have to say.

I don't think just saying "yes" automatically is the best approach.  But
I'm not sure we can come up with a clear set of rules.  Evaluating the
use cases will involve judgment calls about size vs functionality.  I
guess I think that's okay.


The first two bugs are about nested virtualization.  I like the idea of
deciding to support that or not.  I don't know much about nested virt,
so I don't have a strong opinion.  It seems pretty widely supported on
our platforms.  I don't know if it raises performance or security
concerns.  So these seem okay to me, as long as we decide to support
nested virt, and there aren't major cons that I'm unaware of.


Can you share more about the KSM use case?  I'm worried about raising
security concerns for this one.  KSM has had a history of enabling
attacks that are sorta serious, but also sorta theoretical.  This might
cause upset from infosec folks that freak out about any vulnerability -
even when they don't really understand the magnitude of the risk.

I tried to understand the current state of KSM security.  But I couldn't
easily find a recent summary, and I'm not an expert on the issues.  Here
are the older links I looked at:
- https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2015-2877
- https://access.redhat.com/blogs/766093/posts/1976303
- https://staff.aist.go.jp/k.suzaki/EuroSec2011-suzaki.pdf
- https://www.usenix.org/system/files/conference/woot15/woot15-paper-barresi.pdf

These sound mostly impractical to me, but they do enable scary sounding
threats (read/write across vmm and hypervisor boundaries).  That makes
me nervous, but someone who understands the issues could convince me
that these aren't worth worrying about.

Ross

Re: What belongs in the Debian cloud kernel?

Noah Meyerhans-3
On Thu, Apr 02, 2020 at 10:55:16AM -0700, Ross Vandegrift wrote:
> I don't think just saying "yes" automatically is the best approach.  But
> I'm not sure we can come up with a clear set of rules.  Evaluating the
> use cases will involve judgment calls about size vs functionality.  I
> guess I think that's okay.

You certainly may be right.  I wasn't able to convince myself either
way, which is why I posted for additional opinions.

> The first two bugs are about nested virtualization.  I like the idea of
> deciding to support that or not.  I don't know much about nested virt,
> so I don't have a strong opinion.  It seems pretty widely supported on
> our platforms.  I don't know if it raises performance or security
> concerns.  So these seem okay to me, as long as we decide to support
> nested virt, and there aren't major cons that I'm unaware of.

IMO nested virtualization is not something I'd want to see in a
"production" environment.  Hardware-assisted isolation between VMs is
critical for hosting mixed-trust workloads (e.g. VMs owned and
controlled by unrelated parties without a mutual trust relationship).
Current hardware virtualization extensions, e.g. Intel VT-x, only have a
concept of a single level of virtualization.  Nested virtualization is
implemented by trapping and emulating the CPU extensions, and by doing a
bunch of mapping of nested guest state to allow it to effectively run as
a peer VM of the parent guest in hardware.  Some details at [1].  So not
only is it painfully complex, but it's also quite slow.

This is not to say that there aren't any legitimate use cases for nested
virtualization.  Only that I'm not sure it's something we want to be
optimizing for.
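
For readers who want to see whether nested virtualization is switched on
for the KVM modules on a given Linux host, here is a minimal sketch.  It
assumes the usual module parameter files under /sys/module (kvm_intel or
kvm_amd, "nested" parameter), whose value is reported as Y/N or 1/0
depending on the module and kernel version.  The function names are
illustrative only and are not part of any Debian tooling.

```python
from pathlib import Path
from typing import Optional

# Module parameter files exposed by the KVM modules when they are loaded.
NESTED_PARAMS = {
    "kvm_intel": Path("/sys/module/kvm_intel/parameters/nested"),
    "kvm_amd": Path("/sys/module/kvm_amd/parameters/nested"),
}


def parse_nested_flag(raw: str) -> bool:
    """Interpret the parameter value, which is reported as Y/N or 1/0."""
    return raw.strip() in ("Y", "y", "1")


def nested_virt_enabled(params=NESTED_PARAMS) -> Optional[bool]:
    """Return True/False for the first readable parameter, else None."""
    for path in params.values():
        try:
            return parse_nested_flag(path.read_text())
        except OSError:
            continue  # module not loaded, or not a Linux host
    return None


if __name__ == "__main__":
    labels = {
        True: "nested virtualization enabled",
        False: "nested virtualization disabled",
        None: "no KVM module parameter found",
    }
    print(labels[nested_virt_enabled()])
```

This only reports what the host's KVM module allows; whether a cloud
platform exposes the virtualization extensions to a guest at all is a
separate, provider-side decision.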

> Can you share more about the KSM use case?  I'm worried about raising
> security concerns for this one.  KSM has had a history of enabling
> attacks that are sorta serious, but also sorta theoretical.  This might
> cause upset from infosec folks that freak out about any vulnerability -
> even when they don't really understand the magnitude of the risk.

I don't have any direct experience with KSM.  I can certainly see how it
could help with certain classes of workload, though, if it's known that
multiple processes with mostly identical state are running.

I'm not sure I'd focus too much on the security implications of KSM,
though, since it's widely enabled in Debian's generic kernel and kernels
distributed by other distros.  I don't want to cargo-cult it, but
neither do I want to ignore prior art.  I don't think there's any reason
to drop support for applications making use of KSM in our cloud kernels,
though.  I can't think of any reason why the feature would be less
useful in a cloud environment, and it could certainly save money by
allowing the use of smaller instances.

noah

1. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/virt/kvm/nested-vmx.rst

Re: What belongs in the Debian cloud kernel?

Ross Vandegrift-2
On Thu, Apr 02, 2020 at 04:52:07PM -0400, Noah Meyerhans wrote:
> I'm not sure I'd focus too much on the security implications of KSM,
> though, since it's widely enabled in Debian's generic kernel and kernels
> distributed by other distros.  I don't want to cargo-cult it, but
> neither do I want to ignore prior art.

If it's that widely available, then I think that's a good indicator that
the issues aren't driving practical attacks.  So I think we shouldn't
refuse it due to the security questions.

Ross

Re: What belongs in the Debian cloud kernel?

Noah Meyerhans-3
In reply to this post by Noah Meyerhans-3
On Wed, Apr 01, 2020 at 03:15:37PM -0400, Noah Meyerhans wrote:
> There are open bugs against the cloud kernel requesting that
> configuration options be turned on there. [1][2][3]

<snip>

> 1. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=952108
> 2. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=955366
> 3. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=955232

The discussion thus far has focused on these specific requests more
than I had hoped.  For now, so that we can deal with the current
requests, here's what happens if we enable them:

These are the kernel .config changes:
+CONFIG_VHOST_SCSI=m
+CONFIG_KSM=y
+CONFIG_NET_9P=m
+CONFIG_NET_9P_VIRTIO=m
+# CONFIG_NET_9P_XEN is not set
+# CONFIG_NET_9P_DEBUG is not set
+CONFIG_TARGET_CORE=m
+CONFIG_TCM_IBLOCK=m
+CONFIG_TCM_FILEIO=m
+CONFIG_TCM_PSCSI=m
+CONFIG_TCM_USER2=m
+# CONFIG_LOOPBACK_TARGET is not set
+CONFIG_ISCSI_TARGET=m
+# CONFIG_XEN_SCSI_BACKEND is not set
+CONFIG_9P_FS=m
+CONFIG_9P_FSCACHE=y
+CONFIG_9P_FS_POSIX_ACL=y
+CONFIG_9P_FS_SECURITY=y
+CONFIG_XXHASH=y
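
As a rough aid for reading a change list like the one above, the
following sketch groups the added lines by how each option is set.  The
DIFF text is copied from this message; the function name and the group
labels are illustrative assumptions, not part of any kernel tooling.

```python
import re
from collections import defaultdict

# The .config additions listed in this message.
DIFF = """\
+CONFIG_VHOST_SCSI=m
+CONFIG_KSM=y
+CONFIG_NET_9P=m
+CONFIG_NET_9P_VIRTIO=m
+# CONFIG_NET_9P_XEN is not set
+# CONFIG_NET_9P_DEBUG is not set
+CONFIG_TARGET_CORE=m
+CONFIG_TCM_IBLOCK=m
+CONFIG_TCM_FILEIO=m
+CONFIG_TCM_PSCSI=m
+CONFIG_TCM_USER2=m
+# CONFIG_LOOPBACK_TARGET is not set
+CONFIG_ISCSI_TARGET=m
+# CONFIG_XEN_SCSI_BACKEND is not set
+CONFIG_9P_FS=m
+CONFIG_9P_FSCACHE=y
+CONFIG_9P_FS_POSIX_ACL=y
+CONFIG_9P_FS_SECURITY=y
+CONFIG_XXHASH=y
"""

SET_RE = re.compile(r"^\+?(CONFIG_\w+)=(.*)$")
UNSET_RE = re.compile(r"^\+?# (CONFIG_\w+) is not set$")
LABELS = {"y": "enabled (=y)", "m": "module (=m)"}


def classify(diff_text):
    """Group option names by state: =y, =m, other value, or not set."""
    groups = defaultdict(list)
    for line in diff_text.splitlines():
        line = line.strip()
        match = SET_RE.match(line)
        if match:
            name, value = match.groups()
            groups[LABELS.get(value, "other value")].append(name)
            continue
        match = UNSET_RE.match(line)
        if match:
            groups["not set"].append(match.group(1))
    return dict(groups)


if __name__ == "__main__":
    for state, names in classify(DIFF).items():
        print(f"{state}: {len(names)}")
        for name in names:
            print(f"  {name}")
```

Note that "=y" does not always mean code linked into the kernel image:
a boolean option that refines a modular driver (for example the 9P_FS_*
options here) is compiled into that module instead.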

Because CONFIG_KSM changes statically linked code, it results in a size
increase of roughly 12 kB of the compressed kernel.  The uncompressed
kernel increases by about 852 kB in size.  The boot time appears to be
unchanged.  I don't like the size increase, but this feature is enabled
everywhere else and apparently does break some users if it's disabled,
so we should enable it.

The kernel package installed size increases by roughly 2 MB due to the
additional modules we generate for 9P and VHOST_SCSI.

So, I think the answer for these specific requests can be affirmative.
The cost is small enough that if these features are useful to somebody,
then we might as well enable them.

noah

Re: What belongs in the Debian cloud kernel?

Thomas Goirand-3
In reply to this post by Ross Vandegrift-2
On 4/2/20 7:55 PM, Ross Vandegrift wrote:

> On Wed, Apr 01, 2020 at 03:15:37PM -0400, Noah Meyerhans wrote:
>> Should we simply say "yes" to any request to add functionality to the
>> cloud kernel?  None of the drivers will add *that* much to the size of
>> the image, and if people are asking for them, then they've obviously got
>> a use case for them.  Or is this a slippery slope that diminishes the
>> value of the cloud kernel?  I can see both sides of the argument, so I'd
>> like to hear what others have to say.
>
> I don't think just saying "yes" automatically is the best approach.  But
> I'm not sure we can come up with a clear set of rules.  Evaluating the
> use cases will involve judgment calls about size vs functionality.  I
> guess I think that's okay.
>
>
> The first two bugs are about nested virtualization.  I like the idea of
> deciding to support that or not.  I don't know much about nested virt,
> so I don't have a strong opinion.  It seems pretty widely supported on
> our platforms.  I don't know if it raises performance or security
> concerns.  So these seem okay to me, as long as we decide to support
> nested virt, and there aren't major cons that I'm unaware of.

There's a big problem when activating nested virt. I have read that live
migration of VMs can become impossible (i.e. for all VMs that are also
host OS for virtualization). As far as I understand, this is because of
the difficulty of supporting a nested MMU. I'm not sure if the situation
has changed or not, but last time I checked this was the case. Ben, do
you know if this has evolved?

So, when I'm asked about it, my answer, from an OpenStack operator's
point of view, is always a big "NO!". I want to be able to service my
compute nodes. This means being able to live-migrate the workload away;
otherwise, customers may notice.

Cheers,

Thomas Goirand (zigo)

Re: What belongs in the Debian cloud kernel?

Thomas Goirand-3
In reply to this post by Noah Meyerhans-3
On 4/4/20 1:34 AM, Noah Meyerhans wrote:

> On Wed, Apr 01, 2020 at 03:15:37PM -0400, Noah Meyerhans wrote:
>> There are open bugs against the cloud kernel requesting that
>> configuration options be turned on there. [1][2][3]
>
> <snip>
>
>> 1. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=952108
>> 2. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=955366
>> 3. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=955232
>
> So, the discussion thus far has focused on these specific requests more
> than I had hoped.  So for now, so we can deal with the current requests,
> here's what happens if we enable them:
>
> These are the kernel .config changes:
> +CONFIG_VHOST_SCSI=m
> +CONFIG_KSM=y
> +CONFIG_NET_9P=m
> +CONFIG_NET_9P_VIRTIO=m
> +# CONFIG_NET_9P_XEN is not set
> +# CONFIG_NET_9P_DEBUG is not set
> +CONFIG_TARGET_CORE=m
> +CONFIG_TCM_IBLOCK=m
> +CONFIG_TCM_FILEIO=m
> +CONFIG_TCM_PSCSI=m
> +CONFIG_TCM_USER2=m
> +# CONFIG_LOOPBACK_TARGET is not set
> +CONFIG_ISCSI_TARGET=m
> +# CONFIG_XEN_SCSI_BACKEND is not set
> +CONFIG_9P_FS=m
> +CONFIG_9P_FSCACHE=y
> +CONFIG_9P_FS_POSIX_ACL=y
> +CONFIG_9P_FS_SECURITY=y
> +CONFIG_XXHASH=y
>
> Because CONFIG_KSM changes statically linked code, it results in a size
> increase of roughly 12 kB of the compressed kernel.  The uncompressed
> kernel increases by about 852 kB in size.  The boot time appears to be
> unchanged.  I don't like the size increase, but this feature is enabled
> everywhere else and apparently does break some users if it's disabled,
> so we should enable it.
>
> The kernel package installed size increases by roughly 2 MB due to the
> additional modules we generate for 9P and VHOST_SCSI.
>
> So, I think the answer for these specific requests can be affirmative.
> The cost is small enough that if these features are useful to somebody,
> then we might as well enable them.
>
> noah

Thanks for taking the time to investigate.

+1 to what you wrote.

Cheers,

Thomas Goirand (zigo)

Re: What belongs in the Debian cloud kernel?

Noah Meyerhans-3
In reply to this post by Thomas Goirand-3
On Sat, Apr 04, 2020 at 10:17:20AM +0200, Thomas Goirand wrote:

> > The first two bugs are about nested virtualization.  I like the idea of
> > deciding to support that or not.  I don't know much about nested virt,
> > so I don't have a strong opinion.  It seems pretty widely supported on
> > our platforms.  I don't know if it raises performance or security
> > concerns.  So these seem okay to me, as long as we decide to support
> > nested virt, and there aren't major cons that I'm unaware of.
>
> There's a big problem when activating nested virt. I have read that live
> migration of VMs can become impossible (i.e. for all VMs that are also
> host OS for virtualization). As far as I understand, this is because of
> the difficulty of supporting a nested MMU. I'm not sure if the situation
> has changed or not, but last time I checked this was the case. Ben, do
> you know if this has evolved?

Remember, nested virtualization works today; nothing we have done would
have prevented that.  The question is about whether or not we care about
enabling features to support use cases that only arise when nested
virtualization is in use.

The reason nested virtualization breaks live migration is that it shares
state between the VM and the underlying hypervisor.  The VM is, in a
sense, no longer self-contained.  The nested VM's state is tracked by the
parent VM in a VMCS structure, as shown in the nested-vmx.rst doc I
linked previously, and the values in that struct need to be mapped to a
corresponding list in the hypervisor.  Migration would entail some
coordination between the hypervisor and the outer VM, as the shared
state would need to be kept in sync throughout the process.

The sharing of state between the VM and the hypervisor hints at some of
the potential security concerns around nested virtualization in
mixed-trust environments.

> So, when I'm being asked about it, my answer from an OpenStack operator
> point of view, is always a big "NO !". I want to be able to service my
> compute nodes. This means being able to live-migrate the workload away,
> otherwise, customers may notice.

Whether or not you support nested virt on your infrastructure is a
deployment choice, not a choice Debian needs to make.

noah

Re: What belongs in the Debian cloud kernel?

Thomas Goirand-3
On 4/4/20 5:42 PM, Noah Meyerhans wrote:
>> So, when I'm being asked about it, my answer from an OpenStack operator
>> point of view, is always a big "NO !". I want to be able to service my
>> compute nodes. This means being able to live-migrate the workload away,
>> otherwise, customers may notice.
>
> Whether or not you support nested virt on your infrastructure is a
> deployment choice, not a choice Debian needs to make.

Sure! There are plenty of use cases (for example, when you trust
your users and don't care about live migration) where it makes sense.

Thomas

Re: What belongs in the Debian cloud kernel?

Laurence Parry-2
In reply to this post by Noah Meyerhans-3
> For buster, we generate a cloud kernel for amd64. For sid/bullseye,
> we'll also support a cloud kernel for arm64. At the moment, the cloud
> kernel is the only one used in the images we generate for Microsoft Azure
> and Amazon EC2. It's used in the GCE images we generate as well, but
> I'm not sure anybody actually uses those.

I use those, though I'm unsure if it's a level of usage that you'd consider significant (I also use x32 to a similar extent :-).

I run a Munin master for 17 nodes on an f1-micro with buster-backports cloud-amd64 - proxied via App Engine to get 1GB/day out for viewing graphs, rather than 1GB/month/region. It works well enough that I didn't immediately feel the need to pare it down to a bare minimum and roll my own like I do with the regular kernels. (I may try to e.g. avoid SMP overhead or cut fs features to increase inode/dentry slab density; but I'm not sure I can compile it locally on 20% of a Skylake core with 1GB RAM, especially when it's already using over half its available resources to generate/store graphs.)

My initrd (dep, xz) seems to have gone up from 4.66 MB on disk in 5.4.0-0.bpo.3-cloud-amd64 to 5.12 MB for bpo.4, but the kernel memory line suggests RAM was minimally impacted by 4KB.

As for the more general question: to me, a 'cloud' machine is just a VM offered via a cloud provider, supporting the management interfaces of such providers out of the box. It may be a relatively 'large' machine, hosting guests of its own. I wouldn't expect to see drivers for old directly-attached hardware, but legacy or research filesystems and protocols, and drivers used for access to enterprise external storage, seem reasonable, as cloud may be used to migrate older workloads or handle research projects. "Bare metal cloud" may be outside its remit - though these will tend to be newer hardware.

Best regards,
--
Laurence "GreenReaper" Parry - Inkbunny administrator
https://www.greenreaper.co.uk/

Re: What belongs in the Debian cloud kernel?

Ben Hutchings-3
In reply to this post by Ross Vandegrift-2
On Fri, 2020-04-03 at 09:28 -0700, Ross Vandegrift wrote:
> On Thu, Apr 02, 2020 at 04:52:07PM -0400, Noah Meyerhans wrote:
> > I'm not sure I'd focus too much on the security implications of KSM,
> > though, since it's widely enabled in Debian's generic kernel and kernels
> > distributed by other distros.  I don't want to cargo-cult it, but
> > neither do I want to ignore prior art.
>
> If it's that widely available, then I think that's a good indicator that
> the issues aren't driving practical attacks.  So I think we shouldn't
> refuse it due to the security questions.

Enabling CONFIG_KSM only means that the feature is available.  It's not
active by default, so it should have no security impact unless the
administrator chooses to enable it (through /sys/kernel/mm/ksm/run).
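
To make the distinction between "available" and "active" concrete, here
is a minimal sketch that reads the run file mentioned above.  It assumes
the documented meanings of the values (0 = ksmd stopped, 1 = ksmd
running, 2 = ksmd stopped and merged pages unmerged); the function name
and the wording of the returned strings are illustrative only.

```python
from pathlib import Path

# Directory exposed by kernels built with CONFIG_KSM.
KSM_DIR = Path("/sys/kernel/mm/ksm")

# Meaning of the values that can appear in the "run" file.
RUN_STATES = {
    "0": "stopped",
    "1": "running",
    "2": "stopped, merged pages unmerged",
}


def ksm_status(ksm_dir=KSM_DIR):
    """Describe the KSM state found under ksm_dir."""
    run_file = Path(ksm_dir) / "run"
    try:
        raw = run_file.read_text().strip()
    except OSError:
        # No readable run file: kernel built without CONFIG_KSM,
        # or this is not a Linux host.
        return "unavailable"
    return RUN_STATES.get(raw, f"unknown state {raw!r}")


if __name__ == "__main__":
    print(f"KSM: {ksm_status()}")
```

On a kernel with CONFIG_KSM enabled and no administrator action, this
would be expected to report "stopped", which matches the point above.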

Ben.

--
Ben Hutchings
73.46% of all statistics are made up.


Re: What belongs in the Debian cloud kernel?

Ben Hutchings-3
In reply to this post by Laurence Parry-2
On Sun, 2020-04-05 at 00:26 +0100, Laurence Parry wrote:
[...]
> My initrd (dep, xz) seems to have gone up from 4.66 MB on disk in
> 5.4.0-0.bpo.3-cloud-amd64 to 5.12 MB for bpo.4, but the kernel memory line
> suggests RAM was minimally impacted by 4KB.
[...]

The size of the initramfs has no impact on memory usage after booting -
everything in it is deleted before switching to the real init system.

Ben.

--
Ben Hutchings
73.46% of all statistics are made up.


