Handling irqbalance in virtual environments

Bastian Blank
Moin

It turns out we have problems with irqbalance again.

It was added as a Recommends of the main image in 3.16, because it was
reported that older kernels move all interrupts to CPU 0 without help.[1]

In the meantime the kernel can do balancing on its own.  In 4.9, I've
seen it working with aacraid: each queue gets hard-pinned to its own
CPU, from 0 to $NRCPUS.  In 4.19 I've seen the same working properly with
virtio-net.

With 4.19, even on real hardware, where interrupts have an affinity for
all CPUs, each interrupt is actually delivered to a different CPU.

A random example of this; it even selects only one thread of each core:

|  26:    0    0    0    0   92    0    0    0  IR-PCI-MSI 3670017-edge      eno1-TxRx-0
|  27:    0    0    0    0    0  167    0    0  IR-PCI-MSI 3670018-edge      eno1-TxRx-1
|  28:    0    0    0    0    0    0  467    0  IR-PCI-MSI 3670019-edge      eno1-TxRx-2
|  29:    0    0    0    0    0    0    0  454  IR-PCI-MSI 3670020-edge      eno1-TxRx-3
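
For illustration, a minimal Python sketch (not something used in this
thread) that reads lines in /proc/interrupts format and reports, per IRQ,
the CPU column with the highest count.  The function name and the SAMPLE
constant are hypothetical; the sample data is the excerpt above.

```python
def busiest_cpu_per_irq(text):
    """Map each IRQ label to (cpu_index, count) for its busiest CPU column."""
    result = {}
    for line in text.splitlines():
        # Tolerate mail quoting prefixes such as "|" or ">" before the IRQ label.
        fields = line.lstrip("|> ").split()
        if not fields or not fields[0].endswith(":"):
            continue  # skips the "CPU0 CPU1 ..." header and blank lines
        irq = fields[0].rstrip(":")
        counts = []
        for field in fields[1:]:
            if not field.isdigit():
                break  # first non-numeric field starts the chip/device description
            counts.append(int(field))
        if not counts:
            continue
        # On ties (for example all zeros) the lowest CPU index wins.
        cpu = max(range(len(counts)), key=counts.__getitem__)
        result[irq] = (cpu, counts[cpu])
    return result


SAMPLE = """\
 26:    0    0    0    0   92    0    0    0  IR-PCI-MSI 3670017-edge      eno1-TxRx-0
 27:    0    0    0    0    0  167    0    0  IR-PCI-MSI 3670018-edge      eno1-TxRx-1
 28:    0    0    0    0    0    0  467    0  IR-PCI-MSI 3670019-edge      eno1-TxRx-2
 29:    0    0    0    0    0    0    0  454  IR-PCI-MSI 3670020-edge      eno1-TxRx-3
"""

for irq, (cpu, count) in busiest_cpu_per_irq(SAMPLE).items():
    print(f"IRQ {irq}: CPU {cpu} ({count} interrupts)")
```

For the excerpt this reports CPUs 4, 5, 6 and 7 for IRQs 26 to 29, i.e.
one distinct CPU per queue.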

Now irqbalance comes along and redoes the existing pinning, and the result
is no longer correct but $RANDOM for the hard queue-to-CPU case of virtio.
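
One way to observe this would be to compare an IRQ's allowed and effective
affinity before and after irqbalance runs.  The following is only a sketch
under assumptions: it expects /proc/irq/<n>/smp_affinity_list and, where
the kernel provides it, /proc/irq/<n>/effective_affinity_list; the helper
names parse_cpulist and irq_affinity are hypothetical.

```python
from pathlib import Path


def parse_cpulist(text):
    """Expand a kernel CPU list such as "0-3,8" into a sorted list of CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty list, e.g. an empty file
        first, _, last = part.partition("-")
        # A single CPU ("5") has no range end, so reuse the start.
        cpus.update(range(int(first), int(last or first) + 1))
    return sorted(cpus)


def irq_affinity(irq, root="/proc/irq"):
    """Return (allowed, effective) CPU lists for one IRQ.

    effective is None when the kernel does not expose the file
    (assumption: older kernels lack effective_affinity_list).
    """
    base = Path(root) / str(irq)
    allowed = parse_cpulist((base / "smp_affinity_list").read_text())
    eff_file = base / "effective_affinity_list"
    effective = parse_cpulist(eff_file.read_text()) if eff_file.exists() else None
    return allowed, effective
```

Calling irq_affinity(26) on the machine from the example would be expected
to show an allowed list covering all CPUs; recording the result for each
queue IRQ with and without irqbalance running shows whether the
queue-to-CPU mapping was changed.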

At least Google considers the work irqbalance does to "correct" the existing
balancing a large problem.

I'm not sure how to go forward.  I have a workaround pending for our
cloud images to hard exclude the installation of irqbalance.[2]

Regards,
Bastian

[1]: https://bugs.debian.org/577788
[2]: https://salsa.debian.org/cloud-team/debian-cloud-images/merge_requests/81
--
Youth doesn't excuse everything.
                -- Dr. Janice Lester (in Kirk's body), "Turnabout Intruder",
                   stardate 5928.5.

Re: Handling irqbalance in virtual environments

Bastian Blank
On Fri, Apr 12, 2019 at 10:53:47AM +0200, Bastian Blank wrote:
> With 4.19, even on real hardware, where interrupts have an affinity for
> all CPUs, each interrupt is actually delivered to a different CPU.

It seems a lot of this comes from
https://lore.kernel.org/patchwork/cover/801590/

Regards,
Bastian

--
Violence in reality is quite different from theory.
                -- Spock, "The Cloud Minders", stardate 5818.4

Re: Handling irqbalance in virtual environments

Ben Hutchings
On Fri, 2019-04-12 at 10:53 +0200, Bastian Blank wrote:

> Moin
>
> It turns out we have problems with irqbalance again.
>
> It was added as a Recommends of the main image in 3.16, because it was
> reported that older kernels move all interrupts to CPU 0 without help.[1]
>
> In the meantime the kernel can do balancing on its own.  In 4.9, I've
> seen it working with aacraid: each queue gets hard-pinned to its own
> CPU, from 0 to $NRCPUS.  In 4.19 I've seen the same working properly with
> virtio-net.
>
> With 4.19, even on real hardware, where interrupts have an affinity for
> all CPUs, each interrupt is actually delivered to a different CPU.
>
> A random example of this; it even selects only one thread of each core:
>
> >  26:    0    0    0    0   92    0    0    0  IR-PCI-MSI 3670017-edge      eno1-TxRx-0
> >  27:    0    0    0    0    0  167    0    0  IR-PCI-MSI 3670018-edge      eno1-TxRx-1
> >  28:    0    0    0    0    0    0  467    0  IR-PCI-MSI 3670019-edge      eno1-TxRx-2
> >  29:    0    0    0    0    0    0    0  454  IR-PCI-MSI 3670020-edge      eno1-TxRx-3
>
> Now irqbalance comes along and redoes the existing pinning, and the result
> is no longer correct but $RANDOM for the hard queue-to-CPU case of virtio.

Then let's drop the recommendation.

Ben.

> At least Google considers the work irqbalance does to "correct" the existing
> balancing a large problem.
>
> I'm not sure how to go forward.  I have a workaround pending for our
> cloud images to hard exclude the installation of irqbalance.[2]
>
> Regards,
> Bastian
>
> [1]: https://bugs.debian.org/577788
> [2]: https://salsa.debian.org/cloud-team/debian-cloud-images/merge_requests/81
--
Ben Hutchings
Hoare's Law of Large Problems:
   Inside every large problem is a small problem struggling to get out.


