Bug#404143: Fans unreliable under load, permanent memory leak
On Fri, 22 Dec 2006, Marc 'HE' Brockschmidt wrote:
> [hidden email] writes:
> > I'm more than willing to help test a kernel package, but I'll be on
> > [VAC] from 2006-12-23 to 2007-01-03 inclusive. So, please do not
> > release Etch just now :)
> I have ordered an nx6325, which should arrive directly after
> Christmas. I would also be happy to test a fixed kernel. Due to this
> being an overheating problem, I would prefer if you could provide kernel
> images, so that I don't have to compile it.
could you please send in the output of:
* This bug will not cause hardware damage. The hard thermal cutoff
temperature is well below the temperature at which actual damage will
* It's not clear that the vendor DSDT is broken. It's an unusual
interpretation of the spec, but not necessarily an invalid one - sadly,
the ACPI specification is not entirely clear on every point.
The patch is /probably/ safe, and we've been shipping it in Ubuntu. On
the other hand, previous versions did cause problems on certain other
items of hardware. It's not clear what the best option is, but it's
certainly not a regression over Sarge.
This bug is there merely to remind the kernel team not to release etch
without the patches :) However I'm not sure which upstream version of
linux, if any, contains the patches in the (long) trail of comments.
So, it might be necessary to wait for a few days until the patches
arrive in Linus' tree.
- under load, the fans fail to turn on when the temperature reaches
and then exceeds the normal threshold, which is 58°C.
- there is a permanent memory leak in the kernel, even when the system
is idle. The leak is visible by looking at
$ grep Slab: /proc/meminfo and
$ grep Acpi-State /proc/slabinfo
- if overheating, shut down the computer and let it cool down; or
let it shut itself down to prevent a fire hazard.
- if the only problem is the memory leak, reboot.
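The two grep commands listed above can be wrapped in a small helper so that the values are logged with a timestamp. This is only a sketch: the function name log_slab and its optional file arguments are made up for illustration, and it assumes a POSIX shell with date, grep and sed available.

```sh
#!/bin/sh
# Hypothetical helper (not part of any package): print the Slab: line
# from meminfo and the Acpi-State line from slabinfo, each prefixed
# with an ISO-style timestamp.
# Both arguments are optional and default to the real /proc files.
log_slab() {
    meminfo=${1:-/proc/meminfo}
    slabinfo=${2:-/proc/slabinfo}
    # In date(1) formats, %m is the month and %M is the minute.
    stamp=$(date +%Y-%m-%dT%H:%M:%S)
    grep '^Slab:' "$meminfo" | sed "s/^/$stamp: /"
    # /proc/slabinfo may be readable by root only; stay quiet if not.
    grep '^Acpi-State' "$slabinfo" 2>/dev/null | sed "s/^/$stamp: /"
}
```

Calling log_slab from a script run hourly by cron (as root, so that /proc/slabinfo is readable) and appending the output to a file should give a log that shows whether the slab usage keeps growing.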
Consequence: linux-image-2.6.18-3-amd64 (=2.6.18-7) is unsuitable for
Today I had to reboot my HP Compaq nx6325 because the kernel was
eating 1.8 GB out of the 1.9 GB of RAM in the system, after about 9
days of uptime. Then I started an hourly cron job to monitor
/proc/meminfo and /proc/slabinfo as described above:
2006-06-21T20:06:10: Slab: 30296 kB
2006-17-21T20:17:01: Slab: 37756 kB
2006-17-21T21:17:01: Slab: 48116 kB
2006-17-21T22:17:01: Slab: 55764 kB
2006-17-21T23:17:01: Slab: 69904 kB
-- Reboot with acpi=noirq: only one CPU found --
2006-24-21T23:24:10: Slab: 10444 kB
-- Reboot with pci=noacpi: only one CPU found --
2006-30-21T23:30:26: Slab: 9676 kB
2006-30-21T23:30:26: Acpi-State 0 0 80 48 1 : tunables 120 60 8 : slabdata 0 0 0
-- Reboot with no options: OK, both CPUs found --
2006-34-21T23:34:23: Slab: 10584 kB
2006-34-21T23:34:23: Acpi-State 0 0 80 48 1 : tunables 120 60 8 : slabdata 0 0 0
2006-17-22T00:17:01: Slab: 15424 kB
2006-17-22T00:17:01: Acpi-State 23088 23088 80 48 1 : tunables 120 60 8 : slabdata 481 481 0
2006-17-22T01:17:01: Slab: 29956 kB
2006-17-22T01:17:01: Acpi-State 59136 59136 80 48 1 : tunables 120 60 8 : slabdata 1232 1232 0
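To turn an Acpi-State line into a memory figure, one can multiply the number of slabs by the pages per slab and the page size. The sketch below assumes the slabinfo column layout seen in the output above (field 6 is pages per slab, field 15 is the total number of slabs) and a 4 kB page size; the function name acpi_state_kb is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: read slabinfo-format lines on stdin and print
# the memory held by the Acpi-State cache, in kB.
# Optional argument: page size in kB (assumed to be 4 if omitted).
acpi_state_kb() {
    page_kb=${1:-4}
    # $6 = pages per slab, $15 = total number of slabs
    awk -v page_kb="$page_kb" \
        '$1 == "Acpi-State" { print $15 * $6 * page_kb }'
}
```

For example, `acpi_state_kb < /proc/slabinfo` (as root) prints the current figure. Fed the last Acpi-State line above without its timestamp prefix (1232 slabs, 1 page each), it gives 4928 kB.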
I'm more than willing to help test a kernel package, but I'll be on
[VAC] from 2006-12-23 to 2007-01-03 inclusive. So, please do not
release Etch just now :)
I've committed Steve's fix for this bug, which includes a description of the
issue. Furthermore, I've updated his patch to reflect some additional models
(mentioned in the kernel's bugzilla but not in Debian's) and to point also to
the other ACPI issue (Kernel's #7122 and Debian's #400488).