Bug#404143: Fans unreliable under load, permanent memory leak

Ludovic Brenta-2
Package: linux-image-2.6.18-3-amd64
Version: 2.6.18-7
Severity: grave
Justification: hardware overheating hazard; requires periodic reboots

(This is not the same bug as #400488 (upstream #7122))

This bug affects several amd64 notebooks from HP, notably the nx6125
and the nx6325; there may be other affected machines as well.

Kernel team, please apply the patches for
http://bugzilla.kernel.org/show_bug.cgi?id=5534

This bug report is here merely to remind the kernel team not to release etch
without the patches :) However I'm not sure which upstream version of
linux, if any, contains the patches in the (long) trail of comments.
So, it might be necessary to wait for a few days until the patches
arrive in Linus' tree.

Symptoms:
- under load, the fans fail to turn on when the temperature reaches
  and then exceeds the normal threshold, which is 58°C.
- there is a permanent memory leak in the kernel, even when the system
  is idle.  The leak is visible by looking at
  $ grep Slab: /proc/meminfo         and
  $ grep Acpi-State /proc/slabinfo

Workaround:
- if overheating, shut down the computer and let it cool down; or
  let it shut itself down to prevent a fire hazard.
- if the only problem is the memory leak, reboot.

Consequence: linux-image-2.6.18-3-amd64 (=2.6.18-7) is unsuitable for
release.

The memory leak is described at:

http://www.mail-archive.com/linux-acpi@.../msg03119.html

Today I had to reboot my HP Compaq nx6325 because the kernel was
eating 1.8 GB out of the 1.9 GB of RAM in the system, after about 9
days of uptime.  Then I started an hourly cron job to monitor
/proc/meminfo and /proc/slabinfo as described above:

2006-06-21T20:06:10: Slab:            30296 kB
2006-17-21T20:17:01: Slab:            37756 kB
2006-17-21T21:17:01: Slab:            48116 kB
2006-17-21T22:17:01: Slab:            55764 kB
2006-17-21T23:17:01: Slab:            69904 kB
-- Reboot with acpi=noirq: only one CPU found --
2006-24-21T23:24:10: Slab:            10444 kB
-- Reboot with pci=noacpi: only one CPU found --
2006-30-21T23:30:26: Slab:             9676 kB
2006-30-21T23:30:26: Acpi-State             0      0     80   48    1 : tunables  120   60    8 : slabdata      0      0      0
-- Reboot with no options: OK, both CPUs found --
2006-34-21T23:34:23: Slab:            10584 kB
2006-34-21T23:34:23: Acpi-State             0      0     80   48    1 : tunables  120   60    8 : slabdata      0      0      0
2006-17-22T00:17:01: Slab:            15424 kB
2006-17-22T00:17:01: Acpi-State         23088  23088     80   48    1 : tunables  120   60    8 : slabdata    481    481      0
2006-17-22T01:17:01: Slab:            29956 kB
2006-17-22T01:17:01: Acpi-State         59136  59136     80   48    1 : tunables  120   60    8 : slabdata   1232   1232      0
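
The hourly sampling described above can be sketched in Python as follows.
This is a minimal sketch, not the cron job that produced the log: the
function names slab_kb, slab_cache_line and sample are made up for
illustration. The month field in the log above always equals the minutes
field, so it was probably written with a minutes specifier where the month
was intended; the sketch uses ISO 8601 timestamps instead.

```python
import datetime
import re
import sys


def slab_kb(meminfo_text):
    """Return the 'Slab:' value (in kB) from /proc/meminfo text, or None."""
    match = re.search(r"^Slab:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def slab_cache_line(slabinfo_text, cache="Acpi-State"):
    """Return the /proc/slabinfo line for one named cache, or None if absent."""
    for line in slabinfo_text.splitlines():
        fields = line.split()
        if fields and fields[0] == cache:
            return line
    return None


def sample(meminfo_text, slabinfo_text, now=None):
    """Format one timestamped sample in the style of the log above."""
    stamp = (now or datetime.datetime.now()).isoformat(timespec="seconds")
    lines = [f"{stamp}: Slab: {slab_kb(meminfo_text)} kB"]
    cache_line = slab_cache_line(slabinfo_text)
    if cache_line is not None:
        lines.append(f"{stamp}: {cache_line}")
    return "\n".join(lines)


if __name__ == "__main__":
    try:
        # Reading /proc/slabinfo may require root on some systems.
        with open("/proc/meminfo") as f:
            meminfo = f.read()
        with open("/proc/slabinfo") as f:
            slabinfo = f.read()
        print(sample(meminfo, slabinfo))
    except OSError as exc:
        print(f"cannot read /proc: {exc}", file=sys.stderr)
```

Run once an hour from cron with the output appended to a file, this yields
one Slab line and one Acpi-State line per sample.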

I'm more than willing to help test a kernel package, but I'll be on
[VAC] from 2006-12-23 to 2007-01-03 inclusive.  So, please do not
release Etch just now :)

--
Ludovic Brenta.



--
To UNSUBSCRIBE, email to [hidden email]
with a subject of "unsubscribe". Trouble? Contact [hidden email]

Maximilian Attems-3
severity 404143 important
tags 404143 upstream
stop

On Fri, Dec 22, 2006 at 01:51:36AM +0100, [hidden email] wrote:
> Package: linux-image-2.6.18-3-amd64
> Version: 2.6.18-7
> Severity: grave
> Justification: hardware overheating hazard; requires periodic reboots
>
> (This is not the same bug as #400488 (upstream #7122))
>
> This bug affects several amd64 notebooks from HP, notably the nx6125
> and the nx6325; there may be other affected machines as well.

yes this is a known problem of 2.6.18.
the real cause is that HP is shipping broken BIOS in those models.
 
> Kernel team, please apply the patches for
> http://bugzilla.kernel.org/show_bug.cgi?id=5534
>
> This bug is there merely to remind the kernel team not to release etch
> without the patches :) However I'm not sure which upstream version of
> linux, if any, contains the patches in the (long) trail of comments.
> So, it might be necessary to wait for a few days until the patches
> arrive in Linus' tree.

big nack,
acpi has a huge potential for destabilisation.
at this stage of the game, adding acpi patches is prone to cause
regressions in unexpected corners.

etch will get a newer kernel in a point release;
those laptops will have to get one from backports soon after release.
 

> Symptoms:
> - under load, the fans fail to turn on when the temperature reaches
>   and then exceeds the normal threshold, which is 58°C.
> - there is a permanent memory leak in the kernel, even when the system
>   is idle.  The leak is visible by looking at
>   $ grep Slab: /proc/meminfo         and
>   $ grep Acpi-State /proc/slabinfo
>
> Workaround:
> - if overheating, shut down the computer and let it cool down; or
>   let it shut itself down to prevent a fire hazard.
> - if the only problem is the memory leak, reboot.
>
> Consequence: linux-image-2.6.18-3-amd64 (=2.6.18-7) is unsuitable for
> release.
>
> The memory leak is described at:
>
> http://www.mail-archive.com/linux-acpi@.../msg03119.html
>
> Today I had to reboot my HP Compaq nx6325 because the kernel was
> eating 1.8 Gb out of the 1.9 Gb of RAM in the system, after about 9
> days of uptime.  Then I started a hourly cron job to monitor
> /proc/meminfo and /proc/slabinfo as described above:
>
> 2006-06-21T20:06:10: Slab:            30296 kB
> 2006-17-21T20:17:01: Slab:            37756 kB
> 2006-17-21T21:17:01: Slab:            48116 kB
> 2006-17-21T22:17:01: Slab:            55764 kB
> 2006-17-21T23:17:01: Slab:            69904 kB
> -- Reboot with acpi=noirq: only one CPU found --
> 2006-24-21T23:24:10: Slab:            10444 kB
> -- Reboot with pci=noacpi: only one CPU found --
> 2006-30-21T23:30:26: Slab:             9676 kB
> 2006-30-21T23:30:26: Acpi-State             0      0     80   48    1 : tunables  120   60    8 : slabdata      0      0      0
> -- Reboot with no options: OK, both CPUs found --
> 2006-34-21T23:34:23: Slab:            10584 kB
> 2006-34-21T23:34:23: Acpi-State             0      0     80   48    1 : tunables  120   60    8 : slabdata      0      0      0
> 2006-17-22T00:17:01: Slab:            15424 kB
> 2006-17-22T00:17:01: Acpi-State         23088  23088     80   48    1 : tunables  120   60    8 : slabdata    481    481      0
> 2006-17-22T01:17:01: Slab:            29956 kB
> 2006-17-22T01:17:01: Acpi-State         59136  59136     80   48    1 : tunables  120   60    8 : slabdata   1232   1232      0
>
> I'm more than willing to help test a kernel package, but I'll be on
> [VAC] from 2006-12-23 to 2007-01-03 inclusive.  So, please do not
> release Etch just now :)
>
> --
> Ludovic Brenta.

anyway this bug report is helpful as documentation.
happy vacation

--
maks

Marc 'HE' Brockschmidt-3
In reply to this post by Ludovic Brenta-2
[hidden email] writes:
> I'm more than willing to help test a kernel package, but I'll be on
> [VAC] from 2006-12-23 to 2007-01-03 inclusive.  So, please do not
> release Etch just now :)

I have ordered an nx6325, which should arrive directly after
Christmas. I would also be happy to test a fixed kernel. Due to this
being an overheating problem, I would prefer if you could provide kernel
images, so that I don't have to compile it.

Marc
--
BOFH #34:
(l)user error

Marc 'HE' Brockschmidt-3
In reply to this post by Maximilian Attems-3
severity 404143 serious
thanks

maximilian attems <[hidden email]> writes:
> severity 404143 important
> tags 404143 upstream
> stop
[...]
> big nack,
> acpi has a huge potential destabilisation.
> at this time of the game adding acpi patches is pron to regression
> at unexpected corners.
>
> etch will get in a point release a newer kernel,
> those laptops will have to get one on backports soon after release.

Sorry, I don't accept this. We are talking about an *overheating*
problem, which means *broken* hardware. There needs to be at least a fix
documented in the release-notes.

Marc
--
BOFH #357:
I'd love to help you -- it's just that the Boss won't let me near
the computer.

Andreas Barth
In reply to this post by Ludovic Brenta-2
severity 404143 critical
thanks

* Bastian Blank ([hidden email]) [061222 01:27]:
> On Fri, Dec 22, 2006 at 01:51:36AM +0100, [hidden email] wrote:
> > Consequence: linux-image-2.6.18-3-amd64 (=2.6.18-7) is unsuitable for
> > release.
>
> Failing for you don't makes it unsuitable.

That is a true statement by itself. This bug, however, has the potential
to damage hardware, which makes it a critical bug.


Cheers,
Andi
--
  http://home.arcor.de/andreas-barth/


Bastian Blank
In reply to this post by Marc 'HE' Brockschmidt-3
On Fri, Dec 22, 2006 at 10:30:57AM +0100, Marc 'HE' Brockschmidt wrote:
> Sorry, I don't accept this. We are talking about an *overheating*
> problem, which means *broken* hardware. There needs to be at least a fix
> documented in the release-notes.

Garbage-in, garbage-out. The BIOS of those machines is broken. Do you
really expect an interpreter (in this case the ACPI interpreter)
to accept any garbage?

Bastian

--
Deflector shields just came on, Captain.


Marc 'HE' Brockschmidt-3
Bastian Blank <[hidden email]> writes:
> On Fri, Dec 22, 2006 at 10:30:57AM +0100, Marc 'HE' Brockschmidt wrote:
>> Sorry, I don't accept this. We are talking about an *overheating*
>> problem, which means *broken* hardware. There needs to be at least a fix
>> documented in the release-notes.
> Garbage-in, garbage-out. The BIOS of that machines is broken. Do you
> really expect that an interpreter (in this case the ACPI interpreter)
> accepts any garbage?

Other OSes don't destroy the hardware. There is a patch that stops Linux
from doing so - I don't see why Debian should release with a kernel that
destroys hardware, without even giving users a warning. Not everyone who
buys a notebook is aware of ACPI problems, and we shouldn't expect all
users to be.

Fix it or document it, I don't care. But the current state is not
releasable.

Marc
--
BOFH #241:
_Rosin_ core solder? But...

Sven Luther
In reply to this post by Andreas Barth
On Fri, Dec 22, 2006 at 10:54:50AM +0100, Andreas Barth wrote:

> severity 404143 critical
> thanks
>
> * Bastian Blank ([hidden email]) [061222 01:27]:
> > On Fri, Dec 22, 2006 at 01:51:36AM +0100, [hidden email] wrote:
> > > Consequence: linux-image-2.6.18-3-amd64 (=2.6.18-7) is unsuitable for
> > > release.
> >
> > Failing for you don't makes it unsuitable.
>
> That is a true statement by itself. This bug however has the potential
> to damage hardware. Which is a critical bug.

Uh, it seems to me more that the hardware has a bug which causes normal
operation to damage it.

As such, I think that any damage done would be the responsibility of the
manufacturer to repair or fix. This seems to be the position of both Bastian
and Maximilian, and it seems reasonable.

So, users of such hardware, please bother your vendor to either exchange it
for a non-broken one, or at least provide a BIOS upgrade which fixes the
brokenness.

Friendly,

Sven Luther


Maximilian Attems-3
In reply to this post by Marc 'HE' Brockschmidt-3
On Fri, Dec 22, 2006 at 11:28:29AM +0100, Marc 'HE' Brockschmidt wrote:

> Bastian Blank <[hidden email]> writes:
> > On Fri, Dec 22, 2006 at 10:30:57AM +0100, Marc 'HE' Brockschmidt wrote:
> >> Sorry, I don't accept this. We are talking about an *overheating*
> >> problem, which means *broken* hardware. There needs to be at least a fix
> >> documented in the release-notes.
> > Garbage-in, garbage-out. The BIOS of that machines is broken. Do you
> > really expect that an interpreter (in this case the ACPI interpreter)
> > accepts any garbage?
>
> Other OSes don't destroy the hardware. There is a patch for Linux not to
> - I don't see why Debian should release with a kernel that destroys
> hardware, without even giving users a warning. Not everyone who buys a
> notebook is aware of ACPI problems, and we shouldn't expect all users to
> do so.
>
> Fix it or document it, I don't care. But the current state is not
> releasable.

we are not talking about "a" patch.
what you need is a backport of the 2.6.19 acpi release to 2.6.18.

acpi linux releases are tested as one release and you open a can of worms
once you start picking acpi patches. only mjg59 is insane enough to do
that. anyway the fix for those broken aml tables has a big dependency
so the backport is insane.

i looked at it 2 months ago and dropped the case; we are shortly before
release. i restate: that broken hardware needs a newer kernel, full stop.

--
maks


Maximilian Attems-3
In reply to this post by Andreas Barth
severity 404143 important
thanks

On Fri, Dec 22, 2006 at 10:54:50AM +0100, Andreas Barth wrote:
> severity 404143 critical
> thanks
>
>
> This bug however has the potential to damage hardware. Which is a
> critical bug.

yes, but only a very specific range of hardware is affected.
upstream did not issue a fix for the stable series 2.6.18.X,
because it's not possible.

--
maks


Marc 'HE' Brockschmidt-3
In reply to this post by Maximilian Attems-3
severity 404143 critical
thanks

maximilian attems <[hidden email]> writes:
> On Fri, Dec 22, 2006 at 11:28:29AM +0100, Marc 'HE' Brockschmidt wrote:
>> Fix it or document it, I don't care. But the current state is not
>> releasable.
> we are not talking about "a" patch.
> what you need is an backport of the 2.6.19 acpi release to 2.6.18.

Read again what I wrote. I will not allow Debian to release with a
Kernel that may damage hardware without even a notice in the release
notes. If you are not able to fix it, note that you have provided a
broken kernel.

Marc
--
Computer science terms - simply explained
205: BSD
       Berkeley Spongiform Derivative (Felix Deutsch)

Sven Luther
In reply to this post by Maximilian Attems-3
On Fri, Dec 22, 2006 at 12:09:45PM +0100, maximilian attems wrote:

> On Fri, Dec 22, 2006 at 11:28:29AM +0100, Marc 'HE' Brockschmidt wrote:
> > Bastian Blank <[hidden email]> writes:
> > > On Fri, Dec 22, 2006 at 10:30:57AM +0100, Marc 'HE' Brockschmidt wrote:
> > >> Sorry, I don't accept this. We are talking about an *overheating*
> > >> problem, which means *broken* hardware. There needs to be at least a fix
> > >> documented in the release-notes.
> > > Garbage-in, garbage-out. The BIOS of that machines is broken. Do you
> > > really expect that an interpreter (in this case the ACPI interpreter)
> > > accepts any garbage?
> >
> > Other OSes don't destroy the hardware. There is a patch for Linux not to
> > - I don't see why Debian should release with a kernel that destroys
> > hardware, without even giving users a warning. Not everyone who buys a
> > notebook is aware of ACPI problems, and we shouldn't expect all users to
> > do so.
> >
> > Fix it or document it, I don't care. But the current state is not
> > releasable.
>
> we are not talking about "a" patch.
> what you need is an backport of the 2.6.19 acpi release to 2.6.18.
>
> acpi linux releases are tested as one release and you open a can of worm
> once you start picking acpi patches. only mjg59 is insane enough to do
> that. anyway the fix for those broken aml tables has a big dependency
> so the backport is insane.
>
> i looked at it 2 month ago and dropped the case, we are shortly before
> release. i restate those broken hardware needs a newer kernel fullstop.

Well, this would mean that we could provide a semi-official set of newer
kernels for etch. We would, once etch is released, provide a backported kernel
based on the new unstable kernel, as well as an etch-installing d-i for it.

This would allow users to install a stable etch, but with a newer kernel,
which is probably what most of us are doing anyway.

Friendly,

Sven Luther


Ludovic Brenta-2
In reply to this post by Sven Luther
Sven Luther writes:

> Euh, it seems to me more that the hardware has a bug which causes
> normal operation to damage it.
>
> As thus, i think that any damage done would be under the
> responsability of the manufacturer to repare or fix. This seems to
> be both the position of Bastian and Maximilian, and it seems
> reasonable.
>
> So, users of such hardware, please bother your vendor to either
> exchange it for a not broken one, or at least provide a bios upgrade
> which fixes the brokeness.

No, the problem is not in the BIOS, it is in the kernel and it is
described at length in the upstream bug report.  If I understand this
description correctly, the kernel is not compliant with the ACPI
specification in that it handles all ACPI events in a single thread,
whereas the ACPI spec only says that the *interpreter* must be
single-threaded.  Also, there is a deadlock situation in the kernel
which is clearly a kernel, not BIOS, bug.

--
Ludovic Brenta.



Ludovic Brenta-2
In reply to this post by Ludovic Brenta-2
forward 400488 http://bugzilla.kernel.org/show_bug.cgi?id=7122
forward 404143 http://bugzilla.kernel.org/show_bug.cgi?id=5534
thanks

When I said there's a memory leak, that's not technically true.  What
happens is that ACPI events get piled up in a queue and never
processed, due to a deadlock in Linux' ACPI subsystem.  Thus the
memory is not exactly "lost" but the net effect is the same as for a
genuine memory leak.

Now here is some additional information; my hourly cron job has
monitored the slab allocation for some more time and the bug appears
even more severe than I first thought.  Notice how the slab allocation
jumped from 64M to 1G between 6:17 and 7:17?  The only thing happening
at that time in the system was the execution of the daily crontabs at
6:47.  These are the stock (unmodified) Debian crontabs for apt,
aptitude, apt-show-versions, bsdmainutils, dlocate, find, logrotate,
man-db, modutils, prelink, standard, sysklogd, tetex-bin, zz-backup2l.

2006-06-21T20:06:10: Slab:            30296 kB
2006-17-21T20:17:01: Slab:            37756 kB
2006-17-21T21:17:01: Slab:            48116 kB
2006-17-21T22:17:01: Slab:            55764 kB
2006-17-21T23:17:01: Slab:            69904 kB
-- Reboot with acpi=noirq: only one CPU found --
2006-24-21T23:24:10: Slab:            10444 kB
-- Reboot with pci=noacpi: only one CPU found --
2006-30-21T23:30:26: Slab:             9676 kB
2006-30-21T23:30:26: Acpi-State             0      0     80   48    1 : tunables  120   60    8 : slabdata      0      0      0
-- Reboot with no options: OK, both CPUs found --
2006-34-21T23:34:23: Slab:            10584 kB
2006-34-21T23:34:23: Acpi-State             0      0     80   48    1 : tunables  120   60    8 : slabdata      0      0      0
2006-17-22T00:17:01: Slab:            15424 kB
2006-17-22T00:17:01: Acpi-State         23088  23088     80   48    1 : tunables  120   60    8 : slabdata    481    481      0
2006-17-22T01:17:01: Slab:            29956 kB
2006-17-22T01:17:01: Acpi-State         59136  59136     80   48    1 : tunables  120   60    8 : slabdata   1232   1232      0
2006-17-22T02:17:01: Slab:            37764 kB
2006-17-22T02:17:01: Acpi-State         95088  95088     80   48    1 : tunables  120   60    8 : slabdata   1981   1981      0
2006-17-22T03:17:01: Slab:            45544 kB
2006-17-22T03:17:01: Acpi-State        130992 130992     80   48    1 : tunables  120   60    8 : slabdata   2729   2729      0
2006-17-22T04:17:01: Slab:            53328 kB
2006-17-22T04:17:01: Acpi-State        166944 166944     80   48    1 : tunables  120   60    8 : slabdata   3478   3478      0
2006-17-22T05:17:01: Slab:            61120 kB
2006-17-22T05:17:01: Acpi-State        202896 202896     80   48    1 : tunables  120   60    8 : slabdata   4227   4227      0
2006-17-22T06:17:01: Slab:            68904 kB
2006-17-22T06:17:01: Acpi-State        238800 238800     80   48    1 : tunables  120   60    8 : slabdata   4975   4975      0
2006-17-22T07:17:01: Slab:          1152624 kB
2006-17-22T07:17:01: Acpi-State        274656 274656     80   48    1 : tunables  120   60    8 : slabdata   5722   5722      0
2006-17-22T08:17:01: Slab:          1160376 kB
2006-17-22T08:17:01: Acpi-State        310608 310608     80   48    1 : tunables  120   60    8 : slabdata   6471   6471      0
2006-17-22T09:17:01: Slab:          1168168 kB
2006-17-22T09:17:01: Acpi-State        346464 346464     80   48    1 : tunables  120   60    8 : slabdata   7218   7218      0
2006-17-22T10:17:01: Slab:          1175892 kB
2006-17-22T10:17:01: Acpi-State        382176 382176     80   48    1 : tunables  120   60    8 : slabdata   7962   7962      0
2006-17-22T11:17:01: Slab:          1183660 kB
2006-17-22T11:17:01: Acpi-State        417984 417984     80   48    1 : tunables  120   60    8 : slabdata   8708   8708      0
2006-17-22T12:17:01: Slab:          1191400 kB
2006-17-22T12:17:01: Acpi-State        453744 453744     80   48    1 : tunables  120   60    8 : slabdata   9453   9453      0
2006-17-22T13:17:01: Slab:          1202924 kB
2006-17-22T13:17:01: Acpi-State        489696 489696     80   48    1 : tunables  120   60    8 : slabdata  10202  10202      0
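
As a worked example of the figures in this log, the following sketch
estimates how fast the Acpi-State cache grows and how much memory the cache
itself holds. It assumes the /proc/slabinfo version 2.x column layout and
4 KiB pages (the usual size on amd64); the names parse_cache, cache_kib and
growth_per_hour are hypothetical helpers, not part of any tool.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def parse_cache(line):
    """Parse one /proc/slabinfo (2.x layout) line into the fields used here."""
    fields = line.split()
    return {
        "name": fields[0],
        "active_objs": int(fields[1]),
        "objsize": int(fields[3]),
        "pagesperslab": int(fields[5]),
        "num_slabs": int(fields[14]),
    }


def cache_kib(cache):
    """Memory held by the cache's slabs, in KiB."""
    return cache["num_slabs"] * cache["pagesperslab"] * PAGE_SIZE // 1024


def growth_per_hour(first, last, hours):
    """Average number of objects added per hour between two samples."""
    return (last["active_objs"] - first["active_objs"]) / hours


# Samples copied from the log: 00:17 and 13:17 on the 22nd.
first = parse_cache(
    "Acpi-State 23088 23088 80 48 1 : tunables 120 60 8 : slabdata 481 481 0"
)
last = parse_cache(
    "Acpi-State 489696 489696 80 48 1 : tunables 120 60 8 : slabdata 10202 10202 0"
)

rate = growth_per_hour(first, last, hours=13)
print(f"objects per hour:   {rate:.0f}")
print(f"objects per second: {rate / 3600:.1f}")
print(f"cache size at end:  {cache_kib(last)} KiB")
```

On these samples the cache gains roughly 35,900 objects per hour, about ten
per second, and holds 40,808 KiB at 13:17. Under the stated assumptions the
Acpi-State cache therefore accounts for only about 40 MB of the total Slab
figure at that point.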



Andreas Barth
In reply to this post by Sven Luther
* Sven Luther ([hidden email]) [061222 11:34]:

> On Fri, Dec 22, 2006 at 10:54:50AM +0100, Andreas Barth wrote:
> > severity 404143 critical
> > thanks
> >
> > * Bastian Blank ([hidden email]) [061222 01:27]:
> > > On Fri, Dec 22, 2006 at 01:51:36AM +0100, [hidden email] wrote:
> > > > Consequence: linux-image-2.6.18-3-amd64 (=2.6.18-7) is unsuitable for
> > > > release.
> > >
> > > Failing for you don't makes it unsuitable.
> >
> > That is a true statement by itself. This bug however has the potential
> > to damage hardware. Which is a critical bug.
>
> Euh, it seems to me more that the hardware has a bug which causes normal
> operation to damage it.
>
> As thus, i think that any damage done would be under the responsability of the
> manufacturer to repare or fix. This seems to be both the position of Bastian
> and Maximilian, and it seems reasonable.
>
> So, users of such hardware, please bother your vendor to either exchange it
> for a not broken one, or at least provide a bios upgrade which fixes the
> brokeness.

If a BIOS upgrade is a solution, the kernel could e.g. refuse to run
with a broken BIOS unless forced to ("runs if forced to" so that people
can do a BIOS upgrade)? (And of course, we would write about that in the
release notes.)

I'm not saying the fix needs to happen in the kernel. But I do say that
we must not ship software where we know that hardware damage could
happen on a certain platform - this is not a question of "who made the
mistake", but of protecting our users.



Cheers,
Andi
--
  http://home.arcor.de/andreas-barth/


Sven Luther
In reply to this post by Marc 'HE' Brockschmidt-3
On Fri, Dec 22, 2006 at 12:53:09PM +0100, Marc 'HE' Brockschmidt wrote:

> severity 404143 critical
> thanks
>
> maximilian attems <[hidden email]> writes:
> > On Fri, Dec 22, 2006 at 11:28:29AM +0100, Marc 'HE' Brockschmidt wrote:
> >> Fix it or document it, I don't care. But the current state is not
> >> releasable.
> > we are not talking about "a" patch.
> > what you need is an backport of the 2.6.19 acpi release to 2.6.18.
>
> Read again what I wrote. I will not allow Debian to release with a
> Kernel that may damage hardware without even a notice in the release
> notes. If you are not able to fix it, note that you have provided a
> broken kernel.

Cool, let's delay etch a couple of weeks and move to a (now released) 2.6.19
kernel, to solve this issue.

Friendly,

Sven Luther


Marc Brockschmidt-4
Sven Luther <[hidden email]> writes:

> On Fri, Dec 22, 2006 at 12:53:09PM +0100, Marc 'HE' Brockschmidt wrote:
>> maximilian attems <[hidden email]> writes:
>>> On Fri, Dec 22, 2006 at 11:28:29AM +0100, Marc 'HE' Brockschmidt wrote:
>>>> Fix it or document it, I don't care. But the current state is not
>>>> releasable.
>>> we are not talking about "a" patch.
>>> what you need is an backport of the 2.6.19 acpi release to 2.6.18.
>> Read again what I wrote. I will not allow Debian to release with a
>> Kernel that may damage hardware without even a notice in the release
>> notes. If you are not able to fix it, note that you have provided a
>> broken kernel.
> Cool, let's delay etch a couple of weeks and move to a (now released) 2.6.19
> kernel, to solve this issue.
Let's try again: Fix it *OR* explain in the release notes that the
kernel in etch is broken for some hardware.

Marc
--
Computer science terms - simply explained
79: Usenet
       I have too much free time. (Florian Kuehnert)

Ludovic Brenta-2
In reply to this post by Ludovic Brenta-2
Some more information.

1) On my machine, reading the temperature using, say, yacpi, causes
   one processor to process all the pending ACPI events.  On a
   uniprocessor machine, the machine would appear to hang for several
   seconds; not so on my dual-core machine :)

2) The large slab usage (1.1 GB) was in part due to the XFS cache data;
   all three of my machine's filesystems are XFS.  So the Acpi-State
   line in /proc/slabinfo is the really meaningful one.
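
The periodic temperature read mentioned in point 1 can be sketched as
follows. This is only an illustration of polling, not a fix and not how
yacpi is implemented. It assumes the /proc/acpi/thermal_zone/*/temperature
files exposed by 2.6-era kernels, whose zone names vary from machine to
machine; whether reading them drains the pending ACPI events has only been
observed on the nx6325 described here. The function names are hypothetical.

```python
import glob
import re
import time


def parse_temperature(text):
    """Extract degrees Celsius from text such as 'temperature:    58 C'."""
    match = re.search(r"temperature:\s+(-?\d+)\s*C", text)
    return int(match.group(1)) if match else None


def read_temperatures(pattern="/proc/acpi/thermal_zone/*/temperature"):
    """Read every thermal zone file matching the pattern (assumed path)."""
    readings = {}
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            readings[path] = parse_temperature(f.read())
    return readings


def poll(interval=1.0, count=None):
    """Read all zones every `interval` seconds; run forever if count is None."""
    done = 0
    while count is None or done < count:
        for path, temp in read_temperatures().items():
            print(f"{path}: {temp} C")
        done += 1
        if count is None or done < count:
            time.sleep(interval)


if __name__ == "__main__":
    poll(interval=1.0, count=3)
```

On a machine without these files the loop simply finds no zones and prints
nothing.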

Here is my complete log so far, with annotations.

2006-06-21T20:06:10: Slab:            30296 kB
2006-17-21T20:17:01: Slab:            37756 kB
2006-17-21T21:17:01: Slab:            48116 kB
2006-17-21T22:17:01: Slab:            55764 kB
2006-17-21T23:17:01: Slab:            69904 kB
-- Reboot with acpi=noirq: only one CPU found --
2006-24-21T23:24:10: Slab:            10444 kB
-- Reboot with pci=noacpi: only one CPU found --
2006-30-21T23:30:26: Slab:             9676 kB
2006-30-21T23:30:26: Acpi-State             0      0     80   48    1 : tunables  120   60    8 : slabdata      0      0      0
-- Reboot with no options: OK, both CPUs found --
2006-34-21T23:34:23: Slab:            10584 kB
2006-34-21T23:34:23: Acpi-State             0      0     80   48    1 : tunables  120   60    8 : slabdata      0      0      0
2006-17-22T00:17:01: Slab:            15424 kB
2006-17-22T00:17:01: Acpi-State         23088  23088     80   48    1 : tunables  120   60    8 : slabdata    481    481      0
2006-17-22T01:17:01: Slab:            29956 kB
2006-17-22T01:17:01: Acpi-State         59136  59136     80   48    1 : tunables  120   60    8 : slabdata   1232   1232      0
2006-17-22T02:17:01: Slab:            37764 kB
2006-17-22T02:17:01: Acpi-State         95088  95088     80   48    1 : tunables  120   60    8 : slabdata   1981   1981      0
2006-17-22T03:17:01: Slab:            45544 kB
2006-17-22T03:17:01: Acpi-State        130992 130992     80   48    1 : tunables  120   60    8 : slabdata   2729   2729      0
2006-17-22T04:17:01: Slab:            53328 kB
2006-17-22T04:17:01: Acpi-State        166944 166944     80   48    1 : tunables  120   60    8 : slabdata   3478   3478      0
2006-17-22T05:17:01: Slab:            61120 kB
2006-17-22T05:17:01: Acpi-State        202896 202896     80   48    1 : tunables  120   60    8 : slabdata   4227   4227      0
2006-17-22T06:17:01: Slab:            68904 kB
2006-17-22T06:17:01: Acpi-State        238800 238800     80   48    1 : tunables  120   60    8 : slabdata   4975   4975      0
2006-17-22T07:17:01: Slab:          1152624 kB
2006-17-22T07:17:01: Acpi-State        274656 274656     80   48    1 : tunables  120   60    8 : slabdata   5722   5722      0
2006-17-22T08:17:01: Slab:          1160376 kB
2006-17-22T08:17:01: Acpi-State        310608 310608     80   48    1 : tunables  120   60    8 : slabdata   6471   6471      0
2006-17-22T09:17:01: Slab:          1168168 kB
2006-17-22T09:17:01: Acpi-State        346464 346464     80   48    1 : tunables  120   60    8 : slabdata   7218   7218      0
2006-17-22T10:17:01: Slab:          1175892 kB
2006-17-22T10:17:01: Acpi-State        382176 382176     80   48    1 : tunables  120   60    8 : slabdata   7962   7962      0
2006-17-22T11:17:01: Slab:          1183660 kB
2006-17-22T11:17:01: Acpi-State        417984 417984     80   48    1 : tunables  120   60    8 : slabdata   8708   8708      0
2006-17-22T12:17:01: Slab:          1191400 kB
2006-17-22T12:17:01: Acpi-State        453744 453744     80   48    1 : tunables  120   60    8 : slabdata   9453   9453      0
2006-17-22T13:17:01: Slab:          1202924 kB
2006-17-22T13:17:01: Acpi-State        489696 489696     80   48    1 : tunables  120   60    8 : slabdata  10202  10202      0
-- Start yacpi, monitoring the temperature every second.
-- Note how the slab allocation drops by ~100M and then stays constant.
2006-17-22T14:17:01: Slab:          1097584 kB
2006-17-22T14:17:01: Acpi-State           109    144     80   48    1 : tunables  120   60    8 : slabdata      3      3      0
2006-17-22T15:17:01: Slab:          1097532 kB
2006-17-22T15:17:01: Acpi-State            45     96     80   48    1 : tunables  120   60    8 : slabdata      2      2      0
2006-17-22T16:17:01: Slab:          1097536 kB
2006-17-22T16:17:01: Acpi-State            75    144     80   48    1 : tunables  120   60    8 : slabdata      3      3      0
2006-17-22T17:17:01: Slab:          1097668 kB
2006-17-22T17:17:01: Acpi-State           141    144     80   48    1 : tunables  120   60    8 : slabdata      3      3      0
-- Stop the yacpi monitoring.
2006-17-22T18:17:01: Slab:          1098904 kB
2006-17-22T18:17:01: Acpi-State          5808   5808     80   48    1 : tunables  120   60    8 : slabdata    121    121      0
-- At this point the Acpi-State has started increasing again, but is still
-- small.  Most of the slab allocations are in the XFS caches (all three
-- filesystems on this computer are XFS).
-- To make sure the memory can be released, start a fairly large compilation
-- using both CPUs and 2x370 M of RAM.  Just before compilation:
2006-48-22T18:48:56: Slab:          1103244 kB
2006-48-22T18:48:56: Acpi-State         24528  24528     80   48    1 : tunables  120   60    8 : slabdata    511    511      0
-- A couple of minutes into the compilation, the fans have still not turned on
-- and the CPU is getting so hot it burns my hand.  Restart yacpi, monitoring
-- temperature every second.  The temp is 85°C (dangerous!!) One CPU starts
-- processing the backlog of ACPI events, the other continues the compilation.
-- Fans start.  Temperature drops to 71°C and stays there.
2006-00-22T19:00:44: Slab:           861828 kB
2006-00-22T19:00:44: Acpi-State            74     96     80   48    1 : tunables  120   60    8 : slabdata      2      2      0
-- End of compilation.  During the final packaging stages, the temperature has
-- dropped to 57°C as the CPUs were less used.  Stop the yacpi monitoring.
2006-07-22T19:07:13: Slab:           865660 kB
2006-07-22T19:07:13: Acpi-State            73     96     80   48    1 : tunables  120   60    8 : slabdata      2      2      0
2006-17-22T19:17:01: Slab:           865028 kB
2006-17-22T19:17:01: Acpi-State            71    144     80   48    1 : tunables  120   60    8 : slabdata      3      3      0
2006-17-22T20:17:01: Slab:           871224 kB
2006-17-22T20:17:01: Acpi-State         34704  34704     80   48    1 : tunables  120   60    8 : slabdata    723    723      0
2006-17-22T21:17:01: Slab:           879112 kB
2006-17-22T21:17:01: Acpi-State         69552  69552     80   48    1 : tunables  120   60    8 : slabdata   1449   1449      0
2006-17-22T22:17:01: Slab:           887908 kB
2006-17-22T22:17:01: Acpi-State        104784 104784     80   48    1 : tunables  120   60    8 : slabdata   2183   2183      0
2006-17-22T23:17:01: Slab:           896024 kB
2006-17-22T23:17:01: Acpi-State        139920 139968     80   48    1 : tunables  120   60    8 : slabdata   2915   2916      0
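
The growth of the Acpi-State cache in the log above can be estimated with a short script. This is a minimal sketch, not part of the original report: it assumes the standard /proc/slabinfo column order (name, active objects, total objects, object size in bytes), copies three consecutive samples verbatim from the log, and counts object payload only, ignoring per-slab overhead. The names `parse` and `hourly_growth` are illustrative.

```python
import re

# Three consecutive hourly samples copied verbatim from the log above.
LOG = """\
2006-17-22T04:17:01: Acpi-State        166944 166944     80   48    1 : tunables  120   60    8 : slabdata   3478   3478      0
2006-17-22T05:17:01: Acpi-State        202896 202896     80   48    1 : tunables  120   60    8 : slabdata   4227   4227      0
2006-17-22T06:17:01: Acpi-State        238800 238800     80   48    1 : tunables  120   60    8 : slabdata   4975   4975      0
"""

# Assumed column order: timestamp, cache name, active_objs, num_objs, objsize.
LINE = re.compile(r"^(\S+): Acpi-State\s+(\d+)\s+(\d+)\s+(\d+)\s+")


def parse(log):
    """Return (timestamp, active_objs, objsize) for each Acpi-State line."""
    rows = []
    for line in log.splitlines():
        m = LINE.match(line)
        if m:
            rows.append((m.group(1), int(m.group(2)), int(m.group(4))))
    return rows


def hourly_growth(rows):
    """Bytes of object payload added between consecutive samples."""
    return [(b[1] - a[1]) * b[2] for a, b in zip(rows, rows[1:])]


rows = parse(LOG)
growth = hourly_growth(rows)
for (ts, _, _), delta in zip(rows[1:], growth):
    print(f"{ts}: +{delta} bytes")
```

On these samples the cache gains roughly 36,000 objects of 80 bytes each per hour, that is, a little under 2.9 MB of payload per hour while nothing is polling ACPI.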




Bug#404143: Fans unreliable under load, permanent memory leak

Andreas Barth
In reply to this post by Sven Luther
* Sven Luther ([hidden email]) [061222 05:42]:

> On Fri, Dec 22, 2006 at 12:53:09PM +0100, Marc 'HE' Brockschmidt wrote:
> > maximilian attems <[hidden email]> writes:
> > > On Fri, Dec 22, 2006 at 11:28:29AM +0100, Marc 'HE' Brockschmidt wrote:
> > >> Fix it or document it, I don't care. But the current state is not
> > >> releasable.
> > > we are not talking about "a" patch.
> > > what you need is a backport of the 2.6.19 acpi release to 2.6.18.
> >
> > Read again what I wrote. I will not allow Debian to release with a
> > Kernel that may damage hardware without even a notice in the release
> > notes. If you are not able to fix it, note that you have provided a
> > broken kernel.
>
> Cool, let's delay etch a couple of weeks and move to a (now released) 2.6.19
> kernel, to solve this issue.

Sven, stop this! I can remember well how you promised that moving to
2.6.18 would magically solve almost all of our issues - 6 (or more)
release critical bugs against 2.6.18 don't show that this has worked so
well. Please try helping us on solutions rather than breaking things
again.


Please try to look at it from another perspective:

Consider you have bought such a laptop, and you install Debian. You have
even read the release notes first.  Everything works well.  Until one
day you notice your laptop gets too warm, and eventually even breaks
because of this.  On deeper research, you notice that this issue was
well-known to Debian, but they refused to deal with it at all. How would
you feel as a user? I think this is an unacceptable perspective.


Ok, what can we do?
1. ignore the problem,
2. document it in the release notes and README.Debian of the kernel,
3. prevent the kernel running on such buggy laptops [is this possible?],
4. backport ACPI from 2.6.19, or use 2.6.19,
5. isolate a smaller fix and apply it.

I personally consider options 1 and 4 to be unacceptable. Option 5 would
be the best, but I have yet to see that this is possible (or rather,
that someone knowledgeable enough has time to do it).

So, we should at least document it inside of the release notes, and
README.Debian, and, if possible without being too invasive, get some
check inside the kernel to print a big warning on bootup, or even refuse
to work until some special parameter is used.
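
The kind of check proposed above can be illustrated from user space. This is only a rough sketch under stated assumptions, not the in-kernel check itself: it assumes the kernel exposes the DMI product name at /sys/class/dmi/id/product_name, and it matches on the model substrings named in the bug report (nx6125, nx6325), since the exact DMI strings of the affected machines are not given in this thread. All function names are hypothetical.

```python
from pathlib import Path

# Assumption: model substrings taken from the bug report; the exact DMI
# product strings reported by these machines are not confirmed here.
AFFECTED_MODELS = ("nx6125", "nx6325")

# Assumption: the running kernel exposes DMI data at this sysfs path.
DMI_PRODUCT_NAME = Path("/sys/class/dmi/id/product_name")


def is_affected(product_name):
    """True if the product name mentions one of the affected models."""
    name = product_name.lower()
    return any(model in name for model in AFFECTED_MODELS)


def read_product_name(path=DMI_PRODUCT_NAME):
    """Return the DMI product name, or an empty string if unavailable."""
    try:
        return path.read_text().strip()
    except OSError:
        return ""


def warning_for(product_name):
    """Return a warning string for affected machines, otherwise None."""
    if is_affected(product_name):
        return ("WARNING: %s is affected by Bug#404143; fans may not start "
                "under load." % product_name)
    return None


if __name__ == "__main__":
    msg = warning_for(read_product_name())
    if msg:
        print(msg)
```

A substring match is used here because it tolerates vendor suffixes in the product string; a real in-kernel check would need the exact identifiers of each affected model.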


How does this proposal sound to the kernel team?



Cheers,
Andi
--
  http://home.arcor.de/andreas-barth/




Bug#404143: Fans unreliable under load, permanent memory leak

Sven Luther
On Sat, Dec 23, 2006 at 11:50:40AM +0100, Andreas Barth wrote:

> * Sven Luther ([hidden email]) [061222 05:42]:
> > On Fri, Dec 22, 2006 at 12:53:09PM +0100, Marc 'HE' Brockschmidt wrote:
> > > maximilian attems <[hidden email]> writes:
> > > > On Fri, Dec 22, 2006 at 11:28:29AM +0100, Marc 'HE' Brockschmidt wrote:
> > > >> Fix it or document it, I don't care. But the current state is not
> > > >> releasable.
> > > > we are not talking about "a" patch.
> > > > what you need is a backport of the 2.6.19 acpi release to 2.6.18.
> > >
> > > Read again what I wrote. I will not allow Debian to release with a
> > > Kernel that may damage hardware without even a notice in the release
> > > notes. If you are not able to fix it, note that you have provided a
> > > broken kernel.
> >
> > Cool, let's delay etch a couple of weeks and move to a (now released) 2.6.19
> > kernel, to solve this issue.
>
> Sven, stop this!

Why ? /me guesses that even though Debian is about free software, there are
many who feel that freedom of speech is to be banned. Do you also follow that
line of thought ? Was it not enough that some people felt that i should be
burned at the stake for having sent mails while i was not at my best ?

Really, this kind of behavior is disgusting.

> I can remember well how you promised that moving to
> 2.6.18 would magically solve almost all of our issues - 6 (or more)
> release critical bugs against 2.6.18 don't show that this has worked so
> well. Please try helping us on solutions rather than breaking things
> again.

I did not promise any such thing. I simply stated at that time that there
were many RC issues which were already fixed in the 2.6.18 tree, and which
would be a pain to backport to the 2.6.17 tree. Quite a different thing, don't
you think ?

I personally will need to maintain 2.6.19+ backports to etch, because there is
no sane way to get Efika support in 2.6.18 without a lot of work.

> Please try to look at it from another perspective:
>
> Consider you have bought such a laptop, and you install Debian. You have
> even read the release notes first.  Everything works well.  Until one
> day you notice your laptop gets too warm, and eventually even breaks
> because of this.  On deeper research, you notice that this issue was
> well-known to Debian, but they refused to deal with it at all. How would
> you feel as a user? I think this is an unacceptable perspective.

Bah. Hardware which can be broken by software is broken. That said, if in fact
this is not a BIOS bug, as was first mentioned here, but rather the Linux
support being unable to cope with some unusual but legal features of ACPI,
then it is another matter.

But you should *NEVER* try to stop discussion about the subject, or bash on
someone for writing a single sentence as i did.

Friendly,

Sven Luther


