Bug#404143: Fans unreliable under load, permanent memory leak

Frederik Schueler-2

Hi *,

this is indeed a severe issue which requires all our attention and care
to solve or circumvent so that nobody's boxes come to any harm; you
know how expensive these laptops are.

I basically see 3 solutions/workarounds:

1. the brutal one: deactivate ACPI in 2.6.18, have the bios keep control
of the fans - better a noisy laptop until I upgrade the kernel than a
fried box.

2. port 2.6.19 ACPI - not an option because it is way too much work,
unless someone is "crazy enough" to take on this task.

3. go for 2.6.19

Documenting arbitrary breakage in the release notes is not a solution,
just consider how well manuals are usually read (if at all). Users will
end up with damaged hardware and blame us for it.

We released woody with disabled ide dma due to somewhat similar issues
(boxes hanging), so disabling ACPI in 2.6.18 and going for a 2.6.19
based 4.0r1 ASAP seems the best thing to me personally, but this is of
course up for discussion.

Best regards
Frederik Schueler

--
ENOSIG


Bug#404143: Fans unreliable under load, permanent memory leak

Sven Luther
On Sun, Dec 24, 2006 at 03:07:55AM +0100, Frederik Schueler wrote:

>
> Hi *,
>
> this is indeed a severe issue which requires all our attention and care
> to solve or circumvent in order for nobodies boxes to get any harm, you
> know how expensive these laptops are.
>
> I basically see 3 solutions/workarounds:
>
> 1. the brutal one: deactivate ACPI in 2.6.18, have the bios keep control
> of the fans - better a noisy laptop until I upgrade the kernel than a
> fried box.
>
> 2. port 2.6.19 ACPI - noop because way too much work, unless someone
> "crazy enough" to accomplish this task.
>
> 3. go for 2.6.19

As said, i can imagine another solution.

  4. Provide both a stable 2.6.18, and an easily usable backported 2.6.19
  (or newer) kernel, which would be built for etch, but built out of our
  trunk/unstable/testing archive.

Then we can add a bit of logic into d-i's base-installer, so that the kernel
installation step detects the laptops which have this problem (do we know how
to detect them?), informs the user and installs the newer kernel.

Alternatively, we can go with option 1, create a -noacpi flavour usable on those
laptops, and install that flavour in d-i. This would probably be the easiest solution.

> Documenting arbitrary breakage in the release notes is not a solution,
> just consider how well manuals are usually read (if at all). Users will
> end with damaged hardware and blame us for it.

/me agrees.

> We released woody with disabled ide dma due to somewhat similar issues
> (boxes hanging), so disabling ACPI in 2.6.18 and going for a 2.6.19
> based 4.0r1 ASAP seems the best thing to me personally, but this is of
> course up for discussion.

I have been thinking of another solution, but since i am kind of ignored, or
since this is a subject that some of the powers-that-be don't want me to
mention, i doubt it will gain much momentum. I am going to propose a
talk at FOSDEM about these ideas, where issues and everything else can be
discussed.

The idea goes as follows :

  1) We take the kernel out of the main debian archive, into a separate kernel
  pool. This pool would hold the kernel and all assorted modules or
  abi-depending packages. This pool would hold per-abi subpools
  (dists/kernel/2.6.18-3, dists/kernel/2.6.19-1, etc).

  2) Eventually, we have some symlink or mirroring logic which would allow the
  chosen kernel to be accessible from the main archives. This means we can
  prepare kernels in this kernel pool, test them, and once they are ready, do a
  one-pulse move of those packages (without rebuild) into the main pools.

  3) This pool will include both kernel .debs and .udebs. A further
  improvement would be to split the d-i initramfs into two, having a single
  copy of the non-kernel specific stuff, and a per-flavour copy of the kernel
  initramfs stuff. This way, we move together the kernel and the module
  .udebs, and can easily switch d-i to change kernel version, or even build
  various d-i for various kernel versions. Furthermore this would avoid d-i
  trying to import 2.6.18-3 modules when you build a local 2.6.19-1 kernel,
  and simplify the whole .udeb version checking and downloading logic.

Well, there is more to it, and i will present that at fosdem, but i hope this
already gave you all a taste of what could be, and that these ideas will not
be rejected out of hand, just because they come from me.

Friendly,

Sven Luther


--
To UNSUBSCRIBE, email to [hidden email]
with a subject of "unsubscribe". Trouble? Contact [hidden email]


Bug#404143: Fans unreliable under load, permanent memory leak

Moritz Mühlenhoff-2
In reply to this post by Frederik Schueler-2
On Sun, Dec 24, 2006 at 03:07:55AM +0100, Frederik Schueler wrote:

>
> Hi *,
>
> this is indeed a severe issue which requires all our attention and care
> to solve or circumvent in order for nobodies boxes to get any harm, you
> know how expensive these laptops are.
>
> I basically see 3 solutions/workarounds:
>
> 1. the brutal one: deactivate ACPI in 2.6.18, have the bios keep control
> of the fans - better a noisy laptop until I upgrade the kernel than a
> fried box.

Do you intend to disable ACPI entirely for all systems?

It appears to me that ACPI could be disabled for the affected HP models on a
per-case basis using drivers/acpi/blacklist.c

Cheers,
        Moritz



Bug#404143: Fans unreliable under load, permanent memory leak

Frans Pop-3
In reply to this post by Frederik Schueler-2
On Sunday 24 December 2006 03:07, Frederik Schueler wrote:
> 2. port 2.6.19 ACPI - noop because way too much work, unless someone
> "crazy enough" to accomplish this task.

Did you see that Bas Zoetekouw managed [1, #400488] to solve the problem
for his box by applying some selected patches from upstream?
Wouldn't that be an option?

I'd suggest asking other people that see the same issues to also test a
kernel with these patches and decide based on the results.

[1] http://lists.debian.org/debian-kernel/2006/12/msg00768.html


Bug#404143: Fans unreliable under load, permanent memory leak

Sven Luther
On Sun, Dec 24, 2006 at 02:48:27PM +0100, Frans Pop wrote:
> On Sunday 24 December 2006 03:07, Frederik Schueler wrote:
> > 2. port 2.6.19 ACPI - noop because way too much work, unless someone
> > "crazy enough" to accomplish this task.
>
> Did you see that Bas Zoetekouw managed [1, #400488] to solve the problem
> for his box by applying some selected patches from upstream?
> Wouldn't that be an option?

I thought i saw Maximilian say that there are indeed some patches, but that
the risk of destabilizing the whole ACPI subsystem was too great this near to
the etch release. This is exactly the same kind of argument you are using in
d-i, don't you think ?

> I'd suggest asking other people that see the same issues to also test a
> kernel with these patches and decide based on the results.

No, what we would need is huge testing of these patches by people *WHO DIDN'T
SEE THE SAME ISSUES* to make sure there is no regression.

Friendly,

Sven Luther



Bug#404143: Fans unreliable under load, permanent memory leak

Frederik Schueler-2
In reply to this post by Moritz Mühlenhoff-2
Hello,

On Sun, Dec 24, 2006 at 02:02:58PM +0100, Moritz Muehlenhoff wrote:
> Do you intent to disable ACPI entirely for all systems?
>
> It appears to me that the affected HP models could be disabled on a per-case
> basis using drivers/acpi/blacklist.c

This looks like a good idea to me, do we know which models are affected?

OTOH, I doubt we have a complete list of affected models, and who knows
what problems may arise for yet to be released laptops...

Best regards
Frederik Schueler

--
ENOSIG


Bug#404143: Fans unreliable under load, permanent memory leak

Maximilian Attems-3
On Sun, Dec 24, 2006 at 03:31:15PM +0100, Frederik Schueler wrote:

> Hello,
>
> On Sun, Dec 24, 2006 at 02:02:58PM +0100, Moritz Muehlenhoff wrote:
> > Do you intent to disable ACPI entirely for all systems?
> >
> > It appears to me that the affected HP models could be disabled on a per-case
> > basis using drivers/acpi/blacklist.c
>
> This looks like a good idea to me, do we know which models are affected?
>
> OTOH, I doubt we have a complete list of affected models, and who knows
> what problems may arise for yet to be released laptops...

indeed this is a good way.
acpi patches have known side-effects so i would nack any hand-picking
of those.

do we have a report from an affected laptop that booting with noacpi
solves the thermal issues?

i don't agree with the fuss about this bug report nor with the severity.
for the sarge release kernel-image 2.6.8 did not boot on a wide range
of commercially available intel boards and there were overheating bug reports.
completely disabling acpi seems like an overreaction, given
that the affected laptops are quite specific. on the other hand i'm
delighted to see discussions about a linux-image upgrade in a stable
revision.

happy christmas

--
maks





Bug#404143: Fans unreliable under load, permanent memory leak

Moritz Mühlenhoff-2
In reply to this post by Frederik Schueler-2
Frederik Schueler wrote:

> Hello,
>
> On Sun, Dec 24, 2006 at 02:02:58PM +0100, Moritz Muehlenhoff wrote:
> > Do you intent to disable ACPI entirely for all systems?
> >
> > It appears to me that the affected HP models could be disabled on a per-case
> > basis using drivers/acpi/blacklist.c
>
> This looks like a good idea to me, do we know which models are affected?
> OTOH, I doubt we have a complete list of affected models,

Since HP supports Debian officially now, I'm sure Dann or someone else from
HP can provide us a list of affected models.

If not, we can contact Len Brown to get the ACPI-OEM-ID for HP and
blacklist all HP models.

> and who knows what problems may arise for yet to be released laptops...

Well, even Debian can't predict the future :-)
Plus, we can still address these in point updates.

Cheers,
        Moritz



Bug#404143: Fans unreliable under load, permanent memory leak

Martin Michlmayr
* Moritz Muehlenhoff <[hidden email]> [2006-12-24 15:57]:
> Since HP supports Debian officially now

not on laptops.

> I'm sure Dann or someone else from HP can provide us a list of
> affected models.

--
Martin Michlmayr
http://www.cyrius.com/



Bug#404143: Fans unreliable under load, permanent memory leak

Sven Luther
In reply to this post by Maximilian Attems-3
On Sun, Dec 24, 2006 at 03:42:46PM +0100, maximilian attems wrote:

> On Sun, Dec 24, 2006 at 03:31:15PM +0100, Frederik Schueler wrote:
> > Hello,
> >
> > On Sun, Dec 24, 2006 at 02:02:58PM +0100, Moritz Muehlenhoff wrote:
> > > Do you intent to disable ACPI entirely for all systems?
> > >
> > > It appears to me that the affected HP models could be disabled on a per-case
> > > basis using drivers/acpi/blacklist.c
> >
> > This looks like a good idea to me, do we know which models are affected?
> >
> > OTOH, I doubt we have a complete list of affected models, and who knows
> > what problems may arise for yet to be released laptops...
>
> indeed this is a good way.
> acpi patches have known side-effects so i would nack any hand-picking
> of those.
>
> do we have a report from an affected laptop that booting with noacpi
> solves the thermal issues?

Ah, neat, there is the noacpi option.

We could simply have d-i add this flag on affected laptops. No need to touch
the kernel or anything else.

Friendly,

Sven Luther



Bug#404143: Fans unreliable under load, permanent memory leak

Frans Pop-3
In reply to this post by Sven Luther
On Sunday 24 December 2006 15:22, you wrote:
> This is exactly the same kind of
> argument you are using in d-i, don't you think ?

There is a difference between being conservative with fixes for minor
issues and fixes for issues that can fry people's hardware, don't you
think?

Of course care is needed for such changes and I would certainly encourage
a careful review and possibly some contact with upstream maintainers to
get a better feeling for feasibility and possible risks.

The sooner some action is taken on this, the earlier a kernel could be
uploaded (or made available for testing) and a call for testing be done
on the appropriate lists. If patches do cause regressions there would
still be time to revert them. After all, this is an RC issue and the
release will wait for it.


Bug#404143: Fans unreliable under load, permanent memory leak

Jurij Smakov
In reply to this post by Frederik Schueler-2
On Sun, Dec 24, 2006 at 03:07:55AM +0100, Frederik Schueler wrote:

>
> Hi *,
>
> this is indeed a severe issue which requires all our attention and care
> to solve or circumvent in order for nobodies boxes to get any harm, you
> know how expensive these laptops are.
>
> I basically see 3 solutions/workarounds:
>
> 1. the brutal one: deactivate ACPI in 2.6.18, have the bios keep control
> of the fans - better a noisy laptop until I upgrade the kernel than a
> fried box.
>
> 2. port 2.6.19 ACPI - noop because way too much work, unless someone
> "crazy enough" to accomplish this task.

I have reviewed the information available on the thermal problems with
HP laptops, and it appears that there is a fairly conservative set of
patches which takes care of the problems (thanks to Bas for pointing
most of them out). I might have missed some upstream bugs, so please
let me know if there is anything else available on the issue. Below is
the summary, describing the relevant patches:

Bug #5534: No thermal events until acpi -t - HP nx6125
------------------------------------------------------
Summary: thermal events generated by the ACPI subsystem do not get
processed by the kernel because both the interrupt due to a thermal
event and event handler are managed by the same thread (kacpid). The
solution is to create a separate thread for the handler, so that the
processing of thermal events may happen asynchronously.

I have identified the following patches which appear to finally resolve
the problem:

#8951 from comment #159 Don't defer release of the global lock.
                                (applies to drivers/acpi/events/evmisc.c)
#8952 from comment #160 Create another workqueue for notify()
                                execution.
                                (applies to drivers/acpi/osl.c)

These patches presumably solve the problem in normal operation, but it
persists after a suspend/resume cycle. Followup patches which are supposed to
improve the situation include:

#9631 from comment #171 Improved version of #8952, which prevents
                                flooding of certain machines with thermal
                                events (Linus owns one of those, so he was
                                very unhappy :-)
#9746 from comment #180 Some further improvements. AFAICT, supersedes
                                #9631 and #8952.

So, it looks like we need #8951 and #9746 from this bug. Both apply cleanly
to our 2.6.18-8 source.

Bug #7122: Thermal management problems - HPC nx6325
---------------------------------------------------
Summary: the fans do not come on properly after a suspend/resume cycle. It looks
like the reason for the problem is that the ACPI logic which turns on the
fans cannot cope with the fact that it may be necessary to execute the
"power on" method for fans a few times before they actually turn on.
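
To make the failure mode above concrete, here is a minimal C sketch of a
power-on routine that re-issues the request until the fan reports running.
It is not the upstream patch. The fan_ops interface, the stubborn_fan
simulation and all function names are hypothetical and exist only for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fan interface: the callbacks stand in for ACPI method calls. */
struct fan_ops {
    void (*power_on)(void *ctx);   /* issue one "power on" request */
    bool (*is_running)(void *ctx); /* read back whether the fan is on */
};

/* Re-issue power-on up to max_attempts times.
 * Returns the number of attempts used, or -1 if the fan never started. */
int fan_power_on_retry(const struct fan_ops *ops, void *ctx, int max_attempts)
{
    assert(ops != NULL && ops->power_on != NULL && ops->is_running != NULL);
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        ops->power_on(ctx);
        if (ops->is_running(ctx))
            return attempt;
    }
    return -1;
}

/* Simulated fan that only starts after `needed` power-on requests. */
struct stubborn_fan {
    int needed;
    int requests;
};

static void stubborn_power_on(void *ctx)
{
    struct stubborn_fan *fan = ctx;
    fan->requests++;
}

static bool stubborn_is_running(void *ctx)
{
    const struct stubborn_fan *fan = ctx;
    return fan->requests >= fan->needed;
}

const struct fan_ops stubborn_ops = { stubborn_power_on, stubborn_is_running };
```

Logic that assumes a single power-on call is enough corresponds to
max_attempts == 1 here, which fails for any fan that needs more than one
request.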

The following patches appear to be relevant:

#9254 from comment #37 Reset number of resource references on resume
                                and make power on/off routines more strict and
                                robust.
#9255 from comment #38 Make ACPI suspend handlers run before the
                                _PTS/_GTS methods and ACPI resume handlers
                                run after the _WAK method.
#9263 from comment #41 A modification of #9254 to apply to 2.6.19-rc1-mm1

#9355 from comment #48 Implement power resource references as a list,
                                so that if two devices use the same power resource,
                                it cannot be disabled by two subsequent calls from
                                a single device. Supersedes #9254 and #9263.
#9337 from comment #52 Improved final version of #9355.

We need #9255 and #9337 from this bug. They apply cleanly to 2.6.18-8.
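
The idea behind keeping power resource references as a list (as described
for #9355) can be sketched roughly as follows. This is a simplified
illustration under my own assumptions, not the code from the patch: the
struct layout, the fixed-size array standing in for a list, the integer
device IDs and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REFS 8

/* Hypothetical power resource that remembers WHICH devices reference it,
 * instead of keeping only a bare reference counter. */
struct power_resource {
    int users[MAX_REFS]; /* IDs of devices currently holding a reference */
    size_t nusers;
    bool on;
};

/* Return the index of dev in the user list, or nusers if it is absent. */
static size_t find_ref(const struct power_resource *r, int dev)
{
    size_t i;
    for (i = 0; i < r->nusers; i++)
        if (r->users[i] == dev)
            break;
    return i;
}

/* Record a reference for dev and switch the resource on.
 * Repeated requests from the same device are recorded only once. */
bool power_resource_acquire(struct power_resource *r, int dev)
{
    assert(r != NULL);
    if (find_ref(r, dev) < r->nusers)
        return true;  /* dev already holds a reference */
    if (r->nusers == MAX_REFS)
        return false; /* no room left in this simplified table */
    r->users[r->nusers++] = dev;
    r->on = true;
    return true;
}

/* Drop the reference held by dev; switch off only when no user is left. */
void power_resource_release(struct power_resource *r, int dev)
{
    assert(r != NULL);
    size_t i = find_ref(r, dev);
    if (i == r->nusers)
        return; /* dev holds no reference, so there is nothing to drop */
    r->users[i] = r->users[--r->nusers];
    if (r->nusers == 0)
        r->on = false;
}
```

With a plain counter, two release calls from the same device would bring
a count of two down to zero and switch the resource off under the second
device. With the per-device list, the second release is a no-op.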

Bug #7570: S3: fan doesn't work properly after resume
-----------------------------------------------------
Summary: one of the four fans is not turned on after a suspend/resume cycle.

Relevant patch:

#9802 from comment #8 When set, the 'force_power_state' flag disables the
                                check of whether the required power state is the
                                same as the current one. In that case the list of
                                power resources to be enabled is the same as the
                                list to be disabled, which results in these
                                resources being enabled and then disabled in turn.

This patch may be included, even though the issue it fixes is not as critical
as the other ones. Applies fine to 2.6.18-8 too.

So far I have not tried building the kernel with these patches, but I think this is
a reasonable way to resolve the problem, as the resulting cumulative patch (attached)
is only 19K.

Best regards,
--
Jurij Smakov                                           [hidden email]
Key: http://www.wooyd.org/pgpkey/                      KeyID: C99E03CC

cumulative.patch (19K) Download Attachment

Bug#404143: Fans unreliable under load, permanent memory leak

Maximilian Attems-3
On Tue, Dec 26, 2006 at 06:09:02PM -0800, Jurij Smakov wrote:

> On Sun, Dec 24, 2006 at 03:07:55AM +0100, Frederik Schueler wrote:
> >
> > Hi *,
> >
> > this is indeed a severe issue which requires all our attention and care
> > to solve or circumvent in order for nobodies boxes to get any harm, you
> > know how expensive these laptops are.
> >
> > I basically see 3 solutions/workarounds:
> >
> > 1. the brutal one: deactivate ACPI in 2.6.18, have the bios keep control
> > of the fans - better a noisy laptop until I upgrade the kernel than a
> > fried box.
> >
> > 2. port 2.6.19 ACPI - noop because way too much work, unless someone
> > "crazy enough" to accomplish this task.
>
> I have reviewed the information available on the thermal problems with
> HP laptops, and it appears that there is a fairly conservative set of
> patches which takes care of the problems (thanks to Bas for pointing
> most of the out). I might have missed some upstream bugs, so please
> let me know if there is anything else available on the issue. Below is
> the summary, describing the relevant patches:

i nack the mentioned patches!

backports are risky, again as you see for the net-r8169-1.patch,
that is a "localized" driver enhancement with big slow down consequences
#400524 and #403782. yes upstream has a fix for that and it should
land soon, but still no one else bothered yet.

the acpi patches may solve the troubles with those stupid HP laptops,
but they _certainly_ have side effects.
if you look at today's acpi commits you see that they broke
a toshiba laptop.


back to the facts
* the sarge kernel was released with *huge* thermal problems
  and without any userspace help for early loading
* the etch 2.6.18 linux acpi supports *many* thermal boxes
  thermal hooks load modules at earliest possible stage
* acpi releases have regression tests that are only run
  for the complete release itself

the sanest way is to disable acpi for the affected laptops
and push a newer linux in a point release.
playing with acpi fire is not appropriate for a stable release.

 
--
maks



Bug#404143: Fans unreliable under load, permanent memory leak

Jurij Smakov
On Wed, Dec 27, 2006 at 03:40:58AM +0100, maximilian attems wrote:

> > I have reviewed the information available on the thermal problems with
> > HP laptops, and it appears that there is a fairly conservative set of
> > patches which takes care of the problems (thanks to Bas for pointing
> > most of the out). I might have missed some upstream bugs, so please
> > let me know if there is anything else available on the issue. Below is
> > the summary, describing the relevant patches:
>
> i nack the mentioned patches!

Well, that's one in favor and one vote against then.
 
> backports are risky, again as you see for the net-r8169-1.patch,
> that is a "localized" driver enhancement with big slow down consequences
> #400524 and #403782. yes upstream has a fix for that and it should
> land soon, but still no one else bothered yet.

That's because slower networking will not break your hardware.

> the acpi patches may solve the troubles with those stupid HP laptops,
> but they have _certainly_ side effects.
> if you look at the acpi commits of this day you see that they broke
> a toshiba laptop.

Do you have a reference to that? And we do have a possibility to test
the changes pretty extensively by uploading to unstable plus
specifically asking people to test.
 

> back to the facts
> * the sarge kernel was released with *huge* thermal problems
>   and without any userspace help for early loading
> * the etch 2.6.18 linux acpi supports *many* thermal boxes
>   thermal hooks load modules at earliest possible stage
> * acpi releases have regression tests that are only run
>   for the complete release itself
>
> the sanest way is to disable acpi for the affected laptops
> and push a newer linux in a point release.

Do you have a patch which does that? If that would exist, I might
reconsider my position.

> playing with acpi fire is not appropriate for a stable release.

It's all about cost/benefit analysis. In my eyes the benefits of
introducing these patches significantly outweigh the possible
problems, given the proper testing.

Best regards,
--
Jurij Smakov                                           [hidden email]
Key: http://www.wooyd.org/pgpkey/                      KeyID: C99E03CC



Bug#404143: Fans unreliable under load, permanent memory leak

Maximilian Attems-3
On Tue, Dec 26, 2006 at 06:52:06PM -0800, Jurij Smakov wrote:
>  
> > backports are risky, again as you see for the net-r8169-1.patch,
> > that is a "localized" driver enhancement with big slow down consequences
> > #400524 and #403782. yes upstream has a fix for that and it should
> > land soon, but still no one else bothered yet.
>
> That's because slower networking will not break your hardware.

why was that fact never rc for sarge?
#259481, #262383
 
> > the acpi patches may solve the troubles with those stupid HP laptops,
> > but they have _certainly_ side effects.
> > if you look at the acpi commits of this day you see that they broke
> > a toshiba laptop.
>
> Do you have a reference to that? And we do have a possibility to test
> the changes pretty extensively by uploading to unstable plus
> specifically asking people to test.

the dsdt of those hp notebooks is quite strange,
if you follow mjg59's posts you will have read a funny story:
http://mjg59.livejournal.com/67443.html

the reference is easily readable in the git-commits-mail,
if you are interested in a 2006 tarball, i can send it.

check b976fe19acc565e5137e6f12af7b6633a23e6b7c
it reverts your proposed patch.
 
> > and push a newer linux in a point release.
>
> Do you have a patch which does that? If that would exist, I might
> reconsider my position.
 
no that is a release manager position. ;)
but i assume you mean a patch for drivers/acpi/blacklist.c
that should be fairly easy to create once we get dmidecode
output of the bug reporter.

fully untested:

diff --git a/drivers/acpi/blacklist.c b/drivers/acpi/blacklist.c
index f9c972b..669d81d 100644
--- a/drivers/acpi/blacklist.c
+++ b/drivers/acpi/blacklist.c
@@ -69,6 +69,9 @@ static struct acpi_blacklist_item acpi_blacklist[] __initdata = {
  "Incorrect _ADR", 1},
  {"ASUS\0\0", "P2B-S   ", 0, ACPI_DSDT, all_versions,
  "Bogus PCI routing", 1},
+ /* HP nx6125 */
+ {"Hewlett-Packard ", "68DTT Ver. F.0", 0xE0000, ACPI_DSDT, all_versions,
+ "Bogus fan support", 1},
 
  {""}
 };
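
For readers unfamiliar with this kind of table, a rough sketch of how a
lookup against it could work is shown below. It is a simplified userspace
illustration, not the kernel's actual code: the struct, the table and the
blacklist_reason() helper are hypothetical, and only the ASUS entry is
taken from the diff context above. It assumes matching on the fixed-width
fields of the ACPI table header (a 6-byte OEM ID and an 8-byte OEM table
ID), so longer identification strings such as DMI vendor or BIOS version
strings would need a different matching mechanism.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a blacklist entry. */
struct blacklist_item {
    char oem_id[7];       /* 6-byte OEM ID plus terminating NUL */
    char oem_table_id[9]; /* 8-byte OEM table ID plus terminating NUL */
    const char *reason;
};

static const struct blacklist_item blacklist[] = {
    {"ASUS\0\0", "P2B-S   ", "Bogus PCI routing"},
    {"", "", NULL} /* terminator: empty OEM ID */
};

/* Return the reason string if the header fields match an entry, else NULL.
 * The header fields are fixed-width and need not be NUL-terminated, so
 * they are compared with memcmp() over their full width. */
const char *blacklist_reason(const char *hdr_oem_id,
                             const char *hdr_oem_table_id)
{
    assert(hdr_oem_id != NULL && hdr_oem_table_id != NULL);
    for (size_t i = 0; blacklist[i].oem_id[0] != '\0'; i++) {
        if (memcmp(blacklist[i].oem_id, hdr_oem_id, 6) == 0 &&
            memcmp(blacklist[i].oem_table_id, hdr_oem_table_id, 8) == 0)
            return blacklist[i].reason;
    }
    return NULL;
}
```

A caller would pass the two fields read from the DSDT header and disable
ACPI when a non-NULL reason comes back.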

> > playing with acpi fire is not appropriate for a stable release.
>
> It's all about cost/benefit analysis. In my eyes the benefits of
> introducing these patches significantly outweighs the possible
> problems, given the proper testing.

fully agreed.
the cost analysis of acpi patches seems quite high,
that's why we currently have the policy not to add any.
i hate to do name dropping, but that goes back to hch.

best regards

--
maks



Bug#404143: Fans unreliable under load, permanent memory leak

Jurij Smakov
In reply to this post by Jurij Smakov
On Tue, Dec 26, 2006 at 06:09:02PM -0800, Jurij Smakov wrote:

> So far I have not tried building the kernel with this patches, but I think this is
> a reasonable way to resolve the problem, as the resulting cumulative patch (attached)
> is only 19K.

Sorry, I made this patch reversed by mistake. Use the one attached to
this message, or apply the old one with 'patch -R' :-P
--
Jurij Smakov                                           [hidden email]
Key: http://www.wooyd.org/pgpkey/                      KeyID: C99E03CC

cumulative.patch (19K) Download Attachment

Bug#404143: Fans unreliable under load, permanent memory leak

Jurij Smakov
In reply to this post by Maximilian Attems-3
On Wed, Dec 27, 2006 at 04:22:45AM +0100, maximilian attems wrote:

> On Tue, Dec 26, 2006 at 06:52:06PM -0800, Jurij Smakov wrote:
> >  
> > > backports are risky, again as you see for the net-r8169-1.patch,
> > > that is a "localized" driver enhancement with big slow down consequences
> > > #400524 and #403782. yes upstream has a fix for that and it should
> > > land soon, but still no one else bothered yet.
> >
> > That's because slower networking will not break your hardware.
>
> why was that fact never rc for sarge?
> #259481, #262383

Discussing why it was not RC for Sarge seems pretty irrelevant to me.
It's up to release managers what is RC, and Etch release managers have
stated repeatedly that this issue is RC. I happen to agree with their
position.

> the dsdt of those hp notebooks is quite strange,
> if you follow mjg59 posts you have read a funny story:
> http://mjg59.livejournal.com/67443.html
>
> the reference is easily readable in the git-commits-mail,
> if you interested in a 2006 tarball, i can send it.
>
> check b976fe19acc565e5137e6f12af7b6633a23e6b7c
> it reverts your proposed patch.

From the comments in patch #9746:

 First attempt to create a new thread was done by Peter Wainwright
 He created a bunch of threads, which were stealing work from a kacpid workqueue.
 This patch appeared in 2.6.15 kernel shipped with Ubuntu 6.06 LTS.

 Second attempt was done by me, I created a new thread for each Notify
 event. This worked OK on HP nx machines, but broke Linus' Compaq
 n620c, by producing threads with a speed what they stopped the machine
 completely. Thus this patch was reverted from 18-rc2 as I remember.
 I re-made the patch to create second workqueue just for notify events,
 thus hopping it will not break Linus' machine. Patch was tested on the
 same HP nx machines in #5534 and #7122, but I did not received reply
 from Linus on a test patch sent to him.
 Patch went to 19-rc and was rejected with much fanfare again.
 There was 4th patch, which inserted schedule_timeout(1) into deferred
 execution of kacpid, if we had any notify requests pending, but Linus
 decided that it was too complex (involved either changes to workqueue
 to see if it's empty or atomic inc/dec).
 Now you see last variant which adds yield() to every GPE execution.
 http://bugzilla.kernel.org/show_bug.cgi?id=5534

Obviously, this version of the patch is not the one which was
reverted. It has already gone through some pretty stringent review and
incremental improvement.

> fully agreed.
> the cost analysis of acpi patches seems quite high,
> that's why we currently have the policy not to add any.
> i hate to do name dropping, but that goes back to hch.

I'm not aware of any such policy. We have backported a fair amount of
fixes from newer upstream releases, I don't see what qualifies ACPI as
some magic which should not be touched.
--
Jurij Smakov                                           [hidden email]
Key: http://www.wooyd.org/pgpkey/                      KeyID: C99E03CC



Bug#404143: Fans unreliable under load, permanent memory leak

Maximilian Attems-3
On Tue, 26 Dec 2006, Jurij Smakov wrote:

> On Wed, Dec 27, 2006 at 04:22:45AM +0100, maximilian attems wrote:
<snipp>
> > why was that fact never rc for sarge?
> > #259481, #262383
>
> Discussing why it was not RC for Sarge seems pretty irrelevant to me.
> It's up to release managers what is RC, and Etch release managers have
> stated repeatedly that this issue is RC. I happen to agree with their
> position.

a broken dsdt is a vendor fault.

for sarge the affected range was across all boxes,
here this affects 2 specific hp laptop models.
 

> > the dsdt of those hp notebooks is quite strange,
> > if you follow mjg59 posts you have read a funny story:
> > http://mjg59.livejournal.com/67443.html
> >
> > the reference is easily readable in the git-commits-mail,
> > if you interested in a 2006 tarball, i can send it.
> >
> > check b976fe19acc565e5137e6f12af7b6633a23e6b7c
> > it reverts your proposed patch.
>
> >From the comments in patch #9746:
>
>  First attempt to create a new thread was done by Peter Wainwright
>  He created a bunch of threads, which were stealing work from a kacpid workqueue.
>  This patch appeared in 2.6.15 kernel shipped with Ubuntu 6.06 LTS.
>
>  Second attempt was done by me, I created a new thread for each Notify
>  event. This worked OK on HP nx machines, but broke Linus' Compaq
>  n620c, by producing threads at such a speed that they stopped the machine
>  completely. Thus this patch was reverted from 18-rc2 as I remember.
>  I re-made the patch to create a second workqueue just for notify events,
>  thus hoping it will not break Linus' machine. Patch was tested on the
>  same HP nx machines in #5534 and #7122, but I did not receive a reply
>  from Linus on a test patch sent to him.
>  Patch went to 19-rc and was rejected with much fanfare again.
>  There was 4th patch, which inserted schedule_timeout(1) into deferred
>  execution of kacpid, if we had any notify requests pending, but Linus
>  decided that it was too complex (involved either changes to workqueue
>  to see if it's empty or atomic inc/dec).
>  Now you see last variant which adds yield() to every GPE execution.
>  http://bugzilla.kernel.org/show_bug.cgi?id=5534
>
> Obviously, this version of the patch is not the one which was
> reverted. It has already gone through some pretty stringent review and
> incremental improvement.

again i'm highly skeptical about the patch quality.
the semantics of yield() changed fundamentally from 2.4 to 2.6.
afaik only b0rked code in 2.6 needs yield().
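
For readers following the patch history quoted above, here is a small user-space model of the "second workqueue just for notify events" variant. It is only a sketch under a stated assumption: that the stall comes from a handler waiting on work that is queued behind it on the same single-threaded queue. The thread does not spell out the mechanism, and none of the names below (worker, run_gpe, the timeout value) come from the kernel; it says nothing about the yield() variant either.

```python
import queue
import threading


def worker(q):
    """Run queued callables one at a time until a None sentinel arrives."""
    while True:
        job = q.get()
        if job is None:
            return
        job()


def run_gpe(separate_notify_queue, timeout=0.2):
    """Return True if the simulated GPE handler saw its Notify handled in time.

    Assumption (not taken from the thread): the GPE handler queues a Notify
    handler and then waits for it to complete.
    """
    gpe_q = queue.Queue()
    # With a single queue, the Notify handler lands behind the running GPE job.
    notify_q = queue.Queue() if separate_notify_queue else gpe_q
    handled = threading.Event()
    finished = threading.Event()
    outcome = []

    def gpe_handler():
        notify_q.put(handled.set)              # queue the Notify handler
        outcome.append(handled.wait(timeout))  # wait for it, give up on timeout
        finished.set()

    queues = [gpe_q, notify_q] if separate_notify_queue else [gpe_q]
    threads = [threading.Thread(target=worker, args=(q,)) for q in queues]
    for t in threads:
        t.start()

    gpe_q.put(gpe_handler)
    finished.wait()

    # Shut the workers down once the GPE handler has returned.
    for q in queues:
        q.put(None)
    for t in threads:
        t.join()
    return outcome[0]
```

In this model run_gpe(False) times out, because the only worker is still busy with the GPE job when the Notify handler is queued, while run_gpe(True) completes because a second worker drains the notify queue independently.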
 

> > fully agreed.
> > the cost analysis of acpi patches seems quite high,
> > that's why we currently have the policy not to add any.
> > i hate to do name dropping, but that goes back to hch.
>
> I'm not aware of any such policy. We have backported a fair amount of
> fixes from newer upstream releases, I don't see what qualifies ACPI as
> some magic which should not be touched.
> --
> Jurij Smakov                                           [hidden email]
> Key: http://www.wooyd.org/pgpkey/                      KeyID: C99E03CC

the high risk of unwanted/unrelated side effects of the acpi subsys.

--
maks



Bug#404143: Fans unreliable under load, permanent memory leak

Steve Langasek
In reply to this post by Jurij Smakov
On Tue, Dec 26, 2006 at 06:52:06PM -0800, Jurij Smakov wrote:
> On Wed, Dec 27, 2006 at 03:40:58AM +0100, maximilian attems wrote:

> > > I have reviewed the information available on the thermal problems with
> > > HP laptops, and it appears that there is a fairly conservative set of
> > > patches which takes care of the problems (thanks to Bas for pointing
> > > most of them out). I might have missed some upstream bugs, so please
> > > let me know if there is anything else available on the issue. Below is
> > > the summary, describing the relevant patches:

> > i nack the mentioned patches!

> Well, that's one in favor and one vote against then.

I'm going to have to side with maks on this.  The last thing we need at this
point of the release is a complex backported patch, targeted or not, that's
going to require a lot of third-party testing before we can even establish
whether it's caused regressions for other systems.

I think that leaves the best option as ACPI blacklisting, in the kernel, for
those models known to have problems.  I think this is strictly better than
trying to have the kernel give a warning when it detects such a model; it's
more likely to reach the target audience than a note in the release notes;
and it's far less of a support burden overall than trying to add a
special 2.6.19 kernel and pretend that support for it could be at all
comparable to that of the main kernel for the release.
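
To make the blacklisting idea concrete, here is a rough user-space sketch of matching a machine against a list of known-bad models by its DMI vendor and product strings. It assumes substring matching on those two fields; the entry shown and the vendor spelling are placeholders for illustration, not the list anyone in this thread has agreed on, and the names (BlacklistEntry, ACPI_BLACKLIST, acpi_blacklisted) are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlacklistEntry:
    vendor: str   # substring expected in the DMI system vendor
    product: str  # substring expected in the DMI product name


# Hypothetical entries, for illustration only. The real list would have to
# come from dmidecode output collected on the affected machines.
ACPI_BLACKLIST = (
    BlacklistEntry(vendor="Hewlett-Packard", product="nx6325"),
)


def acpi_blacklisted(sys_vendor, product_name, blacklist=ACPI_BLACKLIST):
    """Return True if both DMI strings match any blacklist entry."""
    return any(
        entry.vendor in sys_vendor and entry.product in product_name
        for entry in blacklist
    )
```

A caller would feed in the vendor and product name strings reported for the machine and disable ACPI only when acpi_blacklisted() returns True, so that all other systems keep their current behaviour.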

Cheers,
--
Steve Langasek                   Give me a lever long enough and a Free OS
Debian Developer                   to set it on, and I can move the world.
[hidden email]                                   http://www.debian.org/



Bug#404143: Fans unreliable under load, permanent memory leak

Maximilian Attems-3
In reply to this post by Marc 'HE' Brockschmidt-3
hello,

On Fri, 22 Dec 2006, Marc 'HE' Brockschmidt wrote:

> [hidden email] writes:
> > I'm more than willing to help test a kernel package, but I'll be on
> > [VAC] from 2006-12-23 to 2007-01-03 inclusive.  So, please do not
> > release Etch just now :)
>
> I have ordered an nx6325, which should arrive directly after
> Christmas. I would also be happy to test a fixed kernel. Due to this
> being an overheating problem, I would prefer if you could provide kernel
> images, so that I don't have to compile it.
>
> Marc
> --
> BOFH #34:
> (l)user error

could you please send in the output of:
dmidecode
acpidump

thanks

--
maks

