Re: "external abort on linefetch (0x814)" on Kirkwood 6282 SoC


Ian Campbell
(Adding debian-kernel. Context: external aborts on QNAP/Marvell systems
with 1G of RAM, avoided with CONFIG_VMSPLIT_3G_OPT=y.)

On Sat, 2018-06-02 at 21:31 +0200, Andrew Lunn wrote:

> On Sat, Jun 02, 2018 at 09:48:47PM +0300, Timo Jyrinki wrote:
> > 2018-06-02 18:55 GMT+03:00 Ian Campbell <[hidden email]>:
> > > You need to append a dtb and then encode in u-boot's uImage format.
> > > e.g.
> > >
> > >    cat arch/arm/boot/zImage arch/arm/boot/dts/kirkwood-ts419-6281.dtb > x
> > >    sudo mkimage -A arm -T kernel -O linux -C none -a 0x8000 -e 0x8000 -d x uImage
> >
> > Thank you! Now it's all coming back to me, I'm not sure if I've played
> > with these since Neo FreeRunner times.
> >
> > So the good news is that with this kernel
> > kernel-kirkwood-ts219-6282-split3gopt from
> > https://people.debian.org/~timo/qnap/ (initrd from
> > http://ftp.debian.org/debian/dists/stretch/main/installer-armel/current/images/kirkwood/network-console/qnap/ts-21x/)
> > I'm getting full 1GB RAM without the errors!
>
> Cool. Thanks for testing.
>
> Now, the question is, is this an O.K. workaround?

Hard to say for sure. IIRC the downside of the VMSPLIT_3G_OPT
workaround is a slightly smaller virtual address space (from 3G down to
2.75G) for the userspace part of a process, which would mean that
applications which really needed the full space would suffer.
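
As a rough illustration of the 3G versus 2.75G figures above, the sketch below derives the user/kernel sizes from the PAGE_OFFSET value that each 32-bit ARM VMSPLIT_* option selects. The helper name split_sizes is made up for this example, and the real usable userspace limit (TASK_SIZE) sits slightly below PAGE_OFFSET, so treat the numbers as approximate.

```python
# Sketch only: sizes implied by PAGE_OFFSET on 32-bit ARM.
# split_sizes() is a hypothetical helper, not a kernel interface.
GIB = 1 << 30
ADDRESS_SPACE = 1 << 32  # 4G of virtual address space on a 32-bit CPU

# PAGE_OFFSET defaults selected by the VMSPLIT_* options in arch/arm/Kconfig.
PAGE_OFFSET = {
    "VMSPLIT_3G": 0xC0000000,
    "VMSPLIT_3G_OPT": 0xB0000000,
    "VMSPLIT_2G": 0x80000000,
    "VMSPLIT_1G": 0x40000000,
}


def split_sizes(page_offset):
    """Return (user_bytes, kernel_bytes) for a given PAGE_OFFSET."""
    return page_offset, ADDRESS_SPACE - page_offset


for name, offset in PAGE_OFFSET.items():
    user, kernel = split_sizes(offset)
    print(f"{name:15s} user {user / GIB:.2f}G  kernel {kernel / GIB:.2f}G")
```

With VMSPLIT_3G_OPT the kernel side grows from 1G to 1.25G, which is the room that lets the whole 1G of RAM be mapped directly.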

There are some use cases which need this; linking large packages comes
immediately to mind, but I don't think Debian runs any armel buildds
on armel hardware (they run as chroots on armhf systems).

With only 1G of physical RAM anything using the full 3G would be
already so far into swapping hell that it seems like it would be pretty
unusable. So maybe we can assert that it is unlikely that there is any
real world usage that would be impacted by this change.

The only other things which come to mind are applications which require
a full 3G of address space but which don't populate it all with RAM
somehow (very sparse layouts for dynamic languages, perhaps?) or which
are simply buggy with the smaller size (I don't know if there are
precedents on other archs or other arm flavours for this). These seem
unlikely to me, but frankly I'm basing that on no data at all.

Debian uses a Marvell specific kernel, so we don't need to worry about
the impact on other platforms.

> Or do we need to figure out why highmem breaks on Kirkwood?

I guess it would be nice from an upstream PoV to know what was going on
-- in particular in case there were to be other more subtle side
effects or corruption possible.

Ian.


Andrew Lunn
> With only 1G of physical RAM anything using the full 3G would be
> already so far into swapping hell that it seems like it would be pretty
> unusable. So maybe we can assert that it is unlikely that there is any
> real world usage that would be impacted by this change.

Hi Ian

That was what i was thinking. In theory, one of the kirkwood SoCs can
have 2GB of RAM. But i've not seen many 1G machines, let alone 2G.

> Debian uses a Marvell specific kernel, so we don't need to worry about
> the impact on other platforms.

That i was not sure about. Are there any plans to merge all ARM v5
kernels together? Then this would affect more machines.

> > Or do we need to figure out why highmem breaks on Kirkwood?
>
> I guess it would be nice from an upstream PoV to know what was going on
> -- in particular in case there were to be other more subtle side
> effects or corruption possible.

I might be able to hack together a 3.5/0.5G split, so forcing some of
the 512MB of RAM i have in my Kirkwood into highmem. Hopefully i can
then reproduce the issue.

     Andrew
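
To make the idea behind the 3.5/0.5G experiment concrete, here is a rough sketch of how much RAM would land in highmem for a given split. It assumes about 256M of kernel address space is set aside for vmalloc and other fixed mappings (the real figure depends on kernel version and configuration), and it uses a hypothetical PAGE_OFFSET of 0xE0000000 for the 3.5/0.5G split, which is not a standard Kconfig choice. The name lowmem_highmem is invented for this example.

```python
# Sketch only: estimate lowmem/highmem for a RAM size and PAGE_OFFSET.
# ASSUMED_RESERVE is an assumption, not a value read from any kernel.
MIB = 1 << 20
ADDRESS_SPACE = 1 << 32
ASSUMED_RESERVE = 256 * MIB  # assumed vmalloc + fixed mappings


def lowmem_highmem(ram_bytes, page_offset, reserve=ASSUMED_RESERVE):
    """Return (lowmem_bytes, highmem_bytes) under the stated assumptions."""
    kernel_space = ADDRESS_SPACE - page_offset
    # RAM that fits in the kernel's direct mapping is lowmem.
    lowmem_limit = max(kernel_space - reserve, 0)
    lowmem = min(ram_bytes, lowmem_limit)
    return lowmem, ram_bytes - lowmem


cases = [
    ("1G RAM, VMSPLIT_3G", 1024 * MIB, 0xC0000000),
    ("1G RAM, VMSPLIT_3G_OPT", 1024 * MIB, 0xB0000000),
    ("512M RAM, hypothetical 3.5/0.5G", 512 * MIB, 0xE0000000),
]
for label, ram, offset in cases:
    low, high = lowmem_highmem(ram, offset)
    print(f"{label}: lowmem {low // MIB}M, highmem {high // MIB}M")
```

Under these assumptions a 512M board on a 3.5/0.5G split ends up with a similar amount of highmem to a 1G board on the default 3G/1G split, which is why the hack should exercise the same code paths.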


Ian Campbell
On Sat, 2018-06-09 at 16:23 +0200, Andrew Lunn wrote:
> > Debian uses a Marvell specific kernel, so we don't need to worry about
> > the impact on other platforms.
>
> That i was not sure about. Are there any plans to merge all ARM v5
> kernels together?

Not AFAIK, marvell is the only armv5 flavour left in Debian and armel
is well past the point where more are likely to be added.

> > > Or do we need to figure out why highmem breaks on Kirkwood?
> >
> > I guess it would be nice from an upstream PoV to know what was going on
> > -- in particular in case there were to be other more subtle side
> > effects or corruption possible.
>
> I might be able to hack together a 3.5/0.5G split, so forcing some of
> the 512MB of RAM i have in my Kirkwood into highmem. Hopefully i can
> then reproduce the issue.

A 3.5/0.5 split is a good idea, hadn't occurred to me. None of my QNAP
boxes have more than 512M either.

Ian.


Martin Michlmayr
* Damien <[hidden email]> [2018-07-03 22:33]:
> Is there any plan to have this fixed kernel in Debian mainstream, or in a
> dpkg ?

I think we haven't quite established what the best course of action
is:

1) The config option change works, but some networking issues were
mentioned.  Someone needs to figure out whether that's related.

2) Andrew managed to reproduce the issue, so there's hope a real fix
will be found.  But maybe I'm getting my hopes up too high ;)

--
Martin Michlmayr
https://www.cyrius.com/


Andrew Lunn
On Tue, Jul 03, 2018 at 11:27:32PM +0200, Martin Michlmayr wrote:
> * Damien <[hidden email]> [2018-07-03 22:33]:
> > Is there any plan to have this fixed kernel in Debian mainstream, or in a
> > dpkg ?
>
> I think we haven't quite established what the best course of action
> is:
>
> 1) The config option change works, but some networking issues were
> mentioned.  Someone needs to figure out whether that's related.

I would be interested in knowing what the network issues were? They
might be a pointer to what is going wrong with high pages.

>
> 2) Andrew managed to reproduce the issue, so there's hope a real fix
> will be found.  But maybe I'm getting my hopes up too high ;)

I can reproduce it. But none of the kernel debug tools helped me get
any further. I think the next step is to explain the problem to
Russell King and see if he has any ideas.

        Andrew


Martin Michlmayr
* Andrew Lunn <[hidden email]> [2018-07-04 20:48]:
> > 1) The config option change works, but some networking issues were
> > mentioned.  Someone needs to figure out whether that's related.
>
> I would be interested in knowing what the network issues were? They
> might be a pointer to what is going wrong with high pages.

Copying Timo Jyrinki.
--
Martin Michlmayr
https://www.cyrius.com/


Timo Jyrinki
2018-07-04 21:48 GMT+03:00 Andrew Lunn <[hidden email]>:
>> 1) The config option change works, but some networking issues were
>> mentioned.  Someone needs to figure out whether that's related.
>
> I would be interested in knowing what the network issues were? They
> might be a pointer to what is going wrong with high pages.

"Network doesn't work". I'm not sure what's going on, but the
installer isn't able to enable the network, even though the network
device exists and can be configured. Cable is connected similarly to
normal operation and lights are blinking (and obviously the system was
just booted with TFTP from u-boot too).

I tried again just now, this time with the "zImage_new" which was the
one recompiled natively. It didn't make a difference, as the symptoms
seemed similar, so I put some logs (manually redacted slightly to remove
possible unique identifiers) at:
https://people.debian.org/~timo/qnap/split3gopt-logs/

Syslog shows both the installer and me trying to get some life into the
network. I tried setting the IP and default route manually and pinging
the router, but got nothing.

Adding to my earlier instructions: anyone who wants to test the kernels
I built now needs to fetch the older initrd to go along with them from:
http://snapshot.debian.org/archive/debian/20180605T102632Z/dists/stretch/main/installer-armel/20170615%2Bdeb9u3/images/kirkwood/network-console/qnap/ts-21x/initrd

-Timo