Re: "external abort on linefetch (0x814)" on Kirkwood 6282 SoC


Re: "external abort on linefetch (0x814)" on Kirkwood 6282 SoC

Andrew Lunn-2
On Sun, Jul 23, 2017 at 10:25:41AM +0100, Ian Campbell wrote:

> Hello kirkwood folks,
>
> We have been seeing reports on the Debian arm list about
> instability/errors running Debian Stretch (4.9 based) on
> various Kirkwood 6282 based QNAP systems. Errors are things like [0,
> actually one of the earlier pre-4.9 reports, same symptoms as with 4.9
> though]:
>
> [   37.167103] BUG: Bad rss-counter state mm:c0caa1e0 idx:1 val:1
> [  783.570365] BUG: Bad rss-counter state mm:c09e6220 idx:1 val:1
> [  800.172223] BUG: Bad rss-counter state mm:ecbc05e0 idx:1 val:1
> [  829.005336] BUG: Bad rss-counter state mm:c0d4b880 idx:1 val:1
> [  871.773956] BUG: Bad rss-counter state mm:c09e63c0 idx:1 val:1
> [ 1299.565344] BUG: Bad rss-counter state mm:ecaf8c40 idx:1 val:1
>
> and
>
> [   71.033784] Unhandled fault: external abort on linefetch (0x014) at 0xb6c73db0
> [   71.041037] pgd = ead9c000
> [   71.043747] [b6c73db0] *pgd=3fd72831
> [   84.144056] Unhandled fault: external abort on linefetch (0x014) at 0xb6d44db0
> [...]

Hi Ian

I have a 6282 system i can try to reproduce this on. It will probably
be a few days before i get around to it.

   Andrew


Re: "external abort on linefetch (0x814)" on Kirkwood 6282 SoC

Ian Campbell-2
On Wed, 2017-07-26 at 17:22 +0200, Andrew Lunn wrote:
> I have a 6282 system i can try to reproduce this on. It will probably
> be a few days before i get around to it.

Thanks!

For some reason my original mail never made it to debian-arm or linux-
arm-kernel, suspiciously the mail which I attached _also_ doesn't
appear in the archives. I suspect something has decided (false +ve)
that it was spam or a virus or something and blocked it.

FTR below is the full text of my original mail. I'd attach boot-7.log
as well but I worry it might get nobbled again, let me know if anyone
wants it...

Ian.

Hello kirkwood folks,

We have been seeing reports on the Debian arm list about
instability/errors running Debian Stretch (4.9 based) on
various Kirkwood 6282 based QNAP systems. Errors are things like [0,
actually one of the earlier pre-4.9 reports, same symptoms as with 4.9
though]:

[   37.167103] BUG: Bad rss-counter state mm:c0caa1e0 idx:1 val:1
[  783.570365] BUG: Bad rss-counter state mm:c09e6220 idx:1 val:1
[  800.172223] BUG: Bad rss-counter state mm:ecbc05e0 idx:1 val:1
[  829.005336] BUG: Bad rss-counter state mm:c0d4b880 idx:1 val:1
[  871.773956] BUG: Bad rss-counter state mm:c09e63c0 idx:1 val:1
[ 1299.565344] BUG: Bad rss-counter state mm:ecaf8c40 idx:1 val:1

and

[   71.033784] Unhandled fault: external abort on linefetch (0x014) at 0xb6c73db0
[   71.041037] pgd = ead9c000
[   71.043747] [b6c73db0] *pgd=3fd72831
[   84.144056] Unhandled fault: external abort on linefetch (0x014) at 0xb6d44db0
[...]

Many of the affected systems were running Debian Jessie (3.16 based)
fine (as is my own 6282 based system). Some reports have been on
intermediate kernels during the Stretch development cycle, it appears
(again from [0]) that 4.3 was ok but 4.7 was not.

From the reports it seems that 6281 SoCs are not affected, I only have
a spare 6281 to test on and can confirm that it appears to be fine when
running 4.9.

Some other reports:
https://lists.debian.org/debian-arm/2017/04/msg00056.html
  (might have been an unrelated failing disk though?)
https://lists.debian.org/debian-arm/2017/07/msg00010.html 
  which also includes a "corrupted status flag!!: 0" message making me
  wonder about possible RAM issues.
https://lists.debian.org/debian-arm/2017/07/msg00011.html
  Rob, author of [0], confirming 6281 is ok.
- The attached mail (which was copied to debian-arm but didn't make it
  to the list archives for some reason, so I think it is ok to share)
  has the results of various experiments by Rob (of [0] fame),
  including boot-7.log, which is a full log with the error occurring.

I've had a look through the kernel git logs, both in the 4.3..4.7 range
for possible culprits and in the 4.9..now range for possible fixes but
couldn't spot anything obvious (I didn't spot very much at all touching
these processors, mostly it looks like changes for the newer Armada
platforms).

I'm afraid I've not been able to find someone to try with a newer
kernel, for my part my only 6282 based system is in "production" as
storage for a mythtv setup so it is tricky to experiment with.

Any ideas what may be going on here?

Cheers,
Ian.

[0] https://lists.debian.org/debian-arm/2016/10/msg00041.html


Re: "external abort on linefetch (0x814)" on Kirkwood 6282 SoC

Andrew Lunn-2
On Wed, Jul 26, 2017 at 05:18:05PM +0100, Ian Campbell wrote:
> On Wed, 2017-07-26 at 17:22 +0200, Andrew Lunn wrote:
> > I have a 6282 system i can try to reproduce this on. It will probably
> > be a few days before i get around to it.
>
> Thanks!
>
> For some reason my original mail never made it to debian-arm or linux-
> arm-kernel, suspiciously the mail which I attached _also_ doesn't
> appear in the archives.

I suspect it is because you used attachments. They are frowned upon.

  Andrew


Re: "external abort on linefetch (0x814)" on Kirkwood 6282 SoC

Ian Campbell-2
On Wed, 2017-07-26 at 19:55 +0200, Andrew Lunn wrote:

> On Wed, Jul 26, 2017 at 05:18:05PM +0100, Ian Campbell wrote:
> > On Wed, 2017-07-26 at 17:22 +0200, Andrew Lunn wrote:
> > > I have a 6282 system i can try to reproduce this on. It will probably
> > > be a few days before i get around to it.
> >
> > Thanks!
> >
> > For some reason my original mail never made it to debian-arm or
> > linux-arm-kernel, suspiciously the mail which I attached _also_ doesn't
> > appear in the archives.
>
> I suspect it is because you used attachments. They are frowned upon.

Ah yes, that might explain it, I remember now that l-a-k frowns on
them. debian-arm is generally ok with them, but perhaps they were too
big in this case.

Thanks for the tip!

Ian.


Re: "external abort on linefetch (0x814)" on Kirkwood 6282 SoC

Andrew Lunn-2
In reply to this post by Ian Campbell-2
> Hello kirkwood folks,
>
> We have been seeing reports on the Debian arm list about
> instability/errors running Debian Stretch (4.9 based) on
> various Kirkwood 6282 based QNAP systems. Errors are things like [0,
> actually one of the earlier pre-4.9 reports, same symptoms as with 4.9
> though]:
>
> [   37.167103] BUG: Bad rss-counter state mm:c0caa1e0 idx:1 val:1
> [  783.570365] BUG: Bad rss-counter state mm:c09e6220 idx:1 val:1
> [  800.172223] BUG: Bad rss-counter state mm:ecbc05e0 idx:1 val:1
> [  829.005336] BUG: Bad rss-counter state mm:c0d4b880 idx:1 val:1
> [  871.773956] BUG: Bad rss-counter state mm:c09e63c0 idx:1 val:1
> [ 1299.565344] BUG: Bad rss-counter state mm:ecaf8c40 idx:1 val:1
>
> and
>
> [   71.033784] Unhandled fault: external abort on linefetch (0x014) at 0xb6c73db0
> [   71.041037] pgd = ead9c000
> [   71.043747] [b6c73db0] *pgd=3fd72831
> [   84.144056] Unhandled fault: external abort on linefetch (0x014) at 0xb6d44db0
> [...]

So far, i've not been able to reproduce this. I have 6282 based QNAP
NAS box, with a single disk. Since this is a kernel hacking box, i
tftpboot and don't use an initrd. I've been using the
mvebu_v5_defconfig kernel configuration and i have tried v4.13-rc2,
v4.12, v4.10.0 and v3.9.30. And i have sid for user space.

       Andrew


Re: "external abort on linefetch (0x814)" on Kirkwood 6282 SoC

Rob J. Epping-3
Hi Andrew and list,

On 28-07-17 18:33, Andrew Lunn wrote:
> So far, i've not been able to reproduce this. I have 6282 based QNAP
> NAS box, with a single disk. Since this is a kernel hacking box, i
> tftpboot and don't use an initrd. I've been using the
> mvebu_v5_defconfig kernel configuration and i have tried v4.13-rc2,
> v4.12, v4.10.0 and v3.9.30. And i have sid for user space.

I'm one of the persons that reported the issue.

I have both a 6281 and a 6282 based device. The 6281 based device (QNAP
TS-219) is on Debian stretch as of July 25th with kernel
linux-image-4.9.0-3-marvell 4.9.30-2+deb9u2 and initramfs modules set to
most. The 6282 based device (QNAP TS-221) is stuck on jessie with kernel
linux-image-4.3.0-0.bpo.1-kirkwood 4.3.5-1~bpo8+1.
Both devices are in use for personal use, so when the OS is up and
running there are processes active causing network and disk activity.

The way I test is by creating an initrd and vmlinuz from the 6281 device
for the 6282 device using the attached script putting the files on a FAT
based USB key mounted under /mnt and booting with the u-boot commands
printed. The command is not complete, it is missing the USB init parts.
Then I move the USB key and the disks over to the 6282 based device and
boot with the vmlinuz and initrd from the USB key.

Last test was done by installing (but not flashing) the same kernel
image on both systems and just moving initrd and vmlinuz over with the
USB key.

As you can see from the script I did try TFTP booting as well. I do
recall having the issues then as well, though it has been a while. Would
it be possible for you to try with a USB key?

Also are you just booting the kernel or are there processes active?
I did notice the last time the system felt sluggish but it took a while
for error messages to appear.

>        Andrew

GRTNX,
RobJE

mk-tftp-breis (1K) Download Attachment

Re: "external abort on linefetch (0x814)" on Kirkwood 6282 SoC

Andrew Lunn-2
In reply to this post by Andrew Lunn-2
On Sun, Jul 23, 2017 at 10:25:41AM +0100, Ian Campbell wrote:

> Hello kirkwood folks,
>
> We have been seeing reports on the Debian arm list about
> instability/errors running Debian Stretch (4.9 based) on
> various Kirkwood 6282 based QNAP systems. Errors are things like [0,
> actually one of the earlier pre-4.9 reports, same symptoms as with 4.9
> though]:
>
> [   37.167103] BUG: Bad rss-counter state mm:c0caa1e0 idx:1 val:1
> [  783.570365] BUG: Bad rss-counter state mm:c09e6220 idx:1 val:1
> [  800.172223] BUG: Bad rss-counter state mm:ecbc05e0 idx:1 val:1
> [  829.005336] BUG: Bad rss-counter state mm:c0d4b880 idx:1 val:1
> [  871.773956] BUG: Bad rss-counter state mm:c09e63c0 idx:1 val:1
> [ 1299.565344] BUG: Bad rss-counter state mm:ecaf8c40 idx:1 val:1
>
> and
>
> [   71.033784] Unhandled fault: external abort on linefetch (0x014) at 0xb6c73db0
> [   71.041037] pgd = ead9c000
> [   71.043747] [b6c73db0] *pgd=3fd72831
> [   84.144056] Unhandled fault: external abort on linefetch (0x014) at 0xb6d44db0
> [...]

I've now tried the debian kernel configuration from sid which is for
4.11. That also has not provoked the issue.

So i'm thinking this has to be related to bits of hardware i'm not
using. I don't have anything on the PCIe bus, i don't have any USB
devices plugged in, i don't use the mtd devices, etc.

Could somebody who does have the issue describe their system? Could
they pull out all their USB devices and see if that stops the
issues. Remove the driver for PCIe devices, if possible.

 Andrew


Re: "external abort on linefetch (0x814)" on Kirkwood 6282 SoC

R Epping - debian-arm
On 29-07-17 17:50, Andrew Lunn wrote:
> So i'm thinking this has to be related to bits of hardware i'm not
> using. I don't have anything on the PCIe bus, i don't have any USB
> devices plugged in, i don't use the mtd devices, etc.
>
> Could somebody who does have the issue describe their system? Could
> they pull out all their USB devices and see if that stops the
> issues. Remove the driver for PCIe devices, if possible.

This remark triggered me. Booting without USB and PCIe will be a
challenge I'll tackle another day, but looking at the differences in
hardware as observed by the kernel is easy.

Attached are two files containing the lshw output for both the kirkwood
and marvell kernel flavors. Except for the obvious differences like
versions, I find the below differences. I do not know if these
differences are related to the observed issues.

- IRQ is in the 3x range on marvell and 8x range in kirkwood.
- On marvell PCI bridges have additional capabilities: pciexpress and
cap_list
- usbhost:0 and usbhost:1 swapped between marvell and kirkwood.

For reference I also added the 6281 lshw output. lshw versions between
devices are different.

>  Andrew

GRTNX,
RobJE

lshw-6281-marvell.txt (3K) Download Attachment
lshw-6282-kirkwood.txt (8K) Download Attachment
lshw-6282-marvell.txt (8K) Download Attachment

Re: "external abort on linefetch (0x814)" on Kirkwood 6282 SoC

Martin Michlmayr
Quite a few Debian users on QNAP are affected by this "external abort
on linefetch" issue.  Ian Campbell raised this with Andrew Lunn
(upstream kernel) last year but Andrew couldn't reproduce it:
https://lists.debian.org/debian-arm/2017/07/msg00054.html

RobJE provided additional information but forgot to CC Andrew:
https://lists.debian.org/debian-arm/2017/07/msg00059.html

Timo Jyrinki is happy to run some tests.  He's affected and has a
serial console.  The bug is still there in the 4.9 kernel we're
shipping with Debian kernel.

Andrew, what information or access do you need so this can be tracked
down?

Thank you!

--
Martin Michlmayr
http://www.cyrius.com/


Re: "external abort on linefetch (0x814)" on Kirkwood 6282 SoC

Timo Jyrinki-4
2018-04-25 14:16 GMT+03:00 Martin Michlmayr <[hidden email]>:
> Timo Jyrinki is happy to run some tests.  He's affected and has a
> serial console.  The bug is still there in the 4.9 kernel we're
> shipping with Debian kernel.
>
> Andrew, what information or access do you need so this can be tracked
> down?

Yesterday I tried booting with mem=512M added to the u-boot's setenv
bootargs, and wasn't able to reproduce the problem. Booting again
without the parameter it was there again. I repeated a couple of times
with same results, although sometimes it took some time for the
problem to occur in the normal 1GB RAM use case so I'm not 100% sure
of how bullet proof the workaround is. I tried to use at least some
memory by starting Debian installer fetching, logging into it via ssh
etc.

Could someone else try it out? Double-check the parameter worked with
'free'. I'm tempted to make a backup of my current / + flash
partitions and dist-upgrade to stretch. On that note, what would be
the easiest way to set the mem=512M as the default for normal boots?
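
The double-check with 'free' mentioned above can also be scripted. What
follows is only a minimal sketch, assuming a Linux /proc/meminfo; the
function names mem_total_mb and mem_capped_at are made up for
illustration. MemTotal normally comes out somewhat below the mem= value
because the kernel keeps some memory for itself, so the check compares
against the cap instead of testing for equality.

```shell
# Print MemTotal in whole MiB from a meminfo-style file
# (defaults to /proc/meminfo; the optional argument exists for testing).
mem_total_mb() {
    awk '/^MemTotal:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# Succeed if the kernel sees no more than the given number of MiB.
# Usage: mem_capped_at <limit_mib> [meminfo_file]
mem_capped_at() {
    limit_mb=$1
    file=${2:-/proc/meminfo}
    [ "$(mem_total_mb "$file")" -le "$limit_mb" ]
}
```

After booting with mem=512M, something like
`mem_capped_at 512 && echo "mem= cap is in effect"` would then confirm
that the parameter was picked up.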

Andrew wasn't able to reproduce the problem on his 6282 machine. Would
it be that he has QNAP TS-219P+ or similar that has only 512MB RAM?
(https://www.cyrius.com/debian/kirkwood/qnap/ts-219/specs/)

-Timo


Re: "external abort on linefetch (0x814)" on Kirkwood 6282 SoC

Andrew Lunn-2
On Thu, May 24, 2018 at 12:40:06PM +0300, Timo Jyrinki wrote:

> 2018-04-25 14:16 GMT+03:00 Martin Michlmayr <[hidden email]>:
> > Timo Jyrinki is happy to run some tests.  He's affected and has a
> > serial console.  The bug is still there in the 4.9 kernel we're
> > shipping with Debian kernel.
> >
> > Andrew, what information or access do you need so this can be tracked
> > down?
>
> Yesterday I tried booting with mem=512M added to the u-boot's setenv
> bootargs, and wasn't able to reproduce the problem. Booting again
> without the parameter it was there again. I repeated a couple of times
> with same results, although sometimes it took some time for the
> problem to occur in the normal 1GB RAM use case so I'm not 100% sure
> of how bullet proof the workaround is. I tried to use at least some
> memory by starting Debian installer fetching, logging into it via ssh
> etc.
>
> Could someone else try it out? Double-check the parameter worked with
> 'free'. I'm tempted to make a backup of my current / + flash
> partitions and dist-upgrade to stretch. On that note, what would be
> the easiest way to set the mem=512M as the default for normal boots?
>
> Andrew wasn't able to reproduce the problem on his 6282 machine. Would
> it be that he has QNAP TS-219P+ or similar that has only 512MB RAM?
> (https://www.cyrius.com/debian/kirkwood/qnap/ts-219/specs/)

Hi Timo

root@qnap:~# cat /proc/meminfo
MemTotal:         511516 kB

So let's think about what this could mean...

Is the 1G implemented using two RAM chips? Do you have photos of your
board? Can you identify the chips? Does u-boot say anything useful
about the RAM?

Could the u-boot you have not be correctly initialising the second RAM
chip? Are you using the stock QNAP/marvell u-boot, or have you
upgraded u-boot?

Is there a hole in the address range between the two RAMs? The kernel
should be able to handle that, but i don't know if you have to tell
it, or if it can figure it out itself. Can you see anything about this
in the kernel logs, or u-boot?

Do we see the physical address being accessed when we get the abort?
Is it in the top 1/2 of the RAM? Could it be a DMA operation which has
gone over the border between the end of the first RAM and the
beginning of the second RAM? Seems a bit unlikely....

   Andrew


Re: "external abort on linefetch (0x814)" on Kirkwood 6282 SoC

R Epping - debian-arm
On 24-05-18 14:30, Andrew Lunn wrote:

> On Thu, May 24, 2018 at 12:40:06PM +0300, Timo Jyrinki wrote:
>> 2018-04-25 14:16 GMT+03:00 Martin Michlmayr <[hidden email]>:
>>> Timo Jyrinki is happy to run some tests.  He's affected and has a
>>> serial console.  The bug is still there in the 4.9 kernel we're
>>> shipping with Debian kernel.
>>>
>>> Andrew, what information or access do you need so this can be tracked
>>> down?
>>
>> Yesterday I tried booting with mem=512M added to the u-boot's setenv
>> bootargs, and wasn't able to reproduce the problem. Booting again
>> without the parameter it was there again. I repeated a couple of times
>> with same results, although sometimes it took some time for the
>> problem to occur in the normal 1GB RAM use case so I'm not 100% sure
>> of how bullet proof the workaround is. I tried to use at least some
>> memory by starting Debian installer fetching, logging into it via ssh
>> etc.
>>
>> Could someone else try it out? Double-check the parameter worked with
>> 'free'. I'm tempted to make a backup of my current / + flash
>> partitions and dist-upgrade to stretch. On that note, what would be
>> the easiest way to set the mem=512M as the default for normal boots?
>>
>> Andrew wasn't able to reproduce the problem on his 6282 machine. Would
>> it be that he has QNAP TS-219P+ or similar that has only 512MB RAM?
>> (https://www.cyrius.com/debian/kirkwood/qnap/ts-219/specs/)
>
> Hi Timo
>
> root@qnap:~# cat /proc/meminfo
> MemTotal:         511516 kB
>
> So let's think about what this could mean...
>
> Is the 1G implemented using two RAM chips? Do you have photos of your
> board? Can you identify the chips? Does u-boot say anything useful
> about the RAM?
>
> Could the u-boot you have not be correctly initialising the second RAM
> chip? Are you using the stock QNAP/marvell u-boot, or have you
> upgraded u-boot?
>
> Is there a hole in the address range between the two RAMs? The kernel
> should be able to handle that, but i don't know if you have to tell
> it, or if it can figure it out itself. Can you see anything about this
> in the kernel logs, or u-boot?
>
> Do we see the physical address being accessed when we get the abort?
> Is it in the top 1/2 of the RAM? Could it be a DMA operation which has
> gone over the border between the end of the first RAM and the
> beginning of the second RAM? Seems a bit unlikely....
>
>    Andrew

Timo's remark about memory triggered me.

I am not convinced it is related to u-boot or memory chips. Specifically
because the jessie kernel 4.3.0-0.bpo.1-kirkwood (4.3.5-1~bpo8+1) does not
have these issues. For me the issues started after the flavour change
from kirkwood to marvell.

I tried running stretch 4.16.0-0.bpo.1-marvell (4.16.5-1~bpo9+1) with
mem=512M which was stable for more than 24 hours. Comparing dmesg output
one interesting line was missing in the 512M version:

        HighMem zone: 65536 pages, LIFO batch:15

With mem=768M the kernel also boots with no bug or error reports. 768M is
the border where (according to dmesg) HighMem starts. With no mem= (i.e.
using the full 1024M) just booting already prints a lot of error
messages for me.
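
The numbers above are consistent with each other, which a little shell
arithmetic shows. This is only a sketch, assuming 4 KiB pages (the usual
page size on this ARMv5 platform, but not stated in the dmesg line);
pages_to_mb is a made-up helper name.

```shell
# Convert a zone size in pages to MiB, assuming 4 KiB pages.
pages_to_mb() {
    echo $(( $1 * 4096 / 1048576 ))
}

# "HighMem zone: 65536 pages" from the dmesg output above.
highmem_mb=$(pages_to_mb 65536)

# Whatever is left of the 1024 MiB of RAM is lowmem.
lowmem_mb=$(( 1024 - highmem_mb ))

echo "HighMem: ${highmem_mb} MiB, lowmem: ${lowmem_mb} MiB"
# prints "HighMem: 256 MiB, lowmem: 768 MiB"
```

So the HighMem zone covers exactly the top 256M of the 1G, and mem=768M
is the largest setting that still avoids it.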

I think changes in handling HighMem between kirkwood and marvell
flavours are the cause, though I have no way other than the test above to
confirm. Maybe information displayed in the error messages can help
confirm the issue is related to HighMem?

If there is anything I can test please let me know.

GRTNX,
RobJE


Re: "external abort on linefetch (0x814)" on Kirkwood 6282 SoC

Andrew Lunn-2
On Sun, May 27, 2018 at 01:39:35PM +0200, RobJE Debian ARM wrote:

> On 24-05-18 14:30, Andrew Lunn wrote:
> > On Thu, May 24, 2018 at 12:40:06PM +0300, Timo Jyrinki wrote:
> >> 2018-04-25 14:16 GMT+03:00 Martin Michlmayr <[hidden email]>:
> >>> Timo Jyrinki is happy to run some tests.  He's affected and has a
> >>> serial console.  The bug is still there in the 4.9 kernel we're
> >>> shipping with Debian kernel.
> >>>
> >>> Andrew, what information or access do you need so this can be tracked
> >>> down?
> >>
> >> Yesterday I tried booting with mem=512M added to the u-boot's setenv
> >> bootargs, and wasn't able to reproduce the problem. Booting again
> >> without the parameter it was there again. I repeated a couple of times
> >> with same results, although sometimes it took some time for the
> >> problem to occur in the normal 1GB RAM use case so I'm not 100% sure
> >> of how bullet proof the workaround is. I tried to use at least some
> >> memory by starting Debian installer fetching, logging into it via ssh
> >> etc.
> >>
> >> Could someone else try it out? Double-check the parameter worked with
> >> 'free'. I'm tempted to make a backup of my current / + flash
> >> partitions and dist-upgrade to stretch. On that note, what would be
> >> the easiest way to set the mem=512M as the default for normal boots?
> >>
> >> Andrew wasn't able to reproduce the problem on his 6282 machine. Would
> >> it be that he has QNAP TS-219P+ or similar that has only 512MB RAM?
> >> (https://www.cyrius.com/debian/kirkwood/qnap/ts-219/specs/)
> >
> > Hi Timo
> >
> > root@qnap:~# cat /proc/meminfo
> > MemTotal:         511516 kB
> >
> > So let's think about what this could mean...
> >
> > Is the 1G implemented using two RAM chips? Do you have photos of your
> > board? Can you identify the chips? Does u-boot say anything useful
> > about the RAM?
> >
> > Could the u-boot you have not be correctly initialising the second RAM
> > chip? Are you using the stock QNAP/marvell u-boot, or have you
> > upgraded u-boot?
> >
> > Is there a hole in the address range between the two RAMs? The kernel
> > should be able to handle that, but i don't know if you have to tell
> > it, or if it can figure it out itself. Can you see anything about this
> > in the kernel logs, or u-boot?
> >
> > Do we see the physical address being accessed when we get the abort?
> > Is it in the top 1/2 of the RAM? Could it be a DMA operation which has
> > gone over the border between the end of the first RAM and the
> > beginning of the second RAM? Seems a bit unlikely....
> >
> >    Andrew
>
> Timo's remark about memory triggered me.
>
> I am not convinced it is related to u-boot or memory chips. Specifically
> because the jessie kernel 4.3.0-0.bpo.1-kirkwood (4.3.5-1~bpo8+1) does not
> have these issues. For me the issues started after the flavour change
> from kirkwood to marvell.
>
> I tried running stretch 4.16.0-0.bpo.1-marvell (4.16.5-1~bpo9+1) with
> mem=512M which was stable for more than 24 hours. Comparing dmesg output
> one interesting line was missing in the 512M version:
>
> HighMem zone: 65536 pages, LIFO batch:15
>
> With mem=768M the kernel also boots with no bug or error reports. 768M is
> the border where (according to dmesg) HighMem starts. With no mem= (i.e.
> using the full 1024M) just booting already prints a lot of error
> messages for me.

Hi Rob

Since my QNAP only has 512M, there is not too much experimentation i
can do.

Could you try changing "Memory split" to "3G/1G user/kernel split (for
full 1G low memory)". You should then see that the lowmem in the
Virtual kernel memory layout table goes from starting at 0xc0000000 to
starting at 0xB0000000. I hope it will then not use high mem, and
still give you the full 1G of RAM.

    Andrew


Re: "external abort on linefetch (0x814)" on Kirkwood 6282 SoC

Tixy-2
On Mon, 2018-05-28 at 18:00 +0200, Andrew Lunn wrote:
> Hi Rob
>
> Since my QNAP only has 512M, there is not too much experimentation i
> can do.
>
> Could you try changing "Memory split" to "3G/1G user/kernel split (for
> full 1G low memory)".

Don't you mean change it to 2G/2G? That's what would be needed to let
the kernel map the whole 1GB of physical RAM in its address region and
so not need the high memory mechanism.

>  You should then see that the lowmem in the
> Virtual kernel memory layout table goes from starting at 0xc0000000 to
> starting at 0xB0000000. I hope it will then not use high mem, and
> still give you the full 1G of RAM.
>

--
Tixy


Re: "external abort on linefetch (0x814)" on Kirkwood 6282 SoC

Andrew Lunn-2
On Tue, May 29, 2018 at 06:50:16AM +0100, Jonathan Medhurst wrote:

> On Mon, 2018-05-28 at 18:00 +0200, Andrew Lunn wrote:
> > Hi Rob
> >
> > Since my QNAP only has 512M, there is not too much experimentation i
> > can do.
> >
> > Could you try changing "Memory split" to "3G/1G user/kernel split (for
> > full 1G low memory)".
>
> Don't you mean change it to 2G/2G? That's what would be needed to let
> the kernel map the whole 1GB of physical RAM in its address region and
> so not need the high memory mechanism.

Hi Jonathan

The comment says:

        config VMSPLIT_3G_OPT
                depends on !ARM_LPAE
                bool "3G/1G user/kernel split (for full 1G low memory)"

So i'm thinking that means it should support up to 1G of RAM using
this split. It puts the split at 0xB0000000, so it is more like
2.75G/1.25G.
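
The arithmetic behind these figures can be sketched in shell. The
constants are assumptions, not something taken from this thread: they are
the 32-bit ARM defaults as far as I recall them (VMALLOC_END at
0xff800000, a 240 MiB vmalloc reserve and an 8 MiB guard offset), and the
function names are invented for the example. It needs a shell with 64-bit
arithmetic, e.g. bash or dash on a current system.

```shell
# Assumed ARM defaults (hypothetical here, check against the kernel tree).
VMALLOC_END=$((0xff800000))
VMALLOC_RESERVE=$((240 << 20))   # default vmalloc area size
VMALLOC_OFFSET=$((8 << 20))      # guard gap below the vmalloc area

# User address space in MiB for a given PAGE_OFFSET.
user_space_mb() {
    echo $(( $1 >> 20 ))
}

# Kernel address space in MiB (the rest of the 4 GiB).
kernel_space_mb() {
    echo $(( (0x100000000 - $1) >> 20 ))
}

# Largest amount of RAM (MiB) that fits in lowmem for a given PAGE_OFFSET.
max_lowmem_mb() {
    vmalloc_min=$(( VMALLOC_END - VMALLOC_RESERVE - VMALLOC_OFFSET ))
    echo $(( (vmalloc_min - $1) >> 20 ))
}

max_lowmem_mb 0xC0000000     # VMSPLIT_3G:     prints 768
max_lowmem_mb 0xB0000000     # VMSPLIT_3G_OPT: prints 1024
user_space_mb 0xB0000000     # prints 2816, i.e. 2.75G
kernel_space_mb 0xB0000000   # prints 1280, i.e. 1.25G
```

Under those assumptions the default split tops out at 768M of lowmem,
matching the HighMem border reported earlier, while the split at
0xB0000000 leaves room for the full 1G.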

2G/2G would also work, but that is a bigger change. And i don't know
how many devices are being supported by this one kernel. It should be
possible to build one kernel which runs on all ARM v5 machines, not
just Marvell ARM v5 machines. This is the sort of change which will
affect them all. So i wanted to keep the change as small as possible.

       Andrew


Re: "external abort on linefetch (0x814)" on Kirkwood 6282 SoC

Tixy-2
On Tue, 2018-05-29 at 13:51 +0200, Andrew Lunn wrote:

> On Tue, May 29, 2018 at 06:50:16AM +0100, Jonathan Medhurst wrote:
> > On Mon, 2018-05-28 at 18:00 +0200, Andrew Lunn wrote:
> > > Hi Rob
> > >
> > > Since my QNAP only has 512M, there is not too much experimentation i
> > > can do.
> > >
> > > Could you try changing "Memory split" to "3G/1G user/kernel split (for
> > > full 1G low memory)".
> >
> > Don't you mean change it to 2G/2G? That's what would be needed to let
> > the kernel map the whole 1GB of physical RAM in its address region and
> > so not need the high memory mechanism.
>
> Hi Jonathan
>
> The comment says:
>
>         config VMSPLIT_3G_OPT
>                 depends on !ARM_LPAE
>                 bool "3G/1G user/kernel split (for full 1G low memory)"
>
> So i'm thinking that means it should support up to 1G of RAM using
> this split. It puts the split at 0xB0000000, so it is more like
> 2.75G/1.25G.

Ah, you are right, I thought you were suggesting VMSPLIT_3G. I didn't
notice that the kernel had sprouted an extra VMSPLIT_3G_OPT option a
couple of years ago.

-- 
Tixy


Re: "external abort on linefetch (0x814)" on Kirkwood 6282 SoC

Timo Jyrinki-4
In reply to this post by Andrew Lunn-2
2018-05-28 19:00 GMT+03:00 Andrew Lunn <[hidden email]>:
> Could you try changing "Memory split" to "3G/1G user/kernel split (for
> full 1G low memory)". You should then see that the lowmem in the
> Virtual kernel memory layout table goes from starting at 0xc0000000 to
> starting at 0xB0000000. I hope it will then not use high mem, and
> still give you the full 1G of RAM.

Could someone give newbie tips on making a bootable kernel that I
could load from u-boot? I tried compiling a Debian kernel simply
with debuild in a stretch chroot, adding VMSPLIT_3G_OPT=y to
config.marvell under debian/, but with the vmlinuz generated I got
"Bad Magic Number" when I tried to load it with u-boot over TFTP.

Given that the installer-armel kernels that do boot over U-Boot have
also kernel variants 6281 and 6282 while the kernel from linux package
does not have variants, I'm certainly missing something useful (and my
free time is severely limited, I didn't yet find information what I'd
need on my own).

Regardless I've now modified the default bootargs in u-boot with
printenv bootargs -> setenv appending mem=768M -> saveenv, and
dist-upgraded to stretch. It's working flawlessly with 768MB RAM!
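
The bootargs edit described above can be prepared as a string before
typing it at the u-boot prompt. A sketch only: with_mem_arg is a
hypothetical helper, and the sample bootargs in the usage line are
invented, not copied from a real QNAP environment.

```shell
# Print the given bootargs with any existing mem= setting dropped and
# mem=<size> appended. The body runs in a subshell so that "set -f"
# (no globbing while splitting the words) does not leak to the caller.
# Usage: with_mem_arg "<bootargs>" <size>
with_mem_arg() (
    set -f
    size=$2
    out=
    for word in $1; do
        case $word in
            mem=*) ;;                          # drop a previous mem= value
            *) out="${out:+$out }$word" ;;     # keep everything else in order
        esac
    done
    echo "${out:+$out }mem=$size"
)

# Example with made-up bootargs:
with_mem_arg "console=ttyS0,115200 root=/dev/ram mem=512M" 768M
# prints "console=ttyS0,115200 root=/dev/ram mem=768M"
```

The printed string is what would go into setenv bootargs before saveenv.
If the fw_printenv/fw_setenv tools from u-boot-tools are installed and
their configuration matches the flash layout of the device (an assumption
that has to be verified per board), the same helper could feed
fw_setenv from the running system instead.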

Now on stretch I could probably also just install the built deb
packages, but I'd rather do this memory corruption testing from a
"live" session over TFTP instead of booting my regular system with a
test kernel.

-Timo


Re: "external abort on linefetch (0x814)" on Kirkwood 6282 SoC

Ian Campbell-5
On Sat, 2018-06-02 at 15:36 +0300, Timo Jyrinki wrote:

> 2018-05-28 19:00 GMT+03:00 Andrew Lunn <[hidden email]>:
> > Could you try changing "Memory split" to "3G/1G user/kernel split (for
> > full 1G low memory)". You should then see that the lowmem in the
> > Virtual kernel memory layout table goes from starting at 0xc0000000 to
> > starting at 0xB0000000. I hope it will then not use high mem, and
> > still give you the full 1G of RAM.
>
> Could someone give newbie tips on making a bootable kernel that I
> could load from u-boot? I tried compiling a Debian kernel simply
> with debuild in a stretch chroot, adding VMSPLIT_3G_OPT=y to
> config.marvell under debian/, but with the vmlinuz generated I got
> "Bad Magic Number" when I tried to load it with u-boot over TFTP.

You need to append a dtb and then encode in u-boot's uImage format.
e.g.

   cat arch/arm/boot/zImage arch/arm/boot/dts/kirkwood-ts419-6281.dtb > x
   sudo mkimage -A arm -T kernel -O linux -C none -a 0x8000 -e 0x8000 -d x uImage

Now the uImage file ought to be bootable with `bootm`, load it to
0x800000 and an initrd (if using one) to 0xa00000  then `bootm
0x800000`.

Be sure to pick the correct dtb variant for your board, it might boot
with the wrong one but you'll potentially be missing some peripherals
etc.
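
The "Bad Magic Number" message comes from bootm not finding the legacy
uImage header, which begins with the big-endian magic 0x27051956. A quick
way to check a file before copying it to the TFTP server might look like
this; has_uimage_magic is a made-up name and the sketch only inspects the
first four bytes, it does not validate the rest of the header.

```shell
# Succeed if the file starts with the legacy uImage magic 0x27051956.
# Usage: has_uimage_magic <file>
has_uimage_magic() {
    # od prints the first four bytes as hex; tr strips the padding.
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    [ "$magic" = "27051956" ]
}
```

For example, `has_uimage_magic uImage || echo "not a uImage, wrap it with mkimage first"`.
A bare zImage or vmlinuz fails this check, which is why it has to be
wrapped with mkimage as shown above; `mkimage -l uImage` prints the full
header if more detail is wanted.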


Re: "external abort on linefetch (0x814)" on Kirkwood 6282 SoC

Ian Campbell-5
On Sat, 2018-06-02 at 16:55 +0100, Ian Campbell wrote:

> On Sat, 2018-06-02 at 15:36 +0300, Timo Jyrinki wrote:
> > 2018-05-28 19:00 GMT+03:00 Andrew Lunn <[hidden email]>:
> > > Could you try changing "Memory split" to "3G/1G user/kernel split (for
> > > full 1G low memory)". You should then see that the lowmem in the
> > > Virtual kernel memory layout table goes from starting at 0xc0000000 to
> > > starting at 0xB0000000. I hope it will then not use high mem, and
> > > still give you the full 1G of RAM.
> >
> > Could someone give newbie tips on making a bootable kernel that I
> > could load from u-boot? I tried compiling a Debian kernel simply
> > with debuild in a stretch chroot, adding VMSPLIT_3G_OPT=y to
> > config.marvell under debian/, but with the vmlinuz generated I got
> > "Bad Magic Number" when I tried to load it with u-boot over TFTP.
>
> You need to append a dtb and then encode in u-boot's uImage format.
> e.g.
>
>    cat arch/arm/boot/zImage arch/arm/boot/dts/kirkwood-ts419-6281.dtb > x
>    sudo mkimage -A arm -T kernel -O linux -C none -a 0x8000 -e 0x8000 -d x uImage

You don't need that `sudo` BTW unless uImage is in an root-only path.


Re: "external abort on linefetch (0x814)" on Kirkwood 6282 SoC

Timo Jyrinki-4
In reply to this post by Ian Campbell-5
2018-06-02 18:55 GMT+03:00 Ian Campbell <[hidden email]>:
> You need to append a dtb and then encode in u-boot's uImage format.
> e.g.
>
>    cat arch/arm/boot/zImage arch/arm/boot/dts/kirkwood-ts419-6281.dtb > x
>    sudo mkimage -A arm -T kernel -O linux -C none -a 0x8000 -e 0x8000 -d x uImage

Thank you! Now it's all coming back to me, I'm not sure if I've played
with these since Neo FreeRunner times.

So the good news is that with this kernel
kernel-kirkwood-ts219-6282-split3gopt from
https://people.debian.org/~timo/qnap/ (initrd from
http://ftp.debian.org/debian/dists/stretch/main/installer-armel/current/images/kirkwood/network-console/qnap/ts-21x/)
I'm getting full 1GB RAM without the errors!

I do seem to have a problem with networking. I'm not sure whether that is
down to something else in my custom build or whether VMSPLIT_3G_OPT=y
could affect it.

In the same directory I've also included the zImage, in case you want
to combine it with a different dtb than the kirkwood-ts219-6282 one
and create your own uImage.

-Timo
