error since moving to marvell kernel - BUG: Bad rss-counter state

R Epping - debian-arm
Hello All,

Today I tried again to run linux-image-marvell (4.7.0-0.bpo.1 in this
case) on my QNAP TS-221. I had issues with it in the past but never
got around to filing a bug.

I have the same version on a TS-219 which is running fine.

Booting the TS-221 generates two types of error messages.
-->8--
[   37.167103] BUG: Bad rss-counter state mm:c0caa1e0 idx:1 val:1
[  783.570365] BUG: Bad rss-counter state mm:c09e6220 idx:1 val:1
[  800.172223] BUG: Bad rss-counter state mm:ecbc05e0 idx:1 val:1
[  829.005336] BUG: Bad rss-counter state mm:c0d4b880 idx:1 val:1
[  871.773956] BUG: Bad rss-counter state mm:c09e63c0 idx:1 val:1
[ 1299.565344] BUG: Bad rss-counter state mm:ecaf8c40 idx:1 val:1
-->8--
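To see at a glance which counter is off, the rss-counter lines can be tallied. This is a minimal Python sketch of my own, not a Debian tool. It assumes the message comes from `check_mm()` in `kernel/fork.c` (printed, as far as I understand, when an address space is freed with a nonzero counter) and that in 4.7 the `idx` values follow the order MM_FILEPAGES, MM_ANONPAGES, MM_SWAPENTS, MM_SHMEMPAGES, which would make idx:1 the anonymous-page counter. Please verify that order against the actual 4.7 source.

```python
import re
import sys
from collections import Counter

# ASSUMPTION: counter order as in include/linux/mm_types.h for a 4.7 kernel.
RSS_COUNTERS = ["MM_FILEPAGES", "MM_ANONPAGES", "MM_SWAPENTS", "MM_SHMEMPAGES"]

RSS_RE = re.compile(
    r"BUG: Bad rss-counter state mm:[0-9a-f]+ idx:(?P<idx>\d+) val:(?P<val>-?\d+)"
)


def summarize_rss(log: str) -> Counter:
    """Count (counter name, leftover value) pairs found in dmesg text."""
    tally = Counter()
    for match in RSS_RE.finditer(log):
        idx = int(match["idx"])
        # Fall back to the raw index if it is outside the assumed enum.
        name = RSS_COUNTERS[idx] if idx < len(RSS_COUNTERS) else f"idx{idx}"
        tally[(name, int(match["val"]))] += 1
    return tally


if __name__ == "__main__":
    # Usage: dmesg | python3 rss_tally.py
    for (name, val), count in summarize_rss(sys.stdin.read()).most_common():
        print(f"{name} off by {val}: {count} time(s)")
```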

and

-->8--
[   71.033784] Unhandled fault: external abort on linefetch (0x014) at 0xb6c73db0
[   71.041037] pgd = ead9c000
[   71.043747] [b6c73db0] *pgd=3fd72831
[   84.144056] Unhandled fault: external abort on linefetch (0x014) at 0xb6d44db0
[   84.151313] pgd = eaee8000
[   84.154023] [b6d44db0] *pgd=3fc89831
[ 1300.378292] Unhandled fault: external abort on linefetch (0x014) at 0x000de9e4
[ 1300.385547] pgd = e113c000
[ 1300.388257] [000de9e4] *pgd=3fcd9831
[ 1300.393119] Unhandled fault: external abort on linefetch (0x014) at 0x000de9e4
[ 1300.400370] pgd = e113c000
[ 1300.403081] [000de9e4] *pgd=3fcd9831
[ 1300.407834] Unhandled fault: external abort on linefetch (0x014) at 0x000de9c0
[ 1300.415089] pgd = e113c000
[ 1300.417800] [000de9c0] *pgd=3fcd9831
[ 1300.424135] Unhandled fault: external abort on linefetch (0x014) at 0x000de9e4
[ 1300.431387] pgd = e113c000
[ 1300.434098] [000de9e4] *pgd=3fcd9831
[ 1300.438852] Unhandled fault: external abort on linefetch (0x014) at 0x000de9e4
[ 1300.446107] pgd = e113c000
[ 1300.448818] [000de9e4] *pgd=3fcd9831
[ 1300.453525] Unhandled fault: external abort on linefetch (0x014) at 0x000de9c0
[ 1300.460773] pgd = e113c000
[ 1300.463484] [000de9c0] *pgd=3fcd9831
[ 1300.469987] Unhandled fault: external abort on linefetch (0x014) at 0x000de9e4
[ 1300.477246] pgd = e113c000
[ 1300.479957] [000de9e4] *pgd=3fcd9831
[ 1300.484726] Unhandled fault: external abort on linefetch (0x014) at 0x000de9e4
[ 1300.491983] pgd = e113c000
[ 1300.494693] [000de9e4] *pgd=3fcd9831
[ 1300.499489] Unhandled fault: external abort on linefetch (0x014) at 0x000de9c0
[ 1300.506745] pgd = e113c000
[ 1300.509456] [000de9c0] *pgd=3fcd9831
[ 1300.515763] Unhandled fault: external abort on linefetch (0x014) at 0x000de9e4
[ 1300.523017] pgd = e113c000
[ 1300.525728] [000de9e4] *pgd=3fcd9831
[ 1300.530470] Unhandled fault: external abort on linefetch (0x014) at 0x000de9e4
[ 1300.537718] pgd = e113c000
[ 1300.540429] [000de9e4] *pgd=3fcd9831
[ 1300.545138] Unhandled fault: external abort on linefetch (0x014) at 0x000de9c0
[ 1300.552394] pgd = e113c000
[ 1300.555105] [000de9c0] *pgd=3fcd9831
-->8--
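The fault lines can be grouped by pgd (one per process address space), and the constant values decoded. This is a rough Python sketch under stated assumptions. It assumes the FSR uses the 2-level layout that, as far as I know, the kernel's `fsr_fs()` helper in arch/arm/mm/fault.h reads: status in bits [3:0] plus bit 10, domain in bits [7:4]. It also assumes the `*pgd` value is an ARM short-descriptor first-level entry. Under those assumptions, 0x014 is status 4 (what the kernel labels "external abort on linefetch") in domain 1, which I believe is DOMAIN_USER on 32-bit ARM Linux. 3fcd9831 would then be a page-table pointer in domain 1. The function names are hypothetical.

```python
import re
import sys
from collections import Counter

# Matches both the one-line form and the e-mail-wrapped form ("at\n0x...").
FAULT_RE = re.compile(
    r"Unhandled fault: .+? \((?P<fsr>0x[0-9a-f]+)\) at\s+(?P<addr>0x[0-9a-f]+)\s*"
    r"\[\s*\d+\.\d+\]\s+pgd = (?P<pgd>[0-9a-f]+)"
)


def summarize_faults(log: str) -> Counter:
    """Count faults per (pgd, fault address, FSR) triple."""
    tally = Counter()
    for m in FAULT_RE.finditer(log):
        tally[(m["pgd"], m["addr"], int(m["fsr"], 16))] += 1
    return tally


def decode_fsr(fsr: int) -> dict:
    """ASSUMPTION: 2-level (ARMv5-style) FSR layout."""
    status = (fsr & 0xF) | ((fsr & (1 << 10)) >> 6)  # FS[3:0] plus FS[4]
    domain = (fsr >> 4) & 0xF                         # bits [7:4]
    write = bool(fsr & (1 << 11))                     # WnR, ARMv6+ only
    return {"status": status, "domain": domain, "write": write}


def decode_l1_entry(entry: int) -> dict:
    """ASSUMPTION: ARM short-descriptor first-level translation entry."""
    kind = {0: "fault", 1: "page table", 2: "section"}.get(entry & 0x3, "other")
    domain = (entry >> 5) & 0xF
    # For a page-table entry, bits [31:10] hold the second-level table base.
    return {"type": kind, "domain": domain, "table_base": entry & 0xFFFFFC00}


if __name__ == "__main__":
    # Usage: dmesg | python3 fault_summary.py
    for (pgd, addr, fsr), count in summarize_faults(sys.stdin.read()).most_common():
        info = decode_fsr(fsr)
        print(f"pgd {pgd} addr {addr}: {count}x, "
              f"status {info['status']}, domain {info['domain']}")
```

If the grouping holds, all faults at 1300 s share pgd e113c000 and alternate between two addresses, which would point to a single process repeatedly faulting.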

Running the TS-221 on linux-image-4.3.0-0.bpo.1-kirkwood is very stable.
As mentioned, running the TS-219 on linux-image-4.7.0-0.bpo.1-marvell is also
stable.

The main differences I see between the TS-221 (unstable with 4.7 marvell)
and the TS-219 (stable with 4.7 marvell) are:
* the kernel flavor,
* the SoC, and
* unencrypted RAID (TS-219) versus encrypted RAID (TS-221).

I am unsure whether this is hardware-related or setup-related, and I do not
know how to analyze this issue further.
I am also not sure whether this should be reported as a bug.

Any help appreciated.

GRTNX,
RobJE

Re: error since moving to marvell kernel - BUG: Bad rss-counter state

Rob J. Epping-3
Quick update.

Following a tip from ukleinek in #debian-arm on IRC, I swapped the disks
between the TS-221 and the TS-219. I loaded the kernel and initrd from a USB
device to work around the LUKS wait.

Result: the kernel + initrd + disk that run fine in the TS-219 produce the
errors from my first message when booted in the TS-221.

ukleinek and I also found that both flavors boot with a device tree (DTB),
i.e. /proc/device-tree exists under both kernels.
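To compare which DTB each kernel actually picked up on the two boxes, the root node's `model` and `compatible` properties can be read from /proc/device-tree. This is a minimal Python sketch. It assumes the usual layout where each property is a file, string properties are NUL-terminated, and `compatible` is a NUL-separated list. The function name is my own.

```python
import sys
from pathlib import Path
from typing import Optional


def read_dt_identity(root: str = "/proc/device-tree") -> Optional[dict]:
    """Return the root node's model and compatible list, or None without a DT."""
    base = Path(root)
    if not base.is_dir():  # follows the /proc/device-tree symlink
        return None        # kernel was not booted with a device tree
    model_file = base / "model"
    compat_file = base / "compatible"
    model = (
        model_file.read_bytes().rstrip(b"\0").decode() if model_file.exists() else ""
    )
    # "compatible" holds several NUL-separated strings; drop empty pieces.
    compatible = (
        [s.decode() for s in compat_file.read_bytes().split(b"\0") if s]
        if compat_file.exists()
        else []
    )
    return {"model": model, "compatible": compatible}


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "/proc/device-tree"
    print(read_dt_identity(root))
```

Running it on both the TS-219 and the TS-221 under each kernel flavor would show whether the expected board DTB was loaded.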

GRTNX,
RobJE

On 10/23/2016 05:19 PM, R Epping - debian-arm wrote:
> [...]