Updated installation images for Debian Ports 2019-04-20


Re: "BUG: soft lockup" on A1200; was: Re: Updated installation images for Debian Ports 2019-04-20

Michael Schmitz-4
Hi Geert,

On 18/06/19 6:56 PM, Geert Uytterhoeven wrote:

> Hi Michael,
>
> On Mon, Jun 17, 2019 at 11:22 PM Michael Schmitz <[hidden email]> wrote:
>> On 15/06/19 11:15 AM, Finn Thain wrote:
>>> On Wed, 12 Jun 2019, Szymon Bieganski wrote:
>>>> Here is the end of dmesg (full log in attachment) when kernel stalls,
>>>> just as before:
>>>>
>>>> ------------------------
>>>>
>>>> [  122.430000] This architecture does not have kernel memory protection.
>>>> [  122.440000] Run /init as init process
>>>> [  126.690000] calling  ide_init+0x0/0x7c [ide_core] @ 43
>>>> [  126.700000] Uniform Multi-Platform E-IDE driver
>>>> [  126.710000] initcall ide_init+0x0/0x7c [ide_core] returned 0 after
>>>> 7988 usecs
>>>> [  126.980000] calling  amiga_gayle_ide_driver_init+0x0/0x1c [gayle] @ 43
>>>> [  126.990000] ide: Gayle IDE controller (A1200 style)
>>>> [  127.000000] Probing IDE interface ide0...
>>>> [  127.390000] hda: probing with STATUS(0x50) instead of ALTSTATUS(0x0a)
>>>> [  127.540000] hda: SAMSUNG MP0402H, ATA DISK drive
>>>> [  127.610000] Z2RAM: using 0K Zorro II RAM and 512K Chip RAM (Total 512K)
>>>> [  127.980000] hdb: probing with STATUS(0x00) instead of ALTSTATUS(0x0a)
>>>> [  128.200000] hdb: probing with STATUS(0x00) instead of ALTSTATUS(0x0a)
>>>> [  148.570000] watchdog: BUG: soft lockup - CPU#0 stuck for 22s!
>>>> [systemd-udevd:43]
>> Finn has raised the issue of systemd's short timeouts before. I'm
>> wondering whether that's part of your problem here. But the IDE driver
>> probe for a second disk should eventually complete, regardless of
>> systemd's udev module crashing?
>>
>> Not sure whether the 'probing with STATUS instead of ALTSTATUS' message
>> is normal for the A1200. Geert might remember that sort of detail.
> That comes from drivers/ide/ide-probe.c:ide_dev_read_id().
> Looking at the code, it may be caused by the drive, too.


I don't see IDE_HFLAG_BROKEN_ALTSTATUS set anywhere in the kernel code
except drivers/ide/amd74xx.c. gayle.c sets:

         .host_flags             = IDE_HFLAG_MMIO | IDE_HFLAG_SERIALIZE |
                                   IDE_HFLAG_NO_DMA,

So how would that flag bit get set for the gayle driver?

Cheers,

     Michael


> I do not see it on A4000.
>
> W.r.t. completing the probe, a log with dump_stack() added was sent
> to me by PM, and I replied the below:
>
>  From that log, I'm wondering if something is stuck in ide_probe_port().
>
> Can you sprinkle some debug prints
>
>      printk("%s:%u\n", __func__, __LINE__);
>
> in ide_probe_port() and probe_for_drive() (drivers/ide/ide-probe.c) and retry?
>
> Thanks!
>
> FTR, I've just booted my A4000 with a v5.2-rc5-based kernel, and IDE
> (Gayle, single drive) works. But old Debian, no systemd.
>
> Gr{oetje,eeting}s,
>
>                          Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [hidden email]
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                  -- Linus Torvalds


Re: "BUG: soft lockup" on A1200; was: Re: Updated installation images for Debian Ports 2019-04-20

Geert Uytterhoeven
Hi Michael,

On Wed, Jun 19, 2019 at 2:09 AM Michael Schmitz <[hidden email]> wrote:

> On 18/06/19 6:56 PM, Geert Uytterhoeven wrote:
> > On Mon, Jun 17, 2019 at 11:22 PM Michael Schmitz <[hidden email]> wrote:
> >> On 15/06/19 11:15 AM, Finn Thain wrote:
> >>> On Wed, 12 Jun 2019, Szymon Bieganski wrote:
> >>>> Here is the end of dmesg (full log in attachment) when kernel stalls,
> >>>> just as before:
> >>>>
> >>>> ------------------------
> >>>>
> >>>> [  122.430000] This architecture does not have kernel memory protection.
> >>>> [  122.440000] Run /init as init process
> >>>> [  126.690000] calling  ide_init+0x0/0x7c [ide_core] @ 43
> >>>> [  126.700000] Uniform Multi-Platform E-IDE driver
> >>>> [  126.710000] initcall ide_init+0x0/0x7c [ide_core] returned 0 after
> >>>> 7988 usecs
> >>>> [  126.980000] calling  amiga_gayle_ide_driver_init+0x0/0x1c [gayle] @ 43
> >>>> [  126.990000] ide: Gayle IDE controller (A1200 style)
> >>>> [  127.000000] Probing IDE interface ide0...
> >>>> [  127.390000] hda: probing with STATUS(0x50) instead of ALTSTATUS(0x0a)
> >>>> [  127.540000] hda: SAMSUNG MP0402H, ATA DISK drive
> >>>> [  127.610000] Z2RAM: using 0K Zorro II RAM and 512K Chip RAM (Total 512K)
> >>>> [  127.980000] hdb: probing with STATUS(0x00) instead of ALTSTATUS(0x0a)
> >>>> [  128.200000] hdb: probing with STATUS(0x00) instead of ALTSTATUS(0x0a)
> >>>> [  148.570000] watchdog: BUG: soft lockup - CPU#0 stuck for 22s!
> >>>> [systemd-udevd:43]
> >> Finn has raised the issue of systemd's short timeouts before. I'm
> >> wondering whether that's part of your problem here. But the IDE driver
> >> probe for a second disk should eventually complete, regardless of
> >> systemd's udev module crashing?
> >>
> >> Not sure whether the 'probing with STATUS instead of ALTSTATUS' message
> >> is normal for the A1200. Geert might remember that sort of detail.
> > That comes from drivers/ide/ide-probe.c:ide_dev_read_id().
> > Looking at the code, it may be caused by the drive, too.
>
>
> I don't see IDE_HFLAG_BROKEN_ALTSTATUS set anywhere in the kernel code
> except drivers/ide/amd74xx.c. gayle.c sets:
>
>          .host_flags             = IDE_HFLAG_MMIO | IDE_HFLAG_SERIALIZE |
>                                    IDE_HFLAG_NO_DMA,
>
> So how would that flag bit get set for the gayle driver?

Finally someone bitten by not using the ! operator ;-)

The test condition is

    (hwif->host_flags & IDE_HFLAG_BROKEN_ALTSTATUS) == 0

i.e. if the flag is _not_ set.
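
For anyone else who trips over this, here is a minimal, standalone sketch of that test. The flag names and bit values below are made up for illustration (they are not the definitions from include/linux/ide.h), and the helper functions are hypothetical, not kernel functions. It only shows that the `== 0` spelling and the `!` spelling agree, and that a flags word without the bit, as in the gayle.c snippet quoted above, passes the test:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values, for illustration only. */
#define HFLAG_MMIO             (1u << 0)
#define HFLAG_SERIALIZE        (1u << 1)
#define HFLAG_NO_DMA           (1u << 2)
#define HFLAG_BROKEN_ALTSTATUS (1u << 3)

/* The spelling used in the probe code: true when the flag is NOT set. */
bool flag_clear_eq(unsigned int host_flags)
{
    return (host_flags & HFLAG_BROKEN_ALTSTATUS) == 0;
}

/* The same test written with the ! operator. */
bool flag_clear_not(unsigned int host_flags)
{
    return !(host_flags & HFLAG_BROKEN_ALTSTATUS);
}

/* Flags in the style of the gayle.c snippet: no BROKEN_ALTSTATUS bit. */
unsigned int gayle_style_flags(void)
{
    return HFLAG_MMIO | HFLAG_SERIALIZE | HFLAG_NO_DMA;
}

/* Both spellings agree for every combination of the four bits. */
void check_equivalence(void)
{
    for (unsigned int f = 0; f < 16u; f++)
        assert(flag_clear_eq(f) == flag_clear_not(f));
}
```

So `flag_clear_eq(gayle_style_flags())` is true: the branch is taken precisely because the driver does not set the flag.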

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [hidden email]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds


Re: "BUG: soft lockup" on A1200; was: Re: Updated installation images for Debian Ports 2019-04-20

Geert Uytterhoeven
In reply to this post by Szymon Bieganski
Hi Szymon,

On Tue, Jun 18, 2019 at 9:50 PM Szymon Bieganski <[hidden email]> wrote:

> On 6/18/19 8:56 AM, Geert Uytterhoeven wrote:
> > Can you sprinkle some debug prints
> >     printk("%s:%u\n", __func__, __LINE__);
> >
> > in ide_probe_port() and probe_for_drive() (drivers/ide/ide-probe.c) and retry?
> >
> I've recompiled with addition of these extra lines, and found out that
> the re-enable of irq = 2 hangs (see attached log for details)
>
> ===================
> printk("%s:%u\n", __func__, __LINE__);
>         /*
>          * Use cached IRQ number. It might be (and is...) changed by probe
>          * code above
>          */
> printk("irqd = %u\n", irqd);    // printed last
>         if (irqd) {
>                 enable_irq(irqd);
> printk("enabled IRQ %u\n", irqd);
>         }
> printk("%s:%u\n", __func__, __LINE__);
>         return rc;
>
> ====================
>
>
> The interrupts present on my machine are:
>
> (initramfs) cd
> /proc
> (initramfs) cat interrupts
>            CPU0
>   2:          5      auto      CIAA, apne

Does it make a difference if you remove the PCMCIA Ethernet card?
PCMCIA and IDE share interrupts through Gayle.

Gr{oetje,eeting}s,

                        Geert



Re: "BUG: soft lockup" on A1200; was: Re: Updated installation images for Debian Ports 2019-04-20

Szymon Bieganski
Hi Geert,

On 6/19/19 9:25 AM, Geert Uytterhoeven wrote:
> Does it make a difference if you remove the PCMCIA Ethernet card?
> PCMCIA and IDE share interrupts through Gayle.
>
Not at all. The kernel stalls the same way with or without the PCMCIA card.
The LED keeps blinking in the same way, but at a slightly faster rate.

With kind regards,

Szymon




Re: "BUG: soft lockup" on A1200; was: Re: Updated installation images for Debian Ports 2019-04-20

Michael Schmitz-4
In reply to this post by Geert Uytterhoeven
Hi Geert,

On 19/06/19 7:16 PM, Geert Uytterhoeven wrote:

>
>>>> Not sure whether the 'probing with STATUS instead of ALTSTATUS' message
>>>> is normal for the A1200. Geert might remember that sort of detail.
>>> That comes from drivers/ide/ide-probe.c:ide_dev_read_id().
>>> Looking at the code, it may be caused by the drive, too.
>>
>> I don't see IDE_HFLAG_BROKEN_ALTSTATUS set anywhere in the kernel code
>> except drivers/ide/amd74xx.c. gayle.c sets:
>>
>>           .host_flags             = IDE_HFLAG_MMIO | IDE_HFLAG_SERIALIZE |
>>                                     IDE_HFLAG_NO_DMA,
>>
>> So how would that flag bit get set for the gayle driver?
> Finally someone bitten by not using the ! operator ;-)


Nah, I just overlooked the == 0 at the end. Same effect though.

Now where did I last see that brown paper bag ...

Cheers,

     Michael



>
> The test condition is
>
>      (hwif->host_flags & IDE_HFLAG_BROKEN_ALTSTATUS) == 0
>
> i.e. if the flag is _not_ set.
>
> Gr{oetje,eeting}s,
>
>                          Geert
>


Re: "BUG: soft lockup" on A1200; was: Re: Updated installation images for Debian Ports 2019-04-20

Szymon Bieganski
In reply to this post by Michael Schmitz-4
On 6/19/19 1:42 AM, Michael Schmitz wrote:
> Does the heartbeat LED keep flashing past this point? I wonder whether
> there's an interrupt still pending on the IDE interface that wasn't
> cleared when the probe for hdb timed out.

It keeps flashing, at a slightly faster rate.


> Can you add more output to check where ide_dev_read_id() exits in this
> case? And maybe add
>
> (void)tp_ops->read_status(hwif);
>
> before the return instruction taken?

Please find the attached dmesg capture.

With kind regards,

Szymon


Attachments: dmesg.4.19.37-amiga.hagel.6 (30K), ide-probe.c (42K)

Re: "BUG: soft lockup" on A1200; was: Re: Updated installation images for Debian Ports 2019-04-20

Michael Schmitz-4
Szymon,

On 22.06.2019 at 04:37, Szymon Bieganski wrote:
> On 6/19/19 1:42 AM, Michael Schmitz wrote:
>> Does the heartbeat LED keep flashing past this point? I wonder whether
>> there's an interrupt still pending on the IDE interface that wasn't
>> cleared when the probe for hdb timed out.
>
> Keeps flashing, with slightly faster rate.

Yes, I should have realized it should do that when udevd later times
out. Still, had to ask.

>
>
>> Can you add more output to check where ide_dev_read_id() exits in this
>> case? And maybe add
>>
>> (void)tp_ops->read_status(hwif);
>>
>> before the return instruction taken?
>
> Please find the attached dmesg capture.

Looks like the probe for hdb does not time out (that would have taken
either of the 'return 1' paths). So the return code is either 0 or 2. A
return of 0 would have called do_identify(), which we should have seen in
the log, so that leaves 2 (drive aborted probe).
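
To spell out that elimination, here is a small sketch. The enum and function names are hypothetical labels for the plain integer return values of ide_dev_read_id() as they are described in this thread; nothing here is copied from ide-probe.c:

```c
#include <assert.h>

/* Hypothetical names for the ide_dev_read_id() results discussed here. */
enum read_id_result {
    READ_ID_OK      = 0,  /* drive answered; do_identify() gets called */
    READ_ID_TIMEOUT = 1,  /* one of the 'return 1' timeout paths */
    READ_ID_ABORTED = 2   /* drive aborted the probe */
};

/* Human-readable label for a result code. */
const char *read_id_result_str(int rc)
{
    switch (rc) {
    case READ_ID_OK:      return "identified";
    case READ_ID_TIMEOUT: return "timed out";
    case READ_ID_ABORTED: return "aborted";
    default:              return "unknown";
    }
}

/* Elimination step: no timeout seen and no do_identify() output seen
 * leaves only the 'aborted' result. */
int infer_read_id_result(int saw_timeout, int saw_identify)
{
    /* A timed-out probe never reaches do_identify(). */
    assert(!(saw_timeout && saw_identify));
    if (saw_timeout)
        return READ_ID_TIMEOUT;
    if (saw_identify)
        return READ_ID_OK;
    return READ_ID_ABORTED;
}
```

With neither a timeout nor any do_identify() output in the log, `infer_read_id_result(0, 0)` yields `READ_ID_ABORTED`.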

Looks to me as though the master drive present causes a probe for a
slave drive to abort. That could have ramifications for interrupt
handling later on.

Geert - are the CIA interrupts level or edge triggered? Is there any way
to skip probing for a second drive once the first has been found?

Cheers,

        Michael

>
> With kind regards,
>
> Szymon
>


Re: "BUG: soft lockup" on A1200; was: Re: Updated installation images for Debian Ports 2019-04-20

Szymon Bieganski
Michael,

On 6/21/19 10:49 PM, Michael Schmitz wrote:
> Looks like the probe for hdb does not time out (that would have taken
> either of the 'return 1' paths). So it's either return code 0 or 2. 0
> would have called do_identify() which we should have seen, or 2 (drive
> aborted probe).
>
See below for a slightly more informative debugging capture:

-----------------------

calling  amiga_gayle_ide_driver_init+0x0/0x18 @ 1
ide: Gayle IDE controller (A1200 style)
ide_probe_port:722 irqd = 2
ide_probe_port:725 disabled 2
Probing IDE interface ide0...
probe_for_drive:502 do probe.
ide_dev_read_id:266 disable device IRQ.
hda: probing with STATUS(0x50) instead of ALTSTATUS(0xff)
ide_dev_read_id:298: ask drive for ID.
ide_dev_read_id:313: ide busy sleep.
ide_dev_read_id:321: read status.
ide_dev_read_id:327: do identify.
ide_dev_read_id:332: read status.
ide_dev_read_id:339 ide_dev_read_id returns 0.
probe_for_drive:504 probe returned 0.
probe_for_drive:535 ide classify ata dev.
hda: SAMSUNG MP0402H, ATA DISK drive
probe_for_drive:553 ide disk init chs.
probe_for_drive:555 ide disk init mult count.
probe_for_drive:558 device was found; returning 1.
probe_for_drive:502 do probe.
probe_for_drive:504 probe returned 3.
probe_for_drive:508 do probe for ATAPI device.
ide_dev_read_id:266 disable device IRQ.
hdb: probing with STATUS(0x00) instead of ALTSTATUS(0xff)
ide_dev_read_id:293 disable DMA & overlap.
ide_dev_read_id:298: ask drive for ID.
ide_dev_read_id:313: ide busy sleep.
ide_dev_read_id:321: read status.
ide_dev_read_id:336: drive refused ID.
ide_dev_read_id:339 ide_dev_read_id returns 2.
ide_dev_read_id:266 disable device IRQ.
hdb: probing with STATUS(0x00) instead of ALTSTATUS(0xff)
ide_dev_read_id:293 disable DMA & overlap.
ide_dev_read_id:298: ask drive for ID.
ide_dev_read_id:313: ide busy sleep.
ide_dev_read_id:321: read status.
ide_dev_read_id:336: drive refused ID.
ide_dev_read_id:339 ide_dev_read_id returns 2.
probe_for_drive:510 probe (ATAPI) returned 2
probe_for_drive:513 no device found.
ide_probe_port:745
ide_probe_port:747 enabling IRQ 2.
random: crng init done
------------------------------------------------
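
The trace lines above follow the "function:line message" pattern of the suggested printk("%s:%u\n", __func__, __LINE__) calls, with a short message appended. As a rough userspace sketch of the same idea (format_trace() and TRACE() are hypothetical names, and this uses snprintf()/fprintf() instead of printk() so it can be tried outside the kernel):

```c
#include <assert.h>
#include <stdio.h>

/* Format one trace line as "function:line message" into buf.
 * Returns what snprintf() returns: the length the full line needs. */
int format_trace(char *buf, size_t size, const char *func,
                 unsigned int line, const char *msg)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%s:%u %s", func, line, msg);
}

/* Userspace stand-in for the debug prints: report the current location. */
#define TRACE(msg)                                               \
    do {                                                         \
        char trace_buf_[128];                                    \
        format_trace(trace_buf_, sizeof(trace_buf_),             \
                     __func__, (unsigned int)__LINE__, (msg));   \
        fprintf(stderr, "%s\n", trace_buf_);                     \
    } while (0)
```

Dropping `TRACE("do probe.");` into a function then prints that function's name and the line number of the call, which is what makes it easy to see the last point reached before a hang.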

> Looks to me as though the master drive present causes a probe for a
> slave drive to abort. That could have ramifications for interrupt
> handling later on.
>
> Geert - are the CIA interrupts level or edge triggered? Is there any
> way to skip probing for a second drive once the first has been found?

For clarity: during the stall condition, _IDE_IRQ is kept low, both
_IDE_CS lines are high, _ODD_CIA goes low for 880 ns every 22 us, _EVEN_CIA
has short bursts of low every 10 ms (same for _INT2), while _INT6 remains
low all the time. If necessary, I can provide detailed captures of these
signals in other conditions too.
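
To put those timings into more familiar units, here is a small arithmetic sketch. The function names are hypothetical, and the only inputs are the figures measured above:

```c
#include <assert.h>

/* Fraction of each period that the signal spends low. */
double duty_cycle(double low_time_s, double period_s)
{
    assert(period_s > 0.0 && low_time_s >= 0.0 && low_time_s <= period_s);
    return low_time_s / period_s;
}

/* Repetition rate in Hz for a given period in seconds. */
double rate_hz(double period_s)
{
    assert(period_s > 0.0);
    return 1.0 / period_s;
}
```

With the measured values, `duty_cycle(880e-9, 22e-6)` is 0.04, so _ODD_CIA is low about 4% of the time at roughly 45.5 kHz, and `rate_hz(10e-3)` gives 100 Hz for the 10 ms bursts.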

Kind regards,

Szymon




Re: "BUG: soft lockup" on A1200; was: Re: Updated installation images for Debian Ports 2019-04-20

Michael Schmitz-4
Hi Szymon,

On 23/06/19 10:26 PM, Szymon Bieganski wrote:

> Michael,
>
> On 6/21/19 10:49 PM, Michael Schmitz wrote:
>> Looks like the probe for hdb does not time out (that would have taken
>> either of the 'return 1' paths). So it's either return code 0 or 2. 0
>> would have called do_identify() which we should have seen, or 2 (drive
>> aborted probe).
>>
> See below with slightly more informative debugging capture:
>
> -----------------------
>
> calling  amiga_gayle_ide_driver_init+0x0/0x18 @ 1
> ide: Gayle IDE controller (A1200 style)
> ide_probe_port:722 irqd = 2
> ide_probe_port:725 disabled 2
> Probing IDE interface ide0...
> probe_for_drive:502 do probe.
> ide_dev_read_id:266 disable device IRQ.
> hda: probing with STATUS(0x50) instead of ALTSTATUS(0xff)
> ide_dev_read_id:298: ask drive for ID.
> ide_dev_read_id:313: ide busy sleep.
> ide_dev_read_id:321: read status.
> ide_dev_read_id:327: do identify.
> ide_dev_read_id:332: read status.
> ide_dev_read_id:339 ide_dev_read_id returns 0.
> probe_for_drive:504 probe returned 0.
> probe_for_drive:535 ide classify ata dev.
> hda: SAMSUNG MP0402H, ATA DISK drive
> probe_for_drive:553 ide disk init chs.
> probe_for_drive:555 ide disk init mult count.
> probe_for_drive:558 device was found; returning 1.
> probe_for_drive:502 do probe.
> probe_for_drive:504 probe returned 3.
> probe_for_drive:508 do probe for ATAPI device.
> ide_dev_read_id:266 disable device IRQ.
> hdb: probing with STATUS(0x00) instead of ALTSTATUS(0xff)
> ide_dev_read_id:293 disable DMA & overlap.
> ide_dev_read_id:298: ask drive for ID.
> ide_dev_read_id:313: ide busy sleep.
> ide_dev_read_id:321: read status.
> ide_dev_read_id:336: drive refused ID.
> ide_dev_read_id:339 ide_dev_read_id returns 2.

Thanks, that confirms my reading of the earlier trace.

> ide_dev_read_id:266 disable device IRQ.
> hdb: probing with STATUS(0x00) instead of ALTSTATUS(0xff)
> ide_dev_read_id:293 disable DMA & overlap.
> ide_dev_read_id:298: ask drive for ID.
> ide_dev_read_id:313: ide busy sleep.
> ide_dev_read_id:321: read status.
> ide_dev_read_id:336: drive refused ID.
> ide_dev_read_id:339 ide_dev_read_id returns 2.
> probe_for_drive:510 probe (ATAPI) returned 2
> probe_for_drive:513 no device found.
> ide_probe_port:745
> ide_probe_port:747 enabling IRQ 2.
> random: crng init done
> ------------------------------------------------
>
>> Looks to me as though the master drive present causes a probe for a
>> slave drive to abort. That could have ramifications for interrupt
>> handling later on.
>>
>> Geert - are the CIA interrupts level or edge triggered? Is there any
>> way to skip probing for a second drive once the first has been found?
> For clarity during the stall condition the _IDE_IRQ is kept low, both


The screenshot you sent by separate mail shows _IDE_IRQ high
(inactive), though? That suggests it's not the hda drive flooding the system with
interrupts (which we pretty much knew already, because the heartbeat LED
still flashes and udevd eventually times out).


> _IDE_CS are high, _ODD_CIA goes low for 880ns every 22us, _EVEN_CIA has
> short bursts of low every 10ms, same for _INT2, while _INT6 remains low


Probably the system timer interrupt.


> all the time. If necessary I can provide detailed captures of these
> signals in other conditions too.


Not sure what the _IDE_CS are used for. If you could trigger when the
identify command is sent to drive hdb, that would be great, but I can't
see how that would work without tapping into all the data and address
lines on the IDE interface ...

I think we should pursue possible errors in the interrupt enable code
path instead.

Cheers,

     Michael



>
> Kind regards,
>
> Szymon
>
>


Re: "BUG: soft lockup" on A1200; was: Re: Updated installation images for Debian Ports 2019-04-20

Geert Uytterhoeven
Hi Michael,

On Mon, Jun 24, 2019 at 4:28 AM Michael Schmitz <[hidden email]> wrote:
> On 23/06/19 10:26 PM, Szymon Bieganski wrote:
> > On 6/21/19 10:49 PM, Michael Schmitz wrote:
> >> Looks to me as though the master drive present causes a probe for a
> >> slave drive to abort. That could have ramifications for interrupt
> >> handling later on.
> >>
> >> Geert - are the CIA interrupts level or edge triggered? Is there any

Everything is level triggered.

Note that IDE is not a CIA IRQ, but plain AUTO2.
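
Why level triggering matters on a shared line can be shown with a toy model. This is not Gayle or kernel code, and it is not a diagnosis of this bug; the struct and function names are hypothetical. The line stays active as long as any source is pending, so a source whose handler never clears it keeps the line asserted indefinitely:

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_SOURCES 2

/* Toy model of one shared, level-triggered interrupt line. */
struct shared_line {
    bool pending[NUM_SOURCES];      /* source i is asserting the line */
    bool handler_acks[NUM_SOURCES]; /* source i's handler clears it */
};

/* The line is active as long as any source is still pending. */
bool line_asserted(const struct shared_line *l)
{
    for (int i = 0; i < NUM_SOURCES; i++)
        if (l->pending[i])
            return true;
    return false;
}

/* Run the handlers until the line goes quiet or max_rounds is used up.
 * Returns the number of rounds taken, or -1 if the line never cleared. */
int service_line(struct shared_line *l, int max_rounds)
{
    assert(max_rounds > 0);
    for (int round = 0; round < max_rounds; round++) {
        if (!line_asserted(l))
            return round;
        for (int i = 0; i < NUM_SOURCES; i++)
            if (l->pending[i] && l->handler_acks[i])
                l->pending[i] = false;
    }
    return line_asserted(l) ? -1 : max_rounds;
}
```

With both handlers acknowledging their source, the line clears after one round; if one pending source is never acknowledged, `service_line()` returns -1 no matter how many rounds it is given.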

> >> way to skip probing for a second drive once the first has been found?
> > For clarity during the stall condition the _IDE_IRQ is kept low, both
>
> The screenshot you sent by separate mail does show _IDE_IRQ high
> (inactive)? Suggests it's not the hda drive flooding the system with
> interrupts (which we pretty much knew already, because the heartbeat LED
> still flashes and udevd eventually times out).

Indeed.

> > _IDE_CS are high, _ODD_CIA goes low for 880ns every 22us, _EVEN_CIA has
> > short bursts of low every 10ms, same for _INT2, while _INT6 remains low
>
> Probably the system timer interrupt.

Yep.

I find it strange that _INT6 remains low. It's used only for CIA B (timer).
Was the probe connected correctly?

> > all the time. If necessary I can provide detailed captures of these
> > signals in other conditions too.
>
> Not sure what the _IDE_CS are used for. If you could trigger when the

IDE master and slave drive select.

Gr{oetje,eeting}s,

                        Geert



Re: "BUG: soft lockup" on A1200; was: Re: Updated installation images for Debian Ports 2019-04-20

Geert Uytterhoeven
In reply to this post by Michael Schmitz-4
Hi Szymon,

On Mon, Jun 24, 2019 at 7:06 PM Szymon Bieganski <[hidden email]> wrote:

> On 6/24/19 4:27 AM, Michael Schmitz wrote:
> > The screenshot you sent by separate mail does show _IDE_IRQ high
> > (inactive)? Suggests it's not the hda drive flooding the system with
> > interrupts (which we pretty much knew already, because the heartbeat
> > LED still flashes and udevd eventually times out).
> >
> I must have picked the wrong screenshot then. I double-checked it now, and
> _IDE_IRQ is indeed kept constant low (negative logic, so it must be
> constantly active). I see a lot of activity on these lines during normal
> operation (both during boot, in NetBSD and in AmigaOS).
>
> Indeed there were also two lines mixed up - _INT2 swapped with _INT6. So
> to sum up, during the stall:
>
> - _IDE_IRQ is stuck low (active)

I can't find anything about the INTRQ signal polarity in the IDE specs,
but given the presence of a pull-down resistor on both A4000 and A1200,
I assume this is an active-high, not active-low, signal. That would mean
the drive does not assert the interrupt signal!
Does this match the activity you see when running AmigaOS (i.e. high
most of the time)?

On A4000, INTRQ is fed to a transistor inverter, which supplies _IDE_INT
to GAYLE.
On A1200, INTRQ is called _IDE_IRQ. Presumably the inverter is
integrated in GAYLE, but calling the signal _IDE_IRQ is... confusing.

Note that the A4000 GAYLE is not the same chip as the A1200 GAYLE.
The former is a simple PAL, which is sufficient to handle IDE.
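
The polarity question boils down to how a measured level maps to "asserted". A small sketch of that mapping, assuming (as above) that INTRQ is active-high and that the inverted signal is active-low; the names are hypothetical and only illustrate the logic:

```c
#include <assert.h>
#include <stdbool.h>

enum polarity { ACTIVE_HIGH, ACTIVE_LOW };

/* Is a signal asserted, given the measured level and its polarity? */
bool is_asserted(bool level_high, enum polarity pol)
{
    return pol == ACTIVE_HIGH ? level_high : !level_high;
}

/* A simple inverter, like the one between INTRQ and _IDE_INT on A4000. */
bool invert(bool level_high)
{
    return !level_high;
}

/* An active-high signal and its inverted, active-low copy always agree
 * on whether the interrupt is asserted. */
void check_inverter_preserves_meaning(void)
{
    assert(is_asserted(false, ACTIVE_HIGH) ==
           is_asserted(invert(false), ACTIVE_LOW));
    assert(is_asserted(true, ACTIVE_HIGH) ==
           is_asserted(invert(true), ACTIVE_LOW));
}
```

Under the active-high assumption, a line measured as stuck low gives `is_asserted(false, ACTIVE_HIGH) == false`, i.e. no interrupt; read as active-low, the same measurement would mean a permanently asserted interrupt.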

> - _IDE_CS(1) and _IDE_CS(2) stuck high (inactive)

OK.

> - _ODD_CIA goes low for 880ns every 22us, while its corresponding

OK.

> interrupt line _INT2 is stuck low (active)

_INT2 is a shared interrupt, used by e.g. IDE through GAYLE, and a CIA.

> - _EVEN_CIA is inactive most of the time, and goes low for a short burst
> of activity once every 10ms, together with its _INT6 line

OK, HZ timer.

While I first thought this was an IDE drive issue (we had issues with
some WDC models a long time ago, which didn't show up on PC, as PCs
don't use shared interrupts), this looks like an issue with the A1200
GAYLE-specific interrupt handling in the IDE driver. It would be
interesting to know if other people with A1200s have issues with IDE or
not....

Time to look at the NetBSD sources, too...

Gr{oetje,eeting}s,

                        Geert



Re: "BUG: soft lockup" on A1200; was: Re: Updated installation images for Debian Ports 2019-04-20

Karoly Balogh (Charlie/SGR)
Hi,

On Tue, 25 Jun 2019, Geert Uytterhoeven wrote:

> While I first thought this was an IDE drive issue (we had issues with
> some WDC models a long time ago, which didn't show up on PC, as PCs
> don't use shared interrupts), this looks like an issue with the A1200
> GAYLE-specific interrupt handling in the IDE driver. It would be
> interesting to know if other people with A1200s have issues with IDE or
> not....

Sorry for bumping this thread after a few months. I just wanted to report
that I recently built a mainline 4.19.87 kernel with GCC 7.4.0 (while playing
with Buildroot), and I think I ran into exactly this issue on my
A1200. Mine runs with a Blizzard 1260, 128MB RAM, and an SD2IDE adapter,
nothing fancy. The same setup is very stable and works without problems under
AmigaOS and also under NetBSD (although I haven't run that for a
while). I was just curious whether there was any follow-up, or maybe a solution
to this issue.

All previously discussed symptoms also match what I observe on my
hardware; for example, whether a PCMCIA card is inserted makes no difference.

In case it's useful extra info: the same kernel works flawlessly in FS-UAE (with an
emulated A1200 IDE controller), so the problems only occur on real
hardware.

Charlie


Re: "BUG: soft lockup" on A1200; was: Re: Updated installation images for Debian Ports 2019-04-20

Karoly Balogh (Charlie/SGR)
Hi,

On Mon, 9 Dec 2019, Karoly Balogh (Charlie/SGR) wrote:

> > It would be interesting to know if other people with A1200s have
> > issues with IDE or not....
>
> Sorry for bumping this thread after a few months, I just wanted to
> report that I recently built a mainline 4.19.87 kernel with GCC 7.4.0,
> (playing with some Buildroot thing) and I think I ran into exactly this
> issue on my A1200.
> (...)
> If that's extra info, the same kernel works flawlessly in FS-UAE (with an
> emulated A1200 IDE controller), so the problems only occur on real
> hardware.

OK, so I did some more tests:

1. I tried the PATA Gayle driver. It fails to boot in UAE: it seems to
be able to probe the units, but then appears unable to read blocks from them,
at least in FS-UAE on my Mac. I could try to fetch a proper crash log
somehow... Note that the IDE Gayle driver works fine with the same
emulator setup.

2. The PATA Gayle driver also locks up on my Amiga 1200, similar to the
IDE Gayle driver.

3. Actually, it still might be a problem with the SD2IDE adapter. With
the PATA Gayle driver, it doesn't even enumerate the master unit, it just
locks up. With the IDE Gayle driver, it shows the device on the bus, then
locks up. When the lockup occurs with the PATA Gayle driver, the HDD LED
remains lit. With the IDE Gayle driver, it remains dark, but:

If I try to boot the kernel from an actual 2.5" HDD (Samsung 40GB,
relatively recent), none of the enumeration problems seem to happen,
and both the IDE driver and the PATA driver show the partitions on the
HDD properly, then wait for the root volume to appear. Sadly, this HDD
doesn't have a Linux root partition though, and no disk space to make one. :(
So I can't boot further to test the system.

For now, I ordered an SD2IDE adapter with another chipset, and I'll try to
get a real hard disk, or any other device (maybe an IDE DoM or something)
to test with. (But again, the problematic SD2IDE works with AmigaOS just
fine, and also worked with NetBSD 7.1 at least, and as I have two of the
same SD2IDE adapters, I tried with both, but both show the same symptoms.
Also tried SD cards from multiple makers, made no difference, so it must
be that SD2IDE model somehow.)

Cheers,
--
Charlie

(PS: is sharing these tests and this kind of info useful? I'll stop spamming
if no one is really interested.)


Re: "BUG: soft lockup" on A1200; was: Re: Updated installation images for Debian Ports 2019-04-20

John Paul Adrian Glaubitz
Hi!

On 12/10/19 3:56 PM, Karoly Balogh (Charlie/SGR) wrote:
> If I try to boot the kernel from an actual 2,5" HDD (Samsung 40GB,
> relatively recent), none of the enumeration problems seem to be happening,
> and both the IDE driver and the PATA driver shows the partitions on the
> HDD properly, then waits to the root volume to appear. Sadly this HDD
> doesn't have a Linux root partition tho, and no disk space to make one. :(
> So can't boot further to test the system.

Users have reported similar experiences on the #debian-ports channel with
their Amigas and such adapters. So, this might be a hardware compatibility
issue that shows on Linux only.

> For now, I ordered an SD2IDE adapter with another chipset, and I'll try to
> get a real hard disk, or any other device (maybe an IDE DoM or something)
> to test with. (But again, the problematic SD2IDE works with AmigaOS just
> fine, and also worked with NetBSD 7.1 at least, and as I have two of the
> same SD2IDE adapters, I tried with both, but both show the same symptoms.
> Also tried SD cards from multiple makers, made no difference, so it must
> be that SD2IDE model somehow.)

Ah, good to know. So the hardware seems to be compatible in general.

> (Ps: are these tests/sharing this kind of info is useful? I stop spamming
> if no one is really interested.)

Absolutely. Please keep reporting such issues as you see them.

Adrian

--
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - [hidden email]
`. `'   Freie Universitaet Berlin - [hidden email]
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913


Re: "BUG: soft lockup" on A1200; was: Re: Updated installation images for Debian Ports 2019-04-20

Geert Uytterhoeven
In reply to this post by Karoly Balogh (Charlie/SGR)
CC linux-ide, as people there may be more familiar with the quirks and
caveats of SD2IDE adapters.

On Tue, Dec 10, 2019 at 3:56 PM Karoly Balogh (Charlie/SGR)
<[hidden email]> wrote:

> On Mon, 9 Dec 2019, Karoly Balogh (Charlie/SGR) wrote:
> > > It would be interesting to know if other people with A1200s have
> > > issues with IDE or not....
> >
> > Sorry for bumping this thread after a few months, I just wanted to
> > report that I recently built a mainline 4.19.87 kernel with GCC 7.4.0,
> > (playing with some Buildroot thing) and I think I ran into exactly this
> > issue on my A1200.
> > (...)
> > If that's extra info, the same kernel works flawlessly in FS-UAE (with an
> > emulated A1200 IDE controller), so the problems only occur on real
> > hardware.
>
> OK, so I did some more tests:
>
> 1., I tried the PATA Gayle driver. This one fails to boot in UAE, seems to
> be able to probe the units, then unable to read blocks from them it seems.
> At least in FS UAE on my Mac. I could try to fetch a proper crashlog
> somehow... Note that the IDE Gayle driver works fine with the same
> emulator setup.
>
> 2., The PATA Gayle driver also locks up on my Amiga 1200 similar to the
> IDE Gayle driver.
>
> 3., Actually, it still might be a problem with the SD2IDE adapter. With
> the PATA Gayle driver, it doesn't even enumerate the master unit, just
> locks up. With the IDE Gayle driver, it shows the device on the bus, then
> locks up. When the lockup occurs, with the PATA Gayle driver, the HDD LED
> remains lit. With the IDE Gayle driver, it remains dark, but:
>
> If I try to boot the kernel from an actual 2,5" HDD (Samsung 40GB,
> relatively recent), none of the enumeration problems seem to be happening,
> and both the IDE driver and the PATA driver shows the partitions on the
> HDD properly, then waits to the root volume to appear. Sadly this HDD
> doesn't have a Linux root partition tho, and no disk space to make one. :(
> So can't boot further to test the system.
>
> For now, I ordered an SD2IDE adapter with another chipset, and I'll try to
> get a real hard disk, or any other device (maybe an IDE DoM or something)
> to test with. (But again, the problematic SD2IDE works with AmigaOS just
> fine, and also worked with NetBSD 7.1 at least, and as I have two of the
> same SD2IDE adapters, I tried with both, but both show the same symptoms.
> Also tried SD cards from multiple makers, made no difference, so it must
> be that SD2IDE model somehow.)
>
> Cheers,
> --
> Charlie
>
> (PS: is sharing these tests and this kind of info useful? I'll stop
> spamming if no one is really interested.)


Re: "BUG: soft lockup" on A1200; was: Re: Updated installation images for Debian Ports 2019-04-20

Karoly Balogh (Charlie/SGR)
In reply to this post by John Paul Adrian Glaubitz
Hi,

On Tue, 10 Dec 2019, John Paul Adrian Glaubitz wrote:

> > (PS: is sharing these tests and this kind of info useful? I'll stop
> > spamming if no one is really interested.)
>
> Absolutely. Please keep reporting such issues as you see them.

OK, so just a quick update, the effort to get a "different kind" of
adapter failed, as the seller "accidentally" used the wrong stock photo,
and sent me exactly the kind of adapter I was trying to avoid... So there
will be no test with a different adapter, but a complaint filed on Amazon.

Apart from this, I dug up my A4000/CSPPC, and I tested the 3.5"/40-pin
version of this adapter, and it shows the exact same issue, both with the
IDE and the PATA drivers. Which of course means the issue I'm seeing is
not related to A4000 vs. A1200 IDE differences. Very weird.

Meanwhile, I remembered I posted a NetBSD dmesg online once, using the
very same adapter, in the same machine. So here it is, at least it shows
some kind of IDE device info, and shows that it continues booting w/o
problems:

https://dmesgd.nycbug.org/index.cgi?do=view&id=3222

(BTW, if any kernel developer wants to take a look, I can make the A1200
or the A4000 remotely accessible, via a Raspberry Pi. Serial console, and
hardware-reset for the Amiga via RPi GPIO. On boot it would just fetch a
new kernel from the RPi (from AmigaOS), before trying to boot it. So if
anyone feels like debugging this, it's there. We've done this before for
debugging the MorphOS kernel on the same hardware, worked quite well.)

Cheers,
--
Charlie


Re: "BUG: soft lockup" on A1200; was: Re: Updated installation images for Debian Ports 2019-04-20

Geert Uytterhoeven
Hi Charlie,

On Wed, Dec 11, 2019 at 2:36 PM Karoly Balogh (Charlie/SGR)
<[hidden email]> wrote:
> (BTW, if any kernel developer wants to take a look, I can make the A1200
> or the A4000 remotely accessible, via a Raspberry Pi. Serial console, and
> hardware-reset for the Amiga via RPi GPIO. On boot it would just fetch a
> new kernel from the RPi (from AmigaOS), before trying to boot it. So if
> anyone feels like debugging this, it's there. We've done this before for
> debugging the MorphOS kernel on the same hardware, worked quite well.)

Just wondering: how do you reset the A4000 remotely?
By pulling the keyboard clockline low? By controlling power?
Anything else?

Thanks!

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [hidden email]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds


Re: "BUG: soft lockup" on A1200; was: Re: Updated installation images for Debian Ports 2019-04-20

Karoly Balogh (Charlie/SGR)
Hi,

On Wed, 11 Dec 2019, Geert Uytterhoeven wrote:

> > (BTW, if any kernel developer wants to take a look, I can make the A1200
> > or the A4000 remotely accessible, via a Raspberry Pi. Serial console, and
> > hardware-reset for the Amiga via RPi GPIO. On boot it would just fetch a
> > new kernel from the RPi (from AmigaOS), before trying to boot it. So if
> > anyone feels like debugging this, it's there. We've done this before for
> > debugging the MorphOS kernel on the same hardware, worked quite well.)
>
> Just wondering: how do you reset the A4000 remotely?
> By pulling the keyboard clockline low? By controlling power?
> Anything else?

We made an adapter with a PLCC socket, which piggybacks on the Fat Gary
chip and exposes the _KBRESET line, coming from pin 36 or so, on a pin
header. So you can hook it up to the RPi GPIO and pull that line from
there, effectively causing a hardware reset.

A similar hack is possible with the Gayle chip on the A1200, and some
"PCMCIA reset fix" Gayle-piggyback adapters available at some retro HW
resellers already expose this pin.

Some details here: https://www.amigaworld.de/workshops/reset-am-amiga/ (in
German).
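
A minimal sketch of what the RPi side of such a reset pulse could look
like, in Python. Everything here is an assumption for illustration: the
function name, the 0.5 s hold time, and the idea of passing in a callable
that drives the line. The underscore in _KBRESET suggests an active-low
signal, but which electrical level means "asserted" is deliberately left
to the caller, as is the choice of GPIO library and pin.

```python
import time
from typing import Callable


def pulse_reset(drive_line: Callable[[bool], None],
                hold_seconds: float = 0.5,
                sleep: Callable[[float], None] = time.sleep) -> None:
    """Assert the reset line for hold_seconds, then release it.

    drive_line(True) asserts reset, drive_line(False) releases it.
    Mapping "asserted" to an electrical level is the caller's job.
    The default hold time is a placeholder, not a measured value.
    """
    if hold_seconds <= 0:
        raise ValueError("hold_seconds must be positive")
    drive_line(True)
    try:
        sleep(hold_seconds)
    finally:
        # Always release the line, even if the wait is interrupted,
        # so the Amiga is not held in reset indefinitely.
        drive_line(False)
```

On a real RPi, drive_line would wrap whatever GPIO library is in use.
Keeping the hardware access behind a callable means the timing logic can
be tried out on any machine without the adapter attached.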

Cheers
--
Charlie


Re: "BUG: soft lockup" on A1200; was: Re: Updated installation images for Debian Ports 2019-04-20

Christian T. Steigies
On Wed, Dec 11, 2019 at 03:18:06PM +0100, Karoly Balogh (Charlie/SGR) wrote:

> Hi,
>
> On Wed, 11 Dec 2019, Geert Uytterhoeven wrote:
>
> > > (BTW, if any kernel developer wants to take a look, I can make the A1200
> > > or the A4000 remotely accessible, via a Raspberry Pi. Serial console, and
> > > hardware-reset for the Amiga via RPi GPIO. On boot it would just fetch a
> > > new kernel from the RPi (from AmigaOS), before trying to boot it. So if
> > > anyone feels like debugging this, it's there. We've done this before for
> > > debugging the MorphOS kernel on the same hardware, worked quite well.)
> >
> > Just wondering: how do you reset the A4000 remotely?
> > By pulling the keyboard clockline low? By controlling power?
> > Anything else?
>
> We made an adapter with a PLCC socket, which piggybacks on the Fat Gary
> chip and exposes the _KBRESET line, coming from pin 36 or so, on a pin
> header. So you can hook it up to the RPi GPIO and pull that line from
> there, effectively causing a hardware reset.
>
> A similar hack is possible with the Gayle chip on the A1200, and some
> "PCMCIA reset fix" Gayle-piggyback adapters available at some retro HW
> resellers already expose this pin.
>
> Some details here: https://www.amigaworld.de/workshops/reset-am-amiga/ (in
> German).

And how do you transfer the kernel from the RPi?
This sounds like a cool setup, a fully remote controlled Amiga...

Christian


Remote kernel debugging an Amiga; was: "BUG: soft lockup" on A1200

Karoly Balogh (Charlie/SGR)
Hi,

On Wed, 11 Dec 2019, Christian T. Steigies wrote:

> > > > (BTW, if any kernel developer wants to take a look, I can make the
> > > > A1200 or the A4000 remotely accessible, via a Raspberry Pi. Serial
> > > > console, and hardware-reset for the Amiga via RPi GPIO. On boot it
> > > > would just fetch a new kernel from the RPi (from AmigaOS), before
> > > > trying to boot it. So if anyone feels like debugging this, it's
> > > > there. We've done this before for debugging the MorphOS kernel on
> > > > the same hardware, worked quite well.)
> > >
> (... snip ...)
> And how do you transfer the kernel from the RPi?
> This sounds like a cool setup, a fully remote controlled Amiga...

Yeah, it was/can be quite fun. :) Quite a story to tell at events for
sure, that the main MorphOS kernel dev was sitting in South Africa, while
the Amiga was sitting on my desk near Mainz in Germany hooked on an RPi,
and this is how we debugged a PPC kernel for the box... :)

On the question itself: over the network of course. So in the A4000 case,
the thing has an X-Surf, and a minimal AmigaOS setup to boot into some
sort of networked state after a reset. (An A1200 could use a PCMCIA
network card instead.)

In detail (this is long overdue for a blogpost or something):

After a reset, AmigaOS (3.x) boots by default, then autostarts a TCP/IP
stack and brings the network card online automatically. Once the interface
is up, most Amiga TCP/IP stacks support some kind of post-up script, so
it will execute an AmigaDOS script to fetch the kernel and a boot script
from the RPi. In our case this was done using httpresume (available on
Aminet), which is an HTTP downloader, kind of a native wget for AmigaOS,
which we found works better in some cases than the actual wget port.
Obviously, the RPi runs a small (LAN-only) web server to expose the files
to the Amiga via HTTP, but the uploads to the RPi from "outside" are done
via scp/sftp.
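
For illustration, the small LAN-only web server on the RPi could be
something as simple as the following Python sketch. This is not the server
we actually ran; the function name, the directory and the address in the
usage note are made up, and it relies only on the http.server module from
the Python standard library.

```python
import functools
import http.server


def make_file_server(directory: str, host: str,
                     port: int = 8000) -> http.server.ThreadingHTTPServer:
    """Build an HTTP server that exposes `directory` read-only.

    Binding `host` to the RPi's LAN address (rather than 0.0.0.0)
    keeps the server reachable from the local network only.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory)
    return http.server.ThreadingHTTPServer((host, port), handler)


# Hypothetical usage on the RPi (path and address are placeholders):
#   make_file_server("/srv/amiga", "192.168.1.10").serve_forever()
```

Anything copied into that directory via scp/sftp is then immediately
fetchable by the Amiga on its next boot, with no server restart needed.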

Then it just makes the boot script executable and starts it, which in
turn starts the kernel. The rest can already be observed over the serial
console. Optionally, the AmigaOS boot process and the various stages
described above can also be made to send messages over serial, as there
are tools for that too, but we never bothered... (I think I researched
this at some point, and in case of fatal errors in the process, there was
a way to expose an AmigaDOS command line over serial too. As a last-resort
solution, so you'd at least get some news that something has failed, and
wouldn't wait for it to boot into the other kernel.)

Obviously, having both the kernel image and the boot script downloaded on
every iteration has the advantage that all kernel command line parameters
can be modified for every new boot configuration. Other tools and required
files (initrd maybe? or custom amiboot versions) are easy to include too.
Actually AmigaDOS even has an ext2fs handler, so theoretically even
modules can be overwritten on some Linux root FS...

In fact, our download process for MorphOS became two stage at some point,
the first system would just download a "download script", and execute
that, so even the list of files to be fetched from the RPi on next boot
can be edited easily without physically touching the machine, or custom
commands can be executed too (copying files to the right place before
attempting to boot, etc).
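
The two-stage idea can be sketched in Python as below. Note that this is
only a model of the logic, not what ran on the Amiga: there the second
stage was an AmigaDOS script that got executed, while this sketch swaps in
a plain list of file names. The manifest name "download.list", its
one-name-per-line format with '#' comments, and the fetch callable are all
hypothetical.

```python
from typing import Callable, Dict, List


def parse_manifest(text: str) -> List[str]:
    """Return file names from a manifest, skipping blanks and '#' comments."""
    names = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            names.append(line)
    return names


def two_stage_fetch(fetch: Callable[[str], bytes],
                    manifest_name: str = "download.list") -> Dict[str, bytes]:
    """Stage 1: fetch the manifest. Stage 2: fetch every file it names.

    `fetch` maps a file name to its contents, e.g. via an HTTP GET
    against the RPi; how it does that is up to the caller.
    """
    manifest = fetch(manifest_name).decode("ascii")
    return {name: fetch(name) for name in parse_manifest(manifest)}
```

The point of the indirection is that only the fixed first stage lives on
the Amiga's disk; everything else, including the list of what to fetch, is
edited on the RPi.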

In our experience, the entire reboot process including downloads takes
less than a minute on a well executed '060 based setup, so it's really
workable. We actually tried to fiddle with TFTP and other things first,
but we found this "naive" HTTP-download approach worked the best and was
the easiest to get to.

I'm happy to share more details about the AmigaOS setup, the download
scripts we used, the reset script using RPi GPIO, etc., but given the task
we had at hand, they are as minimal, sketchy and makeshift as you might
expect... :) Just ask.

Sorry for the long mail...

Cheers,
--
Charlie
