Bug#926539: rootskel: steal-ctty no longer works on at least sparc64

John Paul Adrian Glaubitz
Source: rootskel
Version: 1.128
Severity: important
User: [hidden email]
Usertags: sparc64

Hello!

I built updated installation images [1] for Debian Ports today and tested
the sparc64 image on our SPARC T5 in an LDOM.

Unfortunately, it seems that the recent changes to rootskel broke the
serial console on sparc64 in d-i. The kernel boots fine but d-i never
starts; the boot stops with:

steal-ctty: No such file or directory

My suspicion is that the support for multiple consoles in parallel [2]
introduced this particular regression. I haven't done any debugging yet,
though, as I'm not sure where to start: I haven't touched the rootskel
package before and would therefore be interested in any pointers on how to
debug this.

Thanks,
Adrian

> [1] https://cdimage.debian.org/cdimage/ports/2019-04-06/
> [2] https://salsa.debian.org/installer-team/rootskel/commit/b6048aafed7d73ba42da04d6f7a798710f271384

--
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - [hidden email]
`. `'   Freie Universitaet Berlin - [hidden email]
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913

John Paul Adrian Glaubitz
On 4/6/19 6:46 PM, John Paul Adrian Glaubitz wrote:
> My suspicion is that the support multiple consoles in parallel [2] introduced
> this particular regression. I haven't done any debugging yet though as I'm
> not sure where to start, I haven't touched the rootskel package before and
> therefore would be interested in any pointers how to debug this.

The problem seems to be that the sparc64 kernel reports one name in
/proc/consoles and uses a different name for the actual console device:

root@landau:~# cat /proc/consoles
ttyHV0               -W- (EC p  )    4:64
tty0                 -WU (E     )    4:1
root@landau:~# readlink /sys/dev/char/4:64
../../devices/root/f0299a70/f029b788/tty/ttyS0
root@landau:~#

And this is what used to make it work [1]:

            *) # >= 2.6.38
                console_major_minor="$(get-real-console-linux)"
                console_raw="$(readlink "/sys/dev/char/${console_major_minor}")"
                console="${console_raw##*/}"
                ;;
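
A minimal sketch of what the last step of that snippet does, using the
readlink output shown above as a fixed input (on a live system the value
would come from readlink, not from a literal string):

```shell
# Value copied from the readlink output above; hard-coded for illustration.
console_raw='../../devices/root/f0299a70/f029b788/tty/ttyS0'

# Strip the longest prefix ending in "/", leaving the last path component.
console="${console_raw##*/}"

echo "$console"
```

So the old code derived the console name from the device number, which is
why the ttyHV0 entry in /proc/consoles did not matter to it.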

Adrian

> [1] https://salsa.debian.org/installer-team/rootskel/blob/cb7db898f58f14c04b9d60351811cbae71b49a07/src/sbin/reopen-console-linux#L21

--
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - [hidden email]
`. `'   Freie Universitaet Berlin - [hidden email]
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913

Ben Hutchings-3
On Sat, 2019-04-06 at 21:33 +0200, John Paul Adrian Glaubitz wrote:

> On 4/6/19 6:46 PM, John Paul Adrian Glaubitz wrote:
> > My suspicion is that the support multiple consoles in parallel [2] introduced
> > this particular regression. I haven't done any debugging yet though as I'm
> > not sure where to start, I haven't touched the rootskel package before and
> > therefore would be interested in any pointers how to debug this.
>
> The problem seems to be the fact that the sparc64 kernel uses different names
> for /proc/console and the actual console name:
>
> root@landau:~# cat /proc/consoles
> ttyHV0               -W- (EC p  )    4:64
> tty0                 -WU (E     )    4:1
> root@landau:~# readlink /sys/dev/char/4:64
> ../../devices/root/f0299a70/f029b788/tty/ttyS0

The inconsistent name seems like a kernel bug...

> root@landau:~#
>
> And this is what used to make it work [1]:
>
>    *) # >= 2.6.38
>        console_major_minor="$(get-real-console-linux)"
>        console_raw="$(readlink "/sys/dev/char/${console_major_minor}")"
>        console="${console_raw##*/}"
>        ;;

So maybe rootskel should use that again, but applied to each console's
char device number.

(Though directly using the symlinks under /dev/char seems cleaner than
poking in sysfs.)
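
A minimal sketch of that idea, assuming /dev/char is populated with
MAJOR:MINOR symlinks (as udev typically does) by the time the script runs.
The function name resolve_consoles and its two parameters are hypothetical
and not taken from rootskel:

```shell
# Hypothetical helper: print the device node name of every console listed
# in a /proc/consoles-style file, resolved through its char device number.
# $1: consoles file (default /proc/consoles)
# $2: directory of MAJOR:MINOR symlinks (default /dev/char)
resolve_consoles() {
    consoles_file="${1:-/proc/consoles}"
    devchar_dir="${2:-/dev/char}"
    while read -r name rest; do
        # The device number, if present, is the last whitespace-separated field.
        devnum="${rest##* }"
        case "$devnum" in
            [0-9]*:[0-9]*) ;;
            *) continue ;;    # this console has no char device number
        esac
        # Skip entries whose symlink is missing instead of aborting.
        target="$(readlink "$devchar_dir/$devnum")" || continue
        echo "${target##*/}"
    done < "$consoles_file"
}
```

Passing /sys/dev/char as the second argument would give the sysfs variant
of the same lookup, since only the last path component of the link target
is used.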

Ben.

--
Ben Hutchings
This sentence contradicts itself - no actually it doesn't.



John Paul Adrian Glaubitz
Hi Ben!

On 4/7/19 1:53 AM, Ben Hutchings wrote:
>> root@landau:~# cat /proc/consoles
>> ttyHV0               -W- (EC p  )    4:64
>> tty0                 -WU (E     )    4:1
>> root@landau:~# readlink /sys/dev/char/4:64
>> ../../devices/root/f0299a70/f029b788/tty/ttyS0
>
> The inconsistent name seems like a kernel bug...

Yes. I'm trying to convince Dave Miller to fix this.

Do you think we could carry a patch in src:linux for the time being?

>> root@landau:~#
>>
>> And this is what used to make it work [1]:
>>
>>    *) # >= 2.6.38
>>        console_major_minor="$(get-real-console-linux)"
>>        console_raw="$(readlink "/sys/dev/char/${console_major_minor}")"
>>        console="${console_raw##*/}"
>>        ;;
>
> So maybe rootskel should use that again, but applied to each console's
> char device number.
>
> (Though directly using the symlinks under /dev/char seems cleaner than
> poking in sysfs.)

I agree.

Adrian

--
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - [hidden email]
`. `'   Freie Universitaet Berlin - [hidden email]
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913

Ben Hutchings-3
On Tue, 2019-04-16 at 11:47 +0200, John Paul Adrian Glaubitz wrote:

> Hi Ben!
>
> On 4/7/19 1:53 AM, Ben Hutchings wrote:
> > > root@landau:~# cat /proc/consoles
> > > ttyHV0               -W- (EC p  )    4:64
> > > tty0                 -WU (E     )    4:1
> > > root@landau:~# readlink /sys/dev/char/4:64
> > > ../../devices/root/f0299a70/f029b788/tty/ttyS0
> >
> > The inconsistent name seems like a kernel bug...
>
> Yes. I'm trying to convince Dave Miller to fix this.
>
> Do you think we could carry a patch in src:linux for the time being?
[...]

I would rather not do that until it's accepted, since if that doesn't
happen we either have to switch back or carry it forever.

Ben.

--
Ben Hutchings
Make three consecutive correct guesses and you will be considered
an expert.



John Paul Adrian Glaubitz
On 4/16/19 1:16 PM, Ben Hutchings wrote:
>> Do you think we could carry a patch in src:linux for the time being?
> [...]
>
> I would rather not do that until it's accepted, as if it that doesn't
> happen we either have to switch back or carry it forever.

Hmm, okay. Then I don't really have a way of building updated images
for the time being.

Adrian

--
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - [hidden email]
`. `'   Freie Universitaet Berlin - [hidden email]
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913

John Paul Adrian Glaubitz
In reply to this post by Ben Hutchings-3
Control: reassign -1 src:linux
Control: tags -1 patch

On 4/16/19 1:16 PM, Ben Hutchings wrote:
>> Do you think we could carry a patch in src:linux for the time being?
> [...]
>
> I would rather not do that until it's accepted, as if it that doesn't
> happen we either have to switch back or carry it forever.

My patch has been merged upstream now and is planned for -stable [1].

Attaching the patch.

Adrian

> [1] https://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc.git/commit/?id=07a6d63eb1b54b5fb38092780fe618dfe1d96e23

--
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - [hidden email]
`. `'   Freie Universitaet Berlin - [hidden email]
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913

Attachment: 0001-sunhv-Fix-device-naming-inconsistency-between-sunhv_.patch (2K)
John Paul Adrian Glaubitz
Hi!

On 6/14/19 7:55 AM, John Paul Adrian Glaubitz wrote:
> My patch has been merged upstream now and is planned for -stable [1].

It's now part of the 4.19 [1] and 5.1 [2] stable queues, so I guess we just
have to wait a little now.

@Ben: Can you make sure this bug gets closed with the next stable upload?

Thanks!
Adrian

> [1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/commit/?id=cc95841f3511b943ad72133e67a105008839ead2
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/commit/?id=176eeebcbf771062473c8f751fa2adb4a8baebb6

--
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - [hidden email]
`. `'   Freie Universitaet Berlin - [hidden email]
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913

James Clarke-2
In reply to this post by Ben Hutchings-3
Control: reopen -1
Control: reassign -1 src:linux,rootskel
Control: severity -1 serious

(Don't know if this is a blocker for the release, but it should at
least be reviewed before we release IMO, hence the severity)

On Sun, Apr 07, 2019 at 12:53:35AM +0100, Ben Hutchings wrote:

> On Sat, 2019-04-06 at 21:33 +0200, John Paul Adrian Glaubitz wrote:
> > On 4/6/19 6:46 PM, John Paul Adrian Glaubitz wrote:
> > > My suspicion is that the support multiple consoles in parallel [2] introduced
> > > this particular regression. I haven't done any debugging yet though as I'm
> > > not sure where to start, I haven't touched the rootskel package before and
> > > therefore would be interested in any pointers how to debug this.
> >
> > The problem seems to be the fact that the sparc64 kernel uses different names
> > for /proc/console and the actual console name:
> >
> > root@landau:~# cat /proc/consoles
> > ttyHV0               -W- (EC p  )    4:64
> > tty0                 -WU (E     )    4:1
> > root@landau:~# readlink /sys/dev/char/4:64
> > ../../devices/root/f0299a70/f029b788/tty/ttyS0
>
> The inconsistent name seems like a kernel bug...
>
> > root@landau:~#
> >
> > And this is what used to make it work [1]:
> >
> >    *) # >= 2.6.38
> >        console_major_minor="$(get-real-console-linux)"
> >        console_raw="$(readlink "/sys/dev/char/${console_major_minor}")"
> >        console="${console_raw##*/}"
> >        ;;
>
> So maybe rootskel should use that again, but applied to each console's
> char device number.
>
> (Though directly using the symlinks under /dev/char seems cleaner than
> poking in sysfs.)

Just got a report in #debian-cd of a user running into this issue on
s390x with Hercules; a subset of the messages sent in conversation are
below:

[20:12:18]  <gruetzkopf> steal-ctty: No such file or directory
[20:12:29]  <gruetzkopf> will go hunt this down once i find time
[20:12:52]  <gruetzkopf> (DI buster RC2 / s390x)
[21:52:40]  <jrtc27> gruetzkopf: cat /proc/consoles ?
[21:54:00]  <jrtc27> should give something like:
[21:54:00]  <jrtc27> ttyS0                -W- (EC p  )    4:64
[21:54:22]  <jrtc27> rootskel will prefer a console which has the C flag
[21:55:17]  <gruetzkopf> now let's see how to get there
[21:55:57]  <gruetzkopf> (note: running in hercules, not real hw or qemu where i'd have virtio console)
[22:01:39]  <gruetzkopf> cat /proc/consoles
[22:01:40]  <gruetzkopf> ttyS0                -W- (EC p  )    4:64
[22:02:05]  <jrtc27> and ls -l /dev/ttyS0?
[22:03:06]  <gruetzkopf> ls: /dev/ttyS0: No such file or directory
[22:03:06]  <gruetzkopf> oh, fun!
[22:04:36]  <jrtc27> and ls -l /sys/dev/char/4:64 ?
[22:06:06]  <gruetzkopf> ls -l /sys/dev/char/4:64
[22:06:06]  <gruetzkopf> lrwxrwxrwx    1 root     root             0 Jun 26 21:05 /sys/dev/char/4:64 -> .
[22:06:06]  <gruetzkopf> ./../devices/virtual/tty/sclp_line0
[22:06:28]  <jrtc27> ok, so, it's not /dev/ttyS0, it's /dev/sclp_line0?
[22:06:32]  <jrtc27> (does that exist?)
[22:06:48]  <jrtc27> we had an issue like this on sparc64 (#926539)
[22:07:38]  <gruetzkopf> i just found that
[22:07:53]  <jrtc27> does that device node exist for you?
[22:08:13]  <gruetzkopf> crw--w----    1 root     root        4,  64 Jun 26 20:58 /dev/sclp_line0
[22:08:43]  <gruetzkopf> (and so does /dev/ttysclp0)

This is the "fault" of drivers/s390/char/sclp_tty.c. I don't know what
the best fix is; we could also patch the kernel to ensure this shows up
as /dev/sclp_line0 in /proc/consoles like sparc64 now does for sunhv,
but I worry now that this might be a game of whack-a-mole and there are
other character device drivers out there that also suffer from this.
Perhaps therefore we need to go back to looking up the device name from
the device number as has been suggested already...
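
A rough sketch of what such a lookup could look like, assuming the
preferred console is the entry with C among its parenthesised flags (as
mentioned in the IRC log above) and that sysfs is mounted. The function
name preferred_console_name and its parameters are made up for
illustration and are not rootskel code:

```shell
# Hypothetical helper: name the preferred console by its device number
# instead of trusting the first column of /proc/consoles.
# $1: consoles file (default /proc/consoles)
# $2: directory of MAJOR:MINOR symlinks (default /sys/dev/char)
preferred_console_name() {
    consoles_file="${1:-/proc/consoles}"
    char_dir="${2:-/sys/dev/char}"
    while read -r name rest; do
        case "$rest" in
            *\(*C*\)*) ;;      # C between the parentheses: preferred console
            *) continue ;;
        esac
        devnum="${rest##* }"   # last field, e.g. 4:64
        target="$(readlink "$char_dir/$devnum")" || return 1
        echo "${target##*/}"   # last path component of the link target
        return 0
    done < "$consoles_file"
    return 1
}
```

Fed the s390x values from the log (a ttyS0 line with 4:64, and a 4:64
symlink pointing at ../../devices/virtual/tty/sclp_line0), this prints
sclp_line0, which matches the device node that actually exists there.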

James