Bug#926539: rootskel: steal-ctty no longer works on at least sparc64

Bug#926539: rootskel: steal-ctty no longer works on at least sparc64

John Paul Adrian Glaubitz
Source: rootskel
Version: 1.128
Severity: important
User: [hidden email]
Usertags: sparc64

Hello!

I built updated installation images [1] for Debian Ports today and tested
the sparc64 image on our SPARC T5 in an LDOM.

Unfortunately, it seems that the recent changes to rootskel broke the
serial console on sparc64 in d-i. The kernel boots fine but d-i never
starts; the boot stops with:

steal-ctty: No such file or directory

My suspicion is that the support for multiple consoles in parallel [2]
introduced this particular regression. I haven't done any debugging yet,
though, as I'm not sure where to start. I haven't touched the rootskel package
before and would therefore be interested in any pointers on how to debug this.

Thanks,
Adrian

> [1] https://cdimage.debian.org/cdimage/ports/2019-04-06/
> [2] https://salsa.debian.org/installer-team/rootskel/commit/b6048aafed7d73ba42da04d6f7a798710f271384

--
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - [hidden email]
`. `'   Freie Universitaet Berlin - [hidden email]
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913


Bug#926539: rootskel: steal-ctty no longer works on at least sparc64

John Paul Adrian Glaubitz
On 4/6/19 6:46 PM, John Paul Adrian Glaubitz wrote:
> My suspicion is that the support multiple consoles in parallel [2] introduced
> this particular regression. I haven't done any debugging yet though as I'm
> not sure where to start, I haven't touched the rootskel package before and
> therefore would be interested in any pointers how to debug this.

The problem seems to be that the sparc64 kernel uses one name for the console
in /proc/consoles and a different name for the actual console device:

root@landau:~# cat /proc/consoles
ttyHV0               -W- (EC p  )    4:64
tty0                 -WU (E     )    4:1
root@landau:~# readlink /sys/dev/char/4:64
../../devices/root/f0299a70/f029b788/tty/ttyS0
root@landau:~#
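
The comparison above can be automated. The following is a minimal sketch, not part of rootskel: the function name check_console_names and its two optional path arguments are made up for illustration. It assumes the /proc/consoles layout shown above, where the major:minor pair, if present, is the last field on the line. Note that a virtual console entry such as tty0 may also be reported, since its device number can refer to the currently active VT.

```shell
#!/bin/sh
# check_console_names [CONSOLES_FILE] [SYSFS_CHAR_DIR]
# Hypothetical helper: prints one line for every console whose name in
# /proc/consoles differs from the device name sysfs reports for the same
# major:minor number.
check_console_names() {
    consoles_file="${1:-/proc/consoles}"
    sysfs_char_dir="${2:-/sys/dev/char}"
    while read -r name rest; do
        # The device number, if any, is the last whitespace-separated field.
        devnum="${rest##* }"
        case "$devnum" in
            *:*) ;;
            *) continue ;;   # console without a character device
        esac
        # Skip entries for which sysfs has no symlink.
        target="$(readlink "$sysfs_char_dir/$devnum")" || continue
        sysfs_name="${target##*/}"
        if [ "$name" != "$sysfs_name" ]; then
            echo "$name ($devnum) is $sysfs_name in sysfs"
        fi
    done < "$consoles_file"
}
```

Called without arguments, the function reads the live /proc/consoles and /sys/dev/char; the arguments exist only so that it can be tried against sample files.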

And this is what used to make it work [1]:

            *) # >= 2.6.38
                console_major_minor="$(get-real-console-linux)"
                console_raw="$(readlink "/sys/dev/char/${console_major_minor}")"
                console="${console_raw##*/}"
                ;;

Adrian

> [1] https://salsa.debian.org/installer-team/rootskel/blob/cb7db898f58f14c04b9d60351811cbae71b49a07/src/sbin/reopen-console-linux#L21

--
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - [hidden email]
`. `'   Freie Universitaet Berlin - [hidden email]
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913


Bug#926539: rootskel: steal-ctty no longer works on at least sparc64

Ben Hutchings-3
On Sat, 2019-04-06 at 21:33 +0200, John Paul Adrian Glaubitz wrote:

> On 4/6/19 6:46 PM, John Paul Adrian Glaubitz wrote:
> > My suspicion is that the support multiple consoles in parallel [2] introduced
> > this particular regression. I haven't done any debugging yet though as I'm
> > not sure where to start, I haven't touched the rootskel package before and
> > therefore would be interested in any pointers how to debug this.
>
> The problem seems to be the fact that the sparc64 kernel uses different names
> for /proc/console and the actual console name:
>
> root@landau:~# cat /proc/consoles
> ttyHV0               -W- (EC p  )    4:64
> tty0                 -WU (E     )    4:1
> root@landau:~# readlink /sys/dev/char/4:64
> ../../devices/root/f0299a70/f029b788/tty/ttyS0

The inconsistent name seems like a kernel bug...

> root@landau:~#
>
> And this is what used to make it work [1]:
>
>             *) # >= 2.6.38
>                 console_major_minor="$(get-real-console-linux)"
>                 console_raw="$(readlink "/sys/dev/char/${console_major_minor}")"
>                 console="${console_raw##*/}"
>                 ;;

So maybe rootskel should use that again, but applied to each console's
char device number.

(Though directly using the symlinks under /dev/char seems cleaner than
poking in sysfs.)
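
A rough sketch of that idea, under stated assumptions: the function name list_console_devices and its optional arguments are hypothetical, and it assumes that /dev/char is populated with MAJOR:MINOR symlinks (udev normally creates them; whether the installer environment has them at this point is not confirmed in this thread). It is not the rootskel implementation.

```shell
#!/bin/sh
# list_console_devices [CONSOLES_FILE] [DEV_CHAR_DIR]
# Hypothetical helper: for every console in /proc/consoles that has a
# device number, print its name and the /dev/char symlink that leads to
# its device node. The name from /proc/consoles is not used as a path.
list_console_devices() {
    consoles_file="${1:-/proc/consoles}"
    dev_char_dir="${2:-/dev/char}"
    while read -r name rest; do
        # The device number, if any, is the last whitespace-separated field.
        devnum="${rest##* }"
        case "$devnum" in
            *:*) ;;
            *) continue ;;   # console without a character device
        esac
        link="$dev_char_dir/$devnum"
        # Skip consoles whose symlink is missing or dangling.
        [ -e "$link" ] || continue
        echo "$name $link"
    done < "$consoles_file"
}
```

Because the symlink itself can be opened, a caller would not need to know whether the node is called ttyS0, ttyHV0 or something else.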

Ben.

--
Ben Hutchings
This sentence contradicts itself - no actually it doesn't.




Bug#926539: rootskel: steal-ctty no longer works on at least sparc64

John Paul Adrian Glaubitz
Hi Ben!

On 4/7/19 1:53 AM, Ben Hutchings wrote:
>> root@landau:~# cat /proc/consoles
>> ttyHV0               -W- (EC p  )    4:64
>> tty0                 -WU (E     )    4:1
>> root@landau:~# readlink /sys/dev/char/4:64
>> ../../devices/root/f0299a70/f029b788/tty/ttyS0
>
> The inconsistent name seems like a kernel bug...

Yes. I'm trying to convince Dave Miller to fix this.

Do you think we could carry a patch in src:linux for the time being?

>> root@landau:~#
>>
>> And this is what used to make it work [1]:
>>
>>             *) # >= 2.6.38
>>                 console_major_minor="$(get-real-console-linux)"
>>                 console_raw="$(readlink "/sys/dev/char/${console_major_minor}")"
>>                 console="${console_raw##*/}"
>>                 ;;
>
> So maybe rootskel should use that again, but applied to each console's
> char device number.
>
> (Though directly using the symlinks under /dev/char seems cleaner than
> poking in sysfs.)

I agree.

Adrian

--
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - [hidden email]
`. `'   Freie Universitaet Berlin - [hidden email]
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913


Bug#926539: rootskel: steal-ctty no longer works on at least sparc64

Ben Hutchings-3
On Tue, 2019-04-16 at 11:47 +0200, John Paul Adrian Glaubitz wrote:

> Hi Ben!
>
> On 4/7/19 1:53 AM, Ben Hutchings wrote:
> > > root@landau:~# cat /proc/consoles
> > > ttyHV0               -W- (EC p  )    4:64
> > > tty0                 -WU (E     )    4:1
> > > root@landau:~# readlink /sys/dev/char/4:64
> > > ../../devices/root/f0299a70/f029b788/tty/ttyS0
> >
> > The inconsistent name seems like a kernel bug...
>
> Yes. I'm trying to convince Dave Miller to fix this.
>
> Do you think we could carry a patch in src:linux for the time being?
[...]

I would rather not do that until it's accepted, because if that doesn't
happen we either have to switch back or carry it forever.

Ben.

--
Ben Hutchings
Make three consecutive correct guesses and you will be considered
an expert.




Bug#926539: rootskel: steal-ctty no longer works on at least sparc64

John Paul Adrian Glaubitz
On 4/16/19 1:16 PM, Ben Hutchings wrote:
>> Do you think we could carry a patch in src:linux for the time being?
> [...]
>
> I would rather not do that until it's accepted, as if it that doesn't
> happen we either have to switch back or carry it forever.

Hmm, okay. Then I don't really have a way of building updated images
for the time being.

Adrian

--
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - [hidden email]
`. `'   Freie Universitaet Berlin - [hidden email]
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913


Bug#926539: rootskel: steal-ctty no longer works on at least sparc64

John Paul Adrian Glaubitz
In reply to this post by Ben Hutchings-3
Control: reassign -1 src:linux
Control: tags -1 patch

On 4/16/19 1:16 PM, Ben Hutchings wrote:
>> Do you think we could carry a patch in src:linux for the time being?
> [...]
>
> I would rather not do that until it's accepted, as if it that doesn't
> happen we either have to switch back or carry it forever.

My patch has been merged upstream now and is planned for -stable [1].

Attaching the patch.

Adrian

> [1] https://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc.git/commit/?id=07a6d63eb1b54b5fb38092780fe618dfe1d96e23

--
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - [hidden email]
`. `'   Freie Universitaet Berlin - [hidden email]
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913

0001-sunhv-Fix-device-naming-inconsistency-between-sunhv_.patch (2K) Download Attachment

Bug#926539: rootskel: steal-ctty no longer works on at least sparc64

John Paul Adrian Glaubitz
Hi!

On 6/14/19 7:55 AM, John Paul Adrian Glaubitz wrote:
> My patch has been merged upstream now and is planned for -stable [1].

It's now part of the 4.19 [1] and 5.1 [2] stable queues, so I guess we just
have to wait a little now.

@Ben: Can you make sure this bug gets closed with the next stable upload?

Thanks!
Adrian

> [1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/commit/?id=cc95841f3511b943ad72133e67a105008839ead2
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/commit/?id=176eeebcbf771062473c8f751fa2adb4a8baebb6

--
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - [hidden email]
`. `'   Freie Universitaet Berlin - [hidden email]
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913


Bug#926539: rootskel: steal-ctty no longer works on at least sparc64

Jessica Clarke
In reply to this post by Ben Hutchings-3
Control: reopen -1
Control: reassign -1 src:linux,rootskel
Control: severity -1 serious

(Don't know if this is a blocker for the release, but it should at
least be reviewed before we release IMO, hence the severity)

On Sun, Apr 07, 2019 at 12:53:35AM +0100, Ben Hutchings wrote:

> On Sat, 2019-04-06 at 21:33 +0200, John Paul Adrian Glaubitz wrote:
> > On 4/6/19 6:46 PM, John Paul Adrian Glaubitz wrote:
> > > My suspicion is that the support multiple consoles in parallel [2] introduced
> > > this particular regression. I haven't done any debugging yet though as I'm
> > > not sure where to start, I haven't touched the rootskel package before and
> > > therefore would be interested in any pointers how to debug this.
> >
> > The problem seems to be the fact that the sparc64 kernel uses different names
> > for /proc/console and the actual console name:
> >
> > root@landau:~# cat /proc/consoles
> > ttyHV0               -W- (EC p  )    4:64
> > tty0                 -WU (E     )    4:1
> > root@landau:~# readlink /sys/dev/char/4:64
> > ../../devices/root/f0299a70/f029b788/tty/ttyS0
>
> The inconsistent name seems like a kernel bug...
>
> > root@landau:~#
> >
> > And this is what used to make it work [1]:
> >
> >             *) # >= 2.6.38
> >                 console_major_minor="$(get-real-console-linux)"
> >                 console_raw="$(readlink "/sys/dev/char/${console_major_minor}")"
> >                 console="${console_raw##*/}"
> >                 ;;
>
> So maybe rootskel should use that again, but applied to each console's
> char device number.
>
> (Though directly using the symlinks under /dev/char seems cleaner than
> poking in sysfs.)

Just got a report in #debian-cd of a user running into this issue on
s390x with Hercules; a subset of the messages sent in conversation are
below:

[20:12:18]  <gruetzkopf> steal-ctty: No such file or directory
[20:12:29]  <gruetzkopf> will go hunt this down once i find time
[20:12:52]  <gruetzkopf> (DI buster RC2 / s390x)
[21:52:40]  <jrtc27> gruetzkopf: cat /proc/consoles ?
[21:54:00]  <jrtc27> should give something like:
[21:54:00]  <jrtc27> ttyS0                -W- (EC p  )    4:64
[21:54:22]  <jrtc27> rootskel will prefer a console which has the C flag
[21:55:17]  <gruetzkopf> now let's see how to get there
[21:55:57]  <gruetzkopf> (note: running in hercules, not real hw or qemu where i'd have virtio console)
[22:01:39]  <gruetzkopf> cat /proc/consoles
[22:01:40]  <gruetzkopf> ttyS0                -W- (EC p  )    4:64
[22:02:05]  <jrtc27> and ls -l /dev/ttyS0?
[22:03:06]  <gruetzkopf> ls: /dev/ttyS0: No such file or directory
[22:03:06]  <gruetzkopf> oh, fun!
[22:04:36]  <jrtc27> and ls -l /sys/dev/char/4:64 ?
[22:06:06]  <gruetzkopf> ls -l /sys/dev/char/4:64
[22:06:06]  <gruetzkopf> lrwxrwxrwx    1 root     root             0 Jun 26 21:05 /sys/dev/char/4:64 -> ../../devices/virtual/tty/sclp_line0
[22:06:28]  <jrtc27> ok, so, it's not /dev/ttyS0, it's /dev/sclp_line0?
[22:06:32]  <jrtc27> (does that exist?)
[22:06:48]  <jrtc27> we had an issue like this on sparc64 (#926539)
[22:07:38]  <gruetzkopf> i just found that
[22:07:53]  <jrtc27> does that device node exist for you?
[22:08:13]  <gruetzkopf> crw--w----    1 root     root        4,  64 Jun 26 20:58 /dev/sclp_line0
[22:08:43]  <gruetzkopf> (and so does /dev/ttysclp0)

This is the "fault" of drivers/s390/char/sclp_tty.c. I don't know what
the best fix is; we could also patch the kernel to ensure this shows up
as /dev/sclp_line0 in /proc/consoles like sparc64 now does for sunhv,
but I worry now that this might be a game of whack-a-mole and there are
other character device drivers out there that also suffer from this.
Perhaps therefore we need to go back to looking up the device name from
the device number as has been suggested already...
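
As an illustration only (this is not the rootskel code, and the function name preferred_console_device and its optional arguments are hypothetical), the lookup by device number could be combined with the preference for the C flag mentioned in the log above roughly like this. The sketch assumes the /proc/consoles layout shown in this thread and falls back to the /proc/consoles name when sysfs has no entry for the device number.

```shell
#!/bin/sh
# preferred_console_device [CONSOLES_FILE] [SYSFS_CHAR_DIR]
# Hypothetical helper: prints the /dev path of the first console whose
# flags contain "C", taking the device name from the major:minor number
# rather than from the name listed in /proc/consoles.
preferred_console_device() {
    consoles_file="${1:-/proc/consoles}"
    sysfs_char_dir="${2:-/sys/dev/char}"
    while read -r name rest; do
        # Flags are the text between the parentheses, e.g. "EC p  ".
        flags="${rest#*[(]}"
        flags="${flags%%[)]*}"
        case "$flags" in
            *C*) ;;
            *) continue ;;   # not flagged C, try the next console
        esac
        dev="$name"          # fallback: trust /proc/consoles
        devnum="${rest##* }"
        case "$devnum" in
            *:*)
                if target="$(readlink "$sysfs_char_dir/$devnum")"; then
                    dev="${target##*/}"
                fi
                ;;
        esac
        echo "/dev/$dev"
        return 0
    done < "$consoles_file"
    return 1                 # no console with the C flag was found
}
```

With the s390x data from the log, where 4:64 resolves to sclp_line0, this would print /dev/sclp_line0 instead of the non-existent /dev/ttyS0.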

James


Bug#926539: rootskel: steal-ctty no longer works on at least sparc64

Ivo De Decker-3
Hi,

On Wed, Jun 26, 2019 at 10:18:37PM +0100, James Clarke wrote:

> (Don't know if this is a blocker for the release, but it should at
> least be reviewed before we release IMO, hence the severity)
>
> On Sun, Apr 07, 2019 at 12:53:35AM +0100, Ben Hutchings wrote:
> > On Sat, 2019-04-06 at 21:33 +0200, John Paul Adrian Glaubitz wrote:
> > > On 4/6/19 6:46 PM, John Paul Adrian Glaubitz wrote:
> > > > My suspicion is that the support multiple consoles in parallel [2] introduced
> > > > this particular regression. I haven't done any debugging yet though as I'm
> > > > not sure where to start, I haven't touched the rootskel package before and
> > > > therefore would be interested in any pointers how to debug this.
> > >
> > > The problem seems to be the fact that the sparc64 kernel uses different names
> > > for /proc/console and the actual console name:
> > >
> > > root@landau:~# cat /proc/consoles
> > > ttyHV0               -W- (EC p  )    4:64
> > > tty0                 -WU (E     )    4:1
> > > root@landau:~# readlink /sys/dev/char/4:64
> > > ../../devices/root/f0299a70/f029b788/tty/ttyS0
> >
> > The inconsistent name seems like a kernel bug...
> >
> > > root@landau:~#
> > >
> > > And this is what used to make it work [1]:
> > >
> > >             *) # >= 2.6.38
> > >                 console_major_minor="$(get-real-console-linux)"
> > >                 console_raw="$(readlink "/sys/dev/char/${console_major_minor}")"
> > >                 console="${console_raw##*/}"
> > >                 ;;
> >
> > So maybe rootskel should use that again, but applied to each console's
> > char device number.
> >
> > (Though directly using the symlinks under /dev/char seems cleaner than
> > poking in sysfs.)
>
> Just got a report in #debian-cd of a user running into this issue on
> s390x with Hercules; a subset of the messages sent in conversation are
> below:
>
> [20:12:18]  <gruetzkopf> steal-ctty: No such file or directory
> [20:12:29]  <gruetzkopf> will go hunt this down once i find time
> [20:12:52]  <gruetzkopf> (DI buster RC2 / s390x)
> [21:52:40]  <jrtc27> gruetzkopf: cat /proc/consoles ?
> [21:54:00]  <jrtc27> should give something like:
> [21:54:00]  <jrtc27> ttyS0                -W- (EC p  )    4:64
> [21:54:22]  <jrtc27> rootskel will prefer a console which has the C flag
> [21:55:17]  <gruetzkopf> now let's see how to get there
> [21:55:57]  <gruetzkopf> (note: running in hercules, not real hw or qemu where i'd have virtio console)
> [22:01:39]  <gruetzkopf> cat /proc/consoles
> [22:01:40]  <gruetzkopf> ttyS0                -W- (EC p  )    4:64
> [22:02:05]  <jrtc27> and ls -l /dev/ttyS0?
> [22:03:06]  <gruetzkopf> ls: /dev/ttyS0: No such file or directory
> [22:03:06]  <gruetzkopf> oh, fun!
> [22:04:36]  <jrtc27> and ls -l /sys/dev/char/4:64 ?
> [22:06:06]  <gruetzkopf> ls -l /sys/dev/char/4:64
> [22:06:06]  <gruetzkopf> lrwxrwxrwx    1 root     root             0 Jun 26 21:05 /sys/dev/char/4:64 -> ../../devices/virtual/tty/sclp_line0
> [22:06:28]  <jrtc27> ok, so, it's not /dev/ttyS0, it's /dev/sclp_line0?
> [22:06:32]  <jrtc27> (does that exist?)
> [22:06:48]  <jrtc27> we had an issue like this on sparc64 (#926539)
> [22:07:38]  <gruetzkopf> i just found that
> [22:07:53]  <jrtc27> does that device node exist for you?
> [22:08:13]  <gruetzkopf> crw--w----    1 root     root        4,  64 Jun 26 20:58 /dev/sclp_line0
> [22:08:43]  <gruetzkopf> (and so does /dev/ttysclp0)
>
> This is the "fault" of drivers/s390/char/sclp_tty.c. I don't know what
> the best fix is; we could also patch the kernel to ensure this shows up
> as /dev/sclp_line0 in /proc/consoles like sparc64 now does for sunhv,
> but I worry now that this might be a game of whack-a-mole and there are
> other character device drivers out there that also suffer from this.
> Perhaps therefore we need to go back to looking up the device name from
> the device number as has been suggested already...

This bug wasn't fixed in time for buster. Is it still present in bullseye? If
so, it might be good to try to fix it this time.

Cheers,

Ivo


Bug#926539: rootskel: steal-ctty no longer works on at least sparc64

John Paul Adrian Glaubitz
In reply to this post by John Paul Adrian Glaubitz
Control: reopen -1

On 3/28/20 6:16 PM, John Paul Adrian Glaubitz wrote:
> On 3/28/20 5:39 PM, Ivo De Decker wrote:
>> This bug wasn't fixed in time for buster. Is it still present in bullseye? If
>> so, it might be good to try to fix it this time.
>
> I fixed the bug upstream [1], so we can safely close the issue here.

Ah, I just realized this bug also affected s390x. Sorry, I will reopen it.

Adrian

--
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - [hidden email]
`. `'   Freie Universitaet Berlin - [hidden email]
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913


Bug#926539: rootskel: steal-ctty no longer works on s390x

Valentin Vidic-4
In reply to this post by John Paul Adrian Glaubitz
A similar change to the console name on s390x was not accepted:

  https://lkml.org/lkml/2020/5/19/854

so please fix this in rootskel.

--
Valentin


Bug#926539: rootskel: steal-ctty no longer works on s390x

John Paul Adrian Glaubitz
In reply to this post by John Paul Adrian Glaubitz
On 5/20/20 11:17 AM, John Paul Adrian Glaubitz wrote:
> I don't see any discussion in this thread. I would like to know the reasoning
> why kernel upstream thinks that this naming inconsistency is correct. It
> makes no sense, in my opinion and it can potentially trigger more problems.

Ah, sorry. I was seeing the cached version of the thread, refreshing helped.

In any case, the SPARC kernel maintainer (Dave Miller) made the same argument,
that it would potentially break existing setups, but I was eventually able to
convince him that the change was right.

Not sure which distributions he has in mind.

Adrian

--
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - [hidden email]
`. `'   Freie Universitaet Berlin - [hidden email]
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913


Bug#926539: rootskel: steal-ctty no longer works on s390x

Valentin Vidic-4
On Wed, May 20, 2020 at 11:19:53AM +0200, John Paul Adrian Glaubitz wrote:
> Ah, sorry. I was seeing the cached version of the thread, refreshing helped.
>
> In any case, the SPARC kernel maintainer (Dave Miller) had the same argument
> that it would potentially break existing setups but eventually I could
> convince him that the change was right.
>
> Not sure which distributions he has in mind.

It is hard to tell, but it seems the current state is hardcoded
in different places:

https://www.redhat.com/archives/libguestfs/2017-May/msg00068.html
https://www.ibm.com/support/knowledgecenter/linuxonibm/com.ibm.linux.z.lhdd/lhdd_r_console_sum.html

I think it would be better to make debian-installer smarter about
this since we will probably run into the same problem again with
a different architecture/driver.

--
Valentin


Bug#926539: rootskel: steal-ctty no longer works on s390x

John Paul Adrian Glaubitz
In reply to this post by John Paul Adrian Glaubitz
On 5/20/20 1:18 PM, Philipp Kern wrote:
> But then I keep wondering how representative qemu is. Is VT220 SCLP even
> something you get on a real z machine? Not that we shouldn't fix qemu,
> of course. But Hercules might be closer to the real thing in this regard.

Hercules shows the exact same behavior. I also don't think the choice of
emulator is relevant, as the underlying issue is a naming inconsistency in the
kernel, which is currently only present on s390x and used to be present on
sparc64.

Adrian

--
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - [hidden email]
`. `'   Freie Universitaet Berlin - [hidden email]
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913