Bypassing the 2/3/4GB virtual memory space on 32-bit ports

Bypassing the 2/3/4GB virtual memory space on 32-bit ports

Aurelien Jarno
[ debian-arm is Cced: as armel and armhf might be impacted in the
  future]
[ debian-devel is Cced: as i386 might be impacted in the future]
[ debian-release is Cced: as the release team has to agree with the
  solution]


Hi all,

32-bit processes are able to address at maximum 4GB of memory (2^32),
and often less (2 or 3GB) due to architectural or kernel limitations.
There are many ways to support more than 4GB of memory on a system
(64-bit CPU and kernel, PAE, etc.), but in the end the limit per process
is unchanged.

As Debian builds packages natively, this 4GB limit also applies to
the toolchain, and it's not uncommon anymore to get a "virtual memory
exhausted" error when building big packages. Tricks have been used
to work around that, like disabling debugging symbols or tweaking the
GCC garbage collector (the ggc-min-expand option). This is the case,
for example, for firefox and many scientific packages. Leaf packages
are usually just left unbuilt on the affected architectures.

mips and mipsel are more affected by the issue as the virtual address
space is limited to 2GB. Therefore on those architectures, this issue
recently started to also affect core packages like ghc and rustc, and
the usual tricks are not working anymore. The case of ghc is interesting,
as the problem also now happens on non-official architectures like hppa
and x32. The *i386 architectures are not affected as they use the native
code generator. The armel and armhf architectures are not affected as
they use the LLVM code generator.

We are at a point where we should probably look for a real solution
instead of relying on tricks. Usually upstreams are not really
interested in fixing that issue [1]. The release team has made clear
that packages have to be built natively (NOT cross-built) [2]. Therefore
I currently see only two options:

1) Build a 64-bit compiler targeting the corresponding 32-bit
   architecture and install it in the 32-bit chroot with the other
   64-bit dependencies. This is still a kind of cross-compiler, but the
   rest of the build is unchanged and the testsuite can be run. I guess
   it *might* be something acceptable. release-team, could you please
   confirm?

   In the past it would have been enough to "just" do that for GCC, but
   nowadays it would also be needed for rustc, clang and many more. The
   clang case is interesting as it is already a cross-compiler
   supporting all the architectures, but it defaults to the native
   target. I wonder if we should make the "-target" option mandatory,
   just like we no longer call "gcc" but instead "$(triplet)-gcc".
   Alternatively, instead of creating new packages, we might just want
   to use the corresponding multiarch 64-bit package and use a wrapper
   to change the native target, i.e. passing -m32 to gcc or -target to
   clang (a minimal sketch of such a wrapper follows this list).

2) Progressively drop 32-bit architectures when they are not able to
   build core packages natively anymore.
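
As an illustration of the wrapper idea in option 1: a minimal sketch
(not an existing Debian tool) could be a tiny program installed as
"gcc" inside the 32-bit chroot that re-execs the multiarch 64-bit
compiler with the right target option appended. The compiler name and
the -m32 flag below assume an i386 chroot on an amd64 host; an armhf
chroot would exec aarch64-linux-gnu-gcc with its own options, and a
clang wrapper would append "-target <triplet>" instead.

/* hypothetical wrapper sketch, for illustration only */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    /* room for the new argv[0], the caller's arguments, -m32 and NULL */
    char **newargv = calloc(argc + 2, sizeof(char *));
    if (newargv == NULL) {
        perror("calloc");
        return 1;
    }

    newargv[0] = "x86_64-linux-gnu-gcc";   /* the 64-bit hosted compiler */
    for (int i = 1; i < argc; i++)
        newargv[i] = argv[i];              /* pass the arguments through */
    newargv[argc] = "-m32";                /* force 32-bit code generation */
    newargv[argc + 1] = NULL;

    execvp(newargv[0], newargv);
    perror("execvp x86_64-linux-gnu-gcc"); /* reached only if exec fails */
    return 127;
}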

Any comments, ideas, or help here?

Regards,
Aurelien

[1] https://github.com/rust-lang/rust/issues/56888
[2] https://lists.debian.org/debian-release/2019/08/msg00215.html 

--
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
[hidden email]                 http://www.aurel32.net

Re: Bypassing the 2/3/4GB virtual memory space on 32-bit ports

Ben Hutchings
On Thu, 2019-08-08 at 22:38 +0200, Aurelien Jarno wrote:
[...]

> 1) Build a 64-bit compiler targeting the corresponding 32-bit
>    architecture and install it in the 32-bit chroot with the other
>    64-bit dependencies. This is still a kind of cross-compiler, but the
>    rest of the build is unchanged and the testsuite can be run. I guess
>    it *might* be something acceptable. release-team, could you please
>    confirm?
>    
>    In the past it would have been enough to "just" do that for GCC, but
>    nowadays, it will also be needed for rustc, clang and many more. The
>    clang case is interesting as it is already a cross-compiler
>    supporting all the architectures, but it defaults to the native
>    target. I wonder if we should make mandatory the "-target" option,
>    just like we do not call "gcc" anymore but instead "$(triplet)-gcc".
>    Alternatively instead of creating new packages, we might just want
>    to use the corresponding multiarch 64-bit package and use a wrapper
>    to change the native target, ie passing -m32 to gcc or -target to
>    clang.
[...]
> Any comments, ideas, or help here?
[...]

1a. Require 32-bit build environments to be multiarch with the
    related 64-bit architecture also enabled.

Ben.

--
Ben Hutchings
Experience is directly proportional to the value of equipment destroyed
                                                    - Carolyn Scheppner



Re: Bypassing the 2/3/4GB virtual memory space on 32-bit ports

John Paul Adrian Glaubitz
In reply to this post by Aurelien Jarno
Hi!

On 8/8/19 10:38 PM, Aurelien Jarno wrote:
> Any comments, ideas, or help here?
I'm by no means a GHC or Haskell expert, but I think it should be generally
feasible to add code generation support in GHC for all architectures
which are supported by LLVM.

According to a bug report I saw upstream [1], adding native support for GHC
through LLVM is comparably easy and might be a good option for mips*, riscv*,
s390x and sparc*, which are all officially supported by LLVM but have no
native code generator (NCG) in GHC (with the exception of SPARC, which has a
32-bit NCG that James Clarke is currently porting to SPARC64 [2]).

I would be willing to support such a project financially (e.g. BountySource).

Adrian

[1] https://gitlab.haskell.org/ghc/ghc/issues/16783
[2] https://github.com/jrtc27/ghc/tree/rebased

--
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - [hidden email]
`. `'   Freie Universitaet Berlin - [hidden email]
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913

Re: Bypassing the 2/3/4GB virtual memory space on 32-bit ports

Aurelien Jarno
In reply to this post by Ben Hutchings
On 2019-08-08 22:23, Ben Hutchings wrote:

> On Thu, 2019-08-08 at 22:38 +0200, Aurelien Jarno wrote:
> [...]
> > 1) Build a 64-bit compiler targeting the corresponding 32-bit
> >    architecture and install it in the 32-bit chroot with the other
> >    64-bit dependencies. This is still a kind of cross-compiler, but the
> >    rest of the build is unchanged and the testsuite can be run. I guess
> >    it *might* be something acceptable. release-team, could you please
> >    confirm?
> >    
> >    In the past it would have been enough to "just" do that for GCC, but
> >    nowadays, it will also be needed for rustc, clang and many more. The
> >    clang case is interesting as it is already a cross-compiler
> >    supporting all the architectures, but it defaults to the native
> >    target. I wonder if we should make mandatory the "-target" option,
> >    just like we do not call "gcc" anymore but instead "$(triplet)-gcc".
> >    Alternatively instead of creating new packages, we might just want
> >    to use the corresponding multiarch 64-bit package and use a wrapper
> >    to change the native target, ie passing -m32 to gcc or -target to
> >    clang.
> [...]
> > Any comments, ideas, or help here?
> [...]
>
> 1a. Require 32-bit build environments to be multiarch with the
>     related 64-bit architecture also enabled.
Indeed, but that looks like only the first step. From there, do you think
a) the package is responsible for build-depending on the 64-bit
   toolchain and calling it with the right options to generate 32-bit
   binaries?
or
b) the build environment should already be configured to make the
   64-bit toolchain available transparently?

I had option b) in mind, but option a) looks way easier to implement on
the infrastructure side, although a bit less so on the packaging side. It
can also be a first step towards b). In that case we should also make
sure that using a 64-bit compiler doesn't switch the package build
system into a cross-compilation mode, where notably the testsuite is
disabled.

Aurelien

--
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
[hidden email]                 http://www.aurel32.net

Re: Bypassing the 2/3/4GB virtual memory space on 32-bit ports

Aurelien Jarno
In reply to this post by John Paul Adrian Glaubitz
On 2019-08-08 23:57, John Paul Adrian Glaubitz wrote:

> Hi!
>
> On 8/8/19 10:38 PM, Aurelien Jarno wrote:
> > Any comments, ideas, or help here?
> I'm by no means a GHC nor Haskell expert, but I think it should be generally
> feasible to add native code generation support in GHC for all architectures
> which are supported by LLVM.
>
> According to a bug report I saw upstream [1], adding native support for GHC
> through LLVM is comparably easy and might be a good option for mips*, riscv*,
> s390x and sparc* which all are officially supported by LLVM but have no NCG
> in GHC (with the exception of SPARC which has a 32-bit NCG that James Clarke is
> currently porting to SPARC64 [2]).

Yes, that's clearly the way to go for the GHC issue. As a bonus it
greatly improves performance.

That said, it doesn't solve the problem in general, i.e. for rustc or
other packages affected in the future.

Aurelien

--
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
[hidden email]                 http://www.aurel32.net

Re: Bypassing the 2/3/4GB virtual memory space on 32-bit ports

Ben Hutchings
In reply to this post by Aurelien Jarno
On Fri, 2019-08-09 at 00:28 +0200, Aurelien Jarno wrote:
> On 2019-08-08 22:23, Ben Hutchings wrote:
[...]

> > 1a. Require 32-bit build environments to be multiarch with the
> >     related 64-bit architecture also enabled.
>
> Indeed, but that looks like the first step. From there do you think
> a) the package is responsible for build-depending on the 64-bit
>    toolchain and calling it with the right option to generate 32-bit
>    binaries?
> or
> b) the build environment should be already configured to make the
>    64-bit toolchain available transparently
>
> I had option b) in mind, but option a) looks way easier to implement on
> the infrastructure side, although a bit less on the packaging side. It
> can also be a first step towards b).
Yes - if relatively few packages are hitting the limits, I think it
makes sense to implement (a) as a short-term fix for them, then work
on (b) as the longer-term solution.

> In that case we should also make
> sure that using a 64-bit compiler doesn't switch the package build
> system to a cross-compilation mode, where notably the testsuite is
> disabled.

Right.

Ben.

--
Ben Hutchings
If you seem to know what you are doing, you'll be given more to do.



Re: Bypassing the 2/3/4GB virtual memory space on 32-bit ports

Ivo De Decker
In reply to this post by Aurelien Jarno
Hi Aurelien,

On 8/8/19 10:38 PM, Aurelien Jarno wrote:

> 32-bit processes are able to address at maximum 4GB of memory (2^32),
> and often less (2 or 3GB) due to architectural or kernel limitations.

[...]

Thanks for bringing this up.

> 1) Build a 64-bit compiler targeting the corresponding 32-bit
>     architecture and install it in the 32-bit chroot with the other
>     64-bit dependencies. This is still a kind of cross-compiler, but the
>     rest of the build is unchanged and the testsuite can be run. I guess
>     it *might* be something acceptable. release-team, could you please
>     confirm?

As you noted, our current policy doesn't allow that. However, we could
certainly consider reevaluating this part of the policy if there is a
workable solution.

Some random notes (these are just my preliminary thoughts, not a new
release team policy):

- There would need to be a team of '32-bit porters' (probably
   overlapping with the porters for the remaining 32-bit architectures)
   who manage the changes to make and keep this working. Without a team
   committed to this, we can't really support this in a stable release.

- There would need to be a rough consensus that the solution is the way
   to go.

- The solution needs to work on the buildds. We still want all binaries
   to be built on the buildds.

- We are talking about having both native 32-bit and 64-bit packages in
   the same environment. We are NOT talking about emulated builds. The
   resulting (32-bit) binaries still need to run natively in the build
   environment.

- It's not our intention to lower the bar for architectures in testing.
   On the contrary. We intend to raise the bar at some point. As we
   already stated in the past, we would really prefer if more release
   architectures had some type of automated testing (piuparts,
   autopkgtests, archive rebuilds, etc). Eventually, this will probably
   become a requirement for release architectures.

- For architectures to be included in a future stable release, they
   still need to be in good enough shape. I won't go into everything
   involved in architecture qualification in this mail, but I do want to
   mention that the buildd capacity for mipsel/mips64el is quite limited.
   During the buster release cycle, they had trouble keeping up. If this
   continues, we might be forced to drop (one of) these architectures in
   the near future.

>     In the past it would have been enough to "just" do that for GCC, but
>     nowadays, it will also be needed for rustc, clang and many more. The
>     clang case is interesting as it is already a cross-compiler
>     supporting all the architectures, but it defaults to the native
>     target. I wonder if we should make mandatory the "-target" option,
>     just like we do not call "gcc" anymore but instead "$(triplet)-gcc".
>     Alternatively instead of creating new packages, we might just want
>     to use the corresponding multiarch 64-bit package and use a wrapper
>     to change the native target, ie passing -m32 to gcc or -target to
>     clang.

I think a solution based on multiarch packages would probably be nicer
than the mess of having packages for the 32-bit arch that contain the
64-bit compiler.

Thanks,

Ivo

Re: Bypassing the 2/3/4GB virtual memory space on 32-bit ports

Simon McVittie
On Fri, 09 Aug 2019 at 14:31:47 +0200, Ivo De Decker wrote:
> On 8/8/19 10:38 PM, Aurelien Jarno wrote:
> > This is still a kind of cross-compiler
>
> As you noted, our current policy doesn't allow that.
...
> The resulting (32-bit) binaries still need to run natively in the build
> environment.

Am I right in thinking that the reason for both these requirements is that
when packages have build-time tests, you want them to be run; and you want
them to be run on real hardware, so that they will not incorrectly pass
(when they would have failed on real hardware) due to flaws in emulation?

> As we
> already stated in the past, we would really prefer if more release
> architectures had some type of automated testing (piuparts,
> autopkgtests, archive rebuilds, etc). Eventually, this will probably
> become a requirement for release architectures.

This seems like a good direction to be going in: not all packages can
be tested realistically at build-time, and expanding that to include
as-installed tests (autopkgtest) can only help to improve coverage.
In particular, relatively simple autopkgtests can often identify bugs of
the form "this package has always been compiled on (for example) mips,
but has never actually worked there, because it crashes on startup".

    smcv

Re: Bypassing the 2/3/4GB virtual memory space on 32-bit ports

Ivo De Decker
In reply to this post by Ivo De Decker
Hi,

On 8/9/19 4:41 PM, Karsten Merker wrote:

> On Fri, Aug 09, 2019 at 02:31:47PM +0200, Ivo De Decker wrote:
>> Some random notes (these are just my preliminary thoughts, not a new release
>> team policy):
> [...]
>> - We are talking about having both native 32-bit and 64-bit packages in
>>    the same environment. We are NOT talking about emulated builds. The
>>    resulting (32-bit) binaries still need to run natively in the build
>>    environment.
>
> Hello,
>
> this requirement poses a significant difficulty: while 64bit x86
> CPUs can always execute 32bit x86 code, the same isn't
> necessarily the case on other architectures. On arm, there is no
> requirement that an arm64 system has to be able to execute 32bit
> arm code, and in particular in the arm server space, i.e. on the
> kind of hardware that DSA wants to have for running buildds on,
> 64bit-only systems are becoming more and more common. On RISC-V
> the 64bit and 32bit ISAs have been disjunct forever, i.e. there
> is no riscv64 system that can natively execute riscv32 code. I
> don't know what the situation looks like in mips and ppc/power
> land but I wouldn't be surprised if developments would go into
> the same direction there.

To be clear:

This requirement would only apply to 32-bit ports that are built with
64-bit compilers. For ports that are built natively (as happens now),
nothing would change.

So if (say) armhf were built using 64-bit (arm64) compilers, the
build environment for armhf would need to be able to run both the 64-bit
and 32-bit binaries natively.

For 64-bit architectures that are built natively (like arm64), there is
no requirement that these buildds should be able to run 32-bit binaries.

If (some of) the arm64 buildds would run on hardware that doesn't
support 32-bit, that would obviously mean that builds for armel and
armhf would have to be done on other hardware.

And for 32-bit architectures that are built natively, there is no
requirement that the buildds should be able to run 64-bit binaries
(assuming building on 32-bit still works).

I hope this clarifies what I meant.

Ivo

Re: Bypassing the 2/3/4GB virtual memory space on 32-bit ports

Luke Kenneth Casson Leighton
In reply to this post by Aurelien Jarno
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68

On Thu, Aug 8, 2019 at 9:39 PM Aurelien Jarno <[hidden email]> wrote:

> We are at a point where we should probably look for a real solution
> instead of relying on tricks.

 *sigh* i _have_ been pointing out for several years now that this is
a situation that is going to get increasingly worse and worse, leaving
perfectly good hardware only fit for landfill.

 i spoke to Dr Stallman about the lack of progress here:
 https://sourceware.org/bugzilla/show_bug.cgi?id=22831

he expressed some puzzlement, as the original binutils algorithm was
perfectly well capable of handling linking with far less resident
memory than was available at the time - and did *NOT* - just like gcc -
assume that virtual memory was "the way to go".  this is because the
algorithm used in ld was written at a time when virtual memory was far
from adequate.

 then somewhere in the mid-90s, someone went "4GB is enough for
anybody" and ripped the design to shreds, making the deeply flawed and
short-sighted assumption that application linking would remain -
forever - below 640k^H^H^H^H4GB.

 now we're paying the price.

 the binutils-gold algorithm (with the options listed in the bugreport)
is *supposed* to fix this; however, the ld-torture test that i created
shows that the binutils-gold algorithm is *also* flawed: it probably
uses mmap when it is in *no way* supposed to.

 binutils with the --no-keep-memory option actually does far better
than binutils-gold... in most circumstances.  however it also
spuriously fails with inexplicable errors.

 basically, somebody needs to actually properly take responsibility
for this and get it fixed.  the pressure will then be off: linking
will take longer *but at least it will complete*.

 i've written the ld-torture program - a random function generator -
so that it can be used to easily generate large numbers of massive c
source files that will hit well over the 4GB limit at link time.  so
it's easily reproducible.
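
 as a rough illustration of the idea only (this is not the actual
ld-torture program; the counts and names below are arbitrary and would
need scaling up to push a link past the 4GB mark):

/* illustrative sketch only -- not the real ld-torture tool.  it emits
 * many C files, each full of functions carrying large initialised
 * tables, so that compiling and linking all the resulting objects eats
 * a lot of address space.  scale NFILES/NFUNCS/NITEMS up as needed. */
#include <stdio.h>
#include <stdlib.h>

#define NFILES 64     /* number of generated source files */
#define NFUNCS 128    /* functions per file               */
#define NITEMS 4096   /* ints in each per-function table  */

int main(void)
{
    for (int f = 0; f < NFILES; f++) {
        char name[32];
        snprintf(name, sizeof(name), "torture_%03d.c", f);
        FILE *out = fopen(name, "w");
        if (!out) { perror(name); return 1; }

        for (int fn = 0; fn < NFUNCS; fn++) {
            /* each function drags in its own big initialised table, so
             * the object files -- and the final link -- grow quickly */
            fprintf(out, "static const int tab_%d_%d[%d] = {", f, fn, NITEMS);
            for (int i = 0; i < NITEMS; i++)
                fprintf(out, "%d,", rand());
            fprintf(out, "};\n");
            fprintf(out, "long fn_%d_%d(void) {\n", f, fn);
            fprintf(out, "  long s = 0;\n");
            fprintf(out, "  for (int i = 0; i < %d; i++) s += tab_%d_%d[i];\n",
                    NITEMS, f, fn);
            fprintf(out, "  return s;\n}\n\n");
        }
        fclose(out);
    }
    return 0;
}

 (a driver that calls every fn_*() and the actual compile/link step are
left out.)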

 l.

p.s. no, going into virtual memory (i.e. swap) is not acceptable.  the
cross-referencing instantly creates a swap-thrash scenario that will
push any and all builds to 10 to 100x the completion time.  any link
that goes into "thrash" will take 2-3 days to complete instead of an
hour.  "--no-keep-memory" is supposed to fix that, but it is *NOT* an
option on binutils-gold, it is *ONLY* available on the *original*
binutils-ld.

Re: Bypassing the 2/3/4GB virtual memory space on 32-bit ports

Luke Kenneth Casson Leighton
In reply to this post by Ivo De Decker
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68

On Fri, Aug 9, 2019 at 1:49 PM Ivo De Decker <[hidden email]> wrote:

>
> Hi Aurelien,
>
> On 8/8/19 10:38 PM, Aurelien Jarno wrote:
>
> > 32-bit processes are able to address at maximum 4GB of memory (2^32),
> > and often less (2 or 3GB) due to architectural or kernel limitations.
>
> [...]
>
> Thanks for bringing this up.
>
> > > 1) Build a 64-bit compiler targeting the corresponding 32-bit
> >     architecture and install it in the 32-bit chroot with the other
> >     64-bit dependencies. This is still a kind of cross-compiler, but the
> >     rest of the build is unchanged and the testsuite can be run. I guess
> >     it *might* be something acceptable. release-team, could you please
> >     confirm?
>
> As you noted, our current policy doesn't allow that. However, we could
> certainly consider reevaluating this part of the policy if there is a
> workable solution.

it was a long time ago: people who've explained it to me sounded like
they knew what they were talking about when it comes to insisting that
builds be native.

fixing binutils to bring back the linker algorithms that were
short-sightedly destroyed "because they're so historic and laughably
archaic why would we ever need them" should be the first and only
absolute top priority.

only if that catastrophically fails should other options be considered.

with the repro ld-torture code-generator that i wrote, and the amount
of expertise there is within the debian community, it would not
surprise me at all if binutils-ld could be properly fixed extremely
rapidly.

a proper fix would also have the advantage of keeping linkers for
*other* platforms (even 64 bit ones) out of swap-thrashing, saving
power consumption for build hardware and costing a lot less on SSD and
HDD regular replacements.

l.

Re: Bypassing the 2/3/4GB virtual memory space on 32-bit ports

Aurelien Jarno
On 2019-08-09 16:26, Luke Kenneth Casson Leighton wrote:

> [...]
>
> it was a long time ago: people who've explained it to me sounded like
> they knew what they were talking about when it comes to insisting that
> builds be native.
>
> fixing binutils to bring back the linker algorithms that were
> short-sightedly destroyed "because they're so historic and laughably
> archaic why would we ever need them" should be the first and only
> absolute top priority.
>
> only if that catastrophically fails should other options be considered.
>
> with the repro ld-torture code-generator that i wrote, and the amount
> of expertise there is within the debian community, it would not
> surprise me at all if binutils-ld could be properly fixed extremely
> rapidly.
>
> a proper fix would also have the advantage of keeping linkers for
> *other* platforms (even 64 bit ones) out of swap-thrashing, saving
> power consumption for build hardware and costing a lot less on SSD and
> HDD regular replacements.

That would only fix ld, which is only a small part of the issue. Do you
also have ideas about how to fix llvm, gcc or rustc which are also
affected by virtual memory exhaustion on 32-bit architectures?

--
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
[hidden email]                 http://www.aurel32.net

Re: Bypassing the 2/3/4GB virtual memory space on 32-bit ports

Luke Kenneth Casson Leighton
On Wed, Aug 14, 2019 at 5:13 PM Aurelien Jarno <[hidden email]> wrote:

> > a proper fix would also have the advantage of keeping linkers for
> > *other* platforms (even 64 bit ones) out of swap-thrashing, saving
> > power consumption for build hardware and costing a lot less on SSD and
> > HDD regular replacements.
>
> That would only fix ld, which is only a small part of the issue. Do you
> also have ideas about how to fix llvm, gcc or rustc which are also
> affected by virtual memory exhaustion on 32-bit architectures?

*deep breath* - no.  or rather - you're not going to like it - it's not
a technical solution: it's going to need a massive world-wide sustained
and systematic education campaign, written in reasonable and logical
language, explaining and advising GNU/Linux application writers to
take more care and to be much more responsible about how they put
programs together.

a first cut at such a campaign would be:

* designing core critical libraries to be used exclusively through
dlopen / dlsym.  this is just good library design practice in the
first place: one function and one function ONLY is publicly exposed,
returning a pointer to a table of functions (samba's VFS layer for
example [1]).  a minimal sketch of this pattern follows the list below.
* compile-time options that use alternative memory-efficient
algorithms instead of performance-efficient ones
* compile-time options to remove non-essential resource-hungry features
* compiling options in Makefiles that do not assume that there are vast
amounts of memory available (KDE "developer" mode for example would
compile c++ objects individually whereas "maintainer" mode would
auto-generate a file that #included absolutely every single .cpp file
into one, because it's "quicker").
* potential complete redesigns using IPC/RPC modular architectures:
applying the "UNIX Philosophy" however doing so through a runtime
binary self-describing "system" specifically designed for that
purpose.  *this is what made Microsoft [and Apple] successful*.  that
means a strategic focus on getting DCOM for UNIX up and running [2].
god no please not d-bus [3] [4].
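
a minimal sketch of the single-entry-point pattern from the first bullet
(the plugin name, entry-point name and table layout are invented for
illustration; samba's real VFS interface looks different):

/* consumer side of the "one exported function returning a table of
 * functions" pattern.  the plugin .so exports a single symbol,
 * codec_get_ops(); everything else stays private to the plugin and is
 * only pulled into the process if and when it is actually dlopen()ed.
 * build with: gcc demo.c -ldl */
#include <dlfcn.h>
#include <stdio.h>

/* the only things the application and the plugin share: this table
 * layout and the name of the single exported factory function */
struct codec_ops {
    int         version;
    const char *(*name)(void);
    int         (*encode)(const char *in, char *out, int outlen);
};

typedef const struct codec_ops *(*get_ops_fn)(void);

int main(void)
{
    void *handle = dlopen("./libcodec_plugin.so", RTLD_NOW | RTLD_LOCAL);
    if (!handle) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return 1;
    }

    get_ops_fn get_ops = (get_ops_fn)dlsym(handle, "codec_get_ops");
    if (!get_ops) {
        fprintf(stderr, "dlsym: %s\n", dlerror());
        dlclose(handle);
        return 1;
    }

    const struct codec_ops *ops = get_ops();
    printf("loaded codec '%s' (interface v%d)\n", ops->name(), ops->version);

    char buf[64];
    ops->encode("hello", buf, (int)sizeof(buf));

    dlclose(handle);
    return 0;
}

(the plugin itself just defines static functions plus one exported
codec_get_ops() returning a pointer to a static struct codec_ops.)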

also, it's going to need to be made clear to people - diplomatically
but clearly - that whilst they're developing on modern hardware
(because it's what *they* can afford, and what *they* can get - in the
West), the rest of the world (particularly "Embedded" processors)
simply does not have the money or the resources that they do.

unfortunately, here, the perspective "i'm ok, i'm doing my own thing,
in my own free time, i'm not being paid to support *your* hardware" is
a legitimate one.

now, i'm not really the right person to head such an effort.  i can
*identify* the problem, and get the ball rolling on a discussion:
however with many people within debian alone having the perspective
that everything i do, think or say is specifically designed to "order
people about" and "tell them what to do", i'm disincentivised right
from the start.

also, i've got a thousand systems to deliver as part of a
crowd-funding campaign [and i'm currently also dealing with designing
the Libre RISC-V CPU/GPU/VPU]

as of right now, of those thousand systems, 450 are going to have
to go out with Debian/Testing 8.  there's no way they can go out with
Debian 10.  why? because this first revision hardware - designed to be
eco-conscious - uses an Allwinner A20 and only has 2GB of RAM
[upgrades are planned - i *need* to get this first version out, first]

with Debian 10 requiring 4GB of RAM primarily because of firefox,
they're effectively useless if they're ever "upgraded".

that's a thousand systems that effectively go straight into landfill.

l.

[1] https://www.samba.org/samba/docs/old/Samba3-Developers-Guide/vfs.html#id2559133

[2] incredibly, Wine has had DCOM and OLE available and good enough to
use, for about ten years now.  it just needs "extracting" from the
Wine codebase. DCOM stops all of the arguments over APIs (think
"libboost".  if puzzled, add debian/testing and debian/old-stable to
/etc/apt/sources.list, then do "apt-cache search boost | wc")

due to DCOM providing "a means to publish a runtime self-describing
language-independent interface", 30-year-old WIN32 OLE binaries for
which the source code has been irretrievably lost will *still work*
and may still be used, in modern Windows desktops today.  Mozilla
ripped out XPCOM because although it was "inspired" by COM, they
failed, during its initial design, to understand why Co-Classes exist.

as a result it caused massive ongoing problems for 3rd party java and
c++ users, due to binary incompatibility caused by changes to APIs on
major releases.  Co-Classes were SPECIFICALLY designed to stop EXACTLY
that problem... and Mozilla failed to add it to XPCOM.

bottom line: the free software community has, through "hating" on
microsoft, rejected the very technology that made microsoft so
successful in the first place.

Microsoft used DCOM (and OLE), Apple (thanks to Steve's playtime /
break doing NeXT) developed Objective-C / Objective-J / Objective-M
(dynamic runtime self-describing capabilities *built-in to the
compilers*) and built the Cocoa framework around it.  that both
efforts used runtime dynamic self-descriptive capabilities and that
both companies were a runaway dominant success is not a coincidence.

[3] d-bus.  *shakes head*.  when it first came out (about 15 years
ago?) i did a side-by-side review of the d-bus spec and the DCE/RPC
OpenGroup spec.  incredibly, the wording and technical capability were
*NINETY* percent identical.  except that d-bus was providing about 25%
of the *functionality* of DCE/RPC.  as you can see from the
XPCOM-vs-COM example above, it's that missing functionality - the lack
of strategic foresight - that causes the GNU/Linux community to miss a
golden opportunity.

[4] remember the hard lesson of OpenMoko.  on an ARM9 processor which,
when it context-switched, would throw out the entire L1 cache,
whereas Nokia was doing single-threaded applications and
heavily-optimised real-time OSes, the OpenMoko team designed
everything based around d-bus ("because it was shiny and new").

some interfaces did not even exist and were still being called.

on receipt of a phone call, the X11 "answer call" application took so
long to appear on-screen, and was so unresponsive thanks to d-bus
*and* X11 hammering the poor device into the ground, that even when
the buttons appeared you still couldn't actually answer the
call.  some 30 seconds after the call had gone to voicemail, the OS
would "recover".

Re: Bypassing the 2/3/4GB virtual memory space on 32-bit ports

Sam Hartman
>>>>> "Luke" == Luke Kenneth Casson Leighton <[hidden email]> writes:

    Luke> On Wed, Aug 14, 2019 at 5:13 PM Aurelien Jarno <[hidden email]> wrote:
    >> > a proper fix would also have the advantage of keeping linkers
    >> > for *other* platforms (even 64 bit ones) out of swap-thrashing,
    >> > saving power consumption for build hardware and costing a lot
    >> > less on SSD and HDD regular replacements.
    >>
    >> That would only fix ld, which is only a small part of the
    >> issue. Do you also have ideas about how to fix llvm, gcc or rustc
    >> which are also affected by virtual memory exhaustion on 32-bit
    >> architectures?

    Luke> *deep breath* - no.  or, you're not going to like it: it's not
    Luke> a technical solution, it's going to need a massive world-wide
    Luke> sustained and systematic education campaign, written in
    Luke> reasonable and logical language, explaining and advising
    Luke> GNU/Linux applications writers to take more care and to be
    Luke> much more responsible about how they put programs together.

Your entire argument is built on the premise that it is actually
desirable for these applications (compilers, linkers, etc) to work in
32-bit address spaces.


I'm not at all convinced that is true.
What you propose involves a lot of work for application writers and even
more for compiler/linker writers.

It seems simpler to decide that we'll build software on 64-bit
architectures.
That has some challenges for Debian because currently we don't  accept
cross-built binaries for the archive.

Long term, I kind of suspect it would be better for Debian to meet those
challenges and get to where we can cross-build for 32-bit architectures.

--Sam

Re: Bypassing the 2/3/4GB virtual memory space on 32-bit ports

Luke Kenneth Casson Leighton
On Mon, Aug 19, 2019 at 7:29 PM Sam Hartman <[hidden email]> wrote:

> Your entire argument is built on the premise that it is actually
> desirable for these applications (compilers, linkers, etc) to work in
> 32-bit address spaces.

that's right [and in another message in the thread it was mentioned
that builds have to be done natively.  the reasons are to do with
mistakes that cross-compiling, particularly during autoconf
hardware/feature-detection, can introduce *into the binary*.  with
40,000 packages to build, it is just far too much extra work to
analyse even a fraction of them]

at the beginning of the thread, the very first thing that was
mentioned was: is it acceptable for all of us to abdicate
responsibility and, "by default" - by failing to take that
responsibility - end up indirectly responsible for the destruction and
consignment to landfill of otherwise perfectly good [32-bit] hardware?

now, if that is something that you - all of you - find to be perfectly
acceptable, then please continue to not make the decision to take any
action, and come up with whatever justifications you see fit which
will help you to ignore the consequences.

that's the "tough, reality-as-it-is, in-your-face" way to look at it.

the _other_ way to look at is: "nobody's paying any of us to do this,
we're perfectly fine doing what we're doing, we're perfectly okay with
western resources, we can get nice high-end hardware, i'm doing fine,
why should i care??".

this perspective was one that i first encountered during a ukuug
conference on samba as far back as... 1998.  i was too shocked to even
answer the question, not least because everybody in the room clapped
at this utterly selfish, self-centered "i'm fine, i'm doing my own
thing, why should i care, nobody's paying us, so screw microsoft and
screw those stupid users for using proprietary software, they get
everything they deserve" perspective.

this very similar situation - 32-bit hardware being consigned to
landfill - is slowly and inexorably coming at us, being squeezed from
all sides not just by 32-bit hardware itself being completely useless
for actual *development* purposes (who actually still has a 32-bit
system as a main development machine?); it's being squeezed out by
advances in standards, processor speed, user expectations and much
more.

i *know* that we don't have - and can't use - 32-bit hardware for
primary development purposes.  i'm writing this on a 2.5 year old
gaming laptop that was the fastest high-end resourced machine i could
buy at the time (16GB RAM, 512GB NVMe, 3.6GHz quad-core
hyperthreaded).

and y'know what? given that we're *not* being paid by these users of
32-bit hardware - in fact most of us are not being paid *at all* -
it's not as unreasonable as it first sounds.

i am *keenly aware* that we volunteer our time, and are not paid
*anything remotely* close to what we should be paid, given the
responsibility and the service that we provide to others.

it is a huge "pain in the ass" conundrum, that leaves each of us with
a moral and ethical dilemma that we each *individually* have to face.

l.

Re: Bypassing the 2/3/4GB virtual memory space on 32-bit ports

Sam Hartman
[trimming the cc]

>>>>> "Luke" == Luke Kenneth Casson Leighton <[hidden email]> writes:

    Luke> On Mon, Aug 19, 2019 at 7:29 PM Sam Hartman <[hidden email]> wrote:
    >> Your entire argument is built on the premise that it is actually
    >> desirable for these applications (compilers, linkers, etc) to
    >> work in 32-bit address spaces.

    Luke> that's right [and in another message in the thread it was
    Luke> mentioned that builds have to be done natively.  the reasons
    Luke> are to do with mistakes that cross-compiling, particularly
    Luke> during autoconf hardware/feature-detection, can introduce
    Luke> *into the binary*.  with 40,000 packages to build, it is just
    Luke> far too much extra work to analyse even a fraction of them]

    Luke> at the beginning of the thread, the very first thing that was
    Luke> mentioned was: is it acceptable for all of us to abdicate
    Luke> responsibility and, "by default" - by failing to take that
    Luke> responsibility - end up indirectly responsible for the
    Luke> destruction and consignment to landfill of otherwise perfectly
    Luke> good [32-bit] hardware?

I'd ask you to reconsider your argument style.  You're using very
emotionally loaded language, appeals to authority, and moralistic
language to create the impression that your way of thinking is the only
reasonable one.  Instead, let us have a discussion that respects
divergent viewpoints and that focuses on the technical trade offs
without using language like "abdicate responsibility," or implying that
those who prefer mmap are somehow intellectually inferior rather than
simply viewing the trade offs differently than you do.

I'm particularly frustrated that you spent your entire reply moralizing
and ignored the technical points I made.

As you point out there are challenges with cross building.
I even agree with you that we cannot address these challenges and get to
a point where we have confidence a large fraction of our software will
cross-build successfully.

But we don't need to address a large fraction of the source packages.
Only a relatively small fraction of the source packages require more
than 2G of RAM to build.
Especially given that in the cases we care about we can (at least today)
arrange to natively run both host and target binaries, I think we can
approach limited cross-building in ways that  meet our needs.
Examples include installing cross-compilers for arm64 targeting arm32
into the arm32 build chroots when building arm32 on native arm64
hardware.
There are limitations to that we've discussed in the thread.

More generally, though, there are approaches that are less risky than
full cross building.  As an example, tools like distcc or making
/usr/bin/gcc be a 64-bit hosted 32-bit cross-compiler may be a lot less
risky than typical cross building.  Things like distcc can even be used
to run a 64-bit compiler for arm32 even in environments where the arm64
arch cannot natively run arm32 code.

Yes, there's work to be done with all the above.
My personal belief is that the work I'm talking about is more tractable
than your proposal to significantly change how we think about cross
library linkage.

And ultimately, if no one does the work, then we will lose the 32-bit
architectures.

--Sam

Re: Bypassing the 2/3/4GB virtual memory space on 32-bit ports

Luke Kenneth Casson Leighton
On Tue, Aug 20, 2019 at 1:17 PM Sam Hartman <[hidden email]> wrote:

> I'd ask you to reconsider your argument style.

that's very reasonable, and i appreciate the way that you put it.

> I'm particularly frustrated that you spent your entire reply moralizing
> and ignored the technical points I made.

ah: i really didn't (apologies for giving that impression).  i
mentioned that, earlier in the thread, cross-building had been
brought up, and (if memory serves correctly) the build team had
already said it wasn't something that should be done lightly.

> As you point out there are challenges with cross building.

yes.  openembedded, as one of the longest-standing
cross-compiling-capable distros that has been able to target sub-16MB
systems as well as modern desktops for two decades, deals with it in a
technically amazing way, including:

* the option to over-ride autoconf with specially-prepared config.sub
/ config.guess files
* the ability to compile through a command-line-driven hosted native
compiler *inside qemu*
* many more "tricks" which i barely remember.

so i know it can be done... it's just that, historically, the efforts
completely overwhelmed the (small) team, as the number of systems,
options and flexibility that they had to keep track of far exceeded
their resources.

> I even agree with you that we cannot address these challenges and get to
> a point where we have confidence a large fraction of our software will
> cross-build successfully.

sigh.

> But we don't need to address a large fraction of the source packages.
> There are a relatively small fraction of the source packages that
> require more than 2G of RAM to build.

... at the moment.  with there being a lack of awareness of the
consequences of the general thinking, "i have a 64 bit system,
everyone else must have a 64 bit system, 32-bit must be on its last
legs, therefore i don't need to pay attention to it at all", unless
there is a wider (world-wide) general awareness campaign, that number
is only going to go up, isn't it?


> Especially given that in the cases we care about we can (at least today)
> arrange to natively run both host and target binaries, I think we can
> approach limited cross-building in ways that  meet our needs.
> Examples include installing cross-compilers for arm64 targeting arm32
> into the arm32 build chroots when building arm32 on native arm64
> hardware.
> There are limitations to that we've discussed in the thread.

indeed.  and my (limited) torture-testing of ld showed that it really
doesn't work reliably (i.e. there are bugs in binutils that are
triggered by large binaries greater than 4GB being linked *on 64-bit
systems*).

it's a mess.

> Yes, there's work to be done with all the above.
> My personal belief is that the work I'm talking about is more tractable
> than your proposal to significantly change how we think about cross
> library linkage.

i forgot to say: i'm thinking ahead over the next 3-10 years,
projecting the current trends.


> And ultimately, if no one does the work, then we will lose the 32-bit
> architectures.

... and i have a thousand 32-bit systems that i am delivering on a
crowdfunding campaign, the majority of which would go directly into
landfill.

l.

Re: Bypassing the 2/3/4GB virtual memory space on 32-bit ports

Sam Hartman
>>>>> "Luke" == Luke Kenneth Casson Leighton <[hidden email]> writes:

    >> I even agree with you that we cannot address these challenges and
    >> get to a point where we have confidence a large fraction of our
    >> software will cross-build successfully.

    Luke> sigh.

I don't really see the need for a sigh.
I think we can address enough of the challenges that we are not
significantly harmed.

    >> But we don't need to address a large fraction of the source
    >> packages.  There are a relatively small fraction of the source
    >> packages that require more than 2G of RAM to build.

    Luke> ... at the moment.  with there being a lack of awareness of
    Luke> the consequences of the general thinking, "i have a 64 bit
    Luke> system, everyone else must have a 64 bit system, 32-bit must
    Luke> be on its last legs, therefore i don't need to pay attention
    Luke> to it at all", unless there is a wider (world-wide) general
    Luke> awareness campaign, that number is only going to go up, isn't
    Luke> it?

I'd rather say that over time, we'll get better at dealing with cross
building more things and 32-bit systems will become less common.
Eventually, yes, we'll get to a point where 32-bit systems are
infrequent enough and the runtime software needs have increased enough
that 32-bit general-purpose systems don't make sense.
They will still be needed for embedded usage.

There are Debian derivatives that already deal better with building
subsets of the archive for embedded uses.
Eventually, Debian itself will need to either give up on 32-bit entirely
or deal with more of that itself.

I think my concern about your approach is that you're trying to change
how the entire world thinks.  You're trying to convince everyone to be
conservative in how much (virtual) memory they use.

Except I think that a lot of people actually only do need to care about
64-bit environments with reasonable memory.  I think that will increase
over time.

I think that approaches that focus the cost of constrained environments
onto places where we need constrained environments are actually better.

There are cases where it's actually easier to write code assuming you
have lots of virtual memory.  Human time is one of our most precious
resources.  It's reasonable for people to value their own time.  Even
when people are aware of the tradeoffs, they may genuinely decide that
being able to write code faster and that is conceptually simpler is the
right choice for them.  And having a flat address space is often
conceptually simpler than having what amounts to multiple types/levels
of addressing.  In this sense, having an on-disk record store/database
and indexing that and having a library to access it is just a complex
addressing mechanism.

We see this trade off all over the place as memory mapped databases
compete with more complex relational databases which compete with nosql
databases which compete with sharded cloud databases that are spread
across thousands of nodes.  There are trade offs involving complexity of
code, time to write code, latency, overall throughput, consistency, etc.

How much effort we put into supporting 32-bit architectures as our datasets
(and building is just another dataset) grow is just the same trade offs
in miniature.  And choosing to write code quickly is often the best
answer.  It gets us code, after all.

--Sam

Re: Bypassing the 2/3/4GB virtual memory space on 32-bit ports

Luke Kenneth Casson Leighton
On Tue, Aug 20, 2019 at 2:52 PM Sam Hartman <[hidden email]> wrote:

> I think my concern about your approach is that you're trying to change
> how the entire world thinks.

that would be... how can i put it... an "incorrect" interpretation.  i
think globally - i always have.  i didn't start the NT Domains
Reverse-Engineering "because it would be fun", i did it because,
world-wide, i could see the harm that was being caused by the
polarisation between the Windows and UNIX worlds.

>  You're trying to convince everyone to be
> conservative in how much (virtual) memory they use.

not quite: i'm inviting people to become *aware of the consequences*
of *not* being conservative in how much (virtual) memory they use...
when the consequences of their focus on the task that is "today" and
is "right now", with "my resources and my development machine" are
extended to a global scale.

whether people listen or not is up to them.

> Except I think that a lot of people actually only do need to care about
> 64-bit environments with reasonable memory.  I think that will increase
> over time.
>
> I think that approaches that focus the cost of constrained environments
> onto places where we need constrained environments are actually better.
>
> There are cases where it's actually easier to write code assuming you
> have lots of virtual memory.

yes.  a *lot* easier.  LMDB for example simply will not work, on a
32-bit system, with files larger than 4GB, because it maps the whole
file and uses shared-memory copy-on-write B+-Trees (just like BTRFS).

...oops :)
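
to make the ceiling concrete, a tiny sketch (illustration only, not LMDB
code): mmap() takes a size_t length, and on a 32-bit port SIZE_MAX is
2^32-1, so a 5GB mapping cannot even be requested, never mind placed in
the address space:

/* illustration only -- not LMDB code */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint64_t db_size = 5ULL * 1024 * 1024 * 1024;   /* a 5GB database file */

    if (db_size > SIZE_MAX) {
        /* 32-bit build: the length does not even fit in a size_t */
        printf("cannot map %llu bytes: size_t tops out at %zu here\n",
               (unsigned long long)db_size, (size_t)SIZE_MAX);
        return 1;
    }
    printf("a %llu-byte mapping is at least expressible (64-bit build)\n",
           (unsigned long long)db_size);
    return 0;
}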

> Human time is one of our most precious
> resources.  It's reasonable for people to value their own time.  Even
> when people are aware of the tradeoffs, they may genuinely decide that
> being able to write code faster and that is conceptually simpler is the
> right choice for them.

indeed.  i do recognise this.  one of the first tasks that i was given
at university was to write a Matrix Multiply function that could
(hypothetically) extend well beyond the size of virtual memory (let
alone physical memory).

"vast matrix multiply" is known to be such a hard problem that you
just... do... not... try it.  you use a math library, and that's
really the end of the discussion!

there are several other computer science problems that fall into this
category.  one of them is, ironically (given how the discussion
started) linking.

i really wish Dr Stallman's algorithms had not been ripped out of ld.


>  And having a flat address space is often
> conceptually simpler than having what amounts to multiple types/levels
> of addressing.  In this sense, having an on-disk record store/database
> and indexing that and having a library to access it is just a complex
> addressing mechanism.
>
> We see this trade off all over the place as memory mapped databases
> compete

... such as LMDB...

> with more complex relational databases which compete with nosql
> databases which compete with sharded cloud databases that are spread
> across thousands of nodes.  There are trade offs involving complexity of
> code, time to write code, latency, overall throughput, consistency, etc.
>
> How much effort we go to support 32-bit architectures as our datasets
> (and building is just another dataset) grow is just the same trade offs
> in miniture.  And choosing to write code quickly is often the best
> answer.  It gets us code after all.

indeed.

i do get it - i did say.  i'm aware that software libre developers
aren't paid, so it's extremely challenging to expect any change - at
all.  they're certainly not paid by the manufacturers of the hardware
that their software actually *runs* on.

i just... it's frustrating for me to think ahead, projecting where
things are going (which i do all the time), and see the train wreck
that has a high probability of occurring.

l.

Re: Bypassing the 2/3/4GB virtual memory space on 32-bit ports

Sam Hartman
>>>>> "Luke" == Luke Kenneth Casson Leighton <[hidden email]> writes:
Hi.
First, thanks for working through this with me.
I'm getting a lot more insight into where you're coming from, and it is
greatly appreciated.
    Luke> indeed.

    Luke> i do get it - i did say.  i'm aware that software libre
    Luke> developers aren't paid, so it's extremely challenging to
    Luke> expect any change - at all.  they're certainly not paid by
    Luke> the manufacturers of the hardware that their software
    Luke> actually *runs* on.

    Luke> i just... it's frustrating for me to think ahead, projecting
    Luke> where things are going (which i do all the time), and see the
    Luke> train wreck that has a high probability of occurring.

I'd like to better understand the train wreck you see.
What I see as likely is that the set of software that runs on 32-bit
arches will decrease over time, and the amount of time we'll spend
getting basic tools to work will increase.
We'll get some general approaches other folks have adopted into Debian
along the way.

Eventually, Debian itself will drop 32-bit arches.  i386 and proprietary
software and Steam will probably hold that off for a couple of releases.

32-bit support will continue for a bit beyond that in the Debian
ecosystem/Debian ports but with a decreasing fraction of the archive
building.

Meanwhile along the same path, there will be fewer 32-bit general
purpose systems in use.


Where is the train wreck?
