Adding steps to debian-installer before downloading debconf
I asked a similar question last week on debian-user, but I realize now that this is probably a better list to try. I apologize if this is the wrong place for this question; I notice most posts seem to be about pull requests.
tl;dr: I want to add some custom scripts and/or programs to the debian installer that execute after it configures networking, but before it attempts to download the debconf file. I've had limited success by injecting my program into the initrd image that pxeboot downloads from our local tftpboot server, replacing the existing /bin/netcfg binary, but this causes other problems. Not insurmountable ones, but ones I'd rather not have. I've tried tracing the init scripts that get called by busybox and then debian-installer, but most seem to call precompiled binaries. I've tried digging through their source code, but I'm not a C programmer, or even a software engineer. I'm a sysadmin tasked with finding generalized solutions for annoying, esoteric problems in our cluster.
The solution need not be an atomic step in the menu; the only requirement is that it happens after netcfg detects the network hardware, and before debian-installer attempts to download the preseed/url specified in the pxeconfig file.
I work with a production cluster with a large number of eccentricities in the networking configuration of any given host. This manifests as DHCP failing on any host where the interface linked to a DHCP server is anything but the first one the kernel configures.
While playing around with it manually, I found that if I either let DHCP fail on the host, or disable networking autoconfig entirely in the pxeconfig file, I can get an IP address assigned by executing `dhclient` in a shell on the host. I can then select 'download the preconfiguration file' manually and everything will work without issue from there. This tells me that it should be possible to automate this recovery scenario, as long as I can find the code that defines the order in which the steps are executed.
My hope is that I'll be able to inject scripts/custom programs into the initrd image or modify existing scripts to add the functionality I need.
The wiki for debian-installer declares that it was designed for modularity and customizability so that it can work on any hardware, no matter how esoteric. Most of the documentation I've found for modifying debian-installer though seems to center around adding drivers. My first attempt was to modify /init in the initrd to get a DHCP lease before `busybox init` is executed. The problem with this is that even if I set `netcfg/enable boolean false` in the pxeconfig, this lease is wiped out when netcfg detects networking hardware. But it was able to get a lease and resolve internal addresses before netcfg executed. I've also been able to reliably solve this problem with custom programs injected into the initrd that I execute when the problem happens, but that requires manual intervention, and is not something we consider acceptable for a permanent solution.
We've tried a number of solutions to this problem that have been suggested in previous posts: separate links for imaging and application traffic, manually setting the interface name to use for dhcp in the pxeconfig with `interface=fooX`, disabling unneeded interfaces in the system BIOS on affected hosts, manually configuring the network interactively in the event that DHCP fails, and disabling all network devices on the host and installing our own NIC. Unfortunately, these solutions either will not work universally for every host we have in our cluster, cost too much to be scalable, or require more hands-on attention than we are willing to dedicate. We've considered replacing our imaging system with something that doesn't have this problem, but the work involved in that is not trivial. It may be the route we go if I can't find a solution within our current system.
The distros that we are currently installing are ubuntu trusty and bionic, as well as debian stretch. All 3 (and every ubuntu distro we've supported in the past) show the same behavior.
I appreciate any support, and apologize for the length of this question. I do humbly and politely request that suggestions remain within the confines of what I'm asking for. I know that when presented with a weird problem where the proposed solutions are unorthodox or not immediately apparent, the urge is to find ways to work around solving the problem.
Re: Adding steps to debian-installer before downloading debconf
So, I found a workaround that wasn't as well documented as I had hoped. Describing it here in case some other hapless sysadmin ever has the same problem with debian-installer not attempting dhcp over an interface other than eth0 and doesn't know what to do.
If preseed.cfg exists in the root of the initrd, debian-installer will execute any commands listed there before it tries to download the preseed file listed in the pxeboot config. So, if that file happens to contain preseed commands to statically configure an IP address, guess what, it will do that.
Of course, this means having an IP config pre-generated in the initrd, right? You're right back to step 1 with needing to know network configuration for the host.
If you execute `dhclient` in the init script, before it executes `busybox init`, it will broadcast across all configurable network interfaces. If it gets a valid response, it will get an IP address. I mentioned this earlier as my first attempt at bypassing the netcfg dhcp autoconfig that is the root of all these problems. netcfg SO VERY HELPFULLY clears all running IP configurations when it detects interfaces, whether they be configured via dhcp or statically assigned with `ip addr add ...`. What netcfg DOES NOT clear is the contents of `/var/lib/dhcp/dhclient.leases`, which contains the interface name, ip address, hostname, gateway, nameservers, domain name, etc, that the dhcp server responded with.
From this file, one can parse out information needed for preseed/netcfg static IP configuration; which can be written as answers in preseed.cfg.
So, I made the following changes to the initrd:
- added `dhclient -v` early in the init script
- created a custom program to parse the contents of /var/lib/dhcp/dhclient.leases and generate preseed netcfg answers that get written to preseed.cfg in the initrd root. It can be written in whatever you want; I found it easiest to write it in golang and put the compiled binary in the initrd's /bin
- added a line to init to execute this program immediately after the dhcp client finishes
- removed /media in the initrd
That last step may seem perplexing. One really helpful behavior I discovered is that debian-installer will mount /dev/sda1 to /media if it reads a preseed.cfg file stored in the root of the initrd. I assume it does this because why would you want a preseed file on disk, unless you're on a liveCD, which will probably have more data in /media that is essential. This wouldn't be a problem if the partitioner didn't default to using /dev/sda, because why would anyone anywhere want to install their OS into the first or only disk present in a system? So, if /dev/sda1 exists and has a filesystem, it gets mounted to /media, and no combination of di late commands or partman options to forcibly unmount stuff seemed to work. It would always prompt me to unmount the partition before proceeding. Deleting the /media mountpoint means that it can't mount /dev/sda1 there. In a netboot initrd, /media is empty anyway.