Years behind schedule, I finally got around to replacing ISC DHCP with Kea DHCP so I could have proper IPv6 host reservations. What I just learned, and should have learned years ago, is that several of my motherboards, such as the Supermicro A1SAi and Intel NUC, support UEFI PXE booting but do not support TFTP servers outside of their local /64 network. Doh! They will happily get an address from a DHCPv6 server on another network via a relay, that’s not a problem, but if the TFTP server is not on the same LAN the NBP download times out and fails. It would seem that LinkedIn learned this years ago too. This is similar in effect to my misconfigured DHCP server the other day, but not the same cause.
The only solutions are to either have a TFTP server on the same LAN as the target system, or keep legacy IPv4 networking around so that the target system can use UEFI IPv4 PXE to boot something like syslinux.efi, GRUB2, or iPXE, which in turn has IPv6 support and can finish downloading the kernel and initramfs over IPv6.
At first I thought I was doing something wrong in Kea (so I verified the same behavior with the old ISC DHCP), but no, packet captures prove that during UEFI PXE boot the system makes zero effort to send out Router Solicitations. It also tries to do Neighbor Discovery for IPv6 addresses that it should be sending to the default gateway, which implies it’s not honoring the Router Advertisements that tell the system its prefix and prefix length. Or, it has some wild ideas as to what it thinks is “on-link”, which is how IPv6 determines if something is on the same L2 network.
An example
Here’s a target system, a Supermicro A1SAi-2550 with MAC address 0c:c4:7a:32:27:6c, trying to UEFI PXE boot over IPv6:
First, the Kea DHCP6 server configuration, which just says “here, your IP address is 2001:470:8122:1::9, now go fetch GRUB2 using TFTP at 2001:470:1f05:2c9::10”:

    "client-classes": [
      {
        "name": "grub2_tftp_efi",
        "test": "option[61].hex == 0x0007",
        "option-data": [
          {
            "name": "bootfile-url",
            "data": "tftp://[2001:470:1f05:2c9::10]/efi/bootx64.efi"
          }
    ...
    ...
    "subnet6": [
    ...
    ...
        "hostname": "basic09.wann.net",
        "hw-address": "0c:c4:7a:32:27:6c",
        "ip-addresses": [ "2001:470:8122:1::9" ],
        "client-classes": [ "ikeacluster" ]
    ...
On boot, this is displayed on console:

    >>Checking Media Presence......
    >>Media Present......
    >>Start PXE over IPv6..
      Station IP address is 2001:470:8122:1:0:0:0:9
      ....long 20 second wait...
      Server IP address is 2001:470:1F05:2C9:0:0:0:10
      NBP filename is efi/bootx64.efi
      NBP filesize is 0 Bytes
      PXE-E18: Server response timeout.
This tells us the target system did a successful DHCPv6 Solicit/Advertise/Request/Reply (S.A.R.R.) exchange with Kea, and that it understood the bootfile-url option in the DHCP6 response. But then it got zero bytes.
From the standpoint of the DHCPv6 and TFTP servers, there’s not much to see. The SARR process happens, and that’s it. Nothing tries to hit the tftp server at all.
From a packet capture of the router (:89:f0) facing the Supermicro system (:27:6c) we see:
    Solicit   XID: 0xe8a7e3 CID: 000100013875b6020cc47a32276c   ok
    Advertise XID: 0xe8a7e3 CID: 000100013875b6020cc47a32276c IAA: 2001:470:8122:1::9   ok
    Request   XID: 0xe9a7e3 CID: 000100013875b6020cc47a32276c IAA: 2001:470:8122:1::9   ok
    Reply     XID: 0xe9a7e3 CID: 000100013875b6020cc47a32276c IAA: 2001:470:8122:1::9   ok
    Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c   what!
    Neighbor Solicitation for fe80::ec4:7aff:fe32:276c from fc:ec:da:4a:89:f0   < router :f0 asks who :6c is
    Neighbor Advertisement fe80::ec4:7aff:fe32:276c (sol, ovr) is at 0c:c4:7a:32:27:6c   < :6c replies
    Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c   what!
    Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c   what!
    Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c
    Neighbor Solicitation for fe80::feec:daff:fe4a:89f0 from 0c:c4:7a:32:27:6c   < :6c unicast-asks who's :89:f0
    Neighbor Advertisement fe80::feec:daff:fe4a:89f0 (rtr, sol)   < :f0 replies I am he, also I'm a router
    Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c
    Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c
    Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c
    Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c
    Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c
    Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c
    Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c
    Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c
    Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c
    Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c
    Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c
    Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c
    Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c
    Neighbor Solicitation for 2001:470:1f05:2c9::10 from 0c:c4:7a:32:27:6c
    Release   XID: 0xeaa7e3 CID: 000100013875b6020cc47a32276c IAA: 2001:470:8122:1::9   < :6c I give up
    Reply     XID: 0xeaa7e3 CID: 000100013875b6020cc47a32276c
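For anyone reading along, two identifiers in that capture can be decoded by hand. The rough Python sketch below assumes the link-local addresses are plain modified EUI-64 addresses derived from the MAC (the capture is consistent with that) and that the CID is a DUID-LLT; the function names are made up for illustration.

```python
import ipaddress


def mac_to_link_local(mac: str) -> ipaddress.IPv6Address:
    """Derive the modified EUI-64 link-local address for a MAC address."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError(f"expected 6 octets, got {len(octets)}")
    octets[0] ^= 0x02  # flip the universal/local bit
    # Interface ID is the MAC with ff:fe wedged into the middle.
    iid = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    return ipaddress.IPv6Address(b"\xfe\x80" + b"\x00" * 6 + iid)


def parse_duid_llt(hex_duid: str) -> dict:
    """Split a DUID-LLT (type 1) into its fields; raises on other types."""
    raw = bytes.fromhex(hex_duid)
    duid_type = int.from_bytes(raw[0:2], "big")
    if duid_type != 1:
        raise ValueError(f"not a DUID-LLT (type {duid_type})")
    return {
        "hw_type": int.from_bytes(raw[2:4], "big"),  # 1 = Ethernet
        "time": int.from_bytes(raw[4:8], "big"),     # raw timestamp field
        "lladdr": ":".join(f"{b:02x}" for b in raw[8:]),
    }


if __name__ == "__main__":
    print(mac_to_link_local("0c:c4:7a:32:27:6c"))   # the Supermicro
    print(mac_to_link_local("fc:ec:da:4a:89:f0"))   # the router
    print(parse_duid_llt("000100013875b6020cc47a32276c"))
```

Both link-local addresses in the capture line up with the two MAC addresses, and the tail of the CID is just the Supermicro’s MAC again.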
We see the Supermicro go through the whole SARR process. DHCP6 by design does not carry any router details or subnet/prefix information. It’s up to the target system to listen for Router Advertisements to find the prefix of the LAN it is attached to. In other words, the Supermicro assumes it is 2001:470:8122:1::9/128 until something tells it otherwise. Here the Supermicro did not make any sort of Router Solicitation. I’ve filtered them out for brevity, but the router was indeed sending out RAs every 4 seconds, so the Supermicro had ample time: at least 5 went by in this time frame.
My hypothesis is that maybe the IP stack did receive an RA but just decided everything is “on-link” anyway? Or it has a wildly wrong prefix configured and thinks everything in the world is on the same network. For giggles I did try to fetch from a Comcast 2601:646:: network so it’s in at least a different /14, but that didn’t help. In any event, the Supermicro starts sending out Neighbor Solicitations for the TFTP server at 2001:470:1f05:2c9::10 over and over, even though that address is in a completely different subnet on a completely different LAN.
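To make the on-link question concrete, here is a minimal Python sketch of the next-hop decision a host is supposed to make. It is an illustration of the idea, not anything taken from the firmware: the function name is hypothetical, and the /64 prefix for the Supermicro’s LAN is an assumption based on its address.

```python
import ipaddress
from typing import Iterable, Optional


def next_hop(dest: str,
             on_link_prefixes: Iterable[str],
             default_router: Optional[str]) -> Optional[ipaddress.IPv6Address]:
    """Return the address the host should run Neighbor Discovery for."""
    dest_ip = ipaddress.IPv6Address(dest)
    if dest_ip.is_link_local:
        return dest_ip  # link-local destinations are always on-link
    for prefix in on_link_prefixes:
        if dest_ip in ipaddress.IPv6Network(prefix):
            return dest_ip  # on-link: resolve the destination itself
    if default_router is None:
        return None  # off-link and no router known: nowhere to send
    return ipaddress.IPv6Address(default_router)  # off-link: go via router


if __name__ == "__main__":
    tftp = "2001:470:1f05:2c9::10"
    router = "fe80::feec:daff:fe4a:89f0"
    # Expected: prefix learned from the RA (assumed /64), TFTP server off-link.
    print(next_hop(tftp, ["2001:470:8122:1::/64"], router))
    # The "everything is on-link" hypothesis, modelled as a ::/0 prefix.
    print(next_hop(tftp, ["::/0"], router))
```

The first call resolves the router, which is what a well-behaved host would put in its Neighbor Solicitation. The second call resolves the TFTP server directly, which is what the capture shows the Supermicro doing.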
It tries this for many seconds and eventually gives up. It’s nice enough to release the DHCPv6 lease before it returns to the boot menu.
How to fix?
I don’t know if there is a fix for this, at least one available to me. I’ve already tried upgrading the Supermicro BIOS which jumped it way ahead from a 2014 vintage to 2019. I’m sure Supermicro’s solution is “buy something newer”.
In the meantime I’m going to go back to booting GRUB2 over IPv4 and be mad about it.
A peek inside PXE – TianoCore EDK II
Googling for anything related to PXE booting is futile. Pages and pages of people way off the mark and no real definitive information. The UEFI 2.1 and 2.7 specifications are useful and go into a lot of detail as to what should happen, but it’s up to others to actually write the code. Somehow I did stumble upon TianoCore EDK II, which from what I gather was Intel’s original EFI reference code that was open sourced and has now grown into its own reference UEFI codebase. TianoCore is the community, EDK II is the reference implementation.
I have no idea if Supermicro’s UEFI code is based off of EDK, though it seems fairly sorta similar from what little I can see. Maybe not, because EDK supports UEFI HTTP boot and my Supermicro doesn’t. I’d say the Intel NUC is more likely to be using code from the same pedigree.
EDK II code is a fascinating read, especially the NetworkPkg/UefiPxeBcDxe code that shows an actual PXE implementation, “Start PXE over IPv6” and all. It answers a few questions, such as what’s the real format for bootfile-url options in DHCP (tftp://ip.address./path/path/file), or why the leading / slash gets chopped off paths, or the variety of code paths that get you to different PXE-Exx error codes.
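As a rough illustration of the bootfile-url handling described above, here is a small Python sketch. It is not EDK II’s code (that is C); it only mimics the visible behavior: split the URL, drop the leading slash, and print the address the way the console does. Both function names are made up.

```python
import ipaddress
from typing import Tuple
from urllib.parse import urlsplit


def parse_bootfile_url(url: str) -> Tuple[ipaddress.IPv6Address, str]:
    """Split a DHCPv6 bootfile-url into (TFTP server address, NBP filename)."""
    parts = urlsplit(url)
    if parts.scheme != "tftp":
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    if parts.hostname is None:
        raise ValueError("no server address in URL")
    # .hostname strips the [] brackets around the IPv6 literal.
    server = ipaddress.IPv6Address(parts.hostname)
    filename = parts.path.lstrip("/")  # leading slash is chopped off
    return server, filename


def console_style(addr: ipaddress.IPv6Address) -> str:
    """Format an address uncompressed and uppercase, like the PXE console."""
    return ":".join(f"{int(group, 16):X}" for group in addr.exploded.split(":"))


if __name__ == "__main__":
    server, nbp = parse_bootfile_url(
        "tftp://[2001:470:1f05:2c9::10]/efi/bootx64.efi")
    print("Server IP address is", console_style(server))
    print("NBP filename is", nbp)
```

Run against the URL from my Kea config, this reproduces the “Server IP address” and “NBP filename” lines from the console output earlier.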
Another cute thing I learned from the EDK code, and I’ve seen it on the NUC, is that every dot it prints after “Start PXE over IPv6” means the stack has sent a packet on the network.