If you have Mellanox ConnectX-3 or ConnectX-4 NICs in your servers, I discovered it’s possible to do IPv6 OS installations via PXE. FlexBoot is Mellanox’s on-board PXE implementation; it ships on their NICs and is based on iPXE. It turns out that FlexBoot version 3.4.718 from January 2016 added beta IPv6 support. If you have a motherboard that doesn’t support UEFI IPv6 PXE, you can configure your system to boot FlexBoot from the expansion ROM instead. This lets you do netboots and OS installations over v6 natively and eliminates the need for chain loading or using PXELINUX.

The catch is that the option is off by default and you must enter the FlexBoot menu during boot (Ctrl-B) to enable it. The v6 beta support was mentioned in the FlexBoot release notes, but the directions on how to actually enable it were buried in the PreBoot User Manual. Sigh.

FlexBoot 3.4.718 release notes

 

FlexBoot (http://mellanox.com) 04:00.0 3D00 PCI3.00 PnP PMM+74250020+74269020 C800
Press Ctrl-B to configure FlexBoot v3.4.718 (PCI 04:00.0)...

FlexBoot system setup screen

 

FlexBoot net0 settings

 

FlexBoot net0 NIC configuration, IPv4/IPv6 support

The NIC setting supports IPv4, IPv4+IPv6, and IPv6-only configurations. Unfortunately there doesn’t seem to be a way to configure this from an OS to automate turning this option on. The FlexBoot source code is available from Mellanox so maybe you can compile it with v6 support turned on and burn it to NICs. Hopefully after it comes out of beta this option will be on by default.

Note: When both v4 and v6 are configured, FlexBoot sends out both a DHCPv4 request and a v6 router solicitation at the same time on boot. If you’re v6-only or your v4 DHCP server doesn’t respond, it takes 10-20 seconds for v4 to time out before it proceeds with the v6 address it gets.

That’s the tl;dr version of enabling v6 on the NIC.

Prerequisites

Anyways, to actually kickstart-install CentOS/Fedora/RHEL with this, a few things need to happen. There are a lot of options and ways to do this, depending on how much of an existing kickstart server setup you have and how much IPv6-ready infrastructure you already have running.

For example, with iPXE/FlexBoot you can fetch kernels and ramdisks over HTTP, TFTP, or NFS, and can also download scripts to do fancy things before fetching or booting anything.

At a minimum, you’ll need to configure these things:

  • DHCPv6 server: serve up the bootfile-url to FlexBoot/iPXE. This is equivalent to the next-server and filename options in DHCPv4.
  • HTTP or TFTP server to serve up the iPXE configuration script.
  • HTTP, TFTP, or NFS to serve up the kernel, ramdisk (initrd), and your kickstart configuration.
  • Bootloader order to do network booting first, then fall through to disk.

For purposes of this post I’m going to assume you already have a working kickstart setup; I won’t go into details of how to set one up from scratch. I will describe what’s needed to extend an existing kickstart setup to do IPv6 installs with iPXE/FlexBoot over HTTP. Hopefully you can adapt this for your environment.

Most of this is applicable for all sorts of IPv6 PXE installations, not just Mellanox FlexBoot and iPXE based things.

IPv6 networking on the LAN

When you configure your router, layer3 top-of-rack switch, or a box running radvd, you’ll need to configure router advertisements with the “Other” and “Managed” config flags turned on.

“Managed” will tell the target host to request its IPv6 address from a DHCPv6 server instead of using SLAAC. In a pinch you can get away with installing CentOS using SLAAC; you just won’t easily know what address it’s going to use compared to a DHCP reservation.

“Other” is the important bit, as it tells the target host that (non-address related) configuration information is available from the DHCPv6 server. In the case of Mellanox NICs, they’re going to request the DNS servers, DNS search list, option 59 (boot file URL), and option 60 (boot file parameters, not used here).

Some hosts may or may not support getting DNS servers and search list from a router advertisement (RDNSS and DNSSL). Best bet is to use DHCPv6 for these. You’ll need DHCP anyways to serve up the boot file URL so FlexBoot knows where to fetch its configuration.
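On the SLAAC point: if the host builds its interface ID from the MAC (modified EUI-64), you can work out the address ahead of time. Here is a minimal Python sketch under that assumption; hosts using privacy or stable-privacy addresses won’t match it. The function name is mine, and the prefix and MAC are the ones used elsewhere in this post.

```python
import ipaddress


def eui64_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Combine a /64 prefix with a modified EUI-64 interface ID built from a MAC."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64:
        raise ValueError("modified EUI-64 needs a /64 prefix")

    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC like 00:02:c9:45:26:20")

    # Flip the universal/local bit, then wedge ff:fe into the middle.
    octets[0] ^= 0x02
    interface_id = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])

    return ipaddress.IPv6Address(
        int(net.network_address) | int.from_bytes(interface_id, "big")
    )


if __name__ == "__main__":
    print(eui64_address("2001:470:d00d:ddff::/64", "00:02:c9:45:26:20"))
    print(eui64_address("fe80::/64", "00:02:c9:45:26:20"))
```

The same function gives the link-local address when you pass fe80::/64 as the prefix.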

DHCPv6 configuration

I still use the ISC DHCP server (someday I’ll switch to the Kea DHCP server, and you should too). At a minimum you’ll need to set these options in your /etc/dhcp/dhcpd6.conf configuration, something like this:

option dhcp6.name-servers 2401:beef:11:a53, 2401:beef:11:b53;
option dhcp6.domain-search "wann.net";
option dhcp6.user-class code 15 = string;
option dhcp6.bootfile-url code 59 = string;
option dhcp6.client-arch-type code 61 = array of unsigned integer 16;

if option dhcp6.client-arch-type = 00:07 {
  # Fetch efi shim over tftp if uefi booting
  option dhcp6.bootfile-url "bootx64.efi";
} else if exists dhcp6.user-class and
          substring(option dhcp6.user-class, 2, 4) = "iPXE" {
  option dhcp6.bootfile-url "http://[2401::beef:20::20]/ipxe/ipxe-${net0/mac}.cfg";
}

preferred-lifetime 604800;
option dhcp-renewal-time 3600;
option dhcp-rebinding-time 7200;
allow leasequery;

subnet6 2001:470:d00d:ddff::/64 {
  range6 2001:470:d00d:ddff::f00 2001:470:d00d:ddff::fff;

  # host reservations here
}

In this example, if we notice the user-class in the DHCP solicit message is from iPXE, we’ll serve up a bootfile-url with an HTTP URL to an iPXE script. This is the same thing as returning next-server and filename in a DHCPv4 response. The other options configured here mean we’ll return DNS servers, the domain search list, and a v6 address from a pool. If the host did a UEFI boot we’ll serve up the standard bootx64.efi shim over TFTP.

The bootfile-url can be anything iPXE supports, such as an HTTP URL or just a filename for TFTP downloading.
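Why the substring in the config starts at offset 2: in DHCPv6 the user-class option body is a list of entries, each prefixed with a 2-byte length (RFC 3315, section 22.15), so the text of the first class starts two bytes in. Here is a rough Python sketch of that layout. The function name is hypothetical and the sample bytes are what I’d expect from an iPXE client given the match above, not a packet capture.

```python
import struct


def parse_user_classes(data: bytes) -> list:
    """Split a DHCPv6 user-class option body into its length-prefixed entries."""
    classes = []
    offset = 0
    while offset < len(data):
        if offset + 2 > len(data):
            raise ValueError("truncated user-class length field")
        (length,) = struct.unpack_from("!H", data, offset)
        offset += 2
        if offset + length > len(data):
            raise ValueError("truncated user-class data")
        classes.append(data[offset:offset + length])
        offset += length
    return classes


if __name__ == "__main__":
    sample = b"\x00\x04iPXE"  # assumed option body: length 4, then "iPXE"
    print(parse_user_classes(sample))
    # Same bytes the dhcpd expression substring(..., 2, 4) looks at:
    print(sample[2:2 + 4])
```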

(I’m cheating here and using a pool instead of static reservations or SLAAC.)

A rant on DHCP client identifiers for static reservations: the options available in DHCPv6 are a pain in the ass. In v4 land things were simple: you could map the NIC MAC address to an IP address in the DHCP server. Newer versions of iPXE use “DUID-UUID” (from RFC 6355), which shows up as “client-ID type 4”.

DUID-UUID is a terrible choice for servers.

There’s no way to predict what the DUID-UUID will be beforehand, which prevents you from pre-populating your inventory databases. Nothing links it to the physical MAC address, and it makes static DHCP reservations impossible.

The UUID is generated by magic and does not contain any sort of MAC information you could possibly parse out. I don’t have a good answer on how to get the DUID-UUID ahead of time to configure in your DHCPv6 server configuration and this makes me angry.

Example DHCP6 solicit request from a Mellanox NIC with type 4 client identifier:

IP6 (hlim 255, next-header UDP (17) payload length: 78) fe80::202:c9ff:fe45:2620.546 > ff02::1:2.547: [udp sum ok]
  dhcp6 solicit (xid=fd3d3 (client-ID type 4)
                (IA_NA IAID:4284751403 T1:0 T2:0)
                (option-request DNS-server DNS-search-list opt_59 opt_60)
                (user-class)
                (elapsed-time 0))

DUID-LL or even DUID-LLT (both of which carry the NIC’s link-layer address, i.e. its MAC) are much better for doing static reservations.

If you have the MAC of your system in your inventory system, you can configure the system->IP address mapping easily. Even though RFC 3315 says you “must not” interpret a DUID, you can at least parse out the MAC address and have a reasonable MAC->host mapping. DHCP servers are starting to support this even though it’s contrary to the RFCs.

What if you replace the NIC in your server? Well, your inventory database should represent this fact and hold its new MAC address accordingly.
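To make the DUID difference concrete, here is a small Python sketch that pulls the MAC out of a DUID-LLT or DUID-LL and gives up on anything else, including the type 4 DUID-UUID. The layouts are from RFC 3315 section 9; the function name and the sample DUIDs (including the timestamp bytes) are made up for illustration.

```python
import struct

DUID_LLT = 1   # link-layer address plus time
DUID_LL = 3    # link-layer address only
HW_ETHERNET = 1


def mac_from_duid(duid: bytes):
    """Return the MAC embedded in a DUID-LLT/DUID-LL, or None if there isn't one."""
    if len(duid) < 4:
        return None
    duid_type, hw_type = struct.unpack_from("!HH", duid, 0)

    if duid_type == DUID_LLT:
        offset = 8   # type (2) + hardware type (2) + time (4)
    elif duid_type == DUID_LL:
        offset = 4   # type (2) + hardware type (2)
    else:
        return None  # e.g. type 4 (DUID-UUID): no MAC inside

    address = duid[offset:]
    if hw_type != HW_ETHERNET or len(address) != 6:
        return None
    return ":".join(f"{octet:02x}" for octet in address)


if __name__ == "__main__":
    duid_ll = bytes.fromhex("00030001" "0002c9452620")
    duid_llt = bytes.fromhex("00010001" "1c39cf88" "0002c9452620")
    duid_uuid = bytes.fromhex("0004" + "11" * 16)
    for duid in (duid_ll, duid_llt, duid_uuid):
        print(mac_from_duid(duid))
```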

iPXE configuration

The NIC will download an iPXE configuration file, and you can script all sorts of things inside it. This is pretty powerful: you can do clever things such as booting over NFS, iSCSI, AoE, etc., but I’m going to do dead simple CentOS kickstart booting over HTTP.

One thing in particular I do is return a static URL to the host: http://[2401::beef:20::20]/ipxe/ipxe-${net0/mac}.cfg. This way I never have to change my DHCPv6 configuration or have any host-dependent config, it will work for any new host that comes along.

The work happens on the webserver, where it will serve up a file from disk with the host MAC address in the name, e.g. http://[2401::beef:20::20]/ipxe/ipxe-00:02:c9:45:26:20.cfg. If that file exists, iPXE will start interpreting it. If it doesn’t exist, iPXE will exit.

(At large scale you’ll very likely want some dynamic script that generates these responses on the fly rather than creating them on disk. You can change up the URL and parameters however you want.)
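As a sketch of the dynamic approach, this Python function maps a request path to an iPXE script using an in-memory table keyed by MAC. Everything here is an assumption for illustration: the function name, the table layout, and the boot.example URLs. Note the client percent-encodes the colons in the MAC, so the path is unquoted first.

```python
import re
from urllib.parse import unquote

# Matches /ipxe/ipxe-<mac>.cfg with a lowercase, colon-separated MAC.
MAC_PATH = re.compile(r"^/ipxe/ipxe-((?:[0-9a-f]{2}:){5}[0-9a-f]{2})\.cfg$")


def ipxe_response(path: str, install_queue: dict):
    """Return (status, body) for a request path; 404 means boot from disk."""
    match = MAC_PATH.match(unquote(path).lower())
    if not match:
        return 404, ""
    host = install_queue.get(match.group(1))
    if host is None:
        return 404, ""
    lines = [
        "#!ipxe",
        "kernel " + host["kernel"] + " " + host["args"],
        "initrd " + host["initrd"],
        "boot",
    ]
    return 200, "\n".join(lines) + "\n"


if __name__ == "__main__":
    queue = {
        "00:02:c9:45:26:20": {
            "kernel": "http://boot.example/vmlinuz",      # hypothetical URL
            "initrd": "http://boot.example/initrd.img",   # hypothetical URL
            "args": "console=tty0",
        }
    }
    status, body = ipxe_response("/ipxe/ipxe-00%3A02%3Ac9%3A45%3A26%3A20.cfg", queue)
    print(status)
    print(body, end="")
```

You would wire this into whatever web framework you already run; the point is only that the MAC in the URL is the lookup key.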

The contents of the ipxe-*.cfg script look like this:

#!ipxe
echo ****
echo **** iPXE configuration
echo ****
kernel http://[2401::beef:20::20]/dist/images/centos/7/x86_64/vmlinuz noipv4 ip=dhcp6 console=tty0 console=ttyS1,115200n8 BOOTIF=${net0/mac} biosdevname=0 net.ifnames=0 inst.text inst.selinux=0 inst.sshd inst.ks=http://.../ks.cfg
initrd http://[2401::beef:20::20]/dist/images/centos/7/x86_64/initrd.img 
boot

It’s all standard kernel command line options like you’d have in a pxelinux.cfg config file. Configure URLs to suit your setup and paths to your kernel (vmlinuz) and ramdisk (initrd.img).

I will say: make absolutely sure the first line is hash-bang “ipxe” (#!ipxe) and not hash-bang “pxe” (the “i” is easy to overlook!), and that there’s no empty line at the top. Either mistake completely breaks iPXE in a non-obvious way.
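If you generate these files, a quick check before publishing them can catch both mistakes. This is a deliberately strict Python sketch (it insists the first line is exactly #!ipxe); the function name and messages are my own.

```python
def check_ipxe_script(text: str) -> list:
    """Return a list of problems with the first line of an iPXE script."""
    problems = []
    lines = text.splitlines()
    if not lines:
        return ["script is empty"]

    first = lines[0]
    if first.strip() == "":
        problems.append("first line is blank; #!ipxe must be on line 1")
    elif first == "#!pxe":
        problems.append("first line is #!pxe; the 'i' is missing")
    elif first != "#!ipxe":
        problems.append("first line is not #!ipxe")
    return problems


if __name__ == "__main__":
    print(check_ipxe_script("#!ipxe\nboot\n"))
    print(check_ipxe_script("\n#!ipxe\nboot\n"))
    print(check_ipxe_script("#!pxe\nboot\n"))
```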

Toggle booting OS from disk or doing kickstart install

In the above example, iPXE will always try to fetch the configuration file on boot. If it exists then the host will do a kickstart installation. If it doesn’t exist, iPXE will exit and fall through to the OS on disk (if the boot order is set up correctly).
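With files on disk, toggling an install comes down to creating or deleting the per-MAC file. Here is a minimal Python sketch of that idea, assuming a Linux web root where colons in filenames are fine. The function names and directory layout are hypothetical, and the script body is passed in rather than hardcoded.

```python
from pathlib import Path


def config_path(root: Path, mac: str) -> Path:
    """File the webserver serves for this MAC, e.g. ipxe-00:02:c9:45:26:20.cfg."""
    return root / f"ipxe-{mac.lower()}.cfg"


def arm_install(root: Path, mac: str, script: str) -> Path:
    """Publish an iPXE script so the next netboot of this host kickstarts."""
    if not script.startswith("#!ipxe"):
        raise ValueError("script must start with #!ipxe")
    path = config_path(root, mac)
    tmp = path.with_suffix(".tmp")
    tmp.write_text(script)
    tmp.replace(path)  # rename into place so a half-written file is never served
    return path


def disarm_install(root: Path, mac: str) -> None:
    """Remove the script so the host falls through to its local disk."""
    config_path(root, mac).unlink(missing_ok=True)  # needs Python 3.8+
```

A sensible habit is to call disarm_install from the kickstart’s post-install step (or whatever reports install completion), so a host doesn’t reinstall itself on the next reboot.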

(Some people like having fancy boot menus where they select an option to install an OS or boot from local disk. I’m not one of those people and consider it a failure if I ever have to touch console on a server, even for installs.)

Another option for enabling/disabling kickstart would be suppressing the DHCP response so that only a host intended to be net-installed would get a DHCP answer; otherwise it falls through to local disk. This is DHCP-server specific, so you’re on your own, but it is a great idea.

The end result: actually kickstarting CentOS

When it’s all said and done, it looks like this on console when you do a netboot/install with FlexBoot completely over IPv6 with HTTP:

FlexBoot v3.4.718
FlexBoot http://mellanox.com
Features: DNS HTTP iSCSI TFTP VLAN ELF MBOOT PXE bzImage COMBOOT PXEXT
net0: 00:02:c9:45:26:20
Using ConnectX-3 on PCI04:00.0 (open)
  [Link:down, TX:0 TXE:0 RX:0 RXE:0]
  [Link status: Unknown (http://ipxe.org/1a086101)]
Waiting for link-up on net0..... ok
Configuring (net0 00:02:c9:45:26:20)... ok
net0: fe80::202:c9ff:fe45:2620/64
net0: 2001:470:d00d:ddff:202:c9ff:fe45:2620/64 gw fe80::202:c9ff:fe45:2641
net1: fe80::202:c9ff:fe45:2621/64 (inaccessible)
Filename: http://[2401::beef:20::20]/ipxe/ipxe-00:02:c9:45:26:20.cfg
http://[2401::beef:20::20]/ipxe/ipxe-00%3A02%3Ac9%3A45%3A26%3A20.cfg... ok
ipxe-00:02:c9:45:26:20.cfg : 487 bytes [script]
****
**** iPXE configuration
****
http://[2401::beef:20::20]/dist/images/centos/7/x86_64/vmlinuz... ok
http://[2401::beef:20::20]/dist/images/centos/7/x86_64/initrd.img... 99%

[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Initializing cgroup subsys cpuacct
[    0.000000] Linux version 3.10.0-327.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.3 20140911 (Red Hat 4.8.3-9) (GCC) ) #1 SMP Thu Nov 19 22:10:57 UTC 2015
[    0.000000] Command line: console=tty0 console=ttyS1,115200n8 biosdevname=0 net.ifnames=0 inst.text inst.selinux=0 inst.sshd inst.ks=http://2401::beef:20::20/ks.cfg
...
