
Mellanox FlexBoot IPv6

If you have Mellanox ConnectX-3 or ConnectX-4 NICs in your servers, I discovered it’s possible to do IPv6 OS installations via PXE. FlexBoot, the on-board PXE implementation that ships on their NICs, is based on iPXE. It turns out that as of FlexBoot version 3.4.718 from January 2016, they’ve added beta IPv6 support. If you have a motherboard that doesn’t support UEFI IPv6 PXE, you can configure your system to boot FlexBoot from the expansion ROM instead. This lets you do netboots and OS installations natively over v6 and eliminates the need for chain loading or using PXELINUX.

The catch is that the option is off by default and you must enter the FlexBoot menu during boot (Ctrl-B) to enable it. The v6 beta support addition was mentioned in the FlexBoot release notes, but directions on how to actually enable it were buried in the PreBoot User Manual. sigh.


FlexBoot 3.4.718 release notes

 

FlexBoot (http://mellanox.com) 04:00.0 3D00 PCI3.00 PnP PMM+74250020+74269020 C800
Press Ctrl-B to configure FlexBoot v3.4.718 (PCI 04:00.0)...


FlexBoot system setup screen

 


FlexBoot net0 settings

 


FlexBoot net0 NIC configuration, IPv4/IPv6 support

The NIC setting supports IPv4, IPv4+IPv6, and IPv6-only configurations. Unfortunately there doesn’t seem to be a way to configure this from the OS, so there’s no automated way to turn the option on. The FlexBoot source code is available from Mellanox, so maybe you could compile it with v6 support turned on and burn it to your NICs. Hopefully after it comes out of beta this option will be on by default.

Note: When both v4 and v6 are configured, it sends out both a DHCPv4 request and a v6 router solicitation at the same time on boot. If you’re v6 only or your v4 DHCP server doesn’t respond, it takes 10-20 seconds to time out on v4 and proceed with the v6 address it got.

That’s the tl;dr version of enabling v6 on the NIC.

Prerequisites

Anyways, to actually kickstart-install CentOS/Fedora/RHEL with this, a few things need to happen. There are a lot of options and ways to do it, depending on how much of an existing kickstart server setup and IPv6-ready infrastructure you already have running.

For example, with iPXE/FlexBoot you can fetch kernels and ramdisks over HTTP, TFTP, or NFS, and can also download scripts to do fancy things before booting or downloading things.

At a minimum, you’ll need to configure at least these things:

  • DHCPv6 server: serve up the bootfile-url to FlexBoot/iPXE. This is equivalent to the next-server and filename options in DHCPv4.
  • HTTP or TFTP server to serve up the iPXE configuration script.
  • HTTP, TFTP, or NFS to serve up the kernel, ramdisk (initrd), and your kickstart configuration.
  • Bootloader order to do network booting first, then fall through to disk.

For purposes of this post I’m going to assume you already have a working kickstart setup; I won’t go into the details of how to set one up from scratch. I will describe what’s needed to extend an existing kickstart setup to do IPv6 installs with iPXE/FlexBoot over HTTP. Hopefully you can adapt this for your environment.

Most of this is applicable for all sorts of IPv6 PXE installations, not just Mellanox FlexBoot and iPXE based things.

IPv6 networking on the LAN

When you configure your router, layer3 top-of-rack switch, or a box running radvd, you’ll need to configure router advertisements with the “Other” and “Managed” config flags turned on.

“Managed” will tell the target host to request its IPv6 address from a DHCPv6 server instead of using SLAAC. In a pinch you can get away with installing CentOS using SLAAC; you just won’t easily know what address it’s going to use compared to a DHCP reservation.

“Other” is the important bit, as it tells the target host that (non-address related) configuration information is available from the DHCPv6 server. In the case of Mellanox NICs, they’re going to request the DNS servers, DNS search list, option 59 (boot file URL), and option 60 (boot file parameters, not used here).

Some hosts may or may not support getting DNS servers and a search list from router advertisements (RDNSS and DNSSL), so your best bet is to use DHCPv6 for these. You’ll need DHCP anyway to serve up the boot file URL so FlexBoot knows where to fetch its configuration.
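As an illustration, a minimal radvd.conf with both flags turned on might look like the sketch below. The interface name is an assumption; the prefix reuses the example subnet from the dhcpd6 config later in this post. Adjust for your LAN:

```
interface eth0
{
    AdvSendAdvert on;
    # Tell hosts to get addresses (M) and other config (O) from DHCPv6
    AdvManagedFlag on;
    AdvOtherConfigFlag on;
    prefix 2001:470:d00d:ddff::/64
    {
        AdvOnLink on;
        # With Managed on, hosts should not autoconfigure from this prefix
        AdvAutonomous off;
    };
};
```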

DHCPv6 configuration

I still use the ISC DHCP server (someday I’ll switch to the Kea DHCP server; you should too). At a minimum you’ll need to set these options in your /etc/dhcp/dhcpd6.conf, something like this:

option dhcp6.name-servers 2401:beef:11::a53, 2401:beef:11::b53;
option dhcp6.domain-search "wann.net";
option dhcp6.user-class code 15 = string;
option dhcp6.bootfile-url code 59 = string;
option dhcp6.client-arch-type code 61 = array of unsigned integer 16;

if option dhcp6.client-arch-type = 00:07 {
  # Fetch efi shim over tftp if uefi booting
  option dhcp6.bootfile-url "bootx64.efi";
} else if exists dhcp6.user-class and
          substring(option dhcp6.user-class, 2, 4) = "iPXE" {
  option dhcp6.bootfile-url "http://[2401:beef:20::20]/ipxe/ipxe-${net0/mac}.cfg";
}

preferred-lifetime 604800;
option dhcp-renewal-time 3600;
option dhcp-rebinding-time 7200;
allow leasequery;

subnet6 2001:470:d00d:ddff::/64 {
  range6 2001:470:d00d:ddff::f00 2001:470:d00d:ddff::fff;

  # host reservations here
}

In this example, if we notice the user-class in the DHCP solicit message is from iPXE, we’ll serve up a bootfile-url with an HTTP URL to an iPXE script. This is the same thing as returning next-server and filename in a DHCPv4 response. The other options configured here mean we’ll return DNS servers, the domain search list, and a v6 address from a pool. If the host did a UEFI boot we’ll serve up the standard bootx64.efi shim over TFTP.

The bootfile-url can be anything iPXE supports, such as an HTTP URL or just a filename for TFTP downloading.

(I’m cheating here and using a pool instead of static reservations or SLAAC.)
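An aside on the user-class match above: DHCPv6 option 15 carries a list of length-prefixed opaque strings (RFC 3315), which is why the dhcpd config compares substring(option dhcp6.user-class, 2, 4) — the first two bytes are the length of the first instance. Here’s a quick sketch of the same check in shell; the raw option bytes are an illustrative iPXE solicit, not a real capture:

```shell
#!/bin/bash
# Decode a DHCPv6 user-class option (code 15) the way the dhcpd6.conf
# match does: skip the 2-byte length prefix, then read the next 4 bytes.
raw="00:04:69:50:58:45"            # length 0x0004, then "iPXE"

payload=$(echo "$raw" | cut -d: -f3-6 | tr -d ':')       # drop length bytes
decoded=$(printf "$(echo "$payload" | sed 's/../\\x&/g')")  # hex -> ASCII

if [ "$decoded" = "iPXE" ]; then
  echo "user-class is iPXE, serve the iPXE script URL"
fi
```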

A rant on DHCP client identifiers for static reservations: the options available in DHCPv6 are a pain in the ass. In v4 land things were simple: you could map the NIC MAC address to an IP address in the DHCP server. Newer versions of iPXE use a “DUID-UUID” (from RFC 6355), which shows up as “client-ID type 4”.

DUID-UUID is a terrible choice for servers.

There’s no way to predict what the DUID-UUID will be beforehand, which prevents you from pre-populating your inventory databases; nothing links it to the physical MAC address, and that makes static DHCP reservations impossible.

The UUID is generated by magic and does not contain any sort of MAC information you could possibly parse out. I don’t have a good answer on how to get the DUID-UUID ahead of time to put in your DHCPv6 server configuration, and this makes me angry.

Example DHCP6 solicit request from a Mellanox NIC with type 4 client identifier:

IP6 (hlim 255, next-header UDP (17) payload length: 78) fe80::202:c9ff:fe45:2620.546 > ff02::1:2.547: [udp sum ok]
  dhcp6 solicit (xid=fd3d3 (client-ID type 4)
                (IA_NA IAID:4284751403 T1:0 T2:0)
                (option-request DNS-server DNS-search-list opt_59 opt_60)
                (user-class)
                (elapsed-time 0))

DUID-LL or even DUID-LLT (both of which contain the NIC’s link-layer (MAC) address) is a much better choice for doing static reservations.

If you have the MAC of your system in your inventory system, you can configure the system->IP address mapping easily. Even though RFC 3315 says you “must not”, you can at least parse the MAC address out of the DUID and have a reasonable MAC->host mapping. DHCP servers are starting to support this even though it’s contrary to the RFCs.
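As a sketch of what that parsing looks like for a DUID-LL (type 3, hardware type 1): the MAC is simply everything after the first four bytes. The DUID, hostname, and fixed address below are made-up examples, and the emitted host block uses ISC dhcpd6 syntax:

```shell
#!/bin/bash
# DUID-LL layout: 2-byte type (00:03) + 2-byte hardware type (00:01) + MAC.
# Strip the first four bytes to recover the MAC, then emit an ISC dhcpd6
# host reservation keyed on the full DUID.
duid="00:03:00:01:00:02:c9:45:26:20"

case "$duid" in
  00:03:00:01:*)
    mac="${duid#00:03:00:01:}"
    echo "# MAC recovered from DUID-LL: $mac"
    cat <<EOF
host newbox {
  host-identifier option dhcp6.client-id $duid;
  fixed-address6 2001:470:d00d:ddff::11;
}
EOF
    ;;
  *)
    echo "not a DUID-LL; cannot recover a MAC" >&2
    ;;
esac
```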

What if you replace the NIC in your server? Well, your inventory database should represent this fact and hold its new MAC address accordingly.

iPXE configuration

The NIC will download an iPXE configuration file, and you can do all sorts of scripting inside it. This is pretty powerful; you can do clever things such as booting over NFS, iSCSI, AoE, etc., but I’m going to do dead-simple CentOS kickstart booting over HTTP.

One thing in particular I do is return a static URL to the host: http://[2401:beef:20::20]/ipxe/ipxe-${net0/mac}.cfg. This way I never have to change my DHCPv6 configuration or keep any host-dependent config there; it will work for any new host that comes along.

The work happens on the webserver, which serves up a file from disk with the host MAC address in the name, e.g. http://[2401:beef:20::20]/ipxe/ipxe-00:02:c9:45:26:20.cfg. If that file exists, iPXE will start interpreting the script. If it doesn’t exist, iPXE will exit.

(At large scale you’ll very likely want some dynamic script that generates these responses on the fly rather than creating them on disk. You can change up the URL and parameters however you want.)
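For example, a small generator that stamps out one of these per-MAC files could look like the sketch below. The output directory stands in for your webroot, and the server address and kickstart path are illustrative placeholders, not from a real setup:

```shell
#!/bin/bash
# Sketch: generate a per-host iPXE config named after the NIC MAC.
# outdir stands in for a webroot like /var/www/html/ipxe.
mac="00:02:c9:45:26:20"
outdir="$(mktemp -d)"

cat > "${outdir}/ipxe-${mac}.cfg" <<EOF
#!ipxe
kernel http://[2401:beef:20::20]/dist/images/centos/7/x86_64/vmlinuz ip=dhcp6 inst.ks=http://[2401:beef:20::20]/ks/${mac}.cfg
initrd http://[2401:beef:20::20]/dist/images/centos/7/x86_64/initrd.img
boot
EOF

echo "wrote ${outdir}/ipxe-${mac}.cfg"
```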

The contents of the ipxe-*.cfg script looks like this:

#!ipxe
echo ****
echo **** iPXE configuration
echo ****
kernel http://[2401:beef:20::20]/dist/images/centos/7/x86_64/vmlinuz noipv4 ip=dhcp6 console=tty0 console=ttyS1,115200n8 BOOTIF=${net0/mac} biosdevname=0 net.ifnames=0 inst.text inst.selinux=0 inst.sshd inst.ks=http://.../ks.cfg
initrd http://[2401:beef:20::20]/dist/images/centos/7/x86_64/initrd.img
boot

It’s all standard kernel command line options like you’d have in a pxelinux.cfg config file. Configure the URLs to suit your setup and the paths to your kernel (vmlinuz) and ramdisk (initrd.img).

I will say: make absolutely sure the first line is “#!ipxe” and not “#!pxe” (the “i” is easy to overlook!), and that there’s no empty line at the top. Either mistake completely breaks iPXE in a non-obvious way.
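A tiny sketch of a sanity check for this, catching both the wrong shebang and a blank first line (the function name is mine, not part of iPXE):

```shell
#!/bin/bash
# Reject iPXE scripts whose first line is not exactly "#!ipxe" -- this
# catches both "#!pxe" typos and a stray blank line at the top.
check_ipxe_cfg() {
  local f="$1"
  if [ "$(head -n 1 "$f")" = "#!ipxe" ]; then
    echo "ok: $f"
  else
    echo "BAD first line: $f" >&2
    return 1
  fi
}

# Example: one good file and one with a leading blank line
good="$(mktemp)"; printf '#!ipxe\nboot\n' > "$good"
bad="$(mktemp)";  printf '\n#!ipxe\nboot\n' > "$bad"
check_ipxe_cfg "$good"
check_ipxe_cfg "$bad" || true
```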

Toggle booting OS from disk or doing kickstart install

In the above example, iPXE will always try to fetch the configuration file on boot. If it exists, the host will do a kickstart installation. If it doesn’t exist, iPXE will exit and fall through to the OS on disk (if the boot order is set up correctly).

(Some people like having fancy boot menus where they select an option to install an OS or boot from local disk. I’m not one of those people and consider it a failure if I ever have to touch console on a server, even for installs.)

Another option for enabling/disabling kickstart would be suppressing the DHCP response, so that only a host intended to be net-installed gets a DHCP answer and everything else falls through to local disk. This is DHCP-server specific so you’re on your own, but it is a great idea.

The end result: actually kickstarting CentOS

When it’s all said and done, it looks like this on console when you do a netboot/install with FlexBoot completely over IPv6 with HTTP:

FlexBoot v3.4.718
FlexBoot http://mellanox.com
Features: DNS HTTP iSCSI TFTP VLAN ELF MBOOT PXE bzImage COMBOOT PXEXT
net0: 00:02:c9:45:26:20
Using ConnectX-3 on PCI04:00.0 (open)
  [Link:down, TX:0 TXE:0 RX:0 RXE:0]
  [Link status: Unknown (http://ipxe.org/1a086101)]
Waiting for link-up on net0..... ok
Configuring (net0 00:02:c9:45:26:20)... ok
net0: fe80::202:c9ff:fe45:2620/64
net0: 2001:470:d00d:ddff:202:c9ff:fe45:2620/64 gw fe80::202:c9ff:fe45:2641
net1: fe80::202:c9ff:fe45:2621/64 (inaccessible)
Filename: http://[2401:beef:20::20]/ipxe/ipxe-00:02:c9:45:26:20.cfg
http://[2401:beef:20::20]/ipxe/ipxe-00%3A02%3Ac9%3A45%3A26%3A20.cfg... ok
ipxe-00:02:c9:45:26:20.cfg : 487 bytes [script]
****
**** iPXE configuration
****
http://[2401:beef:20::20]/dist/images/centos/7/x86_64/vmlinuz... ok
http://[2401:beef:20::20]/dist/images/centos/7/x86_64/initrd.img... 99%

[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Initializing cgroup subsys cpuacct
[    0.000000] Linux version 3.10.0-327.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.3 20140911 (Red Hat 4.8.3-9) (GCC) ) #1 SMP Thu Nov 19 22:10:57 UTC 2015
[    0.000000] Command line: console=tty0 console=ttyS1,115200n8 biosdevname=0 net.ifnames=0 inst.text inst.selinux=0 inst.sshd inst.ks=http://2401:beef:20::20/ks.cfg
...


Wooo UEFI IPv6 PXE

I wanted a new Avoton motherboard for my OpenIndiana home NAS with lots of on-board SATA so I could use the PCIe slot for a 10-gigabit NIC. I needed seven SATA ports: six for the data disks and one for the OS drive. The standard mini-ITX configuration seems to max out at six, and I would’ve settled for six plus an on-board M.2 socket, but these don’t seem to exist. I ran across ASRock’s C2550D4I board, which has 12 on-board SATA ports via a couple of extra Marvell SATA controllers. It’s a little overboard but is as close as I could get.

The out-of-the-box experience wasn’t that great. For whatever reason the VGA port wouldn’t work until I jiggled the connector. My USB keyboard didn’t work after boot until I unplugged it and plugged it right back in immediately after POST. Then after the system came up, one of the Marvell chips didn’t come online, which marked four disks offline and made my zpool sad.

On the second boot the zpool was happy, but VGA and USB were still touchy. Upgrading the BMC and UEFI firmware seemed to help with VGA, but the keyboard was still not reliable. Other than that the system has been up and solid. I got serial-over-LAN working for POST and GRUB, but for the life of me I can’t get a serial console to work in OpenIndiana/illumos on ttya, ttyb, or ttyc. poop.

The good thing that surprised me about this board is that it supports UEFI IPv6 PXE. Neither my SuperMicro Avoton board nor my brand-new Xeon-D board does this. This would let a person completely install the system on a v6-only network. So I guess if you care about IPv6, buy ASRock boards for now.

edit: this seems to be because the ASRock board uses the Intel i210 Ethernet controller instead of the interfaces built into the SoC. I suspect the SoCs for both the Avoton C2550 and Xeon D-1520 don’t have UEFI drivers that support this for their built-in interfaces.

Stop doing this!

 

The Anaconda 19 (now v21) installer in RHEL 7 / CentOS 7 is a great improvement over the Anaconda 13 used in CentOS 6; among other fixes it was completely overhauled along the way. One thing lacking in CentOS 6 was the ability to perform an automatic kickstart installation over an IPv6-only network. Some bits within the kickstart configuration could be fetched over v6, but the whole process from kernel boot to completion needed to be dual-stacked. For example, the ipv6 kernel module wasn’t loaded before Anaconda’s loader tried to download install.img, so it failed on a v6-only network; later in stage2, however, we could do things like download from v6 package repos.

Installing completely over v6 is now possible in CentOS 7 although it takes some tweaks to be reliable in the datacenter.

A few assumptions and preconditions for this post:

  • Servers need static IP addresses, for DNS mapping and service discovery. SLAAC is out, because if the NIC is swapped out for a repair, the address changes.
  • During PXE boot a server is given its IP address by a DHCPv6 server. This could be a pool or static reservations; I prefer the latter.
  • The host depends on the LAN’s router (or top-of-rack L3 switch) to provide an IPv6 default gateway via router advertisements.
  • The kernel and initrd are fetched over TFTP or HTTP. I install packages over HTTP, but if you’re an NFS shop and your setup supports v6, it should work.
  • Most of this is applicable to Fedora too, for anything later than v13, and of course RHEL 7.
  • Your package repos, nameservers, kickstart server, and other resources used during kickstart are accessible over v6, either singly or dual-stacked.
  • The only access to the system is serial console and SSH, as that’s what real datacenters have. No graphical UI, no VNC, no KVM, no VGA.

The main problems I encountered with Dracut and Anaconda doing IPv6-only installs were race conditions between bringing up the NIC and proceeding to download things before it had full routing. It can take a second or two for the IPv6 neighbor discovery protocol to do its thing, perform duplicate address detection (DAD), and learn the v6 gateway from a router advertisement. This would frequently cause fetching the kickstart configuration to fail, cause package repos to be marked as unusable, or kill the installation midway.

Fortunately there are four key problems to watch out for, and they’re easily hacked around with some simple shell script. Once they’re addressed you’ll have no problem installing CentOS over an IPv6-only network. It does involve rebuilding both the initrd ramdisk and the stage2 squashfs image until things are fixed upstream.

The even better news is that at least two of the fixes for this have been either accepted or merged upstream so hopefully this post will eventually be obsolete!

Kernel and initrd

The PXE specification itself is IPv4-only. The usual PXELINUX typically used for kickstart isn’t an option here, even newer versions with the lwIP stack, because it’s still v4-only. To download the kernel and initial ramdisk over a v6 network you’re left with things like UEFI IPv6 PXE (which not many boards support yet), iPXE, chain-loading v4 PXE -> v6 iPXE, or better yet, GRUB2 with its recently added IPv6 support. Hopefully in the future I’ll post more about the options here.

Whichever network boot program you use, in CentOS 7 the kernel command line options you’d normally give the NBP to pass on to the stage1 ramdisk have changed syntax a bit, and some options have been deprecated since CentOS 6.

For example, if you want to statically configure an IP address, prefix, and name servers on the kernel command line, they’d look something like this:

noipv4 ip=[2401:beef:11::31:0]:::64:::none nameserver=2401:beef:11::a53 nameserver=2401:beef:11::b53 \
  net.ifnames=0 biosdevname=0 inst.sshd inst.ks=http://[2401:beef:20::20]/ks.cfg

This tells dracut to boot with a static IP address and a /64 prefix, disable any DHCPv4/DHCPv6 requests and SLAAC, use the two nameservers for resolving hostnames, and fetch the kickstart configuration from the given location. It also disables the new persistent Ethernet naming scheme; I personally don’t like having NICs named “enp0s1”, “eth0” is fine and easier to script with. Also, “inst.sshd” enables SSH in Anaconda so you can ssh to the system while it’s installing.

This command line config is only used during installation and is completely independent of the network configuration you specify in the Anaconda kickstart file, which is what gets applied to the target system.

I strongly prefer performing installations with a static IP address (e.g. DHCP with a reservation). It’s very handy when you have processes on a build server that want to log in to interrogate the install, if you need to SSH in to see why things broke, or to correlate things like logfiles.

Stage1 (initrd.img) and downloads

The kernel and initrd get fetched over the network and loaded into memory. Systemd kicks off Dracut, which does a bunch of prerequisite steps to prepare for starting the installer. Because we’re doing a network install, Dracut initializes the NIC and begins downloading the kickstart and the stage2 image containing the Anaconda installer … or so you think.

Gotcha #1: In reality, the NIC gets configured with an IPv6 address, and because the NIC is now considered “online” the Dracut scripts proceed immediately to download the kickstart config via the hook script 11-fetch-kickstart-net.sh. The problem is we very likely haven’t had time to do DAD or discover our v6 default gateway. This causes Dracut to fail and eventually drop to an emergency shell. The telltale sign of this problem is the “Network is unreachable” error from curl:

dracut-initqueue[1219]: curl: (7) Failed to connect to 2401:beef:20::20: Network is unreachable
dracut-initqueue[1219]: Warning: failed to fetch kickstart from http://2401:beef:20::20/ks.cfg

My fix for this is to add a Dracut hook script named 10-network-sleep-fix.sh into the initrd that does nothing but literally sleep 4 seconds. I also print out the default gateway as a debugging aid.

/usr/lib/dracut/hooks/initqueue/online/10-network-sleep-fix.sh:

#!/bin/bash
# 10-network-sleep-fix.sh
#
# Goes in /usr/lib/dracut/hooks/initqueue/online/10-network-sleep-fix.sh
# Sleep four seconds after bringing up NIC to give time to get a v6 gateway
# Print to both stdout (for journal) and stderr (for console)

echo "*****" 1>&2
echo 1>&2
echo "****: hack: sleeping 4 seconds to ensure network is usable before" \
 "fetching kickstart+stage2" 1>&2

sleep 4

echo "**** Our v6 default gateway is: $(ip -6 route show | grep default)" 1>&2
echo "****" 1>&2
echo "****" 1>&2

This pauses the Dracut hooks long enough for us to get a default gateway. From my experience, less than 4 seconds was too short; four seems to be the sweet spot between consistent success and not excessively stringing out the boot process.
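A fixed sleep is crude; a variant of the same hook could poll for the default route with a deadline instead. This is my own sketch, not the hook from the post, and it assumes iproute2 is present in the initrd (the helper name is hypothetical):

```shell
#!/bin/bash
# Sketch: poll for a v6 default route instead of sleeping a fixed 4 seconds.
# Returns 0 as soon as a default route shows up, 1 if the deadline passes.
# Call as e.g.: wait_for_v6_gw 20
wait_for_v6_gw() {
  local tries="${1:-20}"    # ~10 seconds at 0.5s per attempt
  local i
  for ((i = 1; i <= tries; i++)); do
    if ip -6 route show default 2>/dev/null | grep -q '^default'; then
      echo "got v6 default gateway after $i attempt(s)"
      return 0
    fi
    sleep 0.5
  done
  echo "no v6 default gateway after $tries attempts" >&2
  return 1
}
```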

Upstream bug report: https://bugzilla.redhat.com/show_bug.cgi?id=1292623

Anaconda / stage2 (squashfs.img)

Ok! Dracut has downloaded our LiveOS stage2 image (e.g. os/x86_64/LiveOS/squashfs.img), which contains all of the Anaconda installer code, and performed a pivot-root over to it. Here’s an example of a couple of directives in a kickstart configuration to statically define our IPv6 addresses:

timezone America/Los_Angeles --isUtc --ntpservers=2401:beef:20::a123,2401:beef:20::b123
network --hostname=newbox.wann.net --bootproto=static --ipv6=2001:40:8022:1::11 \
  --nameserver=2401:beef:20::a53,2401:beef:20::b53 --activate

If you need to install a dual-stacked server with v4 and v6, you can still specify the --ip, --netmask, and --gateway options in addition to the v6 options.

The new systemd instance in stage2 will start up a nice tmux session (if you’re on console) and start Anaconda. Anaconda will then start NetworkManager, which re-initializes our NICs to take control of them for the install.

Here’s where we’ll hit a few more problems that’ll cause us to fail.

Gotcha #2: Dracut writes out its network configuration to /etc/sysconfig/network-scripts/ifcfg-eth0 with an empty “IPADDR=” line and “BOOTPROTO=static”. This makes NetworkManager think there’s a v4 configuration to import, and it treats our currently “UP” NIC as another connection: rather than use the existing UUID from ifcfg-eth0, NetworkManager creates a second connection with a new UUID. Anaconda then dies because it can’t find the original UUID.

This throws an Anaconda “SettingsNotFoundError” traceback that looks something like this:

Traceback (most recent call first):
  File "/usr/lib64/python2.7/site-packages/pyanaconda/nm.py", line 707, in nm_activate_device_connection
    raise SettingsNotFoundError(con_uuid)
  File "/usr/lib64/python2.7/site-packages/pyanaconda/network.py", line 1209, in apply_kickstart
    nm.nm_activate_device_connection(dev_name, con_uuid)
 ...
SettingsNotFoundError: SettingsNotFoundError('5cce9753-76ff-1f2e-8e09-918a15d4229d',)

Fortunately this has been fixed upstream in NetworkManager 1.0, but this hasn’t been backported to CentOS 7 yet: https://mail.gnome.org/archives/networkmanager-list/2015-October/msg00015.html

Until that’s done, the fix here is to create a really simple systemd service that executes before NetworkManager starts and seds out the BOOTPROTO line. Drop these two files into the stage2 image:

fix-ipv6.service:
# Goes into /usr/lib/systemd/system/fix-ipv6.service;
# /etc/systemd/system/basic.target.wants/fix-ipv6.service is a symlink
# to this unit.
#

[Unit]
Description=IPv6-only ifcfg-eth0 hack
Before=NetworkManager.service

[Service]
Type=oneshot
ExecStart=/etc/sysconfig/network-scripts/fix-ipv6-only.sh

fix-ipv6-only.sh:
#!/bin/bash
#
# Goes into /etc/sysconfig/network-scripts/fix-ipv6-only.sh
#
# If we have no IPADDR= set (v6-only), remove BOOTPROTO so
# NetworkManager will parse the config file correctly
#
if grep -q '^IPADDR=$' /etc/sysconfig/network-scripts/ifcfg-eth0; then
  echo "XXX hack: no IPv4 addr set (IPADDR=) in ifcfg-eth0, fixing for v6-only"
  sed -i '/BOOTPROTO=.*/d' /etc/sysconfig/network-scripts/ifcfg-eth0
fi

Now Anaconda and NetworkManager can properly find the NICs and begin the installation. The final two gotchas to hack around show up between %pre scripts and package downloads.

Gotcha #3: NetworkManager will reinitialize our NIC and momentarily lose our v6 default gateway. This creates a race condition: while NM is finishing bringing the NIC up to a fully CONNECTED_GLOBAL state, Anaconda immediately starts trying to download the package repo metadata (“.treeinfo”). If the package repos are not on the same LAN as the host you’re installing, you will likely fail here. Because .treeinfo fails to download, Anaconda marks the repository as unusable. This results in a “software selection failure” on console:

3) [!] Software selection (Installation source not set up)
4) [!] Installation source (Error setting up software source)

There’s not a perfect fix here, as it’s tricky to know what connected state we need to reach before proceeding. A good compromise has been to add retry logic to the .treeinfo portion of Anaconda. There’s already retry code in Anaconda for downloading individual packages; I replicated it within packaging/__init__.py to retry .treeinfo until we have working routing.
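The shape of that retry logic, sketched in shell rather than Anaconda’s Python (the function name, attempt count, and linear backoff are my own choices for illustration):

```shell
#!/bin/bash
# Sketch: retry fetching repo metadata a few times with a growing delay
# before giving up, mirroring the per-package retry idea.
retry_fetch() {
  local url="$1" tries="${2:-5}"
  local i
  for ((i = 1; i <= tries; i++)); do
    if curl -fsS -o /dev/null "$url"; then
      echo "fetched $url on attempt $i"
      return 0
    fi
    sleep "$i"    # simple linear backoff between attempts
  done
  echo "giving up on $url after $tries attempts" >&2
  return 1
}
```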

This is enough to get kickstart to execute %pre scripts and maybe install a few packages, but we’re not out of the woods yet.

Upstream bug report: https://bugzilla.redhat.com/show_bug.cgi?id=1292613. My patch was accepted but hasn’t been merged in yet.

Gotcha #4: NetworkManager doesn’t support combining static IPv6 addressing with dynamic route selection. In particular, if the Anaconda installer environment is running with a static v6 address and no v6 gateway is specified on the kernel command line, NetworkManager sets the sysctls “net.ipv6.conf.eth0.accept_ra” and “accept_ra_defrtr” to 0. This slams the door shut on learning a gateway via router advertisements. If a default gateway was learned prior to these sysctls being disabled, things like package downloads may work for a short period of time until the route expires or gets flushed.

The workaround for this is a flat-out hack. At the top of my common %pre script I have a background loop that does nothing but set these values back to 1 over and over again. This makes up for the shortcoming in NetworkManager by immediately re-enabling the accept_ra and accept_ra_defrtr sysctls so a gateway can be learned via router advertisement. It looks something like this:

%pre
...
(
  for i in {1..300}; do
    date
    sysctl -w net.ipv6.conf.eth0.accept_ra=1
    sysctl -w net.ipv6.conf.eth0.accept_ra_defrtr=1
    sleep 1
  done

) > /tmp/networkmanager-hack.log 2>&1 &
...

This will run in the background for five minutes, allowing for any lengthy %pre operations (e.g. RAID setup) to happen in the interim. It redirects all of its output to a log file so it doesn’t pollute preinstall.log.

Upstream bug report: https://bugzilla.gnome.org/show_bug.cgi?id=747814

Fin

And that’s it. With these four fixes you can completely install CentOS 7 over an IPv6-only network. I’ve submitted bug reports upstream and have been working to get these issues resolved so people in the future can install over v6 out of the box.

Addendum: TL;DR for rebuilding initrd and squashfs.img

initrd.img

This image is usually a gzip- or xz-compressed cpio archive. Uncompress it and extract it to a directory with cpio; rebuilding is a matter of re-creating the archive with cpio and re-compressing it with gzip or xz.

# cp $somewhere/initrd.img /tmp ; cd /tmp
# mkdir init.fs ; cd init.fs
## Extracts contents of initrd.img to init.fs directory
# xz -dc ../initrd.img | cpio -vid
 OR
# gzip -dc ../initrd.img | cpio -vid
## hack hack hack
# find . | cpio -o -H newc | gzip -9 > ../initrd-new.img

Pro tip: you don’t have to keep the initrd.img name; you can call it whatever you want. If you make changes that differ from the image distributed by upstream, this is a good idea. Just remember to update the filename in your PXE or GRUB configuration used for kickstart.

squashfs.img

The LiveOS squashfs image is a squash filesystem with an ext4 sparse image inside it. This means you can’t just mount the squashfs.img, make modifications and unmount it. You must mount it, make a copy of the rootfs.img within, make changes to the copied rootfs.img and create a new squashfs image.

# cp $somewhere/os/x86_64/LiveOS/squashfs.img /tmp ; cd /tmp
# mkdir rootfs-img squashfs-img LiveOS
# mount -o loop squashfs.img squashfs-img
# cp squashfs-img/LiveOS/rootfs.img .
# mount -o loop rootfs.img rootfs-img
# cd rootfs-img
## hack hack hack
# cd /tmp
# umount rootfs-img
# cp rootfs.img LiveOS/rootfs.img
# mksquashfs LiveOS squashfs-new.img -comp xz -keep-as-directory

Pro tip: again, you can keep your modified squashfs.img in a separate location from the one that came with the distribution. The twist here is that your squashfs.img must be in a subdirectory named LiveOS. This directory can live wherever you want, e.g. http://buildserver/centos/7/LOLCATS/LiveOS/squashfs.img. On the kernel command line for PXE or GRUB, you’ll need to specify the inst.stage2 directive pointing at the directory that contains LiveOS, e.g. inst.stage2=http://buildserver/centos/7/LOLCATS/.
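A quick sketch of a layout check for this, so a bad path fails at build time instead of at boot (the function name and throwaway paths are mine):

```shell
#!/bin/bash
# Verify a tree meant for inst.stage2 actually has LiveOS/squashfs.img
# under it; inst.stage2 points at the directory *containing* LiveOS.
check_stage2_tree() {
  local dir="$1"
  if [ -f "${dir}/LiveOS/squashfs.img" ]; then
    echo "ok: ${dir}"
  else
    echo "missing ${dir}/LiveOS/squashfs.img" >&2
    return 1
  fi
}

# Example with a throwaway tree standing in for something like /7.x/7.2r5
tree="$(mktemp -d)"
mkdir -p "${tree}/LiveOS"
touch "${tree}/LiveOS/squashfs.img"
check_stage2_tree "${tree}"
```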

In practice I keep each new squashfs image in a directory named with a release number, such as “/7.x/7.2r5/LiveOS”, so I can make incremental changes to Anaconda and keep them organized.

ikeacluster updates

I’ve made a few updates and overhauls to ikeacluster over the last year. The cluster is now its own layer 3 rack running BGP to my home network, with neater cabling, more bandwidth, and some new Avoton motherboards.

https://binaryfury.wann.net/ikeacluster/#updates

Equipment teardowns

A quick list of equipment teardowns and photographs I’ve done for people who are interested in the internals:

UniFi SSL certificates

For the longest time I wondered why Chrome would never save the password for my Ubiquiti UniFi controllers’ web interfaces. It turns out that because the UniFi controller software ships out of the box with an untrusted self-signed SSL/TLS cert, Chrome does the smart thing and won’t prompt you to save the password for the page.

The way to fix this (and otherwise let your browser trust the web UI) is to install your own certificate into the UniFi controller key store, either one bought from a commercial CA or one issued by your own organization’s CA that your browsers trust. Unfortunately there’s a lot of confusion on how to do this; even on Ubiquiti’s help pages there are articles titled for UniFi that are really geared toward EdgeMAX products.

However you wind up with a trusted cert to import, here’s how to do it for UniFi controllers running on Mac OS X or Linux. Both are really the same procedure, since UniFi uses Java’s keystore under the hood on both platforms; only the file paths differ. Ideally you’ll want to script this or set up something like a Chef recipe to manage the files for you, because you’ll need to repeat this whenever the cert expires or the key gets compromised.

Assumptions:

  • You have a way to get a trusted certificate to upload. How to self sign or setting up a CA is not covered here.
  • The UniFi (Java) keystore expects to import certificates in DER (binary) format. If you have certificates in PEM (which is really Base64/ASCII-armored DER), you’ll need to use openssl or similar to convert the PEM files to DER.
  • You’ll need to use the UniFi tool to generate certificate signing requests (CSRs). I didn’t put any time into looking at importing completely new private keys into the keystore; I just signed the CSR that was generated.
  • I run my own wann.net certificate authority (CA) for issuing certs for all of my devices. My browsers on all of my laptops and phones already trust this.
  • You know how to work with the CLI.

Mac OS X

The Java (JAR) contents of the UniFi controller are installed to /Applications/UniFi.app/Contents/Resources/. lib/ace.jar is a UniFi-provided Java tool to manipulate the controller and key store.

Certificate requests and the keystore are stored in the data/ subdirectory, which is a symlink to a user’s Library/Application Support directory, e.g. /Users/bwann/Library/Application Support/UniFi/data. (This is what preserves controller data between installs.) You must be in the main Resources/ directory before you can work with the UniFi keystore, else the tool gets unhappy with paths and can’t find things.

Generating a certificate request from the UniFi controller

The UniFi controller can generate a CSR for you, and it’ll keep the corresponding key in the local keystore.

# cd /Applications/UniFi.app/Contents/Resources/
# java -jar lib/ace.jar new_cert unifi.wann.net wann.net Fremont CA US
Certificate for unifi.wann.net generated

You should now have CSRs in PEM and DER format in the data/ directory:

# ls -l data/unifi*
-rw-r--r-- 1 root staff 708 Dec 31 14:42 data/unifi_certificate.csr.der
-rw-r--r-- 1 root staff 1042 Dec 31 14:42 data/unifi_certificate.csr.pem
#

Take the CSR (whichever format you prefer) and sign it with your CA.
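If your CA is a simple openssl-based one like mine, signing the CSR can be wrapped in a small helper. This is only a sketch under that assumption; the file names in the usage example are placeholders, and serial handling via -CAcreateserial is the lazy approach:

```shell
# Sign a CSR with a local openssl CA.
# Usage: sign_csr <csr> <ca-cert> <ca-key> <output-cert>
sign_csr() {
  csr=$1; ca_cert=$2; ca_key=$3; out=$4
  openssl x509 -req -in "$csr" -CA "$ca_cert" -CAkey "$ca_key" \
      -CAcreateserial -out "$out" -days 365
}
```

For example, sign_csr data/unifi_certificate.csr.pem ca-cert.pem ca-key.pem data/unifi_certificate.cert.pem leaves you with a signed PEM certificate to convert and import.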

Converting PEM certificates to DER

If you’re running your own CA, you’ll need to convert your CA’s public root certificate to DER format too in order to import it. In my case I always work with PEM certificates, so I need to convert both my newly signed certificate and root certificate:

# openssl x509 -outform der -in data/wannnet-ca-current-cert.pem -out data/wannnet-ca-current-cert.der
# openssl x509 -outform der -in data/unifi_certificate.cert.pem -out data/unifi_certificate.cert.der

You can store both of these DER files in the data/ directory.
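Before importing, it’s worth sanity-checking that the conversion actually produced valid DER. A tiny helper for that (the path in the usage example is from my setup):

```shell
# Print the subject of a DER-encoded certificate; openssl exits
# non-zero if the file isn't actually valid DER.
check_der() {
  openssl x509 -inform der -noout -subject -in "$1"
}
```

e.g. check_der data/unifi_certificate.cert.der should print the CN you requested in the CSR.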

Importing the certificates

Use the import_cert argument to ace.jar to import both the root CA and host certificate:

# java -jar lib/ace.jar import_cert data/unifi_certificate-cert.der data/wannnet-ca-current-cert.der
parse wannnet-ca-current-cert.der (DER, 1 certs): EMAILADDRESS=pk@wann.net, OU=wann.net CA, O=wann.net, L=Fremont, ST=California, C=US
parse unifi_certificate-cert.der (DER, 1 certs): CN=unifi.wann.net
Importing signed cert[unifi.wann.net]
... issued by [EMAILADDRESS=pk@wann.net, OU=wann.net CA, O=wann.net, L=Fremont, ST=California, C=US]
Certificates successfuly imported. Please restart the UniFi Controller.

Restart the UniFi controller. Done!
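Since you’ll repeat all of this at every cert rotation, the cycle is worth scripting. Here’s a sketch as a shell function; the paths are from my OS X install (use /usr/lib/unifi on Linux) and the ace.jar invocation is the one shown above:

```shell
# Convert a signed host cert and CA cert from PEM to DER, then
# import both into the UniFi keystore with ace.jar.
# Usage: import_unifi_cert <host-cert.pem> <ca-cert.pem>
import_unifi_cert() {
  cert_pem=$1
  ca_pem=$2
  cd /Applications/UniFi.app/Contents/Resources || return 1
  openssl x509 -outform der -in "$cert_pem" -out data/host-cert.der &&
  openssl x509 -outform der -in "$ca_pem" -out data/ca-cert.der &&
  java -jar lib/ace.jar import_cert data/host-cert.der data/ca-cert.der
}
```

Follow it up with a controller restart and you’re done.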

Linux (CentOS/Debian)

Basically the exact same process applies on CentOS/Debian/Ubuntu, except the paths to the UniFi data are different. On at least Ubuntu the main binaries of the controller are installed to /usr/lib/unifi/, with /usr/lib/unifi/data/ being a symlink to /var/lib/unifi/.

Generating a certificate request from the UniFi controller

# cd /usr/lib/unifi
# java -jar lib/ace.jar new_cert unifi.wann.net wann.net Fremont CA US
Certificate for unifi.wann.net generated

You should now have CSRs in PEM and DER format in the data/ directory:

# ls -l data/unifi*
-rw-r--r-- 1 root root 712 Dec 31 18:04 data/unifi_certificate.csr.der
-rw-r--r-- 1 root root 1050 Dec 31 18:04 data/unifi_certificate.csr.pem

Take the CSR (whichever format you prefer) and sign it with your CA.

Converting PEM certificates to DER

Follow the exact same steps in the OS X section to use openssl to convert from PEM to DER if necessary.

Importing the certificates

Use the import_cert argument to ace.jar to import both the root CA and host certificate:

# java -jar lib/ace.jar import_cert data/unifi_certificate-cert.der data/wannnet-ca-current-cert.der
parse wannnet-ca-current-cert.der (DER, 1 certs): EMAILADDRESS=pk@wann.net, OU=wann.net CA, O=wann.net, L=Fremont, ST=California, C=US
parse unifi_certificate-cert.der (DER, 1 certs): CN=unifi.wann.net
Importing signed cert[unifi.wann.net]
... issued by [EMAILADDRESS=pk@wann.net, OU=wann.net CA, O=wann.net, L=Fremont, ST=California, C=US]
Certificates successfuly imported. Please restart the UniFi Controller.

Restart the UniFi controller:

service unifi restart

Done!

Using keytool

There’s a Java utility called keytool, usually already on your system, that you can use to view or work with the key store the UniFi controller maintains. For the sake of compatibility and time I elected to use the import function of lib/ace.jar, but for the #yolo crowd you can use keytool to modify the keystore directly.

For example, to list which certificates are in the key store file (by default there’s no keystore password):

keytool -list -keystore data/keystore

Verbose listing with certificate details:

keytool -list -v -keystore data/keystore

Speed!

My dedicated server is quite old and ass slow. I finally got around to moving my website elsewhere running Nginx+HHVM, and now it’s tolerable once again! I can finally enforce 100% HTTPS without killing the CPU.

June rocks

June was such an exciting month and the good news over summer keeps getting better:

  • People finally wise up and take down the Confederate flag (the “rebel flag” where I grew up)
  • The US Supreme Court allows gay marriage (whether or not they hijacked diplomacy is another thing)
  • ESA’s Philae comet lander makes contact with Rosetta after several months of hibernation
  • The Oklahoma Supreme Court says the 10 Commandments monument at the state capitol must come down
  • ARIN basically ran out of IPv4 addresses at the end of June, only /23 and /24s remain
  • 50% of Xfinity/Comcast Facebook users now reach Facebook over IPv6 (tell me again IPv6 will never take off?)
  • The Internet largely absorbed the 2015 leap second without dying
  • My annual physical says I’m not dying, but need to get more exercise
  • I scheduled vacation at the end of July

Operations reading list

I love books. These days I buy most of my books for Kindle, but I still buy paper for books I really like and want to keep around. Tech books are notorious for being obsolete a couple of years after printing, but there are still several timeless books I use for reference and would recommend for anyone in UNIX/Linux systems engineering or networking, whether newcomer or jaded veteran. Some are older than others, but here’s a few that have served me well:

Systems:

If you deal with the internet you must have a very solid understanding of the protocols involved, from ARP to TCP. By the time you’ve been in the industry for several years, you’ll encounter problems with every part of the stack covered by this book, along with lower levels such as Ethernet. tcpdump and other packet sniffers will be your best friend and you should use them liberally. My first edition of this book only covers IPv4 but the second edition covers IPv6 now, which you should be using!

(a/k/a “APUE”) The Internet is built on UNIX and C. This is more of a reference book than one you’d sit down and read cover to cover, but I enjoy reading random bits when I’m curious or want more background on something. The book covers in a lot of detail how the UNIX userland environment works with the kernel, giving snippets of C code to show exactly how things like syscalls are implemented under the hood. Ever run strace and wondered what open(), write(), mkdir(), bind(), connect(), fork(), SIGUSR1 are? This book will show you in simple C code what’s going on.

Three recent additions this past year:

APUE was geared at a general System V / BSD UNIX audience. This book is very similar to APUE, but geared toward a Linux audience. It goes into the same level of detail, explaining things in C code just as APUE does. It’s a huge book coming in at 1,500+ pages, so make room on your bookshelf for it.

Brendan has given many talks and authored several pieces on systems performance, benchmarking, and really digging in deep to troubleshoot bottlenecks. He co-authored the DTrace book, and if you’ve ever seen the interesting “guy screaming at hard drives” video on YouTube (which shows the effect of vibration on disk latency), that’s him. You can’t change something if you can’t measure it, and this book explains how to get valid data to analyze performance of applications, CPU, memory, disks, kernels and networking. It also covers applications in a cloud environment and gives good insight on how virtualized kernels or system calls can impact performance.

In particular, I really like this book because it covers things from both a Linux and Solaris kernel perspective. I’ve used both over my career and while my Solaris is rusty this gives useful comparisons to get me through problems. I’ve heard Brendan speak a couple of times and his slides (and talk) from SCaLE 11x on Linux Performance Analysis are a great read. There are some very useful illustrations that show which tool to use for the job, e.g. in troubleshooting an issue do I use perf? iostat? sar? tcpdump? netstat? nicstat? strace?

I first ran across Sherri Davidoff by listening to her talk at DEF CON 21 about the do-it-yourself cellular sniffer^W IDS and later found her book. Most systems people are blissfully ignorant beyond the Ethernet interface of their servers. This doesn’t cut it anymore in a land of distributed systems, so you need to understand how to troubleshoot issues on the network too. This book is primarily written for doing forensic analysis and gathering evidence of events for an investigation, but there are still a lot of parallels in troubleshooting a production environment. Some of the same techniques for carefully collecting evidence and gathering logs are fantastic for writing up a root cause analysis, so some bad thing doesn’t happen again[tm].

I like this book because it covers traffic and packet analysis, a TL;DR of network protocols in real life, and the various network devices that data can flow through. This is the only practical book I’ve read that explains why you’d want to do flow analysis (e.g. NetFlow, sflow) to detect problems or see application activity, along with examples of using nfdump/NfSen. It covers intrusion detection, snort, switches, routers, firewalls, logging, tunneling, all good stuff.

Networks:

In a previous life I was dedicated to network engineering in a managed hosting environment for a few years, with lots of snowflake customers. I touched a wide swath of gear from multiple vendors: hardware load balancers, VPNs, firewalls, L2/L3 switches, routers, huge L2 domains with hundreds of VLANs. Enough to do the job, but not a master of any. I caused my share of outages with spanning tree before I got a real grasp of what was going on. These books are a bit dated since Cisco and IOS aren’t as dominant as they once were (thank god), but they still have useful network material that transfers to other platforms.

My go-to book for Cisco firewalls back in the day. I dealt a lot with all three platforms and it was often quicker to just grab this book than dig around on Cisco’s website for configuration examples. My book is all marked up with notes and bookmarks for packet processing order of operations, NAT and SNAT configuration, failover pairs, and logging. It was good because it usually gave the equivalent PIX, ASA, and FWSM (Cat 6500 Firewall Service Module) commands together when explaining how to configure something.

Oddly absent from this book is any treatment of VPNs; there’s barely a mention of IPsec. I have the companion book “The Complete Cisco VPN Configuration Guide” but was disappointed at its coverage of IPsec and SSL/DTLS VPNs, especially when it came to troubleshooting on firewalls. A good hunk of the book is centered around the Cisco VPN 3000 Concentrator, which is way obsolete now.

This was my savior in learning the guts of layer 2 Ethernet and spanning tree in its various flavors. STP, PVST+, Rapid STP, MST, BPDUs, STP roots, enough trees to make a forest. Then there’s VLANs, VLAN trunking, 802.1q tagging, Q-in-Q, private VLANs and multicast. Then it goes into covering CatOS and IOS on the beloved, trusty workhorse of the 2000s, the Catalyst 6500 series of switches. I never did get that CCNP.

  • Designing Content Switching Solutions, by Naseh and Kahn

This book is positively dated now, but if you find yourself still managing an ancient Cisco load balancer (e.g. CSS 11501, CSM for 6500, or firewall load balancing), this is your book. Beyond this it gets into HTTP/RTSP/streaming/layer 7 load balancing, SSL offloading and global load balancing. Now that I think about it, don’t buy this book. Offloading SSL to a hardware load balancer is a terrible thing you don’t want to do. Your farm of Intel Xeons can handle the crypto overhead much better than a puny RISC processor from 2001. The world is much better now and standard Linux servers are the new load balancer.

It’s a classic that practically everyone in the 1990s learned BGP from. Heck, it even includes a CIDR conversion table in the front flap and explains what NAPs were. Nevertheless, it explains various scenarios and topologies where you’d use BGP internally and externally, and how the protocol behaves to control routes. The world has moved to running MPLS within the backbone, but BGP is still alive and kicking on the edges. In fact, at work we use BGP right down to the rack switch and inject VIPs onto the network via BGP.

Notable mentions:

Sometimes I just want to read a book with Kindle on an airplane or at breakfast, because I’m that kind of guy.

I hate and love Kerberos, mainly because I was clueless and tossed into the deep end to support it. I want to love it more because distributed authentication and authorization are super useful in a large systems environment and I don’t know how I’d live without it now, so I bought this to read. So far it doesn’t disappoint in explaining how to set up Kerberos realms, KDCs, slaves, and all that fun stuff.

I don’t put on my DBA hat very often and touch MySQL seldom enough that I have to go relearn how to set up replication each time. If I supported it again, this would probably be the book I’d be reading.

With EdgeOS 1.6 for the EdgeRouter line, Ubiquiti upgraded the Debian distribution from squeeze to wheezy. Along with a more modern 3.10 kernel this gets us a newer version of Ruby too, 1.9.1. During upgrades the system is blown away so I lost my Chef client. I had problems re-bootstrapping my routers until I finally realized that /etc/apt/sources.list still pointed at squeeze repos. I asked Ubiquiti about this and they say it’s intentional that you have to update sources.list to fetch from the new repo.

How to fail
Stepping back in time before I figured this out, this is what transpired.

When I would try to build gems, things went sideways: the running system was wheezy but apt was trying to install packages from the squeeze distribution. As a result there were a lot of version conflicts and packages just refused to install. For the record, these are the sorts of errors I was running into with a repo mismatch (mainly around libc6 and libc6-dev when trying to install ruby):

root@gw2:/home/ubnt# apt-get install ruby ruby-dev git ruby1.8-dev
...
The following packages have unmet dependencies:
 ruby1.8-dev : Depends: libc6-dev but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
root@gw2:/home/ubnt#

Trying to install libc6-dev fails:

root@gw2:/home/ubnt# apt-get install libc6-dev
The following packages have unmet dependencies:
 libc6-dev : Depends: libc6 (= 2.11.3-4) but 2.13-38+deb7u6 is to be installed
             Depends: libc-dev-bin (= 2.11.3-4) but it is not going to be installed
             Recommends: gcc but it is not going to be installed or
                         c-compiler
E: Unable to correct problems, you have held broken packages.
root@gw2:/home/ubnt#

You can pretend the problem doesn’t exist by installing ruby 1.8 without the -dev package, but this will blow up on you later when you try to build gems such as ohai:

root@gw2:/tmp/rubygems-2.4.1# gem install ohai --no-rdoc --no-ri --verbose
...
/usr/lib/ruby/gems/1.8/gems/ffi-1.9.6/spec/ffi/variadic_spec.rb
/usr/lib/ruby/gems/1.8/gems/ffi-1.9.6/spec/spec.opts
Building native extensions.  This could take a while...
/usr/bin/ruby1.8 -r ./siteconf20141213-14984-bl1gm-0.rb extconf.rb
extconf.rb:4:in `require': no such file to load -- mkmf (LoadError)
	from extconf.rb:4
ERROR:  Error installing ohai:
	ERROR: Failed to build gem native extension.

    Building has failed. See above output for more information on the failure.
extconf failed, exit code 1

Gem files will remain installed in /usr/lib/ruby/gems/1.8/gems/ffi-1.9.6 for inspection.
Results logged to /usr/lib/ruby/gems/1.8/extensions/mips-linux/1.8/ffi-1.9.6/gem_make.out
root@gw2:/tmp/rubygems-2.4.1#

Aaaaand this fails because mkmf (Ruby MakeMakefile module) is provided by the ruby-dev package we couldn’t install earlier.

root@gw1:/home/ubnt# dpkg-query -L ruby1.9.1-dev | grep mkmf
/usr/lib/ruby/1.9.1/mkmf.rb
root@gw1:/home/ubnt#

So the lesson here is to make sure you’re fetching packages from the correct repo. If you’ve found yourself in this situation, you’ll want to back things out and install the correct versions. The first thing you want to do is run dpkg --purge ruby1.8 libruby1.8 to remove ruby 1.8. Then fix your apt sources and start all over again.
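The repo fix itself is a one-liner. Since sed -i edits in place, I like to wrap it so it takes the path as an argument and can be tried on a copy first; this sketch assumes your sources.list references squeeze by name:

```shell
# Rewrite every squeeze reference to wheezy in the given apt
# sources file. Run it against a copy first if you're paranoid.
fix_sources() {
  sed -i 's/squeeze/wheezy/g' "$1"
}
```

Then fix_sources /etc/apt/sources.list followed by apt-get update, and ruby/ruby-dev should install cleanly.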

Chef/Ruby version caveat
One thing worth mentioning here is that you won’t be able to run the latest hotness, Chef client 12. The wheezy distro only has ruby 1.9.1, and Chef 12 requires ruby 2.0. The best I’ve been able to install from rubygems is Ohai v7.4.0 and Chef client 11.16.4.

gem install ohai --no-rdoc --no-ri --verbose -v 7.4.0
gem install chef --no-rdoc --no-ri --verbose -v 11.16.4
