Feed on

Operations reading list

I love books. These days I buy most of my books for Kindle, but I still buy paper for books I really like and want to keep around. Tech books are notorious for being obsolete a couple of years after printing, but there are still several timeless books I use for reference and would recommend for anyone in UNIX/Linux systems engineering or networking, new or jaded veteran. Some are older than others, but here’s a few that have served me well:


If you deal with the internet you must have a very solid understanding of the protocols involved, from ARP to TCP. By the time you’ve been in the industry for several years, you’ll encounter problems with every part of the stack covered by this book, along with lower levels such as Ethernet. tcpdump and other packet sniffers will be your best friend and you should use them liberally. My first edition of this book only covers IPv4 but the second edition covers IPv6 now, which you should be using!

(a/k/a “APUE”) The Internet is built on UNIX and C. This is more of a reference book rather than one you’d sit down and read, but I enjoy reading random bits when I’m curious or want more background on something. The book covers a lot detail of how the UNIX userland environment works with the kernel, giving snippets of C code to show exactly how something like syscalls are implemented under the hood. Ever ran strace and wondered what open(), write(), mkdir(), bind(), connect(), fork(), SIGUSR1 are? This book will show you in simple C code what’s going on.

Three recent additions this past year:

APUE was geared at a general System V / BSD UNIX audience. This book is very similar to APUE, but geared toward a Linux audience. It goes into the same level of detail and explaining things in C code as APUE. It’s a huge book coming in at 1,500+ pages so make room on your bookshelf for it.

Brendan has given many talks and authored several pieces on systems performance, benchmarking, and really digging in deep to troubleshooting bottlenecks. He authored DTrace and if you’ve ever seen the interesting “guy screaming at hard drives” (which shows effect of vibration on disk latency) video on YouTube, that’s him. You can’t change something if you can’t measure it, and this book explains how to get valid data to analyze performance of applications, CPU, memory, disks, kernels and networking. It also covers applications in a cloud environment and gives good insight on how virtualized kernels or system calls can impact performance.

In particular, I really like this book because it covers things from both a Linux and Solaris kernel perspective. I’ve used both over my career and while my Solaris is rusty this gives useful comparisons to get me through problems. I’ve heard Brendan speak a couple of times and his slides (and talk) from SCaLE 11x on Linux Performance Analysis are a great read. There are some very useful illustrations that show which tool to use for the job, e.g. in troubleshooting an issue do I use perf? iostat? sar? tcpdump? netstat? nicstat? strace?

I first ran across Sherri Davidoff by listening to her talk at DEF CON 21 about the do-it-yourself cellular sniffer^W IDS and later found her book. Most systems people are blissfully ignorant beyond the Ethernet interface of their servers. This doesn’t cut it anymore in a land of distributed systems, so you need to understand how to troubleshoot issues on the network too. This book is primarily written for doing forensic analysis and gathering evidence of events for an investigation, but there are still a lot of parallels in troubleshooting a production environment. Some of the same techniques for carefully collecting evidence and gathering logs are fantastic for writing up a root cause analysis, so some bad thing doesn’t happen again[tm].

I like this book because it covers traffic and packet analysis, a TL;DR of network protocols in real life, and the various network devices that data can flow through. This is the only practical book I’ve read that explains why you’d want to do flow analysis (e.g. NetFlow, sflow) to detect problems or see application activity, along with examples of using nfdump/NfSen. It covers intrusion detection, snort, switches, routers, firewalls, logging, tunneling, all good stuff.


In a previous life I was dedicated to network engineering in a managed hosting environment for a few years with lots of snowflake customers. I touched a wide swath of different types of gear from multiple vendors, hardware load balancers, VPNs, firewalls, L2/L3 switches, routers, huge L2 domains with hundreds of VLANs. Enough to do the job, but not a master at any. I caused my share of outages with spanning tree before I got a real grasp of what was going on. These books are a bit dated since Cisco and IOS isn’t as dominate as it once was (thank god), but they still have useful network stuff that transfers to other platforms.

My go-to book for Cisco firewalls back in the day. I dealt a lot with all three platforms and it was often quicker to just grab this book than dig around on Cisco’s website for configuration examples. My book is all marked up with notes and bookmarks for packet processing order of operations, NAT and SNAT configuration, failover pairs, and logging. It was good because it usually gave the equivalent PIX, ASA, and FWSM (Cat 6500 Firewall Service Module) commands together when explaining how to configure something.

Oddly absent from this book was a treatment of VPNs, there’s barely any mention of IPsec. I have the companion book “The Complete Cisco VPN Configuration Guide” but was disappointed at its coverage of IPsec and SSL/DTLS VPNs, especially when it came to troubleshooting on firewalls. A good hunk of the book is centered around the Cisco VPN 3000 Concentrator which is way obsolete now.

This was my savior in learning the guts of layer 2 Ethernet and spanning tree in its various flavors. STP, PVST+, Rapid STP, MST, BPDUs, STP roots, enough trees to make a forest. Then there’s VLANs, VLAN trunking, 802.1q tagging, Q-in-Q, private VLANs and multicast. Then it goes into covering CatOS and IOS on the beloved, trusty workhorse of the 2000s, the Catalyst 6500 series of switches. I never did get that CCNP.

  • Designing Content Switching Solutions, by Naseh and Kahn

This book is positively dated now, but if you find yourself still managing an ancient Cisco load balancer (e.g. CSS 11501, CSM for 6500, or firewall load balancing), this is your book. Beyond this it gets into HTTP/RTSP/streaming/layer 7 load balancing, SSL offloading and global load balancing. Now that I think about it, don’t buy this book. Offloading SSL to a hardware load balancer is a terrible thing you don’t want to do. Your farm of Intel Xeons can handle the crypto overhead much better than a puny RISC processor from 2001. The world is much better now and standard Linux servers are the new load balancer.

It’s a classic that practically everyone in the 1990s learned BGP from. Heck, it even includes a CIDR conversion table in the front flap and explains what NAPS were. Nevertheless, it explains various scenarios and topologies where you’d use BGP internally and externally, and how the protocol behaves to control routes. The world has moved to running MPLS within the backbone, but BGP is still alive and kicking on the edges. In fact at work we use BGP right down to the rack switch and inject VIPs onto the network via BGP.

Notable mentions:

Sometimes I just want to read a book with Kindle on an airplane or at breakfast, because I’m that kind of guy.

I hate and love Kerberos, mainly because I was clueless and tossed into the deep end to support it. I want to love it more because distributed authentication and authorization are super useful in a large systems environment and I don’t know how I’d live without it now, so I bought this to read. So far it doesn’t disappoint in how to setup Kerberos realms, KDCs, slaves, and all that fun stuff.

I don’t put on my DBA hat very often and usually touch MySQL seldom enough I have to go remember how to set up replication. If I supported it again, this would probably be the book I’d be reading.

Leave a Reply