(Rant warning) TL;DR I gripe at how complicated it gets and offer no solutions. I really do like what Let’s Encrypt offers. Just getting there, figuring out which options work and which don’t, is work. I don’t know how the muggles manage it.
TL;DR 2: HTTP-01 was out because of internal sites. DNS-01 was the only option, but I don’t use a third-party DNS provider with an API to handle automated challenge updates. I wound up installing a standalone acme-dns server for challenge responses.
I finally got annoyed enough at my TLS certificates that I started seriously trying to use Let’s Encrypt and ACME. I only have a couple of normal public-facing websites running on port 443 on the Internet, but internally I have a small army of Ubiquiti EdgeRouters, switches, wireless bridges, UniFi wireless controllers, Raspberry Pis, and other software with web servers on ports other than 80/443 that all need certificates. For years I’ve run my own private CA to issue certificates, but it has the same problem as commercial certificates: issuing them and loading them onto all of my devices. Some browsers like Chrome now bitch at self-signed certificates in some cases too, so that’s not really a fix either.
One could say “but ha ha only suckers use web interfaces on routers”, which is true, but on the occasions I do use them I’m reminded of the stupid problem of a long-expired certificate and have to jump through the browser warning hoops every time to confirm that yes, I’m yolo’ing to this allegedly sketchy device. A lot of the Ubiquiti stuff is only manageable over the web UI. This goes double if I’m on a new device that doesn’t have my private root CA certificate installed, or on Android, which makes it really difficult to install a private root CA. Triple combo pain if you’ve accidentally configured HTTP Strict Transport Security with a long lifetime and includeSubDomains, then try to reach something internal by a hostname under that domain that has a bad certificate: the browser is like fuck you, I won’t let you visit this site at all! This is where Chrome’s ‘thisisunsafe’ override really comes in handy! I just want stuff to work so I can go about my day, man, without leaking passwords.
No to HTTP-01
I can’t just throw certbot or acme.sh everywhere and call the problem solved. First of all, not all of my devices are exposed to the Internet to accept HTTP-01 challenges from random sources, not to mention the services that don’t run on port 443 at all. (God bless them for having a mix of IPv4 and IPv6 probe sources to handle IPv6-only endpoints, which helps.) Every year since Let’s Encrypt launched, whenever I thought about this problem, I would look up how to run my own ACME server with my private CA, groan at the apparent effort of learning the whole ACME protocol, and leave it.
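To make the HTTP-01 limitation concrete, here is a minimal sketch of what the CA checks. Per RFC 8555, the CA fetches a fixed plain-HTTP URL on the name being validated, so it has to reach port 80 on that name from the public Internet. The token value and the addresses below are made up for illustration, and the reachability check is only a rough, hypothetical pre-flight test (it looks at whether any address is globally routable, not whether a firewall lets the probe in).

```python
import ipaddress

def http01_url(domain: str, token: str) -> str:
    # RFC 8555 section 8.3: the CA fetches this plain-HTTP URL, i.e. port 80
    # on the name being validated; the port cannot be changed.
    return f"http://{domain}/.well-known/acme-challenge/{token}"

def reachable_for_http01(addresses: list[str]) -> bool:
    # Hypothetical pre-flight check: the CA probes from the public Internet,
    # so at least one A/AAAA address must be globally routable for HTTP-01.
    return any(ipaddress.ip_address(a).is_global for a in addresses)

if __name__ == "__main__":
    print(http01_url("gw1.example.com", "example-token"))
    # An internal-only device (RFC 1918 IPv4 plus ULA IPv6) can never pass.
    print(reachable_for_http01(["192.168.1.1", "fd00::1"]))
```

An internal-only device like this fails before the challenge even starts, which is what pushes everything toward DNS-01.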
DNS-01 with caveats
That leaves me with DNS-01 challenges. The problem here is that I don’t use a cloudy/third-party DNS provider that has an API where I can automatically update TXT records for automatic certificate renewals. I run straight-up BIND, and my authoritative servers each use independent, replicated master files with no slaving. This means any dynamic update to my DNS would have to go to each DNS server. I do have a few dynamically updated A/AAAA records in sub-zones, and for years I’ve just been running nsupdate twice, once for each authoritative server, which has worked fine. The ACME clients I’ve seen don’t support sending updates to multiple servers via nsupdate, so this would be a hack to carry around.
I’m not casually replacing BIND or throwing it all in the cloud. This recently led me to think: OK, fine, maybe it’s not so bad to run master/slave for my dynamic zones, so an ACME client would only have to update one server. That in turn raised the question of TSIG key distribution. I’d have to copy the same master TSIG key around to all of my devices, or create a TSIG key per domain, sub-zone, or device/A/AAAA record, which gets tedious and unpalatable.
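The “run nsupdate once per authoritative server” approach can be sketched like this. It is a minimal illustration, assuming BIND’s nsupdate reads commands from standard input and an optional TSIG key file is passed with `-k`; the server names, key file, and record are hypothetical placeholders. By default it only builds the command batches (`dry_run=True`), so nothing is sent.

```python
import subprocess

def nsupdate_script(server: str, name: str, ttl: int, rtype: str, rdata: str) -> str:
    # One nsupdate batch: target a single server, replace the RRset, send it.
    return "\n".join([
        f"server {server}",
        f"update delete {name} {rtype}",
        f"update add {name} {ttl} {rtype} {rdata}",
        "send",
    ]) + "\n"

def update_everywhere(servers, name, ttl, rtype, rdata, keyfile=None, dry_run=True):
    # With no zone transfers between the authoritative servers, each one
    # needs its own copy of the same update.
    scripts = []
    for server in servers:
        script = nsupdate_script(server, name, ttl, rtype, rdata)
        scripts.append(script)
        if not dry_run:
            cmd = ["nsupdate"] + (["-k", keyfile] if keyfile else [])
            subprocess.run(cmd, input=script, text=True, check=True)
    return scripts

if __name__ == "__main__":
    # Hypothetical servers and record.
    for s in update_everywhere(["ns1.example.com", "ns2.example.com"],
                               "gw1.example.com.", 300, "AAAA", "2001:db8::1"):
        print(s)
```

This is exactly the fan-out that the ACME clients’ nsupdate hooks don’t do for you.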
acme-dns
I retreated and thought surely others have hit this problem too. This led me to the acme-dns server project. It’s a small standalone DNS server that does nothing but serve up TXT records, and it has a simple REST API. I set it up on my internal IPv6 network so all of my internal devices can reach the API, and exposed its DNS server on port 53 to the Internet so DNS-01 challenges are queryable. In my master zone files I delegate a sub-domain via an NS record pointing at the acme-dns host (plus an address record for it), and then create CNAMEs that point into that sub-domain so all challenges go to the acme-dns server. This is where I praise Let’s Encrypt for having IPv6 probes: they can reach my acme-dns server without me having to burn a public IPv4 address just for it.
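The delegation can be sketched as a couple of zone-file lines. This follows the pattern where the delegated name doubles as its own name server, with an address (glue) record next to it; the zone name `acme-dns.example.com.`, the TTL, and the documentation-prefix address are hypothetical placeholders, and the helper only prints lines to paste into a BIND master file.

```python
import ipaddress

def delegation_records(zone: str, address: str, ttl: int = 3600) -> list[str]:
    # Delegate `zone` to a name server named after the zone itself, plus a
    # glue record so resolvers can find it: AAAA for IPv6, A for IPv4.
    ip = ipaddress.ip_address(address)
    rtype = "AAAA" if ip.version == 6 else "A"
    return [
        f"{zone} {ttl} IN NS {zone}",
        f"{zone} {ttl} IN {rtype} {ip.compressed}",
    ]

if __name__ == "__main__":
    for line in delegation_records("acme-dns.example.com.", "2001:db8::53"):
        print(line)
```

With an IPv6-only acme-dns host, only the AAAA glue is needed, which is where the IPv6 probes pay off.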
It took me a while to figure out how to actually use the thing. Certbot requires yet another thing to be installed: an acme-dns hook program. Have I mentioned how complicated this whole ecosystem is? acme.sh already includes a hook. Rant 1: good lord, that thing is one massive monolith of a shell script. Rant 2: I really do not like it when installation instructions are “here, just curl | sh”. It doesn’t just download a single file; it downloads several directories of files and shoves stuff into your crontab. Who knows what else it did. Must find DEB/RPM packages of that sucker.
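Under the hood, an acme-dns hook does roughly two things, sketched below. First it computes the DNS-01 TXT value, which per RFC 8555 is the unpadded base64url SHA-256 digest of the key authorization. Then it POSTs that value to acme-dns’s `/update` endpoint, authenticating with the `X-Api-User`/`X-Api-Key` headers and a JSON body holding `subdomain` and `txt`. This is a minimal sketch assuming those acme-dns API details; the base URL, credentials, and key authorization are hypothetical, and the request is only built here, not sent.

```python
import base64
import hashlib
import json
import urllib.request

def dns01_txt_value(key_authorization: str) -> str:
    # RFC 8555 section 8.4: base64url(SHA-256(key authorization)), no padding.
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")

def build_update_request(base_url: str, username: str, password: str,
                         subdomain: str, txt: str) -> urllib.request.Request:
    # acme-dns /update: credentials go in headers, the record in a JSON body.
    body = json.dumps({"subdomain": subdomain, "txt": txt}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/update",
        data=body,
        headers={"X-Api-User": username, "X-Api-Key": password,
                 "Content-Type": "application/json"},
        method="POST",
    )

def send_update(req: urllib.request.Request) -> int:
    # Actually send it; returns the HTTP status code.
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

The 32-byte digest always encodes to a 43-character string, which is the TXT record the CA ends up querying via the CNAME.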
OK, so now for each and every hostname you want a certificate for, you first have to hit the /register endpoint of the acme-dns server with a curl POST request. This generates a “username”, a password, and a random sub-domain string, which only exist in the acme-dns server database. If, for example, I had gw1.example.com, I would now add a record to my zone file that says “_acme-challenge.gw1.example.com. IN CNAME asdf-asdf-asdf-asdf-asdf-asdf.acme-dns.example.com.” *.acme-dns.example.com is already delegated via NS record to the acme-dns instance. This is tedious and annoying to do for a bunch of hostnames, but it’s for the greater good and fortunately only has to be done once. acme-dns keeps all of this in SQLite by default (Postgres is also supported), so either way back that sucker up, or you’ll have to re-register every single one of your hostnames, and rewrite every CNAME to match, when something dies.
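The register-then-CNAME step can be sketched as follows: POST to `/register` (the same thing the curl call does), then turn the returned `fulldomain` into the zone-file line. This assumes acme-dns’s JSON response carries `username`, `password`, `subdomain`, and `fulldomain`; the base URL and the sample credentials are hypothetical, reusing the placeholder sub-domain from the example above. Saving each response somewhere safe doubles as the backup.

```python
import json
import urllib.request

def register(base_url: str) -> dict:
    # POST with an empty body to /register; acme-dns answers with JSON holding
    # username, password, subdomain and fulldomain for this one hostname.
    req = urllib.request.Request(base_url.rstrip("/") + "/register",
                                 data=b"", method="POST")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))

def cname_record(hostname: str, creds: dict) -> str:
    # The line to paste into the zone file so the challenge name points at acme-dns.
    target = creds["fulldomain"].rstrip(".") + "."
    return f"_acme-challenge.{hostname.rstrip('.')}. IN CNAME {target}"

if __name__ == "__main__":
    # Made-up response, shaped like a /register reply.
    creds = {"username": "example-user", "password": "example-pass",
             "subdomain": "asdf-asdf-asdf-asdf-asdf-asdf",
             "fulldomain": "asdf-asdf-asdf-asdf-asdf-asdf.acme-dns.example.com"}
    print(cname_record("gw1.example.com", creds))
```

Run against the made-up response, this prints the same CNAME line as the gw1.example.com example above.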
Then for each and every hostname, take the username/password/subdomain, feed them into environment variables, and run acme.sh to issue the certificate. Witness the gigantic scripts in action! Stuff going to the CA, TXT records being fed to acme-dns, stuff going to the DNS server, stuff coming back from the CA, more stuff going back and forth! If you’re lucky, you’re left with a few certificate and key files. If you’re unlucky, good luck troubleshooting which step in this whole process broke down.
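Wrapping that per-hostname acme.sh run might look like the sketch below. It assumes acme.sh’s acme-dns plugin (`--dns dns_acmedns`) reads the `ACMEDNS_UPDATE_URL`, `ACMEDNS_USERNAME`, `ACMEDNS_PASSWORD`, and `ACMEDNS_SUBDOMAIN` environment variables; the update URL and credentials are hypothetical. With `dry_run=True` it only returns the command and environment instead of running acme.sh.

```python
import os
import subprocess

def acmesh_issue(hostname: str, creds: dict, update_url: str, dry_run: bool = True):
    # Pass the acme-dns credentials for this one hostname via the environment.
    env = dict(os.environ)
    env.update({
        "ACMEDNS_UPDATE_URL": update_url,
        "ACMEDNS_USERNAME": creds["username"],
        "ACMEDNS_PASSWORD": creds["password"],
        "ACMEDNS_SUBDOMAIN": creds["subdomain"],
    })
    cmd = ["acme.sh", "--issue", "--dns", "dns_acmedns", "-d", hostname]
    if not dry_run:
        # check=True raises if acme.sh exits non-zero, so failures aren't silent.
        subprocess.run(cmd, env=env, check=True)
    return cmd, env
```

Keeping one credential set per hostname matches how /register hands them out.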
Finally, certificates!
Now that you have certificates, what do you do with them? This is another whole bear of a problem to tackle, because there’s an endless variety of web servers and directories to put certificates into. Again, there’s a whole ecosystem of Certbot/acme.sh deployment hooks that try to handle your web server. Also remember that by default acme.sh does all of this within your home directory, so keys have to be copied to secure system directories owned by root too. This is where I’m at now. Some devices, like Ubiquiti EdgeSwitches, can’t run an ACME client directly, so I have to rig something up to scp the certificates over.
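A minimal sketch of the “copy out of the home directory” and “scp it to the box” steps follows. It makes the private key readable only by its owner before any key bytes are written, and builds (but does not run) an scp command. The destination directory, user, host, and remote path are all hypothetical; the right locations depend on the web server and on the device.

```python
import os
import shutil

def install_pair(cert_src: str, key_src: str, dest_dir: str) -> tuple[str, str]:
    os.makedirs(dest_dir, mode=0o755, exist_ok=True)
    cert_dst = os.path.join(dest_dir, os.path.basename(cert_src))
    key_dst = os.path.join(dest_dir, os.path.basename(key_src))
    # The certificate is public, so world-readable is fine.
    shutil.copyfile(cert_src, cert_dst)
    os.chmod(cert_dst, 0o644)
    # Open the key file and force 0600 before writing any key bytes, so the
    # key is never readable by others even if the file already existed.
    fd = os.open(key_dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    os.fchmod(fd, 0o600)
    with os.fdopen(fd, "wb") as out, open(key_src, "rb") as src:
        shutil.copyfileobj(src, out)
    return cert_dst, key_dst

def scp_command(files: list[str], user: str, host: str, remote_dir: str) -> list[str]:
    # For devices that can't run an ACME client: push the files over SSH.
    return ["scp", *files, f"{user}@{host}:{remote_dir}"]
```

Run it as root when the destination is a root-owned directory; the scp command list can then be handed to `subprocess.run`.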
I hope all of this just magically works and auto-renews before the 90 days are up. What a pain to set up!
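Rather than just hoping, a cheap sanity check is to read each certificate’s expiry date and flag anything getting close. This sketch parses the date format printed by `openssl x509 -enddate -noout -in <file>` using Python’s `ssl.cert_time_to_seconds`. The 30-day threshold is an assumption for illustration, not anything acme.sh or Let’s Encrypt enforces.

```python
import ssl
import time
from typing import Optional

def days_left(not_after: str, now: Optional[float] = None) -> float:
    # Accepts e.g. "notAfter=Jan  5 09:34:43 2018 GMT", with or without the prefix.
    if not_after.startswith("notAfter="):
        not_after = not_after[len("notAfter="):]
    expires = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires - now) / 86400

def needs_renewal(not_after: str, threshold_days: float = 30,
                  now: Optional[float] = None) -> bool:
    # Hypothetical policy: warn once fewer than threshold_days remain.
    return days_left(not_after, now) < threshold_days
```

Feeding it the openssl output for every deployed certificate, including the ones scp’d to the EdgeSwitches, catches a renewal that silently broke somewhere along the chain.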