Alessandro Ghedini /dev/random

Multi-VLAN DHCP+DNS home network setup with dnsmasq

Last week my ISP had an outage, and I discovered to my dismay that local DNS resolution on my home network stopped working too. My ISP having outages is nothing new, but the broken local DNS is, as the whole point of setting it up in the first place was to allow local services to keep working during the not-infrequent ISP oopsies.

My home network is largely run over Unifi hardware, with a Gateway Pro acting as, well, the network’s gateway. It comes with its own local DHCP and DNS setup based on dnsmasq which has worked fairly well in the past, so it’s not entirely clear whether this new failure mode is something that was introduced in a recent update (as I said, I’ve weathered ISP outages before without noticing the issue).

Looking a bit into it, it appears that the dnsmasq instance responsible for local DHCP and DNS is configured to use the gateway’s WAN interface, which seems a little strange, with the result that Unifi’s WAN fallback mechanism seems to disable it when the upstream connection breaks (thus breaking DNS as well).

I did a fair amount of searching and found a few mentions of similar issues from other people, though there didn’t seem to be a solution.

I decided that instead of trying to fix the issue locally on the gateway itself I would just deploy my own dnsmasq installation on separate hardware to prevent future updates from breaking it again.

Easier said than done.

Mo VLANs Mo Problems

Installing dnsmasq on my own host was easy enough, but I soon ran into some issues that didn’t exist in the previous setup.

You see, my network has a number of VLANs to keep local hosts tidy and organized. For example, one VLAN (with disabled Internet access) hosts all of my IoT tchotchkes which helps prevent major security issues, and the host running Home Assistant is connected to both the IoT VLAN for managing these devices, as well as the main network to allow users to interact with it.

With the Gateway Pro setup, the Home Assistant host would get two DHCP leases, one for each VLAN, and more critically, its domain name would resolve to one or the other IP address depending on which VLAN the DNS query came from.

This didn’t work in my naive setup, as it turns out that a single dnsmasq instance can only associate a particular hostname with a single DHCP lease file entry, which means that the multi-VLAN host would always resolve to a single IP address regardless of the source VLAN (with only one lease having the correct hostname in the lease database, and the others listing * as the hostname).

The solution was running separate dnsmasq instances for DHCP (one per VLAN) plus a dedicated dnsmasq instance for DNS only, which is actually pretty much how the Unifi gateway setup works as well. The DHCP instances never listen on port 53; they just hand out addresses and use dnsmasq’s dhcp-script hook to write individual /etc/hosts-compatible files into a shared directory.

The DNS instance then uses hostsdir=/run/dnsmasq/hosts.d to pick up all those per-IP hosts files and resolve them. Since each file is keyed by IP address rather than hostname, there’s no conflict when the same device has leases on multiple VLANs: you just get multiple files for different IPs of the same host.

The pieces

Each DHCP instance is enabled as a systemd template unit (dnsmasq-dhcp@<VLAN>.service) so they can be managed independently, and each instance’s configuration is generated from my Ansible inventory.

It looks something like this (with some cruft removed for brevity):

domain=example.com

interface=

port=0

dhcp-authoritative
dhcp-leasefile=/var/lib/dnsmasq/leases-
dhcp-script=/usr/local/bin/dhcp-script.sh
script-on-renewal

The values after interface= and leases- are template placeholders (not shown above) that render to the actual netif device and to the netif's name, respectively. Also note port=0, which means this instance only does DHCP, never DNS.

The dhcp-script.sh hook runs whenever a lease is added, renewed, or deleted, and writes one file per IP address into /run/dnsmasq/hosts.d/, containing the full hostname (with domain) plus the short name.

I couldn’t find an example of a similar setup anywhere, and Unifi’s dhcp-script seems to be a binary, so I couldn’t quite inspect what it does. I ended up vibecoding my own script.
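As a rough illustration of the idea (a minimal sketch, not the actual script), a hook along these lines would do the job. It relies on dnsmasq's documented calling convention for dhcp-script (action, MAC address, IP address and an optional hostname as arguments, plus the DNSMASQ_DOMAIN environment variable); the HOSTS_DIR variable and the handle_lease function name are assumptions made up for this example.

```shell
#!/bin/sh
# Sketch of a dnsmasq dhcp-script hook (hypothetical, not the original script).
# dnsmasq calls it as: <action> <mac> <ip> [<hostname>]
# <action> is "add", "old" (an existing lease reported again, e.g. on renewal
# when script-on-renewal is set) or "del". Other actions are ignored here.

# Assumed location, matching hostsdir= in the DNS instance; can be overridden
# from the environment, which makes the script easy to try out safely.
HOSTS_DIR="${HOSTS_DIR:-/run/dnsmasq/hosts.d}"

handle_lease() {
    action="$1"
    ip="$3"
    name="${4:-}"
    file="$HOSTS_DIR/$ip"   # one file per leased IP address

    case "$action" in
        add|old)
            # Without a hostname there is nothing to publish for this address.
            if [ -z "$name" ]; then
                rm -f "$file"
                return 0
            fi
            mkdir -p "$HOSTS_DIR"
            # dnsmasq exports the domain part in DNSMASQ_DOMAIN when it is known.
            if [ -n "${DNSMASQ_DOMAIN:-}" ]; then
                printf '%s %s.%s %s\n' "$ip" "$name" "$DNSMASQ_DOMAIN" "$name" > "$file"
            else
                printf '%s %s\n' "$ip" "$name" > "$file"
            fi
            ;;
        del)
            rm -f "$file"
            ;;
    esac
}

# Only act when invoked with hook-style arguments.
if [ "$#" -ge 3 ]; then
    handle_lease "$@"
fi
```

With a host called hass holding leases on two VLANs (the addresses 192.168.10.5 and 192.168.20.5 are made up), this leaves two files in the directory, one per address, each mapping its own IP to hass.example.com and hass.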

And for completeness, the DNS-only instance config is as follows:

domain=example.com

bind-dynamic
bogus-priv
hostsdir=/run/dnsmasq/hosts.d
localise-queries
no-hosts
no-resolv

local=/example.com/
server=127.0.0.1#5053

The hostsdir directive makes dnsmasq load all files from that directory as if they were static host entries, and localise-queries ensures that, when a name has several addresses, each client gets the IP address on the subnet of the interface its query arrived on.
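To make the effect of localise-queries concrete, here is a simplified model of the selection rule in shell. It is only an illustration of the documented behaviour (prefer the addresses on the same subnet as the receiving interface, otherwise return everything), not dnsmasq's actual code; the function names and all addresses are made up, and it only handles IPv4.

```shell
#!/bin/sh
# Simplified model of localise-queries (illustration only, IPv4 only).

# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    IFS=. read -r a b c d <<EOF
$1
EOF
    echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# localise <interface-ip> <prefix-length> <candidate-address>...
# Prints the candidates on the interface's subnet, or all of them if none match.
localise() {
    iface_ip="$1"
    prefix="$2"
    shift 2
    mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    net=$(( $(ip_to_int "$iface_ip") & mask ))
    matched=""
    for addr in "$@"; do
        if [ $(( $(ip_to_int "$addr") & mask )) -eq "$net" ]; then
            matched="$matched $addr"
        fi
    done
    if [ -n "$matched" ]; then
        echo $matched   # unquoted on purpose, to drop the leading space
    else
        echo "$*"
    fi
}

# A query arriving on the interface 192.168.20.1/24 for a host with two addresses:
localise 192.168.20.1 24 192.168.10.5 192.168.20.5   # prints 192.168.20.5
```

This is why a host with one hosts file per VLAN resolves to a different address depending on where the query comes from.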

dnscrypt-proxy

Finally, upstream DNS resolution is handled by dnscrypt-proxy (largely to get all the nice modern goodies like DNS over HTTPS), which I run as a systemd socket-activated service on 127.0.0.1:5053.

The default Debian package listens on 127.0.2.1:53, so I just had to override the socket unit file to change the address and port:

[Socket]
ListenStream=
ListenDatagram=
ListenStream=127.0.0.1:5053
ListenDatagram=127.0.0.1:5053

This avoids conflicting with dnsmasq’s own DNS on port 53.
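Since the address and port now live in two places (the server= line in dnsmasq and the socket override), a small sanity check can catch the two drifting apart. The following is a sketch under assumptions: the function names are made up, it only understands plain server=ADDRESS#PORT lines, and the file paths in the usage comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: check that dnsmasq's upstream matches the socket override.

# Print "address:port" for each "server=ADDRESS#PORT" line in a dnsmasq config.
# Domain-specific forms such as server=/example.com/... are deliberately skipped.
dnsmasq_upstreams() {
    sed -n 's/^server=\([^/#]*\)#\([0-9][0-9]*\)$/\1:\2/p' "$1"
}

# Print each non-empty ListenDatagram= value from a systemd socket unit file.
# Empty assignments only reset the inherited list, so they are skipped.
socket_listeners() {
    sed -n 's/^ListenDatagram=\(..*\)$/\1/p' "$1"
}

# Succeed only if every upstream found has a matching UDP listener.
upstreams_match() {
    for upstream in $(dnsmasq_upstreams "$1"); do
        socket_listeners "$2" | grep -qxF "$upstream" || return 1
    done
}

# Usage (paths are assumptions, adjust to the actual layout):
#   upstreams_match /etc/dnsmasq.d/dns.conf \
#       /etc/systemd/system/dnscrypt-proxy.socket.d/override.conf \
#       || echo "dnsmasq upstream and dnscrypt-proxy socket disagree" >&2
```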

Anyways, I hope this will be useful to the unfortunate souls (human or otherwise) that decide that hosting their own dnsmasq is “probably just going to take 10 minutes” in the future.