# Deployment: Docker / docker-compose
The published image is a 14 MB distroless static container with the
`qu` binary as the entrypoint. It runs as root by default so the
daemon can bind privileged ports and open ICMP sockets; override with
`--user` if your host doesn't need that.
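If you drop root, the compose-level equivalent of `--user` is a `user:` key. A sketch only: `65532:65532` is the conventional distroless `nonroot` UID/GID and is an assumption here; any unprivileged pair works, provided the ICMP notes further down are honoured.
```yaml
services:
  quptime:
    # Unprivileged run: no privileged ports, and UDP-mode pings need
    # net.ipv4.ping_group_range to cover this GID (see below).
    user: "65532:65532"
```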
## Image references
```
git.cer.sh/axodouble/quptime:master # tip of main, multi-arch
git.cer.sh/axodouble/quptime:v0.1.0 # tagged release
git.cer.sh/axodouble/quptime:v0.1.0-amd64 # single-arch (if you must pin)
```
The image embeds `QUPTIME_DIR=/etc/quptime` and declares it a volume —
treat it as the only piece of state worth persisting.
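Both claims can be verified straight from the image metadata with plain `docker image inspect`; the Go template only filters the output:
```sh
docker image inspect git.cer.sh/axodouble/quptime:v0.1.0 \
  --format 'env={{.Config.Env}} volumes={{.Config.Volumes}} entrypoint={{.Config.Entrypoint}}'
```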
## Single-node, single-container compose
For a development cluster or a single-node smoke test:
```yaml
# compose.yaml
services:
  quptime:
    image: git.cer.sh/axodouble/quptime:v0.1.0
    container_name: quptime
    restart: unless-stopped
    ports:
      - "9901:9901"
    volumes:
      - quptime-data:/etc/quptime
    # ICMP UDP-mode pings need a permissive sysctl on the host:
    #   sysctl net.ipv4.ping_group_range="0 2147483647"
    # Or grant CAP_NET_RAW (more accurate, raw ICMP).
    cap_add:
      - NET_RAW
volumes:
  quptime-data:
```
You must run **`qu init` before the daemon will start**. With this compose
file:
```sh
docker compose run --rm quptime init --advertise <host-ip>:9901
docker compose up -d
docker compose exec quptime qu status
```
`<host-ip>` must be reachable from every other node — the loopback
address inside the container is useless to peers.
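If you're not sure which address qualifies, list the host's globally scoped addresses and pick the one the other nodes can actually route to (a generic Linux one-liner, nothing `qu`-specific):
```sh
ip -4 -o addr show scope global | awk '{print $2, $4}'
```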
## Three-node compose on a single host
For local testing of the full quorum machinery without three machines:
```yaml
# compose.yaml
x-quptime: &quptime
  image: git.cer.sh/axodouble/quptime:v0.1.0
  restart: unless-stopped
  cap_add:
    - NET_RAW
services:
  alpha:
    <<: *quptime
    container_name: alpha
    ports: ["9901:9901"]
    volumes: ["alpha-data:/etc/quptime"]
  bravo:
    <<: *quptime
    container_name: bravo
    ports: ["9902:9901"]
    volumes: ["bravo-data:/etc/quptime"]
  charlie:
    <<: *quptime
    container_name: charlie
    ports: ["9903:9901"]
    volumes: ["charlie-data:/etc/quptime"]
volumes:
  alpha-data:
  bravo-data:
  charlie-data:
```
Bootstrap:
```sh
# First node: prints the secret to stdout.
docker compose run --rm alpha init --advertise alpha:9901
# Capture the printed secret. The distroless image has no shell or
# `cat`, so you can't exec it back out of the container; if you lose
# it, read node.yaml out of the alpha-data volume on the host instead.
SECRET='<paste the secret printed above>'
docker compose run --rm bravo init --advertise bravo:9901 --secret "$SECRET"
docker compose run --rm charlie init --advertise charlie:9901 --secret "$SECRET"
docker compose up -d
# Invite from alpha. The hostnames resolve over the compose network.
docker compose exec alpha qu node add bravo:9901
sleep 3 # wait for heartbeats before the next add
docker compose exec alpha qu node add charlie:9901
docker compose exec alpha qu status
```
For a cluster on three separate hosts, replicate the compose file on
each box with different `advertise` addresses (the public hostname or
the overlay IP) and bootstrap the same way.
## Multi-host compose
The natural unit is one compose file per host, each running one
`qu` container. The minimum-viable file per host:
```yaml
# /etc/qu-stack/compose.yaml
services:
  quptime:
    image: git.cer.sh/axodouble/quptime:v0.1.0
    container_name: quptime
    restart: unless-stopped
    ports:
      - "9901:9901"
    volumes:
      - /srv/quptime/data:/etc/quptime
    cap_add:
      - NET_RAW
```
Persistence is a bind-mount under `/srv/quptime/data` so backups and
upgrades hit a known path. See [operations.md](../operations.md) for
the backup recipe.
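A minimal cold-backup sketch against that path, assuming a few seconds of probe downtime is acceptable; treat [operations.md](../operations.md) as authoritative:
```sh
docker compose -f /etc/qu-stack/compose.yaml stop quptime
sudo tar -C /srv/quptime -czf "/srv/quptime-backup-$(date +%F).tar.gz" data
docker compose -f /etc/qu-stack/compose.yaml start quptime
```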
TCP/9901 must be reachable between the hosts. If the boxes don't
share a private network, prefer the
[Tailscale recipe](tailscale.md) over exposing 9901 directly — see
[public-internet.md](public-internet.md) for the threat model if you
must expose it.
## Behind a reverse proxy
**Don't.** `qu` is mTLS-pinned at the application layer, so a
TLS-terminating proxy would force the daemon to trust whatever cert the
proxy presents — defeating fingerprint pinning. If you need a single
public address per node, use a Layer 4 TCP proxy (`nginx stream`,
HAProxy `mode tcp`, or a plain firewall NAT) that forwards bytes
without touching them.
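For example, an `nginx stream` pass-through that forwards raw TCP and never terminates TLS; addresses here are placeholders, and HAProxy's `mode tcp` is the direct equivalent:
```nginx
# /etc/nginx/nginx.conf (top level, next to the http block)
stream {
    server {
        listen 203.0.113.5:9901;    # public address
        proxy_pass 10.0.0.10:9901;  # qu daemon on the private interface
    }
}
```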
## Image internals
Build locally if you want to inspect what you're running:
```sh
docker buildx build \
--build-arg VERSION=$(git describe --tags --always) \
--platform linux/amd64,linux/arm64 \
--file docker/Dockerfile \
--tag quptime:dev \
--load \
.
```
The Dockerfile (see `docker/Dockerfile`) is two stages: a `golang:1.24-alpine`
builder that cross-compiles with `-trimpath -ldflags "-s -w"`, and a
`gcr.io/distroless/static-debian12` runtime. No shell, no package
manager, no SSH; you cannot `docker exec -it sh` into it. Use
`docker exec quptime qu ...` for everything.
## Healthcheck
The container exits non-zero if the daemon crashes, so the default
`restart: unless-stopped` policy is enough for liveness. A more
useful readiness check has to invoke the `qu` binary itself:
```yaml
healthcheck:
  test: ["CMD", "/usr/local/bin/qu", "status"]
  interval: 30s
  timeout: 5s
  retries: 3
  start_period: 10s
```
`qu status` exits 0 when the daemon socket is reachable and the
control RPC succeeds — it does **not** fail on quorum loss. That's
intentional: restarting a quorum-less node won't bring quorum back,
and a healthcheck that flaps a follower in and out of `unhealthy`
state every time the master is briefly unreachable is worse than no
check. If you want a stricter readiness signal, pipe `qu status`
through `grep -q 'quorum true'`.
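The distroless image has no shell to host that pipeline, so run it from the host side. A sketch that assumes `qu status` prints a literal `quorum true` line; check your version's output before alerting on it:
```sh
docker exec quptime qu status | grep -q 'quorum true' \
  || echo "quptime: daemon up but no quorum on $(hostname)"
```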
# Deployment: public-internet exposure
If your nodes do not share a private network and you can't put an
overlay between them (see [tailscale.md](tailscale.md)), this is the
recipe for exposing TCP/9901 directly to the open internet without
losing sleep.
The short version: `qu` is designed for this — every inbound call is
mTLS-pinned at the application layer and gated by the cluster secret
— but defence in depth is cheap and you should take it.
## Threat model in one paragraph
Anyone on the internet can establish a TLS connection to `:9901`
because the daemon must accept handshakes from currently-untrusted
peers (otherwise no node could ever join). The RPC dispatcher then
rejects every method except `Join` for callers whose fingerprint
isn't in `trust.yaml`. `Join` itself is gated by the **cluster
secret**, compared in constant time. So the realistic attack surface
is:
1. The TLS 1.3 stack accepting handshakes from arbitrary peers.
2. The `Join` handler's secret check and downstream cert ingestion.
3. The blast radius of a leaked cluster secret (an attacker who has
it can enrol themselves as a peer and propose mutations, which is
game over).
What can't trivially happen:
- A random attacker observing or modifying cluster traffic — TLS 1.3
with fingerprint pinning sees to that.
- A random attacker calling any method other than `Join` — the RPC
dispatcher refuses.
What you should still do:
- Treat `node.yaml.cluster_secret` like an SSH host key. Out-of-band
distribution only. Never in git, never in CI logs, never in chat.
- Rate-limit and IP-allowlist where you can. The `Join` handler does
not currently rate-limit at the application layer, so a determined
attacker could try secrets at TLS-handshake rate.
- Run on a non-default port if your operations workflow allows it.
Doesn't add security, but reduces background internet noise in the
logs and makes IDS / WAF rules cleaner.
## Firewall
### nftables (recommended)
A drop-in `/etc/nftables.d/quptime.nft`:
```nft
table inet filter {
    set quptime_peers {
        type ipv4_addr
        elements = { 198.51.100.10, 198.51.100.11, 198.51.100.12 }
    }
    chain quptime_input {
        # Drop everything that didn't come from a known peer.
        ip saddr @quptime_peers tcp dport 9901 accept
        tcp dport 9901 log prefix "quptime-drop: " level info drop
    }
    chain input {
        type filter hook input priority 0; policy drop;
        ct state established,related accept
        iif lo accept
        jump quptime_input
        # ... your other rules
    }
}
```
The allowlist is the highest-ROI mitigation by far — if you maintain
fixed IPs for your monitor nodes, use this and move on.
### ufw
```sh
sudo ufw allow from 198.51.100.10 to any port 9901 proto tcp
sudo ufw allow from 198.51.100.11 to any port 9901 proto tcp
sudo ufw allow from 198.51.100.12 to any port 9901 proto tcp
```
### Dynamic peer IPs
If peer IPs aren't fixed (e.g., one node is on a home connection with
a rotating address), you have three options ranked by preference:
1. Use an overlay instead — see [tailscale.md](tailscale.md). This is
the right answer.
2. DNS-based allowlisting (`ipset`-from-DNS or a small reconciler that
re-resolves an allowlist hostname every minute; a sketch follows
after this list). Beware: a compromised DNS resolver becomes a
compromise of the allowlist.
3. Drop the allowlist and rely solely on the cluster secret + mTLS.
This is what `qu` is designed to survive; just be sure the secret
actually has the entropy `qu init` generated for it (32 random
bytes, base64-encoded).
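The sketch for option 2: a cron-driven reconciler that re-resolves an allowlist hostname and refreshes the `quptime_peers` set from the nftables example above. The script path and hostname are placeholders.
```sh
#!/bin/sh
# /usr/local/bin/quptime-allowlist-sync (placeholder path); run from cron every minute.
set -eu
ips=$(getent ahostsv4 peers.example.com | awk '{print $1}' | sort -u | paste -sd, -)
[ -n "$ips" ] || exit 0      # refuse to wipe the set on a failed lookup
nft flush set inet filter quptime_peers
nft add element inet filter quptime_peers "{ $ips }"
```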
## Rate-limiting failed handshakes
`qu` does not currently rate-limit `Join` attempts at the application
layer. You can do it at the firewall, which catches both connect
floods and slow brute-force:
```nft
table inet filter {
    chain quptime_input {
        tcp dport 9901 ct state new \
            meter quptime_ratemeter { ip saddr limit rate over 10/second } \
            log prefix "quptime-rate: " drop
        tcp dport 9901 accept
    }
}
```
Or `fail2ban` with a tiny custom filter that watches `journalctl -u
quptime` for repeated `peer rejected join` lines:
```ini
# /etc/fail2ban/filter.d/quptime.conf
[Definition]
failregex = ^.*quptime:.*peer rejected join.*from <ADDR>.*$
```
```ini
# /etc/fail2ban/jail.d/quptime.local
[quptime]
enabled = true
filter = quptime
backend = systemd
journalmatch = _SYSTEMD_UNIT=quptime.service
maxretry = 3
findtime = 600
bantime = 86400
```
Note: the daemon doesn't currently log the *peer address* on rejected
joins. The log filter above is illustrative; check what your version
actually emits before relying on it.
## Secret hygiene
The single most important thing on a public-internet deployment:
- **Generate the secret on the first node.** `qu init` with no
`--secret` produces 32 random bytes from `crypto/rand`,
base64-encoded. Don't replace that with something memorable (a quick
length check is sketched after this list).
- **Transport out of band.** Paste it into your secret manager
immediately; share via 1Password / Vault / encrypted email.
- **Rotate if anyone with access has left.** Rotation isn't a CLI
command; do it the brute-force way: `qu init` a fresh cluster on
new ports, re-add every check via `cluster.yaml` export, swap DNS.
- **One secret per cluster.** Do not reuse the secret across staging
and prod, or across customers if you run several clusters.
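A quick sanity check that a pasted secret still looks like what `qu init` generates, assuming plain base64 of 32 bytes; the variable name is only an example:
```sh
printf '%s' "$CLUSTER_SECRET" | base64 -d | wc -c   # expect 32
```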
## Non-default ports
```sh
# Each node, in node.yaml — or pass --port on init.
qu init --advertise alpha.example.com:51234 --port 51234
```
Open the corresponding firewall rule, restart the daemon. The
cluster doesn't require uniform ports across nodes; each peer's
`advertise` field tells everyone else what to dial.
## What you should monitor on a public deployment
- `term` from `qu status` — if it's ticking up frequently the master
is flapping, which probably means at least one peer's network is
unstable. Could be benign, could be a probe attempt.
- The firewall drop counter on the `quptime-drop` rule above (a
counting one-liner follows this list).
- The number of TLS handshakes on `:9901`. A spike in handshakes that
don't progress to a successful RPC is the signature of a brute-force
on the cluster secret.
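For the drop counter, a minimal way to count what the `quptime-drop` rule logged, reading the kernel log via `journalctl -k`; adjust the window to your alerting cadence:
```sh
journalctl -k --since "1 hour ago" | grep -c 'quptime-drop: '
```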
For the operational side — backups, upgrades, recovery — see
[operations.md](../operations.md).
# Deployment: systemd on bare metal / VM
The canonical way to run `qu` on a Linux host. Single static binary,
managed by systemd, with a hardened unit file. Most production users
should start here.
## Audience and assumptions
- You have root (or `sudo`) on the host.
- You have at least three hosts that can reach each other on TCP/9901.
(Three is the minimum for a useful quorum; fewer is fine for
development but a 2-node cluster offers no consensus protection.)
- The hosts have a way to authenticate each other — direct IP or a
resolvable hostname is fine. For overlay networks see
[tailscale.md](tailscale.md).
## Install the binary
See [installation.md](../installation.md). The official `install.sh`
script writes a *minimal* unit file that's fine for development. For
production replace it with the hardened version below.
## Create a dedicated user
Running as a dedicated unprivileged user is best practice, but ICMP
support adds a wrinkle — see the next section.
```sh
sudo useradd --system --no-create-home --shell /usr/sbin/nologin quptime
sudo install -d -o quptime -g quptime -m 0750 /etc/quptime
sudo install -d -o quptime -g quptime -m 0750 /var/run/quptime
```
## ICMP capabilities
ICMP probes have two implementations:
1. **Unprivileged UDP pings** — Linux's `dgram` ICMP socket. Works on
any modern kernel without elevated privileges, but only if
`net.ipv4.ping_group_range` includes the daemon's GID. This is the
default in `qu`.
2. **Raw ICMP** — requires `CAP_NET_RAW`; gives more accurate latency
numbers and works for IPv6 on arbitrary kernels.
The simplest path: stick with unprivileged pings and widen
`ping_group_range`. Sysctl, persistent across reboots:
```sh
# /etc/sysctl.d/10-quptime.conf
net.ipv4.ping_group_range = 0 2147483647
```
```sh
sudo sysctl --system
```
If you need raw ICMP instead, grant the capability on the binary:
```sh
sudo setcap cap_net_raw=+ep /usr/local/bin/qu
```
Note that the file capability set by `setcap` is lost every time the
`qu` binary is replaced — bake the `setcap` call into your deploy
script, or re-run it after each package update.
## Hardened unit file
Drop this in `/etc/systemd/system/quptime.service`:
```ini
[Unit]
Description=QUptime distributed uptime monitor
Documentation=https://git.cer.sh/axodouble/quptime
Wants=network-online.target
After=network-online.target
[Service]
Type=simple
ExecStart=/usr/local/bin/qu serve
Restart=always
RestartSec=5s
User=quptime
Group=quptime
# Where state lives. RuntimeDirectory creates /var/run/quptime each
# time the service starts, owned by User:Group with mode 0750.
Environment=QUPTIME_DIR=/etc/quptime
RuntimeDirectory=quptime
RuntimeDirectoryMode=0750
ReadWritePaths=/etc/quptime /var/run/quptime
# Hardening. Comment out individual directives if a probe needs
# something we've revoked.
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
PrivateTmp=true
PrivateDevices=true
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectControlGroups=true
ProtectClock=true
ProtectHostname=true
RestrictNamespaces=true
RestrictRealtime=true
RestrictSUIDSGID=true
LockPersonality=true
MemoryDenyWriteExecute=true
# Network access is required (we're a network monitor). Keep address
# families minimal — AF_NETLINK is needed for some libc lookups.
RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6 AF_NETLINK
# If you need raw ICMP, *also* uncomment:
# AmbientCapabilities=CAP_NET_RAW
# CapabilityBoundingSet=CAP_NET_RAW
# Otherwise drop all capabilities:
CapabilityBoundingSet=
[Install]
WantedBy=multi-user.target
```
Reload systemd and enable:
```sh
sudo systemctl daemon-reload
sudo systemctl enable quptime.service
```
## Initialise the node
**Don't start the service yet.** `qu init` must run first, and it
must run as the `quptime` user so it creates files with the right
ownership.
On the **first** host (it will print a secret; copy it):
```sh
sudo -u quptime QUPTIME_DIR=/etc/quptime \
qu init --advertise alpha.example.com:9901
```
On every **other** host (paste the secret):
```sh
sudo -u quptime QUPTIME_DIR=/etc/quptime \
qu init --advertise bravo.example.com:9901 --secret '<paste>'
sudo -u quptime QUPTIME_DIR=/etc/quptime \
qu init --advertise charlie.example.com:9901 --secret '<paste>'
```
## Open the firewall
`qu` needs TCP/9901 reachable between cluster members. Adjust to your
firewall:
```sh
# ufw
sudo ufw allow from <peer-ip> to any port 9901 proto tcp
# firewalld
sudo firewall-cmd --permanent --zone=internal \
--add-rich-rule='rule family="ipv4" source address="<peer-ip>" port port="9901" protocol="tcp" accept'
sudo firewall-cmd --reload
# nftables (drop-in)
table inet filter {
chain input {
ip saddr { 10.0.0.10, 10.0.0.11, 10.0.0.12 } tcp dport 9901 accept
}
}
```
For exposing 9901 to the open internet see
[public-internet.md](public-internet.md).
## Start the daemon
```sh
sudo systemctl start quptime
sudo systemctl status quptime
journalctl -u quptime -f
```
## Invite peers
From one node (typically `alpha`):
```sh
sudo -u quptime qu node add bravo.example.com:9901
# Pause a few seconds so heartbeats reach the new peer before the next add —
# otherwise the "needs ≥2 live to mutate" check rejects the second invite.
sleep 3
sudo -u quptime qu node add charlie.example.com:9901
```
`qu node add` prints each remote's fingerprint and asks for SSH-style
confirmation. Verify it matches an out-of-band channel (the remote
operator can show their fingerprint with
`sudo -u quptime qu status` or by reading `trust.yaml`).
## Verify
```sh
sudo -u quptime qu status
```
Expect to see all three peers `live=true` and one of them as
`master`.
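To script that expectation, a sketch that leans on the `live=true` wording above; confirm the exact output of your `qu` version before wiring it into alerting:
```sh
test "$(sudo -u quptime qu status | grep -c 'live=true')" -ge 3 \
  && echo "cluster healthy" || echo "cluster degraded"
```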
## Log scraping
`journalctl -u quptime` is the canonical log stream. Notable lines:
| Pattern | Meaning |
| ------------------------------------------------------------- | --------------------------------------------------------- |
| `listening on ... as node ...` | Daemon up. |
| `manual-edit: cluster.yaml changed externally — replicating…` | An operator edited `cluster.yaml` directly. |
| `manual-edit: parse cluster.yaml: ...` | Invalid YAML on disk; the operator must fix and re-save. |
| `report to master ...: <err>` | A follower couldn't ship a probe result to the master. |
| `replicate: pull from ...: <err>` | A follower couldn't pull a higher-version config snapshot. |
## Sample reload / restart drill
After editing the unit file:
```sh
sudo systemctl daemon-reload
sudo systemctl restart quptime
```
After editing `cluster.yaml` by hand:
```sh
sudoedit /etc/quptime/cluster.yaml
# No restart needed — the watcher picks it up within 2s and pushes to master.
```
After upgrading the binary:
```sh
sudo install -m 0755 qu-new /usr/local/bin/qu
sudo setcap cap_net_raw=+ep /usr/local/bin/qu # if you use raw ICMP
sudo systemctl restart quptime
```
Doing rolling upgrades? See [operations.md](../operations.md).
# Deployment: Tailscale / WireGuard overlay
When your nodes live in different networks — different VPS providers,
different physical sites, a mix of home and cloud — exposing TCP/9901
to the open internet is a poor idea. An overlay network gives every
node a stable private IP regardless of NAT, and `qu` only needs to
listen on that overlay address.
This page focuses on Tailscale because the repo ships an example
compose for it, but everything generalises to WireGuard, Nebula, or a
self-hosted Headscale.
## The big idea
```
+--- host A (VPS, no public ICMP) ----+
| tailscale ←→ overlay ip 100.64.1.1  |
| qu listening on 100.64.1.1:9901     |
+-------------------------------------+
                  │ mTLS over overlay
+--- host B (homelab behind NAT) -----+
| tailscale ←→ overlay ip 100.64.1.2  |
| qu listening on 100.64.1.2:9901     |
+-------------------------------------+
```
`bind_addr` is set to the tailscale IP, the host's public interface
has no port 9901 open, and the cluster secret + mTLS handshake gate
the link inside the tunnel.
## Compose recipe
The repo ships [`docker/docker-compose-tailscale.yml`](../../docker/docker-compose-tailscale.yml).
The relevant trick is `network_mode: "service:tailscale"` — the
`quptime` container shares the network namespace of the `tailscale`
sidecar so it sees the tailnet as its own interface.
```yaml
services:
  tailscale:
    image: tailscale/tailscale:latest
    container_name: tailscale
    cap_add: [NET_ADMIN]
    environment:
      - TS_AUTHKEY=${TAILSCALE_AUTHKEY}   # provision via .env
      - TS_HOSTNAME=quptime-${HOST}       # name visible in admin
    volumes:
      - /dev/net/tun:/dev/net/tun
      - tailscale:/var/lib/tailscale
    restart: unless-stopped
  quptime:
    image: git.cer.sh/axodouble/quptime:v0.1.0
    container_name: quptime
    volumes:
      - quptime:/etc/quptime
    network_mode: "service:tailscale"
    depends_on: [tailscale]
    cap_add: [NET_RAW]
    # No restart directive yet — needs `qu init` first.
volumes:
  tailscale:
  quptime:
```
### One-time bootstrap
Each host runs the same script with different `HOST` and `TAILSCALE_AUTHKEY`:
```sh
# .env
HOST=alpha
TAILSCALE_AUTHKEY=tskey-auth-xxxxxxxx
```
Start Tailscale alone first so it gets an IP:
```sh
docker compose up -d tailscale
sleep 5
TSIP=$(docker compose exec tailscale tailscale ip -4)
echo "this node's tailnet IP: $TSIP"
```
On the **first** host, init without `--secret`:
```sh
docker compose run --rm quptime init --advertise "$TSIP:9901"
# Grab the printed secret; pipe through your password manager.
```
On every **other** host, paste the secret:
```sh
docker compose run --rm quptime init \
--advertise "$TSIP:9901" \
--secret "$CLUSTER_SECRET"
```
Then bring up `qu` on every node and invite from the first:
```sh
# Each host
docker compose up -d quptime
# From alpha
docker compose exec quptime qu node add 100.64.1.2:9901
sleep 3
docker compose exec quptime qu node add 100.64.1.3:9901
docker compose exec quptime qu status
```
## Tailscale ACLs
Belt and braces — even though mTLS pins identities, lock down the
tailnet itself so only the `qu` nodes can reach each other's :9901.
In the Tailscale admin console:
```jsonc
{
  "tagOwners": { "tag:qu-node": ["group:ops"] },
  "acls": [
    {
      "action": "accept",
      "src": ["tag:qu-node"],
      "dst": ["tag:qu-node:9901"]
    }
    // ...your other rules
  ]
}
```
Then tag every `qu` node in its auth key:
```yaml
environment:
  - TS_AUTHKEY=${TAILSCALE_AUTHKEY}?ephemeral=false&tags=tag:qu-node
```
## WireGuard / Nebula / Headscale equivalents
The recipe generalises:
1. Provision the overlay interface on each host with a stable
private IP (the tunnel's own address).
2. `qu init --advertise <overlay-ip>:9901`.
3. Set `bind_addr: <overlay-ip>` in `node.yaml` so the daemon does
**not** also listen on the public interface.
4. Open `:9901` only on the overlay interface in your firewall — for
nftables that's something like `iifname "wg0" tcp dport 9901
accept`.
The cluster secret and mTLS fingerprints still apply; the overlay just
removes the open-internet attack surface.
## Why prefer overlay over public exposure
- Defence in depth at the network layer: an attacker who finds an
exploit in your overlay client (rare; Tailscale and WireGuard are
small surfaces) still hits the application-layer pinning before any
cluster-level operation.
- The cluster secret can be lower-entropy when it's already
unreachable from outside. (You should still treat it as a real
secret; "defence in depth" only works if every layer is real.)
- ICMP probes from a homelab to a target on the public internet are
trivial through NAT, but ICMP *into* a homelab usually isn't.
Running `qu` on a tailnet means peers can heartbeat each other
regardless of NAT direction.
## Trade-offs
- One more thing to monitor. If your tailnet is down, your monitor is
down. Counter-measure: run *another* tiny `qu` cluster (or a single
node) on the public internet that watches the overlay's coordinator
health.
- Probe latency includes the overlay's hop. Tailscale's WireGuard path
is fast (<1 ms LAN, single-digit ms WAN) so this rarely matters, but
if you're alerting on tight latency thresholds, account for it.