This commit is contained in:
@@ -0,0 +1,250 @@
|
||||
# Deployment: systemd on bare metal / VM
|
||||
|
||||
The canonical way to run `qu` on a Linux host. Single static binary,
|
||||
managed by systemd, with a hardened unit file. Most production users
|
||||
should start here.
|
||||
|
||||
## Audience and assumptions
|
||||
|
||||
- You have root (or `sudo`) on the host.
|
||||
- You have at least three hosts that can reach each other on TCP/9901.
|
||||
(Three is the minimum for a useful quorum; fewer is fine for
|
||||
development but a 2-node cluster offers no consensus protection.)
|
||||
- The hosts have a way to authenticate each other — direct IP or a
|
||||
resolvable hostname is fine. For overlay networks see
|
||||
[tailscale.md](tailscale.md).
|
||||
|
||||
## Install the binary
|
||||
|
||||
See [installation.md](../installation.md). The official `install.sh`
|
||||
script writes a *minimal* unit file that's fine for development. For
|
||||
production replace it with the hardened version below.
|
||||
|
||||
## Create a dedicated user
|
||||
|
||||
Running as a dedicated unprivileged user is best practice, but ICMP
|
||||
support adds a wrinkle — see the next section.
|
||||
|
||||
```sh
|
||||
sudo useradd --system --no-create-home --shell /usr/sbin/nologin quptime
|
||||
sudo install -d -o quptime -g quptime -m 0750 /etc/quptime
|
||||
sudo install -d -o quptime -g quptime -m 0750 /var/run/quptime
|
||||
```
|
||||
|
||||
## ICMP capabilities
|
||||
|
||||
ICMP probes have two implementations:
|
||||
|
||||
1. **Unprivileged UDP pings** — Linux's `dgram` ICMP socket. Works on
|
||||
any modern kernel without elevated privileges, but only if
|
||||
`net.ipv4.ping_group_range` includes the daemon's GID. This is the
|
||||
default in `qu`.
|
||||
2. **Raw ICMP** — requires `CAP_NET_RAW`, more accurate latency
|
||||
numbers and works for IPv6 from arbitrary kernels.
|
||||
|
||||
The simplest path: stick with unprivileged pings and widen
|
||||
`ping_group_range`. Sysctl, persistent across reboots:
|
||||
|
||||
```sh
|
||||
# /etc/sysctl.d/10-quptime.conf
|
||||
net.ipv4.ping_group_range = 0 2147483647
|
||||
```
|
||||
|
||||
```sh
|
||||
sudo sysctl --system
|
||||
```
|
||||
|
||||
If you need raw ICMP instead, grant the capability on the binary:
|
||||
|
||||
```sh
|
||||
sudo setcap cap_net_raw=+ep /usr/local/bin/qu
|
||||
```
|
||||
|
||||
Note that `setcap` is overwritten by every `qu` upgrade — bake the
|
||||
`setcap` call into your deploy script, or re-run it after each
|
||||
package update.
|
||||
|
||||
## Hardened unit file
|
||||
|
||||
Drop this in `/etc/systemd/system/quptime.service`:
|
||||
|
||||
```ini
|
||||
[Unit]
|
||||
Description=QUptime distributed uptime monitor
|
||||
Documentation=https://git.cer.sh/axodouble/quptime
|
||||
Wants=network-online.target
|
||||
After=network-online.target
|
||||
|
||||
[Service]
|
||||
Type=simple
|
||||
ExecStart=/usr/local/bin/qu serve
|
||||
Restart=always
|
||||
RestartSec=5s
|
||||
|
||||
User=quptime
|
||||
Group=quptime
|
||||
|
||||
# Where state lives. RuntimeDirectory creates /var/run/quptime/ each
|
||||
# boot owned by User:Group with mode 0750.
|
||||
Environment=QUPTIME_DIR=/etc/quptime
|
||||
RuntimeDirectory=quptime
|
||||
RuntimeDirectoryMode=0750
|
||||
ReadWritePaths=/etc/quptime /var/run/quptime
|
||||
|
||||
# Hardening. Comment out individual directives if a probe needs
|
||||
# something we've revoked.
|
||||
NoNewPrivileges=true
|
||||
ProtectSystem=strict
|
||||
ProtectHome=true
|
||||
PrivateTmp=true
|
||||
PrivateDevices=true
|
||||
ProtectKernelTunables=true
|
||||
ProtectKernelModules=true
|
||||
ProtectControlGroups=true
|
||||
ProtectClock=true
|
||||
ProtectHostname=true
|
||||
RestrictNamespaces=true
|
||||
RestrictRealtime=true
|
||||
RestrictSUIDSGID=true
|
||||
LockPersonality=true
|
||||
MemoryDenyWriteExecute=true
|
||||
|
||||
# Network access is required (we're a network monitor). Keep address
|
||||
# families minimal — AF_NETLINK is needed for some libc lookups.
|
||||
RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6 AF_NETLINK
|
||||
|
||||
# If you need raw ICMP, *also* uncomment:
|
||||
# AmbientCapabilities=CAP_NET_RAW
|
||||
# CapabilityBoundingSet=CAP_NET_RAW
|
||||
# Otherwise drop all capabilities:
|
||||
CapabilityBoundingSet=
|
||||
|
||||
[Install]
|
||||
WantedBy=multi-user.target
|
||||
```
|
||||
|
||||
Reload systemd and enable:
|
||||
|
||||
```sh
|
||||
sudo systemctl daemon-reload
|
||||
sudo systemctl enable quptime.service
|
||||
```
|
||||
|
||||
## Initialise the node
|
||||
|
||||
**Don't start the service yet** — `qu init` must run first, and it
|
||||
must run as the `quptime` user so it creates files with the right
|
||||
ownership.
|
||||
|
||||
On the **first** host (it will print a secret; copy it):
|
||||
|
||||
```sh
|
||||
sudo -u quptime QUPTIME_DIR=/etc/quptime \
|
||||
qu init --advertise alpha.example.com:9901
|
||||
```
|
||||
|
||||
On every **other** host (paste the secret):
|
||||
|
||||
```sh
|
||||
sudo -u quptime QUPTIME_DIR=/etc/quptime \
|
||||
qu init --advertise bravo.example.com:9901 --secret '<paste>'
|
||||
|
||||
sudo -u quptime QUPTIME_DIR=/etc/quptime \
|
||||
qu init --advertise charlie.example.com:9901 --secret '<paste>'
|
||||
```
|
||||
|
||||
## Open the firewall
|
||||
|
||||
`qu` needs TCP/9901 reachable between cluster members. Adjust to your
|
||||
firewall:
|
||||
|
||||
```sh
|
||||
# ufw
|
||||
sudo ufw allow from <peer-ip> to any port 9901 proto tcp
|
||||
|
||||
# firewalld
|
||||
sudo firewall-cmd --permanent --zone=internal \
|
||||
--add-rich-rule='rule family=ipv4 source address=<peer-ip> port port=9901 protocol=tcp accept'
|
||||
sudo firewall-cmd --reload
|
||||
|
||||
# nftables (drop-in)
|
||||
table inet filter {
|
||||
chain input {
|
||||
ip saddr { 10.0.0.10, 10.0.0.11, 10.0.0.12 } tcp dport 9901 accept
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
For exposing 9901 to the open internet see
|
||||
[public-internet.md](public-internet.md).
|
||||
|
||||
## Start the daemon
|
||||
|
||||
```sh
|
||||
sudo systemctl start quptime
|
||||
sudo systemctl status quptime
|
||||
journalctl -u quptime -f
|
||||
```
|
||||
|
||||
## Invite peers
|
||||
|
||||
From one node (typically `alpha`):
|
||||
|
||||
```sh
|
||||
sudo -u quptime qu node add bravo.example.com:9901
|
||||
# Pause a few seconds so heartbeats reach the new peer before the next add —
|
||||
# otherwise the "needs ≥2 live to mutate" check rejects the second invite.
|
||||
sudo -u quptime qu node add charlie.example.com:9901
|
||||
```
|
||||
|
||||
`qu node add` prints each remote's fingerprint and asks for SSH-style
|
||||
confirmation. Verify it matches an out-of-band channel (the remote
|
||||
operator can show their fingerprint with
|
||||
`sudo -u quptime qu status` or by reading `trust.yaml`).
|
||||
|
||||
## Verify
|
||||
|
||||
```sh
|
||||
sudo -u quptime qu status
|
||||
```
|
||||
|
||||
Expect to see all three peers `live=true` and one of them as
|
||||
`master`.
|
||||
|
||||
## Log scraping
|
||||
|
||||
`journalctl -u quptime` is the canonical log stream. Notable lines:
|
||||
|
||||
| Pattern | Meaning |
|
||||
| ------------------------------------------------------------- | --------------------------------------------------------- |
|
||||
| `listening on ... as node ...` | Daemon up. |
|
||||
| `manual-edit: cluster.yaml changed externally — replicating…` | An operator edited `cluster.yaml` directly. |
|
||||
| `manual-edit: parse cluster.yaml: ...` | Invalid YAML on disk; the operator must fix and re-save. |
|
||||
| `report to master ...: <err>` | A follower couldn't ship a probe result to the master. |
|
||||
| `replicate: pull from ...: <err>` | A follower couldn't pull a higher-version config snapshot. |
|
||||
|
||||
## Sample reload / restart drill
|
||||
|
||||
After editing the unit file:
|
||||
|
||||
```sh
|
||||
sudo systemctl daemon-reload
|
||||
sudo systemctl restart quptime
|
||||
```
|
||||
|
||||
After editing `cluster.yaml` by hand:
|
||||
|
||||
```sh
|
||||
sudoedit /etc/quptime/cluster.yaml
|
||||
# No restart needed — the watcher picks it up within 2s and pushes to master.
|
||||
```
|
||||
|
||||
After upgrading the binary:
|
||||
|
||||
```sh
|
||||
sudo install -m 0755 qu-new /usr/local/bin/qu
|
||||
sudo setcap cap_net_raw=+ep /usr/local/bin/qu # if you use raw ICMP
|
||||
sudo systemctl restart quptime
|
||||
```
|
||||
|
||||
Doing rolling upgrades? See [operations.md](../operations.md).
|
||||
Reference in New Issue
Block a user