AI assisted documentation
2026-05-15 04:05:30 +00:00
parent 364ba222e2
commit 6953709574
12 changed files with 2029 additions and 0 deletions
# Configuration
This page is the canonical reference for the on-disk files, the
environment variables, and every field that `qu` reads. It's
deliberately tedious — when something doesn't behave the way you
expect, this is where the answer lives.
## File layout
When running as **root** (the typical case under systemd):
```
/etc/quptime/
├── node.yaml          identity, never replicated
├── cluster.yaml       replicated state
├── trust.yaml         local fingerprint trust store
└── keys/
    ├── private.pem    RSA private key (0600)
    ├── public.pem     RSA public key
    └── cert.pem       self-signed X.509 cert
/var/run/quptime/quptime.sock   control socket (0600)
```
When running as a **non-root** user (the typical case for `go run` or a
desktop test):
```
~/.config/quptime/... same shape as /etc/quptime
$XDG_RUNTIME_DIR/quptime/quptime.sock control socket
```
Override the data directory with `QUPTIME_DIR=/some/path qu serve`.
Override the socket path with `QUPTIME_SOCKET=/run/foo.sock`.
## Environment variables
| Variable | Purpose |
| ----------------- | ------------------------------------------------------------------------------------------------------------------------- |
| `QUPTIME_DIR`     | Data directory. Defaults to `/etc/quptime` (root) or `$XDG_CONFIG_HOME/quptime` (non-root).                               |
| `QUPTIME_SOCKET` | Path to the CLI ↔ daemon unix socket. Defaults to `/var/run/quptime/quptime.sock` (root) or `$XDG_RUNTIME_DIR/quptime/…`. |
| `XDG_CONFIG_HOME` | Honored when running as non-root and `QUPTIME_DIR` is unset. |
| `XDG_RUNTIME_DIR` | Honored when running as non-root and `QUPTIME_SOCKET` is unset. |
The daemon does not read any other environment variables. SMTP, Discord,
and HTTP probe targets are configured exclusively in `cluster.yaml`.
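
The lookup order in the table can be sketched in Go as follows. This is an illustration, not the daemon's code: the function names `resolveDataDir` and `resolveSocket` are hypothetical, and falling back to `~/.config` when `XDG_CONFIG_HOME` is unset is an assumption based on the non-root layout shown earlier.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolveDataDir follows the documented order: QUPTIME_DIR wins, then the
// root default, then XDG_CONFIG_HOME. The final ~/.config fallback is an
// assumption taken from the non-root file layout.
func resolveDataDir(getenv func(string) string, isRoot bool, home string) string {
	if dir := getenv("QUPTIME_DIR"); dir != "" {
		return dir // an explicit override always wins
	}
	if isRoot {
		return "/etc/quptime"
	}
	if xdg := getenv("XDG_CONFIG_HOME"); xdg != "" {
		return filepath.Join(xdg, "quptime")
	}
	return filepath.Join(home, ".config", "quptime")
}

// resolveSocket applies the same idea to the control socket. If
// XDG_RUNTIME_DIR is unset for a non-root user, this sketch returns a
// relative path; what the real daemon does in that case is not documented.
func resolveSocket(getenv func(string) string, isRoot bool) string {
	if sock := getenv("QUPTIME_SOCKET"); sock != "" {
		return sock
	}
	if isRoot {
		return "/var/run/quptime/quptime.sock"
	}
	return filepath.Join(getenv("XDG_RUNTIME_DIR"), "quptime", "quptime.sock")
}

func main() {
	home, _ := os.UserHomeDir()
	isRoot := os.Geteuid() == 0
	fmt.Println("data dir:", resolveDataDir(os.Getenv, isRoot, home))
	fmt.Println("socket:  ", resolveSocket(os.Getenv, isRoot))
}
```

Passing `getenv` as a parameter keeps the resolution logic testable without touching the real process environment.
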
## `node.yaml` — local identity
Never replicated. One file per host. Generated by `qu init`.
```yaml
node_id: 7f3a5b9e-... # UUIDv4, immutable after init
bind_addr: 0.0.0.0 # listen address for :9901
bind_port: 9901 # listen port
advertise: alpha.example.com:9901 # how peers reach us; may differ from bind
cluster_secret: 4hZqK8vT9... # base64; required to Join, never replicated
```
### Field reference
- `node_id` — UUIDv4 generated at `qu init`. Used by every peer to
refer to this node across IP changes and restarts. Do not edit.
- `bind_addr` — Address the daemon listens on. `0.0.0.0` is the
default. Set to `127.0.0.1` if you only want to expose the daemon
through an overlay (Tailscale, WireGuard) — see
[deployment/tailscale.md](deployment/tailscale.md).
- `bind_port` — Defaults to `9901`. Change here if 9901 is taken; the
cluster does not require port-uniformity, peers just need to know
what to dial via the `advertise` field.
- `advertise` — Host:port other nodes use to reach this one. Must be
routable from every peer. Falls back to `bind_addr:bind_port` if
unset, which is rarely what you want behind NAT.
- `cluster_secret` — Pre-shared base64 string. Required on every
`Join` RPC; constant-time comparison on the receiver. Generate on
the first node, distribute out-of-band, keep out of version
control.
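
The constant-time comparison mentioned for `cluster_secret` could look roughly like this in Go. The helper name `secretsMatch` is hypothetical, and hashing both values before comparing is an assumption of this sketch (it gives fixed-length inputs); the source only states that the receiver compares in constant time.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
)

// secretsMatch reports whether the secret presented on a Join equals the
// locally configured one. Both sides are hashed first (an assumption of this
// sketch) so the inputs to the comparison always have the same length.
func secretsMatch(local, presented string) bool {
	a := sha256.Sum256([]byte(local))
	b := sha256.Sum256([]byte(presented))
	// ConstantTimeCompare returns 1 only when the slices are equal, and its
	// running time does not depend on where the first difference occurs.
	return subtle.ConstantTimeCompare(a[:], b[:]) == 1
}

func main() {
	fmt.Println(secretsMatch("4hZqK8vT9", "4hZqK8vT9")) // true
	fmt.Println(secretsMatch("4hZqK8vT9", "wrong"))     // false
}
```

A plain `==` on strings may return as soon as it finds a mismatching byte, which is the timing signal a constant-time comparison is meant to remove.
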
### How `qu init` populates this file
```sh
qu init \
  --advertise alpha.example.com:9901 \
  --bind 0.0.0.0 \
  --port 9901 \
  --secret '<paste from first node, or omit on the first node>'
```
`qu init` never overwrites an existing identity: if `node.yaml` exists,
it refuses to run. To re-init, delete the data directory entirely.
## `cluster.yaml` — replicated state
This is the file that every node converges on. The master is the only
one allowed to bump `version`; followers `Replace` it whole each time
they receive a higher-versioned snapshot.
```yaml
version: 12
updated_at: 2026-05-15T14:01:00Z
updated_by: 7f3a5b9e-...
peers:
  - node_id: 7f3a5b9e-...
    advertise: alpha.example.com:9901
    fingerprint: SHA256:abcd...
    cert_pem: |
      -----BEGIN CERTIFICATE-----
      ...
      -----END CERTIFICATE-----
checks:
  - id: 0006a1...
    name: homepage
    type: http
    target: https://example.com
    interval: 30s
    timeout: 10s
    expect_status: 200
    alert_ids: [oncall]
    suppress_alert_ids: []
alerts:
  - id: f001ab...
    name: oncall
    type: discord
    default: true
    discord_webhook: https://discord.com/api/webhooks/...
    body_template: |
      :rotating_light: {{.Check.Name}} is {{.Verb}}
```
### Top-level fields
| Field | Owner | Notes |
| ------------ | -------- | ---------------------------------------------------------------------------------- |
| `version` | master | Monotonic. Followers reject snapshots whose version is ≤ their local. |
| `updated_at` | master | UTC RFC3339. Cosmetic — humans use it, no logic depends on it. |
| `updated_by` | master | NodeID of the committing master. |
| `peers` | editable | Cluster members. Edits go through `add_peer` / `remove_peer` mutations. |
| `checks` | editable | Monitored targets. |
| `alerts` | editable | Notifier destinations. |
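
The `version` rule (followers replace the whole document on a strictly higher version and reject anything else) can be sketched as below. `Snapshot`, `Store`, and `Apply` are hypothetical names for illustration, not the project's types.

```go
package main

import "fmt"

// Snapshot is a pared-down stand-in for the replicated document; only the
// fields needed for the version check are included.
type Snapshot struct {
	Version   uint64
	UpdatedBy string
}

// Store holds a follower's local copy.
type Store struct {
	current Snapshot
}

// Apply replaces the local snapshot wholesale when the incoming version is
// strictly higher. Versions less than or equal to the local one are rejected.
func (s *Store) Apply(incoming Snapshot) bool {
	if incoming.Version <= s.current.Version {
		return false
	}
	s.current = incoming
	return true
}

func main() {
	s := &Store{current: Snapshot{Version: 12, UpdatedBy: "7f3a5b9e"}}
	for _, v := range []uint64{11, 12, 13} {
		ok := s.Apply(Snapshot{Version: v, UpdatedBy: "7f3a5b9e"})
		fmt.Printf("incoming v%d accepted=%v local=v%d\n", v, ok, s.current.Version)
	}
}
```

Rejecting equal versions as well as lower ones means a replayed snapshot is a no-op rather than a rewrite.
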
### `peers[]`
```yaml
- node_id: 7f3a5b9e-...       # immutable, the peer's own UUID
  advertise: host:port        # how anyone dials this peer
  fingerprint: SHA256:...     # SPKI fingerprint of the peer's cert
  cert_pem: |                 # full PEM so other peers can mTLS without a separate invite
    -----BEGIN CERTIFICATE-----
    ...
```
The `cert_pem` field is what enables N-node clusters without N×(N-1)
manual invites: when peer X is added via the master, every other node
that receives the new `cluster.yaml` learns X's cert at the same time
and adds it to the local trust store. See
`internal/daemon/daemon.go:syncTrustFromCluster`.
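
As a rough picture of that sync step, the sketch below copies unknown peers from the replicated list into a local trust map. It is not the body of `syncTrustFromCluster`: the types, the `syncTrust` name, and the choice to leave already-trusted entries untouched are all assumptions made for illustration.

```go
package main

import "fmt"

// Peer mirrors the documented peers[] fields.
type Peer struct {
	NodeID      string
	Advertise   string
	Fingerprint string
	CertPEM     string
}

// TrustEntry mirrors the documented trust.yaml entry fields.
type TrustEntry struct {
	NodeID      string
	Address     string
	Fingerprint string
	CertPEM     string
}

// syncTrust adds a trust entry for every peer the local store does not know
// yet and returns how many were added. Existing entries are left as they are
// (an assumption; the real function may handle changed fingerprints or
// removed peers differently).
func syncTrust(trust map[string]TrustEntry, peers []Peer) int {
	added := 0
	for _, p := range peers {
		if _, known := trust[p.NodeID]; known {
			continue
		}
		trust[p.NodeID] = TrustEntry{
			NodeID:      p.NodeID,
			Address:     p.Advertise,
			Fingerprint: p.Fingerprint,
			CertPEM:     p.CertPEM,
		}
		added++
	}
	return added
}

func main() {
	trust := map[string]TrustEntry{}
	peers := []Peer{
		{NodeID: "alpha", Advertise: "alpha.example.com:9901", Fingerprint: "SHA256:aaaa"},
		{NodeID: "beta", Advertise: "beta.example.com:9901", Fingerprint: "SHA256:bbbb"},
	}
	fmt.Println("added:", syncTrust(trust, peers))
	fmt.Println("added on second pass:", syncTrust(trust, peers))
}
```
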
### `checks[]`
```yaml
- id: 0006a1...               # UUIDv4, generated when the check is created
  name: homepage              # human-friendly, must be unique within cluster
  type: http                  # http | tcp | icmp
  target: https://example.com
  interval: 30s               # Go duration syntax: 5s, 1m30s, 2h
  timeout: 10s                # default 10s
  expect_status: 200          # http only; 0 = accept anything < 400
  body_match: "OK"            # http only; substring match on response body
  alert_ids: [oncall]         # alerts attached explicitly
  suppress_alert_ids: []      # opt out of specific default alerts
```
Defaults:
- `interval`: 30s
- `timeout`: 10s
- `expect_status`: 0 → any 2xx is OK; otherwise the configured status
must match exactly.
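
`interval` and `timeout` use Go duration syntax, so the values accepted are those `time.ParseDuration` understands. A minimal sketch of parsing a field and applying the documented defaults follows; the helper name `parseOrDefault` and the rejection of non-positive durations are assumptions of this example.

```go
package main

import (
	"fmt"
	"time"
)

// Documented defaults for checks[].
const (
	defaultInterval = 30 * time.Second
	defaultTimeout  = 10 * time.Second
)

// parseOrDefault parses a Go duration string and falls back to def when the
// field is empty. Rejecting zero or negative values is an assumption here.
func parseOrDefault(raw string, def time.Duration) (time.Duration, error) {
	if raw == "" {
		return def, nil
	}
	d, err := time.ParseDuration(raw)
	if err != nil {
		return 0, fmt.Errorf("invalid duration %q: %w", raw, err)
	}
	if d <= 0 {
		return 0, fmt.Errorf("duration %q must be positive", raw)
	}
	return d, nil
}

func main() {
	for _, raw := range []string{"", "5s", "1m30s", "2h", "30 seconds"} {
		d, err := parseOrDefault(raw, defaultInterval)
		fmt.Printf("%q -> %v (err: %v)\n", raw, d, err)
	}
}
```

Note that a bare number such as `30` or a spelled-out unit such as `30 seconds` is not valid duration syntax and produces an error.
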
ICMP checks default to **unprivileged UDP-mode pings** so the daemon
does not need root. For raw ICMP, grant the capability — see
[deployment/systemd.md](deployment/systemd.md).
### `alerts[]`
Two notifier kinds, distinguished by `type`:
```yaml
# Discord
- id: f001ab...
  name: oncall
  type: discord
  default: true                 # attach to every check automatically
  discord_webhook: https://...
  body_template: |              # optional Go text/template override
    {{.Check.Name}} is {{.Verb}}
# SMTP
- id: f002cd...
  name: ops
  type: smtp
  smtp_host: smtp.example.com
  smtp_port: 587
  smtp_user: mailbot
  smtp_password: '...'
  smtp_from: monitor@example.com
  smtp_to: [ops@example.com]
  smtp_starttls: true
  subject_template: '[{{.Verb}}] {{.Check.Name}}'
  body_template: |
    Check {{.Check.Name}} ({{.Check.Target}}) is now {{.Verb}}.
```
If `default: true`, the alert fires for every check unless the check
lists the alert's ID or name in `suppress_alert_ids`. Otherwise the
alert only fires for checks that name it in `alert_ids`.
Templates are Go `text/template`. The full variable list is in the
top-level README under "Custom alert messages" — `qu alert add smtp
--help` and `qu alert add discord --help` print the same table.
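
To try a template before committing it to `cluster.yaml`, it can be rendered with the standard `text/template` package. The `Event` and `Check` structs below are stand-ins that only carry the three variables used in the examples on this page (`.Check.Name`, `.Check.Target`, `.Verb`); the real data passed to templates has more fields, and the `DOWN` value for `.Verb` is made up for the demonstration.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Check and Event are hypothetical stand-ins for the data the notifier
// passes to templates.
type Check struct {
	Name   string
	Target string
}

type Event struct {
	Check Check
	Verb  string
}

// render parses tmpl as a Go text/template and executes it against ev.
func render(tmpl string, ev Event) (string, error) {
	t, err := template.New("alert").Parse(tmpl)
	if err != nil {
		return "", fmt.Errorf("parse template: %w", err)
	}
	var out strings.Builder
	if err := t.Execute(&out, ev); err != nil {
		return "", fmt.Errorf("render template: %w", err)
	}
	return out.String(), nil
}

func main() {
	ev := Event{
		Check: Check{Name: "homepage", Target: "https://example.com"},
		Verb:  "DOWN", // illustrative value only
	}
	for _, tmpl := range []string{
		"[{{.Verb}}] {{.Check.Name}}",
		"Check {{.Check.Name}} ({{.Check.Target}}) is now {{.Verb}}.",
	} {
		s, err := render(tmpl, ev)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(s)
	}
}
```

A template that references a field the data does not have fails at execution time rather than at parse time, so rendering against sample data catches typos that parsing alone would miss.
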
### Suppression precedence
For each check, the dispatcher computes the effective alert list as:
```
( explicit alert_ids ∪ alerts with default=true ) \ suppress_alert_ids
```
de-duplicated by alert ID. So a check can both opt in to specific
alerts and opt out of specific defaults.
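
The set expression above can be sketched in Go as follows. The types and the `effectiveAlerts` name are hypothetical. Matching `suppress_alert_ids` by ID or name follows the text; matching `alert_ids` by name as well is an assumption drawn from the `alert_ids: [oncall]` example.

```go
package main

import "fmt"

// Alert and Check carry only the fields the precedence rule needs.
type Alert struct {
	ID      string
	Name    string
	Default bool
}

type Check struct {
	AlertIDs         []string
	SuppressAlertIDs []string
}

func toSet(items []string) map[string]bool {
	set := make(map[string]bool, len(items))
	for _, it := range items {
		set[it] = true
	}
	return set
}

// effectiveAlerts returns (explicit ∪ defaults) \ suppressed, de-duplicated
// by alert ID and ordered as in the alerts slice.
func effectiveAlerts(c Check, alerts []Alert) []Alert {
	explicit := toSet(c.AlertIDs)
	suppressed := toSet(c.SuppressAlertIDs)
	seen := map[string]bool{}
	var out []Alert
	for _, a := range alerts {
		wanted := a.Default || explicit[a.ID] || explicit[a.Name]
		blocked := suppressed[a.ID] || suppressed[a.Name]
		if !wanted || blocked || seen[a.ID] {
			continue
		}
		seen[a.ID] = true
		out = append(out, a)
	}
	return out
}

func main() {
	alerts := []Alert{
		{ID: "f001", Name: "oncall", Default: true},
		{ID: "f002", Name: "ops"},
		{ID: "f003", Name: "pager", Default: true},
	}
	c := Check{AlertIDs: []string{"ops"}, SuppressAlertIDs: []string{"pager"}}
	for _, a := range effectiveAlerts(c, alerts) {
		fmt.Println(a.Name)
	}
}
```

Because suppression is applied last, an alert listed in both `alert_ids` and `suppress_alert_ids` does not fire in this sketch.
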
## `trust.yaml` — local trust store
A flat list of fingerprints this node accepts. One entry per peer,
populated by `qu node add` (or pulled in automatically when a peer's
cert arrives via the replicated `cluster.yaml`).
```yaml
entries:
  - node_id: 7f3a5b9e-...
    address: alpha.example.com:9901
    fingerprint: SHA256:...
    cert_pem: |
      -----BEGIN CERTIFICATE-----
      ...
```
Never edit this by hand. Use `qu trust list` and `qu trust remove`.
## Key material
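`keys/private.pem` and `node.yaml.cluster_secret` are the node-local
secrets on disk. (`cluster.yaml` can also hold credentials, such as
`smtp_password` and Discord webhook URLs.) The private key is chmod
0600 by default; preserve that.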
The public cert at `keys/cert.pem` is what gets fingerprinted and
shipped in `cluster.yaml.peers[].cert_pem`.
There is **no automatic key rotation**. Rolling a node's identity
means wiping its data directory, running `qu init` again, and
re-adding it from another node as a fresh peer.
## Tunables that don't live in YAML
A few values are compiled constants. Change them in source and rebuild
if you need different behaviour.
| Constant | Default | What it does |
| ----------------------------------------------------- | ------- | ------------------------------------------------------------- |
| `quorum.DefaultHeartbeatInterval` | `1s` | How often each node heartbeats every peer. |
| `quorum.DefaultDeadAfter` | `4s` | A peer is dead if no heartbeat is seen within this window. |
| `checks.HysteresisCount` | `2` | Consecutive aggregate evaluations needed before a state flip. |
| `checks.ReconcileInterval` | `5s` | How often the scheduler reconciles its workers vs `checks[]`. |
| `daemon.manualEditPollInterval` (`internal/daemon/watcher.go`) | `2s` | How often the daemon hashes `cluster.yaml` for hand edits. |