QUptime/docs/configuration.md
Axodouble 6953709574
AI assisted documentation
2026-05-15 04:05:30 +00:00


Configuration

This page is the canonical reference for the on-disk files, the environment variables, and every field that qu reads. It's deliberately tedious — when something doesn't behave the way you expect, this is where the answer lives.

File layout

When running as root (the typical case under systemd):

/etc/quptime/
├── node.yaml          identity, never replicated
├── cluster.yaml       replicated state
├── trust.yaml         local fingerprint trust store
└── keys/
    ├── private.pem    RSA private key (0600)
    ├── public.pem     RSA public key
    └── cert.pem       self-signed X.509 cert

/var/run/quptime/quptime.sock   control socket (0600)

When running as a non-root user (the typical case for go run or a desktop test):

~/.config/quptime/...                       same shape as /etc/quptime
$XDG_RUNTIME_DIR/quptime/quptime.sock       control socket

Override the data directory with QUPTIME_DIR=/some/path qu serve. Override the socket path with QUPTIME_SOCKET=/run/foo.sock.

Environment variables

| Variable | Purpose |
| --- | --- |
| QUPTIME_DIR | Data directory. Defaults to /etc/quptime (root) or $XDG_CONFIG_HOME/quptime. |
| QUPTIME_SOCKET | Path to the CLI ↔ daemon unix socket. Defaults to /var/run/quptime/quptime.sock (root) or $XDG_RUNTIME_DIR/quptime/…. |
| XDG_CONFIG_HOME | Honored when running as non-root and QUPTIME_DIR is unset. |
| XDG_RUNTIME_DIR | Honored when running as non-root and QUPTIME_SOCKET is unset. |

The daemon does not read any other environment variables. SMTP, Discord, and HTTP probe targets are configured exclusively in cluster.yaml.
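The lookup order described above can be sketched in Go as follows. This is an illustration only, not the daemon's actual code: the function names are hypothetical, an empty variable is treated as unset, and because the behaviour for a non-root user without XDG_RUNTIME_DIR is not documented here, the sketch returns an error rather than guessing a fallback.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// resolveDataDir follows the documented precedence for the data directory:
// QUPTIME_DIR, then /etc/quptime for root, then $XDG_CONFIG_HOME/quptime,
// then ~/.config/quptime. env and home are parameters so the function is pure.
func resolveDataDir(env map[string]string, isRoot bool, home string) string {
	if dir := env["QUPTIME_DIR"]; dir != "" {
		return dir
	}
	if isRoot {
		return "/etc/quptime"
	}
	if xdg := env["XDG_CONFIG_HOME"]; xdg != "" {
		return filepath.Join(xdg, "quptime")
	}
	return filepath.Join(home, ".config", "quptime")
}

// resolveSocket does the same for the control socket. The non-root case with
// no XDG_RUNTIME_DIR is an assumption: this sketch reports an error.
func resolveSocket(env map[string]string, isRoot bool) (string, error) {
	if sock := env["QUPTIME_SOCKET"]; sock != "" {
		return sock, nil
	}
	if isRoot {
		return "/var/run/quptime/quptime.sock", nil
	}
	if rt := env["XDG_RUNTIME_DIR"]; rt != "" {
		return filepath.Join(rt, "quptime", "quptime.sock"), nil
	}
	return "", fmt.Errorf("XDG_RUNTIME_DIR is unset; set QUPTIME_SOCKET explicitly")
}

func main() {
	env := map[string]string{
		"XDG_CONFIG_HOME": "/home/dev/.config",
		"XDG_RUNTIME_DIR": "/run/user/1000",
	}
	fmt.Println(resolveDataDir(env, false, "/home/dev"))
	sock, err := resolveSocket(env, false)
	fmt.Println(sock, err)
}
```

Passing the environment as a map keeps the precedence easy to test without touching the real process environment.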

node.yaml — local identity

Never replicated. One file per host. Generated by qu init.

node_id: 7f3a5b9e-...        # UUIDv4, immutable after init
bind_addr: 0.0.0.0           # listen address for :9901
bind_port: 9901              # listen port
advertise: alpha.example.com:9901   # how peers reach us; may differ from bind
cluster_secret: 4hZqK8vT9... # base64; required to Join, never replicated

Field reference

  • node_id — UUIDv4 generated at qu init. Used by every peer to refer to this node across IP changes and restarts. Do not edit.
  • bind_addr — Address the daemon listens on. 0.0.0.0 is the default. Set to 127.0.0.1 if you only want to expose the daemon through an overlay (Tailscale, WireGuard) — see deployment/tailscale.md.
  • bind_port — Defaults to 9901. Change it here if 9901 is taken. The cluster does not require every node to use the same port; peers only need to know what to dial, and they learn that from the advertise field.
  • advertise — Host:port other nodes use to reach this one. Must be routable from every peer. Falls back to bind_addr:bind_port if unset, which is rarely what you want behind NAT.
  • cluster_secret — Pre-shared base64 string. Required on every Join RPC; the receiver compares it in constant time. Generate it on the first node, distribute it out-of-band, and keep it out of version control.
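Two of the field rules above lend themselves to a short Go sketch: the advertise fallback and the constant-time secret check. The NodeConfig type and both function names are hypothetical and do not come from the qu source. The comparison uses crypto/subtle from the standard library, which is one common way to get a constant-time check.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net"
	"strconv"
)

// NodeConfig is a hypothetical in-memory view of node.yaml.
type NodeConfig struct {
	BindAddr      string
	BindPort      int
	Advertise     string
	ClusterSecret string
}

// effectiveAdvertise returns advertise when set, otherwise bind_addr:bind_port.
func (c NodeConfig) effectiveAdvertise() string {
	if c.Advertise != "" {
		return c.Advertise
	}
	return net.JoinHostPort(c.BindAddr, strconv.Itoa(c.BindPort))
}

// secretMatches compares two secrets without leaking where they differ.
// ConstantTimeCompare returns 0 immediately when the lengths differ, so only
// the length, not the content, can be inferred from timing.
func secretMatches(local, presented string) bool {
	return subtle.ConstantTimeCompare([]byte(local), []byte(presented)) == 1
}

func main() {
	cfg := NodeConfig{BindAddr: "0.0.0.0", BindPort: 9901, ClusterSecret: "example-secret"}
	fmt.Println(cfg.effectiveAdvertise())
	fmt.Println(secretMatches(cfg.ClusterSecret, "example-secret"))
	fmt.Println(secretMatches(cfg.ClusterSecret, "wrong-secret"))
}
```

With advertise unset and the default bind address, the fallback yields 0.0.0.0:9901, which no peer can dial. That is why setting advertise explicitly is usually the right call.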

How qu init populates this file

qu init \
  --advertise alpha.example.com:9901 \
  --bind 0.0.0.0 \
  --port 9901 \
  --secret '<paste from first node, or omit on the first node>'

qu init never overwrites an existing identity: if node.yaml exists, it refuses to run. To re-init, delete the data directory entirely.
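One common way to get this kind of refuse-to-overwrite behaviour in Go is an exclusive create. The sketch below is an assumption about the technique, not a copy of what qu init does. The helper names and the 0600 mode for node.yaml are illustrative.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// writeNodeFile creates path only if it does not exist yet. O_EXCL makes the
// create fail with an "already exists" error instead of truncating the file.
func writeNodeFile(path string, data []byte) error {
	f, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_EXCL, 0o600)
	if err != nil {
		return err
	}
	if _, err := f.Write(data); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

// initTwice writes node.yaml twice in a scratch directory and returns both
// results, so the second (refused) attempt can be inspected.
func initTwice() (first, second error) {
	dir, err := os.MkdirTemp("", "quptime-sketch")
	if err != nil {
		return err, nil
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "node.yaml")
	first = writeNodeFile(path, []byte("node_id: example\n"))
	second = writeNodeFile(path, []byte("node_id: other\n"))
	return first, second
}

func main() {
	first, second := initTwice()
	fmt.Println("first attempt error:", first)
	fmt.Println("second attempt refused:", errors.Is(second, fs.ErrExist))
}
```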

cluster.yaml — replicated state

This is the file that every node converges on. The master is the only node allowed to bump version; followers replace their copy wholesale each time they receive a higher-versioned snapshot.

version: 12
updated_at: 2026-05-15T14:01:00Z
updated_by: 7f3a5b9e-...
peers:
  - node_id: 7f3a5b9e-...
    advertise: alpha.example.com:9901
    fingerprint: SHA256:abcd...
    cert_pem: |
      -----BEGIN CERTIFICATE-----
      ...
      -----END CERTIFICATE-----
checks:
  - id: 0006a1...
    name: homepage
    type: http
    target: https://example.com
    interval: 30s
    timeout: 10s
    expect_status: 200
    alert_ids: [oncall]
    suppress_alert_ids: []
alerts:
  - id: f001ab...
    name: oncall
    type: discord
    default: true
    discord_webhook: https://discord.com/api/webhooks/...
    body_template: |
      :rotating_light: {{.Check.Name}} is {{.Verb}}

Top-level fields

| Field | Owner | Notes |
| --- | --- | --- |
| version | master | Monotonic. Followers reject snapshots whose version is ≤ their local. |
| updated_at | master | UTC RFC3339. Cosmetic: humans use it, no logic depends on it. |
| updated_by | master | NodeID of the committing master. |
| peers | editable | Cluster members. Edits go through add_peer / remove_peer mutations. |
| checks | editable | Monitored targets. |
| alerts | editable | Notifier destinations. |
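The version rule is the heart of replication, so here is a minimal Go sketch of it. The Snapshot and Follower types are hypothetical and only carry the fields needed to show the comparison.

```go
package main

import "fmt"

// Snapshot is a hypothetical, trimmed-down view of cluster.yaml.
type Snapshot struct {
	Version   uint64
	UpdatedBy string
}

// Follower holds the snapshot a non-master node currently has on disk.
type Follower struct {
	current Snapshot
}

// Apply replaces the local snapshot only when the incoming version is strictly
// higher. Equal or lower versions are rejected and the local copy is kept.
func (f *Follower) Apply(incoming Snapshot) bool {
	if incoming.Version <= f.current.Version {
		return false
	}
	f.current = incoming
	return true
}

func main() {
	f := &Follower{current: Snapshot{Version: 12}}
	fmt.Println(f.Apply(Snapshot{Version: 12})) // same version: rejected
	fmt.Println(f.Apply(Snapshot{Version: 13})) // higher version: accepted
	fmt.Println(f.current.Version)
}
```

Rejecting equal versions as well as lower ones means a replayed snapshot can never replace the local copy.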

peers[]

- node_id: 7f3a5b9e-...        # immutable, the peer's own UUID
  advertise: host:port         # how anyone dials this peer
  fingerprint: SHA256:...      # SPKI fingerprint of the peer's cert
  cert_pem: |                  # full PEM so other peers can mTLS without a separate invite
    -----BEGIN CERTIFICATE-----
    ...

The cert_pem field is what enables N-node clusters without N×(N-1) manual invites: when peer X is added via the master, every other node that receives the new cluster.yaml learns X's cert at the same time and adds it to the local trust store. See internal/daemon/daemon.go:syncTrustFromCluster.
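The idea behind that trust sync can be illustrated with a small Go sketch. It is not the real syncTrustFromCluster: the types and the syncTrust function are hypothetical. Leaving existing entries untouched is an assumption, since this page does not say how a changed fingerprint is handled.

```go
package main

import "fmt"

// Peer mirrors one entry of cluster.yaml peers[] (hypothetical type).
type Peer struct {
	NodeID, Advertise, Fingerprint, CertPEM string
}

// TrustEntry mirrors one entry of trust.yaml (hypothetical type).
type TrustEntry struct {
	NodeID, Address, Fingerprint, CertPEM string
}

// syncTrust adds every peer that is not yet trusted and returns how many
// entries were added. Existing entries are left as they are.
func syncTrust(trust map[string]TrustEntry, peers []Peer) int {
	added := 0
	for _, p := range peers {
		if _, known := trust[p.NodeID]; known {
			continue
		}
		trust[p.NodeID] = TrustEntry{
			NodeID:      p.NodeID,
			Address:     p.Advertise,
			Fingerprint: p.Fingerprint,
			CertPEM:     p.CertPEM,
		}
		added++
	}
	return added
}

func main() {
	trust := map[string]TrustEntry{"node-a": {NodeID: "node-a"}}
	peers := []Peer{
		{NodeID: "node-a", Advertise: "alpha.example.com:9901"},
		{NodeID: "node-b", Advertise: "beta.example.com:9901", Fingerprint: "SHA256:..."},
	}
	fmt.Println(syncTrust(trust, peers), len(trust))
}
```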

checks[]

- id: 0006a1...           # UUIDv4, generated when the check is created
  name: homepage          # human-friendly, must be unique within cluster
  type: http              # http | tcp | icmp
  target: https://example.com
  interval: 30s           # Go duration syntax: 5s, 1m30s, 2h
  timeout: 10s            # default 10s
  expect_status: 200      # http only; 0 = accept anything < 400
  body_match: "OK"        # http only; substring match on response body
  alert_ids: [oncall]     # alerts attached explicitly
  suppress_alert_ids: []  # opt out of specific default alerts

Defaults:

  • interval: 30s
  • timeout: 10s
  • expect_status: 0 → any 2xx is OK; otherwise the configured status must match exactly.
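Since interval and timeout use Go duration syntax, the defaults can be sketched with time.ParseDuration. The helper name is hypothetical. Rejecting zero or negative durations is an assumption about sensible validation, not documented behaviour.

```go
package main

import (
	"fmt"
	"time"
)

const (
	defaultInterval = 30 * time.Second // documented default for interval
	defaultTimeout  = 10 * time.Second // documented default for timeout
)

// durationOr parses a Go duration string such as "5s", "1m30s" or "2h".
// An empty string means the field was omitted, so the fallback applies.
func durationOr(raw string, fallback time.Duration) (time.Duration, error) {
	if raw == "" {
		return fallback, nil
	}
	d, err := time.ParseDuration(raw)
	if err != nil {
		return 0, fmt.Errorf("invalid duration %q: %w", raw, err)
	}
	if d <= 0 {
		// Assumption: a non-positive interval or timeout is a config error.
		return 0, fmt.Errorf("duration %q must be positive", raw)
	}
	return d, nil
}

func main() {
	interval, err := durationOr("1m30s", defaultInterval)
	fmt.Println(interval, err)
	timeout, err := durationOr("", defaultTimeout)
	fmt.Println(timeout, err)
	_, err = durationOr("soon", defaultTimeout)
	fmt.Println(err != nil)
}
```

Note that a bare number such as 30 is not valid duration syntax; the unit suffix is required.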

ICMP checks default to unprivileged UDP-mode pings so the daemon does not need root. For raw ICMP, grant the capability — see deployment/systemd.md.

alerts[]

Two notifier kinds, distinguished by type:

# Discord
- id: f001ab...
  name: oncall
  type: discord
  default: true              # attach to every check automatically
  discord_webhook: https://...
  body_template: |           # optional Go text/template override
    {{.Check.Name}} is {{.Verb}}

# SMTP
- id: f002cd...
  name: ops
  type: smtp
  smtp_host: smtp.example.com
  smtp_port: 587
  smtp_user: mailbot
  smtp_password: '...'
  smtp_from: monitor@example.com
  smtp_to: [ops@example.com]
  smtp_starttls: true
  subject_template: '[{{.Verb}}] {{.Check.Name}}'
  body_template: |
    Check {{.Check.Name}} ({{.Check.Target}}) is now {{.Verb}}.

If default: true, the alert fires for every check unless the check lists the alert's ID or name in suppress_alert_ids. Otherwise the alert only fires for checks that name it in alert_ids.

Templates are Go text/template. The full variable list is in the top-level README under "Custom alert messages" — qu alert add smtp --help and qu alert add discord --help print the same table.
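To try a template before putting it in cluster.yaml, a small Go program along these lines can help. The AlertData and Check types are hypothetical stand-ins that carry only the variables used in the examples on this page (.Check.Name, .Check.Target, .Verb). The value "DOWN" for .Verb is an assumption; the README table lists the actual variables.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Check and AlertData are hypothetical stand-ins for the data the dispatcher
// passes to templates.
type Check struct {
	Name, Target string
}

type AlertData struct {
	Check Check
	Verb  string
}

// render parses tmpl as a text/template and executes it against data.
// Both parse errors and references to unknown fields surface as errors.
func render(tmpl string, data AlertData) (string, error) {
	t, err := template.New("alert").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	data := AlertData{
		Check: Check{Name: "homepage", Target: "https://example.com"},
		Verb:  "DOWN", // assumed value, for illustration only
	}
	out, err := render("Check {{.Check.Name}} ({{.Check.Target}}) is now {{.Verb}}.", data)
	fmt.Println(out, err)
}
```

Because text/template does no HTML escaping, URLs such as the check target come through unchanged.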

Suppression precedence

For each check, the dispatcher computes the effective alert list as:

( explicit alert_ids ∪ alerts with default=true ) \ suppress_alert_ids

de-duplicated by alert ID. So a check can both opt in to specific alerts and opt out of specific defaults.
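The set arithmetic above can be written out as a short Go sketch. The types and the effectiveAlerts function are hypothetical. Matching alert_ids by either ID or name is inferred from the examples on this page, and returning results in alerts[] order is an assumption.

```go
package main

import "fmt"

// Alert and Check are hypothetical, trimmed-down views of the YAML entries.
type Alert struct {
	ID, Name string
	Default  bool
}

type Check struct {
	AlertIDs         []string
	SuppressAlertIDs []string
}

// toSet turns a list of IDs or names into a lookup set.
func toSet(items []string) map[string]bool {
	set := make(map[string]bool, len(items))
	for _, it := range items {
		set[it] = true
	}
	return set
}

// effectiveAlerts returns the IDs of alerts that fire for c:
// (explicit ∪ default) minus suppressed, de-duplicated by alert ID.
func effectiveAlerts(c Check, all []Alert) []string {
	explicit := toSet(c.AlertIDs)
	suppressed := toSet(c.SuppressAlertIDs)
	seen := map[string]bool{}
	var out []string
	for _, a := range all {
		wanted := a.Default || explicit[a.ID] || explicit[a.Name]
		if !wanted || suppressed[a.ID] || suppressed[a.Name] || seen[a.ID] {
			continue
		}
		seen[a.ID] = true
		out = append(out, a.ID)
	}
	return out
}

func main() {
	alerts := []Alert{
		{ID: "f001", Name: "oncall", Default: true},
		{ID: "f002", Name: "ops"},
		{ID: "f003", Name: "pager", Default: true},
	}
	c := Check{AlertIDs: []string{"ops"}, SuppressAlertIDs: []string{"pager"}}
	fmt.Println(effectiveAlerts(c, alerts))
}
```

In this example the check opts in to ops explicitly, inherits oncall as a default, and opts out of the pager default.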

trust.yaml — local trust store

A flat list of fingerprints this node accepts. One entry per peer, populated by qu node add (or pulled in automatically when a peer's cert arrives via the replicated cluster.yaml).

entries:
  - node_id: 7f3a5b9e-...
    address: alpha.example.com:9901
    fingerprint: SHA256:...
    cert_pem: |
      -----BEGIN CERTIFICATE-----
      ...

Never edit this by hand. Use qu trust list and qu trust remove.

Key material

keys/private.pem is the only secret on disk besides node.yaml.cluster_secret. It's chmod 0600 by default; preserve that. The public cert at keys/cert.pem is what gets fingerprinted and shipped in cluster.yaml.peers[].cert_pem.

There is no automatic key rotation. Rolling a node's identity means wiping its data directory, running qu init again, and re-adding it from another node as a fresh peer.

Tunables that don't live in YAML

A few values are compiled constants. Change them in source and rebuild if you need different behaviour.

| Constant | Default | What it does |
| --- | --- | --- |
| quorum.DefaultHeartbeatInterval | 1s | How often each node heartbeats every peer. |
| quorum.DefaultDeadAfter | 4s | A peer is dead if no heartbeat is seen within this window. |
| checks.HysteresisCount | 2 | Consecutive aggregate evaluations needed before a state flip. |
| checks.ReconcileInterval | 5s | How often the scheduler reconciles its workers vs checks[]. |
| daemon.manualEditPollInterval (internal/daemon/watcher.go) | 2s | How often the daemon hashes cluster.yaml for hand edits. |
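To make two of these constants concrete, here is a minimal Go sketch of hysteresis and dead-peer detection. It illustrates the general techniques and does not reproduce the code in the quorum or checks packages. The flipper type, the state strings, and the choice to treat exactly 4s as still alive are all assumptions.

```go
package main

import (
	"fmt"
	"time"
)

const (
	hysteresisCount = 2               // mirrors checks.HysteresisCount
	deadAfter       = 4 * time.Second // mirrors quorum.DefaultDeadAfter
)

// flipper tracks a reported state and only changes it after hysteresisCount
// consecutive evaluations agree on a different state.
type flipper struct {
	state     string
	candidate string
	streak    int
}

// observe feeds one aggregate evaluation and reports whether the state flipped.
func (f *flipper) observe(s string) bool {
	if s == f.state {
		f.candidate, f.streak = "", 0 // streak broken, stay as we are
		return false
	}
	if s == f.candidate {
		f.streak++
	} else {
		f.candidate, f.streak = s, 1
	}
	if f.streak < hysteresisCount {
		return false
	}
	f.state, f.candidate, f.streak = s, "", 0
	return true
}

// isDead reports whether a peer should be considered dead given the time
// since its last heartbeat.
func isDead(sinceLastHeartbeat time.Duration) bool {
	return sinceLastHeartbeat > deadAfter
}

func main() {
	f := &flipper{state: "up"}
	for _, s := range []string{"down", "up", "down", "down"} {
		fmt.Println(s, f.observe(s), f.state)
	}
	fmt.Println(isDead(3*time.Second), isDead(5*time.Second))
}
```

A single failed evaluation followed by a success resets the streak, so one transient blip does not trigger an alert.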