# qu — quorum-based uptime monitor
qu is a small Linux daemon that watches HTTP, TCP, and ICMP endpoints
from several cooperating nodes. The nodes form a quorum cluster; one is
elected master and owns alert dispatch. A check is only reported as
DOWN when the majority of nodes agree, which keeps a single node's
flaky uplink from paging anyone at 3am.
A single static binary contains the daemon, the CLI, and everything in between. Inter-node traffic is mutual TLS with SSH-style fingerprint trust — no central CA, no shared secret.
## Installation

### From pre-built binary
The canonical home is Gitea; the repo is push-mirrored to GitHub on every tag. Releases and multi-arch container images are published to both.
| Source | Releases | Container image |
|---|---|---|
| Gitea (primary) | https://git.cer.sh/axodouble/quptime/releases | git.cer.sh/axodouble/quptime |
| GitHub (mirror) | https://github.com/Axodouble/QUptime/releases | ghcr.io/axodouble/quptime |
One-step install — tries Gitea first, falls back to GitHub automatically:

```sh
curl -fsSL https://git.cer.sh/Axodouble/QUptime/raw/branch/master/install.sh | sudo bash

# or, via the GitHub mirror:
# curl -fsSL https://raw.githubusercontent.com/Axodouble/QUptime/master/install.sh | sudo bash
```
The script verifies the binary against the published SHA256SUMS
before installing and refuses to proceed on a mismatch.
### From Docker

```sh
docker pull git.cer.sh/axodouble/quptime:latest

# or, via the GitHub mirror:
# docker pull ghcr.io/axodouble/quptime:latest
```
See docs/deployment/docker.md for compose recipes.
## Why
Most uptime monitors are either a SaaS or a single box that, by
definition, can't tell you when it's the one that's down. qu solves
both: run it on a few cheap hosts in different networks and they vote
on truth. If one of them loses its uplink, the rest keep alerting.
## Documentation
This README is the quick-start. For production use, the longer guides
live under docs/:
| If you want to… | Read |
|---|---|
| understand the consensus / replication model | docs/architecture.md |
| reference every field in node.yaml / cluster.yaml | docs/configuration.md |
| deploy on Linux with systemd hardening | docs/deployment/systemd.md |
| deploy with Docker / docker-compose | docs/deployment/docker.md |
| deploy over Tailscale or WireGuard | docs/deployment/tailscale.md |
| expose qu on the open internet safely | docs/deployment/public-internet.md |
| upgrade, back up, or recover from failures | docs/operations.md |
| understand the trust model and rotate identities | docs/security.md |
| diagnose a misbehaving cluster | docs/troubleshooting.md |
## Architecture

```
+-------------- node A ---------------+
| qu serve                            |
| ├─ transport server (mTLS :9901)    |
| ├─ quorum manager (heartbeats)      |
| ├─ replicator (cluster.yaml)        |
| ├─ scheduler (HTTP/TCP/ICMP)        | <─── probes targets
| ├─ aggregator (master-only)         |
| ├─ alerts (master-only)             |
| └─ control socket (unix, for CLI)   |
+-------------------------------------+
          │  ▲   mTLS, pinned by fingerprint
          ▼  │
     node B      node C …
```
Every node runs every probe. Results are shipped to the elected master, which folds them into a per-check sliding window. A state flips (UP↔DOWN) only after two consecutive aggregate evaluations agree — that's the hysteresis that absorbs network blips.
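The two-consecutive-evaluations hysteresis can be sketched in a few lines of Go. This is an illustrative model, not qu's internal code: the `hysteresis` type and `observe` method are made up for the example.

```go
package main

import "fmt"

// hysteresis flips the published state only after two consecutive
// aggregate evaluations agree on the same candidate state.
// (Illustrative sketch; qu's internal types differ.)
type hysteresis struct {
	published string // state currently reported ("up" / "down")
	candidate string // result of the most recent evaluation
	streak    int    // consecutive evaluations agreeing on candidate
}

// observe feeds one aggregate evaluation and reports whether the
// published state flipped.
func (h *hysteresis) observe(state string) bool {
	if state == h.candidate {
		h.streak++
	} else {
		h.candidate, h.streak = state, 1
	}
	if h.streak >= 2 && state != h.published {
		h.published = state
		return true
	}
	return false
}

func main() {
	h := &hysteresis{published: "up", candidate: "up", streak: 2}
	// A single bad evaluation (a network blip) resets the streak but
	// does not flip the state...
	fmt.Println(h.observe("down"), h.published) // false up
	// ...a second consecutive bad evaluation does.
	fmt.Println(h.observe("down"), h.published) // true down
}
```

Because the streak resets whenever an evaluation disagrees with the previous one, an isolated blip can never cause a flap on its own.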
Master election is deterministic: among the live members of the quorum, the node with the lexicographically smallest NodeID wins. No negotiation, no split-brain window.
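The election rule is simple enough to state as code. A hypothetical `electMaster` helper (not qu's actual API) over the live member IDs:

```go
package main

import "fmt"

// electMaster returns the lexicographically smallest NodeID among the
// live members — the deterministic rule described above.
// (Illustrative helper; not qu's actual API.)
func electMaster(live []string) (master string, ok bool) {
	for _, id := range live {
		if !ok || id < master {
			master, ok = id, true
		}
	}
	return master, ok
}

func main() {
	master, _ := electMaster([]string{"c0d4", "b21c", "a7f3"})
	fmt.Println(master) // a7f3 — smallest ID wins
}
```

Because every node computes the same answer from the same live set, there is no negotiation round that could disagree or stall.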
cluster.yaml is the single replicated source of truth (peers, checks,
alerts). Mutations from the CLI route through the master, which bumps a
monotonic version and broadcasts the result. The same file is also
watched on disk, so an operator can sudoedit cluster.yaml on any node
and the daemon will replicate the edit cluster-wide.
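The monotonic version is what makes this replication safe to repeat. A minimal sketch of the version gate a follower might apply; the `clusterConfig` struct and `applyIfNewer` name are illustrative, not the real types in internal/config:

```go
package main

import "fmt"

// clusterConfig loosely mirrors the replicated file (illustrative;
// peers, checks, and alerts are elided).
type clusterConfig struct {
	Version int
}

// applyIfNewer accepts a broadcast config only when its monotonic
// version is strictly higher than the one already held, so stale or
// duplicate broadcasts are harmless no-ops.
func applyIfNewer(current *clusterConfig, incoming clusterConfig) bool {
	if incoming.Version <= current.Version {
		return false
	}
	*current = incoming
	return true
}

func main() {
	cur := &clusterConfig{Version: 4}
	fmt.Println(applyIfNewer(cur, clusterConfig{Version: 3})) // false: stale
	fmt.Println(applyIfNewer(cur, clusterConfig{Version: 5})) // true: applied
	fmt.Println(cur.Version)                                  // 5
}
```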
## Build

Requires Go 1.24.2 or newer.

```sh
go build -o qu ./cmd/qu
```

To stamp the version into the binary:

```sh
go build -ldflags "-X main.version=v0.0.1" -o qu ./cmd/qu
qu --version
```
## Releases

Pushing a tag matching `v*` triggers `.gitea/workflows/release.yaml`, which runs the test suite, cross-compiles static Linux binaries for amd64 and arm64, and publishes them as a Gitea release with a `SHA256SUMS` file alongside.

```sh
git tag v0.0.1
git push --tags
```
## Set up a 3-node cluster

On the first host:

```sh
qu init --advertise alpha.example.com:9901
```

That prints a random cluster secret. Copy it.

On every other host, pass that secret via `--secret`:

```sh
qu init --advertise bravo.example.com:9901 --secret <paste>
qu init --advertise charlie.example.com:9901 --secret <paste>
```

Without the matching secret a node cannot join, so random hosts that can reach :9901 are safely ignored.

Start the daemon on every host (foreground; wire into systemd for prod):

```sh
qu serve
```

Then on one node — usually alpha — invite the others. The CLI prints each remote's fingerprint and asks for confirmation SSH-style:

```sh
qu node add bravo.example.com:9901
qu node add charlie.example.com:9901
```
After the first invite, give it a few seconds for heartbeats to bring the new peer into the live set before inviting the next one — otherwise the local node's "needs ≥2 live to mutate" check will reject the second add.
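The "live majority" threshold follows the usual quorum formula, floor(n/2) + 1; the sample `qu status` output later in this README (`quorum true (need 2)` on a three-node cluster) is consistent with it. A sketch under that assumption, not qu's actual code:

```go
package main

import "fmt"

// quorumNeeded returns the majority threshold for a cluster of n
// members: mutations (and DOWN verdicts) require this many live nodes.
// Assumes the standard floor(n/2)+1 majority rule.
func quorumNeeded(n int) int {
	return n/2 + 1
}

func main() {
	for _, n := range []int{1, 2, 3, 4, 5} {
		fmt.Printf("%d-node cluster: need %d live\n", n, quorumNeeded(n))
	}
}
```

Note that a 2-node cluster needs both nodes live, which is why the second invite is rejected until heartbeats have confirmed the first new peer.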
You only need to invite from one node. Peer certs ride along with the
replicated cluster.yaml, so every peer auto-trusts every other peer
without N×(N-1) invites.
That's it — the master broadcasts the new cluster config to every trusting peer. `qu status` from any node should now show all three:

```
node        a7f3...
term        2
master      a7f3...
quorum      true (need 2)
config ver  4

PEERS
NODE_ID  ADVERTISE                 LIVE  LAST_SEEN
a7f3...  alpha.example.com:9901    true  2026-05-12T15:01:32Z
b21c...  bravo.example.com:9901    true  2026-05-12T15:01:32Z
c0d4...  charlie.example.com:9901  true  2026-05-12T15:01:32Z
```
## Adding checks and alerts

> ⚠️ **Alert credentials are replicated cluster-wide.** SMTP passwords and Discord webhook URLs live in `cluster.yaml`, which is mirrored to every node. Any node that can read its own data directory can read every alert secret. Treat compromising one node as compromising every alert credential, and restrict who can reach `$QUPTIME_DIR` on each host (the hardened systemd unit and the Docker image both default to `0700`/`0750`). See docs/security.md for the full threat model.
```sh
# alerts first so checks can reference them
qu alert add discord oncall --webhook https://discord.com/api/webhooks/...
qu alert add smtp ops --host smtp.example.com --port 587 \
  --from monitor@example.com --to ops@example.com \
  --user mailbot --password '****' --starttls=true

# checks
qu check add http homepage https://example.com --expect 200 --alerts oncall,ops
qu check add tcp db db.internal:5432 --interval 15s
qu check add icmp gateway 10.0.0.1 --interval 5s
```
Mutations always route to the master, which bumps a monotonic version
and pushes the new cluster.yaml to every peer. If quorum is lost,
mutating commands fail loudly.
qu status shows the effective alert list for each check. Default
alerts are suffixed with * so you can tell at a glance which alerts
were attached automatically vs explicitly listed on the check:
```
CHECKS
ID       NAME      STATE  OK/TOTAL  ALERTS       DETAIL
ddbd...  homepage  up     3/3       oncall,ops*
0006...  db        down   1/3       ops*         dial timeout
24f4...  gateway   up     3/3       -
```

(alerts marked `*` are attached as defaults)
### Default alerts (attach to every check)

Rather than listing the same `--alerts` on every `check add`, mark an alert as default and it fires for every check automatically:

```sh
# at creation
qu alert add discord oncall --webhook https://... --default

# or toggle later
qu alert default oncall on
qu alert default oncall off
```
qu alert list shows a DEFAULT column. A check can opt out of a
specific default by adding the alert's ID or name to its
suppress_alert_ids list in cluster.yaml (see "Edit cluster.yaml
directly" below).
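The effective alert list described above (alerts named on the check, plus every default, minus the check's suppress list) can be modeled directly. Function and field names here are illustrative, not qu's internals:

```go
package main

import (
	"fmt"
	"sort"
)

// effectiveAlerts merges a check's explicit alerts with the cluster's
// default alerts, honoring the check's suppress list, which — per the
// README — opts out of specific defaults only. (Illustrative sketch.)
func effectiveAlerts(explicit, defaults, suppressed []string) []string {
	skip := map[string]bool{}
	for _, s := range suppressed {
		skip[s] = true
	}
	seen := map[string]bool{}
	var out []string
	add := func(name string) {
		if !seen[name] {
			seen[name] = true
			out = append(out, name)
		}
	}
	for _, a := range explicit {
		add(a) // explicitly listed alerts always apply
	}
	for _, a := range defaults {
		if !skip[a] { // suppress list filters defaults
			add(a)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// A check that lists "ops" itself and suppresses the default "oncall":
	fmt.Println(effectiveAlerts([]string{"ops"}, []string{"oncall", "ops"}, []string{"oncall"}))
}
```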
## Interactive TUI

Prefer a dashboard over typing commands? `qu tui` opens a full-screen bubbletea UI over the local daemon socket. The header shows quorum, master, term, and config version; three tabs hold peers, checks, and alerts with auto-refresh every two seconds.
```
┌─ QUptime ── node: 88a00af9  master: 3438fd6f (follower)  ● quorum 3/2  term 4  ver 10 ──┐
│  Peers (3) [2]   Checks (3) [3]   Alerts (1)                                             │
├──────────────────────────────────────────────────────────────────────────────────────────┤
│  ID       NAME      STATE      OK/TOTAL  ALERTS    DETAIL                                │
│  ddbd...  homepage  ● up       3/3       oncall*                                         │
│  0006...  db        ● down     1/3       oncall*   dial timeout                          │
│  24f4...  gateway   ○ unknown  0/0       -                                               │
└──────────────────────────────────────────────────────────────────────────────────────────┘
 ↑↓ navigate  ⇥ next tab  1/2/3 jump  r refresh  a add check  d remove check  q quit
```
Keybindings:

| Key | Action |
|---|---|
| `↑` / `↓` | move cursor within a tab |
| `Tab` / `Shift+Tab` | next / previous tab |
| `1` / `2` / `3` | jump to Peers / Checks / Alerts |
| `r` | force-refresh |
| `a` | add (opens a picker on Checks/Alerts; node form on Peers) |
| `d` | remove the selected row (confirmation prompt) |
| `t` | send a test message to the selected alert |
| `D` | toggle the selected alert's default flag |
| `q` / `Ctrl+C` | quit |
Forms run the same control-plane methods the CLI does, so any side
effect (a mutation, a node add, an alert test) ends up routed through
the master exactly like qu … from the shell.
## Custom alert messages
Each alert can carry its own subject_template and body_template
(Go text/template syntax). When set, they override the built-in
formatting for that one alert; the default renderer is used otherwise.
Discord ignores the subject template (it has no subject line).
```sh
qu alert add discord oncall --webhook https://... \
  --body ':rotating_light: **{{.Check.Name}}** is now {{.Verb}}
target: `{{.Check.Target}}`
detail: {{.Snapshot.Detail}}'

# multi-line templates are easier from a file
qu alert add smtp ops --host ... --from ... --to ... \
  --subject-file /etc/quptime/templates/ops.subject \
  --body-file /etc/quptime/templates/ops.body
```
Available template variables:

| Variable | Meaning |
|---|---|
| `{{.Check.Name}}` | check name |
| `{{.Check.Type}}` | http / tcp / icmp |
| `{{.Check.Target}}` | URL or host:port being probed |
| `{{.Check.ID}}` | UUID |
| `{{.From}}` | previous state (up / down / unknown) |
| `{{.To}}` | new state |
| `{{.Verb}}` | UP / DOWN / RECOVERED |
| `{{.VerbLower}}` | lowercase form (up / down / recovered) |
| `{{.Snapshot.Reports}}` | total per-node reports counted |
| `{{.Snapshot.OKCount}}` | how many reported OK |
| `{{.Snapshot.NotOK}}` | how many reported failure |
| `{{.Snapshot.Detail}}` | first failure detail string |
| `{{.NodeID}}` | master that dispatched |
| `{{.When}}` | RFC3339 timestamp |
The same variable list is surfaced in-app: qu alert add smtp --help,
qu alert add discord --help, and qu alert edit --help each print
it under their flag table, and qu tui shows a compact reminder of
the supported variables as a hint when the cursor lands on a Subject
or Body template field in the add/edit alert forms.
qu alert test <name> exercises the template against a synthetic
"homepage going DOWN" transition, so you can verify rendering before
production traffic depends on it. A template parse or execution error
falls back to the built-in format and is logged.
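The parse-then-fall-back behavior is easy to model with Go's `text/template` directly. The `alertData` type and `renderOrFallback` helper below are illustrative stand-ins for qu's internal alert data, but the template calls are the real standard-library API:

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"text/template"
)

// alertData mirrors a subset of the documented template variables.
// (Illustrative; qu's real struct lives in internal/alerts.)
type alertData struct {
	Check    struct{ Name, Type, Target string }
	Verb     string
	Snapshot struct{ Detail string }
}

// renderOrFallback executes a custom template; on any parse or
// execution error it logs the problem and returns the built-in
// default instead, so the alert still ships.
func renderOrFallback(custom, fallback string, data alertData) string {
	tmpl, err := template.New("body").Parse(custom)
	if err == nil {
		var buf bytes.Buffer
		if err = tmpl.Execute(&buf, data); err == nil {
			return buf.String()
		}
	}
	log.Printf("template error, using default: %v", err)
	return fallback
}

func main() {
	var d alertData
	d.Check.Name, d.Verb = "homepage", "DOWN"
	d.Snapshot.Detail = "dial timeout"
	fmt.Println(renderOrFallback(
		"{{.Check.Name}} is {{.Verb}}: {{.Snapshot.Detail}}",
		"default body", d)) // homepage is DOWN: dial timeout
	// An unclosed action is a parse error, so the default is used:
	fmt.Println(renderOrFallback("{{.Nope}", "default body", d))
}
```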
### Conditionals, pipelines, and worked examples
Templates use Go's text/template syntax, so you have if/else if/
else/end, comparison helpers (eq, ne, lt, gt), printf
pipelines, and with blocks. The default rendering — the one used
when no custom template is set — lives in internal/alerts/message.go
inside the Render function; tweak it there if you want to change
what every alert without an override produces.
A few progressively richer examples:
1. **State-specific Discord copy** — different tone for DOWN, RECOVERED, and first-time UP:

   ```yaml
   body_template: |
     {{if eq .Verb "DOWN"}}:rotating_light: **{{.Check.Name}}** is DOWN
     We're investigating. Last detail: `{{.Snapshot.Detail}}`
     {{else if eq .Verb "RECOVERED"}}:white_check_mark: **{{.Check.Name}}** is back UP after a {{.From}} blip.
     {{else}}:information_source: **{{.Check.Name}}** is online ({{.VerbLower}}).{{end}}
   ```
2. **SMTP subject with severity prefix and run-length detail** — pipes `Verb` through `printf` for padding and only mentions the report count when it actually matters:

   ```yaml
   subject_template: '[{{printf "%-9s" .Verb}}] {{.Check.Name}} — {{.Check.Target}}'
   body_template: |
     Check:    {{.Check.Name}} ({{.Check.Type}})
     Target:   {{.Check.Target}}
     Status:   {{.Verb}} (was {{.From}})
     Reporter: {{.NodeID}}
     At:       {{.When}}
     {{if gt .Snapshot.Reports 1}}
     Quorum: {{.Snapshot.OKCount}} ok / {{.Snapshot.NotOK}} failing across {{.Snapshot.Reports}} reports.
     {{end}}{{with .Snapshot.Detail}}
     Detail: {{.}}
     {{end}}
   ```
3. **PagerDuty-style severity routing** — nest `if`/`else if` so a single template can produce three different first lines without duplicating the rest of the body:

   ```yaml
   subject_template: >-
     {{if eq .Verb "DOWN"}}P1: {{.Check.Name}} hard down
     {{else if eq .Verb "RECOVERED"}}P3: {{.Check.Name}} recovered
     {{else}}P4: {{.Check.Name}} {{.VerbLower}}{{end}}
   body_template: |
     {{/* Header line — uses .VerbLower so the prose reads naturally */}}
     {{.Check.Name}} ({{.Check.Target}}) is now {{.VerbLower}}.
     {{if eq .Verb "DOWN" -}}
     This is a real outage. Quorum: {{.Snapshot.NotOK}}/{{.Snapshot.Reports}} reporters see it failing.
     Detail from the first failing probe: {{.Snapshot.Detail}}
     Acknowledge in the runbook before paging on-call.
     {{- else if eq .Verb "RECOVERED" -}}
     Recovered after a {{.From}} period. No action needed; this is informational.
     {{- else -}}
     First successful probe after {{.From}}. Marking healthy.
     {{- end}}
     — {{.NodeID}} at {{.When}}
   ```
The `{{-` and `-}}` markers trim adjacent whitespace, which keeps the rendered output tidy even when the template itself is indented for readability.
If a template fails to parse or panics at execute time, the
dispatcher falls back to the default Render output for that field
and logs the error — your alert still ships, you just lose the
custom formatting until you fix the template.
## Edit cluster.yaml directly
Anything you can do through the CLI you can also do by editing
$QUPTIME_DIR/cluster.yaml on any node. The daemon polls the file every
few seconds; when it sees a hash that differs from what it last wrote,
it parses the YAML and forwards the change through the master, which
bumps the version and broadcasts the result everywhere — so a hand-edit
on bravo propagates to alpha and charlie automatically.
```sh
sudoedit /etc/quptime/cluster.yaml
# add `default: true` to an alert, or `suppress_alert_ids: [oncall]`
# on a check, then save and quit
```
You'll see a `manual-edit: cluster.yaml changed externally — replicating via master` line in the daemon log when it picks the change up. Invalid YAML is logged and ignored until you save a valid file.
The replicated fields are `peers`, `checks`, and `alerts`. `version`, `updated_at`, and `updated_by` are server-controlled — the master overwrites them on commit.
## Test an alert without waiting for a real outage

```sh
qu alert test oncall
```
## File layout

A node's state lives under `$QUPTIME_DIR` (defaults to /etc/quptime when root, ~/.config/quptime otherwise):

```
node.yaml     identity (NodeID, bind addr, port). Never replicated.
cluster.yaml  replicated state: peers, checks, alerts, version.
trust.yaml    local fingerprint trust store.
keys/         RSA private + public + self-signed cert.
```
The CLI talks to the local daemon over a unix socket at
$QUPTIME_SOCKET (defaults to /var/run/quptime/quptime.sock when
root, $XDG_RUNTIME_DIR/quptime/quptime.sock otherwise) — filesystem
permissions guard it; no TLS on the local socket.
## ICMP and capabilities

ICMP checks default to unprivileged UDP-mode pings so the daemon does not need root or CAP_NET_RAW. If you want classic raw ICMP, either run the daemon as root or grant the capability:

```sh
sudo setcap cap_net_raw=+ep ./qu
```
## CLI reference

```
qu init                            generate identity + keys
qu serve                           run the daemon
qu status                          quorum, master, check states
qu tui                             interactive dashboard

qu node add <host:port>            TOFU-add a peer
qu node list                       show peers + liveness
qu node remove <node-id>           remove from cluster + trust

qu check add http <name> <url>     [--expect 200] [--interval 30s] [--body-match str] [--alerts a,b]
qu check add tcp <name> <host:port>
qu check add icmp <name> <host>
qu check list
qu check remove <id-or-name>

qu alert add smtp <name>           --host … --port … --from … --to … [--user --password --starttls] [--default] [--subject … --body …]
qu alert add discord <name>        --webhook … [--default] [--body …]
qu alert list / remove / test <id-or-name>
qu alert default <id-or-name> on|off   toggle default attachment to every check

qu trust list / remove <node-id>
```
All --interval and --timeout flags accept Go duration syntax: 5s,
1m30s, 2h, etc.
## Tests

```sh
go test ./...
go test -race ./...
```
Each internal package has unit tests; coverage hovers around 60–90% on the meaningful packages. The transport tests bring up real mTLS listeners over loopback, which exercises the cert pinning end-to-end.
## What's intentionally not here (v1)
- No web UI. The CLI is the only operator surface.
- No historical metrics or SLA reports — only the current aggregate state is kept in memory. Add SQLite later if you need graphs.
- No automatic key rotation. Re-init a node and re-trust if you need to roll its identity.
- No multi-tenant isolation. One cluster = one set of checks.
## Layout

```
cmd/qu/              entry point
internal/config/     on-disk file layout, ClusterConfig, NodeConfig
internal/crypto/     RSA keypair + self-signed cert + SPKI fingerprints
internal/trust/      fingerprint trust store
internal/transport/  mTLS listener/dialer, framed JSON-RPC
internal/quorum/     heartbeats + deterministic master election
internal/replicate/  master-routed mutations, version-gated replication
internal/checks/     HTTP/TCP/ICMP probers, scheduler, aggregator
internal/alerts/     SMTP + Discord dispatchers, message rendering
internal/daemon/     glue: wires every component + control socket
internal/cli/        cobra commands, the user-facing surface
internal/tui/        bubbletea dashboard (qu tui)
```