
qu — quorum-based uptime monitor

qu is a small Linux daemon that watches HTTP, TCP, and ICMP endpoints from several cooperating nodes. The nodes form a quorum cluster; one is elected master and owns alert dispatch. A check is only reported as DOWN when the majority of nodes agree, which keeps a single node's flaky uplink from paging anyone at 3am.

A single static binary contains the daemon, the CLI, and everything in between. Inter-node traffic is mutual TLS with SSH-style fingerprint trust — no central CA, no shared secret.

Why

Most uptime monitors are either a SaaS or a single box that, by definition, can't tell you when it's the one that's down. qu sidesteps both problems: run it on a few cheap hosts in different networks and let them vote on the truth. If one of them loses its uplink, the rest keep alerting.

Architecture

   +-------------- node A ---------------+
   | qu serve                            |
   | ├─ transport server (mTLS :9001)    |
   | ├─ quorum manager  (heartbeats)     |
   | ├─ replicator      (cluster.yaml)   |
   | ├─ scheduler       (HTTP/TCP/ICMP)  |  <─── probes targets
   | ├─ aggregator      (master-only)    |
   | ├─ alerts          (master-only)    |
   | └─ control socket  (unix, for CLI)  |
   +-------------------------------------+
              │  ▲   mTLS, pinned by fingerprint
              ▼  │
        node B   node C   …

Every node runs every probe. Results are shipped to the elected master, which folds them into a per-check sliding window. A check's state flips (UP↔DOWN) only after two consecutive aggregate evaluations agree — that's the hysteresis that absorbs network blips.
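The two-consecutive-evaluations rule can be sketched as a small state machine (an illustration of the idea, not qu's implementation):

```go
package main

import "fmt"

// flapFilter flips the published state only after two consecutive
// aggregate evaluations disagree with it -- one-off blips are absorbed.
type flapFilter struct {
	published bool // currently reported state (true = UP)
	pending   bool // candidate state seen in the last evaluation
	count     int  // consecutive evaluations agreeing with pending
}

// observe feeds one aggregate evaluation and returns the state
// reported after hysteresis is applied.
func (f *flapFilter) observe(up bool) bool {
	if up == f.published {
		f.count = 0 // back in agreement, reset the streak
		return f.published
	}
	if f.count == 0 || up != f.pending {
		f.pending, f.count = up, 1 // first dissenting evaluation
		return f.published
	}
	f.count++
	if f.count >= 2 { // second in a row: flip
		f.published, f.count = f.pending, 0
	}
	return f.published
}

func main() {
	f := &flapFilter{published: true}
	fmt.Println(f.observe(false)) // true  -- a single blip does not flip
	fmt.Println(f.observe(false)) // false -- two in a row: DOWN
}
```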

Master election is deterministic: among the live members of the quorum, the node with the lexicographically smallest NodeID wins. No negotiation, no split-brain window.
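Picking the lexicographic minimum needs no coordination, which is why every node converges on the same master independently. The whole election, in essence:

```go
package main

import (
	"fmt"
	"slices"
)

// electMaster returns the lexicographically smallest NodeID among the
// live quorum members. Every node computes the same answer locally,
// so no negotiation round is needed.
func electMaster(live []string) (string, bool) {
	if len(live) == 0 {
		return "", false
	}
	return slices.Min(live), true
}

func main() {
	master, _ := electMaster([]string{"c0d4", "a7f3", "b21c"})
	fmt.Println(master) // a7f3
}
```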

Build

Requires Go 1.23 or newer.

go build -o qu ./cmd/qu

To stamp the version into the binary:

go build -ldflags "-X main.version=v0.1.0" -o qu ./cmd/qu
qu --version

Releases

Pushing a tag matching v* triggers .gitea/workflows/release.yaml, which runs the test suite, cross-compiles static Linux binaries for amd64 and arm64, and publishes them as a Gitea release with a SHA256SUMS file alongside.

git tag v0.1.0
git push --tags

Set up a 3-node cluster

On each host:

# 1. Generate identity + RSA-3072 keypair + self-signed cert.
qu init --advertise <this-host's reachable address>:9001

# 2. Start the daemon (foreground; wire it into systemd for prod).
qu serve
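For production, a minimal systemd unit might look like the following — the binary path and unit name are illustrative, not shipped by qu:

```ini
# /etc/systemd/system/quptime.service -- paths are illustrative
[Unit]
Description=qu quorum uptime monitor
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/usr/local/bin/qu serve
Restart=on-failure
RuntimeDirectory=quptime

[Install]
WantedBy=multi-user.target
```

RuntimeDirectory gives the daemon /run/quptime for its control socket without a manual mkdir.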

Pick one node and tell it about the other two. The CLI prints the remote fingerprint and asks for confirmation, SSH-style:

qu node add  bravo.example.com:9001
qu node add  charlie.example.com:9001

That's it — the master broadcasts the new cluster config to every trusted peer. Running qu status on any node should now show all three:

node       a7f3...
term       2
master     a7f3...
quorum     true (need 2)
config ver 4

PEERS
NODE_ID  ADVERTISE                 LIVE  LAST_SEEN
a7f3...  alpha.example.com:9001    true  2026-05-12T15:01:32Z
b21c...  bravo.example.com:9001    true  2026-05-12T15:01:32Z
c0d4...  charlie.example.com:9001  true  2026-05-12T15:01:32Z

Adding checks and alerts

# alerts first so checks can reference them
qu alert add discord oncall --webhook https://discord.com/api/webhooks/...
qu alert add smtp   ops    --host smtp.example.com --port 587 \
                            --from monitor@example.com --to ops@example.com \
                            --user mailbot --password '****' --starttls=true

# checks
qu check add http  homepage https://example.com  --expect 200  --alerts oncall,ops
qu check add tcp   db       db.internal:5432     --interval 15s
qu check add icmp  gateway  10.0.0.1             --interval 5s

Mutations always route to the master, which bumps a monotonic version and pushes the new cluster.yaml to every peer. If quorum is lost, mutating commands fail loudly.
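Version-gated replication means a peer accepts a pushed config only when its version is strictly newer; stale or duplicate pushes are no-ops. A sketch of that acceptance check (illustrative, not qu's code):

```go
package main

import "fmt"

// acceptConfig applies a replicated cluster config only when the
// incoming version is strictly newer than the one already held.
// It returns the version kept and whether the push was applied.
func acceptConfig(current, incoming int) (int, bool) {
	if incoming <= current {
		return current, false // stale or duplicate push: ignore
	}
	return incoming, true
}

func main() {
	v, applied := acceptConfig(4, 5)
	fmt.Println(v, applied) // 5 true
	v, applied = acceptConfig(5, 3)
	fmt.Println(v, applied) // 5 false
}
```

Because only the master ever bumps the version, two peers can never hold conflicting configs at the same version number.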

Test an alert without waiting for a real outage

qu alert test oncall

File layout

A node's state lives under $QUPTIME_DIR (defaults to /etc/quptime when root, ~/.config/quptime otherwise):

node.yaml      identity (NodeID, bind addr, port). Never replicated.
cluster.yaml   replicated state: peers, checks, alerts, version.
trust.yaml     local fingerprint trust store.
keys/          RSA private + public + self-signed cert.
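For orientation, a replicated cluster.yaml could look roughly like this — the field names and shape are an illustrative guess, not qu's actual schema:

```yaml
# cluster.yaml -- illustrative shape only; the real schema may differ
version: 4
peers:
  - node_id: a7f3...
    advertise: alpha.example.com:9001
checks:
  - type: http
    name: homepage
    target: https://example.com
    expect: 200
    alerts: [oncall]
alerts:
  - type: discord
    name: oncall
    webhook: https://discord.com/api/webhooks/...
```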

The CLI talks to the local daemon over a unix socket at $QUPTIME_SOCKET (defaults to /var/run/quptime/quptime.sock when root, $XDG_RUNTIME_DIR/quptime/quptime.sock otherwise) — filesystem permissions guard it; no TLS on the local socket.

ICMP and capabilities

ICMP checks default to unprivileged UDP-mode pings so the daemon does not need root or CAP_NET_RAW. If you want classic raw ICMP, either run the daemon as root or grant the capability:

sudo setcap cap_net_raw=+ep ./qu

CLI reference

qu init                                       generate identity + keys
qu serve                                      run the daemon
qu status                                     quorum, master, check states
qu node add    <host:port>                    TOFU-add a peer
qu node list                                  show peers + liveness
qu node remove <node-id>                      remove from cluster + trust
qu check add http  <name> <url>  [--expect 200] [--interval 30s] [--body-match str] [--alerts a,b]
qu check add tcp   <name> <host:port>
qu check add icmp  <name> <host>
qu check list
qu check remove <id-or-name>
qu alert add smtp    <name> --host … --port … --from … --to … [--user --password --starttls]
qu alert add discord <name> --webhook …
qu alert list / remove / test <id-or-name>
qu trust list / remove <node-id>

All --interval and --timeout flags accept Go duration syntax: 5s, 1m30s, 2h, etc.

Tests

go test ./...
go test -race ./...

Each internal package has unit tests; coverage hovers around 60–90 % on the meaningful packages. The transport tests bring up real mTLS listeners over loopback, which exercises the cert pinning end-to-end.

What's intentionally not here (v1)

  • No web UI. The CLI is the only operator surface.
  • No historical metrics or SLA reports — only the current aggregate state is kept in memory. Add SQLite later if you need graphs.
  • No automatic key rotation. Re-init a node and re-trust if you need to roll its identity.
  • No multi-tenant isolation. One cluster = one set of checks.

Layout

cmd/qu/                    entry point
internal/config/           on-disk file layout, ClusterConfig, NodeConfig
internal/crypto/           RSA keypair + self-signed cert + SPKI fingerprints
internal/trust/            fingerprint trust store
internal/transport/        mTLS listener/dialer, framed JSON-RPC
internal/quorum/           heartbeats + deterministic master election
internal/replicate/        master-routed mutations, version-gated replication
internal/checks/           HTTP/TCP/ICMP probers, scheduler, aggregator
internal/alerts/           SMTP + Discord dispatchers, message rendering
internal/daemon/           glue: wires every component + control socket
internal/cli/              cobra commands, the user-facing surface