# Architecture
This page is the long-form companion to the diagram in the top-level
README. Read it if you need to reason about partitions, recovery,
upgrade ordering, or the consistency guarantees of qu.
## Components
A running `qu serve` is one process containing five long-lived
goroutines plus the listeners:
| Component | Package | Role |
|---|---|---|
| Transport | `internal/transport` | mTLS listener + dialer, length-prefixed JSON-RPC framing. |
| Quorum manager | `internal/quorum` | 1 Hz heartbeats, liveness tracking, deterministic master election. |
| Replicator | `internal/replicate` | Master-routed mutations, version-gated broadcast and pull. |
| Scheduler | `internal/checks` | One goroutine per check; runs HTTP/TCP/ICMP probes on each node. |
| Aggregator | `internal/checks` | Master-only. Folds per-node probe results into a cluster-wide verdict. |
| Alert dispatch | `internal/alerts` | Master-only. Renders templates and ships SMTP / Discord notifications. |
| Control socket | `internal/daemon` | Local-only unix socket; the CLI and TUI talk to the daemon through it. |
Every node runs every component. Whether the master-only ones actually do anything depends on the result of master election.
## Trust and transport
Inter-node traffic is TLS 1.3 with mutual authentication. There is no
central CA. Each node generates a self-signed RSA cert at `qu init`,
and the SPKI fingerprint of that cert is what other nodes pin against.
Two layers gate access:
- The TLS layer accepts any client cert. This avoids a chicken-and-egg problem during bootstrap — a brand-new node has no entry in anyone's trust store yet, so a strict TLS check would refuse the very first handshake.
- The RPC dispatcher rejects every method except `Join` for callers whose presented fingerprint is not in `trust.yaml`. An untrusted peer can knock on the door but cannot ask questions.
`Join` itself is gated by the cluster secret — a pre-shared base64
string generated at `qu init` on the first node. Without it, an
attacker who can reach `:9901` cannot enrol themselves into the
cluster.
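The two gates above fit in a per-method check. A minimal sketch, with illustrative names (`dispatcher`, `allow`) that are not qu's actual API:

```go
package main

import "fmt"

// dispatcher models the second gate: TLS has already accepted the
// connection; access is decided per method and per fingerprint.
type dispatcher struct {
	trusted map[string]bool // SPKI fingerprint -> present in trust.yaml
}

// allow reports whether a caller presenting the given certificate
// fingerprint may invoke the given RPC method.
func (d *dispatcher) allow(method, fingerprint string) bool {
	if d.trusted[fingerprint] {
		return true // trusted peers may call any method
	}
	// Untrusted peers can knock, but only Join answers.
	return method == "Join"
}

func main() {
	d := &dispatcher{trusted: map[string]bool{"ab:cd": true}}
	fmt.Println(d.allow("GetClusterCfg", "ab:cd")) // true
	fmt.Println(d.allow("GetClusterCfg", "ff:00")) // false
	fmt.Println(d.allow("Join", "ff:00"))          // true
}
```

The `Join` handler itself then checks the cluster secret before enrolling anyone.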
The local CLI talks to the daemon over a unix socket with `0600`
permissions; filesystem ACLs are the only authentication and no TLS is
used on that channel.
## The replicated state machine
`cluster.yaml` is the single replicated source of truth. It holds three
editable lists — `peers`, `checks`, `alerts` — plus three
server-controlled fields:
```yaml
version: 7             # monotonically increasing
updated_at: 2026-05-15T...
updated_by: <node-id>  # master that committed this version
peers: [...]
checks: [...]
alerts: [...]
```
### How mutations flow
- The CLI (or the manual-edit watcher; see below) issues a mutation on the local daemon's control socket.
- The daemon's replicator looks at the current quorum view:
  - If there is no quorum, the mutation fails loudly with `no quorum: refusing mutation`.
  - If this node is the master, apply locally and broadcast.
  - Otherwise, ship the mutation to the master via the `ProposeMutation` RPC and wait for the result.
- The master holds the cluster lock, applies the mutation, bumps `version`, writes `cluster.yaml` atomically, and broadcasts the new snapshot to every peer via `ApplyClusterCfg`.
- Each follower's `Replace` accepts the snapshot only if `incoming.Version > local.Version`. Older or equal versions are dropped silently.
The mutation kinds are enumerated in `internal/transport/messages.go`:
`add_check`, `remove_check`, `add_alert`, `remove_alert`, `add_peer`,
`remove_peer`, `replace_config`.
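The follower-side version gate is small enough to sketch. A minimal illustration (trimmed types, not qu's actual code):

```go
package main

import "fmt"

// ClusterCfg mirrors only the field relevant to the version gate.
type ClusterCfg struct {
	Version int
}

// replace applies an incoming snapshot only if it is strictly newer,
// matching the rule incoming.Version > local.Version. It reports
// whether the snapshot was applied.
func replace(local *ClusterCfg, incoming ClusterCfg) bool {
	if incoming.Version <= local.Version {
		return false // older or equal: dropped silently
	}
	*local = incoming
	return true
}

func main() {
	local := &ClusterCfg{Version: 7}
	fmt.Println(replace(local, ClusterCfg{Version: 7})) // false: equal version dropped
	fmt.Println(replace(local, ClusterCfg{Version: 8})) // true: newer version applied
	fmt.Println(local.Version)                          // 8
}
```

Strict inequality is what makes rebroadcasts and duplicate deliveries harmless: re-applying the same version is a no-op.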
### Manual edits to `cluster.yaml`
Operators can `sudoedit /etc/quptime/cluster.yaml` on any node. Every
2 seconds the daemon hashes the file. When the on-disk hash diverges
from the last hash the daemon wrote, the new content is parsed and
forwarded to the master as a `replace_config` mutation. So a hand-edit
on a follower still ends up on the master, version-bumped, and
broadcast everywhere.
If the parse fails (invalid YAML), the daemon logs and pins the bad hash so it doesn't loop. The operator's next valid save unblocks it.
## Quorum and master election
Every node sends a heartbeat to every peer once per second. A peer is live if a heartbeat (sent or received) was observed within the last 4 seconds — enough slack to absorb three consecutive missed beats, so a one-tick blip does not unseat the master.
Quorum is met when `len(live_peers) >= floor(N/2) + 1`, where `N`
is the total peer count in `cluster.yaml`. Below quorum, the cluster
refuses every mutation; existing checks continue probing locally but no
state transitions are committed (the master is the only one who
aggregates, and there is no master).
Master election is deterministic with no negotiation step: among
the live members, the master is the one with the lexicographically
smallest NodeID. Every node that observes the same live set picks the
same master — so there is no split-brain window even during a partial
partition.
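Both rules fit in a few lines. A sketch, assuming NodeIDs are plain strings (which is what the lexicographic rule implies); cooldown handling is omitted here:

```go
package main

import (
	"fmt"
	"sort"
)

// hasQuorum implements len(live_peers) >= floor(N/2) + 1
// for n total peers listed in cluster.yaml.
func hasQuorum(live, n int) bool {
	return live >= n/2+1
}

// electMaster picks the lexicographically smallest live NodeID — the
// deterministic rule: every node that sees the same live set computes
// the same answer, with no negotiation round.
func electMaster(live []string) string {
	if len(live) == 0 {
		return "" // no master
	}
	sorted := append([]string(nil), live...)
	sort.Strings(sorted)
	return sorted[0]
}

func main() {
	fmt.Println(hasQuorum(3, 5)) // true: 3 >= floor(5/2)+1 = 3
	fmt.Println(hasQuorum(2, 5)) // false
	fmt.Println(electMaster([]string{"node-c", "node-a", "node-b"})) // node-a
}
```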
The `term` integer in `qu status` is bumped every time the elected
master changes (including transitions to and from "no master"). Use it
to spot flappy clusters.
### Master cooldown
The bare "lowest-live-NodeID wins" rule has one unpleasant edge: if the
primary master is also being monitored by qu itself (a TCP check on
its own `:9901`, say), a brief restart causes a master flap and a
state flap in lock-step. The new master sees the old master come back
on the next tick and immediately hands the role back, taking the
just-recovering node from `unknown` to `up` with no quiet period.
To absorb that, the quorum manager applies a master cooldown
(`DefaultMasterCooldown`, 2 minutes) before a peer with a lower NodeID
may displace the incumbent. The rules:
- The cooldown timer starts on the first heartbeat after a dead-after gap — i.e. when a peer re-enters the live set after having aged out. Continuous heartbeats never restart it.
- A flap during the cooldown resets the timer; the returning peer must clear a full fresh window before taking over.
- The cooldown applies only when an incumbent master exists. Bootstrap and quorum-regained-from-empty elect the lowest-NodeID live peer immediately, because there is no role to protect.
- If the incumbent drops out of the live set, the cooldown is irrelevant — any live peer may take over without waiting.
The constant lives in `internal/quorum/manager.go`. Lower it for
faster fail-back at the cost of monitoring-self flap risk; raise it
to give a recovering master longer to settle before reclaiming the
role.
## Catch-up when a node reconnects
This is the scenario most people ask about: node C is offline, the master commits config version 7, node C comes back online. What happens?
- Node C's tick loop fires heartbeats every second regardless of its previous state. There is no backoff, no give-up.
- Each heartbeat carries the sender's `Version`. Each response carries the responder's `Version`.
- The first time C sees a peer reporting a higher version than its own, the version-observer fires and calls `replicator.PullFrom(peerID, addr)`. `PullFrom` does a `GetClusterCfg` RPC against that peer and feeds the snapshot through `Replace`, which writes `cluster.yaml` atomically and refreshes the on-disk hash so the manual-edit watcher doesn't re-fire.
- Within ~1 heartbeat C is byte-for-byte identical to the master.
The same path catches a stale node up when the partition heals on the minority side: the minority side cannot mutate, so when it rejoins it strictly has the older version, and the pull fires.
There is one corner case worth knowing about: the pull only fires when
`peer_version > local_version`. Two nodes at the same version with
different content would silently diverge — but the design forbids
that (only the master mutates, and the master is the only one bumping
the version) unless somebody hand-edits `cluster.yaml` and also
manually sets `version:`. Don't do that.
## Why a check flips state
The aggregator runs on the master only. Followers' probe results are
shipped to the master via the `ReportResult` RPC; the master's own
probe results are submitted directly.
For each check, the aggregator keeps the latest result per node within a freshness window (3× the check interval, minimum 30s). On each incoming submission it counts OK vs not-OK across the fresh results:
- 0 fresh reports → `unknown`
- more OK than not-OK → `up`
- more not-OK than OK → `down`
- tie → `up` (a 1-1 tie means one node says yes and one says no; biasing toward `up` avoids false alerts when nodes disagree transiently)
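The counting rules above can be sketched as follows; the freshness-window filtering is assumed to have already happened in the caller:

```go
package main

import "fmt"

// verdict folds fresh per-node probe results (true = OK) into a
// cluster-wide state: no reports -> unknown, majority OK -> up,
// majority not-OK -> down, tie -> up.
func verdict(fresh []bool) string {
	if len(fresh) == 0 {
		return "unknown"
	}
	ok := 0
	for _, r := range fresh {
		if r {
			ok++
		}
	}
	notOK := len(fresh) - ok
	if ok >= notOK { // >= rather than >: ties bias toward up
		return "up"
	}
	return "down"
}

func main() {
	fmt.Println(verdict(nil))                        // unknown
	fmt.Println(verdict([]bool{true, true, false}))  // up
	fmt.Println(verdict([]bool{true, false}))        // up: tie
	fmt.Println(verdict([]bool{false, false, true})) // down
}
```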
A state flip is not committed immediately. Hysteresis requires the
candidate state to hold for two consecutive aggregate evaluations
before the state transition fires and the alert dispatcher is called.
It is set in `internal/checks/aggregator.go` as the `HysteresisCount`
constant — change it there if you want a hair-trigger or a slower
alert.
If the master changes, the new master starts the per-check state from
`unknown` and rebuilds it as fresh results arrive. The first few
seconds after a re-election can therefore show `unknown` even for
checks that were `up` a moment ago.
## What qu does not do
These omissions are intentional in v1 and useful to know up front:
- No persistent history. Only the current aggregate state lives in memory. There are no graphs, no SLA reports. Add a sidecar (Prometheus exporter, SQLite logger) if you need them.
- No automatic key rotation. Re-init a node and re-trust it if you need to roll its identity. See `security.md`.
- No multi-tenant isolation. One cluster = one set of checks = one alert tree.
- No web UI. The operator surface is `qu` (CLI), `qu tui`, and direct edits to `cluster.yaml`.
- No automatic peer eviction on prolonged downtime. A dead peer stays in `cluster.yaml` until an operator runs `qu node remove`, because that decision affects the quorum size and shouldn't happen silently.