Added tests and readme

2026-05-12 06:20:51 +00:00
parent 7e85bb0fcc
commit 139c224a31
23 changed files with 1449 additions and 97 deletions
+201
@@ -0,0 +1,201 @@
# qu — quorum-based uptime monitor
`qu` is a small Linux daemon that watches HTTP, TCP, and ICMP endpoints
from several cooperating nodes. The nodes form a quorum cluster; one is
elected master and owns alert dispatch. A check is only reported as
**DOWN** when the majority of nodes agree, which keeps a single node's
flaky uplink from paging anyone at 3am.
A single static binary contains the daemon, the CLI, and everything in
between. Inter-node traffic is mutual TLS with SSH-style fingerprint
trust — no central CA, no shared secret.
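To make "fingerprint trust" concrete: each node pins its peers by a hash of the certificate's SubjectPublicKeyInfo, the same idea as an SSH known-hosts entry. Below is a minimal sketch of that kind of pinning — the map, the hex encoding, and the helper name are illustrative assumptions, not qu's actual API (the real verification lives in `internal/trust`):
```go
package sketch

import (
	"crypto/sha256"
	"crypto/tls"
	"crypto/x509"
	"encoding/hex"
	"errors"
)

// pinnedConfig ignores CA chains entirely and accepts a peer only if the
// SHA-256 of its SPKI is already in the trusted set. Sketch only.
func pinnedConfig(cert tls.Certificate, trusted map[string]bool) *tls.Config {
	return &tls.Config{
		Certificates:       []tls.Certificate{cert},
		ClientAuth:         tls.RequireAnyClientCert, // mutual TLS, but no CA check
		InsecureSkipVerify: true,                     // we verify by fingerprint below
		VerifyPeerCertificate: func(rawCerts [][]byte, _ [][]*x509.Certificate) error {
			if len(rawCerts) == 0 {
				return errors.New("peer presented no certificate")
			}
			leaf, err := x509.ParseCertificate(rawCerts[0])
			if err != nil {
				return err
			}
			sum := sha256.Sum256(leaf.RawSubjectPublicKeyInfo)
			fp := "sha256:" + hex.EncodeToString(sum[:])
			if !trusted[fp] {
				return errors.New("untrusted peer fingerprint " + fp)
			}
			return nil
		},
	}
}
```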
## Why
Most uptime monitors are either a SaaS or a single box that, by
definition, can't tell you when it's the one that's down. `qu` solves
both: run it on a few cheap hosts in different networks and they vote
on truth. If one of them loses its uplink, the rest keep alerting.
## Architecture
```
+-------------- node A ---------------+
| qu serve |
| ├─ transport server (mTLS :9001) |
| ├─ quorum manager (heartbeats) |
| ├─ replicator (cluster.yaml) |
| ├─ scheduler (HTTP/TCP/ICMP) | <─── probes targets
| ├─ aggregator (master-only) |
| ├─ alerts (master-only) |
| └─ control socket (unix, for CLI) |
+-------------------------------------+
│ ▲ mTLS, pinned by fingerprint
▼ │
node B node C …
```
Every node runs every probe. Results are shipped to the elected master,
which folds them into a per-check sliding window. A check's state flips (UP↔DOWN)
only after **two consecutive aggregate evaluations** agree — that's
the hysteresis that absorbs network blips.
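In code terms the rule is tiny; a sketch with illustrative names (the real aggregator is in `internal/checks`):
```go
package sketch

// state is a check's aggregate state. Names are illustrative.
type state int

const (
	stateUnknown state = iota
	stateUp
	stateDown
)

// window tracks one check. The committed state (what gets alerted on)
// only moves once the candidate has won two evaluations in a row.
type window struct {
	committed state
	candidate state
	streak    int
}

// observe feeds one aggregate evaluation (the majority verdict across
// nodes) into the window and reports whether the committed state flipped.
func (w *window) observe(next state) bool {
	if next == w.candidate {
		w.streak++
	} else {
		w.candidate, w.streak = next, 1
	}
	if w.streak >= 2 && w.committed != w.candidate {
		w.committed = w.candidate
		return true // transition — dispatch alerts
	}
	return false
}
```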
Master election is deterministic: among the live members of the quorum,
the node with the lexicographically smallest NodeID wins. No
negotiation, no split-brain window.
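The rule is small enough to show inline; a sketch (illustrative — the real logic sits in `internal/quorum`):
```go
package sketch

import "sort"

// electMaster returns the master for the current view: nobody if fewer
// than a majority of the cluster's nodes are live, otherwise the live
// node with the lexicographically smallest NodeID. Sketch only.
func electMaster(liveIDs []string, clusterSize int) string {
	if len(liveIDs) < clusterSize/2+1 {
		return "" // no quorum: no master, no alert dispatch
	}
	sort.Strings(liveIDs)
	return liveIDs[0]
}
```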
## Build
Requires Go 1.23 or newer.
```sh
go build -o qu ./cmd/qu
```
## Set up a 3-node cluster
On each host:
```sh
# 1. Generate identity + RSA-3072 keypair + self-signed cert.
qu init --advertise <this-host's reachable address>:9001
# 2. Start the daemon (foreground; wire it into systemd for prod).
qu serve
```
Pick one node and tell it about the other two. The CLI prints the
remote fingerprint and asks for confirmation, SSH-style:
```sh
qu node add bravo.example.com:9001
qu node add charlie.example.com:9001
```
That's it — the master broadcasts the new cluster config to every
trusting peer. `qu status` from any node should now show all three:
```
node a7f3...
term 2
master a7f3...
quorum true (need 2)
config ver 4
PEERS
NODE_ID ADVERTISE LIVE LAST_SEEN
a7f3... alpha.example.com:9001 true 2026-05-12T15:01:32Z
b21c... bravo.example.com:9001 true 2026-05-12T15:01:32Z
c0d4... charlie.example.com:9001 true 2026-05-12T15:01:32Z
```
## Adding checks and alerts
```sh
# alerts first so checks can reference them
qu alert add discord oncall --webhook https://discord.com/api/webhooks/...
qu alert add smtp ops --host smtp.example.com --port 587 \
--from monitor@example.com --to ops@example.com \
--user mailbot --password '****' --starttls=true
# checks
qu check add http homepage https://example.com --expect 200 --alerts oncall,ops
qu check add tcp db db.internal:5432 --interval 15s
qu check add icmp gateway 10.0.0.1 --interval 5s
```
Mutations always route to the master, which bumps a monotonic version
and pushes the new `cluster.yaml` to every peer. If quorum is lost,
mutating commands fail loudly.
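On the receiving side the important invariant is the version gate: a peer only adopts a snapshot whose version is strictly higher than its own, so replays and out-of-order broadcasts are harmless. A sketch with illustrative names (the real check is `ClusterConfig.Replace`):
```go
package sketch

// clusterSnapshot stands in for the replicated config; the field names
// here are illustrative (the real type is config.ClusterConfig).
type clusterSnapshot struct {
	Version uint64
	// peers, checks, alerts …
}

// apply is the follower-side gate: only a strictly newer snapshot
// replaces local state. Equal or older versions are ignored, which
// makes re-broadcasts idempotent.
func apply(local *clusterSnapshot, incoming clusterSnapshot) bool {
	if incoming.Version <= local.Version {
		return false
	}
	*local = incoming
	return true
}
```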
## Test an alert without waiting for a real outage
```sh
qu alert test oncall
```
## File layout
A node's state lives under `$QUPTIME_DIR` (defaults to `/etc/quptime`
when root, `~/.config/quptime` otherwise):
```
node.yaml identity (NodeID, bind addr, port). Never replicated.
cluster.yaml replicated state: peers, checks, alerts, version.
trust.yaml local fingerprint trust store.
keys/ RSA private + public + self-signed cert.
```
The CLI talks to the local daemon over a unix socket at
`$QUPTIME_SOCKET` (defaults to `/var/run/quptime/quptime.sock` when
root, `$XDG_RUNTIME_DIR/quptime/quptime.sock` otherwise) — filesystem
permissions guard it; no TLS on the local socket.
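For scripting against the daemon directly, it is an ordinary stream socket; a sketch of resolving the same defaults and dialing it (the path logic mirrors the description above, the helper names are made up, and the framing on top of the connection is qu-internal):
```go
package sketch

import (
	"net"
	"os"
	"path/filepath"
)

// controlSocketPath mirrors the defaults described above.
func controlSocketPath() string {
	if p := os.Getenv("QUPTIME_SOCKET"); p != "" {
		return p
	}
	if os.Geteuid() == 0 {
		return "/var/run/quptime/quptime.sock"
	}
	return filepath.Join(os.Getenv("XDG_RUNTIME_DIR"), "quptime", "quptime.sock")
}

// dialControl opens the local control socket.
func dialControl() (net.Conn, error) {
	return net.Dial("unix", controlSocketPath())
}
```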
## ICMP and capabilities
ICMP checks default to unprivileged UDP-mode pings so the daemon does
not need root or `CAP_NET_RAW`. If you want classic raw ICMP, either
run the daemon as root or grant the capability:
```sh
sudo setcap cap_net_raw=+ep ./qu
```
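For context, unprivileged mode uses Linux ICMP datagram sockets (`SOCK_DGRAM` with `IPPROTO_ICMP`), which also depend on the `net.ipv4.ping_group_range` sysctl covering your group. A standalone sketch of such a ping using `golang.org/x/net/icmp` — not qu's actual prober, just the underlying mechanism:
```go
package main

import (
	"fmt"
	"net"
	"os"
	"time"

	"golang.org/x/net/icmp"
	"golang.org/x/net/ipv4"
)

func main() {
	// "udp4" gives an unprivileged ICMP socket: no root, no CAP_NET_RAW.
	conn, err := icmp.ListenPacket("udp4", "0.0.0.0")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer conn.Close()

	msg := icmp.Message{
		Type: ipv4.ICMPTypeEcho, Code: 0,
		Body: &icmp.Echo{ID: os.Getpid() & 0xffff, Seq: 1, Data: []byte("qu")},
	}
	wb, err := msg.Marshal(nil)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Destination port is ignored for ICMP datagram sockets.
	if _, err := conn.WriteTo(wb, &net.UDPAddr{IP: net.ParseIP("10.0.0.1")}); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	_ = conn.SetReadDeadline(time.Now().Add(2 * time.Second))
	rb := make([]byte, 1500)
	n, _, err := conn.ReadFrom(rb)
	if err != nil {
		fmt.Fprintln(os.Stderr, "no reply:", err)
		os.Exit(1)
	}
	reply, err := icmp.ParseMessage(1, rb[:n]) // 1 = ICMPv4 protocol number
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("reply type:", reply.Type)
}
```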
## CLI reference
```
qu init generate identity + keys
qu serve run the daemon
qu status quorum, master, check states
qu node add <host:port> TOFU-add a peer
qu node list show peers + liveness
qu node remove <node-id> remove from cluster + trust
qu check add http <name> <url> [--expect 200] [--interval 30s] [--body-match str] [--alerts a,b]
qu check add tcp <name> <host:port>
qu check add icmp <name> <host>
qu check list
qu check remove <id-or-name>
qu alert add smtp <name> --host … --port … --from … --to … [--user --password --starttls]
qu alert add discord <name> --webhook …
qu alert list / remove / test <id-or-name>
qu trust list / remove <node-id>
```
All `--interval` and `--timeout` flags accept Go duration syntax: `5s`,
`1m30s`, `2h`, etc.
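Under the hood this is presumably Go's `time.ParseDuration`, so any string it accepts should work:
```go
package sketch

import "time"

// Anything time.ParseDuration accepts should be a valid --interval or
// --timeout value; these all parse cleanly.
var examples = []string{"5s", "1m30s", "2h", "250ms", "1h15m"}

func allValid() bool {
	for _, s := range examples {
		if _, err := time.ParseDuration(s); err != nil {
			return false
		}
	}
	return true
}
```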
## Tests
```sh
go test ./...
go test -race ./...
```
Each internal package has unit tests; coverage hovers around 60–90 %
on the meaningful packages. The transport tests bring up real mTLS
listeners over loopback, which exercises the cert pinning end-to-end.
## What's intentionally not here (v1)
- No web UI. The CLI is the only operator surface.
- No historical metrics or SLA reports — only the current aggregate
state is kept in memory. Add SQLite later if you need graphs.
- No automatic key rotation. Re-init a node and re-trust if you need
to roll its identity.
- No multi-tenant isolation. One cluster = one set of checks.
## Layout
```
cmd/qu/ entry point
internal/config/ on-disk file layout, ClusterConfig, NodeConfig
internal/crypto/ RSA keypair + self-signed cert + SPKI fingerprints
internal/trust/ fingerprint trust store
internal/transport/ mTLS listener/dialer, framed JSON-RPC
internal/quorum/ heartbeats + deterministic master election
internal/replicate/ master-routed mutations, version-gated replication
internal/checks/ HTTP/TCP/ICMP probers, scheduler, aggregator
internal/alerts/ SMTP + Discord dispatchers, message rendering
internal/daemon/ glue: wires every component + control socket
internal/cli/ cobra commands, the user-facing surface
```
+2 -2
@@ -32,7 +32,7 @@ func (d *Dispatcher) OnTransition(check *config.Check, from, to checks.State, sn
}
msg := Render(d.selfID, check, from, to, snap)
for _, alertID := range check.AlertIDs {
- alert, _ := d.cluster.FindAlert(alertID)
+ alert := d.cluster.FindAlert(alertID)
if alert == nil {
d.logger.Printf("alerts: check %q references unknown alert %q", check.Name, alertID)
continue
@@ -46,7 +46,7 @@ func (d *Dispatcher) OnTransition(check *config.Check, from, to checks.State, sn
// Test sends a one-shot test message to the named alert. Returns an
// error so the CLI can surface failures interactively.
func (d *Dispatcher) Test(alertID string) error {
- alert, _ := d.cluster.FindAlert(alertID)
+ alert := d.cluster.FindAlert(alertID)
if alert == nil {
return fmt.Errorf("alert %q not found", alertID)
}
+52
@@ -0,0 +1,52 @@
package alerts
import (
"strings"
"testing"
"github.com/jasper/quptime/internal/checks"
"github.com/jasper/quptime/internal/config"
)
func TestRenderDownTransition(t *testing.T) {
check := &config.Check{Name: "homepage", Target: "https://example.com", Type: config.CheckHTTP}
snap := checks.Snapshot{Reports: 3, OKCount: 0, NotOK: 3, Detail: "connection refused"}
msg := Render("master-node", check, checks.StateUp, checks.StateDown, snap)
if !strings.Contains(msg.Subject, "DOWN") {
t.Errorf("subject missing DOWN: %q", msg.Subject)
}
if !strings.Contains(msg.Subject, "homepage") {
t.Errorf("subject missing check name: %q", msg.Subject)
}
if !strings.Contains(msg.Body, "connection refused") {
t.Errorf("body missing detail: %q", msg.Body)
}
if !strings.Contains(msg.Body, "master-node") {
t.Errorf("body missing reporter: %q", msg.Body)
}
if !strings.Contains(msg.Body, "3 (ok=0, fail=3)") {
t.Errorf("body missing report count: %q", msg.Body)
}
}
func TestRenderRecoveryTransition(t *testing.T) {
check := &config.Check{Name: "api", Target: "https://api/", Type: config.CheckHTTP}
snap := checks.Snapshot{Reports: 3, OKCount: 3, NotOK: 0}
msg := Render("master", check, checks.StateDown, checks.StateUp, snap)
if !strings.Contains(msg.Subject, "RECOVERED") {
t.Errorf("subject missing RECOVERED: %q", msg.Subject)
}
}
func TestRenderUpInitialTransition(t *testing.T) {
check := &config.Check{Name: "api", Target: "https://api/"}
snap := checks.Snapshot{Reports: 1, OKCount: 1}
msg := Render("master", check, checks.StateUnknown, checks.StateUp, snap)
if !strings.Contains(msg.Subject, "UP") {
t.Errorf("subject missing UP: %q", msg.Subject)
}
if strings.Contains(msg.Subject, "RECOVERED") {
t.Error("first-time UP should not be tagged RECOVERED")
}
}
-11
@@ -99,17 +99,6 @@ func (a *Aggregator) Submit(nodeID string, r Result) {
a.evaluate(r.CheckID)
}
- // SnapshotAll returns the current aggregate view of every known check.
- func (a *Aggregator) SnapshotAll() map[string]Snapshot {
- a.mu.Lock()
- defer a.mu.Unlock()
- out := make(map[string]Snapshot, len(a.perCheck))
- for id, st := range a.perCheck {
- out[id] = a.snapshotLocked(id, st)
- }
- return out
- }
// SnapshotFor returns the aggregate for a single check.
func (a *Aggregator) SnapshotFor(checkID string) (Snapshot, bool) {
a.mu.Lock()
+111
@@ -0,0 +1,111 @@
package checks
import (
"sync/atomic"
"testing"
"time"
"github.com/jasper/quptime/internal/config"
)
func TestAggregatorHysteresisRequiresConsecutiveEvals(t *testing.T) {
cluster := &config.ClusterConfig{Checks: []config.Check{
{ID: "c1", Name: "x", Interval: 10 * time.Second},
}}
var transitions atomic.Int32
agg := NewAggregator(cluster, func(_ *config.Check, _, _ State, _ Snapshot) {
transitions.Add(1)
})
// First OK submission — candidate=Up, committed still Unknown.
agg.Submit("nodeA", Result{CheckID: "c1", OK: true, Timestamp: time.Now()})
snap, _ := agg.SnapshotFor("c1")
if snap.State != StateUnknown {
t.Errorf("after one tick state=%s want unknown", snap.State)
}
if transitions.Load() != 0 {
t.Errorf("transitions=%d after one tick, want 0", transitions.Load())
}
// Second OK — hysteresis satisfied, commit Up.
agg.Submit("nodeA", Result{CheckID: "c1", OK: true, Timestamp: time.Now()})
snap, _ = agg.SnapshotFor("c1")
if snap.State != StateUp {
t.Errorf("after two ticks state=%s want up", snap.State)
}
if transitions.Load() != 1 {
t.Errorf("transitions=%d after commit, want 1", transitions.Load())
}
// Single failure — candidate flips to Down, committed stays Up.
agg.Submit("nodeA", Result{CheckID: "c1", OK: false, Detail: "boom", Timestamp: time.Now()})
snap, _ = agg.SnapshotFor("c1")
if snap.State != StateUp {
t.Errorf("single fail flipped state prematurely: %s", snap.State)
}
// Second failure — commit Down.
agg.Submit("nodeA", Result{CheckID: "c1", OK: false, Detail: "boom", Timestamp: time.Now()})
snap, _ = agg.SnapshotFor("c1")
if snap.State != StateDown {
t.Errorf("after two fails state=%s want down", snap.State)
}
if transitions.Load() != 2 {
t.Errorf("transitions=%d after second commit, want 2", transitions.Load())
}
}
func TestAggregatorMajorityRule(t *testing.T) {
cluster := &config.ClusterConfig{Checks: []config.Check{
{ID: "c1", Name: "x", Interval: 10 * time.Second},
}}
agg := NewAggregator(cluster, nil)
// 2 OK + 1 fail → candidate Up.
now := time.Now()
agg.Submit("a", Result{CheckID: "c1", OK: true, Timestamp: now})
agg.Submit("b", Result{CheckID: "c1", OK: true, Timestamp: now})
agg.Submit("c", Result{CheckID: "c1", OK: false, Timestamp: now})
snap, _ := agg.SnapshotFor("c1")
if snap.OKCount != 2 || snap.NotOK != 1 {
t.Errorf("counts wrong: %+v", snap)
}
// flip the majority
for i := 0; i < 2; i++ {
agg.Submit("a", Result{CheckID: "c1", OK: false, Timestamp: time.Now()})
agg.Submit("b", Result{CheckID: "c1", OK: false, Timestamp: time.Now()})
agg.Submit("c", Result{CheckID: "c1", OK: false, Timestamp: time.Now()})
}
snap, _ = agg.SnapshotFor("c1")
if snap.State != StateDown {
t.Errorf("majority-fail did not transition to down: %s", snap.State)
}
}
func TestAggregatorDropsUnknownChecks(t *testing.T) {
cluster := &config.ClusterConfig{}
agg := NewAggregator(cluster, nil)
agg.Submit("a", Result{CheckID: "ghost", OK: true, Timestamp: time.Now()})
if _, ok := agg.SnapshotFor("ghost"); ok {
t.Error("aggregator kept state for unconfigured check")
}
}
func TestAggregatorIgnoresStaleResults(t *testing.T) {
cluster := &config.ClusterConfig{Checks: []config.Check{
{ID: "c1", Name: "x", Interval: 10 * time.Second},
}}
agg := NewAggregator(cluster, nil)
old := time.Now().Add(-10 * time.Minute)
agg.Submit("a", Result{CheckID: "c1", OK: true, Timestamp: old})
snap, _ := agg.SnapshotFor("c1")
if snap.Reports != 0 {
t.Errorf("stale report counted: %+v", snap)
}
}
+118
@@ -0,0 +1,118 @@
package checks
import (
"context"
"net"
"net/http"
"net/http/httptest"
"strings"
"testing"
"time"
"github.com/jasper/quptime/internal/config"
)
func TestHTTPProberHappyPath(t *testing.T) {
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
w.WriteHeader(200)
w.Write([]byte("hello world"))
}))
defer srv.Close()
res := Run(context.Background(), &config.Check{
ID: "c", Type: config.CheckHTTP, Target: srv.URL,
Timeout: 5 * time.Second, ExpectStatus: 200,
})
if !res.OK {
t.Errorf("expected OK, got %+v", res)
}
}
func TestHTTPProberBodyMatch(t *testing.T) {
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
w.WriteHeader(200)
w.Write([]byte("the magic word is xyzzy and other stuff"))
}))
defer srv.Close()
hit := Run(context.Background(), &config.Check{
ID: "c", Type: config.CheckHTTP, Target: srv.URL,
Timeout: 5 * time.Second, BodyMatch: "xyzzy",
})
if !hit.OK {
t.Errorf("expected match, got %+v", hit)
}
miss := Run(context.Background(), &config.Check{
ID: "c", Type: config.CheckHTTP, Target: srv.URL,
Timeout: 5 * time.Second, BodyMatch: "absent",
})
if miss.OK {
t.Errorf("expected miss, got %+v", miss)
}
if !strings.Contains(miss.Detail, "body match") {
t.Errorf("detail unexpected: %q", miss.Detail)
}
}
func TestHTTPProberStatusMismatch(t *testing.T) {
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
w.WriteHeader(500)
}))
defer srv.Close()
res := Run(context.Background(), &config.Check{
ID: "c", Type: config.CheckHTTP, Target: srv.URL, Timeout: 5 * time.Second,
})
if res.OK {
t.Errorf("500 should fail check, got %+v", res)
}
}
func TestTCPProberHappyPath(t *testing.T) {
ln, err := net.Listen("tcp", "127.0.0.1:0")
if err != nil {
t.Fatal(err)
}
defer ln.Close()
go func() {
for {
c, err := ln.Accept()
if err != nil {
return
}
c.Close()
}
}()
res := Run(context.Background(), &config.Check{
ID: "c", Type: config.CheckTCP, Target: ln.Addr().String(),
Timeout: 2 * time.Second,
})
if !res.OK {
t.Errorf("expected OK, got %+v", res)
}
}
func TestTCPProberRefusedConnection(t *testing.T) {
// Listen and immediately close so the address is known-bad.
ln, _ := net.Listen("tcp", "127.0.0.1:0")
addr := ln.Addr().String()
ln.Close()
res := Run(context.Background(), &config.Check{
ID: "c", Type: config.CheckTCP, Target: addr, Timeout: 1 * time.Second,
})
if res.OK {
t.Errorf("dead address should fail check, got %+v", res)
}
}
func TestRunUnknownCheckType(t *testing.T) {
res := Run(context.Background(), &config.Check{
ID: "c", Type: "bogus", Target: "x",
})
if res.OK {
t.Error("unknown check type should not succeed")
}
}
+5 -18
@@ -4,6 +4,7 @@ import (
"context" "context"
"encoding/json" "encoding/json"
"fmt" "fmt"
"text/tabwriter"
"time" "time"
"github.com/google/uuid" "github.com/google/uuid"
@@ -30,30 +31,16 @@ func addAlertCmd(root *cobra.Command) {
Use: "list", Use: "list",
Short: "List configured alerts", Short: "List configured alerts",
RunE: func(cmd *cobra.Command, args []string) error { RunE: func(cmd *cobra.Command, args []string) error {
ctx, cancel := context.WithTimeout(cmd.Context(), 10*time.Second)
defer cancel()
raw, err := callDaemon(ctx, daemon.CtrlStatus, nil)
if err != nil {
return err
}
var st transport.StatusResponse
if err := json.Unmarshal(raw, &st); err != nil {
return err
}
// status response doesn't carry alerts — call mutate with a
// "list" by reading cluster.yaml indirectly via status's
// version is not enough. Fall back: ask for ClusterConfig
// via a dedicated read RPC if needed. For v1 we rely on
// node.yaml being co-located: read the local cluster.yaml
// directly so the operator gets up-to-date output.
cluster, err := config.LoadClusterConfig() cluster, err := config.LoadClusterConfig()
if err != nil { if err != nil {
return err return err
} }
tw := tabwriter.NewWriter(cmd.OutOrStdout(), 0, 0, 2, ' ', 0)
fmt.Fprintln(tw, "ID\tTYPE\tNAME")
for _, a := range cluster.Alerts { for _, a := range cluster.Alerts {
fmt.Fprintf(cmd.OutOrStdout(), "%s\t%s\t%s\n", a.ID, a.Type, a.Name) fmt.Fprintf(tw, "%s\t%s\t%s\n", a.ID, a.Type, a.Name)
} }
return nil return tw.Flush()
}, },
} }
-5
@@ -30,7 +30,6 @@ func addCheckCmd(root *cobra.Command) {
})
addHTTP.Flags().Int("expect", 200, "HTTP status code that signals UP")
addHTTP.Flags().String("body-match", "", "substring required in response body for UP")
- bindHTTPFlags(addHTTP)
addTCP := buildAddCheckCmd(config.CheckTCP, "tcp", "<name> <host:port>",
"Add a TCP-connect check",
@@ -168,7 +167,3 @@ func bindCheckFlags(cmd *cobra.Command) {
cmd.Flags().String("timeout", "10s", "per-probe timeout")
cmd.Flags().String("alerts", "", "comma-separated alert IDs/names to notify on transition")
}
- // bindHTTPFlags is a no-op kept to mirror the per-type flag bind sites
- // so the caller can extend cleanly later.
- func bindHTTPFlags(cmd *cobra.Command) {}
+5 -30
@@ -175,43 +175,18 @@ func (c *ClusterConfig) Replace(incoming *ClusterConfig) (bool, error) {
return true, nil
}
- // FindCheck returns the check with the given ID or name.
- func (c *ClusterConfig) FindCheck(idOrName string) (*Check, int) {
- c.mu.RLock()
- defer c.mu.RUnlock()
- for i := range c.Checks {
- if c.Checks[i].ID == idOrName || c.Checks[i].Name == idOrName {
- cp := c.Checks[i]
- return &cp, i
- }
- }
- return nil, -1
- }
- // FindAlert returns the alert with the given ID or name.
- func (c *ClusterConfig) FindAlert(idOrName string) (*Alert, int) {
+ // FindAlert returns the alert with the given ID or name, or nil if
+ // no entry matches.
+ func (c *ClusterConfig) FindAlert(idOrName string) *Alert {
c.mu.RLock()
defer c.mu.RUnlock()
for i := range c.Alerts {
if c.Alerts[i].ID == idOrName || c.Alerts[i].Name == idOrName {
cp := c.Alerts[i]
- return &cp, i
+ return &cp
}
}
- return nil, -1
+ return nil
}
- // FindPeer returns the peer with the given node ID.
- func (c *ClusterConfig) FindPeer(nodeID string) (*PeerInfo, int) {
- c.mu.RLock()
- defer c.mu.RUnlock()
- for i := range c.Peers {
- if c.Peers[i].NodeID == nodeID {
- cp := c.Peers[i]
- return &cp, i
- }
- }
- return nil, -1
- }
// QuorumSize returns the minimum number of live nodes required for
+107
@@ -0,0 +1,107 @@
package config
import (
"fmt"
"testing"
)
func TestQuorumSize(t *testing.T) {
cases := []struct {
peers int
want int
}{
{0, 1},
{1, 1},
{2, 2},
{3, 2},
{4, 3},
{5, 3},
{7, 4},
}
for _, tc := range cases {
c := &ClusterConfig{}
for i := 0; i < tc.peers; i++ {
c.Peers = append(c.Peers, PeerInfo{NodeID: fmt.Sprintf("n%d", i)})
}
if got := c.QuorumSize(); got != tc.want {
t.Errorf("peers=%d: QuorumSize=%d want %d", tc.peers, got, tc.want)
}
}
}
func TestClusterMutateBumpsVersion(t *testing.T) {
t.Setenv("QUPTIME_DIR", t.TempDir())
c := &ClusterConfig{}
err := c.Mutate("nodeA", func(cc *ClusterConfig) error {
cc.Checks = append(cc.Checks, Check{ID: "1", Name: "x"})
return nil
})
if err != nil {
t.Fatal(err)
}
if c.Version != 1 {
t.Errorf("Version=%d want 1", c.Version)
}
if c.UpdatedBy != "nodeA" {
t.Errorf("UpdatedBy=%q want nodeA", c.UpdatedBy)
}
err = c.Mutate("nodeB", func(cc *ClusterConfig) error { return nil })
if err != nil {
t.Fatal(err)
}
if c.Version != 2 {
t.Errorf("Version=%d want 2 after second mutate", c.Version)
}
}
func TestClusterReplaceGatesOnVersion(t *testing.T) {
t.Setenv("QUPTIME_DIR", t.TempDir())
cur := &ClusterConfig{Version: 5, Checks: []Check{{ID: "old"}}}
if applied, _ := cur.Replace(&ClusterConfig{Version: 4}); applied {
t.Error("older version was applied")
}
if applied, _ := cur.Replace(&ClusterConfig{Version: 5}); applied {
t.Error("equal version was applied")
}
applied, err := cur.Replace(&ClusterConfig{
Version: 6,
Checks: []Check{{ID: "new"}},
})
if err != nil {
t.Fatal(err)
}
if !applied {
t.Error("newer version was not applied")
}
if cur.Version != 6 || len(cur.Checks) != 1 || cur.Checks[0].ID != "new" {
t.Errorf("after replace: %+v", cur)
}
}
func TestClusterSnapshotIsCopy(t *testing.T) {
c := &ClusterConfig{Checks: []Check{{ID: "a"}}}
snap := c.Snapshot()
snap.Checks[0].ID = "b"
if c.Checks[0].ID != "a" {
t.Error("snapshot mutation leaked back to original")
}
}
func TestFindAlert(t *testing.T) {
c := &ClusterConfig{Alerts: []Alert{
{ID: "id-1", Name: "primary", Type: AlertSMTP},
{ID: "id-2", Name: "secondary", Type: AlertDiscord},
}}
if a := c.FindAlert("primary"); a == nil || a.Type != AlertSMTP {
t.Errorf("by name: %+v", a)
}
if a := c.FindAlert("id-2"); a == nil || a.Type != AlertDiscord {
t.Errorf("by id: %+v", a)
}
if a := c.FindAlert("ghost"); a != nil {
t.Errorf("expected nil for missing, got %+v", a)
}
}
+58
@@ -0,0 +1,58 @@
package config
import "testing"
func TestAdvertiseAddrFallback(t *testing.T) {
cases := []struct {
name string
cfg NodeConfig
want string
}{
{"explicit advertise wins", NodeConfig{Advertise: "host:1234", BindAddr: "0.0.0.0", BindPort: 9001}, "host:1234"},
{"empty bind falls back to loopback", NodeConfig{BindPort: 9001}, "127.0.0.1:9001"},
{"wildcard bind falls back to loopback", NodeConfig{BindAddr: "0.0.0.0", BindPort: 9001}, "127.0.0.1:9001"},
{"ipv6 wildcard falls back to loopback", NodeConfig{BindAddr: "::", BindPort: 9001}, "127.0.0.1:9001"},
{"specific bind preserved", NodeConfig{BindAddr: "10.0.0.1", BindPort: 9001}, "10.0.0.1:9001"},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
if got := tc.cfg.AdvertiseAddr(); got != tc.want {
t.Errorf("got %q want %q", got, tc.want)
}
})
}
}
func TestNodeConfigRoundtrip(t *testing.T) {
t.Setenv("QUPTIME_DIR", t.TempDir())
n := &NodeConfig{NodeID: "abc", BindAddr: "127.0.0.1", BindPort: 9001, Advertise: "10.0.0.1:9001"}
if err := n.Save(); err != nil {
t.Fatal(err)
}
loaded, err := LoadNodeConfig()
if err != nil {
t.Fatal(err)
}
if *loaded != *n {
t.Errorf("got %+v want %+v", *loaded, *n)
}
}
func TestLoadNodeConfigAppliesDefaults(t *testing.T) {
t.Setenv("QUPTIME_DIR", t.TempDir())
// Save with empty bind addr/port to verify Load fills them.
n := &NodeConfig{NodeID: "abc"}
if err := n.Save(); err != nil {
t.Fatal(err)
}
loaded, err := LoadNodeConfig()
if err != nil {
t.Fatal(err)
}
if loaded.BindPort != 9001 {
t.Errorf("BindPort=%d want 9001", loaded.BindPort)
}
if loaded.BindAddr != "0.0.0.0" {
t.Errorf("BindAddr=%q want 0.0.0.0", loaded.BindAddr)
}
}
-5
@@ -6,7 +6,6 @@
// cluster.yaml — replicated state (peers, checks, alerts, version)
// trust.yaml — local fingerprint trust store
// keys/ — RSA private + public keys + self-signed cert
- // state.json — runtime cache (last check results, current master)
//
// A unix socket for the local CLI lives alongside (defaults to
// /var/run/quptime/quptime.sock when running as root, otherwise
@@ -25,7 +24,6 @@ const (
NodeFile = "node.yaml"
ClusterFile = "cluster.yaml"
TrustFile = "trust.yaml"
- StateFile = "state.json"
KeysDir = "keys"
PrivateKey = "private.pem"
PublicKey = "public.pem"
@@ -86,9 +84,6 @@ func ClusterFilePath() string { return filepath.Join(DataDir(), ClusterFile) }
// TrustFilePath returns the absolute path to trust.yaml.
func TrustFilePath() string { return filepath.Join(DataDir(), TrustFile) }
- // StateFilePath returns the absolute path to state.json.
- func StateFilePath() string { return filepath.Join(DataDir(), StateFile) }
// PrivateKeyPath returns the absolute path to the RSA private key.
func PrivateKeyPath() string { return filepath.Join(DataDir(), KeysDir, PrivateKey) }
-10
@@ -65,13 +65,3 @@ func FingerprintFromCertPEM(certPEM []byte) (string, error) {
}
return Fingerprint(cert), nil
}
- // FingerprintFromPubKeyPEM parses a public-key PEM and returns its
- // fingerprint over the same SPKI bytes.
- func FingerprintFromPubKeyPEM(pubPEM []byte) (string, error) {
- block, _ := pem.Decode(pubPEM)
- if block == nil {
- return "", errors.New("pubkey: no PEM block")
- }
- return FingerprintFromSPKI(block.Bytes), nil
- }
+111
@@ -0,0 +1,111 @@
package crypto
import (
"crypto/x509"
"encoding/pem"
"strings"
"testing"
)
func TestGenerateAndLoadKeyPair(t *testing.T) {
t.Setenv("QUPTIME_DIR", t.TempDir())
priv, err := GenerateKeyPair("node-1")
if err != nil {
t.Fatalf("GenerateKeyPair: %v", err)
}
if priv.N.BitLen() < KeySize-8 {
t.Errorf("key too small: %d bits", priv.N.BitLen())
}
// Refusing to overwrite existing material is part of the contract.
if _, err := GenerateKeyPair("node-1"); err == nil {
t.Error("expected error on re-generate")
}
loaded, err := LoadPrivateKey()
if err != nil {
t.Fatalf("LoadPrivateKey: %v", err)
}
if loaded.N.Cmp(priv.N) != 0 {
t.Error("loaded key modulus differs from generated")
}
}
func TestFingerprintDeterminismAndUniqueness(t *testing.T) {
t.Setenv("QUPTIME_DIR", t.TempDir())
priv, err := GenerateKeyPair("node-x")
if err != nil {
t.Fatal(err)
}
certPEM, err := LoadCertPEM()
if err != nil {
t.Fatal(err)
}
block, _ := pem.Decode(certPEM)
if block == nil {
t.Fatal("no PEM block in cert")
}
cert, err := x509.ParseCertificate(block.Bytes)
if err != nil {
t.Fatal(err)
}
fp1 := Fingerprint(cert)
fp2 := Fingerprint(cert)
if fp1 != fp2 {
t.Errorf("non-deterministic: %s vs %s", fp1, fp2)
}
if !strings.HasPrefix(fp1, "sha256:") {
t.Errorf("missing sha256: prefix: %s", fp1)
}
pemFP, err := FingerprintFromCertPEM(certPEM)
if err != nil {
t.Fatal(err)
}
if pemFP != fp1 {
t.Errorf("PEM-derived fingerprint differs: %s vs %s", pemFP, fp1)
}
// Now generate a fresh cert from the same key — fingerprint must
// match (SPKI is identical).
derSame, err := buildSelfSignedCert(priv, "node-x")
if err != nil {
t.Fatal(err)
}
certSame, _ := x509.ParseCertificate(derSame)
if Fingerprint(certSame) != fp1 {
t.Error("fingerprint changed across cert regen with same key")
}
}
func TestFingerprintDiffersAcrossKeys(t *testing.T) {
dirA := t.TempDir()
dirB := t.TempDir()
t.Setenv("QUPTIME_DIR", dirA)
if _, err := GenerateKeyPair("a"); err != nil {
t.Fatal(err)
}
pemA, _ := LoadCertPEM()
fpA, _ := FingerprintFromCertPEM(pemA)
t.Setenv("QUPTIME_DIR", dirB)
if _, err := GenerateKeyPair("b"); err != nil {
t.Fatal(err)
}
pemB, _ := LoadCertPEM()
fpB, _ := FingerprintFromCertPEM(pemB)
if fpA == fpB {
t.Error("two independent keys produced the same fingerprint")
}
}
func TestFingerprintFromCertPEMRejectsGarbage(t *testing.T) {
if _, err := FingerprintFromCertPEM([]byte("not a pem")); err == nil {
t.Error("expected error on non-PEM input")
}
}
-6
@@ -88,12 +88,6 @@ func LoadCertPEM() ([]byte, error) {
return os.ReadFile(config.CertFilePath())
}
- // LoadPublicKeyPEM reads the public-key PEM (exchanged out of band
- // during invite / join).
- func LoadPublicKeyPEM() ([]byte, error) {
- return os.ReadFile(config.PublicKeyPath())
- }
func writePEM(path, blockType string, der []byte, perm os.FileMode) error {
encoded := pem.EncodeToMemory(&pem.Block{Type: blockType, Bytes: der})
return config.AtomicWrite(path, encoded, perm)
+1 -2
@@ -196,8 +196,7 @@ func (c *controlServer) dispatch(ctx context.Context, req CtrlRequest) CtrlRespo
if err := json.Unmarshal(req.Body, &body); err != nil {
return fail(err)
}
- var payload json.RawMessage = body.Payload
- ver, err := c.d.replicator.LocalMutate(ctx, body.Kind, json.RawMessage(payload))
+ ver, err := c.d.replicator.LocalMutate(ctx, body.Kind, body.Payload)
if err != nil {
return fail(err)
}
+146
@@ -0,0 +1,146 @@
package quorum
import (
"testing"
"time"
"github.com/jasper/quptime/internal/config"
"github.com/jasper/quptime/internal/transport"
)
func threeNode(self string) (*config.ClusterConfig, *Manager) {
cluster := &config.ClusterConfig{Peers: []config.PeerInfo{
{NodeID: "a"}, {NodeID: "b"}, {NodeID: "c"},
}}
return cluster, New(self, cluster, nil)
}
func TestSoloNodeElectsItself(t *testing.T) {
cluster := &config.ClusterConfig{}
m := New("only", cluster, nil)
m.markLive("only")
m.recomputeMaster()
if m.Master() != "only" {
t.Errorf("Master=%q want %q", m.Master(), "only")
}
if !m.HasQuorum() {
t.Error("solo node should have quorum")
}
if m.Term() != 1 {
t.Errorf("Term=%d want 1 after first election", m.Term())
}
}
func TestThreeNodeElectsLowestNodeID(t *testing.T) {
_, m := threeNode("b")
m.markLive("a")
m.markLive("b")
m.markLive("c")
m.recomputeMaster()
if got := m.Master(); got != "a" {
t.Errorf("Master=%q want a", got)
}
if !m.HasQuorum() {
t.Error("expected quorum with 3 live of 3")
}
}
func TestNoQuorumClearsMaster(t *testing.T) {
_, m := threeNode("b")
m.markLive("b")
m.recomputeMaster()
if m.Master() != "" {
t.Errorf("Master=%q want empty (no quorum)", m.Master())
}
if m.HasQuorum() {
t.Error("1 of 3 live should not be quorum")
}
}
func TestTermBumpsOnMasterChange(t *testing.T) {
_, m := threeNode("b")
m.markLive("a")
m.markLive("b")
m.recomputeMaster()
termBefore := m.Term()
masterBefore := m.Master()
if masterBefore != "a" {
t.Fatalf("expected initial master a, got %q", masterBefore)
}
// "a" goes dead — we and "c" join up.
m.mu.Lock()
delete(m.lastSeen, "a")
m.mu.Unlock()
m.markLive("c")
m.recomputeMaster()
if m.Master() != "b" {
t.Errorf("after a-fail Master=%q want b", m.Master())
}
if m.Term() <= termBefore {
t.Errorf("Term did not bump: before=%d after=%d", termBefore, m.Term())
}
}
func TestHandleHeartbeatMarksSenderLive(t *testing.T) {
cluster, m := threeNode("a")
_ = cluster
resp := m.HandleHeartbeat(transport.HeartbeatRequest{
FromNodeID: "b",
Term: 7,
MasterID: "a",
Version: 3,
})
if resp.NodeID != "a" {
t.Errorf("response NodeID=%q want a", resp.NodeID)
}
if _, ok := m.Liveness()["b"]; !ok {
t.Error("sender was not recorded live")
}
}
func TestDeadAfterEvictsStaleLiveness(t *testing.T) {
_, m := threeNode("a")
m.deadAfter = 50 * time.Millisecond
m.markLive("a")
m.markLive("b")
m.markLive("c")
m.recomputeMaster()
if m.Master() != "a" {
t.Fatal("expected initial master a")
}
// Wait past the dead-after window — only self remains live.
time.Sleep(120 * time.Millisecond)
m.markLive("a")
m.recomputeMaster()
if m.Master() != "" {
t.Errorf("expected no master after peers timed out, got %q", m.Master())
}
}
func TestVersionObserverFiresOnHigherVersion(t *testing.T) {
cluster := &config.ClusterConfig{Version: 2}
m := New("a", cluster, nil)
var notified struct {
peerID string
peerVer uint64
count int
}
m.SetVersionObserver(func(peerID, _ string, peerVer uint64) {
notified.peerID = peerID
notified.peerVer = peerVer
notified.count++
})
m.maybeNotifyVersion("b", 5)
if notified.count != 1 || notified.peerID != "b" || notified.peerVer != 5 {
t.Errorf("expected observer fired with b=5, got %+v", notified)
}
m.maybeNotifyVersion("b", 1)
if notified.count != 1 {
t.Errorf("observer fired for stale version, count=%d", notified.count)
}
}
+9 -2
@@ -36,16 +36,23 @@ type MasterView interface {
HasQuorum() bool
}
+ // RPCClient is the slice of *transport.Client that the replicator
+ // actually uses. Pulled out as an interface so tests can stub it
+ // without bringing up a TLS listener.
+ type RPCClient interface {
+ Call(ctx context.Context, nodeID, addr, method string, params, out any) error
+ }
// Replicator drives mutation routing and broadcast.
type Replicator struct {
selfID string
cluster *config.ClusterConfig
- client *transport.Client
+ client RPCClient
master MasterView
}
// New constructs a replicator. selfID is this node's NodeID.
- func New(selfID string, cluster *config.ClusterConfig, client *transport.Client, master MasterView) *Replicator {
+ func New(selfID string, cluster *config.ClusterConfig, client RPCClient, master MasterView) *Replicator {
return &Replicator{
selfID: selfID,
cluster: cluster,
+164
@@ -0,0 +1,164 @@
package replicate
import (
"context"
"encoding/json"
"sync"
"testing"
"github.com/jasper/quptime/internal/config"
"github.com/jasper/quptime/internal/transport"
)
type fakeMaster struct {
master string
isMaster bool
hasQuorum bool
}
func (f *fakeMaster) Master() string { return f.master }
func (f *fakeMaster) IsMaster() bool { return f.isMaster }
func (f *fakeMaster) HasQuorum() bool { return f.hasQuorum }
// stubClient records every Call without doing any actual I/O.
type stubClient struct {
mu sync.Mutex
calls []string
}
func (s *stubClient) Call(_ context.Context, _, _, method string, _, _ any) error {
s.mu.Lock()
defer s.mu.Unlock()
s.calls = append(s.calls, method)
return nil
}
func newReplicator(t *testing.T, isMaster, hasQuorum bool) (*Replicator, *config.ClusterConfig, *stubClient) {
t.Helper()
t.Setenv("QUPTIME_DIR", t.TempDir())
cluster := &config.ClusterConfig{}
fm := &fakeMaster{master: "self", isMaster: isMaster, hasQuorum: hasQuorum}
stub := &stubClient{}
r := New("self", cluster, stub, fm)
return r, cluster, stub
}
func TestApplyAddCheck(t *testing.T) {
r, cluster, _ := newReplicator(t, true, true)
payload, _ := json.Marshal(config.Check{ID: "c1", Name: "homepage", Type: config.CheckHTTP, Target: "https://example.com"})
ver, err := r.LocalMutate(context.Background(), transport.MutationAddCheck, json.RawMessage(payload))
if err != nil {
t.Fatal(err)
}
if ver != 1 {
t.Errorf("version=%d want 1", ver)
}
if len(cluster.Snapshot().Checks) != 1 {
t.Errorf("expected 1 check, got %d", len(cluster.Snapshot().Checks))
}
}
func TestApplyRemoveCheck(t *testing.T) {
r, cluster, _ := newReplicator(t, true, true)
_ = cluster.Mutate("self", func(c *config.ClusterConfig) error {
c.Checks = []config.Check{{ID: "c1", Name: "x"}, {ID: "c2", Name: "y"}}
return nil
})
target, _ := json.Marshal("x")
ver, err := r.LocalMutate(context.Background(), transport.MutationRemoveCheck, json.RawMessage(target))
if err != nil {
t.Fatal(err)
}
if ver < 2 {
t.Errorf("version did not advance: %d", ver)
}
cs := cluster.Snapshot().Checks
if len(cs) != 1 || cs[0].ID != "c2" {
t.Errorf("expected only c2 remaining, got %+v", cs)
}
}
func TestApplyAddAndRemoveAlertAndPeer(t *testing.T) {
r, cluster, _ := newReplicator(t, true, true)
alert, _ := json.Marshal(config.Alert{ID: "a1", Name: "notify", Type: config.AlertDiscord})
if _, err := r.LocalMutate(context.Background(), transport.MutationAddAlert, json.RawMessage(alert)); err != nil {
t.Fatal(err)
}
peer, _ := json.Marshal(config.PeerInfo{NodeID: "p1", Advertise: "10.0.0.1:9001", Fingerprint: "fp"})
if _, err := r.LocalMutate(context.Background(), transport.MutationAddPeer, json.RawMessage(peer)); err != nil {
t.Fatal(err)
}
snap := cluster.Snapshot()
if len(snap.Alerts) != 1 || len(snap.Peers) != 1 {
t.Fatalf("missing entries: %+v", snap)
}
target, _ := json.Marshal("notify")
if _, err := r.LocalMutate(context.Background(), transport.MutationRemoveAlert, json.RawMessage(target)); err != nil {
t.Fatal(err)
}
target, _ = json.Marshal("p1")
if _, err := r.LocalMutate(context.Background(), transport.MutationRemovePeer, json.RawMessage(target)); err != nil {
t.Fatal(err)
}
snap = cluster.Snapshot()
if len(snap.Alerts) != 0 || len(snap.Peers) != 0 {
t.Errorf("entries not removed: %+v", snap)
}
}
func TestMutateRequiresQuorum(t *testing.T) {
r, _, _ := newReplicator(t, true, false)
_, err := r.LocalMutate(context.Background(), transport.MutationAddCheck, json.RawMessage("{}"))
if err == nil {
t.Error("expected quorum-required error")
}
}
func TestHandleApplyClusterCfgGatesOnVersion(t *testing.T) {
r, cluster, _ := newReplicator(t, false, true)
// Push local version to 7 directly via Replace (Mutate would
// implicitly bump to 8 and confuse the test cases below).
if _, err := cluster.Replace(&config.ClusterConfig{Version: 7}); err != nil {
t.Fatal(err)
}
if resp := r.HandleApplyClusterCfg(transport.ApplyClusterCfgRequest{
Config: &config.ClusterConfig{Version: 6},
}); resp.Applied {
t.Error("older snapshot was applied")
}
if resp := r.HandleApplyClusterCfg(transport.ApplyClusterCfgRequest{
Config: &config.ClusterConfig{Version: 7},
}); resp.Applied {
t.Error("same-version snapshot was applied")
}
resp := r.HandleApplyClusterCfg(transport.ApplyClusterCfgRequest{
Config: &config.ClusterConfig{Version: 8, Checks: []config.Check{{ID: "n"}}},
})
if !resp.Applied {
t.Error("newer snapshot was rejected")
}
if cluster.Snapshot().Version != 8 {
t.Errorf("local version did not advance: %d", cluster.Snapshot().Version)
}
}
func TestHandleProposeMutationRejectsNonMaster(t *testing.T) {
r, _, _ := newReplicator(t, false, true)
resp := r.HandleProposeMutation(context.Background(), transport.ProposeMutationRequest{
FromNodeID: "follower",
Kind: transport.MutationAddCheck,
Payload: json.RawMessage(`{}`),
})
if resp.Error == "" {
t.Error("follower accepted a proposal")
}
}
+76
@@ -0,0 +1,76 @@
package transport
import (
"bytes"
"io"
"testing"
)
func TestFrameRoundtrip(t *testing.T) {
cases := [][]byte{
nil,
{},
[]byte("hello"),
bytes.Repeat([]byte("x"), 1<<14),
}
for _, payload := range cases {
var buf bytes.Buffer
if err := writeFrame(&buf, payload); err != nil {
t.Fatalf("write %d bytes: %v", len(payload), err)
}
out, err := readFrame(&buf)
if err != nil {
t.Fatalf("read %d bytes: %v", len(payload), err)
}
if !bytes.Equal(out, payload) {
t.Errorf("roundtrip lost data for %d bytes", len(payload))
}
}
}
func TestFrameRejectsOversize(t *testing.T) {
var buf bytes.Buffer
if err := writeFrame(&buf, bytes.Repeat([]byte{0}, MaxFrameSize+1)); err == nil {
t.Error("oversized write was accepted")
}
}
func TestFrameRejectsOversizeOnRead(t *testing.T) {
// hand-crafted header announcing a size beyond the cap
var buf bytes.Buffer
buf.Write([]byte{0xFF, 0xFF, 0xFF, 0xFF}) // ~4GiB
if _, err := readFrame(&buf); err == nil {
t.Error("oversized read was accepted")
}
}
func TestFrameReportsShortRead(t *testing.T) {
var buf bytes.Buffer
// header says 10 bytes, body only 3
buf.Write([]byte{0, 0, 0, 10})
buf.WriteString("abc")
if _, err := readFrame(&buf); err == nil {
t.Error("short body did not error")
}
}
func TestMultipleFramesInOneStream(t *testing.T) {
var buf bytes.Buffer
for _, s := range []string{"first", "second", "third"} {
if err := writeFrame(&buf, []byte(s)); err != nil {
t.Fatal(err)
}
}
for _, want := range []string{"first", "second", "third"} {
got, err := readFrame(&buf)
if err != nil {
t.Fatal(err)
}
if string(got) != want {
t.Errorf("got %q want %q", got, want)
}
}
if _, err := readFrame(&buf); err != io.EOF {
t.Errorf("expected EOF, got %v", err)
}
}
-6
@@ -3,7 +3,6 @@ package transport
import (
"context"
"crypto/tls"
- "crypto/x509"
"encoding/json"
"errors"
"fmt"
@@ -322,8 +321,3 @@ func peerNodeIDFromConnState(cs tls.ConnectionState) string {
}
return cs.PeerCertificates[0].Subject.CommonName
}
- // fingerprintOf is a small local mirror to keep this file independent
- // of the crypto package's import path at link time; we recompute the
- // SPKI hash here. Defined in tofu.go.
- var _ = (*x509.Certificate)(nil)
+186
@@ -0,0 +1,186 @@
package transport
import (
"context"
"encoding/json"
"errors"
"net"
"testing"
"time"
"github.com/jasper/quptime/internal/crypto"
"github.com/jasper/quptime/internal/trust"
)
// testNode bundles everything one side of the handshake needs.
type testNode struct {
id string
dir string
assets *TLSAssets
fp string
}
// makeNode builds keys + cert + an empty trust store rooted at dir.
// After every disk-touching trust operation the caller must ensure
// QUPTIME_DIR points back at this node's dir.
func makeNode(t *testing.T, dir, id string) *testNode {
t.Helper()
t.Setenv("QUPTIME_DIR", dir)
priv, err := crypto.GenerateKeyPair(id)
if err != nil {
t.Fatal(err)
}
certPEM, err := crypto.LoadCertPEM()
if err != nil {
t.Fatal(err)
}
fp, err := crypto.FingerprintFromCertPEM(certPEM)
if err != nil {
t.Fatal(err)
}
store, err := trust.Load()
if err != nil {
t.Fatal(err)
}
return &testNode{
id: id,
dir: dir,
assets: &TLSAssets{Cert: certPEM, Key: priv, Trust: store},
fp: fp,
}
}
func (n *testNode) trust(t *testing.T, other *testNode, addr string) {
t.Helper()
t.Setenv("QUPTIME_DIR", n.dir)
if err := n.assets.Trust.Add(trust.Entry{
NodeID: other.id, Address: addr, Fingerprint: other.fp,
}); err != nil {
t.Fatal(err)
}
}
func TestRPCRoundtrip(t *testing.T) {
a := makeNode(t, t.TempDir(), "node-a")
b := makeNode(t, t.TempDir(), "node-b")
// pre-pick a free port; brief race window is acceptable for tests
tmpLn, err := net.Listen("tcp", "127.0.0.1:0")
if err != nil {
t.Fatal(err)
}
addr := tmpLn.Addr().String()
tmpLn.Close()
a.trust(t, b, addr)
b.trust(t, a, addr)
srv := NewServer(a.assets)
srv.Handle("Echo", func(_ context.Context, peer string, payload json.RawMessage) (any, error) {
var s string
if err := json.Unmarshal(payload, &s); err != nil {
return nil, err
}
if peer != b.id {
return nil, errors.New("unexpected peer id: " + peer)
}
return s + " ack", nil
})
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
done := make(chan error, 1)
go func() { done <- srv.Serve(ctx, addr) }()
defer srv.Stop()
if !waitForDial(addr, 2*time.Second) {
t.Fatal("server did not start listening in time")
}
cli := NewClient(b.assets)
defer cli.Close()
callCtx, callCancel := context.WithTimeout(ctx, 5*time.Second)
defer callCancel()
var got string
if err := cli.Call(callCtx, a.id, addr, "Echo", "hello", &got); err != nil {
t.Fatalf("Call: %v", err)
}
if got != "hello ack" {
t.Errorf("got %q want %q", got, "hello ack")
}
}
func TestRPCUnknownMethod(t *testing.T) {
a := makeNode(t, t.TempDir(), "node-a")
b := makeNode(t, t.TempDir(), "node-b")
tmpLn, _ := net.Listen("tcp", "127.0.0.1:0")
addr := tmpLn.Addr().String()
tmpLn.Close()
a.trust(t, b, addr)
b.trust(t, a, addr)
srv := NewServer(a.assets)
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
go srv.Serve(ctx, addr)
defer srv.Stop()
if !waitForDial(addr, 2*time.Second) {
t.Fatal("server not up")
}
cli := NewClient(b.assets)
defer cli.Close()
err := cli.Call(ctx, a.id, addr, "DoesNotExist", nil, nil)
if err == nil {
t.Fatal("expected error for unknown method")
}
}
func TestRPCRejectsUntrustedPeer(t *testing.T) {
a := makeNode(t, t.TempDir(), "node-a")
b := makeNode(t, t.TempDir(), "node-b")
tmpLn, _ := net.Listen("tcp", "127.0.0.1:0")
addr := tmpLn.Addr().String()
tmpLn.Close()
// Deliberately omit b.trust(...) on the server side: b is unknown to a.
t.Setenv("QUPTIME_DIR", b.dir)
_ = b.assets.Trust.Add(trust.Entry{NodeID: a.id, Address: addr, Fingerprint: a.fp})
srv := NewServer(a.assets)
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
go srv.Serve(ctx, addr)
defer srv.Stop()
if !waitForDial(addr, 2*time.Second) {
t.Fatal("server not up")
}
cli := NewClient(b.assets)
defer cli.Close()
callCtx, callCancel := context.WithTimeout(ctx, 2*time.Second)
defer callCancel()
if err := cli.Call(callCtx, a.id, addr, "Ping", nil, nil); err == nil {
t.Error("untrusted client was admitted")
}
}
// waitForDial polls a TCP listener until it accepts a plain TCP
// connection, signalling that Serve has begun listening.
func waitForDial(addr string, max time.Duration) bool {
deadline := time.Now().Add(max)
for time.Now().Before(deadline) {
c, err := net.DialTimeout("tcp", addr, 200*time.Millisecond)
if err == nil {
_ = c.Close()
return true
}
time.Sleep(20 * time.Millisecond)
}
return false
}
+97
@@ -0,0 +1,97 @@
package trust
import (
"crypto/x509"
"encoding/pem"
"testing"
"github.com/jasper/quptime/internal/crypto"
)
func TestRoundtripAndLookup(t *testing.T) {
t.Setenv("QUPTIME_DIR", t.TempDir())
s, err := Load()
if err != nil {
t.Fatal(err)
}
if len(s.List()) != 0 {
t.Error("expected empty store")
}
if err := s.Add(Entry{NodeID: "n1", Address: "10.0.0.1:9001", Fingerprint: "sha256:abc"}); err != nil {
t.Fatal(err)
}
if err := s.Add(Entry{NodeID: "n2", Address: "10.0.0.2:9001", Fingerprint: "sha256:def"}); err != nil {
t.Fatal(err)
}
s2, err := Load()
if err != nil {
t.Fatal(err)
}
if len(s2.List()) != 2 {
t.Errorf("got %d entries after reload", len(s2.List()))
}
if e, ok := s2.Get("n1"); !ok || e.Fingerprint != "sha256:abc" {
t.Errorf("Get(n1) = %+v ok=%v", e, ok)
}
if e, ok := s2.LookupByFingerprint("sha256:def"); !ok || e.NodeID != "n2" {
t.Errorf("LookupByFingerprint = %+v ok=%v", e, ok)
}
removed, err := s2.Remove("n1")
if err != nil || !removed {
t.Fatalf("Remove returned %v err=%v", removed, err)
}
if _, ok := s2.Get("n1"); ok {
t.Error("entry still present after Remove")
}
s3, _ := Load()
if _, ok := s3.Get("n1"); ok {
t.Error("Remove did not persist")
}
}
func TestAddRequiresIDAndFingerprint(t *testing.T) {
t.Setenv("QUPTIME_DIR", t.TempDir())
s, _ := Load()
if err := s.Add(Entry{NodeID: "n1"}); err == nil {
t.Error("missing fingerprint should error")
}
if err := s.Add(Entry{Fingerprint: "fp"}); err == nil {
t.Error("missing node id should error")
}
}
func TestVerifyPeerCertPinsFingerprint(t *testing.T) {
t.Setenv("QUPTIME_DIR", t.TempDir())
if _, err := crypto.GenerateKeyPair("peer-1"); err != nil {
t.Fatal(err)
}
certPEM, _ := crypto.LoadCertPEM()
block, _ := pem.Decode(certPEM)
cert, _ := x509.ParseCertificate(block.Bytes)
fp := crypto.Fingerprint(cert)
s, _ := Load()
// Untrusted: should reject.
if err := s.VerifyPeerCert([][]byte{cert.Raw}, nil); err == nil {
t.Error("untrusted cert was accepted")
}
if err := s.Add(Entry{NodeID: "peer-1", Fingerprint: fp}); err != nil {
t.Fatal(err)
}
// Now trusted.
if err := s.VerifyPeerCert([][]byte{cert.Raw}, nil); err != nil {
t.Errorf("trusted cert rejected: %v", err)
}
// No certs presented at all should error.
if err := s.VerifyPeerCert(nil, nil); err == nil {
t.Error("empty cert chain was accepted")
}
}