Compare commits
6 commits: v0.0.2-hf0 .. master
| Author | SHA1 | Date |
|---|---|---|
| | a1d74cf36d | |
| | f60b0a0609 | |
| | ea30dbb895 | |
| | 1e2e382867 | |
| | ed25e9ed68 | |
| | c55482664c | |
@@ -4,6 +4,51 @@ All notable changes to this project are documented here. The format

follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) and
this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [v0.1.1] — 2026-05-15

### Changed

- **`install.sh` now repairs data-dir permissions on every run.**
  Re-running the installer reasserts the canonical ownership
  (`quptime:quptime`) and modes across `/etc/quptime/` — `0750` on
  the dir, `0700` on `keys/`, `0600` on `node.yaml`, `cluster.yaml`,
  `trust.yaml`, and `keys/private.pem`, `0644` on `keys/public.pem`
  and `keys/cert.pem`. Makes the installer the one-step recovery
  path when something has tampered with modes (e.g. a stray
  `chmod -R`, a backup restore, or an accidental `sudo qu init`
  that left files owned by root). Unknown files in the dir are left
  alone.

### Fixed

- **CLI socket lookup as the daemon user.** `sudo -u quptime qu …`
  no longer fails with `dial daemon socket /tmp/quptime-quptime/…:
  no such file or directory` while the system daemon is running.
  `config.SocketPath()` now probes the canonical systemd location
  (`/run/quptime/quptime.sock`, then `/var/run/quptime/quptime.sock`)
  regardless of euid before falling back to per-user paths, so the
  CLI reaches the daemon's socket even when `sudo` has stripped
  `RUNTIME_DIRECTORY` and `XDG_RUNTIME_DIR` from the environment.

## [v0.1.0] — 2026-05-15

### Changed

- **Master election cooldown (2 min).** A returning peer with a
  lower NodeID no longer reclaims master the instant it reappears.
  It must stay continuously live for `DefaultMasterCooldown`
  (2 minutes) before displacing the incumbent. Bootstrap and
  quorum-regained-from-empty still elect immediately; the cooldown
  only protects an active incumbent. Fixes #3: a self-monitoring
  master (TCP check on its own `:9901`) would otherwise flap the
  role in lock-step with its own restart.

### Fixed

- **#1: previously-up services alerted as coming back up when the
  master went down.** `unknown` -> `up` transitions are now ignored
  during master election; `unknown` -> `down` still alerts by design.

## [v0.0.2] — 2026-05-15

### Fixed

@@ -93,3 +138,5 @@ Initial public release.

Planned for a future release.

[v0.0.1]: https://git.cer.sh/axodouble/quptime/releases/tag/v0.0.1
[v0.1.0]: https://git.cer.sh/axodouble/quptime/releases/tag/v0.1.0
[v0.1.1]: https://git.cer.sh/axodouble/quptime/releases/tag/v0.1.1
@@ -94,7 +94,11 @@ the hysteresis that absorbs network blips.

Master election is deterministic: among the live members of the quorum,
the node with the lexicographically smallest NodeID wins. No
negotiation, no split-brain window. A 2-minute **master cooldown**
keeps the current master in place until a returning lower-NodeID peer
has been continuously live for the full window, so a self-monitoring
master that briefly drops doesn't flap the role back the instant it
reappears.

`cluster.yaml` is the single replicated source of truth (peers, checks,
alerts). Mutations from the CLI route through the master, which bumps a
@@ -118,6 +118,35 @@ The `term` integer in `qu status` is bumped every time the elected

master changes (including transitions to and from "no master"). Use it
to spot flappy clusters.

### Master cooldown

The bare "lowest-live-NodeID wins" rule has one unpleasant edge: if the
primary master is also being monitored by `qu` itself (a TCP check on
its own `:9901`, say), a brief restart causes a master flap *and* a
state flap in lock-step. The new master sees the old master come back
on the next tick and immediately hands the role back, taking the
just-recovering node from `unknown` to `up` with no quiet period.

To absorb that, the quorum manager applies a **master cooldown**
(`DefaultMasterCooldown`, 2 minutes) before a peer with a lower NodeID
may displace the incumbent. The rules:

- The cooldown timer starts on the **first heartbeat after a
  dead-after gap** — i.e. when a peer re-enters the live set after
  having aged out. Continuous heartbeats never restart it.
- A flap during the cooldown resets the timer; the returning peer
  must clear a full fresh window before taking over.
- The cooldown applies **only when an incumbent master exists**.
  Bootstrap and quorum-regained-from-empty elect the lowest-NodeID
  live peer immediately, because there is no role to protect.
- If the incumbent drops out of the live set, the cooldown is
  irrelevant — any live peer may take over without waiting.

The constant lives in `internal/quorum/manager.go`. Lower it for
faster fail-back at the cost of monitoring-self flap risk; raise it
to give a recovering master longer to settle before reclaiming the
role.

## Catch-up when a node reconnects

This is the scenario most people ask about: node C is offline, the
@@ -70,6 +70,15 @@ What it does:

`/etc/systemd/system/quptime.service` (hardened — matches the unit
in [systemd.md](deployment/systemd.md)). Enables but does not start
the service, so you can configure identity before first boot.

5. Repairs ownership and modes under `/etc/quptime/` to the canonical
   layout (`0750` on the dir, `0700` on `keys/`, `0600` on
   `node.yaml` / `cluster.yaml` / `trust.yaml` / `keys/private.pem`,
   `0644` on `keys/public.pem` / `keys/cert.pem`). This makes the
   installer idempotent for permission damage — if something
   tightened or loosened modes (a stray `chmod -R`, a misguided
   backup restore, an accidental `sudo qu init`), re-running
   `install.sh` puts everything back without touching the contents
   of those files.
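The mode layout from step 5 can be reproduced by hand in a scratch directory. A sketch only: the real installer targets `/etc/quptime` and also chowns to the service user (which needs root), and `stat -c` assumes GNU coreutils:

```shell
DATA_DIR="$(mktemp -d)"
mkdir -p "$DATA_DIR/keys"
touch "$DATA_DIR"/node.yaml "$DATA_DIR"/cluster.yaml "$DATA_DIR"/trust.yaml
touch "$DATA_DIR"/keys/private.pem "$DATA_DIR"/keys/public.pem "$DATA_DIR"/keys/cert.pem

chmod 0750 "$DATA_DIR"        # top-level dir
chmod 0700 "$DATA_DIR/keys"   # key dir is tighter
for f in node.yaml cluster.yaml trust.yaml keys/private.pem; do
  chmod 0600 "$DATA_DIR/$f"   # secrets: owner-only
done
for f in keys/public.pem keys/cert.pem; do
  chmod 0644 "$DATA_DIR/$f"   # world-readable public material
done

stat -c '%a' "$DATA_DIR/keys/private.pem"  # 600
stat -c '%a' "$DATA_DIR/keys/public.pem"   # 644
```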

## Build from source
@@ -183,6 +183,7 @@ Options:

| `quorum` | `true` | `false` — no mutations, no alerts. |
| `master` | a NodeID | `(none — ...)` — quorum lost or election in flight. |
| `term` | slow growth | rapid growth → master flapping, network unstable. |
| `master` after a restart of the primary | unchanged for ~2 min, then bumps back | bumps back immediately → cooldown disabled or misconfigured. |
| `config ver` | identical across nodes | divergence → a node is stuck pulling. |

A simple cron sentinel on each node:
@@ -35,6 +35,25 @@ flapping. Causes:

- Heartbeat timeouts (default 4s) are too tight for your inter-node
  link. Rebuild with a higher `DefaultDeadAfter` if you need it.

## Primary master came back but the cluster hasn't switched to it

**What it means.** Working as designed. After a returning peer with a
lower NodeID rejoins, the quorum manager waits
`DefaultMasterCooldown` (2 minutes) before letting it displace the
incumbent. The window prevents a self-monitoring master from flapping
the role in lock-step with its own restart.

How to confirm:

- `qu status` on every node shows the same (current) master and a
  steady `term` — not flapping. The lower-NodeID peer is in the live
  set but not yet master.
- After ~2 minutes of continuous liveness, `term` bumps once and the
  master switches to the lower-NodeID peer.

If you need a different window, change `DefaultMasterCooldown` in
`internal/quorum/manager.go` and rebuild.

## A check is stuck in `unknown`

**What it means.** The aggregator has no fresh reports for that check.
@@ -153,7 +172,9 @@ still see this error, the most likely causes are:

- The data directory is read-only or owned by a different user — the
  bootstrap can't write `node.yaml`. Fix permissions on
  `$QUPTIME_DIR`. The fastest fix on a standard install is just to
  re-run `install.sh` — it reasserts the canonical ownership and
  modes on the whole tree without touching your config.
- Something else removed `node.yaml` mid-run (a config-management
  tool, a misconfigured volume). Re-run `qu serve` and it will
  rebuild from env, or run `qu init` manually with the flags you
@@ -178,7 +199,9 @@ load private key: ...

```

Permissions on `keys/private.pem` are wrong — should be 0600 and owned
by the daemon user. Fix and restart. Re-running `install.sh` on a
standard install is the easiest path: it repairs ownership and modes
on the entire data dir.

## Probes look much slower than expected
@@ -175,6 +175,63 @@ fi

install -d -o "$SERVICE_USER" -g "$SERVICE_GROUP" -m 0750 "$DATA_DIR"

# Repair ownership and permissions on the data dir's contents. Catches:
# - re-running the installer over a previous install where the
#   service user/group changed.
# - the operator ran `qu init` or `qu serve` as root once (easy
#   mistake: `sudo qu init` is shorter than the documented
#   `sudo -u quptime qu init`). When the daemon runs as root its
#   DataDir() resolves to /etc/quptime, so any files it writes land
#   owned by root:root — the systemd service then fails with
#   `open node.yaml: permission denied`.
# - someone or something (a stray `chmod -R`, a misguided backup
#   restore) tightened or loosened modes. Re-running the installer
#   should be enough to get back to a working baseline.
# The canonical layout (mirrors the modes the daemon writes itself
# in internal/config and internal/crypto):
#   /etc/quptime/                  quptime:quptime 0750
#   /etc/quptime/keys/             quptime:quptime 0700
#   /etc/quptime/node.yaml         quptime:quptime 0600
#   /etc/quptime/cluster.yaml      quptime:quptime 0600
#   /etc/quptime/trust.yaml        quptime:quptime 0600
#   /etc/quptime/keys/private.pem  quptime:quptime 0600
#   /etc/quptime/keys/public.pem   quptime:quptime 0644
#   /etc/quptime/keys/cert.pem     quptime:quptime 0644
# The runtime dir /var/run/quptime is owned by systemd via
# RuntimeDirectory= and rebuilt at each service start, so we leave it
# alone.
repair_perms() {
    # Always reset the top-level dir mode — `install -d` only sets it
    # on creation, not on re-run.
    chown "$SERVICE_USER:$SERVICE_GROUP" "$DATA_DIR"
    chmod 0750 "$DATA_DIR"

    # Reassert ownership across the whole tree in one pass.
    if [ -n "$(ls -A "$DATA_DIR" 2>/dev/null)" ]; then
        chown -R "$SERVICE_USER:$SERVICE_GROUP" "$DATA_DIR"
    fi

    # keys/ is a directory with its own tighter mode.
    if [ -d "$DATA_DIR/keys" ]; then
        chmod 0700 "$DATA_DIR/keys"
    fi

    # Each known file gets its canonical mode if it exists. We don't
    # create anything that isn't already there — that's `qu init`'s
    # job — and we don't touch unknown files an operator may have
    # parked in the dir.
    local f
    for f in node.yaml cluster.yaml trust.yaml keys/private.pem; do
        [ -f "$DATA_DIR/$f" ] && chmod 0600 "$DATA_DIR/$f"
    done
    for f in keys/public.pem keys/cert.pem; do
        [ -f "$DATA_DIR/$f" ] && chmod 0644 "$DATA_DIR/$f"
    done
}

repair_perms
echo "> reasserted ownership ($SERVICE_USER:$SERVICE_GROUP) and modes under $DATA_DIR"

echo "> writing $SERVICE_FILE"
cat > "$SERVICE_FILE" <<'EOF'
[Unit]
@@ -252,11 +309,18 @@ Next steps:

     # On follower nodes, also set the shared join secret:
     # Environment=QUPTIME_CLUSTER_SECRET=<paste from first node>

   b) Or run \`qu init\` once explicitly. IMPORTANT: run as the
      ${SERVICE_USER} user, not root — otherwise node.yaml lands
      owned by root and the service can't read it on start.

        sudo -u ${SERVICE_USER} QUPTIME_DIR=${DATA_DIR} \\
          qu init --advertise <this-host>:9901

      If you already ran it as root and the service is failing
      with "permission denied" on node.yaml, repair with:

        sudo chown -R ${SERVICE_USER}:${SERVICE_GROUP} ${DATA_DIR}

2. Start the service:

     sudo systemctl start ${SERVICE_NAME}
@@ -16,6 +16,7 @@ import (

	"errors"
	"os"
	"path/filepath"
	"strings"
)

// Default file names. Callers should always go through DataDir() so an
@@ -55,10 +56,47 @@ func DataDir() string {

}

// SocketPath returns the unix socket used for local CLI ↔ daemon control.
//
// Resolution order:
//  1. $QUPTIME_SOCKET — explicit operator override.
//  2. $RUNTIME_DIRECTORY — set by systemd when the unit declares
//     RuntimeDirectory=quptime. This is the path the daemon uses
//     when run under the packaged unit: /run/quptime/quptime.sock.
//  3. The canonical system socket path — /run/quptime/quptime.sock —
//     if it exists. This catches the CLI side regardless of who is
//     invoking it: `sudo -u quptime qu status` strips RUNTIME_DIRECTORY
//     and XDG_RUNTIME_DIR, so without this probe the CLI falls all
//     the way through to /tmp/quptime-<user>/… and reports "no such
//     file" even while the daemon is happily listening.
//  4. /var/run/quptime/… when euid is 0 (CLI side, packaged installs
//     on systems where /var/run isn't a symlink to /run).
//  5. $XDG_RUNTIME_DIR/quptime/… for user-mode installs.
//  6. /tmp/quptime-<user>/… as a last resort.
func SocketPath() string {
	if v := os.Getenv("QUPTIME_SOCKET"); v != "" {
		return v
	}
	if v := os.Getenv("RUNTIME_DIRECTORY"); v != "" {
		// systemd may pass multiple colon-separated entries when more
		// than one RuntimeDirectory= is declared. Ours is single, but
		// be defensive in case a future unit adds more.
		if i := strings.IndexByte(v, ':'); i >= 0 {
			v = v[:i]
		}
		return filepath.Join(v, SocketName)
	}
	// If a system-managed daemon is already listening, route there
	// regardless of euid. Without this, `sudo -u quptime qu …` can't
	// find the socket the daemon (also running as quptime) created
	// via RuntimeDirectory=.
	for _, p := range []string{
		"/run/quptime/" + SocketName,
		"/var/run/quptime/" + SocketName,
	} {
		if _, err := os.Stat(p); err == nil {
			return p
		}
	}
	if os.Geteuid() == 0 {
		return "/var/run/quptime/" + SocketName
	}
@@ -34,6 +34,12 @@ import (

const (
	DefaultHeartbeatInterval = 1 * time.Second
	DefaultDeadAfter         = 4 * time.Second

	// DefaultMasterCooldown is the grace period a returning peer must
	// stay continuously live before it's allowed to displace the
	// currently-elected master. Without it, a self-monitoring master
	// that briefly drops would reclaim the role immediately on return
	// and disrupt anything watching its TCP port.
	DefaultMasterCooldown = 2 * time.Minute
)

// VersionObserver is invoked whenever a heartbeat exchange reveals
@@ -50,12 +56,14 @@ type Manager struct {

	heartbeatInterval time.Duration
	deadAfter         time.Duration
	masterCooldown    time.Duration

	mu        sync.RWMutex
	term      uint64
	masterID  string
	lastSeen  map[string]time.Time // peerID -> last contact (sent or recv)
	liveSince map[string]time.Time // peerID -> start of current liveness streak
	addrOf    map[string]string    // peerID -> advertise addr (last known)

	observer VersionObserver
}
@@ -70,7 +78,9 @@ func New(selfID string, cluster *config.ClusterConfig, client *transport.Client)

		client:            client,
		heartbeatInterval: DefaultHeartbeatInterval,
		deadAfter:         DefaultDeadAfter,
		masterCooldown:    DefaultMasterCooldown,
		lastSeen:          map[string]time.Time{},
		liveSince:         map[string]time.Time{},
		addrOf:            map[string]string{},
	}
}
@@ -242,7 +252,15 @@ func (m *Manager) tick(ctx context.Context) {

func (m *Manager) markLive(id string) {
	m.mu.Lock()
	now := time.Now()
	prev, ok := m.lastSeen[id]
	// A peer entering its first liveness streak — or returning after
	// the dead-after window expired — resets liveSince. Subsequent
	// heartbeats within the streak leave it untouched.
	if !ok || now.Sub(prev) > m.deadAfter {
		m.liveSince[id] = now
	}
	m.lastSeen[id] = now
	m.mu.Unlock()
}
@@ -276,7 +294,41 @@ func (m *Manager) recomputeMaster() {

	var newMaster string
	if len(live) >= quorum && len(live) > 0 {
		// Without an incumbent the cluster is bootstrapping or
		// has just regained quorum, so elect immediately — there's
		// nothing to protect from a handoff.
		if m.masterID == "" {
			newMaster = live[0]
		} else {
			newMaster = m.masterID
			now := time.Now()
			incumbentLive := false
			for _, id := range live {
				if id == m.masterID {
					incumbentLive = true
					break
				}
			}
			// If the incumbent is no longer live, any live peer
			// may take over without waiting.
			if !incumbentLive {
				newMaster = live[0]
			} else {
				// Incumbent is live. A peer with a lower NodeID
				// may only displace it after it has stayed
				// continuously live for masterCooldown.
				for _, id := range live {
					if id >= m.masterID {
						break // sorted ascending — nobody lower left
					}
					since, ok := m.liveSince[id]
					if ok && now.Sub(since) >= m.masterCooldown {
						newMaster = id
						break
					}
				}
			}
		}
	}
	if newMaster != m.masterID {
		m.term++
@@ -119,6 +119,127 @@ func TestDeadAfterEvictsStaleLiveness(t *testing.T) {

	}
}

// heartbeatLoop simulates the production heartbeat cadence — calling
// markLive for the given peers more frequently than deadAfter, so a
// peer that's "live throughout" never has its liveSince reset by the
// dead-after gap heuristic. It returns once the dur window has
// elapsed.
func heartbeatLoop(t *testing.T, m *Manager, dur time.Duration, peers ...string) {
	t.Helper()
	deadline := time.Now().Add(dur)
	interval := m.deadAfter / 4
	if interval < time.Millisecond {
		interval = time.Millisecond
	}
	for time.Now().Before(deadline) {
		for _, p := range peers {
			m.markLive(p)
		}
		m.recomputeMaster()
		time.Sleep(interval)
	}
}

func TestReturningLowerIDWaitsForCooldown(t *testing.T) {
	_, m := threeNode("b")
	m.deadAfter = 80 * time.Millisecond
	m.masterCooldown = 200 * time.Millisecond

	// Bootstrap: all three live, "a" elected.
	m.markLive("a")
	m.markLive("b")
	m.markLive("c")
	m.recomputeMaster()
	if m.Master() != "a" {
		t.Fatalf("initial master=%q want a", m.Master())
	}

	// "a" drops — only b/c heartbeat. Long enough to age a out and let
	// b take over.
	heartbeatLoop(t, m, 120*time.Millisecond, "b", "c")
	if m.Master() != "b" {
		t.Fatalf("after a-drop master=%q want b", m.Master())
	}

	// "a" returns. Verify b stays master for less than the cooldown.
	heartbeatLoop(t, m, 120*time.Millisecond, "a", "b", "c")
	if m.Master() != "b" {
		t.Errorf("mid-cooldown master=%q want b", m.Master())
	}

	// Past the cooldown, a reclaims master.
	heartbeatLoop(t, m, 120*time.Millisecond, "a", "b", "c")
	if m.Master() != "a" {
		t.Errorf("after cooldown master=%q want a", m.Master())
	}
}

func TestCooldownResetsOnFlap(t *testing.T) {
	_, m := threeNode("b")
	m.deadAfter = 80 * time.Millisecond
	m.masterCooldown = 200 * time.Millisecond

	m.markLive("a")
	m.markLive("b")
	m.markLive("c")
	m.recomputeMaster()

	// a drops, b becomes master.
	heartbeatLoop(t, m, 120*time.Millisecond, "b", "c")
	if m.Master() != "b" {
		t.Fatalf("master=%q want b", m.Master())
	}

	// a returns briefly, then drops again before cooldown elapses.
	heartbeatLoop(t, m, 100*time.Millisecond, "a", "b", "c")
	if m.Master() != "b" {
		t.Fatalf("during first cooldown master=%q want b", m.Master())
	}
	heartbeatLoop(t, m, 120*time.Millisecond, "b", "c") // a ages out again
	if m.Master() != "b" {
		t.Fatalf("after a-reflap master=%q want b", m.Master())
	}

	// a returns for the second time — cooldown restarts here.
	// Wait less than a full cooldown — b should still be master.
	heartbeatLoop(t, m, 100*time.Millisecond, "a", "b", "c")
	if m.Master() != "b" {
		t.Errorf("partway through fresh cooldown master=%q want b", m.Master())
	}

	// Past the full fresh cooldown, a takes over.
	heartbeatLoop(t, m, 150*time.Millisecond, "a", "b", "c")
	if m.Master() != "a" {
		t.Errorf("after fresh cooldown master=%q want a", m.Master())
	}
}

func TestNewMasterAfterQuorumLossIgnoresCooldown(t *testing.T) {
	_, m := threeNode("b")
	m.deadAfter = 50 * time.Millisecond
	m.masterCooldown = 1 * time.Hour // would block election if applied

	// Bootstrap into no-master state by letting all peers age out.
	m.markLive("a")
	m.markLive("b")
	m.markLive("c")
	m.recomputeMaster()
	time.Sleep(80 * time.Millisecond)
	m.markLive("b")
	m.recomputeMaster()
	if m.Master() != "" {
		t.Fatalf("master=%q want empty (quorum lost)", m.Master())
	}

	// Quorum regained — incumbent is empty, election must be immediate.
	m.markLive("a")
	m.markLive("b")
	m.recomputeMaster()
	if m.Master() != "a" {
		t.Errorf("post-recovery master=%q want a (no cooldown when empty)", m.Master())
	}
}

func TestVersionObserverFiresOnHigherVersion(t *testing.T) {
	cluster := &config.ClusterConfig{Version: 2}
	m := New("a", cluster, nil)