From 7bc33b18374f2a11d4a8d0ff1c2deb93ca82f279 Mon Sep 17 00:00:00 2001
From: Axodouble
Date: Fri, 15 May 2026 05:03:06 +0000
Subject: [PATCH] v0.0.1 release

---
 .gitignore | 34 ++++++
 CHANGELOG.md | 86 +++++++++++++++
 README.md | 13 ++-
 docs/deployment/docker.md | 11 +-
 docs/deployment/tailscale.md | 2 +-
 docs/installation.md | 3 +-
 install.sh | 203 +++++++++++++++++++++++++++++------
 7 files changed, 310 insertions(+), 42 deletions(-)
 create mode 100644 .gitignore
 create mode 100644 CHANGELOG.md

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..8c0cde7
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,34 @@
+# Build artifacts
+/qu
+/qu-*
+/dist/
+*.exe
+*.test
+*.out
+
+# Go workspace / module cache (only relevant if vendored)
+/vendor/
+
+# Local node state — never commit anything that looks like a data dir
+/quptime/
+/etc/quptime/
+node.yaml
+cluster.yaml
+trust.yaml
+keys/
+
+# Compose / secrets
+.env
+.env.local
+*.local.yml
+*.local.yaml
+
+# Editor / OS scratch
+*.swp
+*.swo
+*~
+.DS_Store
+
+# Test / coverage
+coverage.out
+coverage.html
diff --git a/CHANGELOG.md b/CHANGELOG.md
new file mode 100644
index 0000000..8f9f09a
--- /dev/null
+++ b/CHANGELOG.md
@@ -0,0 +1,86 @@
+# Changelog
+
+All notable changes to this project are documented here. The format
+follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) and
+this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [v0.0.1] — 2026-05-15
+
+Initial public release.
+
+### Added
+
+- **Quorum-based uptime monitoring.** Multiple cooperating nodes run
+  the same probes (HTTP, TCP, ICMP) and vote on the cluster-wide
+  truth. A check flips state only after two consecutive aggregate
+  evaluations agree (hysteresis), so single-node flake doesn't page
+  anyone.
+- **Deterministic master election.** Among the live members of the
+  quorum the lexicographically smallest NodeID wins — no negotiation
+  step, no split-brain window.
+- **mTLS inter-node transport** with TLS 1.3 minimum, SSH-style
+  fingerprint pinning, and a pre-shared `cluster_secret` gating the
+  Join RPC.
+- **Replicated `cluster.yaml`** carrying peers, checks, and alerts.
+  Master is the only writer; followers receive monotonic-versioned
+  snapshots and converge on the latest. Hand-edits to the file on any
+  node are picked up by the manual-edit watcher and forwarded through
+  the master.
+- **HTTP, TCP, and ICMP probes** with configurable interval,
+  timeout, expected status, and optional body-substring match. ICMP
+  defaults to unprivileged UDP-mode pings so the daemon can run as a
+  non-root user.
+- **SMTP and Discord alerts** with optional Go `text/template`
+  subject/body overrides per alert, default-attach mode (`default:
+  true`), and per-check opt-outs via `suppress_alert_ids`.
+- **Docker-friendly env-var configuration.** Every field in
+  `node.yaml` can also be supplied via a `QUPTIME_*` environment
+  variable; `qu serve` auto-initialises a fresh data volume from
+  these on first start, so `docker compose up` is enough to launch a
+  node.
+- **Interactive TUI** (`qu tui`) for peers, checks, and alerts with
+  live refresh.
+- **Hardened systemd unit** shipped via `install.sh`: dedicated
+  `quptime` user, `ProtectSystem=strict`, all capabilities dropped by
+  default.
+- **Multi-arch Docker images** (`linux/amd64`, `linux/arm64`)
+  published to `git.cer.sh/axodouble/quptime`.
+- **Static Linux binaries** (`amd64`, `arm64`) published per tag with
+  a `SHA256SUMS` file; the official installer verifies the checksum
+  before placing the binary on disk.
+
+### Security
+
+- Cluster secret is compared in constant time
+  (`crypto/subtle.ConstantTimeCompare`).
+- Self-signed RSA certs minted at `qu init`; SPKI SHA-256
+  fingerprints are what's pinned, matching the canonical OpenSSL
+  representation.
+- Private keys are written with mode `0600`; data and runtime
+  directories with `0700`/`0750`.
+- All `cluster.yaml` writes go through an atomic `tmpfile + rename`.
+- `install.sh` downloads the published `SHA256SUMS` and refuses to
+  install if the downloaded binary doesn't match.
+
+### Known limitations
+
+- **Cluster-wide secret distribution.** SMTP passwords and Discord
+  webhook URLs configured via `qu alert add …` are stored in
+  `cluster.yaml`, which is replicated to every node. Treat every node
+  as having read access to every alert credential. Restrict who can
+  reach the data directory accordingly. See
+  [docs/security.md](docs/security.md) for the threat model.
+- **No automatic key rotation.** Rolling a node's identity means
+  wiping its data directory, running `qu init` again, and re-adding
+  it from another node.
+- **No historical metrics.** Only the current aggregate state is kept
+  in memory. There is no built-in graph store, SLA calculator, or
+  audit log.
+- **Master-flap state.** Aggregator hysteresis state lives in
+  memory on the current master. When leadership changes the new
+  master starts from `StateUnknown` and re-accumulates hysteresis —
+  expect a few seconds of delayed alerting after a master switch.
+- **No release signing beyond SHA256SUMS** (no cosign / GPG).
+  Planned for a future release.
+
+[v0.0.1]: https://git.cer.sh/axodouble/quptime/releases/tag/v0.0.1
diff --git a/README.md b/README.md
index e4ff6bb..67a8410 100644
--- a/README.md
+++ b/README.md
@@ -88,7 +88,7 @@ go build -o qu ./cmd/qu
 To stamp the version into the binary:
 ```sh
-go build -ldflags "-X main.version=v0.1.0" -o qu ./cmd/qu
+go build -ldflags "-X main.version=v0.0.1" -o qu ./cmd/qu
 qu --version
 ```
@@ -100,7 +100,7 @@ amd64 and arm64, and publishes them as a Gitea release with a
 `SHA256SUMS` file alongside.
 ```sh
-git tag v0.1.0
+git tag v0.0.1
 git push --tags
 ```
@@ -166,6 +166,15 @@ c0d4...
charlie.example.com:9901 true 2026-05-12T15:01:32Z
 ## Adding checks and alerts
+> ⚠️ **Alert credentials are replicated cluster-wide.** SMTP passwords
+> and Discord webhook URLs live in `cluster.yaml`, which is mirrored to
+> every node. Any node that can read its own data directory can read
+> every alert secret. Treat compromising one node as compromising every
+> alert credential, and restrict who can reach `$QUPTIME_DIR` on each
+> host (the hardened systemd unit and the Docker image both default to
+> `0700`/`0750`). See [docs/security.md](docs/security.md) for the full
+> threat model.
+
 ```sh
 # alerts first so checks can reference them
 qu alert add discord oncall --webhook https://discord.com/api/webhooks/...
diff --git a/docs/deployment/docker.md b/docs/deployment/docker.md
index d86884b..cefd94c 100644
--- a/docs/deployment/docker.md
+++ b/docs/deployment/docker.md
@@ -9,8 +9,9 @@ daemon can bind privileged ports and open ICMP sockets; override with
 ```
 git.cer.sh/axodouble/quptime:master # tip of main, multi-arch
-git.cer.sh/axodouble/quptime:v0.1.0 # tagged release
-git.cer.sh/axodouble/quptime:v0.1.0-amd64 # single-arch (if you must pin)
+git.cer.sh/axodouble/quptime:latest # latest tagged release
+git.cer.sh/axodouble/quptime:v0.0.1 # specific tagged release
+git.cer.sh/axodouble/quptime:latest-amd64 # single-arch (if you must pin)
 ```
 The image embeds `QUPTIME_DIR=/etc/quptime` and declares it a volume —
@@ -24,7 +25,7 @@ For a development cluster or a single-node smoke test:
 # compose.yaml
 services:
   quptime:
-    image: git.cer.sh/axodouble/quptime:v0.1.0
+    image: git.cer.sh/axodouble/quptime:latest
     container_name: quptime
     restart: unless-stopped
     environment:
@@ -76,7 +77,7 @@ For local testing of the full quorum machinery without three machines:
 ```yaml
 # compose.yaml
 x-quptime: &quptime
-  image: git.cer.sh/axodouble/quptime:v0.1.0
+  image: git.cer.sh/axodouble/quptime:latest
   restart: unless-stopped
   cap_add:
     - NET_RAW
@@ -146,7 +147,7 @@ The natural
unit is one compose file per host, each running one
 # /etc/qu-stack/compose.yaml
 services:
   quptime:
-    image: git.cer.sh/axodouble/quptime:v0.1.0
+    image: git.cer.sh/axodouble/quptime:latest
     container_name: quptime
     restart: unless-stopped
     environment:
diff --git a/docs/deployment/tailscale.md b/docs/deployment/tailscale.md
index 1b26be7..ddb16ef 100644
--- a/docs/deployment/tailscale.md
+++ b/docs/deployment/tailscale.md
@@ -51,7 +51,7 @@ services:
     restart: unless-stopped
   quptime:
-    image: git.cer.sh/axodouble/quptime:v0.1.0
+    image: git.cer.sh/axodouble/quptime:latest
     container_name: quptime
     environment:
       # host:port other QUptime nodes use to reach this one. Should be
diff --git a/docs/installation.md b/docs/installation.md
index 71ac850..0eb48e7 100644
--- a/docs/installation.md
+++ b/docs/installation.md
@@ -79,7 +79,8 @@ registry on every tag and every push to `master`:
 ```
 git.cer.sh/axodouble/quptime:master # tip of main
-git.cer.sh/axodouble/quptime:v0.1.0 # tagged release
+git.cer.sh/axodouble/quptime:latest # latest tagged release
+git.cer.sh/axodouble/quptime:v0.0.1 # pinned release
 ```
 See the [Docker deployment guide](deployment/docker.md) for compose
diff --git a/install.sh b/install.sh
index 443642f..4569096 100644
--- a/install.sh
+++ b/install.sh
@@ -1,23 +1,30 @@
 #!/bin/bash
+# QUptime installer.
+#
+# Downloads the latest released `qu` binary from the Gitea release
+# page, verifies it against the published SHA256SUMS, installs it to
+# /usr/local/bin, and (on systemd hosts) drops in a hardened
+# quptime.service that matches the unit documented in
+# docs/deployment/systemd.md. Idempotent — re-running upgrades the
+# binary and refreshes the unit without touching the data directory.
 set -euo pipefail
 INSTALL_BIN="/usr/local/bin/qu"
-SERVICE_FILE="/etc/systemd/system/qu-serve.service"
-SERVICE_USER="${SUDO_USER:-$(whoami)}"
-SERVICE_GROUP="$(id -gn "$SERVICE_USER" 2>/dev/null || echo root)"
+SERVICE_FILE="/etc/systemd/system/quptime.service"
+SERVICE_NAME="$(basename "$SERVICE_FILE")"
+SERVICE_USER="quptime"
+SERVICE_GROUP="quptime"
+DATA_DIR="/etc/quptime"
+REPO_API="https://git.cer.sh/api/v1/repos/axodouble/quptime/releases/latest"
+RELEASE_BASE="https://git.cer.sh/axodouble/quptime/releases/download"
 fail() {
     echo "Error: $*" >&2
     exit 1
 }
-echo_cmd() {
-    echo -e "\033[90m> $1\033[0m"
-    eval "$1"
-}
-
 require_command() {
-    command -v "$1" > /dev/null 2>&1 || fail "$1 is not installed. Please install $1 and try again."
+    command -v "$1" >/dev/null 2>&1 || fail "$1 is not installed. Please install $1 and try again."
 }
 write_completion() {
@@ -31,52 +38,182 @@ write_completion() {
     return 1
 }
-require_command jq
 require_command curl
+require_command jq
+require_command sha256sum
+require_command install
+require_command mktemp
+
+# --- target architecture ------------------------------------------------
+case "$(uname -m)" in
+    x86_64) ARCH=amd64 ;;
+    aarch64|arm64) ARCH=arm64 ;;
+    *) fail "unsupported architecture: $(uname -m). Pre-built binaries are published for amd64 and arm64 only — build from source for other platforms." ;;
+esac
 if [ ! -w "$(dirname "$INSTALL_BIN")" ]; then
-    fail "You are not allowed to write to $(dirname "$INSTALL_BIN"). Run this script with sudo or install qu manually."
+    fail "Cannot write to $(dirname "$INSTALL_BIN"). Run this script with sudo, or set INSTALL_BIN to a writable location."
 fi
-RELEASE=$(curl -s https://git.cer.sh/api/v1/repos/axodouble/quptime/releases/latest | jq -r '.tag_name')
+# --- latest release tag -------------------------------------------------
+RELEASE=$(curl -fsSL "$REPO_API" | jq -r '.tag_name')
+[ -n "$RELEASE" ] && [ "$RELEASE" != "null" ] \
+    || fail "could not determine the latest release tag from $REPO_API"
-echo_cmd "curl -L -o '$INSTALL_BIN' 'https://git.cer.sh/axodouble/quptime/releases/download/${RELEASE}/qu-${RELEASE}-linux-amd64'"
-echo_cmd "chmod +x '$INSTALL_BIN'"
-echo "> qu has been installed to $INSTALL_BIN"
+BINARY_NAME="qu-${RELEASE}-linux-${ARCH}"
+BINARY_URL="${RELEASE_BASE}/${RELEASE}/${BINARY_NAME}"
+SUMS_URL="${RELEASE_BASE}/${RELEASE}/SHA256SUMS"
+# --- download + verify --------------------------------------------------
+# Stage in a temp dir so a failed verification never leaves a partial
+# or unverified binary on disk.
+TMPDIR=$(mktemp -d)
+trap 'rm -rf "$TMPDIR"' EXIT
+
+echo "> downloading $BINARY_NAME"
+curl -fsSL --proto '=https' --tlsv1.2 -o "$TMPDIR/$BINARY_NAME" "$BINARY_URL"
+echo "> downloading SHA256SUMS"
+curl -fsSL --proto '=https' --tlsv1.2 -o "$TMPDIR/SHA256SUMS" "$SUMS_URL"
+
+echo "> verifying checksum"
+# Pull just our binary's entry so sha256sum -c doesn't fail on the
+# arches we didn't download.
+(
+    cd "$TMPDIR"
+    if ! grep -E "[[:space:]]\\*?${BINARY_NAME}\$" SHA256SUMS > expected.sum; then
+        fail "no entry for $BINARY_NAME in published SHA256SUMS — refusing to install"
+    fi
+    if !
sha256sum -c expected.sum >/dev/null 2>&1; then
+        echo "expected: $(awk '{print $1}' expected.sum)"
+        echo "actual: $(sha256sum "$BINARY_NAME" | awk '{print $1}')"
+        fail "checksum mismatch for $BINARY_NAME — refusing to install"
+    fi
+)
+echo "> checksum OK"
+
+install -m 0755 "$TMPDIR/$BINARY_NAME" "$INSTALL_BIN"
+echo "> qu ${RELEASE} installed to $INSTALL_BIN"
+
+# --- shell completions --------------------------------------------------
 if "$INSTALL_BIN" --help 2>/dev/null | grep -q "completion"; then
     write_completion bash /usr/share/bash-completion/completions/qu \
-        || write_completion bash /etc/bash_completion.d/qu || true
-    write_completion zsh /usr/share/zsh/site-functions/_qu || true
-    write_completion fish /usr/share/fish/vendor_completions.d/qu.fish || true
+        || write_completion bash /etc/bash_completion.d/qu \
+        || true
+    write_completion zsh /usr/share/zsh/site-functions/_qu || true
+    write_completion fish /usr/share/fish/vendor_completions.d/qu.fish || true
 else
     echo "> qu does not expose completion support; skipping shell completion installation."
 fi
-if ! command -v systemctl > /dev/null 2>&1; then
-    echo "> Warning: systemd is not available on this system. qu serve will not be automatically started on boot."
-    echo "Installation complete, before starting qu serve, make sure to run qu init and read the documentation."
+# --- systemd unit -------------------------------------------------------
+if ! command -v systemctl >/dev/null 2>&1; then
+    echo
+    echo "> systemd is not available on this system. Installation stops here."
+    echo "> Run \`qu serve\` manually (or wire it into the supervisor of your choice)."
     exit 0
 fi
-echo "> Creating systemd service file for qu serve..."
-cat > "$SERVICE_FILE" </dev/null 2>&1; then
+    echo "> creating system user $SERVICE_USER"
+    useradd --system --no-create-home --shell /usr/sbin/nologin "$SERVICE_USER"
+fi
+
+install -d -o "$SERVICE_USER" -g "$SERVICE_GROUP" -m 0750 "$DATA_DIR"
+
+echo "> writing $SERVICE_FILE"
+cat > "$SERVICE_FILE" <<'EOF'
 [Unit]
-Description=QUptime Serve
-After=network.target
+Description=QUptime distributed uptime monitor
+Documentation=https://git.cer.sh/axodouble/quptime
+Wants=network-online.target
+After=network-online.target
 [Service]
-ExecStart=$INSTALL_BIN serve
+Type=simple
+ExecStart=/usr/local/bin/qu serve
 Restart=always
-User=$SERVICE_USER
-Group=$SERVICE_GROUP
+RestartSec=5s
+
+User=quptime
+Group=quptime
+
+# Where state lives. RuntimeDirectory creates /var/run/quptime/ each
+# boot owned by User:Group with mode 0750.
+Environment=QUPTIME_DIR=/etc/quptime
+RuntimeDirectory=quptime
+RuntimeDirectoryMode=0750
+ReadWritePaths=/etc/quptime /var/run/quptime
+
+# Hardening. Comment out individual directives if a probe needs
+# something we've revoked.
+NoNewPrivileges=true
+ProtectSystem=strict
+ProtectHome=true
+PrivateTmp=true
+PrivateDevices=true
+ProtectKernelTunables=true
+ProtectKernelModules=true
+ProtectControlGroups=true
+ProtectClock=true
+ProtectHostname=true
+RestrictNamespaces=true
+RestrictRealtime=true
+RestrictSUIDSGID=true
+LockPersonality=true
+MemoryDenyWriteExecute=true
+
+# Network access is required (we're a network monitor). Keep address
+# families minimal — AF_NETLINK is needed for some libc lookups.
+RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6 AF_NETLINK
+
+# If you need raw ICMP, *also* uncomment:
+# AmbientCapabilities=CAP_NET_RAW
+# CapabilityBoundingSet=CAP_NET_RAW
+# Otherwise drop all capabilities:
+CapabilityBoundingSet=
 [Install]
 WantedBy=multi-user.target
-EOL
+EOF
-echo_cmd "systemctl daemon-reload"
-echo_cmd "systemctl enable $(basename "$SERVICE_FILE")"
-echo "> qu serve service has been created and enabled.
You can start it with 'systemctl start $(basename "$SERVICE_FILE")'"
+systemctl daemon-reload
+systemctl enable "$SERVICE_NAME" >/dev/null
+echo "> ${SERVICE_NAME} installed and enabled (not yet started)"
-echo "Installation complete, before starting qu serve, make sure to run qu init and read the documentation."
+cat <:9901
+        # On follower nodes, also set the shared join secret:
+        # Environment=QUPTIME_CLUSTER_SECRET=
+
+     b) Or run \`qu init\` once explicitly:
+
+        sudo -u ${SERVICE_USER} QUPTIME_DIR=${DATA_DIR} \\
+          qu init --advertise :9901
+
+  2. Start the service:
+
+        sudo systemctl start ${SERVICE_NAME}
+        sudo -u ${SERVICE_USER} qu status
+
+  3. For ICMP checks, the daemon defaults to unprivileged UDP-mode
+     pings — those need the ping_group_range sysctl widened to include
+     the ${SERVICE_USER} GID, or grant CAP_NET_RAW in the unit. See
+     docs/deployment/systemd.md for the recipes.
+
+Full documentation: https://git.cer.sh/axodouble/quptime/src/branch/master/docs
+EOF