
Cold-Window Real-Time Analysis

The cold-window is a continuously-refreshing analysis surface for your skrills ecosystem. It re-reads authoritative state from disk on every tick (no warm cache shortcuts), runs a four-tier alert policy with hysteresis and min-dwell over the snapshot, ranks hints with a recency-weighted scorer, and surfaces external research findings on a pull-only basis.

Two render targets, both consuming the same WindowSnapshot artifact:

  • TUI: ratatui-based panes in skrills-dashboard::cold_window (alert pane, hint pane, research pane, status bar), mounted in a crossterm raw-mode loop. Run with --tui (v0.8.2).
  • Browser: HTML page and Server-Sent Events stream in skrills-server::api::cold_window. Run with --browser (v0.8.0).

Both consume the same bus, so they can run together.

Quick start

Run the TUI against the engine’s demo producer, right in your terminal:

skrills cold-window --tui

Quit with q or Ctrl-C. Press ? for contextual help. Prefer a browser? Run the SSE surface instead (or alongside):

skrills cold-window --browser --port 8888

Open http://localhost:8888/dashboard in any modern browser. Either surface renders four panes:

  • Status bar: tick cadence with adaptive label (tick: 2.0s [base], tick: 4.0s [load 0.78], tick: 1.0s [active edit]), token-budget progress with a colored bar (green → cyan → yellow → red), per-tier alert counts, optional research-quota remaining.
  • Alerts: 4-tier list (Warning / Caution / Advisory / Status) sorted tier-then-recency, with per-tier coloring. Alerts carry a hysteresis band so re-arming requires re-crossing the matching *_clear value.
  • Hints: ranked by the MultiSignalScorer formula (frequency × IMPACT_WEIGHT + impact × ACTIONABILITY_WEIGHT) / (ease + 1) × exp(-age_days / HALF_LIFE_DAYS). Pinned hints sort to the top regardless of score.
  • Research: pull-only side panel. Findings from GitHub, Hacker News, Lobsters, papers, and TRIZ analogies arrive asynchronously through the tome dispatcher. Empty by default. The dispatcher respects a token-bucket quota.
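The hint ranking can be sketched as a plain function plus a sort. This is a minimal sketch, not the MultiSignalScorer source. The three constants are hypothetical placeholders, because their real values are not given here, and Hint is an illustrative struct.

```rust
// Hypothetical constants: the real IMPACT_WEIGHT, ACTIONABILITY_WEIGHT and
// HALF_LIFE_DAYS values are not given in this chapter.
const IMPACT_WEIGHT: f64 = 1.0;
const ACTIONABILITY_WEIGHT: f64 = 1.0;
const HALF_LIFE_DAYS: f64 = 7.0;

/// Illustrative hint record (not a skrills type).
#[derive(Debug, Clone)]
struct Hint {
    name: &'static str,
    frequency: f64,
    impact: f64,
    ease: f64,
    age_days: f64,
    pinned: bool,
}

/// (frequency × IMPACT_WEIGHT + impact × ACTIONABILITY_WEIGHT) / (ease + 1)
///   × exp(-age_days / HALF_LIFE_DAYS)
fn score(h: &Hint) -> f64 {
    let signal = h.frequency * IMPACT_WEIGHT + h.impact * ACTIONABILITY_WEIGHT;
    signal / (h.ease + 1.0) * (-h.age_days / HALF_LIFE_DAYS).exp()
}

/// Pinned hints first (regardless of score), then by descending score.
fn rank(mut hints: Vec<Hint>) -> Vec<Hint> {
    hints.sort_by(|a, b| {
        b.pinned.cmp(&a.pinned).then_with(|| {
            score(b)
                .partial_cmp(&score(a))
                .unwrap_or(std::cmp::Ordering::Equal)
        })
    });
    hints
}

fn main() {
    let hints = vec![
        Hint { name: "stale-but-big", frequency: 9.0, impact: 9.0, ease: 0.0, age_days: 30.0, pinned: false },
        Hint { name: "fresh", frequency: 2.0, impact: 1.0, ease: 0.0, age_days: 0.0, pinned: false },
        Hint { name: "pinned", frequency: 0.0, impact: 0.0, ease: 5.0, age_days: 0.0, pinned: true },
    ];
    for h in rank(hints) {
        println!("{:<14} {:.3}", h.name, score(&h));
    }
}
```

The decay term exp(-age_days / HALF_LIFE_DAYS), as written, falls to 1/e rather than 1/2 when a hint is HALF_LIFE_DAYS old. A strict half-life would scale the exponent by ln 2.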

The TUI arranges those panes to fit the terminal, re-flowing live on resize:

  • Wide (≥ 80 columns): alerts over hints in a 60% left column, research filling the 40% right column.
  • Medium (60-79 columns): the same two-column layout with a slimmer research column so the alert and hint text keep their width.
  • Narrow (45-59 columns, e.g. a split pane): every pane stacks full-width top to bottom. A collapsed research pane shrinks to a fixed three-line badge so alerts and hints keep the room.
  • Compact (< 45 columns, e.g. a phone SSH session): only the focused pane renders, and Tab switches which pane is visible. Hiding panes beats squeezing them into unreadable slivers.

Below a hard floor of 20×6 (columns × rows), the panes give way to a one-line “terminal too small” guard. The status bar stays pinned to the bottom row in every tier, with the contextual key hints for the focused pane right-aligned on it. z zooms the focused pane to the full body at any tier: the escape hatch when one pane needs all the room.
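The width tiers and the hard floor reduce to a small pure function that can be re-evaluated on every resize event. This is a minimal sketch, assuming the 20×6 floor means columns × rows. LayoutTier and layout_tier are hypothetical names, not the skrills-dashboard API.

```rust
/// Layout tiers from the width list above, plus the hard floor.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LayoutTier {
    TooSmall, // one-line "terminal too small" guard
    Compact,  // < 45 cols: only the focused pane renders
    Narrow,   // 45-59 cols: panes stack full-width
    Medium,   // 60-79 cols: two columns, slimmer research column
    Wide,     // >= 80 cols: 60% left column, 40% research column
}

fn layout_tier(cols: u16, rows: u16) -> LayoutTier {
    if cols < 20 || rows < 6 {
        LayoutTier::TooSmall
    } else if cols >= 80 {
        LayoutTier::Wide
    } else if cols >= 60 {
        LayoutTier::Medium
    } else if cols >= 45 {
        LayoutTier::Narrow
    } else {
        LayoutTier::Compact
    }
}

fn main() {
    // Re-run on every resize event to re-flow the panes.
    for (cols, rows) in [(120, 40), (72, 24), (50, 30), (40, 20), (18, 10)] {
        println!("{cols}x{rows} -> {:?}", layout_tier(cols, rows));
    }
}
```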

Keybindings

The default surface is deliberately minimal: every data-rich or configuration view opens as a modal overlay (drill-down details, keybinding help), so depth never costs permanent screen space. The bottom hint line shows only the keys valid for the focused pane, and ? opens the full reference scoped to it.

Key | Scope | Action
q / Ctrl-C | global | Quit (q closes an open overlay first)
Esc | global | Close the topmost overlay; unzoom; no-op at base
? | global | Toggle the help overlay
: | global | Open the command palette (type to filter, Enter runs)
Tab / Shift-Tab | global | Cycle pane focus
Up/Down, j/k | global | Move the focused pane’s selection
Enter | global | Open detail for the selected item
z | global | Zoom the focused pane
A | alerts | Acknowledge all non-warning alerts
d | alerts | Dismiss the top warning
1-5 / 0 | hints | Filter by category / clear filter
P | hints | Pin the top hint
R | research | Expand or collapse the findings panel

Breaking change in 0.8.2: Esc no longer quits. It dismisses overlays (and zoom) the way it does in lazygit, gitui, and k9s. q and Ctrl-C remain the quit keys. Every action is reachable without CTRL/ALT modifiers, so the TUI stays usable from phone keyboards over SSH.
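The Esc, q, and Ctrl-C semantics amount to an overlay stack plus a zoom flag. The sketch below is a hypothetical model; UiState, Key, and Outcome are illustrative names, not skrills types. It encodes the key table: q and Esc both close the topmost overlay first, but only q falls through to quit, while Esc falls through to unzoom.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Key {
    Esc,
    Q,
    CtrlC,
}

#[derive(Debug, PartialEq, Eq)]
enum Outcome {
    ClosedOverlay,
    Unzoomed,
    Quit,
    NoOp,
}

/// Hypothetical UI state: a stack of open overlays plus a zoom flag.
#[derive(Default)]
struct UiState {
    overlays: Vec<&'static str>,
    zoomed: bool,
}

impl UiState {
    fn handle(&mut self, key: Key) -> Outcome {
        match key {
            // Ctrl-C always quits, even with overlays open.
            Key::CtrlC => Outcome::Quit,
            // q closes the topmost overlay first, then quits.
            Key::Q => match self.overlays.pop() {
                Some(_) => Outcome::ClosedOverlay,
                None => Outcome::Quit,
            },
            // Esc never quits: close overlay, else unzoom, else no-op.
            Key::Esc => {
                if self.overlays.pop().is_some() {
                    Outcome::ClosedOverlay
                } else if self.zoomed {
                    self.zoomed = false;
                    Outcome::Unzoomed
                } else {
                    Outcome::NoOp
                }
            }
        }
    }
}

fn main() {
    let mut ui = UiState { overlays: vec!["help"], zoomed: true };
    for key in [Key::Esc, Key::Esc, Key::Esc, Key::Q] {
        println!("{key:?} -> {:?}", ui.handle(key));
    }
}
```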

Ctrl-C exits cleanly within the 2-second shutdown budget. While the server drains, the browser receives a status event reading “reconnecting…”.

Design model and research basis

The TUI follows the “lazygit/gitui model”: a small fixed set of panes as the default surface, every data-rich or configurable view behind a modal overlay or drill-down, a one-line contextual hint bar for discoverability, and width-conditional layout collapse instead of a separate mobile build. gitui’s in-tree popup stack and atuin’s select-to-reveal inspector are the direct implementation references.

Three principles shape the restraint:

  • Details on demand. Shneiderman’s mantra (overview first, zoom and filter, then details) and dashboard surveys that name information overload as the dominant failure mode argue for a bounded default with depth behind interaction, not more panes.
  • Density without decoration. Terminal users self-select for speed, so meaning is carried by text and semantic color (it survives a color-stripped buffer), with no animation or fade. The surface repaints on state change, and ratatui’s buffer diffing makes the idle repaint floor a zero-write no-op rather than screen-reader spam.
  • Single-key reachability. On phones over SSH the keyboard layer, not width, is the real constraint, so every action is reachable without CTRL/ALT modifiers and the : palette bridges novice and expert use.

The design decisions and the alternatives weighed against them are recorded as TR-001 through TR-006 in docs/tradeoffs.md.

CLI flags

Flag | Default | Effect
--alert-budget <N> | 100000 | Token-budget ceiling. At 80% a Warning alert fires; at 100% the kill-switch engages.
--research-rate <N> | 10 | Tome dispatcher fetches per hour. The bucket persists across restarts at ~/.skrills/research-quota.json and refills pro-rata by elapsed time.
--port <N> | 8888 | Browser HTTP port (only with --browser).
--browser | off | Run the HTTP browser surface.
--tui | off | Render the live TUI in the current terminal (requires a TTY). Quit with q or Ctrl-C.
--no-bell | off | Suppress the terminal bell the TUI rings on a newly-fired WARNING alert.
--no-adaptive | off | Disable load-aware cadence; fix tick rate to base.
--tick-rate-ms <N> | 2000 | Override base tick rate.
--skill-dir <DIR> | (none) | Repeatable. Adds skill directories beyond the defaults.
--plugins-dir <DIR> | ./plugins | Plugins root whose <plugin>/health.toml files participate in each tick. Missing or unreadable directories yield an empty plugin set without error.
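The --research-rate quota behaves as a token bucket that refills pro-rata by elapsed time. This is a minimal sketch, assuming timestamps in seconds. QuotaState is a hypothetical struct whose two fields stand in for whatever ~/.skrills/research-quota.json actually stores; that schema is not documented here. Serialization is left out so the block runs with the standard library only.

```rust
/// Hypothetical persisted state; the real research-quota.json schema is not documented here.
#[derive(Debug, Clone, PartialEq)]
struct QuotaState {
    tokens: f64,
    last_refill_secs: f64,
}

/// Token bucket holding up to `per_hour` fetches, refilled pro-rata by elapsed time.
struct ResearchQuota {
    per_hour: f64,
    state: QuotaState,
}

impl ResearchQuota {
    fn new(per_hour: u32, now_secs: f64) -> Self {
        let per_hour = f64::from(per_hour);
        Self { per_hour, state: QuotaState { tokens: per_hour, last_refill_secs: now_secs } }
    }

    /// Rebuilds the bucket from state loaded after a restart.
    fn restore(per_hour: u32, state: QuotaState) -> Self {
        Self { per_hour: f64::from(per_hour), state }
    }

    /// Credits tokens for the time elapsed since the last refill, capped at capacity.
    fn refill(&mut self, now_secs: f64) {
        let elapsed = (now_secs - self.state.last_refill_secs).max(0.0);
        let credit = elapsed * self.per_hour / 3600.0;
        self.state.tokens = (self.state.tokens + credit).min(self.per_hour);
        self.state.last_refill_secs = now_secs;
    }

    /// Takes one fetch token if one is available.
    fn try_acquire(&mut self, now_secs: f64) -> bool {
        self.refill(now_secs);
        if self.state.tokens >= 1.0 {
            self.state.tokens -= 1.0;
            true
        } else {
            false
        }
    }

    fn remaining(&self) -> u32 {
        self.state.tokens.floor() as u32
    }
}

fn main() {
    let mut quota = ResearchQuota::new(10, 0.0);
    let granted = (0..12).filter(|_| quota.try_acquire(0.0)).count();
    println!("granted {granted} of 12 requests at t=0");
    // Simulate a restart: persist the state, rebuild, and credit the downtime.
    let saved = quota.state.clone();
    let mut restarted = ResearchQuota::restore(10, saved);
    restarted.refill(1800.0);
    println!("after 30 minutes down: {} tokens", restarted.remaining());
}
```

The refill is computed from the stored timestamp rather than from a running timer. A restart therefore just reloads the state, and the next refill credits the downtime, capped at the bucket size.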

Architecture

A single producer (ColdWindowEngine in skrills-analyze::cold_window) emits one Arc<WindowSnapshot> per tick on a bounded tokio::sync::broadcast channel. Both render targets subscribe to the same bus. Drift between them is structurally impossible because the artifact is the contract.

┌────────────────────────────────────────────────────────┐
│  ColdWindowEngine (skrills-analyze::cold_window)       │
│   tick(input) → Arc<WindowSnapshot>                    │
│                                                        │
│   ↳ FieldwiseDiff       (snapshot diff)                │
│   ↳ LayeredAlertPolicy  (4-tier and hysteresis)        │
│   ↳ DefaultHintScorer   (intelligence::MultiSignal)    │
│   ↳ LoadAwareCadence    (load-ratio backoff)           │
└────────────────────┬───────────────────────────────────┘
                     │
        ┌────────────▼─────────────┐
        │  SnapshotBus             │
        │  broadcast<Arc<Snap>>    │
        └────────────┬─────────────┘
   ┌─────────────────┼─────────────────┐
   ▼                 ▼                 ▼
TUI panes        SSE handler     Tome worker
(dashboard)      (server)        (quota-gated)

Resource bounds (R11 mitigation): the broadcast channel caps at 16 queued snapshots. Lagging subscribers drop and the SSE handler emits a status banner (“subscriber lagged by N ticks”) rather than blocking the producer. The activity ring caps at 100 entries with oldest-evict.
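The engine itself uses tokio::sync::broadcast. The standard-library sketch below only mimics its drop-oldest lag semantics so that it runs without dependencies. Bus, Subscriber, Recv, and lag_banner are hypothetical names, and Snapshot stands in for WindowSnapshot.

In the sketch, a publish never blocks. A subscriber that falls more than the capacity behind gets one Lagged(n) result and resumes at the oldest retained snapshot; that is the point where the SSE handler would emit its banner.

```rust
use std::collections::VecDeque;
use std::sync::Arc;

/// Stand-in for WindowSnapshot.
#[derive(Debug)]
struct Snapshot {
    tick: u64,
}

#[derive(Debug, PartialEq)]
enum Recv {
    Snapshot(u64),
    Lagged(u64),
    Empty,
}

/// Bounded, never-blocking fan-out: the oldest snapshot is evicted when full.
struct Bus {
    capacity: usize,
    next_seq: u64,
    queue: VecDeque<(u64, Arc<Snapshot>)>,
}

/// Each subscriber only tracks the sequence number it wants next.
struct Subscriber {
    next_seq: u64,
}

impl Bus {
    fn new(capacity: usize) -> Self {
        Self { capacity, next_seq: 0, queue: VecDeque::with_capacity(capacity) }
    }

    fn subscribe(&self) -> Subscriber {
        Subscriber { next_seq: self.next_seq }
    }

    fn publish(&mut self, snapshot: Snapshot) {
        if self.queue.len() == self.capacity {
            self.queue.pop_front(); // drop-oldest instead of blocking the producer
        }
        self.queue.push_back((self.next_seq, Arc::new(snapshot)));
        self.next_seq += 1;
    }

    fn recv(&self, sub: &mut Subscriber) -> Recv {
        let Some(oldest) = self.queue.front().map(|(seq, _)| *seq) else {
            return Recv::Empty;
        };
        if sub.next_seq < oldest {
            // Fell behind by more than the capacity: report once and skip ahead.
            let missed = oldest - sub.next_seq;
            sub.next_seq = oldest;
            return Recv::Lagged(missed);
        }
        let index = (sub.next_seq - oldest) as usize;
        match self.queue.get(index) {
            Some((_, snapshot)) => {
                sub.next_seq += 1;
                Recv::Snapshot(snapshot.tick)
            }
            None => Recv::Empty, // caught up
        }
    }
}

fn lag_banner(missed: u64) -> String {
    format!("subscriber lagged by {missed} ticks")
}

fn main() {
    let mut bus = Bus::new(16);
    let mut slow = bus.subscribe();
    for tick in 0..20 {
        bus.publish(Snapshot { tick });
    }
    loop {
        match bus.recv(&mut slow) {
            Recv::Lagged(n) => println!("{}", lag_banner(n)),
            Recv::Snapshot(tick) => println!("render tick {tick}"),
            Recv::Empty => break,
        }
    }
}
```

The activity ring's oldest-evict cap uses the same VecDeque pop_front/push_back pattern with a capacity of 100.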

Token thresholds

Defaults are research-backed:

  • 20K total tokens → Advisory (the Anthropic API quadratic-cost inflection identified in the Feb 2026 HN “Expensively Quadratic” analysis).
  • 50K total tokens → Caution (the range in Simon Willison’s “Too many MCPs” post).
  • 80% of --alert-budget → Warning.
  • 100% of --alert-budget → Warning, and the kill-switch engages (mutating sync operations refuse to run until the alert is master-acked).

All thresholds are configurable via builder methods on LayeredAlertPolicy if you embed the engine directly.
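The default thresholds can be expressed as a single classification function. This is a minimal sketch with several assumptions:

  • Thresholds and classify are hypothetical; they are not the LayeredAlertPolicy builder API.
  • Comparisons use >=.
  • Hysteresis and min-dwell are left out.

```rust
/// Token-driven tiers only; Status is omitted because this sketch models token thresholds alone.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tier {
    Advisory,
    Caution,
    Warning,
}

/// Hypothetical bundle of the documented defaults (not the LayeredAlertPolicy API).
struct Thresholds {
    advisory: u64,
    caution: u64,
    budget: u64,
}

impl Default for Thresholds {
    fn default() -> Self {
        Self { advisory: 20_000, caution: 50_000, budget: 100_000 }
    }
}

/// Returns the highest tier crossed (None below Advisory) and whether the
/// kill-switch is engaged. Comparisons use >= (an assumption).
fn classify(total_tokens: u64, t: &Thresholds) -> (Option<Tier>, bool) {
    let kill_switch = total_tokens >= t.budget;
    // 80% of the budget in integer arithmetic: tokens / budget >= 4/5.
    let tier = if total_tokens * 5 >= t.budget * 4 {
        Some(Tier::Warning)
    } else if total_tokens >= t.caution {
        Some(Tier::Caution)
    } else if total_tokens >= t.advisory {
        Some(Tier::Advisory)
    } else {
        None
    };
    (tier, kill_switch)
}

fn main() {
    let t = Thresholds::default();
    for tokens in [12_000, 20_000, 55_000, 80_000, 100_000] {
        println!("{tokens:>7} tokens -> {:?}", classify(tokens, &t));
    }
}
```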

Browser security posture

Two layers of XSS defense:

  1. The server html_escapes every user-derived string before it lands in a fragment.
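  2. The browser swap path uses DOMParser and replaceChildren. DOMParser parses <script> tags into nodes that do not execute when later attached to the document, so even if Layer 1 ever regresses, an injected <script> payload can’t run. This layer does not neutralize inline event-handler attributes such as onerror, which is why Layer 1 still matters.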
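Layer 1 can be illustrated with a hand-rolled escaper covering the five characters that matter in HTML text and quoted attributes. This is a sketch, not the server's code. The real server may use an escaping crate, and the alert_fragment markup is hypothetical.

```rust
/// Escapes text for safe inclusion in HTML element content or quoted attributes.
fn html_escape(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#x27;"),
            _ => out.push(c),
        }
    }
    out
}

/// Hypothetical fragment builder: every user-derived string passes through html_escape.
fn alert_fragment(message: &str) -> String {
    format!("<li class=\"alert\">{}</li>", html_escape(message))
}

fn main() {
    // An onerror payload becomes inert text once its angle brackets are escaped.
    println!("{}", alert_fragment("<img src=x onerror=alert(1)>"));
}
```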

When TLS is configured (axum-server and rustls), ALPN advertises h2. Because HTTP/2 multiplexes streams over a single connection, multiple browser tabs on the same origin all stay subscribed instead of running into the six-connections-per-host limit that browsers apply to HTTP/1.1.

Plugin participation (FR11)

Third-party skrills plugins opt into the cold-window by shipping a health.toml file alongside their .claude-plugin/plugin.json. Each tick the engine cold-walks the configured plugins root (--plugins-dir, default ./plugins) and parses every <plugin>/health.toml it finds. Schema:

plugin_name = "my-plugin"   # optional, defaults to directory name
overall = "ok"              # ok | warn | error | unknown

[[checks]]
name = "smoke"
status = "ok"
message = "all systems nominal"  # optional

[[checks]]
name = "deps"
status = "warn"

Plugins without a health.toml are silently excluded. A missing file is the opt-out signal, not an error. Malformed health.toml files (parse error, unknown status string) trigger a deterministic Caution-tier alert with a stable fingerprint (plugin-health-malformed::<plugin>) and exclude the plugin from the snapshot until the file is fixed (spec EC5). Hysteresis and min-dwell are skipped for these alerts because user configuration errors need immediate visibility.
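The opt-out and malformed-file rules can be sketched over already-parsed fields. This is a hypothetical model with these assumptions:

  • TOML parsing (for example with the toml crate) is omitted, and RawHealth holds the extracted strings.
  • The fingerprint uses the directory name, since a malformed file may not yield a usable plugin_name.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum HealthStatus {
    Ok,
    Warn,
    Error,
    Unknown,
}

impl FromStr for HealthStatus {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "ok" => Ok(HealthStatus::Ok),
            "warn" => Ok(HealthStatus::Warn),
            "error" => Ok(HealthStatus::Error),
            "unknown" => Ok(HealthStatus::Unknown),
            other => Err(format!("unknown status string {other:?}")),
        }
    }
}

/// Fields already extracted from a health.toml file (parsing itself omitted).
struct RawHealth<'a> {
    plugin_name: Option<&'a str>,
    overall: &'a str,
    check_statuses: Vec<&'a str>,
}

#[derive(Debug, PartialEq)]
enum PluginOutcome {
    /// No health.toml: silent opt-out, no alert.
    Excluded,
    /// Unknown status string: Caution alert, plugin left out of the snapshot.
    Malformed { fingerprint: String },
    /// Valid file: the plugin participates in this tick.
    Included { name: String, overall: HealthStatus },
}

fn evaluate(dir_name: &str, raw: Option<&RawHealth<'_>>) -> PluginOutcome {
    let Some(raw) = raw else {
        return PluginOutcome::Excluded;
    };
    let checks_valid = raw
        .check_statuses
        .iter()
        .all(|s| s.parse::<HealthStatus>().is_ok());
    match raw.overall.parse::<HealthStatus>() {
        Ok(overall) if checks_valid => PluginOutcome::Included {
            // plugin_name is optional and defaults to the directory name.
            name: raw.plugin_name.unwrap_or(dir_name).to_string(),
            overall,
        },
        _ => PluginOutcome::Malformed {
            fingerprint: format!("plugin-health-malformed::{dir_name}"),
        },
    }
}

fn main() {
    let good = RawHealth { plugin_name: None, overall: "ok", check_statuses: vec!["ok", "warn"] };
    let bad = RawHealth { plugin_name: Some("other"), overall: "green", check_statuses: vec![] };
    println!("{:?}", evaluate("my-plugin", Some(&good)));
    println!("{:?}", evaluate("broken-plugin", Some(&bad)));
    println!("{:?}", evaluate("quiet-plugin", None));
}
```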

Prior-art validation

The cold-window’s design draws explicitly from mature reference implementations.

Pattern | Reference | Skrills’ choice
Single-snapshot fan-out to TUI and browser | ccboard, vector top, Glances | Arc<WindowSnapshot> over a bounded broadcast channel; both surfaces are pure renderers.
Cold rewalk every tick | Prometheus file_sd, fluent-bit in_tail | Full filesystem walk per tick within the SC1 200 ms p99 budget; no warm cache.
Tick rate vs frame rate separation | ratatui async-template | Adaptive cadence (state advance) is decoupled from SSE keep-alive (redraw).
Hysteresis, min-dwell, and tier filtering | Prometheus Alertmanager aggrGroup, ISA-18.2 alarm management | 4-tier model with hysteresis clear ratio 0.95 and min-dwell 2 ticks.
Token-bucket quota with restart-resilient persistence | governor, Sensu dedup-key-template | AlertManager-style research dispatcher with quota persisted at ~/.skrills/research-quota.json.
Defense-in-depth XSS posture | axum-htmx | Server html_escape, then browser DOMParser and replaceChildren.
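The hysteresis and min-dwell parameters from the table (clear ratio 0.95, min-dwell 2 ticks) can be sketched as a per-fingerprint state machine. The sketch makes two assumptions:

  • Min-dwell acts as an on-delay: the trigger must stay crossed for two consecutive ticks before the alert fires.
  • The *_clear value is trigger × 0.95.

HysteresisAlert is a hypothetical name, not the LayeredAlertPolicy implementation.

```rust
/// Hypothetical single-fingerprint alert with hysteresis and an on-delay.
struct HysteresisAlert {
    trigger: f64,
    clear_ratio: f64, // *_clear value = trigger × clear_ratio
    min_dwell: u32,   // consecutive ticks above trigger before firing
    above_for: u32,
    active: bool,
}

impl HysteresisAlert {
    fn new(trigger: f64) -> Self {
        Self { trigger, clear_ratio: 0.95, min_dwell: 2, above_for: 0, active: false }
    }

    /// Feeds one tick's value; returns whether the alert is active afterwards.
    fn tick(&mut self, value: f64) -> bool {
        if self.active {
            // Stay active inside the band; clear only below the *_clear value.
            if value < self.trigger * self.clear_ratio {
                self.active = false;
                self.above_for = 0;
            }
        } else if value >= self.trigger {
            self.above_for += 1;
            self.active = self.above_for >= self.min_dwell;
        } else {
            // Dropping below the trigger resets the dwell counter.
            self.above_for = 0;
        }
        self.active
    }
}

fn main() {
    let mut caution = HysteresisAlert::new(50_000.0);
    for tokens in [50_500.0, 51_000.0, 48_000.0, 47_000.0, 50_500.0] {
        println!("{tokens:>8.0} tokens -> active = {}", caution.tick(tokens));
    }
}
```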

The user-pain quotes that anchor the threshold defaults (20K Advisory, 50K Caution) come from the “Expensively Quadratic” HN thread and Simon Willison’s “Too many MCPs” post. Geoffrey Huntley measured that the GitHub MCP alone “swallows another 55,000 of those valuable tokens”, which shows a single server can cross the 50K Caution tier on its own.

Known caveats

  • 80% Warning vs Anthropic’s 83.5% auto-compact: skrills fires Warning at 80% of --alert-budget, slightly ahead of Claude Code’s auto-compact trigger. Community evidence (anthropics/claude-code#28728, #46695) suggests 75% may be safer for sessions you intend to compact. v0.9.0 is expected to make this configurable per tier.
  • Kill-switch override: there is no “ignore the kill-switch” flag in v0.8.0. If you hit 100%, raise --alert-budget and restart. This matches the better-safe-than-sorry posture of cockpit Warning alerts in FAA AC 25.1322-1. If it proves too restrictive in practice, we may add an opt-in --allow-budget-override.
  • SSE shutdown semantics: the browser surface merges a shutdown notify into the SSE response stream so Ctrl-C returns within the 2 s budget. Without the merge, a pending broadcast-await would block graceful shutdown indefinitely (axum #2673, hyper #2787). Future maintainers: do not “simplify” by removing the merge.

Dogfooding the surfaces

A dogfood-all Make target exercises every cold-window surface end-to-end against real fixtures. Run it when verifying a release candidate or after touching the engine, browser, or shutdown code. Each surface also has its own target:

make dogfood-cold-window-headless   # engine ticks 3 s, expects clean SIGTERM
make dogfood-cold-window-chaos      # --no-adaptive and budget=1, kill-switch path
make dogfood-cold-window-browser    # HTML/SSE parity and 2 s shutdown budget
make dogfood-tui                    # tui TTY-or-graceful-refusal contract
make dogfood-dashboard              # dashboard TTY-or-graceful-refusal contract
make dogfood-skill-diff             # skill-diff --format json round-trips
make dogfood-all                    # everything above and the original dogfood

The browser target is the load-bearing one: it boots cold-window --browser on :18888, asserts the rendered HTML declares EventSource listeners for the four canonical event names (alert, hint, research, status), then opens a 2 s curl -N against /dashboard.sse and confirms the engine emits matching event: lines plus at least four data: payloads. After that it sends SIGTERM and asserts the process exits inside the 2 s graceful-shutdown budget. The contract being tested is “the HTML page’s listener set equals the SSE endpoint’s emitter set”, the same parity guarantee that the in-tree integration test crates/server/tests/cold_window_parity.rs verifies via the broadcast bus directly. Together they cover both the Rust-internal path and the externally-observable HTTP contract.
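The parity assertion that the browser target makes can be sketched in Rust. It collects the event names the page listens for, collects the event: lines from the SSE body, and compares the two sets. The sketch assumes the page registers listeners with addEventListener("name", …) in single or double quotes. The page's actual script is not shown here, so that extraction pattern is an assumption, and html_listeners, sse_events, and data_lines are hypothetical helpers.

```rust
use std::collections::BTreeSet;

/// Event names passed to addEventListener("name", ...) or ('name', ...).
fn html_listeners(html: &str) -> BTreeSet<String> {
    const NEEDLE: &str = "addEventListener(";
    let mut names = BTreeSet::new();
    let mut rest = html;
    while let Some(pos) = rest.find(NEEDLE) {
        rest = &rest[pos + NEEDLE.len()..];
        // Only string-literal first arguments are recognized.
        if let Some(quote @ ('"' | '\'')) = rest.chars().next() {
            let body = &rest[1..];
            if let Some(end) = body.find(quote) {
                names.insert(body[..end].to_string());
            }
        }
    }
    names
}

/// Event names from `event: name` lines in an SSE response body.
fn sse_events(body: &str) -> BTreeSet<String> {
    body.lines()
        .filter_map(|line| line.strip_prefix("event:"))
        .map(|name| name.trim().to_string())
        .collect()
}

/// Number of `data:` lines, for the "at least four payloads" check.
fn data_lines(body: &str) -> usize {
    body.lines().filter(|line| line.starts_with("data:")).count()
}

fn main() {
    let html = r#"<script>
const es = new EventSource("/dashboard.sse");
es.addEventListener("alert", render);
es.addEventListener("hint", render);
es.addEventListener('research', render);
es.addEventListener('status', render);
</script>"#;
    let body = "event: alert\ndata: {}\n\nevent: hint\ndata: {}\n\n\
                event: research\ndata: {}\n\nevent: status\ndata: {}\n\n";
    let parity = html_listeners(html) == sse_events(body);
    println!("parity: {parity}, data lines: {}", data_lines(body));
}
```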

The tui and dashboard targets validate the TTY-guard contract. Under a real terminal, the process renders until the 3 s timeout fires (rc 124 / 143). Under a non-TTY environment (CI, redirected stdio), the process exits 1 with a clear “requires a TTY” message rather than crashing on a termios syscall against /dev/null. Both surfaces use the same guard pattern (crates/server/src/tui.rs:20-22 and crates/server/src/app/dispatcher.rs:417-419).
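A TTY guard of this shape can be sketched with std::io::IsTerminal, which has been stable since Rust 1.70. This is an illustration, not the code at crates/server/src/tui.rs:20-22. It assumes the guard checks both stdin and stdout, and the message wording is hypothetical. Splitting the decision into a pure function keeps the contract testable without a terminal.

```rust
use std::io::IsTerminal;
use std::process::ExitCode;

/// Pure decision so the contract can be tested without a terminal.
fn tty_guard(stdin_is_tty: bool, stdout_is_tty: bool) -> Result<(), String> {
    if stdin_is_tty && stdout_is_tty {
        Ok(())
    } else {
        // Hypothetical wording; the real message text is not quoted here.
        Err("cold-window --tui requires a TTY".to_string())
    }
}

fn main() -> ExitCode {
    let verdict = tty_guard(std::io::stdin().is_terminal(), std::io::stdout().is_terminal());
    match verdict {
        Ok(()) => {
            // A real surface would enter raw mode and start rendering here.
            println!("TTY detected");
            ExitCode::SUCCESS
        }
        Err(message) => {
            // Refuse before any termios call instead of failing against /dev/null.
            eprintln!("{message}");
            ExitCode::from(1)
        }
    }
}
```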

Hint patterns (ISA-18.2 inspired)

The hint scorer surfaces these operational patterns when it detects the matching signal in the snapshot:

  1. Hysteresis flapping: if a Caution fires within min-dwell of the previous Caution on the same fingerprint, suggest raising the hysteresis floor by 5% (the signal is oscillating near the trigger boundary).
  2. Research quota storm: if the research dispatcher drains more than 80% of its hourly bucket in under the group interval, suggest widening the fetch interval or adding an inhibition rule for low-tier alerts.

  3. Chattering Warning: if a Warning resolves and re-fires within the repeat interval, suggest adding a dead-band or shelving per ISA-18.2.
  4. Cascade suppression: if an Emergency alert fires while a Critical on the same fingerprint is unacked, surface a keystroke hint to master-ack the superseded Critical.
  5. Span-of-control overload: if the hint pane shows > 7 active hints simultaneously, suggest shelving advisories or raising the Caution floor (ISA-18.2 §6.4 operator limit).
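Three of these detectors reduce to simple checks. This is a hypothetical sketch:

  • The function and struct names are illustrative.
  • "Within min-dwell" is read as at most 2 ticks apart.
  • The group interval is a parameter, because its value is not documented here.
  • "Drains more than 80%" is checked with integer arithmetic.

```rust
use std::collections::HashMap;

// Limit from pattern 5 (ISA-18.2 §6.4 as cited above).
const SPAN_OF_CONTROL_LIMIT: usize = 7;
// Min-dwell of 2 ticks, from the prior-art table.
const MIN_DWELL_TICKS: u64 = 2;

/// Pattern 5: more than seven active hints at once.
fn span_of_control_overload(active_hints: usize) -> bool {
    active_hints > SPAN_OF_CONTROL_LIMIT
}

/// Pattern 2: more than 80% of the hourly bucket spent in under the group interval.
fn quota_storm(spent: u32, bucket: u32, elapsed_secs: u64, group_interval_secs: u64) -> bool {
    elapsed_secs < group_interval_secs && spent * 5 > bucket * 4
}

/// Pattern 1: remembers the last tick a Caution fired per fingerprint.
#[derive(Default)]
struct FlapDetector {
    last_caution: HashMap<String, u64>,
}

impl FlapDetector {
    /// Records a Caution firing; returns true if it re-fired within min-dwell.
    fn record_caution(&mut self, fingerprint: &str, tick: u64) -> bool {
        let flapping = self
            .last_caution
            .get(fingerprint)
            .map_or(false, |&prev| tick.saturating_sub(prev) <= MIN_DWELL_TICKS);
        self.last_caution.insert(fingerprint.to_string(), tick);
        flapping
    }
}

fn main() {
    let mut flaps = FlapDetector::default();
    for tick in [10, 12, 20] {
        println!("caution at tick {tick}: flapping = {}", flaps.record_caution("tokens-caution", tick));
    }
    println!("8 hints overload: {}", span_of_control_overload(8));
    println!("9/10 fetches in 60 s storm: {}", quota_storm(9, 10, 60, 300));
}
```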

Roadmap

  • Production tick producer using analyze::tokens::count_tokens_attributed against real discovery output (replaces the demo producer).
  • Per-tier configurable thresholds (community evidence supports 75 % Warning, deferred to v0.9.0).
  • Clippy-style Applicability axis for hints (MachineApplicable / MaybeIncorrect / HasPlaceholders / Unspecified) orthogonal to severity (rust-clippy precedent).
  • ISA-18.2 ack state machine for master-acknowledge (Normal → Unack → Ack → RTNUnack, plus Shelved / Suppressed / OOS).
  • gRPC service surface for external clients. The wire-format crate skrills-snapshot is already designed proto-friendly per the brief.

Reference