Heartbeats

Why heartbeats exist

Stimulus channels describe what happened. They cannot describe what didn't.

The promoter who didn't check in by 09:15. The KYC that didn't get submitted before the campaign started. The brand approval sitting for three days. The invoice now overdue. The ZATCA stamp that didn't come back from the gateway in 60 seconds.

Absence-as-signal is invisible to incoming stimuli. To detect it, the engine itself must emit a signal — on a schedule — when an expected stimulus fails to arrive.

That signal is the heartbeat.

Heartbeats are not standalone. They are the pulse of a specific thread.

Every step of every live thread schedules at least one heartbeat at step entry.

Thread pulse, not standalone stream

Earlier framings of ContextOS positioned heartbeats as one of three parallel streams. That framing is preserved as a useful mental model — absence is one of the three roles a stimulus plays. But mechanically, a heartbeat is the pulse of a single thread step, not a top-level stream.

This matters because heartbeats are scoped. They belong to one thread. They check that thread's chain. They consult that thread's stimulus log. They suppress, fire, or reschedule based on the state of that one thread. The engine fires thousands of heartbeats per minute across thousands of threads, and each one is a tightly scoped pulse, not a global poll.

The mechanism

When a thread step is entered, the engine immediately schedules that step's heartbeats for their future firing times. Each one is a durable wake-up, written once, so per-step heartbeats need no polling and no recurring scans. On firing, the heartbeat handler checks the thread's state and decides one of three things:

Suppress — the expected stimulus arrived; the world resolved itself in time. Log only.
Reschedule — an expected stimulus is in flight (e.g. a user CTA was pressed but the resulting event hasn't arrived yet); give it more time.
Fire — the expected stimulus did not arrive and is not in flight. Emit a signal, route through the engine, possibly summon a human.

This is durable, idempotent, and cheap. It is also the difference between a logging system and an operating engine.
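
As a minimal sketch of scheduling at step entry, the Python below writes each wake-up once and then only asks what is due. The `WakeUp`, `InMemoryScheduler`, and `enter_step` names are hypothetical, and the in-memory heap is only a stand-in for a durable timer store, which this sketch does not implement.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
import heapq


@dataclass(order=True)
class WakeUp:
    # Only fire_at takes part in ordering, so the heap sorts by firing time.
    fire_at: datetime
    thread_id: str = field(compare=False)
    step: str = field(compare=False)
    kind: str = field(compare=False)


class InMemoryScheduler:
    """Hypothetical stand-in for a durable store; a real one would persist rows."""

    def __init__(self) -> None:
        self._heap: list[WakeUp] = []

    def schedule(self, wake_up: WakeUp) -> None:
        heapq.heappush(self._heap, wake_up)

    def pop_due(self, now: datetime) -> list[WakeUp]:
        # Return every wake-up whose firing time has passed, earliest first.
        due = []
        while self._heap and self._heap[0].fire_at <= now:
            due.append(heapq.heappop(self._heap))
        return due


def enter_step(scheduler: InMemoryScheduler, thread_id: str, step: str,
               expected_at: datetime) -> None:
    # Written once at step entry; nothing re-scans the thread afterwards.
    scheduler.schedule(WakeUp(expected_at - timedelta(minutes=30),
                              thread_id, step, "shift.start_in_30min"))
    scheduler.schedule(WakeUp(expected_at, thread_id, step,
                              "attendance.expected_by"))
```

Calling `enter_step` once and then `pop_due(now)` at each tick returns only the wake-ups whose time has come; the handler's suppress, reschedule, or fire decision would run on each returned item.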

The suppression decision matrix

The heartbeat handler looks at two evidence streams: the thread's event chain (what facts have happened) and the thread's stimulus log (what intents are in flight). Together they answer one question: is the human doing their bit on time, or are they actually silent?

Evidence found	Decision
Expected terminal event present in chain	Suppress. Terminal reached.
Stimulus in-flight (e.g. user CTA pressed in last 60s, no resulting event yet)	Suppress + reschedule. Give it more time.
Stimulus shows explicit rejection (e.g. user pressed Decline)	Branch. No nudge; thread takes alt path.
Stimulus shows offline-queued (sync pending)	Suppress + extended reschedule.
No relevant stimulus, no event	Fire. Human is silent; nudge.
Already nudged, recipient opened notification, no follow-up CTA	Fire as escalation. Try alt channel or person.

This is what makes heartbeats informed nudgers, not blind timers. A blind timer fires whether you're in the middle of pressing the button or genuinely absent. An informed pulse knows the difference.
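
The matrix above can be read as a pure function from evidence to decision. The sketch below is one possible encoding, not the engine's actual handler: the `Evidence` fields, the `Decision` names, and especially the precedence order when several kinds of evidence are present at once are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    SUPPRESS = "suppress"
    RESCHEDULE = "suppress_and_reschedule"
    EXTENDED_RESCHEDULE = "suppress_and_extended_reschedule"
    BRANCH = "branch"
    FIRE = "fire"
    ESCALATE = "fire_as_escalation"


@dataclass
class Evidence:
    # Hypothetical flags, one per row of the matrix.
    terminal_event_in_chain: bool = False
    explicit_rejection: bool = False
    offline_queued: bool = False
    stimulus_in_flight: bool = False
    nudged_and_opened_without_cta: bool = False


def decide(e: Evidence) -> Decision:
    # Assumed precedence: facts in the chain win over intents in the log.
    if e.terminal_event_in_chain:
        return Decision.SUPPRESS
    if e.explicit_rejection:
        return Decision.BRANCH
    if e.offline_queued:
        return Decision.EXTENDED_RESCHEDULE
    if e.stimulus_in_flight:
        return Decision.RESCHEDULE
    if e.nudged_and_opened_without_cta:
        return Decision.ESCALATE
    # No relevant stimulus and no event: the human is silent.
    return Decision.FIRE
```

Keeping the decision free of I/O means each row of the matrix can be unit-tested without a scheduler or an event log.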

The heartbeat envelope

Heartbeats use the same canonical event envelope as any other stimulus. They carry the thread they belong to, the type of expectation they enforce, and a recommended AI action with a confidence score.

{
  "event_id": "evt_hb_01HW...",
  "event_type": "heartbeat.attendance.overdue_5min",
  "stimulus_channel": "scheduled",
  "thread_id": "thread.shift.sh_abc123",
  "thread_step": "step_awaiting_check_in",
  "thread_role": "heartbeat",

  "occurred_at": "2026-05-01T05:05:00Z",
  "correlation_id": "cor_abc123",
  "causation_id": "evt_roster_xyz",
  "actor_type": "system",

  "payload": {
    "expected_event_type": "attendance.checked_in",
    "expected_by": "2026-05-01T05:00:00Z",
    "actual_minutes_overdue": 5,
    "policy_ref": "policy_uae_lateness_default",
    "suppression_check": {
      "checked_at": "2026-05-01T05:05:00Z",
      "found_terminal_event": false,
      "found_in_flight_stimulus": false,
      "decision": "fire"
    },
    "ai_action_recommended": {
      "type": "notify_persona",
      "personas": ["promoter", "store_manager"],
      "channel_strategy": "preferred_per_user",
      "confidence": 0.92
    }
  }
}

Heartbeats are structured data, not message strings. Notifications are derived from this payload at dispatch time, in the recipient's locale and within their channel constraints. Because the audit trail captures the structured signal, template changes don't break the record.
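
To illustrate deriving a message at dispatch time, the sketch below renders text from the envelope's `event_type` and `payload`. The `TEMPLATES` table, the `render_notification` function, the fallback to English, and the use of a character limit as the channel constraint are all hypothetical choices made for this example.

```python
# Hypothetical template registry keyed by (event_type, locale).
TEMPLATES = {
    ("heartbeat.attendance.overdue_5min", "en"):
        "Check-in is {minutes} minutes overdue.",
    ("heartbeat.attendance.overdue_5min", "fr"):
        "Le pointage a {minutes} minutes de retard.",
}


def render_notification(event: dict, locale: str, max_chars: int) -> str:
    """Build message text from the structured heartbeat, never the reverse."""
    key = (event["event_type"], locale)
    # Assumed behaviour: fall back to English when a locale has no template.
    template = TEMPLATES.get(key) or TEMPLATES[(event["event_type"], "en")]
    text = template.format(minutes=event["payload"]["actual_minutes_overdue"])
    # Assumed channel constraint: a simple length cap (for example, SMS).
    return text[:max_chars]
```

Editing a template changes only future messages; the stored heartbeat still carries `actual_minutes_overdue` and the other structured fields unchanged.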

Policy as declarative config

Which heartbeats fire, when, and what they do is config, not code. Policy lives next to the thread step that triggers it. It is editable per tenant and per jurisdiction, versioned and audited.

thread: shift.execution
step: awaiting_check_in
heartbeats:
  - id: shift.start_in_30min
    fire_at: "{shift.expected_at} - 30min"
    suppress_if: terminal_event in chain
    on_fire: ai_action.notify_promoter

  - id: attendance.expected_by
    fire_at: "{shift.expected_at}"
    suppress_if: attendance.checked_in in chain
    on_fire: ai_action.evaluate_chain

Each thread step that has a downstream expectation declares its heartbeats this way. New jurisdictions get a config diff, not a code change.
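
A `fire_at` expression such as the ones in the policy above has to be resolved to a concrete time when the step is entered. The sketch below assumes a very small grammar (a `{reference}` with an optional `+` or `-` offset in minutes), which is inferred only from the two examples shown; the `resolve_fire_at` name and the `context` mapping are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed grammar: "{some.ref}" optionally followed by "+ Nmin" or "- Nmin".
FIRE_AT = re.compile(
    r"^\{(?P<ref>[\w.]+)\}(?:\s*(?P<sign>[+-])\s*(?P<minutes>\d+)min)?$"
)


def resolve_fire_at(expr: str, context: dict[str, datetime]) -> datetime:
    """Turn a policy expression into an absolute firing time."""
    m = FIRE_AT.match(expr.strip())
    if m is None:
        raise ValueError(f"unsupported fire_at expression: {expr!r}")
    base = context[m["ref"]]
    if m["minutes"] is None:
        return base  # no offset, fire exactly at the referenced time
    offset = timedelta(minutes=int(m["minutes"]))
    return base - offset if m["sign"] == "-" else base + offset
```

Rejecting anything outside the grammar keeps a mistyped policy from silently scheduling a heartbeat at the wrong time.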

Idempotent on firing

Each heartbeat carries an idempotency key:

idem_key = hash(thread_id + heartbeat_kind + scheduled_at)

Retries are safe. The same heartbeat firing twice is a no-op the second time. When a thread reaches its terminal event, all pending downstream heartbeats are marked superseded in the scheduler. They don't get cancelled (audit trail preserved); they just suppress on fire.
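
A minimal sketch of the key and of the "no-op on retry, suppress if superseded" behaviour follows. The source specifies only a hash over the three fields; the choice of SHA-256, the separator between fields, and the in-memory `FiringLog` class are assumptions for illustration.

```python
import hashlib


def idem_key(thread_id: str, heartbeat_kind: str, scheduled_at: str) -> str:
    # A separator is assumed here so ("ab", "c") and ("a", "bc") cannot
    # produce the same input to the hash.
    raw = "\x1f".join((thread_id, heartbeat_kind, scheduled_at))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class FiringLog:
    """Hypothetical in-memory record of fired and superseded heartbeats."""

    def __init__(self) -> None:
        self._fired: set[str] = set()
        self._superseded: set[str] = set()

    def mark_superseded(self, key: str) -> None:
        # The wake-up is kept, not cancelled, so the audit trail survives.
        self._superseded.add(key)

    def try_fire(self, key: str) -> str:
        if key in self._superseded:
            return "superseded"  # suppress on fire
        if key in self._fired:
            return "duplicate"   # a retry is a no-op
        self._fired.add(key)
        return "fired"
```

A durable implementation would likely enforce the same rule with a uniqueness constraint on the key rather than an in-process set.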

Sweep heartbeats

Most heartbeats are scoped to a single thread step. A small number of sweep heartbeats run on cron and scan across many threads at once. They detect patterns no single thread would notice.

Dead chains daily — threads with no event in 7+ days, no terminal. Surface as stuck.
Ghost intents hourly — user CTAs past SLA without resulting events. Engineering signal.
Dead buttons monthly — CTAs in the registry never pressed in 30+ days. UX hygiene.
Notification unconverted daily — nudges that opened but didn't convert to action. Channel preference signal.
AI override streak daily — users repeatedly rejecting same AI suggestion. Model retraining signal.
Confidence calibration weekly — actual outcomes vs predicted confidence. Threshold tuning.

Sweep heartbeats are also the engine's safety net. Even if a thread's per-step heartbeats are misconfigured, the dead-chain sweep will catch the stuck thread before it disappears from view entirely.
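
The dead-chain sweep described above reduces to a filter over thread summaries. The sketch below assumes a hypothetical `ThreadSummary` record holding the last event time and a terminal flag; how those summaries are produced from the event log is out of scope here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class ThreadSummary:
    # Hypothetical projection of a thread's chain, one row per thread.
    thread_id: str
    last_event_at: datetime
    has_terminal_event: bool


def find_dead_chains(threads: list[ThreadSummary], now: datetime,
                     max_silence: timedelta = timedelta(days=7)) -> list[str]:
    """Return ids of threads with no terminal event and 7+ days of silence."""
    return [
        t.thread_id
        for t in threads
        if not t.has_terminal_event and now - t.last_event_at >= max_silence
    ]
```

Because the check depends only on the chain, it still works for threads whose per-step heartbeat policy was never declared correctly.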

Substrate options

The heartbeat scheduler is the engine's most-stressed component. Substrate choice is about scale, not semantics.

Postgres + pg_cron — simplest. Works to ~10M scheduled wake-ups. Best starting point.
Temporal.io — purpose-built for durable timers. Scales horizontally. Recommended for high-volume tenants.
EventBridge / Cloud Scheduler — managed schedulers from AWS or GCP. Vendor-locked, operationally simple.
Cloudflare Durable Objects — edge-native; pairs well with low-latency, geographically distributed ingestion.

Heartbeats are stored alongside events in the same log; the scheduler reads the log to know what wake-ups are pending. See the engine for how heartbeat firing connects to enrichment, routing, and the locator.