
The Engine

The ant-watcher. The orchestrator that watches every live thread simultaneously and makes sure none of them dies quietly.

Method step: Route. The component that turns substrate into operations.

The ant-watcher

The ants are the events. The threads are the work. The human is free to do what is most meaningful. ContextOS watches the swarm so we don't have to.

Picture an anthill. Thousands of ants at any moment, each carrying a small signal, each contributing to a larger pattern. A naturalist watching the hill doesn't follow one ant; they watch the colony. They notice when ants stop moving in places they shouldn't. They notice when a path that should be busy goes quiet. They notice patterns no individual ant would see.

The engine does this for operations. The ants are the events — small signals flowing in from many sources. The threads are the work — lifecycles in motion that group related events into something meaningful. The engine watches them all simultaneously. It schedules pulses on each thread. It accepts stimuli from any of nine channels. It advances threads when stimuli arrive. It surfaces threads when expected stimuli don't. It learns from every routing decision so the next one is sharper.

The human — the operator — is never an ant. Operators are doing the meaningful work that requires judgment. The engine watches the swarm so we don't have to.

No thread is allowed to die quietly.

Every thread reaches a logical conclusion or is surfaced as stuck.

What the engine does

Across every live thread, simultaneously, the engine has seven duties.

1. Schedule thread pulses

When a thread step is entered, all its heartbeats are scheduled immediately for their future firing times. Idempotent. Durable across restarts. No polling.
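
As a sketch of how step-entry scheduling can be made idempotent, here is one way to derive a deterministic key per heartbeat so that re-running the scheduler after a restart overwrites rather than duplicates (the store and field names are assumptions, standing in for whatever durable table the substrate provides):

```python
import hashlib
from datetime import datetime, timedelta, timezone

def schedule_step_heartbeats(store, thread_id, step_name, offsets):
    """Schedule every heartbeat for a step the moment the step is entered.

    `store` stands in for any durable, dict-like table keyed by id. The key
    is derived from (thread, step, offset), so the write is idempotent:
    re-running this after a crash produces the same rows, never duplicates.
    No polling: the substrate fires each row at its `fire_at` time.
    """
    entered_at = datetime.now(timezone.utc)
    keys = []
    for offset in offsets:
        # Deterministic key: same (thread, step, offset) -> same row.
        key = hashlib.sha256(
            f"{thread_id}:{step_name}:{offset.total_seconds()}".encode()
        ).hexdigest()
        store[key] = {
            "thread_id": thread_id,
            "step": step_name,
            "fire_at": entered_at + offset,
            "status": "pending",
        }
        keys.append(key)
    return keys
```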

2. Normalize stimuli

Accept inputs from nine channels; produce one canonical event envelope. Channel-specific concerns (auth, dedup, parse) are handled at the adapter; downstream code never sees them.
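
A minimal sketch of what "one canonical envelope" can look like, with a content-derived id so the adapter can dedup retries before anything downstream runs (the envelope fields and the WhatsApp payload shape are hypothetical):

```python
import hashlib
from datetime import datetime, timezone

def make_envelope(channel, thread_id, kind, payload):
    """Every channel adapter emits this one shape; downstream code never
    sees channel-specific fields."""
    body = f"{channel}:{thread_id}:{kind}:{sorted(payload.items())}"
    return {
        "channel": channel,
        "thread_id": thread_id,
        "kind": kind,
        "payload": payload,
        # Content-derived id: the same stimulus retried twice dedups to
        # the same event_id at the adapter boundary.
        "event_id": hashlib.sha256(body.encode()).hexdigest(),
        "received_at": datetime.now(timezone.utc).isoformat(),
    }

def normalize_whatsapp(raw):
    # Channel-specific parsing (auth, provider ids) stays in the adapter.
    return make_envelope("whatsapp", raw["ref"], raw["type"],
                         {"text": raw["body"]})
```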

3. Advance threads

On each event, close the current step, expose the next expected stimulus + SLA, branch on alternative events, compensate on rewind events, terminate on closing events. The thread's spec drives the transitions; the engine executes them.
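
Spec-driven transitions mean the engine interprets a table rather than encoding business logic. A sketch, with hypothetical step and event names standing in for a real thread spec:

```python
# Hypothetical thread spec: each step names its expected stimulus, its SLA,
# and where each possible event (expected or alternative) leads.
SPEC = {
    "quote_sent":    {"expects": "customer_accept", "sla_hours": 48,
                      "on": {"customer_accept": "await_payment",
                             "customer_decline": "closed"}},
    "await_payment": {"expects": "payment_received", "sla_hours": 72,
                      "on": {"payment_received": "closed"}},
    "closed":        {"expects": None, "sla_hours": None, "on": {}},
}

def advance(state, event_kind, spec=SPEC):
    """Close the current step and expose the next expected stimulus + SLA."""
    step = spec[state]
    if event_kind not in step["on"]:
        raise ValueError(f"unexpected event {event_kind!r} in step {state!r}")
    new_state = spec[state]["on"][event_kind]
    nxt = spec[new_state]
    return new_state, nxt["expects"], nxt["sla_hours"]
```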

4. Surface stuck threads

When a heartbeat fires and no expected stimulus has arrived, the engine routes a signal — to the right persona's surface, at the right urgency, on the right channel via the locator.

5. Drive forward motion

The multi-channel locator picks the right person, the right channel (WhatsApp / push / SMS / email / Slack), and the right moment (respecting quiet hours, do-not-disturb, and per-user nudge budgets). When the engine picks wrong, the next iteration learns.
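
The moment-and-channel choice can be sketched as a small policy check. Field names, trust scores, and urgency thresholds below are illustrative assumptions, not the locator's real model:

```python
from datetime import datetime, time

def in_quiet_hours(now_t, start, end):
    # Quiet-hours window may span midnight (e.g. 22:00-07:00).
    if start <= end:
        return start <= now_t < end
    return now_t >= start or now_t < end

def pick_channel(persona, urgency, now):
    """Return a channel to notify on, or None to defer the nudge."""
    if persona["nudges_left"] <= 0 and urgency != "critical":
        return None  # per-user nudge budget exhausted
    if (in_quiet_hours(now.time(), persona["quiet_start"],
                       persona["quiet_end"]) and urgency != "critical"):
        return None  # respect quiet hours unless critical
    # First preferred channel trusted enough for this urgency.
    threshold = {"low": 0.3, "high": 0.6, "critical": 0.0}[urgency]
    for channel in persona["channel_prefs"]:
        if persona["trust"].get(channel, 0.0) >= threshold:
            return channel
    return persona["channel_prefs"][0]
```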

6. Learn from outcomes

Every routing decision — what AI proposed, what the human did, how the situation resolved — feeds the decision graph. Confidence sharpens. Channel preferences sharpen. The locator sharpens.
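
One simple way such sharpening can work is an exponential moving average over proposed-versus-actual agreement. A sketch under that assumption; the real decision graph is richer than a single score:

```python
def update_confidence(graph, key, ai_proposed, human_did, alpha=0.1):
    """EMA update of routing confidence for one decision key.

    `graph` maps a key (e.g. signal type + channel) to a score in [0, 1].
    Agreement between what AI proposed and what the human did pulls the
    score up; disagreement pulls it down.
    """
    outcome = 1.0 if ai_proposed == human_did else 0.0
    prev = graph.get(key, 0.5)  # uninformed prior
    graph[key] = (1 - alpha) * prev + alpha * outcome
    return graph[key]
```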

7. Close threads

Terminal events trigger thread closure: meta-event recorded, pending heartbeats marked superseded, hand-off events emitted to dependent threads, retention timer started per the domain's policy. Closed threads stay queryable for the full retention period.
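
The four closure steps can be sketched in sequence (data shapes and field names are assumptions; note that heartbeats are marked superseded rather than deleted, so replay still sees the full schedule):

```python
from datetime import datetime, timedelta, timezone

def close_thread(thread, heartbeats, emit, retention_days):
    """Sketch of the closure sequence triggered by a terminal event."""
    now = datetime.now(timezone.utc)
    # 1. Record the closure meta-event.
    thread["events"].append({"kind": "thread.closed", "at": now.isoformat()})
    # 2. Mark pending heartbeats superseded (never delete: replay needs them).
    for hb in heartbeats:
        if hb["thread_id"] == thread["id"] and hb["status"] == "pending":
            hb["status"] = "superseded"
    # 3. Emit hand-off events to dependent threads.
    for dep in thread.get("dependents", []):
        emit({"kind": "handoff", "from": thread["id"], "to": dep})
    # 4. Start the retention timer; the thread stays queryable until then.
    thread["purge_after"] = now + timedelta(days=retention_days)
    thread["state"] = "closed"
```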

Engine guarantees

For every thread, the engine guarantees:

  • Spawn is recorded. The trigger event is durable before the thread is "live." No phantom threads.
  • All steps are pulse-watched. At least one heartbeat per step, scheduled at step entry. No silent step.
  • Stimuli land on the right thread. Channel adapters resolve thread membership from the envelope or quarantine the stimulus. No orphan beads.
  • No silent death. Every thread either reaches a terminal event or is surfaced as stuck by the daily dead-chain sweep.
  • Cross-thread links are bidirectional. Primary + reference events keep the relationship discoverable from either side.
  • Replays are deterministic. The same event sequence always yields the same projections. Time-travel debugging works.
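
The determinism guarantee falls out when a projection is a pure fold over the event log: same sequence in, same projection out, and time-travel debugging is just truncating the log and folding again. A sketch with a hypothetical event shape:

```python
def project(events):
    """Pure fold: no clock, no I/O, no randomness, only the event sequence."""
    state = {"step": None, "history": []}
    for ev in events:
        state["step"] = ev["to_step"]
        state["history"].append(ev["kind"])
    return state
```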

These guarantees are the contract. Code that violates them is wrong.

Concurrency at scale

A typical mid-sized operation runs in the low thousands of concurrent threads. A regional operation runs tens of thousands. A national logistics operator at peak: a few million. The engine handles all of it.

Per-thread state is small. Per-thread heartbeats are cheap — most fire and suppress without consequence. The expensive operation is the moment a heartbeat fires and the engine has to evaluate, route, and possibly summon — and that's the operation that earns its cost, because it's the moment a human's judgment is genuinely required.

This is the inversion of attention-extracting software. Most software's attention cost grows linearly with volume because every event becomes a notification. In a ContextOS-shaped app, the human-attention cost stays roughly flat: the engine absorbs the high-confidence band as automation and the medium band as queued-and-cleared work, so only the low-confidence band reaches a person.
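
Band routing can be sketched as a pair of cutoffs. The thresholds below are illustrative assumptions, not the confidence model's actual bands:

```python
def route_by_confidence(confidence):
    """Map a score in [0, 1] to a surface; only the lowest band costs
    human attention."""
    if confidence >= 0.9:
        return "automation"    # engine acts; no one is notified
    if confidence >= 0.6:
        return "action_queue"  # queued, cleared in batch
    return "approval"          # the only band that interrupts a person
```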

Substrate options for the engine itself

The engine needs three things from its substrate: an append-only event store, a durable scheduler for heartbeats, and a projection store for the read side. Choice is about scale and operational maturity, not semantics.
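
Those three needs can be pinned down as interfaces that any of the substrates below could sit behind. A Python sketch with hypothetical method names, plus a trivial in-memory event store to show one interface satisfied:

```python
from datetime import datetime
from typing import Protocol, runtime_checkable

@runtime_checkable
class EventStore(Protocol):
    """Append-only event log per thread."""
    def append(self, thread_id: str, event: dict) -> None: ...
    def read(self, thread_id: str) -> list: ...

@runtime_checkable
class Scheduler(Protocol):
    """Durable wake-ups keyed for idempotency."""
    def schedule(self, key: str, fire_at: datetime, payload: dict) -> None: ...
    def cancel(self, key: str) -> None: ...

@runtime_checkable
class ProjectionStore(Protocol):
    """Read-side views rebuilt from the event log."""
    def upsert(self, view: str, key: str, row: dict) -> None: ...

class InMemoryEventStore:
    def __init__(self):
        self.log = {}
    def append(self, thread_id, event):
        self.log.setdefault(thread_id, []).append(event)
    def read(self, thread_id):
        return list(self.log.get(thread_id, []))
```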

Postgres + pg_cron + pg_partman

The simplest substrate: the same database as the event store, so scheduling is transactional with events. Manageable up to roughly 10M scheduled wake-ups per tenant. The best starting point for any single organization.

Temporal.io

Purpose-built for durable timers and workflows. Idempotent. Scales horizontally. Recommended when live scheduled-wake-up volume crosses several million or workflows get complex.

AWS EventBridge / GCP Cloud Scheduler

Managed schedulers. Operationally simple, vendor-locked. Good when the rest of the stack already lives there.

Cloudflare Durable Objects

Edge-native. Pairs well with low-latency, geographically distributed stimulus ingestion. Newer; less mature ops tooling but cheap at scale.

None of these is the engine. The engine is the orchestration logic that runs on top — thread spec interpretation, stimulus normalization, suppression checks, routing decisions, locator dispatch, decision graph maintenance.

The orchestrator loop

The engine's main loop is small because the discipline is small.

on_stimulus_arrive(stim):
  channel = stim.channel_adapter
  event   = channel.normalize(stim)
  thread  = lookup_thread(event.thread_id)
  thread.advance(event)
  schedule_next_heartbeats(thread, thread.current_state)
  emit('event.placed', event)

on_heartbeat_fire(hb):
  thread = load_thread(hb.thread_id)
  if thread.has_expected_stimulus(hb.expected):
    emit('heartbeat.suppressed', hb)
    return
  if thread.has_in_flight_stimulus(hb.expected):
    reschedule(hb, give_more_time=True)
    return

  signal     = enrich(hb, thread, joins=[history, policy, persona, channel_trust])
  confidence = score(signal)
  surface    = route_by_confidence(confidence)
  persona    = pick_persona_for_signal(signal, surface)

  if surface == 'automation':
    execute(signal.recommended_action)
  elif surface == 'action_queue':
    upsert_to_action_queue(persona, signal)
  elif surface == 'approval':
    upsert_to_approval_surface(persona, signal)
    locate_and_notify(persona, signal)
  elif surface == 'follow_up':
    upsert_to_follow_up_calendar(persona, signal)

  schedule_next_heartbeats(thread, thread.current_state)

That's the whole orchestrator in about thirty lines. The complexity that scales the engine isn't in this loop — it's in the thread specs, the channel adapters, the routing model, the locator, the learning graph. Each of those is a small specialization that plugs in.

The engine routes by confidence.

Meet the confidence model — how routing decisions get made and how channel trust feeds the score.

Calm technology with teeth.