The system does not learn from raw data. It learns from the chain of signals, decisions, actions, outcomes — and thread closures.
Method step: Loop. The meta-loop that makes the four-verb method sharper over time.
Most ML systems learn from labelled training sets. Operational systems do not have that luxury — operations is messy, the labels arrive late, and the right thing to do today depends on what worked yesterday.
ContextOS treats this as a feature, not a problem. Every signal raised, routed, and acted on becomes a row in the decision graph — a structured record of what the system saw, what it proposed, what a human did, and how it turned out.
The decision graph is what makes the platform self-improving.
Every signal produces one decision record. Its structure looks like this:
    {
      "decision_id": "dec_01HW...",
      "signal_id": "sig_01HW...",
      "raised_at": "2026-05-01T05:05:00Z",
      "signal_type": "heartbeat.attendance.overdue_5min",
      "thread_id": "thread.shift.sh_abc123",
      "thread_step": "step_awaiting_check_in",
      "stimulus_channel": "scheduled",
      "channel_trust": 1.00,
      "ai_proposal": {
        "action_type": "notify_persona",
        "personas": ["promoter", "store_manager"],
        "confidence": 0.92,
        "model_version": "conf-v3.4",
        "reasoning_inputs_ref": "ctx_chain_snapshot_01HW..."
      },
      "routing": {
        "surface": "automation",
        "threshold_used": 0.85,
        "band": "high_confidence"
      },
      "human_response": {
        "actor_id": "usr_...",
        "action": "accepted",    // accepted | edited | overridden | ignored | snoozed
        "responded_at": "2026-05-01T05:05:18Z",
        "edit_diff": null,
        "override_reason": null
      },
      "outcome": {
        "observed_at": "2026-05-01T05:42:11Z",
        "result": "resolved",    // resolved | escalated | timed_out | unresolved
        "downstream_signals": ["sig_attendance.checked_in"]
      }
    }
One row per signal. Immutable. Joined to the rest of the substrate (events, stimuli, heartbeats) by thread_id.
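A minimal sketch of that join, assuming the substrate records are available as plain dicts carrying the field names shown above (the in-memory shape here is an assumption, not the platform's API):

    from collections import defaultdict

    def group_by_thread(records):
        """Group substrate records (decisions, events, stimuli, heartbeats)
        into per-thread timelines keyed by thread_id, ordered for replay."""
        threads = defaultdict(list)
        for record in records:
            threads[record["thread_id"]].append(record)
        for timeline in threads.values():
            timeline.sort(key=lambda r: r.get("raised_at", ""))  # ISO-8601 sorts lexically
        return threads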
Human response is not binary. Each shape of response carries different information about what to adjust.
Accepted: the human took the AI proposal as-is. Strongest positive signal. Reinforces the routing decision and the confidence band.
Edited: the human kept the proposal but adjusted it (different persona, different message, different timing). Tells the model the signal class is correct but a feature within it is miscalibrated.
Overridden: the human rejected the proposal entirely or did something different. Strongest negative signal: the model's confidence was misplaced. Raise the threshold for this signal class so more of it reaches human review.
Ignored (or snoozed): the human neither accepted nor rejected in time. Tells the model the surface was wrong, the urgency was wrong, or the channel was wrong. A different remedy from an override.
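Each shape reduces to a directional update for its signal class. A minimal sketch; the weights and function name are illustrative assumptions, not the platform's calibration:

    # Hypothetical mapping from human-response shape to a learning signal.
    # Positive values reinforce the routing decision; negative values push
    # the signal class back toward human review.
    RESPONSE_WEIGHTS = {
        "accepted": +1.0,    # strongest positive: routing and band were right
        "edited": +0.3,      # class right, a feature within it miscalibrated
        "overridden": -1.0,  # strongest negative: confidence was misplaced
        "ignored": -0.4,     # wrong surface, urgency, or channel
        "snoozed": -0.2,     # timing wrong more than substance
    }

    def learning_signal(decision: dict) -> float:
        """Reduce one decision record to a scalar update for its signal class."""
        action = decision["human_response"]["action"]
        return RESPONSE_WEIGHTS.get(action, 0.0)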
Beyond the four human-response signals, every thread closure is itself a learning event. The terminal disposition — success, failure, or compensated — tells the model whether all the routing decisions taken across that thread's lifecycle added up to the right outcome.
Recurring failure patterns within a given thread type are especially valuable. If shift-execution threads in store X consistently close as no-shows despite the engine routing reminders correctly, that is not a routing problem: it is a deeper signal worth surfacing to the operator. The decision graph aggregates thread closures by type, persona, store, time of day, and other features, and surfaces the patterns that matter.
This is how the system gets better at catching trouble before it becomes trouble: not just by sharpening individual routing decisions, but by learning which thread shapes deserve more attention earlier in their lifecycle.
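A sketch of that aggregation. The field names (thread_type, store_id, disposition) follow the closure record implied above but are assumptions, as are the cutoffs:

    from collections import Counter

    def recurring_failures(closures, min_runs=20, failure_cutoff=0.3):
        """Flag (thread_type, store_id) combinations whose threads keep
        closing as failures. Each closure is a dict with thread_type,
        store_id, and disposition in {'success', 'failure', 'compensated'}."""
        totals, failures = Counter(), Counter()
        for closure in closures:
            key = (closure["thread_type"], closure["store_id"])
            totals[key] += 1
            if closure["disposition"] == "failure":
                failures[key] += 1
        return [key for key, n in totals.items()
                if n >= min_runs and failures[key] / n >= failure_cutoff]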
Every link in the chain produces an immutable record.
Replay is always possible. Audit is the default.
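Replay follows directly from the record's shape: every proposal names its model_version and its reasoning_inputs_ref, so replay is re-running that version on that snapshot and diffing. A sketch, assuming caller-supplied loaders and a model object with a propose method (all hypothetical, not the platform's named API):

    def replay(decision, load_model, load_snapshot):
        """Re-run the recorded model version on the recorded input snapshot
        and check it reproduces the stored proposal."""
        proposal = decision["ai_proposal"]
        model = load_model(proposal["model_version"])
        inputs = load_snapshot(proposal["reasoning_inputs_ref"])
        replayed = model.propose(inputs)  # assumed to return a dict-shaped proposal
        return (replayed["action_type"] == proposal["action_type"]
                and replayed["personas"] == proposal["personas"])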
The decision graph does not improve a single number. It improves several things in parallel.
Patterns the system gets right see their thresholds lowered, quietly automating cases that previously sat in the queue. Patterns the system gets wrong see their thresholds raised, surfacing those signals for human review more often.
The mix of which signals go to which surface adjusts based on outcome. If a signal type repeatedly arrives in the action queue and gets accepted without edit, it migrates toward automation. If it consistently gets overridden, it migrates toward approval.
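Read with the routing block above (confidence 0.92 against threshold_used 0.85): automation fires when confidence clears the class threshold, so lowering a threshold automates more and raising it surfaces more. A sketch of the adjustment, with illustrative rates, step size, and bounds:

    def adjust_threshold(threshold, accept_rate, override_rate,
                         step=0.01, floor=0.60, ceiling=0.99):
        """Nudge one signal class's automation threshold from its record.
        Automation fires when confidence >= threshold, so lowering the
        threshold automates more; raising it surfaces more for review."""
        if accept_rate > 0.95 and override_rate < 0.01:
            threshold -= step  # pattern earned quieter automation
        elif override_rate > 0.10:
            threshold += step  # pattern earned more human review
        return min(max(threshold, floor), ceiling)

Surface migration falls out of the same rule: a queue-bound class that keeps getting accepted without edit sees its threshold drift down until it crosses into the automation band.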
Channels that reliably produce correct decisions see their base trust quietly raised. Channels that regularly produce false positives see their trust lowered. New channel integrations start at conservative defaults and earn trust through their record.
When the human-locator picks the wrong channel (no response, followed by escalation), the user-channel preference function downgrades that channel for that user and signal-class combination.
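Both adjustments can share one mechanism: an exponentially weighted trust score, kept per channel for base trust and per (user, signal class, channel) for the human-locator's preference. A sketch; the smoothing factor and the 0.5 starting default are assumptions:

    def update_trust(trust, correct, alpha=0.05):
        """Exponentially weighted trust update. New integrations start at a
        conservative default and earn trust decision by decision."""
        return (1 - alpha) * trust + alpha * (1.0 if correct else 0.0)

    def downgrade_on_escalation(prefs, user_id, signal_class, channel, alpha=0.05):
        """Per-(user, signal class) channel preference for the human-locator.
        A no-response-then-escalation counts as an incorrect pick."""
        key = (user_id, signal_class, channel)
        prefs[key] = update_trust(prefs.get(key, 0.5), correct=False, alpha=alpha)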
Thread shapes that repeatedly end in closed-failure terminals get flagged earlier in the lifecycle on subsequent runs. The system learns not just what to do but which lifecycles deserve early attention.
It is not a black box. Every record names the model version, the inputs it saw, and what the human did. Replay is always possible.
It is not a substitute for ground truth. The graph is operational truth — what the team did and what happened. For regulated decisions, ground-truth audits remain a separate process.
It is not auto-tuning beyond policy. Policy thresholds (e.g. "refunds always go to approval") are guardrails the graph cannot override.
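Guardrails sit outside the loop: policy wins before any learned threshold is consulted. A minimal sketch, assuming policy is a static map from signal type to a forced surface (the map's contents, the signal name, and the surface strings are illustrative):

    # Policy rules the decision graph cannot override, e.g. refunds
    # always route to approval regardless of learned confidence.
    POLICY_SURFACE = {
        "payment.refund.requested": "approval",
    }

    def route(signal_type, confidence, threshold):
        """Apply policy first; only then consult the learned threshold."""
        forced = POLICY_SURFACE.get(signal_type)
        if forced is not None:
            return forced
        return "automation" if confidence >= threshold else "action_queue"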
Because every AI proposal is logged with model version, confidence, channel trust, and inputs, ContextOS supports requests for human review of any AI decision affecting an individual.
The decision record is the audit artefact. Tenants in regulated jurisdictions configure retention windows accordingly.
See how the decision graph connects to confidence and routing.
Calm technology with teeth.