Every signal carries a confidence score. The score governs where it goes — not the signal type.
Method step: Route. The decision rule that splits silent automation from human approval.
Most systems either fully automate or always require review. Both are wrong defaults.
Operations is a spectrum. Some signals are obvious; some are ambiguous. The system should react accordingly. ContextOS attaches a confidence score to every signal it raises and routes the signal based on that score.
The result is silent automation when the system is confident, lightweight review when it is helpful, and full human ownership when stakes are high.
Confidence determines where a signal goes — and who acts on it.
Silent automation. The system executes the recommended action. Humans are informed via the timeline, not interrupted.
Examples: retry a failed integration, refresh a channel sync, fire a deterministic policy rule, dispatch a templated reminder.
Action queue. AI proposes the action with full context. A human approves, edits, or rejects in seconds.
Examples: a refund within policy bounds, a courier swap, a re-route, a flagged duplicate.
Approval surface. A human owns the decision. AI provides context but no recommendation, or a recommendation marked low-confidence.
Examples: refunds outside policy, regulatory filings, customer-impacting carrier changes, anything irreversible.
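The three bands above reduce to a simple routing rule. A minimal sketch, assuming a score in [0, 1]; the threshold values are illustrative, not ContextOS's actual defaults.

```python
from enum import Enum

class Band(Enum):
    SILENT_AUTOMATION = 1   # execute, inform via timeline
    ACTION_QUEUE = 2        # AI proposes, human approves/edits/rejects
    APPROVAL_SURFACE = 3    # human owns the decision

def route(confidence: float,
          silent_floor: float = 0.9,
          queue_floor: float = 0.6) -> Band:
    """Map a confidence score in [0, 1] to a routing band.

    Thresholds are illustrative placeholders; in practice they
    would be tenant-configurable policy.
    """
    if confidence >= silent_floor:
        return Band.SILENT_AUTOMATION
    if confidence >= queue_floor:
        return Band.ACTION_QUEUE
    return Band.APPROVAL_SURFACE
```

The point of the rule being this small is that everything interesting lives in how the score is computed, not in the routing itself.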
The score is a composite — not a single model output. Four inputs feed it.
Case similarity. How closely does this signal match a labelled, previously-resolved case? Strong matches against a clean ground-truth set produce high confidence. Novel signals or edge cases produce low confidence.
Action risk. A high-risk action (irreversible, regulated, customer-impacting) caps confidence at the medium band by policy — even if the model is sure. Some actions never silently automate.
Override history. If humans have overridden similar AI recommendations recently, confidence on this signal type is lowered automatically. The model treats human overrides as a teaching signal, not noise.
Channel trust. The stimulus channel the originating event arrived on carries a base trust score, and that score feeds confidence directly. A government webhook signed via mTLS deserves higher base trust than a parsed inbound email from a personal domain. A user_action from a logged-in operator with permissions deserves higher trust than the same action from an anonymous device. The engine reads the channel from the canonical envelope and adjusts confidence accordingly.
Channel trust is what makes the model robust to source heterogeneity. Two different channels can produce semantically identical events — an invoice-collected event from a payment-gateway webhook vs an inbound email saying "we paid" — and the engine should not treat them with the same certainty. Channel trust is how it doesn't.
Confidence isn't just about the signal. It's about the chain of trust from source to action.
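The four inputs might combine as in this sketch. The weights, the cap value, and the feature names are assumptions for illustration, not the real scoring function.

```python
def composite_confidence(case_match: float,      # similarity to labelled resolved cases, 0..1
                         channel_trust: float,   # base trust of the originating channel, 0..1
                         override_rate: float,   # recent human-override rate for this signal type, 0..1
                         high_risk: bool) -> float:
    """Blend the four inputs into a single score in [0, 1].

    Weights are illustrative. The high-risk cap enforces the policy
    that some actions never reach the silent-automation band, no
    matter how sure the model is.
    """
    score = 0.6 * case_match + 0.4 * channel_trust
    score *= (1.0 - override_rate)   # human overrides teach the scorer caution
    if high_risk:
        score = min(score, 0.59)     # capped below an assumed silent-automation floor
    return max(0.0, min(1.0, score))
```

Note the structural choice: the risk cap is applied last, so no combination of strong evidence can lift a high-risk action into silent execution.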
The confidence function evolves with the deployment. Predictability first, learning second.
At launch, confidence is deterministic. Each signal type has a hand-built scoring function based on policy thresholds, observed history, channel trust, and known invariants. This produces predictable behaviour and an auditable trail. It is also enough to route 80% of signals correctly on day one.
As the decision graph accumulates outcome labels (action accepted, edited, overridden, ignored), confidence shifts to a learned function over the same features.
The transition is gradual: the model proposes a confidence band; rule-based boundaries remain enforceable as policy guardrails. Tenants can opt into model-based confidence per signal class.
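The guardrail idea — a learned score may only move within rule-enforced bounds — could look like the following sketch. The function shape and parameter names are hypothetical.

```python
from typing import Optional

def guarded_confidence(rule_score: float,
                       model_score: Optional[float],
                       floor: float,
                       ceiling: float) -> float:
    """Return the confidence the router actually uses.

    If the tenant has not opted into model-based confidence for this
    signal class, the deterministic rule score wins. If it has, the
    learned score is clamped into the rule-defined band, so policy
    guardrails stay enforceable even when the model disagrees.
    """
    if model_score is None:   # no opt-in for this signal class
        return rule_score
    return max(floor, min(ceiling, model_score))
```

An overconfident model can raise confidence only up to the ceiling the rules permit, which is what makes the transition safe to do gradually.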
Every signal generates a decision record: what was raised, with what confidence, where it was routed, what the human did, and what the operational outcome was.
These records feed back into the confidence function over time.
This is how ContextOS gets sharper without retraining from scratch.
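A decision record might carry fields like these. The field names and values are hypothetical, shaped only by the five questions the text lists.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DecisionRecord:
    signal_id: str
    signal_type: str
    confidence: float
    routed_to: str       # e.g. "silent_automation", "action_queue", "approval_surface"
    human_action: str    # "accepted", "edited", "overridden", "ignored", or "none"
    outcome: str         # operational result, e.g. "resolved", "escalated"

# An example record for an approved in-policy refund.
record = DecisionRecord(
    signal_id="sig_123",
    signal_type="refund_within_policy",
    confidence=0.72,
    routed_to="action_queue",
    human_action="accepted",
    outcome="resolved",
)
```

Keeping the record immutable (`frozen=True`) fits its role as an audit artifact: it is appended, never edited.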
Confidence thresholds and the actions allowed in each band are tenant-configurable.
A regulated jurisdiction may force every refund into Band 3 regardless of model confidence. A high-trust enterprise tenant may expand Band 1 to cover more deterministic policy actions. A new tenant with no history may run with conservative defaults until decision-graph data accumulates.
The framework supports both; the configuration captures the policy. Configuration is versioned and audited.
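The tenant policies described above could be captured as versioned configuration. The keys, signal classes, and threshold values here are illustrative assumptions.

```python
# Illustrative tenant configurations; keys and signal classes are hypothetical.
CONSERVATIVE_DEFAULTS = {
    "version": 1,
    "bands": {"silent_floor": 0.95, "queue_floor": 0.75},
    "forced_band_3": ["refund", "regulatory_filing"],  # always full human ownership
    "model_confidence_opt_in": [],                     # deterministic scoring only
}

HIGH_TRUST_ENTERPRISE = {
    "version": 7,
    "bands": {"silent_floor": 0.85, "queue_floor": 0.60},
    "forced_band_3": ["regulatory_filing"],
    "model_confidence_opt_in": ["integration_retry", "channel_sync"],
}

def forced_band_3(config: dict, signal_class: str) -> bool:
    """True if policy forces this signal class into Band 3 regardless of score."""
    return signal_class in config["forced_band_3"]
```

The version field is what makes the audit trail work: every decision record can point at the exact configuration version under which it was routed.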
For regulated automated-decision-making contexts (e.g. GDPR Art. 22 or PDPL equivalents), the framework supports compliance; tenants configure thresholds appropriately, typically forcing in-scope decisions to full human ownership.
See how outcomes flow back into the confidence model.
Calm technology with teeth.