Substrate first. Surface last. Six steps, in build order.
Most teams building AI-native software start at the wrong end. They build the chat box. They build the dashboard. They wire events to the UI as an afterthought. What they ship is a demo, not a system.
The order is wrong. The UI is a rendering decision. It belongs at the end, not the beginning. Before any pixel is drawn, six things should be settled. Once they're settled correctly, the UI almost writes itself — because by then you know exactly who needs to see what, when, on which surface, with which affordances.
Get steps 1–5 right and the human doesn't need to remember anything.
Today, the CEO or the senior manager carries the whole universe in their head because the software can't. We're done with that. The substrate carries it.
Step 1
List every event that happens in the real world. Every one. Pretend AI doesn't exist; pretend AI is everywhere. The list should be the same. An order arrives. A warehouse acknowledges. A carrier picks up. A customer writes in. A truck enters a yard. A patient is admitted. A shift starts. A payment clears.
These are real-world events — things that happened in physical or digital reality, witnessed by some system or person. Treat them as the canonical layer.
Then there is a second category: events your mind constructs. Late. At risk. VIP. Outlier. These don't exist in nature; they're categorizations a domain expert imposes on the canonical events. Keep them on a separate list. They're derived, not primary. Get them wrong and you can still redefine them without losing the canonical record.
Give every event a logical name. order.received. warehouse.acknowledged. carrier.picked_up. Nomenclature is half the work — a system that can't name what just happened can't reason about it.
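A minimal sketch, in TypeScript, of what the two lists can look like. The canonical names echo the ones above; the derived names (order.late, order.at_risk, customer.vip) and the record fields are illustrative assumptions, not a prescribed taxonomy.

```typescript
// Canonical events record what actually happened in the world.
// Derived events are categorizations imposed on top of them and can be
// redefined later without touching the canonical record.
type CanonicalEvent =
  | "order.received"
  | "warehouse.acknowledged"
  | "carrier.picked_up"
  | "delivery.confirmed";

type DerivedEvent =
  | "order.late"        // derived: no warehouse.acknowledged within SLA (assumed definition)
  | "order.at_risk"     // derived: carrier history plus order value (assumed definition)
  | "customer.vip";     // derived: a label a domain expert imposes

interface EventRecord {
  name: CanonicalEvent;               // the logical name, e.g. "order.received"
  occurredAt: string;                 // ISO-8601 timestamp of the real-world moment
  witnessedBy: string;                // the system or person that observed it
  payload: Record<string, unknown>;   // whatever the witnessing system captured
}
```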
See also: Triggers — the external-ecosystem events that spawn lifecycles. Stimuli — the canonical shape every event takes inside the engine.
Step 2
An event by itself is noise. An event belonging to a lifecycle is signal. Group your events into threads: an order from arrival to delivery confirmation, a patient from admission to discharge, a shift from clock-in to clock-out, a lead from inbound to closed-won-or-lost.
A thread is the unit of orchestration. It has a trigger (one event spawns it), a sequence of expected events (the recipe), and one or more terminal events that end it. Different terminal events mean different outcomes — a delivery confirmed is not the same terminal event as a delivery returned-to-sender. Both end the thread; they end it differently.
Build heartbeats into the thread at the same time as the thread itself. Heartbeats are not a separate concern bolted on later. They're how the thread watches for what didn't happen — the missed acknowledgement, the late pickup, the no-show. Every thread step gets a heartbeat at the moment it begins. The heartbeat fires when the expected next event doesn't arrive in time, and generates an event of its own.
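A sketch of what a thread definition can look like, assuming a declarative shape. The trigger, recipe, and terminal names echo the order lifecycle above; the timeout values and the emitted heartbeat event names (warehouse.acknowledgement.missed, carrier.pickup.late) are illustrative assumptions.

```typescript
// Each step of the recipe carries its own heartbeat: a timer that starts
// when the step begins and emits an event if the expected next event
// never arrives. Absence becomes a fact the engine can react to.
interface ThreadStep {
  expect: string;              // the event that should arrive next
  heartbeat: {
    withinMinutes: number;     // how long before absence becomes a fact
    emits: string;             // the event the heartbeat generates when it fires
  };
}

interface ThreadDefinition {
  trigger: string;             // the event that spawns the thread
  recipe: ThreadStep[];        // the expected sequence
  terminal: string[];          // events that end the thread, each a different outcome
}

const orderLifecycle: ThreadDefinition = {
  trigger: "order.received",
  recipe: [
    { expect: "warehouse.acknowledged",
      heartbeat: { withinMinutes: 60,   emits: "warehouse.acknowledgement.missed" } },
    { expect: "carrier.picked_up",
      heartbeat: { withinMinutes: 1440, emits: "carrier.pickup.late" } },
  ],
  terminal: ["delivery.confirmed", "delivery.returned_to_sender"],
};
```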
See also: Threads — lifecycles from creation to logical conclusion. Heartbeats — how the system watches for absence, which can't be detected from incoming events alone. Thread catalog reference — what the document looks like.
Step 3
An event is meaningful only in context. The same event — payment.received — means different things on different threads. On a first order from a new B2B customer, it's a relief. On a recurring shipment, it's routine. On a chargeback flow, it's a settlement.
For every event, decide what context attaches. History on the thread. Policy in force at this moment. Persona of whoever might be involved. Channel trust of the source — a webhook signed by a government gateway is not the same as a parsed inbound email. Recent overrides. Anything the engine needs to make a decision.
Then decide what level of intelligence is required to keep the ball rolling. Sometimes a rule is enough — if X then Y. Sometimes an LLM is needed to interpret a free-text complaint. Sometimes the call needs a human because the cost of being wrong is too high or the judgment is too nuanced. This is a per-event, per-thread decision. Bake it in here, not in the UI.
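One way to capture that per-event, per-thread decision is a small policy table the engine can read. The handler tiers (rule, llm, human), the context field names, and the escalation target below are assumptions for illustration.

```typescript
// For each event on each thread: what context joins before a decision is
// made, and what level of intelligence keeps the ball rolling.
type Handler = "rule" | "llm" | "human";

interface EventPolicy {
  event: string;
  thread: string;
  context: string[];           // what the engine joins before deciding
  handler: Handler;            // who keeps the ball rolling
  escalateTo?: string;         // persona summoned when handler is "human"
}

const policies: EventPolicy[] = [
  {
    event: "payment.received",
    thread: "order.lifecycle",
    context: ["thread.history", "policy.in_force", "channel.trust"],
    handler: "rule",           // if X then Y is enough here
  },
  {
    event: "customer.complaint.received",
    thread: "order.lifecycle",
    context: ["thread.history", "customer.persona", "recent.overrides"],
    handler: "llm",            // free text needs interpretation
  },
  {
    event: "chargeback.opened",
    thread: "chargeback.lifecycle",
    context: ["order.value", "customer.chargeback_history"],
    handler: "human",          // cost of being wrong is too high
    escalateTo: "ops_manager",
  },
];
```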
See also: The Engine — the orchestrator that joins events with context and produces signals. Stimulus channels reference — the taxonomy of how events arrive.
Step 4
When intelligence isn't enough, a human gets the call. Name them.
Not "a user." Not "the team." A specific persona — the ops manager, the floor lead, the customer service rep, the on-call doctor, the brand manager, the parent. Each persona owns a different set of decisions. Each persona needs a different subset of context to make those decisions.
Existing software hands every persona the same grid and asks them to filter their own life.
That is overload by design. The dashboard culture treats personas as interchangeable consumers of the same firehose. They're not.
For each decision-needing event, decide: which persona owns this call? What's the minimum context they need to decide well? An ops manager looking at a late acknowledgement needs the order value, the customer's chargeback history, and the warehouse's recent on-time rate. A customer service rep looking at the same event needs the customer's last interaction, their phone number, and a one-line summary. Same event, two different views.
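Sketched as data, using the late-acknowledgement example above: the same event, two persona-scoped views. The decision names and context identifiers are hypothetical.

```typescript
// Each persona owns a different call on the same event and sees only the
// minimum context needed to make that call well.
interface PersonaView {
  persona: string;
  decision: string;            // the call this persona owns
  context: string[];           // the minimum they need to decide well
}

const lateAcknowledgementViews: Record<string, PersonaView> = {
  ops_manager: {
    persona: "ops_manager",
    decision: "expedite_or_reassign_warehouse",
    context: ["order.value", "customer.chargeback_history", "warehouse.on_time_rate_30d"],
  },
  customer_service_rep: {
    persona: "customer_service_rep",
    decision: "proactively_notify_customer",
    context: ["customer.last_interaction", "customer.phone", "order.one_line_summary"],
  },
};
```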
See also: Operator Inversion — why the human is on call, not on duty. The Confidence Model — how the engine decides when to summon and when to act.
Step 5
Once you know the persona, you know where they already look. The ops manager glances at the action queue between shifts. The customer service rep watches a pinned channel. The on-call doctor wears a watch. The kitchen lead reads the tablet on the pass. The parent looks at their phone after the kids are asleep.
The surface picks itself when the persona is known. Don't make the persona come to a new place. Land where they already are.
Then decide what affordances the surface offers. A glance-only surface (a watch face complication) shows one fact and one tap. A two-decision surface (an action queue) shows the context joins and offers approve / defer / escalate. An approval surface (a pinned chat thread) shows the full reasoning and asks for a yes or no. Different surfaces support different decisions. Match them.
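A sketch of the surface map: each persona paired with the channel they already watch and the affordance that channel can carry. The channel identifiers and affordance names are illustrative.

```typescript
// Surfaces are picked per persona, not per product. Different surfaces
// support different decisions, so the affordance is part of the mapping.
type Affordance = "glance" | "approve_defer_escalate" | "yes_no";

interface Surface {
  persona: string;
  channel: string;             // where they already are
  affordance: Affordance;      // what decisions the surface can support
}

const surfaces: Surface[] = [
  { persona: "on_call_doctor",       channel: "watch.complication",  affordance: "glance" },
  { persona: "ops_manager",          channel: "action_queue",        affordance: "approve_defer_escalate" },
  { persona: "customer_service_rep", channel: "pinned_chat_thread",  affordance: "yes_no" },
];
```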
See also: Surfaces — the multi-channel locator and the discipline of ambient delivery. ALAP — as late as possible, but not too late.
Step 6
By the time you get here, the hard work is done. You know what events exist. You know how they group into threads. You know what context each event carries. You know which persona owns each decision and what they need to see. You know which surface to land them on.
UI is now a rendering decision. Layout. Typography. Colour. The watch-face complication is two lines and a glyph. The action queue is a list view with affordances. The pinned chat thread is markdown with buttons. Render them.
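By this point a signal already names its persona, context, and surface, so rendering can be a small function. A sketch with made-up signal fields; the surface names carry over from the previous step.

```typescript
// Rendering as the last step: the substrate has already decided what to
// show, to whom, and where. All that is left is layout per surface.
interface Signal {
  headline: string;            // e.g. a one-line statement of what needs a decision
  context: string[];           // the pre-joined facts for this persona
  actions: string[];           // the pre-staged decisions
  surface: "watch.complication" | "action_queue" | "pinned_chat_thread";
}

function render(signal: Signal): string {
  switch (signal.surface) {
    case "watch.complication":   // two lines and a glyph
      return `! ${signal.headline}`;
    case "action_queue":         // list row with affordances
      return `${signal.headline}\n${signal.context.join(" | ")}\n[${signal.actions.join("] [")}]`;
    case "pinned_chat_thread":   // markdown with buttons
      return `**${signal.headline}**\n${signal.context.map(c => `- ${c}`).join("\n")}`;
    default:
      return signal.headline;
  }
}
```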
If steps 1–5 are right, the UI is small. Most of the system is invisible — the watching, the enriching, the routing all happen in the substrate, not on the screen. Most product surfaces collapse to a handful of designs because the substrate has done the work of deciding what to show.
If steps 1–5 are wrong, no UI can save you. You'll be designing dashboards that nobody reads, notifications that nobody trusts, and approval flows that nobody respects.
When you think you're done, ask one question: does the human have to remember anything?
If the answer is yes — they have to remember to check the dashboard, or remember the customer name, or remember to follow up on Friday — you're not done. The substrate isn't carrying enough of the universe. Go back. Find the thread that should have surfaced this. Find the heartbeat that should have fired. Find the context that should have attached.
If the answer is no — the right person sees the right thing at the right time on the surface they already look at, with the decision pre-staged — you're done.
The test is binary. The CEO who is THE ONE because they remember everything is the failure mode, not the role model. Build the substrate so no one has to be THE ONE.
OrderHubX is built this way. The events were cataloged before any UI existed: order.received, warehouse.acknowledged, carrier.picked_up, delivery.confirmed, and dozens more. The threads (order lifecycles, returns lifecycles, chargeback lifecycles) were defined next. The context-enrichment rules were written before the engine ran a single event. The personas (ops manager, customer service, finance) were named, their context needs spec'd. The surfaces (action queue, pinned chat, approval dialog) were chosen. The UI was rendered last.
You can see the substrate running in the OrderHubX simulation. The JSON feed driving the simulation is the same shape that drives the production engine. The explainer rendering is one of several surfaces the same substrate supports.
OrderHubX is one place this method runs. Your domain is another. The vocabulary will change — threads will be cases, or shifts, or visits, or campaigns — but the build order will not.
If you're building an AI-native ops product, build it this way. We'll help you start, or we'll watch you ship one without us. Both outcomes are fine.