SOPs That Run Themselves

Open the client-onboarding SOP in your Confluence. Last edited 2023. Twenty-seven steps. Three still relevant. Eight obsolete. One — step seventeen, “partner reviews engagement letter for tone and exposure” — should never have been in a document, because it’s pure judgment and no checklist has ever helped anyone do it.

Nobody follows the doc. Everyone does it from memory. The hand-offs break monthly and you don’t know which step broke until the client emails to ask why nothing’s happened in two weeks. Then your COO spends a Thursday afternoon reconstructing what was supposed to happen against what did, and the answer is always the same — somebody forgot to forward something to somebody else.

Anya — the founder from post 2, running the thirty-five-person consultancy out of a Stockholm warehouse — pulled up her client-intake SOP last week. Twenty-three steps. Seven accurate. Eleven wrong because the firm changed and nobody updated the doc. Five that read fine on the page but had migrated into people’s heads, performed differently now by every senior associate.

The doc was fiction. The SOP was real. It just lived nowhere you could point at. The Confluence page is an artefact from when SOPs had to be documents because there was no other place for them to live. There is now.

Why SOP-as-document was always a lie

Humans don’t follow documents. They follow the last person who showed them how.

This isn’t laziness — it’s a structural fact about how knowledge moves through organisations. The Confluence page is a frozen snapshot. The actual SOP is whatever your senior associate did last time the case came up, plus whatever your junior remembered from the hand-over, minus whatever got lost when the senior left in March and nobody backfilled the explanation.

The doc is maintained for auditors, not operators. Look at its change history. Updates twice a year — once when a regulator asks, once when somebody flinches during onboarding and writes a more honest version that gets quietly merged back. Between those events, doc and work drift apart, and the gap is where hand-off failures live.

The doc is one artefact; the SOP is a graph. Some steps depend on other steps. Some branches only fire under certain conditions. Some steps are decisions, not actions. Documents flatten graphs into sequential prose, which is why you’ve all read SOPs that say “if X, do Y, otherwise see appendix B” and felt the page strain against the shape of the work.

So you carry two SOPs, the one you tell people about and the one that actually runs the firm, and reconcile them whenever someone outside asks. You won’t solve this with a better wiki. The SOP wants to be a running thing, and you’ve been forcing it to be a read thing. For forty years that was the best you could do; the cost (drift, dual-maintenance, hand-off failures, the Thursday-afternoon reconstructions) was the price of operating at scale.

The price isn’t the price anymore.

The ghost of a document fading on the left side of the frame — text dissolving into faint pixels — while on the right a single bright golden thread emerges and flows out into open space. The doc is dying; the flow is alive.

What an SOP-as-flow looks like

Walk through Anya’s client-intake SOP — the actual one, the one that runs.

A lead arrives — referral email, contact-form submission, inbound call her commercial director took at a conference. The flow starts the same way regardless of channel — trigger fires, substrate gathers what it knows about the prospect, first step runs.

Step one: conflicts check. Deterministic. The substrate cross-references the prospect’s parent, subsidiaries, and named principals against the active client list, conflict register, and former-client database. Output is yes/no with a reasoned trail. No human required for the modal case — the rule is mechanical, the audit trail writes itself. If conflicts surface, the flow pauses and pages the conflict partner with specifics.
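As a rough illustration of why this step needs no judgment, here is a minimal sketch of a conflicts check as a set lookup that writes its own trail. The register names, the company names, and the `normalise` rule are hypothetical; real entity matching would need more than case and whitespace folding.

```python
from dataclasses import dataclass, field


@dataclass
class ConflictResult:
    conflicted: bool
    trail: list[str] = field(default_factory=list)  # one line per match found


def normalise(name: str) -> str:
    # Case- and whitespace-insensitive comparison (an assumption, not a full matcher).
    return " ".join(name.lower().split())


def conflicts_check(prospect_entities: list[str], registers: dict[str, set[str]]) -> ConflictResult:
    """Cross-reference every prospect entity against every register."""
    normalised = {reg: {normalise(n) for n in names} for reg, names in registers.items()}
    trail = []
    for entity in prospect_entities:
        key = normalise(entity)
        for register_name, names in normalised.items():
            if key in names:
                trail.append(f"{entity!r} matches {register_name}")
    return ConflictResult(conflicted=bool(trail), trail=trail)


# Hypothetical data: parent, subsidiaries and principals all go in one list.
registers = {
    "active clients": {"Nordlys AB"},
    "conflict register": {"Halvorsen Holdings"},
    "former clients": set(),
}
clear = conflicts_check(["Acme Parent", "Acme Sub"], registers)
hit = conflicts_check(["Acme Parent", "halvorsen  holdings"], registers)
```

When `hit.conflicted` is true, the flow would pause and page the conflict partner with `hit.trail` as the specifics.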

Step two: engagement-letter draft. Judgment, with a deterministic gate around it. The substrate drafts the letter from the lead’s intake notes, the firm’s standard template, and the senior partner’s annotations on similar engagements from the prior six months. This is the AI step — the part that requires understanding context and producing prose. But it doesn’t ship anywhere yet. The draft passes through a deterministic check — required clauses present, no PII in the wrong section, no figures outside the partner-approved range — then routes to the partner for sign-off alongside the receipts.

This pattern (AI step, deterministic gate, human review) is the spine of every flow that does serious work. We wrote about why the best AI systems combine judgment and enforcement at the architectural level; this is what it looks like applied to one business process.
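The engagement-letter step can be sketched as that three-part spine. Everything named here is an assumption for illustration: the clause list, the fee range, the crude email regex standing in for a PII detector, and the choice to send a failing draft back rather than on to the partner.

```python
import re
from dataclasses import dataclass

REQUIRED_CLAUSES = ("scope of work", "fees", "limitation of liability")  # hypothetical
FEE_RANGE = (10_000, 250_000)  # hypothetical partner-approved range
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")  # crude stand-in for a PII detector


@dataclass
class GateVerdict:
    passed: bool
    reasons: list[str]


def gate(draft: str, fee: int) -> GateVerdict:
    """Deterministic checks only: same draft in, same verdict out."""
    reasons = []
    lowered = draft.lower()
    for clause in REQUIRED_CLAUSES:
        if clause not in lowered:
            reasons.append(f"missing clause: {clause}")
    if EMAIL.search(draft):
        reasons.append("possible PII: email address")
    if not FEE_RANGE[0] <= fee <= FEE_RANGE[1]:
        reasons.append(f"fee {fee} outside approved range")
    return GateVerdict(passed=not reasons, reasons=reasons)


def run_letter_step(draft_fn, notes: str, fee: int) -> dict:
    draft = draft_fn(notes)     # judgment: the AI step (any callable here)
    verdict = gate(draft, fee)  # enforcement: plain code
    # Assumption: only a passing draft is routed to the partner.
    status = "awaiting_partner_signoff" if verdict.passed else "rejected_by_gate"
    return {"status": status, "draft": draft, "verdict": verdict}


ok = run_letter_step(
    lambda notes: "Scope of work: audit. Fees: fixed. Limitation of liability: capped.",
    "intake notes", 50_000)
bad = run_letter_step(lambda notes: "Fees: fixed. Contact jane@example.com", "intake notes", 5)
```

The verdict travels with the draft, so the partner sees what was checked as well as what was written.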

Step three: KYC. Mixed. Deterministic checks first — sanctions list, beneficial ownership thresholds, jurisdiction flags. Then AI to classify the uploaded documents (incorporation papers, ID scans, org charts), extract the fields the rules need, flag anomalies. Classification is judgment; the threshold check on the extracted fields is deterministic. The AI never decides whether the client passes — it decides what the documents are, after which mechanical rules decide whether they’re sufficient.
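A sketch of that division of labour, assuming the classifier has already produced a document type and a dictionary of extracted fields. The required document set, the 25% ownership threshold, the jurisdiction code, and the sanctions entry are all hypothetical placeholders.

```python
from dataclasses import dataclass

REQUIRED_TYPES = {"incorporation", "id_scan", "org_chart"}  # hypothetical
OWNERSHIP_THRESHOLD = 0.25       # hypothetical beneficial-ownership threshold
FLAGGED_JURISDICTIONS = {"XX"}   # hypothetical placeholder code
SANCTIONED = {"Blocked Person"}  # hypothetical list entry


@dataclass
class ClassifiedDoc:
    doc_type: str  # label chosen by the AI classifier
    fields: dict   # fields the AI extracted for the rules


def kyc_rules(docs: list[ClassifiedDoc]) -> tuple[bool, list[str]]:
    """Mechanical pass/fail over extracted fields; no model is consulted here."""
    findings = []
    missing = REQUIRED_TYPES - {d.doc_type for d in docs}
    for doc_type in sorted(missing):
        findings.append(f"missing document: {doc_type}")
    for doc in docs:
        if doc.fields.get("jurisdiction") in FLAGGED_JURISDICTIONS:
            findings.append(f"flagged jurisdiction: {doc.fields['jurisdiction']}")
        for owner, share in doc.fields.get("owners", {}).items():
            if share >= OWNERSHIP_THRESHOLD and owner in SANCTIONED:
                findings.append(f"sanctioned beneficial owner: {owner}")
    return (not findings, findings)


docs_ok = [
    ClassifiedDoc("incorporation", {"jurisdiction": "SE", "owners": {"A. Founder": 0.6}}),
    ClassifiedDoc("id_scan", {}),
    ClassifiedDoc("org_chart", {}),
]
docs_bad = [ClassifiedDoc("incorporation", {"jurisdiction": "XX", "owners": {"Blocked Person": 0.3}})]
passed_ok, findings_ok = kyc_rules(docs_ok)
passed_bad, findings_bad = kyc_rules(docs_bad)
```

The AI's output is only ever an input to `kyc_rules`; the pass/fail decision never leaves plain code.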

Step four: engagement memo to file. Deterministic. Templates, references, audit log. Nobody reads it unless something goes wrong, in which case the structure is exactly what the regulator needs. No human touches this step.

Step five: kick-off scheduling. Deterministic, with one judgment input. The substrate identifies the partner-track engagement lead by sector and capacity, drafts the invite, suggests a slot, waits for the partner to accept or override. No response in four hours, page the COO; another four hours, auto-reserve at the partner’s first open block.
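The timeout ladder in this step is simple enough to write down directly. A minimal sketch, with the function name and the action strings invented for illustration:

```python
from datetime import datetime, timedelta

PAGE_COO_AFTER = timedelta(hours=4)
AUTO_RESERVE_AFTER = timedelta(hours=8)  # four hours, then another four


def next_action(invited_at: datetime, now: datetime, responded: bool) -> str:
    """Decide what the scheduler should do about an outstanding kick-off invite."""
    if responded:
        return "done"
    waited = now - invited_at
    if waited >= AUTO_RESERVE_AFTER:
        return "auto_reserve_first_open_block"
    if waited >= PAGE_COO_AFTER:
        return "page_coo"
    return "wait"


t0 = datetime(2025, 1, 6, 9, 0)
actions = [next_action(t0, t0 + timedelta(hours=h), responded=False) for h in (1, 4, 8)]
```

Because the rule is a pure function of two timestamps and a flag, the same inputs always produce the same escalation.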

Five steps. Three deterministic, one judgment-with-gate, one mixed. Two human touch-points totalling under four minutes of partner time per intake. Today the same intake takes ninety minutes of partner time spread across three days, plus two hours of ops support, plus the inevitable Thursday-afternoon reconciliation when something didn’t get forwarded.

A structured flow rendered as luminous shapes — distinct zones for deterministic execution (regular geometric pulses), AI judgment (swirling brighter clusters), and human-only review (one held still bright form) — connected by golden hand-off threads.

The audit trail isn’t a separate exercise. Every step records what it did, what gate fired, what the human decided, how long it took. The receipt for the engagement is the run record. When the regulator asks “how did you onboard this client,” you don’t reconstruct — you show them. When something goes wrong, you point at the exact step and the exact gate verdict. Not a Slack search. A click.

The new vocabulary

Three terms, because the old vocabulary doesn’t quite cover this.

Judgment points — the moments inside an otherwise deterministic flow where AI makes a non-deterministic call. Drafting the engagement letter. Classifying an incorporation document. Triaging the inbound RFP. These are the bucket-2 moments from post 3. In the old SOP, judgment points were where humans had to be in the loop; in the new SOP, they’re where the substrate picks up the work and gates check the output before anyone downstream sees it.

Escalation surfaces — where a human gets paged when judgment is uncertain, when a hard rule fires, or when a step needs sign-off before it leaves the building. Not an inbox. Not a Slack DM. A purpose-built console where the page arrives with the smallest possible question, relevant context attached, structured response — approve, reject, modify, abort. This replaces the meeting-as-synchronization-barrier from post 2. The substrate aligns continuously; the human is paged exactly once, exactly when judgment is needed.

Audit-trail-by-default — every step records what happened and why, with no operator having to remember to log anything. The trail isn’t generated after the fact by someone summarising what they did; it’s emitted during the fact by the engine that did it. Logging stops being a discipline and becomes a property.
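One way to picture logging as a property rather than a discipline: the only way to run a step is through a wrapper that records it. This is a sketch under that assumption; the record fields and the `run_step` name are illustrative, not the engine's actual schema.

```python
import time
from typing import Any, Callable


def run_step(trail: list[dict], name: str, fn: Callable[[], Any]) -> Any:
    """Run fn and append a record to trail whether it succeeds or fails."""
    started = time.monotonic()
    record = {"step": name, "status": "ok", "error": None}
    try:
        return fn()
    except Exception as exc:
        record["status"] = "failed"
        record["error"] = repr(exc)
        raise
    finally:
        # Runs on both paths, so no step can finish without leaving a record.
        record["duration_s"] = time.monotonic() - started
        trail.append(record)


trail: list[dict] = []
result = run_step(trail, "conflicts_check", lambda: "clear")
try:
    run_step(trail, "kyc", lambda: 1 / 0)
except ZeroDivisionError:
    pass
```

The step author never calls a logger; the record exists because the step ran.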

Judgment points are where the AI earns its keep. Escalation surfaces are where the human earns theirs. The audit trail is what makes both legally tenable.

Here is what we built

We’ve been building this for the last year. Here’s what’s shipped today, and what we’ll add next. The line matters; we won’t blur it.

The core is a workflow engine that runs deterministic, AI-judgment, and human-in-the-loop steps in one DAG. Ten step types live in the engine today — condition, llm, approval (HITL pause), api_call, connector, code, delay, event, platform, research. A workflow declares its steps, dependencies, retries, timeouts, and a budget cap. The executor runs it to completion — suspending and resuming on human decisions, retrying transient failures, failing closed when the budget is exhausted. Today it schedules ready steps in topological order; concurrent dispatch of independent branches is on the runway, not shipped.
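To make the execution model concrete, here is a toy executor that runs steps sequentially in topological order and fails closed on a budget cap. It is a sketch using Python's standard `graphlib`, not the engine described above: the step names, costs in cents, and the `run_workflow` signature are assumptions, and retries, timeouts, and suspension are left out.

```python
from graphlib import TopologicalSorter


class BudgetExhausted(RuntimeError):
    pass


def run_workflow(steps, deps, budget_cents):
    """steps: name -> (cost_cents, fn(results)); deps: name -> set of prerequisites."""
    graph = {name: deps.get(name, set()) for name in steps}
    results, spent = {}, 0
    # static_order() yields each step only after all of its prerequisites.
    for name in TopologicalSorter(graph).static_order():
        cost, fn = steps[name]
        if spent + cost > budget_cents:
            # Fail closed: stop before running a step the budget cannot cover.
            raise BudgetExhausted(f"{name} needs {cost}, {budget_cents - spent} left")
        results[name] = fn(results)
        spent += cost
    return results, spent


steps = {
    "conflicts": (0, lambda r: "clear"),
    "draft": (40, lambda r: f"letter (conflicts: {r['conflicts']})"),
    "memo": (0, lambda r: f"memo for {r['draft']}"),
}
deps = {"draft": {"conflicts"}, "memo": {"draft"}}
results, spent = run_workflow(steps, deps, budget_cents=100)
```

Running the same workflow with `budget_cents=10` raises `BudgetExhausted` at the `draft` step, before any spend occurs.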

Deterministic gates around every AI step. Every step’s output passes through a filtering handler that runs the firm’s inbound security mesh — regex, PII detector, secret scanner, threat classifier — before the next step is allowed to read it. PII or secrets that an AI step accidentally surfaces get caught by deterministic code, not by another AI. The gate’s verdict is recorded on the step. This isn’t a feature we’re going to add; it’s in the path right now, on every output, every run.
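A stripped-down illustration of an output filter sitting between two steps. The three patterns are stand-ins chosen for the sketch (an AWS-style access key ID shape, a PEM private-key header, an email address); the real mesh described above has more scanners than regexes, and `hand_off` is a hypothetical name.

```python
import re

SCANNERS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "email_address": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
}


def filter_output(text: str) -> dict:
    matched = [name for name, pattern in SCANNERS.items() if pattern.search(text)]
    return {"allowed": not matched, "matched": matched}


def hand_off(step_record: dict, text: str) -> str:
    """Record the verdict on the step, then release the text only if it is clean."""
    verdict = filter_output(text)
    step_record["gate_verdict"] = verdict
    if not verdict["allowed"]:
        raise PermissionError(f"output blocked: {verdict['matched']}")
    return text


clean_record, leaky_record = {}, {}
released = hand_off(clean_record, "Kick-off is Tuesday.")
try:
    hand_off(leaky_record, "key=AKIAABCDEFGHIJKLMNOP")
    blocked = False
except PermissionError:
    blocked = True
```

The verdict is written to the step record before the exception is raised, so a blocked output still leaves a trace of which pattern matched.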

Per-workflow encryption. Each workflow lives in its own encrypted store, keyed to the user who owns it. Specs, instances, results — all encrypted at rest with a per-workflow key derived from a service master key. The engine never reads across that boundary. Per-user is the floor; per-workflow is what we actually do.
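For readers who want a feel for what "a per-workflow key derived from a service master key" can mean, here is one possible derivation using HMAC-SHA256 from the standard library. This is an assumption about the general technique, not a description of the actual scheme; a production system would more likely use a vetted KDF such as HKDF, and the label format is invented.

```python
import hashlib
import hmac


def derive_workflow_key(master_key: bytes, user_id: str, workflow_id: str) -> bytes:
    """Derive a 32-byte key bound to one user and one workflow.

    Assumes the ids never contain the '|' separator.
    """
    info = f"workflow-key|{user_id}|{workflow_id}".encode()
    return hmac.new(master_key, info, hashlib.sha256).digest()


master = b"\x00" * 32  # placeholder only; a real master key comes from a secret store
key_a = derive_workflow_key(master, "user-1", "wf-a")
key_b = derive_workflow_key(master, "user-1", "wf-b")
```

The same inputs always yield the same key, and two workflows owned by the same user still get different keys, which is the property the per-workflow boundary relies on.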

HITL escalation surfaces with four request modes. When a flow needs a human, it doesn’t email them. It posts a card to a console mounted globally in the operator’s interface, smallest-possible question and relevant context attached. Four modes today — approval checkpoint (engagement-letter sign-off), budget extension, guidance request (substrate is uncertain), tool authorisation (a step wants a sensitive capability). If the operator’s WebSocket disconnects mid-decision, the console re-fetches pending pages on reconnect — no orphaned approvals on a network blip.
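The reconnect behaviour follows from where the state lives. A minimal sketch, assuming pending pages are held server-side and the client simply asks for them again; the `Console` class and its method names are hypothetical, and only the four mode names come from the description above.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import count
from typing import Optional


class Mode(Enum):
    APPROVAL_CHECKPOINT = "approval_checkpoint"
    BUDGET_EXTENSION = "budget_extension"
    GUIDANCE_REQUEST = "guidance_request"
    TOOL_AUTHORISATION = "tool_authorisation"


@dataclass
class Page:
    page_id: int
    operator: str
    mode: Mode
    question: str
    context: dict
    decision: Optional[str] = None


class Console:
    def __init__(self):
        self._pages = {}
        self._ids = count(1)

    def post(self, operator, mode, question, context):
        page = Page(next(self._ids), operator, mode, question, context)
        self._pages[page.page_id] = page
        return page

    def pending_for(self, operator):
        # Called on first connect and again on every reconnect.
        return [p for p in self._pages.values()
                if p.operator == operator and p.decision is None]

    def decide(self, page_id, decision):
        page = self._pages[page_id]
        if page.decision is not None:
            raise ValueError("page already decided")
        page.decision = decision
        return page


console = Console()
letter = console.post("partner-1", Mode.APPROVAL_CHECKPOINT, "Sign off this letter?", {"client": "Acme"})
budget = console.post("partner-1", Mode.BUDGET_EXTENSION, "Extend budget?", {"spent_cents": 90})
console.decide(letter.page_id, "approve")
after_reconnect = console.pending_for("partner-1")
```

A dropped socket loses nothing because the undecided page was never stored in the connection.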

Triggers that start flows the same way. Cron, webhook, email-inbound, file-upload, event. Five trigger types, one execution path. A duplicate webhook fire — the kind that happens whenever an upstream provider’s at-least-once delivery hiccups — gets deduplicated in Redis before any downstream work runs, so the SOP doesn’t run twice for the same event. (Trigger persistence across restarts depends on a service key; production sets it, dev defaults to ephemeral.) Every fire — succeeded, failed, or skipped-as-duplicate — is recorded to an encrypted history log with a seven-day TTL.
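Deduplication of this kind usually comes down to an atomic "set if absent, with expiry" on a key built from the event. The sketch below uses an in-memory stand-in so it runs anywhere; in Redis the equivalent primitive is `SET key value NX EX seconds`. The key format and the one-hour dedup window are assumptions, and the window is unrelated to the seven-day history TTL mentioned above.

```python
import time


class DedupStore:
    """In-memory stand-in for an atomic set-if-absent with expiry (not thread-safe)."""

    def __init__(self, clock=time.monotonic):
        self._expiry = {}
        self._clock = clock

    def first_time(self, key: str, ttl_s: float) -> bool:
        now = self._clock()
        expires_at = self._expiry.get(key)
        if expires_at is not None and expires_at > now:
            return False  # seen recently: this fire is a duplicate
        self._expiry[key] = now + ttl_s
        return True


def handle_fire(store, history, trigger_type, event_id, run):
    key = f"{trigger_type}:{event_id}"
    if not store.first_time(key, ttl_s=3600):
        history.append({"key": key, "outcome": "skipped_duplicate"})
        return
    try:
        run()
        history.append({"key": key, "outcome": "succeeded"})
    except Exception as exc:
        history.append({"key": key, "outcome": "failed", "error": repr(exc)})


runs, history = [], []
store = DedupStore()
handle_fire(store, history, "webhook", "evt-1", lambda: runs.append("evt-1"))
handle_fire(store, history, "webhook", "evt-1", lambda: runs.append("evt-1"))
```

The second fire is recorded in the history but never reaches `run`, which is the behaviour the paragraph above describes.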

Five operational templates shipped. Not hypothetical. Compliance-check, ingest-batch-docs, process-webhook-data, refresh-stale-reports, weekly-memory-summary — all in the templates directory today, parameterised so the operator instantiates without editing YAML, running the same engine that runs every other flow.

Audit trail. Part shipped, part on the runway. The engine records every step’s outcome, attempt count, cost, error, and gate verdict in the per-workflow encrypted store. The operator console surfaces the run timeline today — status, duration, output, error. The full per-step gate trail — here’s why the inbound filter blocked this output, here’s the exact pattern that matched — is what we’re surfacing next. Data recorded; UI catching up.

On the runway, future tense. Concurrent dispatch of independent branches: today the executor processes ready steps sequentially even when dependencies would allow parallelism. The document-search adapter is currently a stub; flows that search the user’s knowledge base as a step will work in the next milestone. The full audit-trail UI is the next operator-console iteration.

That’s the line. Above it is running. Below it is coming. We’re being precise because the alternative — telling you everything works today and hoping you don’t notice — is the demo culture that produced the Confluence-SOP problem.

Many soft amber paths flowing horizontally at low luminosity — calm, steady, busy — with one path lit bright gold where the flow has paused and a luminous human-warm form is approaching it. The system runs; the human is paged.

What the CEO actually does with this

The point isn’t to deploy a piece of software. The point is the lens.

Pick one of your SOPs. The most-broken one, where hand-offs fail monthly and your COO does the Thursday-afternoon reconstruction. Walk through it step by step and label each one deterministic, judgment, or human-only. You did this exercise for roles in post 3; now you’re doing it for steps. The SOP is where roles meet processes.

Most of the steps that fail are deterministic ones performed by a senior person. They were attached to that person for one reason only: to make sure the judgment steps right next to them got done. Separate them. Let the deterministic steps run as code, the judgment steps run as gated AI, the human-only steps sit on an escalation surface where they get touched once and resolved.

What falls out is not a software requisition. It’s a different shape for your firm. The twenty-seven-step Confluence document collapses into five real steps and three judgment points. Headcount plan changes shape. Ops support becomes mostly unnecessary. Senior partners stop spending Thursdays reconstructing hand-offs and start spending them on bucket-3 work — relationships, taste, accountability that nobody else can have on their behalf.

The Confluence page can stay. Auditors still want it. Just don’t pretend it’s the SOP anymore. The SOP is the flow that ran last Tuesday at 9:14am, took three minutes of partner time, produced an engagement letter that passed every deterministic gate, and wrote its own receipt.

What to do now

1. Pick the one SOP that breaks the most. Not the most important; the most broken. Hand-offs that fail. Reconciliation Thursdays your COO dreads. That’s where the gap between document and reality is widest, which means it’s where the substrate has the most to do.
2. For every step, label it deterministic, judgment, or human-only. Force the call. Same exercise from post 3, applied to steps instead of roles. First pass takes an hour; second pass, where you correct the items you flinched on, takes twenty minutes.
3. Highlight every hand-off that’s human-to-human just to “pass it along.” Those are the steps that break. Anya’s old client-intake SOP had nine; the new flow has zero, because the substrate is the routing layer.
4. Identify the judgment points. The moments where someone with experience makes a call. Those are the only places you actually need a human in the loop. Most SOPs have two or three. None have twenty-seven.
5. Draw the SOP again as a flow. Trigger at the top, deterministic steps where they belong, judgment points wired to gates, escalation surfaces where humans get paged. Notice how short it gets. Notice how the audit trail stops being a separate document and becomes a property of the run.
6. Compare the new flow to the headcount it implies. Pull post 3’s bucket exercise out again. The SOP collapsed into a flow needs a different team than the SOP staffed as a document. That’s your 2027 hiring plan.
7. Keep the Confluence page. Auditors and regulators still want it. Just stop pretending it’s how the work actually gets done. The flow is the SOP; the page is the artefact. Reconcile them once a quarter, not every Thursday.
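If it helps to do the labelling exercise in something other than a spreadsheet, a few lines are enough to tally it. The steps below are a hypothetical fragment of an old SOP, not Anya's actual document.

```python
from collections import Counter

# Hypothetical fragment: (step, label, is_pass_along_hand_off)
sop_steps = [
    ("check conflict register", "deterministic", False),
    ("forward result to partner", "deterministic", True),
    ("draft engagement letter", "judgment", False),
    ("partner signs off on letter", "human-only", False),
    ("forward signed letter to ops", "deterministic", True),
    ("file engagement memo", "deterministic", False),
]

labels = Counter(label for _, label, _ in sop_steps)
pass_alongs = [name for name, _, hand_off in sop_steps if hand_off]
judgment_points = [name for name, label, _ in sop_steps if label == "judgment"]
```

The `pass_alongs` list is the set of steps the routing layer would absorb; `judgment_points` is where gates and escalation surfaces get wired in.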

Last post in the series. Across the four, the shape is one shift told four ways — cell stops being the unit of thought, desktop stops being the unit of work, org chart stops being the right shape, SOP stops being a document. Business is moving from artefacts humans push around to flows that run themselves and page humans only when judgment is actually required.

Some shipped today, some on the runway, all concrete enough to plan against. Pick your most-broken SOP. Start there.