Autonomous agents & workflow intelligence.

Most agentic workflows work in a demo and fall over in production. The work is making them honest about what they don't know, so they act only where they should, and hand back when they shouldn't.

The gap.

The gap between demo and production is where every real deployment lives, and it is wider than the demos suggest.

The problem is not that agents can't do the work. It's that businesses can't yet trust them to do it unsupervised. Five separate things stand in the way, and they reinforce each other.

Where it breaks.

01. Reliability under real conditions.
Demos run on clean inputs. Production does not. Real workflows bring partial data, systems that time out, formats nobody planned for, and edge cases that were never scripted. An agent that succeeds ninety-five times in a hundred sounds dependable until it runs ten thousand times a day, which is five hundred failures a day. And because steps chain together, a small error early becomes a large one by the end. Most teams cannot say in advance which cases will fail, or why.

02. Visibility into the decision.
When an agent acts, the business often cannot see how it got there, or whether it was working inside its competence or guessing past it. Without that, the decision cannot be audited, defended to a regulator, or improved. “The model decided” is not an answer a risk committee will accept.

03. The cost of being wrong is not symmetric.
A poor product suggestion is cheap. A wrong payment release, a wrong credit decision, or a wrong fraud block on a genuine customer is not. The more an action is worth, the higher the bar, and that is exactly where agents are least trusted to act alone.

04. Knowing when to stop.
A dependable agent has to recognise the edge of its own competence and hand back to a person rather than guess. Most do not. They answer with the same confidence whether or not they should, which is worse than not answering at all.

05. Integration and accountability.
Agents have to plug into systems built for human operators, under rules that assume a human is responsible for the outcome. Who answers for it when an agent gets it wrong is still an open question in most organisations.
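
The arithmetic behind the first point can be made concrete. What follows is a minimal sketch, not a reliability model: the function names are illustrative, the ten-step chain is a hypothetical example, and the compounding calculation assumes every step fails independently at the same rate, which real workflows rarely do.

```python
def expected_failures(success_rate: float, runs_per_day: int) -> float:
    """Expected number of failed runs per day at a given per-run success rate."""
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must be between 0 and 1")
    return (1.0 - success_rate) * runs_per_day


def chain_success(step_success_rate: float, steps: int) -> float:
    """Probability that every step in a chain succeeds.

    Assumption: steps fail independently and share the same success rate.
    """
    if not 0.0 <= step_success_rate <= 1.0:
        raise ValueError("step_success_rate must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return step_success_rate ** steps


# The figures from the text: 95% success, ten thousand runs a day.
daily_failures = expected_failures(0.95, 10_000)

# Hypothetical: the same 95% applied to each of ten chained steps.
end_to_end = chain_success(0.95, 10)
```

Under these assumptions `daily_failures` comes to about 500, and `end_to_end` to about 0.60: a chain of ten steps, each of which is right ninety-five times in a hundred, finishes cleanly only about six times in ten.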

What we work on.

The common thread is confidence, not capability. The work that matters is not making agents more powerful. It's making them honest about how certain they are, holding them to act only within those limits, and handing control back to a person at the edge.

That is the problem we work on: agents that carry a measure of their own certainty, that act when it is high and escalate when it is not, and that leave a record of why. It draws on the same ground as our work in decision systems and model risk: calibration, uncertainty, and judgement under pressure.
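
One way to picture the pattern is a gate that sits between the agent and the action. This is a sketch under stated assumptions, not a description of any particular system: the `Decision` record, the `gate` function, the stakes tiers, and the threshold values are all hypothetical, and the confidence score is assumed to arrive already calibrated, which is the hard part and is not shown here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List

# Hypothetical thresholds: the higher the stakes, the higher the bar.
THRESHOLDS = {"low": 0.80, "medium": 0.95, "high": 0.99}


@dataclass(frozen=True)
class Decision:
    """An audit record of what was decided and why."""
    action: str
    stakes: str
    confidence: float
    threshold: float
    outcome: str  # "act" or "escalate"
    reason: str
    timestamp: str


def gate(action: str, confidence: float, stakes: str, log: List[Decision]) -> Decision:
    """Act only when confidence clears the bar for these stakes; otherwise escalate.

    Assumption: `confidence` is a calibrated probability in [0, 1].
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if stakes not in THRESHOLDS:
        raise ValueError(f"unknown stakes tier: {stakes!r}")

    threshold = THRESHOLDS[stakes]
    clears_bar = confidence >= threshold
    outcome = "act" if clears_bar else "escalate"
    comparison = ">=" if clears_bar else "<"
    reason = (
        f"confidence {confidence:.2f} {comparison} "
        f"threshold {threshold:.2f} for {stakes}-stakes action"
    )

    decision = Decision(
        action=action,
        stakes=stakes,
        confidence=confidence,
        threshold=threshold,
        outcome=outcome,
        reason=reason,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
    log.append(decision)  # every decision leaves a record, whichever way it goes
    return decision


audit_log: List[Decision] = []
suggestion = gate("suggest_product", 0.85, "low", audit_log)
payment = gate("release_payment", 0.85, "high", audit_log)
```

With the same confidence of 0.85, the low-stakes suggestion goes ahead and the high-stakes payment is handed to a person, and both outcomes are written to the log with the reason attached. The design choice worth noting is that the threshold depends on the cost of being wrong, not on the agent.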

An agent that knows its limits is worth more than one that doesn't, even when the second one is more capable. Especially then.