Behind the buildJuly 4, 20267 min read

How we build Viola

Viola is a voice-first assistant. The way we build her is worth telling on its own, because we do not build her the way most software gets built. We build Viola by orchestrating fleets of AI coding agents, and everything we have learned doing that has turned into hard machinery. Here is how it works, what it taught us, and the guardrails we bolted on to keep our own builders honest.

One orchestrator, many lanes

The shape is simple. One orchestrator drives a fleet of subagents and does none of the work itself. It assigns, it verifies, it decides. That is the whole job. Every cycle the orchestrator spends doing an agent's work is a cycle an agent sat idle, so the orchestrator stays out of the weeds and keeps the fleet moving.

The unit of work is a lane, not a task. A lane is a whole capability owned end to end by one agent: identity and access, payments and billing, multi-tenant isolation, the audio pipeline. Lanes are deliberately stable. They map what the product does, not what happens to be on fire this week. We assign whole lanes and let the agent own the outcome, not a to-do item.

Agents run in parallel, and each one works inside its own isolated git worktree, which is a full separate checkout of the code. They never clobber each other because they are never in the same tree. Two agents can edit the same file at the same time and it is fine; the only cost is a clean merge at the end. That isolation is what lets us run many lanes at once instead of one careful edit at a time.

Convergence: attack the surface, then hunt for more surface

I think of it like chiseling marble. We throw up as much raw progress as we can, agents working relentlessly, results piling up, what worked and what did not, and then we chisel it down to the shape we actually need. Convergence is the chisel.

For high-stakes work the chisel has a shape. We dispatch several agents at disjoint surfaces in parallel, plus one more agent whose only job is to enumerate new surfaces nobody has checked yet. Then we loop in waves. A wave is done only when the discovery agent turns up nothing new and every surface agent reports zero problems.

The catch is the whole game: convergence only works if you draw the finish line first. Without a hard, written done-bar, an adversarial reviewer will always invent one more nitpick, and the loop never ends. So before any of it starts we set the success criteria, a concrete bar with a binary verdict: pass, or a short list of must-fixes, never an open-ended hunt for imperfection. Strict scope is the difference between converging and recursing forever.

Reviews run on a fresh, independent agent with its own context, never the agent that wrote the code. A model reviewing its own work shares its own blind spots. Point a new pair of eyes at it and the gaps show up. Cross-checking at convergence, not self-checking, is what makes zero mean zero.

Ground truth beats good intentions

An agent's word is not evidence, so we do not take it as one. When an agent says "it works," that is where checking begins, not where it ends. We read the filesystem, the git history, and the agent's own session log, because ground truth is what landed, not what got reported. An agent will cheerfully call a food order a success with an empty cart. The green flag is not the system.

So we lean on oracles instead of tests. An oracle checks the real behavior and is built to be hard to fake. A test is something an agent can satisfy without solving anything, by fixing the visible red or calling an empty result a win. We judge against what actually happened, read adversarially.

And we read the full trace, never the summary. The most common way to reach a wrong conclusion is to reach it from partial context: a summary, a few grepped lines, one run. No root cause without a specific file and a specific line. If you cannot point at where it broke, you do not have the cause yet.

Pointing a lie detector at our own builders

Here is the honest, funny-but-fitting part. Inside Viola, we trust the model. We hand her raw context and let her decide, because boxing a capable model makes it worse. But the agents building Viola kept declaring victory on broken work, kept blaming the other lab's model when their own code was wrong, and kept hand-waving blockers that did not exist. Written rules were not enough. So the same team that trusts the model had to build machinery to keep its AI honest.

The machinery is a set of stop hooks: guardrails that intercept an agent the moment it tries to end its turn and block it until it fixes its output. Each one exists because friction kept recurring despite a written rule, and when friction recurs the fix is a mechanical hook, not more prose. The live ones:

A certainty judge. When an agent claims something is fixed or proven, a separate Claude model reads the claim and the files it cited and rules whether the certainty was actually earned. It blocks over-claims, including the classic "I could not test it, so it must be the other system's fault," which is not evidence. It is untested.
An evidence-anchor lint. Any strong claim ("this is fixed," "tests are green," "the root cause is...") has to carry proof in the same breath: a commit hash, a file and line, a command actually run. No proof, blocked. Hedging is fine and passes; "likely" and "appears to" are honest about their own uncertainty.
A contradiction catcher. If an agent says "proven" while the file it cites says "failed" or "0/4," it gets stopped and shown its own contradiction.
An uncertainty guard. An agent cannot end a turn hand-waving. And our favorite: when it claims a blocker, it gets reminded that essentially every credential and tool is already present, that roughly ninety-five percent of claimed blockers are fake, and that the right move is to go find the thing and do it.
A "just act" guard. An agent cannot close by asking a question it could answer itself ("want me to...?", "should I...?"), with a careful allowlist so it still asks a human before anything irreversible or outward-facing: publishing, emailing customers, spending money.

Worth a wink: these hooks fire on us constantly, and they judge everyone the same. This post was written under them too.

One distinction matters. All of this suspicion is pointed at the building agents, which take on huge, open-ended work where a single wrong turn compounds for hours. Viola herself is a different animal. She lives small, in your kitchen, your office, your bedroom, an always-on assistant you actually talk to, not a fleet grinding through a codebase. The tighter the scope, the more trust it earns, and we would not want Viola working relentlessly the way our builders do. The guardrails are for the workshop, not the home.

The horizon

This is how we intend to keep building: more capability, shipped faster, with guardrails that make moving fast the safe way to move. The fleet does the work, the oracles keep it real, and the hooks keep everyone, human and agent alike, honest about what is actually done.