Build, buy, or borrow
Every product is a stack of decisions about who does the work. Some of it you build, because nobody has built the thing you actually need. Some of it you buy or rent, because the work is real, ongoing, and not yours to own. And some of it you borrow, because brilliant people already solved the hard part and gave it away. Viola is all three, and being honest about which is which is the whole point of this post. So here is the split: what we built, what we pay for, and what we borrowed.
Part one: what we built
The rule we hold is not "never build." It is "do not rebuild what already exists and works." That leaves a short, deliberate list of things that were genuinely ours to make, because nothing off the shelf fit.
- ViolaWake, our own wake-word model. The little listener that answers to "Viola" is trained by us and runs free, locally, on your own CPU. It never leaves your machine and never costs a cent to run.
- One agent loop that trusts the model. Viola has no intent classifiers, no keyword-parsing of your words, no hidden hints injected to steer the answer. The model gets the raw context and is trusted to decide. That is a philosophy we built on purpose, not a framework we adopted. We wrote about the thinking behind it in How We Build Viola.
- The intent waterfall. Roughly three-quarters of what you ask Viola never reaches a paid model at all, because a cheaper local path handles it first. That is a cost-avoidance engine we designed and built, and it is a big reason Viola stays affordable to run.
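For the curious, the shape of a waterfall like that can be sketched in a few lines of Python. This is a minimal illustration, not Viola's actual code: the names `Tier`, `run_waterfall`, `local_cache`, and `paid_model` are all made up for the example, and the local tier is a trivial exact-match cache purely so the sketch runs. This post does not describe how the real local path decides what it can handle.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# A handler returns an answer, or None to pass the request down the waterfall.
Handler = Callable[[str], Optional[str]]


@dataclass(frozen=True)
class Tier:
    name: str
    handle: Handler
    paid: bool = False


def run_waterfall(request: str, tiers: List[Tier]) -> Tuple[str, str]:
    """Return (tier_name, answer) from the first tier that produces an answer."""
    for tier in tiers:
        answer = tier.handle(request)
        if answer is not None:
            return tier.name, answer
    raise LookupError("no tier could handle the request")


# Hypothetical tiers, for illustration only.
_CACHE = {"what time is it": "It is noon."}


def local_cache(request: str) -> Optional[str]:
    # Assumption: a free local path that either answers or steps aside.
    return _CACHE.get(request.strip().lower())


def paid_model(request: str) -> Optional[str]:
    # Assumption: stand-in for a per-token managed model; always answers.
    return f"[paid model answer to: {request}]"


TIERS = [Tier("local", local_cache), Tier("paid", paid_model, paid=True)]


def paid_fraction(requests: List[str], tiers: List[Tier]) -> float:
    """Share of requests that fell through to a paid tier."""
    if not requests:
        return 0.0
    paid_names = {t.name for t in tiers if t.paid}
    hits = sum(1 for r in requests if run_waterfall(r, tiers)[0] in paid_names)
    return hits / len(requests)
```

The ordering is what matters: cheap tiers go first, and a tier that cannot help returns nothing rather than guessing, so the paid tier only sees what is left.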
One honest note in this section: we also hand-built a couple of things we should have borrowed, and later corrected course. That is the reason the rest of this post reads the way it does. Building the wrong thing teaches you fast which things are worth building.
Part two: what we buy and rent
This is where the money goes, and every line of it earns its place.
The server we rent instead of being. Production used to run as Docker containers on the founder's own Windows desktop, tucked behind a tunnel. Under memory pressure the Docker engine wedged, again and again, five times in a single bad day. Each time everything went down. There was no auto-restart and no alert, and one of those 502s was caught only by chance. Renting an always-on cloud server (Hetzner) fixed that entire class of problem at once: it is somebody's whole job to keep the box up, and it is not sitting under a desk fighting other programs for memory. We keep the desktop only as a warm rollback. Renting the box beat being the box.
Inference we rent per token. Viola runs a managed model (gpt-5.4-mini) for the everyday work, and when a job is big enough she hands it to a sub-agent that runs at a higher reasoning level, doing more thinking per turn and costing more to run. Local inference is free on your own CPU and stays the default for anyone who wants it, bring-your-own-key included. For the managed path we pay per token, and the reason is not that tokens are cheap. The reason is that owning enough compute to serve real users is ridiculous for a team our size. The GPUs, the power, and the cooling, at the scale this actually needs, add up to a bill a small team cannot carry. So we rent the compute by the token instead, and the only GPU we ever rent outright is serverless and optional, spun up for a task and gone the moment it finishes.
The phone line we cannot build. You cannot build a phone carrier (we tried), so you buy one. Telnyx is the purest "buy" on the whole list. We really did try hand-rolling the call audio path ourselves, and it failed silently: no audio ever reached the wire. Switching to the carrier's own audio primitive worked on the first real call. Some problems are the vendor's specialty, and phones are theirs. Maybe building our own carrier makes the roadmap one day. Today it is very much not on it.
The edge in front of the origin. Cloudflare is the front door: the tunnel, the CDN, Pages for the website, R2 for storage. We buy it because self-hosting the edge would re-expose the single point of failure the server move was meant to remove. The whole point was to stop being one fragile box; putting our own edge back in front would quietly undo that.
The card rail we deliberately do not touch. We pay Stripe's fees precisely so we never become a custodian of your card. Card data lives with Stripe and never lands with us, which keeps us out of the compliance blast radius entirely. The one payment rail we do self-host is crypto, through BTCPay, because there the tradeoff runs the other way.
Part three: what we borrowed
Most of what you would want to build already exists, built by people who spent years on that one problem and gave it away. Before building any real piece, we go find out whether it already exists and is good. Not a lazy glance, an actual look. Here is who we owe.
How she hears you. Silero VAD tells Viola when someone is actually speaking versus when a fan is running. faster-whisper (running OpenAI's Whisper) turns speech into text. These are her ears, and we wrote neither.
How she speaks. Kokoro is her voice, a small neural text-to-speech model that sounds genuinely warm for its size. Pipecat is the real-time pipeline that stitches hearing, thinking, and speaking together fast enough to hold a live phone call.
How she acts and sees. Playwright drives a real browser, which is how Viola both reads web pages and does things on them. Her tools plug in through the Model Context Protocol, an open standard, so her abilities are a set she can reason about rather than a hard-coded list. We covered all of this as her hands and eyes in Breaking Viola Out of Her Box.
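To make "a set she can reason about" concrete, here is a minimal sketch of a tool registry in Python. It is not the Model Context Protocol SDK and not Viola's code: `Tool`, `ToolRegistry`, and the `add` tool are hypothetical names for illustration. The only thing borrowed from the standard is the rough shape of a tool listing, where each tool carries a name, a description, and a JSON Schema for its input.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Tool:
    name: str
    description: str
    input_schema: Dict[str, Any]  # JSON Schema describing the arguments
    run: Callable[..., Any]


class ToolRegistry:
    """Tools are data the model can read, not branches in our code."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        if tool.name in self._tools:
            raise ValueError(f"duplicate tool: {tool.name}")
        self._tools[tool.name] = tool

    def describe(self) -> List[Dict[str, Any]]:
        # The listing handed to the model so it can choose a tool itself.
        return [
            {
                "name": t.name,
                "description": t.description,
                "inputSchema": t.input_schema,
            }
            for t in self._tools.values()
        ]

    def call(self, name: str, arguments: Dict[str, Any]) -> Any:
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name].run(**arguments)


# Hypothetical example tool, registered at runtime.
registry = ToolRegistry()
registry.register(
    Tool(
        name="add",
        description="Add two integers.",
        input_schema={
            "type": "object",
            "properties": {"a": {"type": "integer"}, "b": {"type": "integer"}},
            "required": ["a", "b"],
        },
        run=lambda a, b: a + b,
    )
)
```

Adding an ability means registering one more entry. Nothing in the loop that talks to the model has to change, which is the difference between a set and a hard-coded list.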
The story that made the whole point. Here is the one that convinced us. We did build our own login and accounts system. Thousands of lines, tested, working. Then we took the honest look every team eventually takes at its own authentication: getting it exactly right is a full-time specialty, and the quiet details are the ones that matter most. A mature open-source project, GoTrue, already handles all of it and has been battle-tested by far more people than we will ever have. We moved to it. It stung for about a day, and then it was obviously right. The thing we were proudest of was the thing best left to specialists, and someone had already done it better.
The spine. Underneath all of it, PostgreSQL holds the data with real per-user isolation, FastAPI is the backbone of the API, and Qt (through PySide) draws the desktop app you see and click. Even the wake-word model that answers to "Viola", which we train ourselves, reuses the front-end of OpenWakeWord. None of these are ours, and we are better for it.
Where the lines land. Build the thing that does not exist. Rent the work that is real, ongoing, and somebody else's specialty. Borrow what brilliant people already gave away, and say thank you. Get any of the three wrong and you pay for it in months, outages, or your users' safety. (One honest footnote: a couple of tools we lean on, like the animation library on this very website, are free to use but not open-source licensed. We are grateful for those too, and we try not to blur the line.)
So this is less a tech post than a map of our tradeoffs, and a thank-you. If you maintain one of the borrowed projects above, you helped build Viola whether you meant to or not. And if you sell one of the services we rent, you are the reason we sleep at night instead of watching a desktop for the next 502. We are trying to be a good guest in every house we stand in.
A few of the shoulders we stand on
