<p>There's a version of this post that starts with "I'm excited to share" and ends with a screenshot of a clean UI. This isn't that post.</p><p>Nova — my personal multi-agent AI system — is now in what I'm calling the production orchestration phase. That means 34 agents are running across general chat, pipeline execution, ERP specialists, language tasks, and more. It's the first time I can say the system is genuinely doing useful work rather than being an interesting experiment. And getting here has taught me more about what production AI actually means than any blog post, course, or framework doc managed to.</p><p>The core lesson: production is a completely different problem from building.</p><p>When you're prototyping an AI agent, you're asking "can this work?" The answer is almost always yes, with enough prompt engineering and the right model. The question nobody warns you about is: can this keep working, correctly, when you're not watching? That's the production problem. And it's not primarily an AI problem — it's a software engineering problem that AI makes harder.</p><p>With 34 agents, you have 34 things that can fail silently. An agent that produces plausible-sounding output when it's actually confused is worse than one that throws an error. At least errors surface. Confident wrongness doesn't. So a lot of what I've built around the agents isn't more AI — it's the boring scaffolding: structured output validation, logging, failure routing, and clear contracts between agents about what they're responsible for.</p><p>The architecture I landed on separates agents by role pretty strictly. General chat agents don't touch pipelines. Pipeline agents don't improvise. ERP specialist agents have tight scope and always defer to a human decision for anything with side effects. This sounds obvious written down. 
It took several iterations of watching agents cheerfully step outside their lane to learn it.</p><p>The other thing that becomes real in production is the question of trust calibration. How much do you trust the output of any given agent, and does that trust level match what you're actually using it for? Nova handles tasks ranging from low-stakes (drafting a note) to high-stakes (informing decisions about client systems). Those aren't the same trust threshold. I treat them differently, and the system is designed to make that distinction explicit rather than leaving it to feel.</p><p>What I'm working toward now is a self-improvement loop — agents that can review, refactor, and improve Nova's own code and configuration. It's the most technically interesting goal I've set for the system. It's also the one that requires the most rigorous constraints. An agent with write access to the system it runs inside is not something you give a loose brief and walk away from. The safety model has to come first, and right now I'm still designing it carefully before I build it.</p><p>I think the broader point worth making is this: AI systems become real software when you stop optimising for demo quality and start optimising for operational reliability. That shift requires treating AI components the same way you'd treat any other critical dependency — with defined interfaces, observable behaviour, and failure modes you've thought about in advance. The models themselves are remarkable. The engineering discipline around them is what makes them useful.</p><p>Nova started as a personal productivity experiment. It's become the most instructive software project I've run in years — not because the AI is magic, but because building something you actually rely on forces honesty about what's working and what isn't. There's no hiding behind "good enough for a prototype" when the prototype is your daily driver.</p><p>I'll keep writing about specific pieces of the architecture as they solidify. 
The self-improvement loop, when I get it right, will be worth a post on its own.</p>
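<p>To make the "boring scaffolding" point concrete, here is a minimal Python sketch of two ideas from this post: checking an agent's structured output against a contract, and routing by trust threshold so that high-stakes or side-effecting work goes to a human. This is an illustration under stated assumptions, not Nova's actual code. The names <code>Contract</code>, <code>Stakes</code>, <code>Route</code>, <code>validate</code>, and <code>route</code>, the example agent names, and the choice of JSON as the output format are all hypothetical.</p>

```python
import json
import logging
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple

log = logging.getLogger("agent_scaffold")  # hypothetical logger name


class Stakes(Enum):
    LOW = "low"    # e.g. drafting a note
    HIGH = "high"  # e.g. informing decisions about client systems


class Route(Enum):
    ACCEPT = "accept"
    HUMAN_REVIEW = "human_review"
    REJECT = "reject"


@dataclass(frozen=True)
class Contract:
    """Hypothetical contract: which fields an agent must return, and their types."""
    agent: str
    required: dict           # field name -> expected Python type
    stakes: Stakes
    has_side_effects: bool = False


def validate(raw: str, contract: Contract) -> Tuple[Optional[dict], List[str]]:
    """Parse raw agent output as JSON and check it against the contract."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return None, ["top-level value is not an object"]
    errors = []
    for name, expected in contract.required.items():
        if name not in data:
            errors.append(f"missing field: {name}")
        elif not isinstance(data[name], expected):
            errors.append(f"wrong type for field: {name}")
    return (None if errors else data), errors


def route(raw: str, contract: Contract) -> Tuple[Route, List[str]]:
    """Decide what happens to one agent output: accept, escalate, or reject."""
    _, errors = validate(raw, contract)
    if errors:
        # Malformed output becomes a visible failure instead of passing silently.
        log.warning("rejected output from %s: %s", contract.agent, errors)
        return Route.REJECT, errors
    if contract.stakes is Stakes.HIGH or contract.has_side_effects:
        # Well-formed but high-stakes: a human makes the final call.
        return Route.HUMAN_REVIEW, []
    return Route.ACCEPT, []


# Usage: two hypothetical agents with different trust thresholds.
note_drafter = Contract("note_drafter", {"text": str}, Stakes.LOW)
erp_specialist = Contract(
    "erp_specialist", {"action": str, "reason": str}, Stakes.HIGH, has_side_effects=True
)

print(route('{"text": "Call the supplier on Monday"}', note_drafter))
print(route('{"action": "update_price", "reason": "new quote"}', erp_specialist))
print(route("Sure! I have updated the price for you.", erp_specialist))
```

<p>The design choice worth noting is that validation runs before any trust decision. A schema check only catches malformed output, and it cannot tell whether well-formed output is actually right. That is why the stakes level lives in the contract rather than being inferred from the output: high-stakes work is escalated even when it validates cleanly.</p>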
34 Agents in Production: What Running a Real AI System Actually Looks Like
Nova, my personal AI system, just hit a production orchestration phase with 34 active agents. Here's what that actually means — and what nobody tells you about building AI systems past the demo stage.