What Running a Real Multi-Agent AI System Actually Looks Like

Nova is my personal AI system and it's now in production with 34 agents spanning chat, pipelines, ERP, and language tasks. Here's an honest look at what that actually means day-to-day — and what I'm building toward next.

I've been building Nova for a while now, and somewhere in the last few months it crossed a threshold I wasn't explicitly aiming for: it became something I actually depend on. Not in a demo sense. In a "this is running in production and I'm routing real work through it" sense. That shift changes everything about how you think about the system.

Nova is my personal multi-agent AI system. Right now it spans 34 agents across a handful of domains: general chat and reasoning, pipeline execution, ERP specialists that understand AIREP's data model, and language task agents. The orchestration layer decides which agents get invoked, in what order, and how their outputs compose into something useful. That's the phase I'm in now: production orchestration. Not prototyping. Not demos. Actual routing of real tasks through a real pipeline.

The honest version of what this looks like: it's less glamorous than the marketing around "agentic AI" would suggest, and more interesting. The hard problems aren't the ones AI Twitter talks about. They're not "will the LLM hallucinate"; that's mostly a prompting and validation problem. The hard problems are orchestration correctness, context management across agent boundaries, and making the system legible enough that when something goes wrong, I can actually debug it.

Context boundaries are the thing nobody talks about enough. When one agent's output becomes another agent's input, you're making a decision about what to carry forward and what to drop. Get that wrong and you get agents confidently working on stale or incomplete information. The system looks like it's working until it isn't. I've spent more time thinking about context scope (what each agent needs to know, and nothing more) than I have on the agents themselves.

The 34-agent count probably sounds like a lot, and in some ways it is. But it's not 34 things running simultaneously. It's more like a roster. Most tasks invoke two or three agents. The value of having a wide roster is specificity: a language task agent that knows it's summarising a customer-facing ERP note has a different system context than one summarising a developer log. Same underlying model, different scope. That specificity matters more than the raw count.

What I'm building toward is a self-improvement loop: agents that can review Nova's own code, identify weak points, and surface refactor suggestions. This is the part that actually excites me. Not because it's autonomous in some science-fiction sense; it won't be running unsupervised, rewriting itself overnight. But because the bottleneck in any software system is the developer's time and attention, and if I can build agents that do the first pass of code review, identify patterns across the codebase, and flag things worth looking at, that's a real compound advantage. The agent doesn't need to be right all the time. It needs to surface things I'd otherwise miss.

The broader point I keep coming back to: AI is only useful in software development if you treat it as a systems problem, not a feature problem. Dropping a model into a product and calling it an AI feature is table stakes now. The advantage comes from building infrastructure around AI (memory, orchestration, feedback loops, domain-specific context) that makes the system smarter over time rather than just smarter on the first query.

Nova is that infrastructure, for me personally. And increasingly it's informing how I think about AIREP and Find a Sign: not just "what AI feature can we add" but "how do we architect the system so AI makes the whole thing more capable over time."

I don't have a tidy conclusion here. This is ongoing work. But if you're building anything in this space and you want to talk about the actual engineering of multi-agent systems (the orchestration, the context design, the debugging), I'm more interested in that conversation than most of what's being written about AI right now.
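
To make the context-scope point concrete, here is a minimal Python sketch of one way to enforce "what each agent needs to know, and nothing more" at an agent boundary. It is an illustration under assumptions, not Nova's actual code: `AgentSpec`, `scope_context`, `run_pipeline`, the two example agents, and the context keys are all hypothetical names, and the handlers are plain functions standing in for model calls.

```python
from dataclasses import dataclass
from typing import Callable

Context = dict[str, str]


@dataclass(frozen=True)
class AgentSpec:
    """Hypothetical roster entry: a name, the context keys it may see, a handler."""
    name: str
    needs: frozenset[str]
    run: Callable[[Context], Context]


def scope_context(full: Context, spec: AgentSpec) -> Context:
    """Return only the keys the agent declared; fail loudly if one is missing."""
    missing = spec.needs - set(full)
    if missing:
        raise KeyError(f"{spec.name} is missing context: {sorted(missing)}")
    return {key: full[key] for key in spec.needs}


def run_pipeline(full: Context, agents: list[AgentSpec]) -> Context:
    """Run agents in order; each sees a scoped slice and adds its output."""
    context = dict(full)  # copy so the caller's dict is not mutated
    for spec in agents:
        output = spec.run(scope_context(context, spec))
        context.update(output)
    return context


def summarise(ctx: Context) -> Context:
    # Stand-in for a model call: tags the note with its audience.
    return {"summary": f"[{ctx['audience']}] {ctx['note']}"}


def draft_reply(ctx: Context) -> Context:
    # Stand-in for a second agent that only ever sees the summary.
    return {"reply": "Summary: " + ctx["summary"]}


SUMMARISER = AgentSpec("summariser", frozenset({"note", "audience"}), summarise)
REPLIER = AgentSpec("replier", frozenset({"summary"}), draft_reply)

task = {
    "note": "Invoice 1042 overdue",
    "audience": "customer",
    "debug_trace": "internal detail the agents should never see",
}
result = run_pipeline(task, [SUMMARISER, REPLIER])
print(result["reply"])
```

In this sketch the design choice is that each agent declares its inputs up front, so a missing key raises an error at the boundary instead of letting a downstream agent work confidently on incomplete information, and anything not declared (like `debug_trace` here) is never passed along.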
