What Brain Science Teaches Us About Building Better AI Research Agents

Imagine you’re running a research project. You’ve got 30 search agents fanned out across the internet, each chasing down sources on overlapping subtopics. Sounds productive — until you check the logs. Agent 4 crawled the same Wikipedia article as agents 7, 12, and 19. Agent 11 is re-running a search query that agent 3 already answered ten minutes ago.

A third of your compute is doing work that’s already been done.

This isn’t a hypothetical. We measured it. Across real research projects, sibling search nodes had a 3.2x URL reuse ratio — meaning the same URL got crawled more than three times on average — and 31% of their search queries overlapped. That’s a lot of wasted tokens, wasted time, and wasted money.

Here’s the thing: your brain doesn’t work this way. When you’re researching a topic, different parts of your brain don’t independently go looking for the same information without telling each other. There’s a coordination mechanism — and a recent paper called BIGMAS borrows that mechanism to build better multi-agent AI systems. The ideas are worth stealing.

What BIGMAS actually does

BIGMAS stands for Brain-Inspired Graph Multi-Agent Systems. The “brain-inspired” part isn’t just marketing: it’s built on Global Workspace Theory, a cognitive-science model of how conscious processing coordinates specialized brain systems. The short version: your brain has specialized modules (vision, language, planning), and they coordinate by broadcasting to a shared workspace that every module can read.

BIGMAS takes this idea and applies it to multi-agent AI with three components.

The GraphDesigner is a meta-agent — an agent that designs other agents. Give it a simple math problem, and it builds a small, focused pipeline: three agents in a line. Give it a complex spatial planning puzzle, and it builds a nine-agent graph with loops for iterative refinement. The key insight is that different problems need structurally different teams. A one-size-fits-all topology wastes resources on easy problems and starves hard ones.

The Global Workspace is the shared board where every agent reads and writes. No private channels, no information silos. When a generator agent writes candidate solutions, the validator can immediately see them. When the validator flags an error, the generator sees the feedback without waiting for a manager to relay it. Think of it like a shared Google Doc where everyone has edit access and can see changes in real time — except it’s structured into partitions (context, working memory, system state, answers) so agents know where to look.
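
To make the partitioned board concrete, here is a minimal sketch of a shared workspace. Only the four partition names come from the description above; the `Workspace` class, its `write`/`read` methods, and the write log are hypothetical names chosen for illustration, not the BIGMAS implementation.

```python
from dataclasses import dataclass, field
from typing import Any

# Partition names follow the description above; everything else is illustrative.
PARTITIONS = ("context", "working_memory", "system_state", "answers")


@dataclass
class Workspace:
    """A shared board: every agent reads and writes the same structure."""

    data: dict[str, dict[str, Any]] = field(
        default_factory=lambda: {p: {} for p in PARTITIONS}
    )
    log: list[tuple[str, str, str]] = field(default_factory=list)

    def write(self, agent: str, partition: str, key: str, value: Any) -> None:
        if partition not in self.data:
            raise KeyError(f"unknown partition: {partition}")
        self.data[partition][key] = value
        self.log.append((agent, partition, key))  # who wrote what, in order

    def read(self, partition: str, key: str, default: Any = None) -> Any:
        return self.data[partition].get(key, default)


ws = Workspace()
ws.write("generator", "working_memory", "candidate", "(8 - 4) * 6")
# The validator reads the generator's entry directly; no manager relays it.
candidate = ws.read("working_memory", "candidate")
ws.write("validator", "working_memory", "feedback", f"checked: {candidate}")
```

The point of the sketch is that reads take no agent argument: nothing is private, so any agent can see any other agent's writes as soon as they land.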

[Image: Multiple luminous orbs feeding streams of light into a central glowing pool, the shared workspace where every agent's knowledge becomes visible to every other agent]

The Orchestrator watches the workspace and routes work based on what’s actually been discovered, not just which agents have finished. It can detect when a solution has already been found (stop early), when agents are stuck in unproductive loops (break the cycle), and when the budget is running out (trigger a fallback). Failed runs in their experiments consistently required more routing decisions than successful ones — the Orchestrator earns its keep by catching problems early.

The results are solid. On mathematical reasoning and spatial planning benchmarks, BIGMAS consistently outperformed baselines across six different frontier models. GPT-5 hit 100% accuracy on two of three benchmarks with BIGMAS, something no single-agent approach achieved. The difference is coordination, not capability.

Why this matters for research agents

The benchmarks are nice, but we don’t build puzzle solvers. We build research systems that manage 400-node graphs spanning hundreds of web sources. The problems are different — open-ended research has no verifiable “correct answer” — but the coordination failures are eerily similar.

The “30 searchers for a 5-searcher problem”

When we investigated our search pipeline, we found 918 cached pages across 581 domains from real research projects. About 14% of those pages (133, once pages that fall into both categories are counted only once) were filterable waste: 86 from junk domains (dictionaries, app stores, crossword sites) and 59 with thin content (paywalls, blocked crawls, empty stubs). That’s roughly three garbage pages per 20-URL search node.

But the bigger waste isn’t the junk. It’s the duplication. Sibling search nodes can’t see each other’s work. Each one independently decides what to search, what to crawl, and what to synthesize. The result is the 3.2x URL reuse ratio and 31% query overlap we mentioned earlier.

BIGMAS’s Global Workspace solves this by making every agent’s work visible to every other agent. When one agent has already crawled a URL or answered a subtopic, the others can see it and move on to uncovered ground.

Stop when you’ve found enough

Our coordinator makes all decisions based on node status — running, completed, failed — never on node content. It can’t detect that a search returned zero relevant results, that an aggregation is thin, or that a subtree already answered the parent’s question comprehensively. Every planned node executes regardless of what’s been found.

BIGMAS’s Orchestrator reads the actual workspace content before making routing decisions. It detects convergence (the answer’s already here, stop searching) and identifies unproductive cycles (these agents are going in circles). That’s content-aware routing — and it’s the difference between executing 30 search tasks because the plan says to and executing 12 because the data says that’s enough.
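
A rough sketch of what a content-aware routing decision could look like. The three conditions (answer found, stuck in a loop, budget exhausted) come from the description of the Orchestrator; the `WorkspaceView` fields, the `Route` names, and the "last N writes are identical" loop heuristic are assumptions made for illustration, not the paper's method.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    CONTINUE = "continue"
    STOP_EARLY = "stop_early"
    BREAK_CYCLE = "break_cycle"
    FALLBACK = "fallback"


@dataclass
class WorkspaceView:
    answer_found: bool        # hypothetical convergence flag read from the workspace
    recent_writes: list[str]  # latest workspace writes, oldest first
    steps_used: int
    step_budget: int


def route(view: WorkspaceView, cycle_window: int = 3) -> Route:
    """Decide the next move from workspace content, not from node status."""
    if view.answer_found:
        return Route.STOP_EARLY
    if view.steps_used >= view.step_budget:
        return Route.FALLBACK
    # Crude loop detector (an assumption): the last N writes are all identical.
    tail = view.recent_writes[-cycle_window:]
    if len(tail) == cycle_window and len(set(tail)) == 1:
        return Route.BREAK_CYCLE
    return Route.CONTINUE
```

A status-only coordinator has no equivalent of the first and third checks, because both depend on reading what the agents actually wrote.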

Catch bad outputs before they spread

Right now, our agent outputs go straight to storage with no validation. If a searcher returns malformed JSON, an aggregator writes a one-sentence synthesis, or a planner generates unparseable child specs, the error propagates silently. Downstream agents either fail or produce garbage, and you don’t find out until the final report is incoherent.

BIGMAS validates every write instruction before applying it to the workspace. Path exists? Action type matches? Payload non-empty? If validation fails, the agent gets re-invoked with the error message as context — up to a configurable number of correction attempts.

That’s self-correction, not just retry.
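
The difference between retry and self-correction is easiest to see in code. Below is a minimal sketch, assuming a searcher that returns a JSON object with a `summary` field; that field name, the 200-character threshold, and the `agent` callable are hypothetical stand-ins, not a real schema from either system.

```python
import json
from typing import Callable, Optional


def validate_search_output(raw: str, min_chars: int = 200) -> Optional[str]:
    """Return an error message, or None if the output passes."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as exc:
        return f"output is not valid JSON: {exc.msg}"
    if not isinstance(parsed, dict):
        return "output must be a JSON object"
    summary = parsed.get("summary")  # hypothetical field name
    if not isinstance(summary, str) or len(summary) < min_chars:
        return f"'summary' must be a string of at least {min_chars} characters"
    return None


def run_with_correction(
    agent: Callable[[str], str], prompt: str, max_corrections: int = 2
) -> str:
    """Invoke the agent; on failure, re-invoke it with the error as context."""
    attempt_prompt = prompt
    for _ in range(max_corrections + 1):
        raw = agent(attempt_prompt)
        error = validate_search_output(raw)
        if error is None:
            return raw
        # Self-correction: the agent sees why its last output was rejected.
        attempt_prompt = f"{prompt}\n\nYour previous output was rejected: {error}"
    raise ValueError(f"still invalid after {max_corrections} corrections: {error}")
```

A plain retry would call `agent(prompt)` again with the same input. Here each re-invocation carries the validator's message, so the agent has something specific to fix.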

Partial report beats total failure

When a research project hits its budget ceiling or cascading failures trip the circuit breaker, users currently see “project failed” — even when 80% of the research was successfully gathered. All that work, invisible behind a failure status.

BIGMAS handles this with a FallbackResolver that scans all completed work in the workspace and assembles the best available answer. It never returns empty-handed if any useful work was done. The same principle applies to research: a partial report with noted gaps is far more useful than a blank screen.
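
Here is a small sketch of the salvage idea, under stated assumptions: the `Node` shape, the status strings, and the returned dictionary are hypothetical, and a real resolver would hand the collected sections to a report writer rather than return them raw.

```python
from dataclasses import dataclass


@dataclass
class Node:
    node_id: str
    status: str       # "completed", "failed", or "pending" (assumed values)
    output: str = ""


def is_usable(node: Node) -> bool:
    return node.status == "completed" and bool(node.output.strip())


def salvage(nodes: list[Node]) -> dict:
    """Collect everything that succeeded and list what is missing."""
    usable = [n for n in nodes if is_usable(n)]
    gaps = [n.node_id for n in nodes if not is_usable(n)]
    if not usable:
        return {"status": "failed", "sections": [], "gaps": gaps}
    return {
        "status": "completed_partial" if gaps else "completed",
        "sections": [n.output for n in usable],
        "gaps": gaps,  # surfaced to the reader as noted gaps
    }
```

The project is only reported as failed when nothing usable exists. Any completed output at all produces a partial result with its gaps listed.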

The numbers that make this real

Let’s ground this in specifics. These aren’t numbers from a paper — they’re measurements from real research projects running through our engine.

We analyzed the crawl cache from production research runs: 918 cached pages across 581 unique domains. Of those, 86 pages came from junk domains: dictionaries like Merriam-Webster, app store listings, crossword puzzle sites, generic Wikipedia articles about single-word concepts. Another 59 had thin content: paywall blocks, failed crawls, stubs under 200 characters. With some overlap between categories, that’s 133 filterable pages, about 14% of the total, that shouldn’t have been sent to an LLM at all. At 20 URLs per search node, that works out to roughly three junk pages per node.
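
A pre-LLM filter for both categories can be very small. The sketch below is illustrative only: the two blocklist entries are examples rather than a real list, and the 200-character cutoff simply mirrors the stub threshold mentioned above.

```python
from urllib.parse import urlsplit

# Illustrative entries only; a real blocklist would be longer and maintained.
JUNK_DOMAINS = {"merriam-webster.com", "apps.apple.com"}
MIN_CHARS = 200  # mirrors the "stubs under 200 characters" threshold


def is_filterable(url: str, text: str) -> bool:
    """True if the page should be dropped before it reaches an LLM."""
    host = (urlsplit(url).hostname or "").lower()
    junk = any(host == d or host.endswith("." + d) for d in JUNK_DOMAINS)
    thin = len(text.strip()) < MIN_CHARS
    # A page that is both junk and thin is still counted once.
    return junk or thin
```

Because the check is a single boolean per page, pages that hit both rules are not double-counted, which is why the 86 and 59 above combine to 133 rather than 145.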

The duplication story is worse. Across sibling search nodes working on related subtopics, the system crawled the same URLs an average of 3.2 times each. About 31% of search queries overlapped — siblings independently asking SearxNG nearly identical questions. The reranker analysis can’t be cached across queries because it’s query-specific (the same Wikipedia page gets completely different extractions for “wind speeds” versus “rainfall”), so every duplicate crawl means a full duplicate LLM call too.
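
Both numbers are cheap to compute from per-node logs. The sketch below shows one way to define them; the exact definitions behind the figures above aren't spelled out here, so treat these as assumptions: reuse ratio as total crawls divided by unique URLs, and query overlap as the share of queries that repeat an earlier one after light normalization.

```python
from collections import Counter


def url_reuse_ratio(crawls_by_node: dict[str, list[str]]) -> float:
    """Average number of times each distinct URL was crawled."""
    counts = Counter(url for urls in crawls_by_node.values() for url in urls)
    return sum(counts.values()) / len(counts) if counts else 0.0


def _normalize(query: str) -> str:
    return " ".join(query.lower().split())


def query_overlap(queries_by_node: dict[str, list[str]]) -> float:
    """Share of queries that repeat one already issued in this sibling group."""
    seen: set[str] = set()
    total = repeated = 0
    for queries in queries_by_node.values():
        for query in queries:
            key = _normalize(query)
            total += 1
            repeated += key in seen  # True counts as 1
            seen.add(key)
    return repeated / total if total else 0.0


crawls = {"n1": ["a", "b"], "n2": ["a", "c"], "n3": ["a"]}
queries = {"n1": ["wind speeds", "rainfall"], "n2": ["Wind  Speeds", "storm surge"]}
print(round(url_reuse_ratio(crawls), 2), query_overlap(queries))  # 1.67 0.25
```

Exact-match normalization will undercount near-duplicate queries ("nearly identical questions"), so a measured overlap from this sketch is a lower bound.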

[Image: Converging threads of light with tangled, overlapping zones where multiple paths redundantly trace the same ground, the cost of sibling agents that can't see each other's work]

These aren’t hypothetical problems from a benchmark paper. They’re measurements from the system we run every day, on real research projects that real users pay for. That 14% junk rate and 3.2x duplication ratio aren’t edge cases — they’re the baseline we’re improving from.

What we’re building

BIGMAS validates its ideas on graphs of at most 10 nodes solving math and planning puzzles. We’re running 400-node DAGs doing open-ended research across hundreds of web sources. The ideas transfer, but you have to translate them to a very different scale and domain. Here’s what that looks like in practice.

Shared workspace for sibling visibility. A per-project context store where searchers write summaries of what they’ve crawled and found. Before a sibling starts searching, it reads the store to see what’s already covered. The planner reads it before creating deeper search tasks. Goal: 20-30% reduction in redundant LLM calls.

Output validation with self-correction. Per-agent-type validators that check outputs before they’re persisted. Searcher output must be valid JSON with substantive content. Planner output must parse into valid child specs. Aggregator output must reference multiple sources. On failure, re-invoke the agent with the error — up to two correction attempts. Goal: catch 5-10% of garbage outputs early.

Content-aware adaptive routing. Lightweight quality signals extracted from completed nodes — output length, citation count, JSON validity, “no results” keyword detection. The coordinator checks these before dispatching the next wave. If a subtree’s already well-covered, skip the remaining planned searches in that area. Goal: 10-20% fewer unnecessary node executions.

Fallback resolver for graceful degradation. When budget exhaustion or cascading failures prevent completion, walk the graph, collect everything that succeeded, and invoke the report writer with a salvage prompt. Mark the project “completed partial” instead of “failed.” Goal: convert 30-50% of failures into useful partial reports.

Adaptive replanning from quality signals. After each wave of search nodes completes, evaluate coverage gaps. Spawn targeted follow-up searches where results are thin. Skip remaining queued searches where coverage is already strong. Goal: more uniform research depth instead of ten searches on one subtopic and two on another.
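
The quality-signal and replanning items above can be sketched together. Everything here is an assumption made for illustration: citations are approximated by counting URLs in the output, the thresholds are arbitrary, and the three plan labels are hypothetical names.

```python
import re
from dataclasses import dataclass

NO_RESULT_MARKERS = ("no results", "insufficient data")


@dataclass
class Signals:
    chars: int
    citations: int
    no_results: bool


def extract_signals(output: str) -> Signals:
    lowered = output.lower()
    return Signals(
        chars=len(output),
        citations=len(re.findall(r"https?://\S+", output)),  # URLs as a proxy
        no_results=any(marker in lowered for marker in NO_RESULT_MARKERS),
    )


def plan_next(
    outputs_by_subtopic: dict[str, list[str]],
    strong_citations: int = 5,
    thin_chars: int = 200,
) -> dict[str, str]:
    """After a wave completes, decide per subtopic what the next wave does."""
    plan = {}
    for topic, outputs in outputs_by_subtopic.items():
        signals = [extract_signals(o) for o in outputs]
        useful = [s for s in signals if not s.no_results and s.chars >= thin_chars]
        if not useful:
            plan[topic] = "spawn_follow_up"   # coverage gap
        elif sum(s.citations for s in useful) >= strong_citations:
            plan[topic] = "skip_remaining"    # already well covered
        else:
            plan[topic] = "continue"
    return plan
```

None of these signals requires an LLM call, so the check can run between every wave without adding meaningful cost.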

The honest gap

Before anyone gets too excited — the BIGMAS authors designed it for a very different world than ours, and the ideas don’t transfer one-to-one.

BIGMAS solves closed-domain puzzles with verifiable solutions. A Game24 answer can be checked mechanically: the expression either evaluates to 24 or it doesn’t. Tower of London has an optimal move sequence. You can check whether the system succeeded.

Open-domain research doesn’t work that way. “What are the competitive implications of the EU AI Act?” has no ground truth to validate against. Quality signals are heuristics, not verification.

BIGMAS operates at small scale: 10 nodes maximum, 15 execution steps. Our research graphs routinely hit 400 nodes across six levels of depth. Coordination strategies that work at small scale might not survive the combinatorial explosion of a 400-node graph. A shared workspace that’s easy to scan when 10 agents are writing to it could become a noisy mess when 200 agents are writing simultaneously.

BIGMAS uses cyclic graphs: agents can loop back for iterative refinement, which makes sense for puzzle-solving where you’re converging on a known answer. We use DAGs (directed acyclic graphs) because our research workloads are mostly breadth-first. You’re exploring a topic space, not converging on a solution. Without a convergence criterion, a cycle in a research graph has no natural stopping point and would run until the budget is exhausted.

The ideas transfer. The implementation doesn’t. Every BIGMAS concept needs to be re-engineered for scale, for open-ended domains, and for DAG topology. That’s what the engineering work above is actually about — not porting BIGMAS, but learning from it.

What to do now

If you’re building multi-agent research systems, here’s what you can act on today:

Measure your duplication. Log URLs and queries per agent node, then compute overlap ratios across siblings. You can’t fix what you haven’t quantified. We found 3.2x URL reuse — yours might be higher or lower, but you won’t know until you measure.
Add output validation before storage. Even basic checks — is the JSON valid? Is the output longer than 200 characters? Does it contain the expected fields? — catch a surprising amount of garbage before it propagates downstream.
Make sibling work visible. A shared key-value store scoped to the current project, where agents write summaries of what they’ve found, gives every subsequent agent a chance to avoid redundant work. Start simple: URL-level dedup across siblings.
Build a salvage path. When a project fails, don’t throw away everything. Walk the graph, collect completed outputs, and generate a partial report with noted gaps. A partial answer is almost always more useful than no answer.
Add quality signals to your routing. You don’t need an LLM to assess output quality. Simple heuristics — length, citation count, keyword detection for “no results” or “insufficient data” — give your coordinator enough signal to make smarter dispatch decisions.
Read the BIGMAS paper. The implementation is narrowly scoped to small-scale puzzles, but the architectural patterns — global workspace, content-aware routing, structured validation — are generalizable. Figure 2 (the workspace schema) and Section 4.2 (the Orchestrator) are worth the most time.
Question your graph topology. Are you using DAGs because they’re right for your problem, or because they’re simpler to implement? Some research tasks genuinely benefit from iterative refinement loops. Others need pure breadth-first exploration. Match the structure to the problem — that’s the GraphDesigner’s core insight.
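
For the "start simple" suggestion above, URL-level dedup across siblings can be a claim ledger scoped to one project. This is a minimal in-process sketch; `CrawlLedger`, `claim`, and `owner` are hypothetical names, and a multi-process system would need a shared store rather than a Python dict.

```python
import threading
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Lowercase scheme and host, drop the fragment and any trailing slash."""
    parts = urlsplit(url)
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path.rstrip("/"), parts.query, "")
    )


class CrawlLedger:
    """Project-scoped record of URLs that some sibling has already claimed."""

    def __init__(self) -> None:
        self._claimed: dict[str, str] = {}
        self._lock = threading.Lock()

    def claim(self, url: str, node_id: str) -> bool:
        """Return True if this node is the first to claim the URL."""
        key = normalize_url(url)
        with self._lock:
            if key in self._claimed:
                return False
            self._claimed[key] = node_id
            return True

    def owner(self, url: str) -> Optional[str]:
        with self._lock:
            return self._claimed.get(normalize_url(url))
```

A sibling that gets `False` back knows who crawled the page first and can look up that node's summary instead of crawling again. Whether it can also reuse the extraction is a separate question, since extractions are query-specific.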