
If you don’t know, ADK (Agent Development Kit) is Google’s open-source agent framework. Over the last couple of months, I have been experimenting with it, designing and building multi-agent systems.

One of the challenges I have found along the way is choosing the most appropriate agent architecture for my use case. Online examples and labs often show simplistic use cases (e.g. a weather app or a simple calculator) built on unrealistic multi-agent architectures that are overly complex and expose the agents’ nondeterministic nature.

Questions I have been left with are:

  • How should I design a multi-agent system?
  • Are specialist agents the way to go?
  • Do I need an orchestrator agent?
  • Is it ever appropriate to invoke an LLM within a tool?

Context

My use case is as follows: I want to generate a report that explains and summarises the results of a study. The report has many sections, and needs access to internal data.

I am using google-adk version 1.7.0.

Agent Architecture Lessons

In the world of agent systems, where everything is new and constantly developing, it is impossible to know the correct architecture and its limitations straight off the bat. I had good intentions when I started my work: begin small and simple to get it working, then add complexity later. However, I was influenced by blogs promoting a microservices approach to agents and ended up adding more and more agents to my workflow. This brought problems with context management and understanding agent behaviour — keep reading — and I eventually scaled it back down.

Did it feel productive to spend days amending prompts and then reverting my commits? Definitely not. But this can be the best way to learn. Here are some lessons I want to share.

Only bring a new agent into the picture if the growing tasks of one agent become too broad

In an ideal world, each agent would handle a distinct task, know when its time is up, and pass its response to the next agent. In practice, however, I found that having more than two agents increases confusion over their responsibilities, and they do not delegate when they should. They also got stuck in loops between themselves, spending unnecessary ££.

I’d recommend starting with one agent and only splitting out into another agent when the task becomes too broad and complex for one agent to handle. In my case, I have maintained two agents: one for data retrieval, and one to do everything else. Limiting the number of agents gives you more control over, and understanding of, their decision-making process, allowing for easier debugging. The prompts get longer, but that cost is massively outweighed by avoiding an extremely nondeterministic workflow.

from google.adk.agents import Agent
from google.genai import types

from .prompt import INSTRUCTION
from .sub_agents.generation.tools import (
    generate_summary_section,
    generate_brand_metrics_section,
)
from .sub_agents.retrieval import retrieval_agent

root_agent = Agent(
    model="gemini-2.5-pro",
    name="report_generation_agent",
    description="Agent for generating and refining sections of an analytical report",
    tools=[
        generate_summary_section,
        generate_brand_metrics_section,
    ],
    sub_agents=[retrieval_agent],
    instruction=INSTRUCTION,
    generate_content_config=types.GenerateContentConfig(temperature=0.2),
)

Choose a multi-tool agent over multi-specialist agents

flowchart TB
    A["Orchestrator Agent"] <--> B["Specialist Agent 1"]
    A <--> C["Specialist Agent 2"]
    A <--> D["Specialist Agent 3"]
    B <--> E["Tools"]
    C <--> F["Tools"]
    D <--> G["Tools"]

You may be seeing the term “specialist agents” commonly referenced these days. Take this advice with a pinch of salt. 

I experimented with this microservice architecture, with each specialist agent managing an individual report section. Each agent had the ability to retrieve relevant data, generate a summary and execute all postprocessing. The theory was that I could easily scale my system each time a new report section came along: all I’d need to do is add a new agent to be in charge of it.

However, I found this added complexity to delegation, context management and error handling. Think of having a separate agent generate each report section like having a different person write each section: they might have slightly different styles, contradict each other, or simply lack context of the full report. Agents sometimes do not act how you expect them to, and managing the flow of execution is challenging.

Providing tools to an agent instead enables easier context management. Each tool has its own purpose and instructions, provided within the function’s docstring. The agent’s concern is tool selection and parameter passing, whereas the tools are the actual building blocks of your workflow. Each tool my retrieval agent has access to retrieves the data for a specific report section, e.g. brand metrics. My agent’s system instructions detail how to decide which tool to invoke based on which section is being generated. This stricter design limits the possibility of retrieving irrelevant data, or of the agent getting stuck choosing which data to retrieve.

Split your context and tasks into multiple tools, and only take a multi-specialist-agent approach when the aim of your entire system is very broad, requires tasks to run in parallel, and those tasks do not require shared context.
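As a sketch of this pattern (the tool name, section and data here are hypothetical, not from my actual report system), a section-specific retrieval tool might look like this, with its usage instructions carried in the docstring:

```python
from typing import Any


def retrieve_brand_metrics(study_id: str) -> dict[str, Any]:
    """Retrieve brand metrics data for a study.

    Use this tool only when generating the brand metrics section of the
    report; other sections have their own retrieval tools.

    Args:
        study_id: Identifier of the study to retrieve data for.

    Returns:
        A dict with a "status" key and, on success, the metric rows.
    """
    # Hypothetical internal data access; in practice this would query
    # your warehouse or internal API.
    rows = [{"metric": "awareness", "value": 0.42}]
    return {"status": "success", "rows": rows}
```

The agent sees the function signature and docstring, so keeping the “when to use me” guidance in the docstring is what makes tool selection reliable.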

No, you do not need an orchestrator agent

I first designed my architecture with an orchestrator agent handling all conversation. Its prompt contained instructions on how to interact with the user, determine intent and decide which sub-agent to delegate to. However, I found this to be overkill. Let the other agents do this themselves; they’re smart enough and can talk to each other. In any case, a lot of the time they did not take notice of my instructions to pass control back to the orchestrator / chat agent. Allowing each agent its own autonomy to interact with the user means simpler prompts and fewer LLM calls.

Invoke an LLM within a tool when you need structured output

Since an agent calls an LLM itself, it may seem strange to provide a tool to an agent that also calls an LLM. Can’t the agent handle the generation itself? A limitation I found is that ADK agents cannot have a defined output schema if they have access to tools or can transfer control to other agents.

The google-genai package allows for structured schema outputs. I have defined a schema with Pydantic and provided it to an LLM inside a tool that my generation agent has access to. This way, the LLM’s response is always consistent with my schema, and my generation prompt is separated from the agent’s instructions, which helps limit variability. There is now separation of concerns: my tool is solely responsible for generating a specific desired output, and my agent handles the decision of when to invoke it.

from google import genai
from google.genai import types
from pydantic import BaseModel, Field


class SummarySchema(BaseModel):
    insights: str = Field(description="Markdown summary of insights")
    recommendations: str = Field(description="Markdown summary of recommendations")


# PROJECT_ID, LOCATION and prompt are defined elsewhere in the tool.
client = genai.Client(vertexai=True, project=PROJECT_ID, location=LOCATION)
generated_content = client.models.generate_content(
    model="gemini-2.5-pro",
    contents=prompt,
    config=types.GenerateContentConfig(
        temperature=0.3,
        top_p=0.95,
        top_k=40,
        max_output_tokens=4096,
        response_mime_type="application/json",
        response_schema=SummarySchema,
    ),
)
summary: SummarySchema = generated_content.parsed  # typed access to the structured output
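Because the response is validated against the same Pydantic model, you can sanity-check the schema locally without calling the API. A minimal, self-contained illustration (the sample payload is made up):

```python
from pydantic import BaseModel, Field


class SummarySchema(BaseModel):
    insights: str = Field(description="Markdown summary of insights")
    recommendations: str = Field(description="Markdown summary of recommendations")


# A payload shaped like the JSON the model returns under this schema;
# validation fails loudly if a field is missing or mistyped.
sample = '{"insights": "Awareness rose quarter on quarter.", "recommendations": "Increase video spend."}'
summary = SummarySchema.model_validate_json(sample)
```

Keeping the schema in its own module makes this kind of local validation easy to run in unit tests, independent of the LLM call.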

Value real user & agent interactions when designing your architecture

My sequence of agent execution is retrieval first, generation second. Despite instructing my agents to follow this pattern, the order of execution was never guaranteed. I explored ADK’s SequentialAgent, thinking it would be the answer to my problems. Whilst experimenting with requests, I realised the execution order is more nuanced than a strict sequence. Think about how your users will realistically communicate with your agents. If they ask “hello, what can you do for me?”, will you really want the entire sequence of agents to be invoked? It seems like an unnecessary use of resources. My recommendation is to play around interacting with your agents to determine the real flow you desire. SequentialAgent is a nice option if it works for you, but it did not provide me with enough flexibility.

from google.adk.agents import SequentialAgent
from .sub_agents.retrieval import retrieval_agent
from .sub_agents.generation import report_generation_agent

root_agent = SequentialAgent(
    name="root_agent",
    description="Executes a sequence of retrieving data and generating a report.",
    sub_agents=[retrieval_agent, report_generation_agent],
)

Conclusion

The fun of developing with new tooling is the iteration process and the lessons that come from it. My current architecture may still not be perfect, but the nature of this process is trial and error. Prioritise real development experience over hypothetical scenarios. Agents are nondeterministic by nature, so experimenting with them will teach you how they really respond and interact. I recommend starting with a single multi-tool agent to maintain control, and growing the number of agents gradually so you can better monitor their decision making and behaviour.