The other day on LinkedIn, Sam Bhagwat, the founder of Gatsby and founder/CEO of Mastra.ai, was sharing his book, Principles of Building AI Agents, with many people. He posted that after telling an investor about it, they said he should distribute it even more widely, so he would share the book with anyone who commented "Book." I commented "Book," and even though I haven't linked my company email to LinkedIn, a PDF download link was sent to my work address lol.
The book explains the key terms in each part while discussing what to be mindful of when building agents. The sample code, naturally, uses Mastra; I believe anyone who has worked with Mastra will find it easy to follow.
As the title "Principles of Building AI Agents" suggests, the opening covers topics that anyone interested in AI agents has likely heard of or already knows, so you may be able to skim Part 1 and Part 3 (on MCP). The other parts, however, are a must-read both for those who have only lightly built agents with a framework and for those who are interested in agents but haven't yet had a chance to try them.
In conclusion, I think this book is a must-read for anyone interested in AI agents.
Personally Interesting Parts
The example in Part 2 about Alana Goyal, one of Mastra's investors, was very insightful. When I first tried to build an agent myself, I tried to cram in as much as possible and failed spectacularly. After reading this part and Part 6, I feel that writing a simple flowchart before starting to code is quite a good practice for building agents.
Also in Part 2, the section on dynamic agents was interesting.
The following is part of a `new Agent()` call, and I thought, "Ah, that's a valid approach." However, I also thought that with this implementation, if the service doesn't offer a trial, users on the free tier might find the output underwhelming and think, "I'm not using this anymore."

```typescript
// Inside the agent's dynamic `instructions` function; `userTier` and
// `language` are resolved from the runtime context.
return `You are a customer support agent for our SaaS platform.
The current user is on the ${userTier} tier and prefers ${language} language.

For ${userTier} tier users:
${userTier === "free" ? "- Provide basic support and documentation links" : ""}
${userTier === "pro" ? "- Offer detailed technical support and best practices" : ""}
${userTier === "enterprise" ? "- Provide priority support with custom solutions" : ""}

Always respond in ${language} language.`;
},
```
The part about streaming updates in Part 4 honestly hits close to home. When I was building a prototype for an AI product at my current company, the PM and the team decided to go without streaming due to time constraints. At the time, I thought, "Are you serious?" but I wasn't able to convincingly argue how important streaming was. I can't help but wish I had known this content last year.
Regarding Part 5 on RAG and its alternatives, I read it with interest, but it also reminded me that although presenters at a Google AI event earlier this year were loudly proclaiming "Agentic RAG," I haven't seen or heard the term much since.
Finally, the multimodal section at the end, partly because I recently watched a video about the founder of ElevenLabs, made me feel this is an area that will develop from here on. Depending on the domain, it may even be possible to compete with foundation-model providers like OpenAI, Anthropic, Google, and Meta.
Below is a summary of the main points from each part.
Part 1: Prompting Large Language Models (LLMs)
- Key Points for Model Selection:
- Hosted vs. Open-Source: It's efficient to start prototyping with hosted APIs like OpenAI or Anthropic.
- Accuracy vs. Cost: Start with expensive, high-accuracy models. Once the functionality is confirmed, optimize for cost.
- Context Window: Models with huge context windows, like Google Gemini 1.5 Pro, enable new applications.
- Principles of Good Prompts:
- Few-shot: Providing multiple specific examples (input + output) improves output accuracy.
- System Prompt: Gives the agent a specific role or personality (persona).
- Formatting: Using structured formats like XML tags or uppercase helps the model understand instructions accurately. Production prompts tend to become very detailed.
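To make the few-shot and formatting principles concrete, here is a hedged sketch (the type and function names are mine, not the book's) that wraps input/output example pairs in XML tags and puts a system-style persona first:

```typescript
// Hypothetical helper: build a few-shot prompt with XML-tagged examples.
// `buildPrompt`, `Example`, and the sentiment task are illustrative only.
type Example = { input: string; output: string };

function buildPrompt(system: string, examples: Example[], query: string): string {
  const shots = examples
    .map((e) => `<example>\n  <input>${e.input}</input>\n  <output>${e.output}</output>\n</example>`)
    .join("\n");
  // System persona first, then the examples, then the actual query.
  return `${system}\n\n${shots}\n\n<input>${query}</input>`;
}

const prompt = buildPrompt(
  "You are a sentiment classifier. Reply with POSITIVE or NEGATIVE only.",
  [
    { input: "I love this product", output: "POSITIVE" },
    { input: "Terrible support experience", output: "NEGATIVE" },
  ],
  "The docs were clear and helpful"
);
```

The XML tags give the model unambiguous boundaries between the examples and the real input, which is exactly the formatting point above.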
Part 2: Building Agents
- Components of an Agent: An agent sits on top of an LLM and can use tools, manage memory, and collaborate with other agents.
- Tool Design is Crucial: The most important step in building an agent is breaking down a task into concrete operations and designing each as a tool.
- Importance of Memory: Essential for maintaining context in long-term conversations. Hierarchical memory, which combines recent interactions with relevant past conversations, is effective.
- Dynamic Agents: Can dynamically change models, tools, and instructions at runtime based on user attributes (like subscription tier).
Part 3: Tools and MCP
- Key Tools: It's essential for agents to have tools for web search and integration with third-party services like Gmail and Salesforce.
- MCP (Model Context Protocol):
- A standard for connecting AI agents and tools (like a USB-C for AI).
- It allows agents and tools written by different developers in different languages to be easily connected.
- In 2025, major AI companies have announced support, and it is becoming a de facto standard.
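Because MCP is built on JSON-RPC 2.0, the "USB-C" analogy is easy to see on the wire: every tool invocation is the same `tools/call` request shape regardless of who wrote the tool. The tool name and arguments below are made up for illustration:

```typescript
// An MCP tool invocation is a JSON-RPC 2.0 request with method "tools/call".
// "search_web" and its arguments are hypothetical; real names come from the
// server's advertised tool list.
const request = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/call",
  params: {
    name: "search_web",
    arguments: { query: "Mastra documentation" },
  },
};

// Serialized form, as it would travel between the agent and an MCP server.
const wire = JSON.stringify(request);
```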
Part 4: Graph-Based Workflows
- Ensuring Predictability: When the output of a highly flexible agent is unstable, a graph-based workflow that breaks down tasks into steps is effective.
- Basic Workflow Operations: Define complex processes by combining branching, chaining, merging, and conditional logic.
- Pausing and Resuming: The ability to pause a workflow to wait for external input, like human approval, and resume it later is important.
- Streaming and Observability:
- Streaming: Displaying the intermediate progress of a process to the user in real-time is essential for a good UX.
- Tracing: Recording and visualizing the input and output of each step (observability) is extremely important for debugging non-deterministic AI.
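The chaining and branching operations above can be sketched without any framework. This is a toy illustration in plain TypeScript (not the Mastra workflow API), where each step is a function and a predicate picks the branch:

```typescript
// Toy graph-style workflow: steps are functions, combinators wire them up.
// All names here are illustrative, not from the book.
type Step<I, O> = (input: I) => O;

function chain<A, B, C>(first: Step<A, B>, second: Step<B, C>): Step<A, C> {
  return (input) => second(first(input));
}

function branch<I, O>(
  cond: (input: I) => boolean,
  ifTrue: Step<I, O>,
  ifFalse: Step<I, O>
): Step<I, O> {
  return (input) => (cond(input) ? ifTrue(input) : ifFalse(input));
}

// Example: classify a support ticket, then route it down one of two branches.
type Ticket = { text: string; urgent: boolean };

const classify: Step<string, Ticket> = (text) => ({
  text,
  urgent: text.includes("outage"),
});
const escalate: Step<Ticket, string> = () => "routed to on-call";
const queue: Step<Ticket, string> = () => "added to backlog";

const workflow = chain(classify, branch((t: Ticket) => t.urgent, escalate, queue));
```

Because every step has explicit inputs and outputs, tracing each hop is straightforward, which is the observability point above.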
Part 5: Retrieval-Augmented Generation (RAG)
- How RAG Works: A technique that searches a proprietary knowledge base (like documents) and provides the relevant parts as context to an LLM to generate more accurate answers.
- Alternatives to RAG:
- Giant Context Window: A simple method of feeding the entire text directly to the model.
- Agentic RAG: Giving an agent tools to access data instead of just performing text retrieval.
- Guidelines for Building RAG: Instead of immediately building a complex RAG system, one should first try using a giant context window or Agentic RAG. If those are insufficient, then consider building a RAG pipeline.
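For reference, the retrieval half of a classic RAG pipeline is small once embeddings exist. This sketch (types and names are mine) scores chunks by cosine similarity and returns the top-k texts to place into the prompt context:

```typescript
// Minimal retrieval sketch, assuming chunks were embedded ahead of time.
// Embedding generation itself (the expensive part) is out of scope here.
type Chunk = { text: string; embedding: number[] };

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function retrieve(query: number[], chunks: Chunk[], k: number): string[] {
  // Sort a copy descending by similarity to the query, keep the top k texts.
  return [...chunks]
    .sort((x, y) => cosine(query, y.embedding) - cosine(query, x.embedding))
    .slice(0, k)
    .map((c) => c.text);
}
```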
Part 6: Multi-Agent Systems
- AI as a Team of Specialists: A system where multiple agents with specialized roles (e.g., planning agent, coding agent, review agent) collaborate to solve more complex tasks.
- Design Philosophy: Designing multi-agent systems is similar to organizational design in a company.
- A2A (Agent-to-Agent): A protocol for communication between untrusted external agents. Alongside MCP, it has the potential to become standardized.
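The "team of specialists" idea can be caricatured in a few lines: each role is just a function, and an orchestrator threads the work through them, which is roughly the shape a real system has once an LLM call sits behind each role. All names here are illustrative:

```typescript
// Toy multi-agent pipeline: planner -> coder -> reviewer.
// Real systems would replace each function body with an LLM-backed agent.
type AgentFn = (work: string) => string;

const planner: AgentFn = (task) => `plan: split "${task}" into steps`;
const coder: AgentFn = (plan) => `code for [${plan}]`;
const reviewer: AgentFn = (code) => `approved: ${code}`;

function runTeam(task: string, team: AgentFn[]): string {
  // Each specialist consumes the previous one's output.
  return team.reduce((work, agent) => agent(work), task);
}

const result = runTeam("add OAuth login", [planner, coder, reviewer]);
```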
Part 7: Evals
- Measuring AI Quality: To evaluate the non-deterministic output of AI, evals, which measure quality on a score from 0 to 1 rather than a simple pass/fail, are important.
- Types of Evals: There are various evaluation methods to measure the faithfulness of answers, the presence of hallucinations, the appropriateness of tool use, and more.
- Human Review: In addition to automated evaluation, A/B testing and human review of production data are also essential.
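As a toy example of a 0-to-1 score rather than pass/fail, one could grade an answer by the fraction of required facts it mentions. Real evals (faithfulness, hallucination checks, LLM-as-judge) are far richer; the function name here is mine, not the book's:

```typescript
// Illustrative eval: fraction of required facts the answer contains, in [0, 1].
function factCoverage(answer: string, requiredFacts: string[]): number {
  if (requiredFacts.length === 0) return 1;
  const hits = requiredFacts.filter((fact) =>
    answer.toLowerCase().includes(fact.toLowerCase())
  ).length;
  return hits / requiredFacts.length;
}
```

A graded score like this can be tracked over time or averaged across a test set, which a binary pass/fail cannot express.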
Parts 8 & 9: Development, Deployment, and More
- Local Development: A chat UI for checking agent behavior and workflow visualization tools can enhance development efficiency.
- Deployment Challenges: Agents often run for long periods, making deployment on typical serverless platforms with timeout limits difficult.
- Multimodal: AI applications for images, audio, and video are still developing compared to text. Real-time audio, in particular, is technically challenging.
- Future Outlook: The evolution of reasoning models, agent learning where agents learn from their own logs, and the increasing importance of security are predicted. The field is evolving so fast that everyone will remain a "perpetual novice."