1  What is an AI Agent?

Before we write any code, let’s understand what we’re building.

1.1 The Simplest Definition

An AI Agent is a program where an LLM controls the execution flow.

Traditional Program: Human writes logic → Program executes
AI Agent:           Human sets goal → LLM decides what to do

The key difference: in traditional programming, you specify how to solve a problem. With agents, you specify what you want, and the LLM figures out the how.

1.2 Why Build Your Own Framework?

Existing frameworks like LangGraph, OpenAI Agents SDK, and Google Agent Development Kit are powerful, but:

Problem Consequence
Too much abstraction Hard to debug when things go wrong
Hidden complexity Can’t customize core behavior
Frequent API changes Code breaks between versions
Kitchen-sink design 90% of features you’ll never use

By building your own minimal framework, you:

  • Understand every line — No magic, no surprises
  • Customize freely — Change anything without fighting the framework
  • Stay lightweight — ~1000 lines vs 100,000+ lines
  • Learn deeply — Best way to understand agents is to build one

1.3 Key Components

Every agent has four essential parts:

  1. Brain — The LLM that makes decisions
  2. Instructions — System prompt defining behavior
  3. Tools — Functions the agent can call
  4. Memory — Conversation history and context

flowchart LR
    User[User] --> Agent
    Agent --> LLM[Brain/LLM]
    LLM --> Tools
    Tools --> LLM
    LLM --> Agent
    Agent --> User

1.4 The Agent Loop

The core pattern is surprisingly simple:

while True:
    # 1. Send conversation to LLM
    response = llm.complete(messages, tools)

    # 2. If LLM wants to use a tool
    if response.has_tool_calls:
        for tool_call in response.tool_calls:
            result = execute_tool(tool_call)
            messages.append(result)
        continue  # Go back to LLM with results

    # 3. Otherwise, return the final answer
    return response.content

That’s it. Everything else is details.

1.5 What Makes Agents Powerful

Unlike traditional chatbots, agents can:

  • Take actions — Search the web, write files, call APIs
  • Iterate — Try something, observe the result, adjust
  • Compose — Break complex tasks into steps
  • Delegate — Hand off to specialized sub-agents

1.6 A Concrete Example

Imagine asking: “What’s the weather in New York and should I bring an umbrella?”

Chatbot response:

“I don’t have access to real-time weather data.”

Agent response:

  1. Calls get_weather("New York") → “72°F, 30% chance of rain”
  2. Reasons about the result
  3. Returns: “It’s 72°F in New York with a 30% chance of rain. A light umbrella might be useful, but it’s not essential.”

The agent acts on the world, not just generates text.

1.7 Our Goal

In this book, we’ll build a framework that enables all of this in ~1000 lines of Python:

  • Part I — Single agent with tools
  • Part II — Multi-agent handoffs and composition
  • Part III — Production features (observability, streaming, MCP)

No magic, no hidden complexity. Let’s start.