10  Memory and Callbacks

Managing conversation history and customizing the agent loop.

Note

Code Reference: code/v0.7/src/agentsilex/

  • callbacks.py
  • runner.py

10.1 The Memory Problem

Sessions grow indefinitely. After many turns:

session = Session()
# ... 100 conversation turns later ...
len(session.dialogs)  # Hundreds of messages!
# Token limit exceeded, API errors, slow responses

We need ways to manage history without building it into the core.

10.2 Solution: Callbacks

Hook into the agent loop with callbacks that run before each LLM call:

class Runner:
    def __init__(
        self,
        session: Session,
        context: dict | None = None,
        before_llm_call_callbacks: list | None = None,  # NEW
    ):
        self.session = session
        self.context = context or {}
        self.before_llm_call_callbacks = before_llm_call_callbacks or []

    def run(self, agent: Agent, prompt: str) -> RunResult:
        # ...
        while loop_count < 10 and not should_stop:
            # Run callbacks before each LLM call
            for callback_func in self.before_llm_call_callbacks:
                callback_func(self.session)

            dialogs = self.session.get_dialogs()
            # ... LLM call ...

Callbacks receive the session and can modify it before each LLM call.
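To see the hook point in isolation, here is a toy loop. The Session, the loop body, and the "LLM call" are minimal stand-ins, not the real agentsilex classes; only the callback-dispatch pattern matches the Runner above.

```python
# Toy agent loop: Session and the "LLM call" are stand-ins for the real
# agentsilex classes, kept minimal to show the hook point only.
class Session:
    def __init__(self):
        self.dialogs = []

calls = []  # records how many messages each callback invocation saw

def record(session):
    calls.append(len(session.dialogs))

session = Session()
callbacks = [record]

for turn in range(3):
    for callback_func in callbacks:  # callbacks run first...
        callback_func(session)
    # ...then the (fake) LLM call appends to the history
    session.dialogs.append({"role": "assistant", "content": f"reply {turn}"})

print(calls)  # [0, 1, 2]: the callback ran before each of the three calls
```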

10.3 Built-in: keep_most_recent_2_turns

A simple callback that keeps only recent conversation turns (callbacks.py):

from agentsilex.session import Session


def keep_most_recent_2_turns(session: Session):
    MOST_RECENT = 2  # dialog turns to keep

    msg_count = 2 * MOST_RECENT  # include user and agent messages

    if msg_count < len(session.dialogs):
        session.dialogs = session.dialogs[-msg_count:]

This keeps only the last 2 turns (4 messages: 2 user + 2 assistant).
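The pruning is easy to verify in isolation. The sketch below repeats the callback against a minimal stand-in Session (just a dialogs list; the real class may carry more state):

```python
class Session:  # minimal stand-in: the real Session holds more state
    def __init__(self):
        self.dialogs = []

def keep_most_recent_2_turns(session):  # as in callbacks.py above
    MOST_RECENT = 2
    msg_count = 2 * MOST_RECENT
    if msg_count < len(session.dialogs):
        session.dialogs = session.dialogs[-msg_count:]

session = Session()
for i in range(5):  # five full turns = ten messages
    session.dialogs.append({"role": "user", "content": f"q{i}"})
    session.dialogs.append({"role": "assistant", "content": f"a{i}"})

keep_most_recent_2_turns(session)
print(len(session.dialogs))           # 4
print(session.dialogs[0]["content"])  # q3: the two most recent turns survive
```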

10.4 Usage

from agentsilex import Agent, Runner, Session
from agentsilex.callbacks import keep_most_recent_2_turns

agent = Agent(
    name="assistant",
    model="gpt-4o",
    instructions="You are helpful.",
    tools=[],
)

session = Session()
runner = Runner(
    session,
    before_llm_call_callbacks=[keep_most_recent_2_turns],
)

# Even after many turns, only recent history is sent to LLM
for i in range(100):
    runner.run(agent, f"Message {i}")

# Session is pruned before each LLM call
len(session.dialogs)  # Small number, not 100+

10.5 Custom Callbacks

Create your own memory strategies:

10.5.1 Keep Last N Turns

def keep_most_recent_n_turns(n: int):
    """Factory for keeping N recent turns."""
    def callback(session: Session):
        msg_count = 2 * n
        if msg_count < len(session.dialogs):
            session.dialogs = session.dialogs[-msg_count:]
    return callback

# Usage
runner = Runner(
    session,
    before_llm_call_callbacks=[keep_most_recent_n_turns(5)],
)

10.5.2 Token-Based Truncation

import tiktoken

def keep_under_token_limit(max_tokens: int = 4000):
    """Keep history under token limit."""
    encoder = tiktoken.get_encoding("cl100k_base")

    def callback(session: Session):
        while True:
            text = str(session.dialogs)
            tokens = len(encoder.encode(text))
            if tokens <= max_tokens:
                break
            # Remove the oldest message (always keep at least one)
            if len(session.dialogs) > 1:
                session.dialogs.pop(0)
            else:
                break

    return callback
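The same truncation loop can be tried without tiktoken by swapping in a crude whitespace-based estimate. Only the token counter below is a stand-in; the pop-oldest loop is unchanged:

```python
def keep_under_token_limit(max_tokens: int):
    """Same shape as above, with a crude whitespace count standing in
    for a real tokenizer."""
    def estimate(dialogs):
        return len(str(dialogs).split())

    def callback(session):
        while estimate(session.dialogs) > max_tokens and len(session.dialogs) > 1:
            session.dialogs.pop(0)  # drop the oldest message first

    return callback

class Session:  # minimal stand-in
    def __init__(self, dialogs):
        self.dialogs = dialogs

s = Session([{"role": "user", "content": f"message number {i}"} for i in range(8)])
keep_under_token_limit(max_tokens=30)(s)
print(len(s.dialogs))  # 5: the three oldest messages were dropped
```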

10.5.3 Summarization

def summarize_old_history(summarizer_agent: Agent, keep_recent: int = 3):
    """Summarize old turns, keep recent ones."""
    def callback(session: Session):
        if len(session.dialogs) <= keep_recent * 2:
            return

        # Split old and recent
        cutoff = -(keep_recent * 2)
        old = session.dialogs[:cutoff]
        recent = session.dialogs[cutoff:]

        # Summarize old history
        summary_session = Session()
        summary_runner = Runner(summary_session)
        result = summary_runner.run(
            summarizer_agent,
            f"Summarize this conversation briefly:\n{old}"
        )

        # Replace with summary + recent
        session.dialogs = [
            {"role": "system", "content": f"Previous context: {result.final_output}"}
        ] + recent

    return callback
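The split-and-replace logic can be checked without a live LLM by injecting a stub summarizer in place of the Agent/Runner round-trip. Only the `summarize` parameter below differs from the version above:

```python
def summarize_old_history(summarize, keep_recent: int = 3):
    """As above, but `summarize` is any callable old_dialogs -> str,
    so a stub can replace the summarizer agent."""
    def callback(session):
        if len(session.dialogs) <= keep_recent * 2:
            return
        cutoff = -(keep_recent * 2)
        old, recent = session.dialogs[:cutoff], session.dialogs[cutoff:]
        summary = summarize(old)  # stand-in for the agent round-trip
        session.dialogs = [
            {"role": "system", "content": f"Previous context: {summary}"}
        ] + recent
    return callback

class Session:  # minimal stand-in
    def __init__(self, dialogs):
        self.dialogs = dialogs

s = Session([{"role": "user", "content": str(i)} for i in range(10)])
summarize_old_history(lambda old: f"{len(old)} earlier messages")(s)
print(len(s.dialogs))           # 7: one summary message + six recent ones
print(s.dialogs[0]["content"])  # Previous context: 4 earlier messages
```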

10.5.4 Logging Callback

def log_conversation():
    """Log each conversation turn."""
    def callback(session: Session):
        print(f"[Turn {len(session.dialogs)}] Messages: {len(session.dialogs)}")
        # Or write to file, database, etc.
    return callback

10.6 Combining Callbacks

Callbacks run in order:

runner = Runner(
    session,
    before_llm_call_callbacks=[
        log_conversation(),
        keep_under_token_limit(8000),
        inject_user_context(user_id),
    ],
)

Order matters! In this example:

  1. Log current state
  2. Truncate if too long
  3. Inject user-specific context
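`inject_user_context` is not defined in this chapter; one possible sketch (the name and behavior here are assumptions, not part of agentsilex) prepends a system note about the current user and guards against stacking duplicates across loop iterations:

```python
def inject_user_context(user_id: str):
    """Hypothetical callback: prepend a system note identifying the user."""
    def callback(session):
        note = {"role": "system", "content": f"Current user id: {user_id}"}
        # The callback runs before every LLM call, so avoid inserting
        # the same note more than once.
        if not session.dialogs or session.dialogs[0] != note:
            session.dialogs.insert(0, note)
    return callback

class Session:  # minimal stand-in
    def __init__(self):
        self.dialogs = []

s = Session()
callback = inject_user_context("u-42")
callback(s)
callback(s)  # second run is a no-op thanks to the duplicate guard
print(len(s.dialogs))  # 1
```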

10.7 Callback Signature

All callbacks must accept a Session:

def my_callback(session: Session) -> None:
    # Modify session.dialogs as needed
    pass

10.8 Example: Stateful Counter

def count_turns():
    """Track turn count in callback closure."""
    turn_count = 0

    def callback(session: Session):
        nonlocal turn_count
        turn_count += 1
        print(f"Turn {turn_count}")

    return callback

runner = Runner(session, before_llm_call_callbacks=[count_turns()])

10.9 Why Callbacks Instead of Built-in Memory?

Built-in Memory       Callbacks
One-size-fits-all     You decide the strategy
Hidden behavior       Explicit and visible
Hard to customize     Easy to customize
Framework decides     You decide

AgentSilex philosophy: give you the hooks, not the implementation.

10.10 Key Design Decisions

Decision                  Why
Callbacks as functions    Simple, no class hierarchy
Run before LLM call       Last chance to modify history
Session passed directly   Full access to modify
List of callbacks         Composable behaviors
Tip

Checkpoint:

cd code/v0.7

Callback system ready! Implement any memory strategy you need.