8 Observability
You can’t improve what you can’t measure. Let’s add tracing.
Code Reference: code/v0.5/src/agentsilex/observability.py
8.1 Why Observability?
Agent systems are hard to debug:
- Which tool was called and when?
- How long did each LLM call take?
- What was the full conversation flow?
- Where did things go wrong?
8.2 OpenTelemetry Setup
We use OpenTelemetry, the industry standard for distributed tracing (observability.py):
import os
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.resources import Resource
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from contextlib import contextmanager

_tracer: trace.Tracer = None


def setup_tracer_provider():
    """Setup TracerProvider with OTLP exporter if endpoint is configured"""
    endpoint = os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT")
    if endpoint:
        exporter = OTLPSpanExporter(endpoint, insecure=True)
        provider = TracerProvider(
            resource=Resource.create({"service.name": "agentsilex"})
        )
        provider.add_span_processor(BatchSpanProcessor(exporter))
        trace.set_tracer_provider(provider)


def initialize_tracer():
    """Initialize the global tracer instance"""
    global _tracer
    if _tracer is None:
        _tracer = trace.get_tracer("agentsilex")

Key points:
- Uses environment variable OTEL_EXPORTER_OTLP_ENDPOINT for configuration
- Service name is “agentsilex”
- Global tracer instance for the module
8.3 The span Context Manager
A thin wrapper around start_as_current_span creates a span and keeps it current for the duration of the with block:
@contextmanager
def span(name: str, **attrs):
    with _tracer.start_as_current_span(name, attributes=attrs) as s:
        yield s

Usage:

with span("llm_call", model="gpt-4o", tokens=150):
    response = completion(...)

with span("tool_execution", tool="get_weather"):
    result = get_weather("Tokyo")

8.4 ManagedSpan: Manual Control
Sometimes you need more control over span lifecycle:
class ManagedSpan:
    def __init__(self, name: str, **attributes):
        self.name = name
        self.attributes = attributes
        self._span = None
        self._context = None

    def start(self):
        self._span = _tracer.start_span(self.name, attributes=self.attributes)
        self._context = trace.use_span(self._span, end_on_exit=False)
        self._context.__enter__()
        return self

    def end(self):
        if self._context:
            self._context.__exit__(None, None, None)
            self._context = None
        if self._span:
            self._span.end()
            self._span = None

Unlike the context manager, ManagedSpan lets you start and end spans at different times.
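ManagedSpan works by calling __enter__ and __exit__ by hand on the context manager that trace.use_span returns. The sketch below shows the same pattern with a stand-in context manager built on contextvars rather than OpenTelemetry itself, so it runs without any tracing setup. The names use_name and ManualScope are hypothetical, and contextlib.ExitStack is swapped in as one way to hold the pending exit instead of storing the context manager directly.

```python
from contextlib import ExitStack, contextmanager
from contextvars import ContextVar
from typing import Optional

# Stand-in for the "current span" slot (hypothetical, not the OpenTelemetry API).
_current: ContextVar[Optional[str]] = ContextVar("current_span", default=None)


@contextmanager
def use_name(name: str):
    """Make `name` the current span name until the block exits."""
    token = _current.set(name)
    try:
        yield name
    finally:
        _current.reset(token)


class ManualScope:
    """Enter a context manager in start() and leave it in end()."""

    def __init__(self, name: str):
        self.name = name
        self._stack: Optional[ExitStack] = None

    def start(self) -> "ManualScope":
        self._stack = ExitStack()
        self._stack.enter_context(use_name(self.name))
        return self

    def end(self) -> None:
        if self._stack is not None:
            self._stack.close()  # runs the pending exit exactly once
            self._stack = None


scope = ManualScope("agent_main").start()
inside = _current.get()   # the scope is active between start() and end()
scope.end()
scope.end()               # a second call is a no-op
outside = _current.get()  # back to the default once the scope has ended
```

Because start() and end() are separate calls, the caller is responsible for pairing them; a try/finally around the work in between keeps the scope from leaking if an exception is raised.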
8.5 SpanManager: Switching Between Spans
For agent execution, we often need to switch spans (e.g., when handoff occurs):
class SpanManager:
    def __init__(self):
        self.current: ManagedSpan | None = None

    def switch_to(self, name: str, **attributes):
        if self.current:
            self.current.end()
        self.current = ManagedSpan(name, **attributes).start()
        return self.current

    def end_current(self):
        if self.current:
            self.current.end()
            self.current = None

    def __del__(self):
        self.end_current()

This is useful for tracking which agent is currently active.
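To see the order of events during a handoff, the switching logic can be exercised without OpenTelemetry by substituting a span class that only records what happens to it. This is a minimal sketch: RecordingSpan and SwitchingManager are hypothetical stand-ins for ManagedSpan and SpanManager, not part of agentsilex.

```python
class RecordingSpan:
    """Hypothetical stand-in for ManagedSpan that logs lifecycle events."""

    def __init__(self, name: str, log: list):
        self.name = name
        self.log = log

    def start(self) -> "RecordingSpan":
        self.log.append(("start", self.name))
        return self

    def end(self) -> None:
        self.log.append(("end", self.name))


class SwitchingManager:
    """Same switching logic as SpanManager, with the span class swapped out."""

    def __init__(self, log: list):
        self.log = log
        self.current = None

    def switch_to(self, name: str) -> RecordingSpan:
        if self.current:
            self.current.end()  # close the previous agent's span first
        self.current = RecordingSpan(name, self.log).start()
        return self.current

    def end_current(self) -> None:
        if self.current:
            self.current.end()
            self.current = None


events = []
manager = SwitchingManager(events)
try:
    manager.switch_to("agent_main_assistant")
    manager.switch_to("agent_weather_specialist")  # simulated handoff
finally:
    manager.end_current()  # explicit cleanup instead of relying on __del__

for event in events:
    print(event)
```

Each span is ended before the next one starts, so agent spans never overlap. The try/finally is a deliberate choice in this sketch: Python does not guarantee when __del__ runs, so an explicit end_current() is the more dependable way to close the last span.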
8.6 Instrumented Runner
The Runner uses these primitives (runner.py):
import uuid

from agentsilex.observability import (
    setup_tracer_provider,
    initialize_tracer,
    span,
    SpanManager,
)

# Runs once, when runner.py is imported
setup_tracer_provider()
initialize_tracer()
span_manager = SpanManager()


class Runner:
    def run(self, agent: Agent, prompt: str) -> RunResult:
        with span("workflow_run", run_id=str(uuid.uuid4())):
            span_manager.switch_to(f"agent_{agent.name}", agent=agent.name)
            # ... agent loop ...
            # When handoff occurs:
            span_manager.switch_to(f"agent_{new_agent.name}", agent=new_agent.name)
            # At the end:
            span_manager.end_current()

8.7 Visualizing with Phoenix
Arize Phoenix is a great free visualization tool:
# Install
pip install arize-phoenix

# Run Phoenix server
phoenix serve

# Set endpoint
import os
os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "http://localhost:6006"

# Now run your agent - traces will appear in Phoenix UI

8.8 What You See in Traces
workflow_run (2.3s) [run_id: abc-123]
├── agent_main_assistant (1.5s)
│ ├── llm_call (0.8s)
│ └── tool_call: transfer_to_weather_specialist
├── agent_weather_specialist (0.8s)
│ ├── llm_call (0.5s)
│ ├── tool_call: get_weather (0.1s)
│ └── llm_call (0.2s)
└── final_output: "The weather in Tokyo is..."
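The tree is not stored as a tree. Each span records the id of its parent, and the UI rebuilds the hierarchy from those links. The sketch below illustrates that step on a hand-written flat list; the tuple layout and the render_tree helper are assumptions made for illustration, not the format Phoenix or OTLP actually uses.

```python
from collections import defaultdict

# Hypothetical flat export: (span_id, parent_id, name, duration_seconds).
SPANS = [
    ("a", None, "workflow_run", 2.3),
    ("b", "a", "agent_main_assistant", 1.5),
    ("c", "b", "llm_call", 0.8),
    ("d", "a", "agent_weather_specialist", 0.8),
    ("e", "d", "llm_call", 0.5),
    ("f", "d", "tool_call: get_weather", 0.1),
    ("g", "d", "llm_call", 0.2),
]


def render_tree(spans):
    """Return indented lines, with children grouped under their parent span."""
    children = defaultdict(list)
    for span_id, parent_id, name, duration in spans:
        children[parent_id].append((span_id, name, duration))

    lines = []

    def walk(parent_id, depth):
        for span_id, name, duration in children.get(parent_id, []):
            lines.append(f"{'  ' * depth}{name} ({duration}s)")
            walk(span_id, depth + 1)

    walk(None, 0)  # root spans are the ones without a parent
    return lines


tree_lines = render_tree(SPANS)
print("\n".join(tree_lines))
```

This is also why the nesting in the Runner matters: a span started while another span is current becomes that span's child, which is what places llm_call under the active agent span.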
8.9 Configuration
Environment variables:
| Variable | Purpose |
|---|---|
| OTEL_EXPORTER_OTLP_ENDPOINT | OTLP collector endpoint (e.g., http://localhost:6006) |
If not set, tracing is disabled (no-op).
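The on/off decision comes down to whether the variable holds a non-empty value when setup_tracer_provider() runs. A minimal sketch of that check follows; resolve_otlp_endpoint is a hypothetical helper, and stripping whitespace is an extra assumption that the original os.getenv check does not make.

```python
import os
from typing import Mapping, Optional


def resolve_otlp_endpoint(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the configured endpoint, or None when tracing should stay off."""
    # Assumption: a blank or whitespace-only value counts as "not set".
    endpoint = environ.get("OTEL_EXPORTER_OTLP_ENDPOINT", "").strip()
    return endpoint or None


enabled = resolve_otlp_endpoint(
    {"OTEL_EXPORTER_OTLP_ENDPOINT": "http://localhost:6006"}
)
disabled = resolve_otlp_endpoint({})
blank = resolve_otlp_endpoint({"OTEL_EXPORTER_OTLP_ENDPOINT": "  "})
```

Since runner.py calls setup_tracer_provider() at import time, the variable has to be set before agentsilex is imported; setting it afterwards has no effect on that process.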
8.10 Example: Full Traced Run
import os
os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "http://localhost:6006"

from agentsilex import Agent, Runner, Session, tool


@tool
def get_weather(city: str) -> str:
    """Get weather for a city."""
    return f"Weather in {city}: 72°F"


agent = Agent(
    name="weather_bot",
    model="gpt-4o",
    instructions="You are a weather assistant.",
    tools=[get_weather],
)

session = Session()
runner = Runner(session)

# This run will be traced
result = runner.run(agent, "What's the weather in Tokyo?")

# Check Phoenix UI at http://localhost:6006

8.11 Key Design Decisions
| Decision | Why |
|---|---|
| OpenTelemetry | Industry standard, works with any backend |
| Environment-based config | Easy to enable/disable |
| SpanManager for switching | Clean handoff tracking |
| Global tracer | Simple, single instance |
cd code/v0.5

Observability is now built in! Traces help you understand agent behavior.