
AI Observability Is No Joke

A humorous look at AI observability through the lens of a joke-fetching agent. Learn why knowing what your AI did matters.

Mark Hazleton June 16, 2025 AI, Observability

What you'll learn: How a simple AI joke request revealed critical observability gaps, why transparency matters in AI systems, and practical steps to implement better monitoring in your AI agents.

A funny request turned into a lesson on AI observability

I thought I was just asking for a joke. It turns out I had stumbled into one of the bigger challenges facing AI development today. Here's what unfolded, and why it matters.

Picture this: I'm tinkering with my latest AI agent setup, feeling pretty proud of myself. I've got my simple MCP (Model Context Protocol) server running, connected to JokeAPI (jokeapi.dev), ready to fetch the freshest jokes from the internet. My agent is configured, the tools are registered, everything looks perfect. So I type: "Hey agent, tell me a joke."

"Why don't scientists trust atoms? Because they make up everything!"

I chuckle. Mission accomplished, right? My agent used the API, grabbed a joke, delivered the goods. Time to pat myself on the back and move on to the next project. But then that little voice in my head started whispering: "Just where did that joke come from?"

I mean, sure, my agent said it fetched it from jokeapi.dev. But did it really? Or did it just pull that gem from the vast repository of dad jokes floating around in its training data? How would I even know?

This is when I realized that even though I had written the code, I had no idea how it actually behaved. Everything looked fine from the outside, but I had no visibility into what was happening under the hood.

To answer that question, I dug deeper. I looked at my logs, my network traffic, my agent's behavior patterns. And what I found was... well, not much. My fancy AI agent was essentially a black box wrapped in promises.

The agent could claim it called the API. It could even format its response to look like it came from an external source. But proving it actually made that HTTP request? That was surprisingly difficult.

It's like asking someone if they actually went to the store or just grabbed leftover groceries from the fridge. Both could produce milk, but the source matters if you're trying to track your spending (or in my case, API costs).

The Bigger Picture

This seemingly simple joke request opened my eyes to a massive problem in AI agent development: observability. Not the most exciting word, I'll admit, but stick with me here.

Think about it this way - when you're driving, you have a dashboard. You can see your speed, fuel level, engine temperature. You know what's happening under the hood (mostly). But with AI agents? We're essentially driving blindfolded, hoping the agent is actually doing what it claims to be doing.

The more I thought about this, the funnier (and scarier) it became. I started imagining all the different ways my agent could be "fetching" that joke without actually calling the API:

  • The Overachiever: Calls five APIs when one would do.
  • The Lazy Agent: Fakes it with old data.
  • The Confused Agent: Gives up silently but still logs success.
  • The Identity Crisis Agent: Thinks it made a call… but didn’t.

Why This Matters

This isn't just about jokes or my particular brand of overthinking. As AI agents become more sophisticated and handle more critical tasks, this observability gap becomes genuinely problematic:

Customer Service:
Did the agent actually check your account status, or is it giving you a generic response?
Financial Applications:
When the agent says it's pulling real-time market data, is it really, or are you making decisions based on stale information?
Healthcare:
If an AI assistant claims to have checked the latest research, you'd better hope it actually did.
Development:
When your coding assistant says it's following best practices from the latest documentation, did it actually access that documentation or just wing it?

AI agents touch everything—from customer service to healthcare. We must know what they actually do, not just what they say they do.

Agents chain actions across tools and APIs. Without logs and proofs, it's a game of telephone with hallucinations.

Observability Best Practices

So, what can we do? Here are some best practices I'm adopting to ensure my AI agents are more transparent and accountable:

  • Trust, but Verify: Don't just ask—prove it.
  • Embrace Logs: Log intent, attempts, and failures.
  • Test Boundaries: Disable tools and see if the agent notices.
  • Build for Observability: Make it foundational, not an afterthought.

That joke? Still funny. But if my agent were making business decisions, I'd want proof. The joke's on us if we don't demand it.
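
To make "Embrace Logs" and "Trust, but Verify" concrete, here is one low-tech way to get proof of outbound calls in a .NET setup like the one shown later in this article: a logging DelegatingHandler on the HttpClient pipeline. This is a minimal sketch rather than code from my actual server; the AuditingHandler name and the "jokes" client name are placeholders, and it only helps if the tool uses the registered HttpClient factory instead of creating its own client.

using System.Diagnostics;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Logging;

// Logs every HTTP request that actually leaves the process, with its
// method, URL, status code, and duration. "The agent called the API"
// becomes something you can verify in the logs.
public class AuditingHandler : DelegatingHandler
{
  private readonly ILogger<AuditingHandler> _logger;

  public AuditingHandler(ILogger<AuditingHandler> logger) => _logger = logger;

  protected override async Task<HttpResponseMessage> SendAsync(
    HttpRequestMessage request, CancellationToken cancellationToken)
  {
    var stopwatch = Stopwatch.StartNew();
    _logger.LogInformation("Outbound call: {Method} {Url}",
      request.Method, request.RequestUri);

    var response = await base.SendAsync(request, cancellationToken);

    _logger.LogInformation("Outbound call completed: HTTP {Status} in {Elapsed} ms",
      (int)response.StatusCode, stopwatch.ElapsedMilliseconds);
    return response;
  }
}

// Registered in Program.cs alongside the AddHttpClient() call shown below:
// builder.Services.AddTransient<AuditingHandler>();
// builder.Services.AddHttpClient("jokes").AddHttpMessageHandler<AuditingHandler>();

With something like this in place, a joke that never produced a log line is a joke that never left the model.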

Model Context Protocol (MCP) Glossary

Since MCP is central to this discussion, here's a comprehensive glossary of key terms and concepts that will help you understand how this protocol enables AI agent observability.

The MCP Protocol

MCP is an open standard that enables secure, bidirectional communication between AI language models and external data sources and tools. Think of it as a universal translator that allows AI agents to safely interact with your databases, APIs, file systems, and other resources without exposing sensitive information or compromising security. The protocol standardizes how AI systems request access to resources, how they authenticate, and how data flows between the AI and external systems.

MCP Server

An MCP server is a standalone program that exposes specific capabilities (tools, resources, or prompts) to AI clients through the MCP protocol. Servers act as secure gateways between AI agents and external systems. For example, a database MCP server might provide read-only access to customer records, while a file system server could allow an AI to read documentation files. Each server defines what operations are allowed and implements appropriate security controls and logging.

MCP Client

The MCP client is typically an AI application or agent that connects to one or more MCP servers to access external capabilities. Clients discover available tools and resources, make requests to servers, and handle responses. Popular AI platforms like Claude Desktop, ChatGPT Plus, and custom AI applications can act as MCP clients. The client is responsible for managing connections, handling authentication, and presenting available tools to the AI model in a format it can understand and use.

Tools

In MCP terminology, tools are specific functions or capabilities that an MCP server exposes to AI clients. Tools define what actions an AI can perform—like querying a database, calling an API, or processing a file. Each tool has a defined schema that specifies required parameters, expected inputs, and output formats. Tools are stateless and designed to be called multiple times safely. Examples include "search_database", "send_email", "create_calendar_event", or "fetch_weather_data". The tool abstraction allows AI agents to understand and use complex external systems through simple, well-defined interfaces.
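
As an illustration of the tool abstraction, here is a sketch of what a parameterized companion to the joke tool from the Code Samples section might look like. The GetJokeByCategory name, the category parameter, and its [Description] text are my own assumptions, not part of the original sample; the idea is that the method signature itself describes the inputs a client sees when it lists the tool.

using System.ComponentModel;
using System.Net.Http.Json;
using ModelContextProtocol.Server;

// Hypothetical addition to the Tools class shown in the Code Samples section.
// The parameter and its description would surface in the tool's input schema.
[McpServerTool, Description("Fetches a random safe-mode joke from JokeAPI for a given category")]
public static async Task<string> GetJokeByCategory(
  [Description("JokeAPI category, for example Programming, Pun, or Misc")] string category)
{
  using var client = new HttpClient();
  var response = await client.GetFromJsonAsync<JokeResponse>(
    $"https://v2.jokeapi.dev/joke/{Uri.EscapeDataString(category)}?safe-mode");

  return response?.Type == "single"
    ? $"JOKE: {response.Joke}"
    : $"JOKE: {response?.Setup}\n{response?.Delivery}";
}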

Resources

Resources in MCP represent data or content that AI agents can access through the protocol. Unlike tools (which perform actions), resources provide information. They can be static (like configuration files) or dynamic (like real-time sensor data). Resources have URIs for identification and can include metadata about their content type, size, and freshness. Examples include log files, documentation, database snapshots, or API responses. Resources enable AI agents to access contextual information needed to make informed decisions or provide accurate responses.

Prompts

MCP prompts are reusable templates or instructions that servers can provide to AI clients. They help standardize how AI agents interact with specific systems or perform particular tasks. Prompts can include context about how to use tools effectively, what information to gather, or how to format responses. They act as "best practice guides" built into the protocol, ensuring that AI agents use external systems correctly and consistently. For example, a customer service MCP server might provide prompts that guide an AI on how to handle different types of customer inquiries.

Transport Layer

The transport layer in MCP handles the actual communication between clients and servers. MCP supports multiple transport mechanisms including stdio (standard input/output for local processes), SSE (Server-Sent Events for web-based communication), and WebSocket connections for real-time bidirectional communication. The transport layer is responsible for message delivery, connection management, and basic error handling. Different transports are optimized for different deployment scenarios—stdio for local development, SSE for simple web integrations, and WebSockets for high-performance applications requiring real-time updates.

Observability Features

MCP includes built-in features that support observability and monitoring of AI agent interactions. The protocol includes request/response logging, error reporting, and progress tracking mechanisms. Servers can emit detailed logs about tool calls, resource access, and performance metrics. The standardized message format makes it easier to implement monitoring dashboards, audit trails, and debugging tools. This is exactly why MCP is so valuable for solving the observability challenges discussed in this article—it provides a structured way to track what AI agents are actually doing when they interact with external systems.

Code Samples

For those interested in diving deeper, here are the simple C# snippets that implement the MCP server and my basic joke-fetching tool, with no observability built in. Notice the lack of logging, and the lack of any proof that the API call actually happened.

MCP Server

// Program.cs (requires the ModelContextProtocol and Microsoft.Extensions.Hosting NuGet packages)
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;

// Create a generic host builder for
// dependency injection, logging, and configuration.
var builder = Host.CreateApplicationBuilder(args);

// Configure logging for better integration with MCP clients:
// send log output to stderr so stdout stays free for protocol messages.
builder.Logging.AddConsole(consoleLogOptions =>
{
  consoleLogOptions.LogToStandardErrorThreshold = LogLevel.Trace;
});

// Register the MCP server and configure it to use stdio transport.
// Scan the assembly for tool definitions.
builder.Services
  .AddMcpServer()
  .WithStdioServerTransport()
  .WithToolsFromAssembly();

// Register HttpClient for API calls
// (note: the joke tool below creates its own client and never uses this).
builder.Services.AddHttpClient();

// Build and run the host. This starts the MCP server.
await builder.Build().RunAsync();

Joke Tool

using System.ComponentModel;
using System.Net.Http.Json;
using ModelContextProtocol.Server;

/// <summary>
/// Tool implementations using static methods
/// </summary>
[McpServerToolType] // marks the class so WithToolsFromAssembly() can discover it
public static class Tools
{
  /// <summary>
  /// Fetches a random joke from JokeAPI
  /// </summary>
  /// <returns>A random programming joke</returns>
  [McpServerTool, Description("Fetches a random joke from JokeAPI")]
  public static async Task<string> GetJoke()
  {
    // Creates a throwaway HttpClient instead of using the registered factory,
    // so nothing outside this method ever observes the request.
    using var client = new HttpClient();
    try
    {
      var response = await client.GetFromJsonAsync<JokeResponse>(
        "https://v2.jokeapi.dev/joke/Programming?safe-mode");

      string joke = response?.Type == "single"
        ? response.Joke ?? "No joke available"
        : $"{response?.Setup}\n{response?.Delivery}";

      return $"JOKE: {joke}";
    }
    catch (Exception ex)
    {
      // Swallows the failure: the caller only ever sees a string, never an error signal.
      return $"Error fetching joke: {ex.Message}";
    }
  }
}

/// <summary>
/// Minimal shape of the JokeAPI response, covering only the fields used above.
/// System.Net.Http.Json binds JokeAPI's camelCase JSON to these properties
/// case-insensitively by default.
/// </summary>
public class JokeResponse
{
  public string? Type { get; set; }
  public string? Joke { get; set; }
  public string? Setup { get; set; }
  public string? Delivery { get; set; }
}
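
For contrast, here is what a more observable version of the same tool might look like. This is a sketch rather than the code I'm actually running: the GetJokeWithEvidence name, the SOURCE line in the return value, and the stderr log format are all choices of mine. It leans on the fact that with the stdio transport, stdout carries protocol messages while stderr is free for diagnostics, which is why the server snippet above routes console logging to stderr.

using System.ComponentModel;
using System.Diagnostics;
using System.Net.Http.Json;
using ModelContextProtocol.Server;

[McpServerToolType]
public static class ObservableTools
{
  [McpServerTool, Description("Fetches a random joke from JokeAPI and reports where it came from")]
  public static async Task<string> GetJokeWithEvidence()
  {
    const string url = "https://v2.jokeapi.dev/joke/Programming?safe-mode";
    var stopwatch = Stopwatch.StartNew();

    // Log the intent before doing anything, so even a silent failure leaves a trace.
    Console.Error.WriteLine($"[{DateTime.UtcNow:O}] GetJokeWithEvidence: calling {url}");

    using var client = new HttpClient();
    try
    {
      // Use GetAsync instead of GetFromJsonAsync so the HTTP status code is visible.
      using var httpResponse = await client.GetAsync(url);
      var status = (int)httpResponse.StatusCode;
      httpResponse.EnsureSuccessStatusCode();

      var response = await httpResponse.Content.ReadFromJsonAsync<JokeResponse>();
      string joke = response?.Type == "single"
        ? response.Joke ?? "No joke available"
        : $"{response?.Setup}\n{response?.Delivery}";

      // Log the outcome and attach provenance to the result itself.
      Console.Error.WriteLine(
        $"[{DateTime.UtcNow:O}] GetJokeWithEvidence: HTTP {status} in {stopwatch.ElapsedMilliseconds} ms");
      return $"JOKE: {joke}\nSOURCE: {url} (HTTP {status}, {stopwatch.ElapsedMilliseconds} ms)";
    }
    catch (Exception ex)
    {
      // Log and surface the failure instead of returning a success-looking string.
      Console.Error.WriteLine($"[{DateTime.UtcNow:O}] GetJokeWithEvidence FAILED: {ex.Message}");
      return $"ERROR: could not fetch a joke from {url}: {ex.Message}";
    }
  }
}

Now, if that stderr line never shows up, I know the joke came from somewhere other than the API, and the SOURCE line gives the client something it can surface or audit.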

Final Thoughts

We need to ask harder questions of our AI systems. Not "can they do it?"—but "did they actually do it?" Because in the end, observability isn’t optional. It’s essential.