Articles by Tag #evals

Browse our collection of articles on various topics related to IT technologies. Dive in and explore something new!

All I Want for Christmas is Observable Multi-Modal Agentic Systems

How Session Replay + Online Evals Revealed How My Holiday Pet App Actually Works Original...

Learn More 0 0Dec 17 '25

AI Hallucinations in 2025: Causes, Impact, and Solutions for Trustworthy AI

TL;DR AI hallucinations - plausible but false outputs from language models - remain a...

Learn More 5 0Oct 27 '25

LLM evaluation: a quick overview of Stax

The views and opinions expressed on this blog are my own and do not reflect those of my employer....

Learn More 0 0Oct 23 '25

LLM evaluation guide: When to add online evals to your AI application

Original article published on November 13th, 2025 The quick decision framework Online...

Learn More 0 0Dec 17 '25

From Prototype to Production: 10 Metrics for Reliable AI Agents

Building an AI agent prototype that impresses stakeholders is one achievement. Deploying that agent...

Learn More 0 0Nov 27 '25

Why Data Management Makes or Breaks Your AI Agent Evaluations

Building AI agents is one thing. Knowing if they actually work reliably is another challenge...

Learn More 0 0Nov 27 '25

Why Your AI Agent Is Failing (and How to Fix It)

Most AI agent failures don’t happen because the model isn’t “smart enough.” They happen because the...

Learn More 0 1Aug 13 '25

Steel Thread, Evals and building reliable agents.

Introduction At Portia we spend a lot of time thinking about what it means to make agents...

Learn More 1 0Jun 4 '25

Best AI Evals Platforms in 2025

As the adoption of Large Language Models (LLMs) accelerates across industries, the demand for robust...

Learn More 0 0Aug 1 '25

Mastering LLM Observability in 2025: Practices, Tools, and Platforms

As AI adoption accelerates, Large Language Models (LLMs) have become the backbone of enterprise...

Learn More 0 0Aug 1 '25

Best Alternative to Braintrust for AI Agent Evaluation

TLDR Maxim AI offers a comprehensive alternative to Braintrust for AI agent evaluation...

Learn More 0 0Dec 10 '25

[Boost]

HoloDeck Part 1: Why Building AI Agents Feels So Broken ...

Learn More 0 0Jan 9

Top 3 AI Agent Evaluation Platforms

TL;DR Maxim AI: End-to-end platform for simulation, evals, and observability across...

Learn More 0 0Nov 5 '25

Top 5 AI Evaluation Platforms in December 2025

TL;DR AI evaluation has become mission-critical for organizations deploying LLM-powered...

Learn More 0 0Dec 9 '25