Articles by Tag #evaluation

Browse our collection of articles on a range of IT topics. Dive in and explore something new!

How to build a self-improving agent that updates your UI in real time

What my AI agent actually does (and why it's pretty cool) Invoice Copilot: Talk to your...

Aug 7

Top Open Source Tools for LLM Observability in 2025

As of 2025, companies are integrating large language models (LLMs) into their applications, ranging...

May 1

Debiasing LLM Judges: Understanding and correcting AI Evaluation Bias

Image Source: LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods ...

Jul 3

Evaluating AI Agents: Performance, Reliability, and Real-World Impact

The rapid proliferation of AI agents across diverse domains necessitates robust evaluation...

Jul 31

Retrieval Metrics Demystified: From BM25 Baselines to EM@5 & Answer F1

“If a fact falls in a database and nobody retrieves it, does it make a sound?” Retrieval‑Augmented...

Apr 29

Case Study: How Junie Uses TeamCity to Evaluate Coding Agents

Introduction Junie is an intelligent coding agent developed by JetBrains. It automates the...

Jun 3

Evaluation Metrics for Summarization

Everyone wants GenAI, but no one wants to spend time on evaluation or generating reference...

May 26

Exclusive Limited-Time Offer: Work Experience Evaluation Now at $399!

Are you striving to advance your career, apply for a visa, or open the door to new professional...

Nov 18 '24