Why Annual Reviews Don't Have to Be Bullshit
Denis Stetskov

Publish Date: Dec 16

This week I ran annual reviews for six engineers. No awkward silences. No fishing for examples. No “let me think back to what happened in Q1” moments.

Every rating I gave had documented evidence behind it. Every growth conversation pointed to specific patterns. Every hard discussion referenced actual data, not impressions.

This isn’t normal. Most engineering managers treat annual reviews as a necessary evil: reconstruct a year from memory fragments, avoid saying anything too specific, give everyone a 3.5 out of 5, and move on.

I’ve watched other managers do exactly this. Here’s why it doesn’t have to be that way.

The Standard Annual Review Problem

You know how this goes. HR sends the review form in November. You open it, stare at the questions, and realize you can’t remember what happened before August.

So you do what everyone does: focus on recent events, pad with vague positives, and avoid anything that might require documentation you don’t have.

"Strong technical contributor." "Good team player." "Meets expectations."

The engineer reads it, nods politely, and leaves, wondering what any of it actually means for their career.
Both of you know it's theater. Neither of you says it.

The uncomfortable truth: annual reviews fail because they’re based on vibes, not evidence. And vibes favor whoever had a good last month.

What Changes Everything

I’ve been running weekly health checks and collecting PM feedback for over a year now. I’ve written about how that system works for monthly 1:1s, but the real payoff shows up at annual review time.

When I sat down for reviews this week, I had:

  • 52 weeks of self-reported health data per engineer
  • 12 months of PM assessments
  • Documented patterns across multiple projects
  • Specific examples with dates and context
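
The exact shape of that data matters less than having it at all. For illustration only, here's roughly what a weekly record and a monthly PM assessment could look like as TypeScript types; the field names are invented for this sketch, and a spreadsheet with the same columns works just as well.

```typescript
// Illustrative shapes only; the field names are invented for this sketch.
type Frequency = "Never" | "Rarely" | "Sometimes" | "Often" | "Always";

interface WeeklyHealthCheck {
  engineerId: string;
  weekStarting: string;          // ISO date, e.g. "2025-03-03"
  energy: number;                // self-reported, 1-10
  sprintCompletion: number;      // self-reported, 0-100 (%)
  contextSwitches: number;       // projects touched that week
  codeReviewHours: number;       // time spent reviewing teammates' code
  attentionToDetail: Frequency;  // self-rated
  notes?: string;                // "Anything you'd like to share?"
}

interface PmAssessment {
  engineerId: string;
  month: string;                 // e.g. "2025-03"
  deliveryScore: number;         // 1-10
  flaggedIssues: string[];       // specific incidents, with dates and context
}
```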

The review prep took 30 minutes per person. Not because I was rushing, but because I wasn't reconstructing anything: the patterns were already visible.

The Questions That Actually Matter

Standard review forms ask useless questions. "Rate communication skills 1-5." What does that even mean?

Here’s what I actually evaluate, and what the data shows:

1. Trajectory, not snapshot

Is this person accelerating, stable, or declining? One engineer started the year hitting "Often" on most metrics. By month 8, he was at "Always" across the board. That trajectory matters more than any single rating.

Another engineer stayed flat. Same "Often" ratings in January and December. Technically, that's meeting expectations both times. But one person is growing into a senior role while the other is coasting.
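
If you want to make "trajectory" concrete, one simple option is to map the ordinal ratings onto numbers and compare the start of the year against the end. A minimal sketch, with an invented 0.5-point threshold:

```typescript
// Compare early-year ratings against late-year ratings.
// The 0.5 threshold is illustrative; a sparkline in a spreadsheet tells the same story.
type Frequency = "Never" | "Rarely" | "Sometimes" | "Often" | "Always";

const SCALE: Record<Frequency, number> = {
  Never: 1, Rarely: 2, Sometimes: 3, Often: 4, Always: 5,
};

const average = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;

function trajectory(ratings: Frequency[]): "accelerating" | "stable" | "declining" {
  const scores = ratings.map((r) => SCALE[r]);
  const window = Math.max(1, Math.floor(scores.length / 4));
  const early = average(scores.slice(0, window));
  const late = average(scores.slice(-window));
  if (late - early > 0.5) return "accelerating";
  if (early - late > 0.5) return "declining";
  return "stable";
}

// An engineer who moved from "Often" to "Always" mid-year reads as accelerating.
const year: Frequency[] = [...Array(26).fill("Often"), ...Array(26).fill("Always")];
console.log(trajectory(year)); // "accelerating"
```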

2. Self-assessment accuracy

Does their perception match reality? I had an engineer report "100% completion" and "no issues" weekly, while his PM flagged delivery gaps. That disconnect predicted everything about how our review conversation would go.

People who can’t accurately assess their own performance can’t self-correct. This isn’t harsh judgment. It’s recognizing who needs closer support.
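
Once self-reports and PM feedback live side by side, that kind of disconnect is easy to surface mechanically. A sketch of the idea, with invented field names and an arbitrary 95% threshold:

```typescript
// Flag weeks where the self-report says "all good" while the PM flagged a gap.
// The pattern across weeks, not the threshold, is the point.
interface WeekSignal {
  weekStarting: string;
  selfCompletion: number;   // self-reported sprint completion, 0-100 (%)
  pmFlaggedIssue: boolean;  // did the PM flag a delivery gap that week?
}

function divergentWeeks(weeks: WeekSignal[]): WeekSignal[] {
  return weeks.filter((w) => w.selfCompletion >= 95 && w.pmFlaggedIssue);
}
```

One or two divergent weeks is noise. A steady run of them is the self-assessment gap worth raising in the review.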

3. Team impact, not just individual output

Does this person make others better? I track hours spent reviewing teammates’ code. Engineers who put in 1-2 hours weekly become multipliers. Those doing zero stay individual contributors regardless of their personal output.

One of my top performers delivers flawlessly but contributes nothing to team knowledge sharing. Another delivers slightly less but elevates everyone around him. The data shows the difference.
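
That "multiplier" signal is just as easy to pull out of the weekly numbers. A sketch, with an illustrative one-hour cutoff:

```typescript
// Median weekly hours spent reviewing teammates' code, as a rough multiplier signal.
// The one-hour cutoff is illustrative, not a standard.
function median(xs: number[]): number {
  const sorted = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

const isMultiplier = (weeklyReviewHours: number[]) => median(weeklyReviewHours) >= 1;

console.log(isMultiplier([1.5, 2, 1, 0.5, 2]));  // true
console.log(isMultiplier([0, 0, 0.5, 0, 0]));    // false
```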

4. Sustainability

What’s the energy cost of their performance? An engineer hitting every deadline while their energy drops from 8 to 5 over three months isn’t succeeding. They’re burning out in slow motion.

I caught one case where PM feedback was excellent, while health checks showed declining energy and shorter responses. Turned out they were covering for a struggling teammate. We fixed the problem before it became a resignation.
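
Catching that kind of slide doesn't require anything sophisticated. A least-squares slope over the last few months of energy scores is enough to prompt the conversation; here's a rough sketch, with the 12-week window chosen for illustration:

```typescript
// Slope of self-reported energy over the last `window` weeks (simple least squares).
// A persistently negative slope is a prompt to talk, not a verdict.
function energySlope(weeklyEnergy: number[], window = 12): number {
  const ys = weeklyEnergy.slice(-window);
  const xs = ys.map((_, i) => i);
  const mean = (v: number[]) => v.reduce((a, b) => a + b, 0) / v.length;
  const mx = mean(xs);
  const my = mean(ys);
  const cov = xs.reduce((acc, x, i) => acc + (x - mx) * (ys[i] - my), 0);
  const varX = xs.reduce((acc, x) => acc + (x - mx) ** 2, 0);
  return cov / varX; // energy points gained or lost per week
}

// A slide from 8 to 5 over roughly three months comes out around -0.3 per week.
console.log(energySlope([8, 8, 7, 7, 7, 6, 6, 6, 5, 5, 5, 5]).toFixed(2)); // "-0.30"
```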

What the Conversations Sound Like

With data, review conversations change completely.

Instead of: "You’re a strong performer, keep it up."
It becomes: "You hit 100% sprint completion for 24 consecutive weeks across six different projects. Your code review cycles stayed at 1, meaning clean code on the first pass. When I look at who I can fully rely on regardless of project chaos, you’re at the top of that list."

Instead of: "You could improve your attention to detail."
It becomes: "There were a couple of incidents with client communication where the wrong API got checked. Your attention to detail has been solid, but hasn’t hit ‘Always’ yet. As you grow into more senior responsibilities, that’s the area I’d focus on."

Instead of: "Are you happy here?"
It becomes: "Your energy has been stable, zero context switches most weeks, PM feedback went from 7.5 to 9. Excellent year. One thing I noticed: code review hours dropped off in Q3. Was that a project thing or something else going on?"

No guessing. No recency bias. No vague impressions that the engineer can dismiss as subjective.

The Hard Conversations Get Easier

The worst part of annual reviews is delivering difficult feedback without evidence. You know something’s off, but you can’t point to specifics. So you soften everything until it means nothing.

Data changes this.

I had to tell one engineer he wasn’t getting a top rating despite solid delivery. The conversation was straightforward:

"Your PM feedback is consistently good. Your sprint completion is reliable. But when I look at what separates a four from a 5, it’s the attention to detail, incidents, and the code review gap. You’re not lifting the team’s code quality. You’re delivering your own work cleanly but not contributing to others' improvement."

He didn’t argue. The evidence was there. We spent the rest of the conversation building a specific plan: increase code review hours and achieve zero client-facing mistakes in Q1. Clear targets, clear timeline.

Compare that to: "You’re almost at the top rating, just need to step up a bit." What does that even mean? How would anyone act on it?

What You Actually Need

You don’t need my exact system. But you need something that captures:

  • Longitudinal data. Single observations are noise. Fifty-two weeks of observations reveal a signal. Whatever you track, track it consistently over time.
  • Multiple perspectives. Self-assessment, manager observation, PM feedback, plus peer signals. When all channels align, you have confidence. When they diverge, you have a conversation.
  • Leading indicators. Energy trends predict departures 8-12 weeks early. Code review participation predicts promotion readiness. Context switches predict quality drops. Find the metrics that lead outcomes, not just measure them.
  • Qualitative context. Numbers tell you what. Open-ended responses tell you why. "Anything you’d like to share?" surfaced more actionable insights than any structured metric.
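
Whatever shape your tracking takes, the payoff comes from looking at it every week, not once a year. Here's a sketch of how those signals might roll up into a weekly check; the snapshot fields and every threshold are invented for illustration:

```typescript
// A weekly pass over the latest signals: who needs a conversation this week,
// long before review season. All thresholds are illustrative.
interface EngineerSnapshot {
  name: string;
  energyTrendPerWeek: number;   // e.g. least-squares slope over the last 12 weeks
  selfVsPmGapWeeks: number;     // recent weeks where self-report and PM feedback diverged
  medianReviewHours: number;    // median weekly hours spent reviewing teammates' code
}

function weeklyFlags(team: EngineerSnapshot[]): string[] {
  const flags: string[] = [];
  for (const e of team) {
    if (e.energyTrendPerWeek < -0.15) flags.push(`${e.name}: energy trending down`);
    if (e.selfVsPmGapWeeks >= 3) flags.push(`${e.name}: self-assessment gap`);
    if (e.medianReviewHours === 0) flags.push(`${e.name}: no code review time`);
  }
  return flags;
}
```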

The Uncomfortable Truth

Most managers avoid systematic tracking because it creates accountability. You can’t claim ignorance when you have 52 weeks of documented patterns. You can’t blame "culture fit" when you have evidence of someone failing to absorb feedback.

The data doesn’t let you hide behind comfortable narratives.

But it also doesn’t let good performance go unrecognized. When I tell someone they’re getting a top rating, I can show them exactly why. The conversation isn’t "I think you’re great." It’s "here’s the evidence that you’re great."

That specificity matters. Engineers are trained to distrust vague praise. They know when they’re being handled. Documented evidence is the opposite of handling. It’s respect.

The Real Result

My annual reviews now feel like natural extensions of conversations we’ve been having all year. No surprises. No defensive reactions. No "but what about that time when..." because I already know about that time and accounted for it.

Engineers leave knowing exactly where they stand, exactly what’s expected of them at the next level, and exactly what evidence would demonstrate they’ve gotten there.

That clarity is worth more than any rating. It’s the difference between performance management as a bureaucratic exercise and performance management as actual development.

Annual reviews don’t have to be bullshit. They require treating your team’s performance like you’d treat any other engineering problem: with data, documentation, and intellectual honesty.

The review questions you ask shape the conversations you have. What patterns are you creating space to see?
