out.of.desk

personal blog of Gaurav Ramesh

Performance Reviews and LLMs

I finished writing performance reviews for my team yesterday, and I barely used LLMs for it.

I had my notes from the year, their self-assessments, and 360 feedback. When I sat down to write, the words flowed effortlessly. I only used LLMs afterward, to correct the grammar, tighten the structure, and polish the language.

If you follow the popular advice on the internet, this approach is inefficient. You'd build or use an AI agent that gathers data from all the organizational sources - Slack, Jira, Google Docs, GitHub - and prepares the initial draft of the feedback. You'd then modify it and hit "Submit".

It's faster. It's modern. I believe it's also wrong.

Synthesis

The hard work of thinking about the feedback you want to convey - the synthesis of what they achieved, where they stumbled, and the potential they hold - should come from the manager, not an LLM. In the tweet-sized thoughts and short-form social media advice that demands "automating away your tedious tasks", what gets lost in translation is the identification of what's actually "tedious".

And so we blindly copy, trading intentionality and common sense for conformity.

The Anchoring Trap

You might argue that you carefully modify the LLM's draft and infuse your own thoughts to make it right. Good for you. But there's a cognitive trap here worth knowing, one that most of us fall into: anchoring.

Anchoring is the tendency to rely too heavily on the first piece of information offered. When you start with an AI-generated draft, your degrees of freedom as a thinker are narrowed. You're no longer looking at the report through the lens of your own experience, but through the dimensions the AI chose to present.

Here's an example: The AI draft highlights someone's "strong collaboration skills" and "timely delivery". You read it and you agree. They are good at those. You edit a few sentences, maybe add a few sentences that support the claim. What you intuitively know, though, and what the AI doesn't choose to present, is that their biggest strength lies in their ability to ask thoughtful questions.

The structure of someone else's thinking becomes a map for your own.

And then there's the convenience trap, which we slip into easily. Because the cost of editing a draft is lower than the cost of creating one, we succumb to choice fatigue. When we see a draft that's 70% "good enough", we tend to stop. The energy required to see what's there and what's missing feels too high for the return on that investment.

High-Fidelity Artifacts

It's no different from a common phenomenon we see in design. A low-fidelity artifact, like a wireframe, invites critique and brainstorming. High-fidelity artifacts, whether designs or prototypes, limit the creative possibilities and box others into the world the designer presents. Mentally undoing what you're seeing requires a certain degree of self-awareness, intentionality, and will.

Someone on my team once mentioned a saying that's apt: Pencils before pixels.

Viewed through that lens, an LLM-generated feedback draft is the high-fidelity artifact of a human relationship. Its polished and almost final state makes it harder to see what should have been there instead.

Time vs the Climb

But there's also a deeper issue: treating saved time as the ultimate measure of progress and success. When we measure something like writing performance reviews by the time it takes to write them, it changes the essential ingredients of the whole process and the incentives of leadership around it.

I remember biking up Mt. Tamalpais from Stinson Beach years ago. It's a moderately hard ride. But you can also just drive all the way to the top. Most people do. Had my goal been to minimize the time it takes for me to reach the top, I'd have driven. And if I were driving, I might as well just have stayed home. Because the joy for me isn't at the peak. It's in the pain and growth of the climb, and the reward of the glide back down.

In performance reviews, articulating someone's value - their strengths, contributions, weaknesses, and potential - is the climb.

One Time vs Over Time

Project Oxygen, a research project at Google that set out to prove that managers weren't essential, found the opposite. In doing so, it also identified the most important quality of a great manager: coaching.

But coaching requires awareness of reality, not only of someone's output or impact, which can be somewhat quantified, but also of their interests, motivations, and aspirations. If we measured coaching by the time it takes to deliver it, it'd lose its meaning.

The fact that words flowed to me effortlessly when I sat down to write means that the synthesis was happening throughout the year, with every little interaction I had with the person. Every 1:1. Their facial expressions in meetings. The content and the implicit tone of their Slack messages that I can only understand because I understand the person.

The risk with this modern "agentic management" is not merely what happens this one time, but what happens over time. It disincentivizes that kind of real-time, background synthesis and situational awareness. You stop paying attention. Because, as long as you capture your interactions, this approach suggests, albeit implicitly, that you don't have to think much. The LLMs will do it for you.

That's a clear sign that you're outsourcing your thinking and pulling your attention away from what's important, which is the biggest risk of all.

No doubt, the models get better over time, but what about you?

Keep the Robots Out of the Gym

Daniel Miessler captured this insight well in an essay called Keep the Robots Out of the Gym. The metaphor is perfect. We don't use robots to lift weights for us because we understand that the effort is the point. The resistance builds the muscle.

The same is true for thinking. For noticing.

I use LLMs a lot. But I am learning to be more intentional about where and how I use them. In some cases, outcomes matter more, and in others, the process. And knowing the difference matters more than we think.
