Managing Machine Learning


Say you’ve just started managing a team. You’re working on a stochastic product, like search or recommendations. You want to start instrumenting success. How should you set KPIs? What should you be doing in a metrics review? Here’s an overview of common pitfalls I’ve observed that you should avoid.

Bad KPIs, Unintended Consequences

Like organisms, teams evolve a culture and a product in response to a KPI. If you’re not careful with the definition, you’ll produce a distorted product. LinkedIn looks like Minesweeper because the team is optimizing for clicks. A really good metric has the opposite effect: it unleashes a tremendous amount of creativity (“We had to 10X ‘minutes of video watched’ so… we just started playing the next video in your queue automatically”).

Before solidifying a KPI, I try to imagine the “laziest” way to 10X the metric. If I suspect it will detract from a good product, I adjust. Train yourself by trying to find evidence of this in the products you use.

Example: Amazon Search

Amazon elevates sponsored search results over the organic best seller.

Someone is getting a raise: a revenue KPI is growing, at least in the short term. I’d argue the grating experience makes for an inferior product in the long term. As a leader, it’s your job to keep a 30,000-foot view and ensure the team is building something good.

Changing KPIs

Sometimes the opposite happens. Instead of the product changing, KPI definitions constantly shift. For example:

“We thought click-through rate was our KPI. Since we show an info-box, we’ve realized that a lot of sessions are ‘good’ even though you don’t click on anything. So we’re changing our metrics.”

This is fine and should be expected. Nevertheless, it can be frustrating to manage because you lack a repeatable baseline. The only way I know of overcoming this is to imagine myself in the N+1 metrics review: what excuses will I hear? I then try to preemptively optimize for that.

Pre-Launch KPIs

Before a launch, managers will rally the team around made-up success metrics: “Our goal is 95% precision.” Why not 20%? Or 99%? Nobody on the team will respect a made-up number. Since you lack data, it’s not clear what success should look like.

Instead, try to simulate your anecdotal reaction using a real-world analogy. For example: “If I were to see an incorrect suggestion in this UI once a week, would that feel terrible? How about once a day?” You can then back out what that tolerance translates to as a metric target.
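
Here’s a rough sketch of what backing out a target looks like. Every number below is a hypothetical assumption, not real product data:

    # All numbers are hypothetical assumptions for illustration only.
    suggestions_seen_per_day = 20        # how many suggestions a typical user sees
    tolerable_errors_per_week = 1        # "one bad suggestion a week feels acceptable"

    suggestions_per_week = suggestions_seen_per_day * 7
    max_error_rate = tolerable_errors_per_week / suggestions_per_week
    implied_precision = 1 - max_error_rate

    print(f"Implied precision target: {implied_precision:.1%}")  # ~99.3%

Now “99.3% precision” is a number the team can argue with, because it’s anchored to a user experience rather than pulled out of thin air.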

Post-Launch Incrementalism

Once launched, managers have the opposite problem: how do you challenge the team to really grow their numbers? “We plan to grow search volume 10% this quarter.” Why not 20%? Or 5%? A good leader will provide a rationale for how they selected the goal. To ensure incrementalism doesn’t set in, I’ll brainstorm the following with the team:

“Drop everything you know about the business today. Let’s imagine we just read Google achieved 30% growth. Hypothetically. How did they do it?”

That format can breathe big-picture thinking into a team that’s been stuck in a local optimum.

Memorable Metrics

Teams often opt for a technically correct but complex KPI. For example:

“The number of queries a user runs until they click on a result. And don’t return. For at least 5 minutes.”

What? This is confusing. A better KPI would be: “Search session length”. Sacrifice technical correctness for simplicity. Frequently a metric is more nuanced under the hood, but a key metric should be explainable just by saying it. You want these numbers to be something people discuss over lunch. Information won’t disseminate when it’s complex.
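
The nuance can still live in the implementation. Here’s a minimal sketch, assuming a query log of (user_id, timestamp) pairs sorted by user and time, and an assumed 5-minute idle gap as the session boundary:

    from datetime import timedelta

    IDLE_GAP = timedelta(minutes=5)  # assumed threshold: the session ends after 5 idle minutes

    def session_lengths(events):
        """Yield the number of queries in each search session.

        `events` is assumed to be (user_id, timestamp) pairs sorted by
        user and time. A new session starts when the user changes or the
        gap between consecutive queries exceeds IDLE_GAP.
        """
        prev_user, prev_time, count = None, None, 0
        for user_id, ts in events:
            if count and (user_id != prev_user or ts - prev_time > IDLE_GAP):
                yield count
                count = 0
            count += 1
            prev_user, prev_time = user_id, ts
        if count:
            yield count

The headline number stays simple enough to discuss over lunch, while the fine print of what counts as a “session” lives in the code.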

Input versus Output

The team might suggest reporting metrics that are easy to measure, but wrong to manage by. High-level KPIs should describe the desired output the business needs (“ad revenue”), not the effort the team is putting in (“number of salespeople hired”). Capturing input metrics is important, but you should focus your attention on output.

Intellectually Cute Explanations

Let’s imagine you’ve built an app for hiking:

You: “Why did engagement crash in March?”

Team: “It’s seasonal. People don’t use our product as much when it rains.”

Actually, what happened was that the team fixed a bug in the data in March. Engagement was always low. Damn. Good luck in the next board meeting.

When numbers move, teams will come up with rationales for why. Often leaders latch onto the first reason that makes intuitive sense. These reasons are almost always wrong. But since the excuse seems like it could be right, teams often don’t bother digging deeper.

Be suspicious of anything going horribly wrong. Be very, very suspicious of anything going too well. The nightmare scenario I always worry about is a tremendous growth spurt actually being a bug in the data. Your goal as a manager is to be a Boston Globe “Spotlight team” during the review.

Summary

Hopefully this was a helpful summary of some common pitfalls to avoid when defining or reviewing metrics. If you have other ideas, please let me know! For a broader primer on management by metrics, read High Output Management.

Thank you to Elad Gil, Jack Altman, and others for reading drafts of this post.