Measuring the Productivity of Engineering Teams ... successfully!

It's a common request for many managers: "Please improve the team's level of productivity." Sounds simple enough, right? The simple answer is no. No, it is not simple in the slightest. In this article, I will explain what productivity is and give you some best-practice advice for measuring what you can. Let's begin ...

What is productivity?

Productivity. You will probably hear a few variations on what it "means" in software engineering, but its literal definition is this:

The effectiveness of effort as measured in terms of the rate of output per unit of input.

In layman's terms, it is the ratio of the number of things out to the number of things in. The more you get out for the same amount in, the more productive you (or your system) are. On paper, then, it should be simple to measure an engineering team's productivity: divide the amount of work produced by the number of people it takes to produce it. But hold on a second, how are you measuring the work produced (output)? And what are we measuring against (input)?

What is our output?

Many of you have already started thinking about story points, tickets closed, and other seemingly obvious choices. But let's step back and think about it for a moment. What are we really trying to measure? We want to measure business impact. I'm sure none of you would put your hands up to tell me you churn out any old thing. You create work items to deliver features that have been determined to provide some value to the customer. But now it gets complicated. How do we measure business impact? How do we correlate new or increased business revenue with a particular feature released on a particular day, built by a particular team? Put simply, you can't. We need a proxy measure. We will return to this in a moment, but before we do, I want to talk about our input, arguably the simpler of the two.

What is our input?

We are going to find a suitable proxy for our output. But before that becomes useful, what is our input going to be? A person? A unit of time? Most people default to combining both and pick the number of hours worked by individuals as their input. It seems like an obvious choice, but it's wrong for a few reasons:

  1. Our productivity is now relative to single people. This might be manageable for 6-8 people, but imagine doing this for a 200+ strong org!
  2. A very simple one: people don't like it. If you want a surefire way to piss off your employees, start tracking everything they do. Big Brother much?
  3. How do you account for holidays? Half days? Illness? Not only that, but how do you decide which hours count? Meetings? Context switches? Coffee breaks? It's already gotten messy.
  4. You are inadvertently penalising some of the most important people on your team: the "glue". These are the people who spend most of their day pairing, passing on knowledge, unblocking others, reviewing code, and attending strategy sessions. Your most senior engineers are going to look like your least productive.

I'm sure you get the point. But you were right to think about people and time as a single input. You just zoomed in too far. Software engineering is a team sport. You need to measure teams. Measuring teams removes the questions about holidays, how days are structured, or what type of work individuals do. And instead of hours, you want to measure in days. How much does one team produce in one day?

The only time you might consider "people" in this equation is if you're micro-optimising and want to compare the productivity rates of teams based on team size. How does a team of 5 compare to a team of 10, for example? But that is an edge case, and you still measure the team; you compare each team's output rate against its size to see if there is a trend.
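To make the "one team, one day" idea concrete, here is a minimal sketch. The `TeamPeriod` record, its field names, and the sample numbers are all hypothetical, and `output_units` stands in for whatever output proxy you end up choosing.

```python
from dataclasses import dataclass


@dataclass
class TeamPeriod:
    """Output of one team over one reporting period (hypothetical fields)."""
    team: str
    team_size: int
    working_days: int     # days the team was operating, not per-person hours
    output_units: float   # placeholder for your chosen output proxy


def output_per_team_day(p: TeamPeriod) -> float:
    """Productivity with the team-day as the unit of input."""
    if p.working_days <= 0:
        raise ValueError("working_days must be positive")
    return p.output_units / p.working_days


# Made-up sample data for two teams over the same 20 working days.
periods = [
    TeamPeriod("payments", team_size=5, working_days=20, output_units=30),
    TeamPeriod("search", team_size=10, working_days=20, output_units=44),
]

for p in periods:
    rate = output_per_team_day(p)
    # The per-head figure is only for comparing team sizes, never individuals.
    print(f"{p.team}: {rate:.2f} per team-day, {rate / p.team_size:.2f} per head")
```

Note that nothing in the record identifies an individual. The per-head number only exists to spot a trend across team sizes.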

Picking your output proxy

Now that's out of the way, we are back to our original thoughts about output. We know we aren't measuring our actual output (value created) because we can't, which brings us back to measuring the number of work items delivered. Before you run off and start creating a dashboard to track work items completed over time, remember a golden rule: "You get what you measure." Or, to quote Goodhart's law:

"When a measure becomes a target, it ceases to be a good measure."

And to repeat what I said earlier, "I'm sure none of you would put your hands up to tell me you churn out any old thing." But if we measure work items delivered, I guarantee that is exactly what you will get: more random tickets in your system, like "Email Bob about question" and "Write documentation", and work items made hyper-granular: "Create File", "Add code to file", "Create Test", "Write Test", "Run Tests", etc. OK, maybe that is a tad extreme, but you get the point. If you tell people that tickets done is the thing you track and that you want them to get more tickets done, they'll do it by creating more tickets, NOT by doing more work.

So, what the hell do I measure?

The answer is that you don't measure things in isolation. You measure relationships, and you measure trends. Instead, you measure things like the number of work items done in relation to the number of software releases, and the number of software releases in relation to the number of bugs/errors found. This way, you can spot patterns such as: we are delivering more tickets, but that isn't increasing releases (are we doing too much toil?), or we are releasing at the same rate, but our error count is going up (why is our quality dropping?).
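As a rough illustration of measuring relationships rather than raw counts, the sketch below turns per-period counts into ratios and then looks at how those ratios move. The `Period` fields, the helper names, and the numbers are invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Period:
    """Team-level counts for one reporting period (hypothetical fields)."""
    label: str
    work_items_done: int
    releases: int
    bugs_found: int


def safe_div(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def relationships(p: Period) -> dict:
    """Express each count relative to another instead of in isolation."""
    return {
        "items_per_release": safe_div(p.work_items_done, p.releases),
        "bugs_per_release": safe_div(p.bugs_found, p.releases),
    }


# Made-up history: more tickets closed, same number of releases, more bugs.
history = [
    Period("Q1", work_items_done=40, releases=8, bugs_found=4),
    Period("Q2", work_items_done=60, releases=8, bugs_found=8),
]

first, last = relationships(history[0]), relationships(history[-1])
for name in first:
    # Skip ratios that are undefined or that start at zero.
    if first[name] and last[name] is not None:
        change = (last[name] - first[name]) / first[name]
        print(f"{name}: {first[name]:.2f} -> {last[name]:.2f} ({change:+.0%})")
```

With this sample data, ticket counts rose while releases stayed flat and bugs per release doubled, which is exactly the kind of pattern a raw "tickets done" chart would hide.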

This is where well-established best practices can help you out. The big one, and the first I will mention, is the DORA metrics. DORA stands for DevOps Research and Assessment, a research group that is now part of Google Cloud. They used extensive research to determine a set of measurements that can be reliably used to determine a software company's performance (and relative success).

The DORA metrics

I won't go into much detail here, as that isn't the article's point. However, I will summarise the metrics you will measure:

  • Deployment Frequency: How often you make successful software releases to production.
  • Lead Time for Change: How long it takes to get from committing a code change to that code change being in production.
  • Change Failure Rate: How often a deployment fails or a software change results in a failure in production.
  • Mean Time to Recovery: How long it takes, on average, for the system to fully recover after an interruption caused by a deployment or system failure.

You can use these as an excellent proxy for your team's productivity. You want to see a high deployment frequency, a short lead time for change, a low change failure rate, and a short mean time to recovery. You will measure the impact of changes by watching how these measures move in relation to each other over time.
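If you have a log of deployments, the four metrics can be derived from it with very little code. The sketch below is one possible way to do it, not an official DORA calculation: the `Deployment` record and its fields are hypothetical, lead time is summarised with a median (my choice), and recovery time is measured from the failed deployment to the moment service was restored.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean, median
from typing import List, Optional


@dataclass
class Deployment:
    """One production deployment (hypothetical record shape)."""
    committed_at: datetime                   # when the change was committed
    deployed_at: datetime                    # when it reached production
    failed: bool = False                     # did it cause a production failure?
    recovered_at: Optional[datetime] = None  # when service was restored


def hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def dora_summary(deploys: List[Deployment], period_days: int) -> dict:
    """Team-level DORA figures for one reporting period."""
    if not deploys or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")
    failures = [d for d in deploys if d.failed]
    recoveries = [hours(d.recovered_at - d.deployed_at)
                  for d in failures if d.recovered_at is not None]
    return {
        "deploys_per_day": len(deploys) / period_days,
        "median_lead_time_hours": median(
            hours(d.deployed_at - d.committed_at) for d in deploys),
        "change_failure_rate": len(failures) / len(deploys),
        "mean_time_to_recovery_hours": mean(recoveries) if recoveries else None,
    }


# Made-up week of deployments for a single team.
base = datetime(2024, 1, 1, 9, 0)
day, hour = timedelta(days=1), timedelta(hours=1)
deploys = [
    Deployment(base, base + 4 * hour),
    Deployment(base + day, base + day + 10 * hour, failed=True,
               recovered_at=base + day + 12 * hour),
    Deployment(base + 2 * day, base + 2 * day + 6 * hour),
    Deployment(base + 3 * day, base + 3 * day + 8 * hour, failed=True,
               recovered_at=base + 3 * day + 9 * hour),
]

summary = dora_summary(deploys, period_days=7)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Run this per team and per period, then plot the four numbers side by side. A single snapshot tells you very little; the movement between periods is the useful part.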

Awesome, thanks!

So that is the bare bones: a simple way to measure productivity. But before you run off in celebration, there is more. Using DORA alone leaves a lot of opportunity on the table. Let me take you into SPACE 🚀

I always wanted to be an Astronaut

SPACE ... the great beyond ... at least the great beyond of DORA. DORA is fantastic, but it ignores a considerable portion of how we perceive and measure productivity. If I casually drop in the word "flow", I'm sure you all know what I mean: that state of deep focus where time flies by, you solve your biggest problems and challenges, and it all just feels easy. That has an obvious link to productivity, but how satisfied you feel also correlates with productivity (link). And this is where the SPACE framework comes into play.

Welcome to SPACE camp, cadet!

So what is the SPACE framework? SPACE is, unsurprisingly, an acronym for the five dimensions you are looking to measure.

(S) - Satisfaction & Well-being:
How fulfilled the engineers feel with their tools, team, culture, and day-to-day work; how happy and healthy they feel and how their work impacts them.
(P) - Performance:
An outcome of a system or process.
(A) - Activity:
The number of actions or outputs completed whilst performing work.
(C) - Communication & Collaboration:
How individuals and teams communicate and work together.
(E) - Efficiency & Flow:
The ability to complete or progress on work with minimal interruption or delay caused by people or systems.

These can seem like very "woolly" or "grey" areas to try and measure, and they are, but it's such a significant topic that it will be better served by a follow-up post of its own.

For now, here are a few simple examples of measurements, all at a team level:
(S) - Average Developer Satisfaction Scores (eNPS)
(P) - Change Failure Rate
(A) - Deployment Frequency
(C) - PR Review Times
(E) - Lead Time for Change
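Most of these come straight out of your delivery tooling, but the satisfaction score needs a survey. As a small sketch, here is the usual eNPS calculation, assuming an anonymous 0-10 "would you recommend working here?" question: the share of promoters (9-10) minus the share of detractors (0-6). The function name and the sample scores are made up.

```python
from typing import Sequence


def enps(scores: Sequence[int]) -> float:
    """Team-level eNPS: % promoters (9-10) minus % detractors (0-6)."""
    if not scores:
        raise ValueError("no survey responses")
    if any(s < 0 or s > 10 for s in scores):
        raise ValueError("scores must be on a 0-10 scale")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)


# Made-up anonymous responses from one team.
team_scores = [10, 9, 8, 7, 6, 9, 3, 10]
print(f"eNPS: {enps(team_scores):+.0f}")
```

The result ranges from -100 to +100. Keeping the responses anonymous and aggregating per team keeps this in line with the rule of measuring the team, not the person.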

Conclusion

So there we have it. Measuring the productivity of development teams is hard. We can't measure our work's true impact (outcome) and must be careful how we proxy it. But by using DORA and SPACE, we can gain good insight into how productive our teams feel and how efficiently they operate. Remember, measure the trends over time, and DON'T MEASURE THE PERSON!