Effective software organizations make fast and consistent progress toward their goals.
The unfortunate reality about complexity in software is that if you just keep doing what you’ve always done, you’ll keep slowing down. When starting a fresh project, you’ll be surprised by how much you can accomplish in a day or two. In a more established environment, by contrast, you could spend a week just trying to get a new database column added.
Many things that slow down work are systemic, not individual. Even the most talented engineer might not fully understand how much time is wasted when work is bounced between teams, half-completed features are shelved as priorities change, or all the code gets reviewed by just one person. It’s easy to think you’re solving a quality problem by introducing code freezes and release approvals, but you might only be making things worse.
In this chapter, we’ll talk about some of the perils of measuring productivity before we move on to the mechanics of making it happen in a way that’s perceived as broadly beneficial.
But first, let’s talk about the biggest question of all: what is productivity, anyway?
If you ask a group of seasoned engineering leaders to define developer productivity, there will typically be no unified answer. For the purposes of this book, we consider developer productivity in the context of how organizations can minimize the time and effort required in the software delivery process to create valuable business outcomes. We will focus primarily on team- or service-level delivery and eliminating bottlenecks — often process bottlenecks — in the software delivery process.
We’ll also center our conversation on aggregate productivity instead of the efforts and contributions of individuals. A healthy productivity effort may involve automating more parts of the team’s deployment process, addressing flaky tests that cause failing builds, or just getting a team to commit to reviewing open pull requests before starting on their own work. A healthy productivity effort should not, on the other hand, require a certain number of pull requests for each engineer every week. That approach is unlikely to create business value and very likely to create a toxic environment.
Just as we discussed organizational table stakes in the first chapter — empowered teams, rapid feedback, and outcomes over output — there are three clear ways of working that you’ll see on any highly productive team.
It’s okay to admit it: we’ve all added a task to a backlog with a vague certainty that it will never get done.
Limiting queue depth means rigorously monitoring and managing the number of tasks awaiting work. In practice, that means tracking and capping queue size, for example with a Kanban board that visualizes the work and enforces a limit on the number of items in each stage. It also means you can’t let backlogs grow unchecked, since unchecked backlogs lead to delays, rushed work, and increased stress.
Regularly review your work queues and adjust priorities to ensure that valuable and time-sensitive tasks are getting addressed promptly. When you encourage teams to complete current tasks before taking on new ones and use metrics like cycle time to identify bottlenecks, you can significantly enhance the flow and efficiency of the development process.
Implementing this in practice usually means limiting the number of tasks awaiting development, review, or deployment at any given time. In addition to providing clarity about what to work on next, this practice also dramatically improves the predictability of delivery once something reaches that initial awaiting development status.
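To make this concrete, here is a minimal sketch of a queue-depth check, assuming your board can be exported as simple column-to-task lists (the column names, task IDs, and limits are invented for illustration and not tied to any particular tool):

```python
# Illustrative queue-depth check against per-column limits.
# Column names, task IDs, and limits are hypothetical examples.
QUEUE_LIMITS = {"awaiting_development": 5, "awaiting_review": 3, "awaiting_deploy": 3}

def check_queue_depth(board: dict[str, list[str]]) -> list[str]:
    """Return a warning for every queue that exceeds its limit."""
    warnings = []
    for column, limit in QUEUE_LIMITS.items():
        depth = len(board.get(column, []))
        if depth > limit:
            warnings.append(f"{column}: {depth} items (limit {limit})")
    return warnings

board = {
    "awaiting_development": ["T-101", "T-102", "T-103", "T-104", "T-105", "T-106"],
    "awaiting_review": ["T-97", "T-99"],
}
for warning in check_queue_depth(board):
    print("over limit:", warning)  # -> awaiting_development: 6 items (limit 5)
```

Running a check like this on a schedule, or glancing at it during standup, turns “don’t let the backlog grow unchecked” from a slogan into a visible, enforceable rule.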
Breaking down large projects into smaller, more manageable parts allows for quicker completion of each part, enabling faster feedback and iterative improvements. For instance, deploying completed tasks incrementally rather than releasing a large set at once makes it easier to release more tasks in a given period of time; regressions will tend to be small, readily attributed, and readily fixed without blocking other tasks.
Large batches often complicate integration and make it difficult to track down problems. A continuous delivery model, where small updates are released whenever they’re ready, is a practical application of this principle. Encourage teams to think in terms of small changes, which helps in managing risk and improving the ability to adapt to new information.
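A back-of-the-envelope calculation shows why batch size matters so much; the 5% per-change regression rate below is an assumed number purely for illustration:

```python
# Assumed for illustration: each change independently has a 5% chance
# of introducing a regression.
p_regression = 0.05

for batch_size in (1, 5, 10, 20):
    # Probability that the deploy contains at least one regression.
    p_bad_deploy = 1 - (1 - p_regression) ** batch_size
    print(f"batch of {batch_size:2d}: {p_bad_deploy:.0%} chance of a bad deploy, "
          f"up to {batch_size} suspects to investigate")
```

With one change per deploy, a bad deploy points directly at its cause; with twenty changes, roughly two out of three deploys contain a regression, and every one of them requires detective work.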
When you introduce and regularly monitor WIP limits, you ensure that teams focus on completing ongoing tasks before starting new ones. Overloading team members with multiple tasks leads to reduced focus and increased cycle times. A culture where teams are encouraged to complete current work before embarking on new tasks improves focus, reduces waste, and speeds up work delivery.
The Kanban process embraces this explicitly, although you don’t need to use Kanban to follow the principle. In Kanban, the team always focuses on completing its in-flight tasks before starting new ones, a practice sometimes called “walking the board from right to left,” which encourages teammates to help each other before picking up new work. Similarly, Scrum limits the number of story points in an individual sprint.
In the absence of WIP limits, a team can quickly start to juggle more than it can reasonably handle, and it’s common for tasks to remain in progress for extended periods even though no one is actively working on them.
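The pull policy itself is simple enough to express as code. Here’s a sketch of “walking the board from right to left,” with hypothetical stage names and an assumed team-wide WIP limit:

```python
# Hypothetical board stages and team-wide WIP limit.
STAGES = ["in_development", "in_review", "awaiting_deploy"]
WIP_LIMIT = 4

def next_action(board: dict[str, list[str]]) -> str:
    """Walk the board right to left before pulling new work."""
    # First, unblock work closest to done (e.g., review an open PR).
    if board.get("in_review"):
        return f"review {board['in_review'][0]}"
    in_flight = sum(len(board.get(stage, [])) for stage in STAGES)
    if in_flight >= WIP_LIMIT:
        return "WIP limit reached: pair on an in-flight task instead"
    return "pull the next task from the ready queue"

print(next_action({"in_development": ["T-7"], "in_review": ["PR-42"]}))
# -> review PR-42
```

No real team runs its day through a function, of course; the point is that the decision rule is explicit and mechanical, which is exactly what makes it easy to uphold under pressure.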
A common misconception is that productivity and quality are in tension. If your version of quality is to manually test every change you make and test your whole product before releasing it, there will naturally be tension between the two. Any scenario that relies heavily on manual testing often leads to the creation of more processes — like a definition-of-done checklist on every pull request — further delaying time to value.
Fascinatingly, one of the best ways to achieve developer productivity involves improving the quality of your product through automated testing. If you’re doing productivity right, quality will tend to increase over time, as it becomes easier to ship smaller changes and easier to roll back or disable features.
Broadly, this involves a few key pieces.
If you’ve put these pieces in place — which can be harder than it sounds — you’ve given your engineers powerful tools that make their job easier, and you’ve also taken a big step toward a better product. Add a ratchet to CI to make sure test coverage of your code only goes up, and incentivize writing tests and sharing strategies within and across teams.
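A ratchet doesn’t need to be sophisticated. Here’s a minimal sketch of a CI step, assuming coverage has already been computed and a baseline number is committed to the repository (the file path and invocation are illustrative):

```python
# Minimal coverage ratchet: fail the build if coverage drops below the
# committed baseline, and raise the baseline whenever coverage improves.
# The baseline path and how coverage is obtained are assumptions.
import sys
from pathlib import Path

BASELINE_FILE = Path("coverage_baseline.txt")

def ratchet(current_coverage: float) -> None:
    baseline = float(BASELINE_FILE.read_text())
    if current_coverage < baseline:
        sys.exit(f"coverage {current_coverage:.1f}% fell below baseline {baseline:.1f}%")
    if current_coverage > baseline:
        # Committing this file keeps the ratchet turning one way only.
        BASELINE_FILE.write_text(f"{current_coverage:.1f}")
        print(f"baseline raised to {current_coverage:.1f}%")

if __name__ == "__main__":
    ratchet(float(sys.argv[1]))  # e.g., the coverage figure from your test runner
```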
Once again, team structure (as discussed in Chapter 2) comes into play. Establishing a culture of (automated) quality requires that your teams have sufficient domain knowledge in testing methods for the language or framework being used. Emphasizing automated testing also encourages you to limit the complexity any single team has to deal with, so you in turn limit the surfaces they need to test.
There are a couple of frameworks that can be useful when considering the broad topic of productivity.
The DevOps Research and Assessment (DORA) framework has become a standard in the productivity realm for a reason: it offers a set of valuable metrics that shed light on where engineering teams might be able to improve their software delivery. By providing a baseline that captures a team’s current state, DORA sets the benchmark for your team’s processes. The aim isn’t to become obsessed with numbers but to continually evaluate whether you’re satisfied with what the numbers are telling you.
The success of the DORA framework — which originated from work by Nicole Forsgren, Jez Humble, and Gene Kim — lies in its simplicity and ability to capture various aspects of software development through its four core metrics: lead time for changes, deployment frequency, time to restore service, and change failure rate. These metrics are in healthy tension with each other, which means improving one could unintentionally lead to the degradation of another.
Of course, there are limitations to the DORA metrics. While they offer a snapshot of your team’s performance, they don’t explain why something might be off. Nor do they tell you how to improve. The DORA framework is not a diagnostic tool; it doesn’t point out bottlenecks in your processes or identify cultural issues inhibiting your team’s effectiveness. It’s much like having a compass — it will tell you what direction you’re headed in, but not what obstacles lie in the way or how to navigate around them.
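None of the four metrics requires fancy tooling to compute. Here’s a sketch using hypothetical deployment records; a real pipeline would pull the same fields from your deploy and incident tooling:

```python
# Illustrative DORA calculations over invented deployment records.
from datetime import datetime, timedelta
from statistics import median

# Each record: when the deploy shipped, the oldest commit it contained,
# whether it caused a failure, and how long restoring service took.
deploys = [
    {"shipped": datetime(2024, 5, 1, 10), "oldest_commit": datetime(2024, 4, 30, 15),
     "failed": False, "restore": None},
    {"shipped": datetime(2024, 5, 2, 16), "oldest_commit": datetime(2024, 5, 2, 9),
     "failed": True, "restore": timedelta(minutes=45)},
    {"shipped": datetime(2024, 5, 3, 11), "oldest_commit": datetime(2024, 5, 2, 17),
     "failed": False, "restore": None},
]

days = (deploys[-1]["shipped"] - deploys[0]["shipped"]).days or 1
deployment_frequency = len(deploys) / days
lead_time_for_changes = median(d["shipped"] - d["oldest_commit"] for d in deploys)
change_failure_rate = sum(d["failed"] for d in deploys) / len(deploys)
restores = [d["restore"] for d in deploys if d["restore"]]
time_to_restore = median(restores) if restores else None

print(f"{deployment_frequency:.1f} deploys/day, lead time {lead_time_for_changes}, "
      f"{change_failure_rate:.0%} failure rate, restore {time_to_restore}")
```

The value isn’t in any single snapshot; it’s in watching how these numbers trend as you change your processes.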
The SPACE framework, developed by Forsgren along with Margaret-Anne Storey, Chandra Maddila, Thomas Zimmermann, Brian Houck, and Jenna Butler, grew out of an attempt to build a more comprehensive tool that captures the complex and interrelated aspects of software delivery and operations. The goal was a model that would acknowledge the competing tensions within software development and use those tensions as catalysts for improvement.
Unlike DORA, SPACE embraces both quantitative and qualitative metrics, identifying five critical dimensions of software delivery and operational performance. The acronym stands for satisfaction, performance, activity, communication and collaboration, and efficiency and flow.
SPACE offers a comprehensive (though fuzzy) approach to improving productivity. It acknowledges the interplay between different aspects of software development and provides a balanced and holistic model for assessment and improvement. Still, it is just a framework — it doesn’t offer any specifics about what exactly to measure or what “good” should look like.
A set of universal metrics can’t fully capture the effectiveness of your organization because organizations vary in size, age, and culture. A mature, larger organization may have very different challenges and therefore different areas to focus on for improvement compared to a smaller, newer organization. This means that while DORA metrics are incredibly useful, they must be complemented by other qualitative assessments, leadership insights, and perhaps more localized metrics that take into account the unique characteristics of specific teams.
Unfortunately, there is no definition of productivity that boils down to keeping an eye on a few simple metrics. Measuring productivity is actually pretty hard.
Engineering organizations measure developer productivity to eliminate bottlenecks and make data-informed decisions about resource allocation and business objective alignment. Assessing productivity also provides insights into project predictability, which aids in planning and forecasting. This data acts as an early warning system to recognize when teams are overburdened, allowing for proactive interventions to alleviate stressors and redistribute workloads.
Even when the intent of measuring productivity is to improve team and organizational effectiveness, individual engineers can still be concerned that the data will be used against them. There’s a pervasive worry that these metrics could translate into some form of individual performance review, even when that’s not the intended use. This concern can contribute to a culture of apprehension, where engineers might be less willing to take risks, innovate, or openly discuss challenges. Any perception that the data will be weaponized for performance purposes can doom an effectiveness effort. Say that you won’t use the data to target individuals and mean it.
Transparency in communicating the intent, scope, and limitations of productivity metrics can go a long way in assuaging these concerns. The metrics themselves likewise need to be transparent. By involving engineers in the process of deciding what to measure, how to measure it, and how the data will be used, you can mitigate fears and build a more cooperative culture focused on continuous improvement rather than punitive action.
Despite these risks, measuring productivity can foster healthy conversations about organizational improvement. Metrics can highlight inefficiencies or bottlenecks and open the door to constructive dialogue about how to solve these problems. This becomes especially necessary as a business grows and alignment between engineering objectives and broader business goals becomes more challenging. Software delivery metrics offer a standardized way to communicate the department’s status to other organizational stakeholders.
Choose your metrics carefully. Besides the risk of impacting the psychological safety of your engineers, there are other pitfalls to be aware of. Don’t rely on misleading or irrelevant metrics that provide a distorted view of what’s happening within the teams (for example, pull requests per engineer or lines of code committed). Poorly chosen metrics can lead to misguided decisions and even undermine the credibility of the whole measurement process.
Consider, too, the incentives that are created when you choose metrics. Overemphasizing activity-focused numbers might lead engineers to game the system in a way that boosts activity metrics but doesn’t genuinely improve their productivity or the value created by their work. This can result in a culture where superficial metrics are prized over substantive improvements, leading to technical debt and inefficiencies. On the other hand, if your metrics encourage engineers to submit more but smaller pull requests, you’re likely to see benefits in quality and speed of delivery.
The work of delivering code changes for individual tasks is often measured in terms of cycle time. This term comes from manufacturing processes, where cycle time is the time it takes to produce a unit of product and lead time is the time it takes to fulfill a delivery request.
In software development, these terms are often conflated. For most features, it might not be reasonable to track the full lead time, as in the time from a customer requesting a feature to its delivery. Assuming the team is working on a product that’s supposed to serve many customers, it’s unrealistic to expect features to be shipped as soon as the team hears the idea.
Although we’re reusing manufacturing terms, remember that there is no unit of product in software development. A car can only be sold by the manufacturer once. The work that happens in an engineering organization can be sold over and over again, with near-zero marginal cost for each additional sale of the exact same code.
When talking about cycle time for code, we’re talking about the time it takes for a change to reach production through development, reviews, and other process steps. Cycle time is the most important flow metric because it indicates how well your engine is running. When diagnosing a high cycle time, your team can dig into where the time goes: in development, in review, or waiting between steps.
There are perfectly good reasons for cycle time to fluctuate, and simply optimizing for a lower cycle time would be harmful. However, when used responsibly, it can be a great discussion starter. Even better, consider tools that help visualize how this number moves over time, leading to a deeper understanding of trends and causes.
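Getting a first measurement doesn’t require heavyweight tooling either. Here’s a sketch, assuming you can export when each change was started and when it reached production (the record shape is hypothetical; most code hosts expose equivalent timestamps):

```python
# Cycle time distribution from hypothetical change records.
from datetime import datetime
from statistics import median, quantiles

changes = [
    {"started": datetime(2024, 5, 1, 9), "in_production": datetime(2024, 5, 1, 17)},
    {"started": datetime(2024, 5, 1, 10), "in_production": datetime(2024, 5, 3, 12)},
    {"started": datetime(2024, 5, 2, 14), "in_production": datetime(2024, 5, 9, 11)},
]

hours = sorted(
    (c["in_production"] - c["started"]).total_seconds() / 3600 for c in changes
)
p90 = quantiles(hours, n=10)[-1]  # rough 90th percentile
print(f"median cycle time: {median(hours):.0f}h, p90: {p90:.0f}h")
```

Watch the median and the tail separately: a healthy median with a long tail usually means a handful of changes are stuck in review or waiting to deploy.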
Issue cycle time captures how long your epics, stories, and tasks (or however you plan your work) are in progress. Each team splits work differently, so they’re not directly comparable. If you end up creating customer value, it probably doesn’t matter whether that happens in five tasks taking four hours each or four tasks taking five hours each.
Things don’t always go smoothly. When you expected something to take three days and it took four weeks of grinding, your team most likely missed an opportunity to adjust plans together. When you find yourself in this type of situation, ask what signals the team missed and how it could have adjusted course earlier.
It feels great to work with a team that consistently delivers value to customers; that’s what you get by improving issue cycle time.
Depending on the type of software you’re building, “deployment” or “release” might mean different things. For a mobile app with an extensive QA process, getting to a two-week release cadence is already a good target, while the best teams building web backends deploy to production whenever a change is ready.
Deployment frequency serves as both a throughput and a quality metric. When a team is afraid to deploy, they’ll do so less frequently; when they deploy less frequently, bigger deployment batches increase risk, which feeds the fear. Breaking the cycle typically requires building more infrastructure: better automated testing, safer rollbacks, and pipelines that make small, frequent deployments cheap.
Historically, agile teams have tracked velocity or story points. Originally meant as a way to help teams get better at splitting work and shipping value, these units have been abused ever since as a way to directly compare teams and steer an organization toward output-based thinking.
If talking about story points helps you be more disciplined about limiting queue depth and WIP, go for it. If not, don’t feel bad about dropping story points as long as you understand your cycle times.
Another traditional management pitfall is to focus on utilization, thinking that you want your engineers to be 100% occupied. As utilization approaches 100%, cycle times shoot up and teams slow down. You’ll also lose the ability to handle any reactive work that comes along without causing major disruptions to your other plans.
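Queueing theory explains why. Under a simple M/M/1 model, which is a gross simplification of knowledge work but directionally right, the time a task spends waiting grows without bound as utilization approaches 100%:

```python
# Back-of-the-envelope M/M/1 queueing: average wait, expressed as a
# multiple of the average task duration, as utilization rises.
for utilization in (0.50, 0.70, 0.80, 0.90, 0.95, 0.99):
    wait = utilization / (1 - utilization)
    print(f"{utilization:.0%} utilized: tasks wait ~{wait:.0f}x their own duration")
```

At 50% utilization a task waits about as long as it takes to do; at 95%, nineteen times as long. Slack isn’t waste; it’s what keeps the queue moving.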
There’s a time and place to look at metrics around individual engineers. In very healthy environments, they can be used to improve the quality of coaching conversations while understanding the shortcomings of these measures. In a bigger organization, an effort to focus on individual metrics will likely derail your good intentions around data-driven continuous improvement. Engineers will rightfully point out how the number of daily commits doesn’t tell you anything about how good they are at their jobs.
On the other hand, opportunities abound at the team level without shining a spotlight on any individual. Start your conversations there instead.
Assessing productivity challenges in software engineering teams requires looking beyond output metrics. When trying to debug a productivity issue, consider potential culprits such as scope creep, work bounced between teams, and ambient interruptions.
It’s worth mentioning that scope creep isn’t necessarily a bad thing! Mitigating its effects should be focused on building in time for learning, feedback, and discovery; reducing scope creep via extensive up-front planning and specification rarely produces good results.
If you’re just starting out on your productivity journey, goal-setting can feel intimidating, especially if you’re trying to prove the value of investing in this area. It can be tempting to go straight to frameworks like DORA and SPACE and try to set goals around those concepts. Still, you’ll have more luck if you identify a single opportunity from your conversations with engineers and execute on it (we’ll talk more about this in the final chapter).
For example, if you learn that CI builds fail 20% of the time due to seemingly random environmental issues, that’s a concrete data point to measure and set a target around. Once you hit the target, you can ensure you’ll notice if you exceed it again. Rinse and repeat the process with different metrics for different kinds of improvements.
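The measurement itself can start crude. Here’s a sketch that classifies recent builds, where the record shape and the “environmental” label are assumptions about what your CI system exposes:

```python
# Illustrative flake-rate calculation over hypothetical CI build records.
builds = [
    {"id": 101, "passed": True,  "failure_kind": None},
    {"id": 102, "passed": False, "failure_kind": "environmental"},
    {"id": 103, "passed": False, "failure_kind": "test_failure"},
    {"id": 104, "passed": True,  "failure_kind": None},
    {"id": 105, "passed": False, "failure_kind": "environmental"},
]

flaky = sum(b["failure_kind"] == "environmental" for b in builds)
print(f"environmental failure rate: {flaky / len(builds):.0%}")  # -> 40%
```

Once the rate is visible on a dashboard, setting the target, and noticing when you slip past it again, comes almost for free.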
Once you’ve embraced that pattern, it’s a good time to get DORA metrics in place if you haven’t already and start using them to track the impact of improvements on teams and services. In many ways, the core DORA metrics cover the activity pillar in SPACE, and establishing them within your organization will quickly highlight potential opportunities.
As your productivity journey progresses, DORA metrics will continue to be useful for tracking trends, but they will never tell your whole productivity story. As you start to recognize themes in your work and your users’ reported issues, embracing SPACE more thoroughly beyond the activity dimension will make sense. The SPACE framework is best used to identify various indicators of overall productivity, from OKR/goal attainment to meeting load to cross-team collaboration burden.
Setting goals around SPACE pillars is also fraught; there’s no way, for example, to boil efficiency and flow down to a single number. On the other hand, SPACE is great as a framework to classify problems and brainstorm specific metrics you might use to track trends and validate improvements.
When it comes to setting metrics goals, you’ll sometimes find yourself pressured to set a goal before you know how you’re going to solve the fundamental problem. Even under pressure, set goals around potential valuable outcomes from working on the problem, not on a restatement of the problem itself.
Opportunities to improve flow exist throughout the reporting chain and sometimes straight up to senior leadership. Culturally, you need to get people at all levels to understand and internalize the idea that interruptions for software engineers are bad and should be minimized.
Of course, some interruptions are inevitable, but many are imposed without recognizing the cost. Before you do anything else with developer productivity, ensure there’s general agreement on reducing interruptions (we’ll discuss this in more detail in the next chapter).
At the team level, some interruptions are within the team’s control and some are not. For example, if a code change requires a review from another team, the originating engineer is blocked until someone from that team accepts the change, and the originating team may not feel in control of the situation in the meantime.
Nonetheless, plenty is in the control of individual teams: what they prioritize, how they work together, how they ensure quality, how they automate tedious tasks, and much more. Working agreements and retrospectives are two tools to use at the team level.
At the organizational level, we start to talk about more ambient interruptions, which no one is responsible for but just seem to appear. Tackling these interruptions is outside the scope of any one team unless a team is specifically responsible for this kind of thing. This is where things get more challenging but also more rewarding; solving these cross-team problems tends to have more leverage than focusing solely on team-level opportunities.
Once you reach a certain size, it’s useful to be explicit about who is accountable for developer productivity and what it’s like to build software at your company. If your immediate response is “everyone,” either you are still a relatively small organization or it’s time to start thinking about a more definitive answer.
In this chapter, we discussed developer productivity, including ways to quantify it and guidance on goal-setting in the developer productivity space. Next, we’ll talk about the less quantifiable but equally important developer experience.