Data-driven Software Engineering Management. Which Data?

Leading software organisations with data-driven insights? Sure! Bring it on! But how? And where do you get the right data from? Learn how to unlock your software engineering teams' treasure trove of data for better decision making.

Good Data Leads to Better Decisions.

It is easy to get behind the idea of data-driven decision making in the software engineering world. We all like to understand whether our teams are happily chugging along at high velocity, whether code quality and potential security risks are well under control and, overall, whether our DevOps delivery pipeline is running like a smooth assembly line.

“The great thing about fact-based decisions is that they overrule the hierarchy.” — Jeff Bezos, Founder Amazon

Make data-driven engineering decisions? Sure! Bring it on! Who would not prefer to make informed, objective, real-time decisions based on facts and not on gut feeling alone?

As the term “data-driven” implies, this requires data in the first place. Collecting data is not enough, though: we also need to derive insights from that data in order to make useful decisions.

In the business world we have become used to making data-driven decisions. For instance, we calculate customer retention and renewal rates, we know our customer acquisition costs, and we track advertising spend relative to web traffic uptake. In the software world we are less familiar with these approaches and a little hesitant about what data to collect and how to use it to drive decisions. Moreover, how do we extract that data without continuously nagging our teams, manually aggregating reports and building our own data analytics solutions?

Funnily enough, in engineering we already sit on a ton of data, which just needs unlocking and lends itself very well to automated software intelligence gathering.

In the following we highlight some of the first steps towards a data-driven engineering management approach.

Unlock the Engineering Data You Already Own.

Software engineering organisations already own vast amounts of data. However, that data is often locked away in silos and used for narrow operational decisions rather than for the overall management of processes, teams, and risks.

“111 billion lines of new software code is created every year.” — Secure Decisions

Source Code Repository and Code Review Data.

Let us start with the obvious data: source code and its repositories, such as GitHub Enterprise. This is almost the ground truth from which much of our company value originates, at least if you are in some sort of software business. Source code repositories document who did what and when. They are also a great source of overall trends on how smoothly teams run and, conversely, of indicators when things go pear-shaped.

Some data-driven insights might include: Does the project run like clockwork and churn out a predictable amount of new features? Does the team remediate and fix a certain percentage of technical debt versus producing new code? Are team members contributing evenly across more than one component, or is there a risk that a single person is the only one committing to and understanding a particular component? Are there any large fluctuations in velocity? If so, when do they happen? Always around release points or seemingly at random? How does velocity correlate with issues, code quality and infrastructure hiccups?

Many of these questions can be answered by mining repository information, ticketing and issue tracking systems as well as code review and quality assurance processes. Much of this can be done automatically with the support of AI data mining techniques.
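As a rough illustration of one of the questions above (is a single person the only one committing to a component?), here is a minimal Python sketch. The commit records, the function names and the 70% threshold are all assumptions made for this example; in practice the records would be extracted from your repository history rather than hard-coded.

```python
from collections import Counter, defaultdict

# Hypothetical commit records as (author, component) pairs.
# Real data would be extracted from the repository history.
commits = [
    ("alice", "billing"),
    ("alice", "billing"),
    ("alice", "billing"),
    ("bob", "billing"),
    ("bob", "search"),
    ("carol", "search"),
    ("carol", "search"),
    ("dave", "search"),
]


def ownership_share(commits):
    """Return, per component, the top committer and their share of commits."""
    per_component = defaultdict(Counter)
    for author, component in commits:
        per_component[component][author] += 1

    result = {}
    for component, counts in per_component.items():
        author, top_count = counts.most_common(1)[0]
        result[component] = (author, top_count / sum(counts.values()))
    return result


def single_owner_risks(commits, threshold=0.7):
    """List components where one person made at least `threshold` of all commits.

    The threshold is an illustrative assumption, not an established norm.
    """
    return sorted(
        component
        for component, (_, share) in ownership_share(commits).items()
        if share >= threshold
    )


print(ownership_share(commits))
print(single_owner_risks(commits))
```

Commit counts are only a crude proxy for knowledge of a component, so a result like this is best read as a prompt for a conversation with the team rather than as a verdict.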

DevOps and Infrastructure Data

Another great source of data for decision making is the (developers’) operational infrastructure. Most companies have moved towards more agile processes and towards more automation including software builds, testing and automated quality gating. However, there are vast differences in DevOps process efficiencies and even more differences when looking at the whole continuous integration and delivery (CI/CD) cycles.

Common questions about the efficiency of those DevOps processes include: How long do our builds take? How often do they break? What is the mean time to recovery once a build is broken? How does this impact different teams and what does this mean for our overall velocity?

Not only can long build cycles and broken builds be frustrating for everyone, they can also lead to serious productivity losses by holding up teams and making the whole delivery engine stutter, knocking engineering teams out of any high-velocity rhythm.
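The build questions above translate fairly directly into numbers. The following Python sketch computes mean build duration, failure rate and mean time to recovery from a list of build records. The record layout and the sample values are assumptions for illustration, and recovery time is simplified here to the gap between the start of the first failing build and the start of the next passing one.

```python
from datetime import datetime, timedelta

# Hypothetical build records ordered by start time:
# (started_at, duration_in_minutes, passed)
builds = [
    (datetime(2024, 1, 1, 9, 0), 12, True),
    (datetime(2024, 1, 1, 10, 0), 15, False),
    (datetime(2024, 1, 1, 11, 0), 14, False),
    (datetime(2024, 1, 1, 12, 0), 13, True),
    (datetime(2024, 1, 1, 15, 0), 16, False),
    (datetime(2024, 1, 1, 16, 0), 14, True),
]


def build_metrics(builds):
    """Summarise build duration, failure rate and mean time to recovery."""
    durations = [minutes for _, minutes, _ in builds]
    failures = sum(1 for _, _, passed in builds if not passed)

    # Track each stretch of broken builds until the next passing build starts.
    recoveries = []
    broken_since = None
    for started_at, _, passed in builds:
        if not passed and broken_since is None:
            broken_since = started_at
        elif passed and broken_since is not None:
            recoveries.append(started_at - broken_since)
            broken_since = None

    mttr = sum(recoveries, timedelta()) / len(recoveries) if recoveries else None
    return {
        "mean_duration_min": sum(durations) / len(durations),
        "failure_rate": failures / len(builds),
        "mean_time_to_recovery": mttr,
    }


print(build_metrics(builds))
```

Tracked per team and per week, the same three figures make it easier to see whether the delivery engine is actually stuttering or whether it only feels that way.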

Again, you probably already have that data somewhere, be it in your engineering logs, your build server dashboards or as part of your DevOps teams’ KPIs. However, it is important to lift that data out and correlate it with your other engineering insights to really make it speak volumes across projects and the entire organisation.
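To make the idea of correlating sources concrete, here is a small Python sketch that joins two weekly series by week and computes their Pearson correlation. The weekly figures and the function names are invented for this example; real values would come from your repository and build server data.

```python
from math import sqrt

# Hypothetical weekly figures keyed by ISO week.
weekly_velocity = {"2024-W01": 20, "2024-W02": 18, "2024-W03": 10, "2024-W04": 12}
weekly_build_failure_rate = {
    "2024-W01": 0.1,
    "2024-W02": 0.2,
    "2024-W03": 0.6,
    "2024-W04": 0.4,
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


def correlate_by_week(series_a, series_b):
    """Correlate two weekly series using only the weeks present in both."""
    weeks = sorted(series_a.keys() & series_b.keys())
    return pearson([series_a[w] for w in weeks], [series_b[w] for w in weeks])


r = correlate_by_week(weekly_velocity, weekly_build_failure_rate)
print(f"velocity vs build failure rate: r = {r:.2f}")
```

A strongly negative value in this sample would suggest that weeks with more broken builds are also weeks with lower velocity. With only a handful of data points this is a hint worth investigating, not evidence of cause and effect.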

Quality Assurance and Security Testing Data

The third big bucket is all the data that already exists with your QA teams and application security teams. This includes test coverage, static and dynamic code analysis output as well as application security testing results. Other aspects might concern compliance, including open source license compliance and the state of your vulnerability management.

This data is likely already prominent in your organisation. You might use it for quality gating, risk assessment and release readiness decisions. However, it is important to obtain a more holistic view by understanding that same data in the context of the overall team’s velocity, infrastructure bottlenecks as well as code and quality metrics.

For instance, there might be teams with high velocity but lower quality, or with repeated patterns of the same security flaws, that could do with some secure coding training. Or you might have teams with low velocity that work on a critical business application with a lot of technical debt but still produce high quality output. Such teams might benefit from some refactoring time and additional temporary resourcing. Similarly, if velocity is down because there are many build failures and tickets are piling up, it might be more useful to remove infrastructure bottlenecks first.
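The scenarios above can be written down as a simple rule-based triage. The Python sketch below does so; the `TeamSnapshot` fields, the metric definitions and every threshold are assumptions chosen for illustration and would need calibrating against your own data.

```python
from dataclasses import dataclass


@dataclass
class TeamSnapshot:
    """Hypothetical per-team summary combining several data sources."""
    name: str
    velocity: float            # e.g. merged changes per week
    defect_rate: float         # e.g. defects or security findings per merged change
    build_failure_rate: float  # fraction of builds that failed
    tech_debt_ratio: float     # share of code flagged as technical debt


def suggest_action(
    team,
    velocity_floor=10.0,
    defect_ceiling=0.2,
    build_failure_ceiling=0.3,
    debt_ceiling=0.4,
):
    """Map a team snapshot to one of the interventions described in the text.

    All thresholds are illustrative assumptions.
    """
    low_velocity = team.velocity < velocity_floor
    low_quality = team.defect_rate > defect_ceiling

    if low_velocity and team.build_failure_rate > build_failure_ceiling:
        return "remove infrastructure bottlenecks first"
    if not low_velocity and low_quality:
        return "secure coding training"
    if low_velocity and not low_quality and team.tech_debt_ratio > debt_ceiling:
        return "refactoring time and temporary resourcing"
    return "no action suggested"


teams = [
    TeamSnapshot("payments", velocity=25, defect_rate=0.35,
                 build_failure_rate=0.1, tech_debt_ratio=0.2),
    TeamSnapshot("core-ledger", velocity=6, defect_rate=0.05,
                 build_failure_rate=0.1, tech_debt_ratio=0.6),
    TeamSnapshot("mobile", velocity=7, defect_rate=0.1,
                 build_failure_rate=0.5, tech_debt_ratio=0.2),
]

for team in teams:
    print(team.name, "->", suggest_action(team))
```

The point of the sketch is that no single metric decides the outcome: the same low velocity leads to different suggestions depending on what the build and quality data say.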

How to Collect All this Data?

One of the main challenges is unearthing, collecting and then unlocking the value of the data that already exists in some shape or form in your organisation. A typical trap is either to have that data locked in engineering silos, or to assign people to report data upstream manually, collect it in Excel sheets or BI dashboards and look at it once a quarter. Neither of these is data-driven decision making.

Like everything else in a high-velocity CI/CD environment, automation is the key. Automation of data collection. Automation of data aggregation. Automation of data correlation. Automation of insights from data, so that valuable engineering time can be spent on informed, timely decision making. But how to automate all this?

Engineering Intelligence to the Rescue.

Engineering intelligence is like business intelligence for engineering leaders: a central place to store, aggregate and decide on information. Next generation software intelligence platforms such as Logilica not only store and aggregate data, but also integrate seamlessly with your existing infrastructure, pull information from different sources and run proactive analytics on it. For instance, they analyse the meta-information of your source code repositories to detect trends and anomalies and to surface deeper insights than the raw data you currently have.

As a result, you can obtain a central engineering management decision hub that can be used to collaboratively improve your overall organisational efficacy and predictability.


While we all understand that data is important for decision making, it becomes clear that for engineering leaders there is already a plethora of data available within the organisation to base decisions on. However, that data is often partial and locked up in low-level systems not conveniently and centrally accessible to engineering management.

“Delivering software quickly, reliably, and safely is at the heart of technology transformation and organizational performance.” — State of DevOps, 2019

We highlighted that next generation engineering intelligence platforms address these challenges by collecting, aggregating and transforming those data sources into insights. In a future article we will explore how such software engineering platforms operate and what some of the key success criteria are for transforming engineering management into a smooth-running, data-driven decision-making process.




Ralf Huuck
Founder & CEO of Logilica