The Root Cause of All Problems in Data - Revisited

Understanding the core conflict in analytics

Mar 09, 2024

“Data is not a technology problem but a people and process problem.”

Raise your hand if you’ve heard this before. Right? But what does it actually mean?

First let’s unpack the question.

What does it mean when a problem is a technology problem? Does it mean that we lack the technology to solve it? I think that’s a good definition.

What about a people and process problem? What does that mean? Does it have to do with people’s unwillingness to adopt data driven decisions or do we lack a good methodology for being data driven?

In this edition I’ll do a deep dive into these questions and hopefully provide some answers.

Over the past couple of years since I started posting about data and analytics online, a remarkable thing has happened. I’ve gotten hundreds of thousands of impressions, thousands of likes and reposts and many direct messages telling me what I posted has deeply resonated with data professionals.

Why though? Does this mean that data problems are universal somehow? Does that mean there’s a repeating pattern? Yes! I think it does. And you know how much I love patterns 😀

Last June I wrote “The Root Cause of All Problems in Analytics” hoping to analyze these patterns and find the root cause. It turned out to be by far my most popular post - until very recently.

Since then, I’ve kept working on the problem. I was not satisfied with the answers I got back then. Why? Because according to the Theory of Constraints (TOC), a problem only persists if there’s a hidden conflict at its core, and for the longest time I couldn’t find it in analytics.

What’s a core conflict?

In TOC a conflict occurs when two diametrically opposed actions or policies support the same objective. It’s best described using the diagram below known as an Evaporating Cloud. I won’t get into too much detail about it but you can read the Wikipedia page here.

Here’s an example of a Core Chronic Conflict in software development. This diagram is credited to Gene Kim, co-founder of the DevOps movement and author of The Phoenix Project, which covers the conflict below through a business novel (exactly the way Goldratt did with TOC in The Goal).

You read it left to right:

The objective is to ensure enablement of business goals. In order to enable business goals, we need to respond to urgent business needs quickly, which means we must complete work quickly.

At the same time in order to enable business goals we need to provide a stable and reliable service and in order to do that we must complete work slowly so we don’t break things.

As you can see the two policies are in conflict. Which one should you do? The conflict creates all manner of undesirable effects (TOC terminology for problems). For example before the DevOps movement, companies used to release software very infrequently.

Some had monthly releases, others quarterly and some even yearly. Can you imagine not releasing a feature for an entire year? Release processes were fraught with anxiety, and often required multiple days to fully stabilize. Software was brittle and deemed untouchable.

DevOps fixed all that. Nowadays features are automatically tested and integrated into the codebase daily, often several times a day. Anyone from a fresh-out-of-school junior developer to a Tech Lead could perform a release.

There are a couple of important things to point out about conflicts.

First, there are assumptions hidden behind those left-pointing arrows. As long as they remain unchallenged, the conflict persists. Finding the conflict is not easy, so the recommended method is to use the Current Reality Tree.

Second, even though there are a myriad of problems in any organization, they stem from only a handful of root causes - often a single root cause. As long as the conflict persists, any attempts to fix the problems directly will fail.

How to analyze problems and find their root cause

The Current Reality Tree (aka the Problem Tree) is one of several logical thinking tools developed by Goldratt as part of TOC. In its original form, the idea is to list 10-15 of the most salient problems in a system and then carefully connect them via cause-and-effect relationships.

I came up with the following tree for analytics:

There is A LOT going on in this document, so let me show you how to read it. Start from the bottom and read the Undesirable Effects (UDEs pronounced oo-dee) using IF-THEN logic.

The arrows represent sufficient cause logic, which means that if an arrow points to a UDE, the UDE it comes from must be sufficient to cause it. If not, you might need to add additional UDEs or preconditions.

For example: IF “Managers have more and more questions and requests for data and dashboards” THEN “More and more dashboards, reports, metrics and tables are produced”. Is the first UDE a sufficient cause for the second? I think so.

On the other hand, “There are no checks or policies to ensure data quality” is insufficient by itself to cause “Data quality gets worse and worse”, but when you also add “Data changes in unpredictable ways” it makes more sense.
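Here’s a minimal sketch of sufficient cause logic in Python. The UDE texts are quoted from the examples above; the data structure and the names `ARROWS` and `derive` are my own assumptions for illustration, not part of TOC. Each arrow holds a set of causes that are only sufficient together, so an effect is derived only when every cause in the set is present.

```python
# Sketch only: ARROWS and derive() are hypothetical names for illustration.
# Each arrow pairs a set of jointly sufficient causes with one effect.
ARROWS = [
    (
        {"Managers have more and more questions and requests for data and dashboards"},
        "More and more dashboards, reports, metrics and tables are produced",
    ),
    (
        {
            "There are no checks or policies to ensure data quality",
            "Data changes in unpredictable ways",
        },
        "Data quality gets worse and worse",
    ),
]


def derive(observed, arrows=ARROWS):
    """Return the observed UDEs plus every UDE that follows from them."""
    known = set(observed)
    changed = True
    while changed:  # keep applying arrows until nothing new is derived
        changed = False
        for causes, effect in arrows:
            # An arrow fires only when ALL of its causes are known.
            if causes <= known and effect not in known:
                known.add(effect)
                changed = True
    return known
```

Calling `derive` with only the “no checks or policies” UDE does not produce “Data quality gets worse and worse”; passing both causes does.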

Let’s continue.

IF “More and more dashboards, reports, metrics and tables are produced” THEN “Managers don’t know which dashboards, reports, metrics or tables to use” But this UDE causes them to ask for more reports and thus we end up with a never-ending loop of doom.
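If the tree is stored as a mapping from each UDE to the UDEs it causes, a loop like this one can be found mechanically. The sketch below is an assumption about how you might encode it; `LOOP_TREE` and `find_cycle` are hypothetical names, and only the three UDEs from this path are included.

```python
# Sketch only: a cause -> effects mapping for the three UDEs in this path.
QUESTIONS = "Managers have more and more questions and requests for data and dashboards"
DASHBOARDS = "More and more dashboards, reports, metrics and tables are produced"
CONFUSED = "Managers don't know which dashboards, reports, metrics or tables to use"

LOOP_TREE = {
    QUESTIONS: [DASHBOARDS],
    DASHBOARDS: [CONFUSED],
    CONFUSED: [QUESTIONS],  # confusion leads back to more requests
}


def find_cycle(tree, start):
    """Return a path start -> ... -> start if one exists, else None."""
    stack = [(start, [start])]
    visited = set()
    while stack:
        node, path = stack.pop()
        for effect in tree.get(node, []):
            if effect == start:
                return path + [start]  # we are back where we began
            if effect not in visited:
                visited.add(effect)
                stack.append((effect, path + [effect]))
    return None
```

For this tree, `find_cycle(LOOP_TREE, QUESTIONS)` returns the path through all three UDEs and back to the first one.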

Let’s look at another path. Here a single UDE causes two others:

IF “There’s no standard definition of key metrics across the org” THEN “There’s inconsistent/no instrumentation of key business processes” At the same time IF “There’s no standard definition of key metrics across the org” THEN “There are multiple conflicting definitions of the same metric”

IF “Managers might use the wrong dashboard, report, metric or table to make decisions” THEN “Managers make ineffective/wrong decisions” which leads to “Managers look incompetent in the eyes of their direct managers” which leads to “Managers get more and more frustrated with the data team” which leads to “The impact of the data team is never realized” which leads to “The data team is perceived as a cost center” which leads to “The data team loses its budget / funding”.

Can you see how everything is linked together through cause and effect?

Finding the root cause

According to TOC, the core conflict is usually found when a single cause leads to the majority of the effects. That means we’ve found the root cause. In our case, however, we have multiple root causes as end nodes.

What we should try to do is find a common cause for two or more of them in order to connect them. So we should ask ourselves: Why is there no standard definition of metrics across the org? What’s stopping us? Why is the value of analytics opaque and not well understood?

Maybe IF the value of analytics was clear and well understood THEN there would automatically be a standard definition of all the metrics across the org. Right? Is that sufficient? Hmm, it feels like a logical jump, like there should be something else in between.

What if there were a methodology that described exactly how to get value out of analytics? Do you think it would require a standard definition of metrics across the org? Of course, right? Otherwise, how would we benefit from multiple definitions of a key metric like Revenue?

So we can add another UDE at the bottom to connect these two.

IF “There’s no methodology that shows the power of data and analytics” THEN “The value of data and analytics is opaque and not well understood” and at the same time “There’s no standard definition of metrics” That seems to make sense now.

Wait a minute. If such a methodology existed, why couldn’t it also answer the majority of questions managers have? Why do we expect managers to know what to ask for when they most likely don’t have the full context of the business in mind, but we the data people do?

If such a methodology existed, it absolutely should answer most of the questions managers have in a systematic way, leaving room for only a few “research” type questions for the data science team to dive into.

So we can connect all three now with the same root cause!
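To make the step concrete, here’s a rough sketch that treats the tree as a cause-to-effects mapping and looks for root causes, meaning UDEs that no arrow points to. This is a simplified fragment, not the full tree: the loop edge is left out, the link from the methodology to managers’ questions is my reading of the paragraph above, and the names `find_roots` and `reachable` are hypothetical.

```python
from collections import deque

# Sketch only: a small fragment of the tree, using UDE texts from the article.
NO_STANDARD = "There's no standard definition of key metrics across the org"
OPAQUE = "The value of data and analytics is opaque and not well understood"
QUESTIONS = "Managers have more and more questions and requests for data and dashboards"
NO_METHOD = "There's no methodology that shows the power of data and analytics"

before = {
    NO_STANDARD: [
        "There's inconsistent/no instrumentation of key business processes",
        "There are multiple conflicting definitions of the same metric",
    ],
    OPAQUE: [],  # its effects are not spelled out in the text
    QUESTIONS: ["More and more dashboards, reports, metrics and tables are produced"],
}

# Add the new UDE at the bottom and connect it to the three end nodes.
after = dict(before)
after[NO_METHOD] = [OPAQUE, NO_STANDARD, QUESTIONS]


def find_roots(tree):
    """Root causes: UDEs that never appear as the effect of another UDE."""
    effects = {e for targets in tree.values() for e in targets}
    nodes = set(tree) | effects
    return nodes - effects


def reachable(tree, start):
    """All UDEs that follow, directly or indirectly, from `start`."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for effect in tree.get(node, []):
            if effect not in seen:
                seen.add(effect)
                queue.append(effect)
    return seen
```

In this fragment, `find_roots(before)` gives three separate root causes, while `find_roots(after)` gives only the missing methodology, and `reachable(after, NO_METHOD)` covers every other UDE in the fragment.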

Such a methodology does exist! Amazon has been using it for years. It’s made up of three components:

A large set of very well-defined metrics,
A giant deck of slides showing time series charts of those metrics,
A systematic weekly review process where those metrics are analyzed and decisions are made.
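I don’t know how Amazon implements any of this, so treat the following as a toy sketch of the three components under my own assumptions. Every name here (`Metric`, `MetricRegistry`, `week_over_week`, `flag_for_review`) and the 10% threshold are hypothetical. The registry enforces one definition per metric, the weekly values stand in for the time series charts, and the flagging step stands in for what a weekly review would look at first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    definition: str  # the single agreed-upon definition
    owner: str


class MetricRegistry:
    """Keeps exactly one definition per metric name."""

    def __init__(self):
        self._metrics = {}

    def register(self, metric):
        existing = self._metrics.get(metric.name)
        if existing is not None and existing.definition != metric.definition:
            raise ValueError(f"conflicting definition for {metric.name!r}")
        self._metrics[metric.name] = metric

    def get(self, name):
        return self._metrics[name]


def week_over_week(series):
    """Percent change between consecutive weekly values (assumes no zeros)."""
    return [(cur - prev) * 100 / prev for prev, cur in zip(series, series[1:])]


def flag_for_review(series, threshold_pct=10.0):
    """Indices of weeks whose change from the prior week exceeds the threshold."""
    changes = week_over_week(series)
    return [i + 1 for i, change in enumerate(changes) if abs(change) > threshold_pct]
```

A second team trying to register Revenue with a different definition gets an error instead of a second dashboard, and a weekly series like `[100, 120, 121]` flags only the week with the 20% jump.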

So why aren’t people using it? Does this mean there’s a hidden core conflict?

Discovering the hidden core conflict

The fact that these problems are the same everywhere, and that they persist, means there must be a core chronic conflict hidden somewhere. And the fact that a methodology already exists but isn’t used means the conflict is chronic and will persist.

It took me many months, but I finally arrived at the Conflict Cloud below for analytics.

Do not be fooled by its simplicity!

So let’s read from left to right.

My objective as a manager or executive is to make effective decisions. After all that’s what I get paid to do. In fact I might even have incentives tied to the quality of my decisions and the results they get.

Like all humans I rely on my intuition to make decisions. It’s only logical that in order to rely on my intuition, I need to trust my intuition.

At the same time in order to make effective decisions I’m under constant pressure to rely more and more on data analysis. In order to rely on data analysis, I must distrust my intuition and instead trust data.

Do you notice a small conflict? 😀

Like I said above don’t be fooled by its simplicity.

If data says one thing but our intuition says the opposite, we will generally distrust data and trust our intuition. We don’t like being wrong! But the pressure to use data is immense, so what do we do instead? We try to find data that validates our assumptions.

Scientists have to deal with this conflict every time they work on a theory. It requires rigorous and careful thinking to be open to being wrong and examine assumptions carefully in order to find the truth. But overworked managers don’t have the time or inclination to do so.

And that’s why many attempts to fix these problems from the bottom-up usually fail. You need the methodology to be imposed from the top which means you have to get buy-in from the C-suite.

That’s it for this issue. Did you like this issue? If so let me know in the comments or via a reply.

Until next time.

Josh Perryman

Mar 9, 2024

Very nicely laid out. Though I must admit I'm not convinced that the root is data vs intuition. But I'm also not quite ready to refute it either.

This conclusion seems to find its ultimate focus on a tension internal to an individual and presumes that the individual has full agency and autonomy. That may hold in small organizations (where I'm often employed) but may not in larger organizations (where I often consult).

The DevOps example is a nice contrast because it easily rolls up to the organizational level, even though several of its parameters can be applied at the individual level (work, time). So while data has the flexibility of application to both individual and organization, I'm not sure that intuition does.

As a final thought, I think there may be another dimension, or alternative expression of intuition. Perhaps it is rigor, or structure, of decision-making. The most structured expression of decision-making would be a coded algorithm which is completely deterministic in its results. The less rigorous end of the spectrum would be decisions based on emotional or other whims, nearly indistinguishable from random dice rolls or monkeys with keyboards.

I find that over time organizations tend to get more rigorous in their approach to decision-making, as they layer on processes and requirements in an effort to better manage risks and variations. Perhaps this is the basic contrast with the concept of intuition.


1 reply by Ergest Xheblati