
TRUTH-HACKING:
3 Rules To Not Get Fooled by Data

Founder of CX-AI.com and CEO of Success Drivers
// Pioneering Causal AI for Insights since 2001 //
Author, Speaker, Father of two, a huge Metallica fan.

Author: Frank Buckler, Ph.D.

Published on: May 18, 2021 * 12 min read

Everyone communicates with facts and data to support a certain message. Politicians do it, the media do it, businesses do it. Lying with data has become a shady art, perfected by politics, cultivated by managers.

Even worse, business leaders fool themselves day in, day out by drawing “obvious” conclusions from data.

Imagine we could make sure we do not fall for fake news, alternative facts, spurious correlations, and the like. Imagine we had a checklist to see if an insight is legit.

How many trillions of dollars could be saved? How many lives could be saved? How much smarter would people guide political decision-makers? How much better would this world become if we all get a little bit data-smart every day?

This article explains the way…

So now, how can we separate the wheat from the chaff? How do you check if a finding is flawed?

These three simple rules are your guide:

  1. Control How Your Data Is Sampled
  2. Understand What Your Data Really Mean
  3. Be Aware of How You Infer Truth

Here is why…

Control How Your Data Is Sampled

In January 2017, Donald Trump gave his inauguration speech. Media outlets critical of Trump highlighted that the audience was significantly smaller than at previous inaugurations.

However, Trump’s press secretary Sean Spicer pointed to unusually high subway ridership on that day. When Trump’s counselor Kellyanne Conway defended his statements as “alternative facts”, the phrase became world-famous.

It stands for cherry-picking examples to prove the point you want to make.

Politicians of all parties are doing it all the time. But not just them. Business leaders do it too.

The moon landing conspiracy theory is built on selected facts that make the project look questionable, like the waving flag although the moon has no wind. It neglects the available facts that would explain the phenomenon.

Beware of Cherry-Picking

In the 2000s, when I still used to read business books, I had this eureka moment. I was reading the book “Simply Better”, which mentioned Cardinal Health as an example of bad management. Then I read “Close to the Core”, which used Cardinal Health as the case study for how to do it right.

Even business book authors do cherry-picking. Every single business book on this mother earth is a cherry-picking selection of cases that prove one single theory.

What’s wrong with getting some inspiration from business books?

It’s a dangerous dance at the edge. Getting inspired by wrong ideas will derail your mind.

The problem: the books hinder you from forming your own opinion and finding the truth. They are designed just to make you believe the theory.

Reading business books is unlikely to make you smarter or more successful. It is more likely to turn you into a business fashion victim.

Cherry-Picking is the practice of selecting results that fit your claim and excluding those that don’t.

Why is this strategy so successful at fooling us all? We draw conclusions from any experience, no matter if it is representative or not.

What can you do about it? Do not draw conclusions from data without checking that it represents the matter of interest.

There are other forms of “cherry-picking”…

Sampling Bias – the unintended cherry-picking

In 1948, the Chicago Tribune mistakenly predicted, based on a phone survey, that Thomas E. Dewey would become the next US president. They hadn’t considered that only a certain demographic could afford telephones, which excluded entire segments of the population from their survey.
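
To make the mechanism tangible, here is a minimal simulation sketch. All numbers are made up for illustration (they are not the 1948 figures): phone ownership is assumed to be correlated with the vote, so a phone-only sample overstates one candidate.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical electorate (illustrative numbers, not historical data):
# 40% of voters own a phone; phone owners lean towards candidate A.
P_PHONE = 0.40
P_VOTE_A_WITH_PHONE = 0.60
P_VOTE_A_WITHOUT_PHONE = 0.40

def simulate_voter():
    """Return (has_phone, votes_for_a) for one simulated voter."""
    has_phone = random.random() < P_PHONE
    p_a = P_VOTE_A_WITH_PHONE if has_phone else P_VOTE_A_WITHOUT_PHONE
    return has_phone, random.random() < p_a

population = [simulate_voter() for _ in range(100_000)]

def share_for_a(voters):
    """Share of the given voters who vote for candidate A."""
    return sum(votes_a for _, votes_a in voters) / len(voters)

true_share = share_for_a(population)

# The "phone survey" can only reach voters who own a phone.
phone_owners = [v for v in population if v[0]]
phone_survey_share = share_for_a(phone_owners)

print(f"True share for A:         {true_share:.3f}")
print(f"Phone-survey share for A: {phone_survey_share:.3f}")
```

Under these assumptions candidate A loses the election but wins the phone survey. The survey is not wrong about phone owners; it is wrong about everyone else.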

This cherry-picking can sneak into decision-making easily. Have you ever done a churner survey?

Survivorship Bias – another unintended cherry-picking

In World War II, the US military checked returning bombers for damage from gunfire and added armor to those spots. It did not help at all.

Why? They would have needed to check the planes that did NOT survive gunfire too, in order to find the spots that required further protection.

I have never met a client who runs a churner survey and realizes he is fooling himself with survivorship bias. To find out what leads to churn, you need to survey customers, NOT ex-customers, and then track which of them churn.

We believe in data and facts. For us, “facts” are a synonym for “truth”. But cause-effect relationships cannot be observed; they must be inferred (and a “reason” IS a cause-effect relationship). NEVER take facts alone to decide about reasons.

ALWAYS be sure to have cases with different outcomes in your sample – successful and not successful, churner and non-churner, winner and loser.
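
Here is a small sketch of why a churner-only survey misleads. The customer base is simulated and every probability is an assumption chosen for illustration: price complaints are common but unrelated to churn, while service outages are rare but drive churn.

```python
import random

random.seed(7)  # fixed seed so the sketch is reproducible

def simulate_customer():
    """One hypothetical customer; all probabilities are illustrative."""
    price_complaint = random.random() < 0.70  # common, but unrelated to churn here
    outage = random.random() < 0.10           # rare, but drives churn here
    churned = random.random() < (0.40 if outage else 0.10)
    return {"price_complaint": price_complaint, "outage": outage, "churned": churned}

customers = [simulate_customer() for _ in range(50_000)]

def share(rows, key):
    """Share of rows where the given flag is True."""
    return sum(r[key] for r in rows) / len(rows)

# View 1: the churner survey. Only ex-customers are asked.
churners = [c for c in customers if c["churned"]]
churner_view = {k: share(churners, k) for k in ("price_complaint", "outage")}

def churn_rate_gap(key):
    """Churn rate of customers with the issue minus churn rate of those without."""
    with_issue = [c for c in customers if c[key]]
    without_issue = [c for c in customers if not c[key]]
    return share(with_issue, "churned") - share(without_issue, "churned")

# View 2: all customers, churners and non-churners, compared by outcome.
full_view = {k: churn_rate_gap(k) for k in ("price_complaint", "outage")}

print("Share of churners mentioning each issue:", churner_view)
print("Churn-rate gap with vs. without the issue:", full_view)
```

In the churner survey, price looks like the top reason because most churners complain about it. Only the full sample shows that non-churners complain about price just as often, while outages are what separates the two groups.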

Understand What Your Data Really Mean

When working with an automotive brand, I was astonished at how incredibly high the customer satisfaction was at nearly all of their car dealers.

The client took me aside and explained: car dealers are incentivized on customer satisfaction. They get millions in cashback from the manufacturer if customers are satisfied. He further explained that larger car dealers hired personnel just to call customers who gave lower ratings and get them to take the rating back. They also implemented all kinds of other measures to make sure the rating was excellent.

Shortly after, when I bought a car, this became apparent: the dealer smiled at me with a huge basket of flowers in his hand and said: “hope you’ll enjoy this car – if someone from Ford calls you, we would be delighted if you say ‘extremely satisfied’ – if you are not, please tell me beforehand.”

Beware the Hawthorne Effect

In the 1920s at Hawthorne Works, an Illinois factory, a social sciences experiment hypothesized that workers would become more productive following various changes to their environment such as working hours, lighting levels, and break times. However, it turned out that what motivated the workers’ productivity was someone taking an interest in them.

When you try to measure customer satisfaction or the likelihood to recommend, asking alone may increase or decrease the outcome. In CX, this is sometimes used deliberately in “cuddle calls”: you reach out to customers just to show that you care, and thereby improve satisfaction.

WHY is this fooling us all?

We are not aware that data is just a representation of a real-world phenomenon. We take the label of the data and take this as the truth. Only when we understand how the data was generated can we understand the resulting data analysis.

WHAT can you do about it?

  • When interpreting analysis results, also consider that data might not be generated the way you believe it was
  • Track the context of data generation (e.g. as a binary variable coded 1 for “with observation” and 0 for “without observation”) and include this information in the analysis
  • Make sure you really have understood which piece of reality the data really stands for.
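
The second point can be sketched with simulated ratings. Everything here is an assumption for illustration: a hypothetical “fast delivery” driver with a true lift of 0.5 rating points, an observation flag that lifts ratings by 1.5 points on its own, and observed customers being more likely to have had fast delivery. Stratifying by the flag is used as a simple stand-in for including it as a variable in a model.

```python
import random
from statistics import mean

random.seed(3)  # fixed seed so the sketch is reproducible

TRUE_EFFECT = 0.5       # assumed true lift of fast delivery on the rating
OBSERVATION_LIFT = 1.5  # assumed lift caused by the observation itself

def simulate_customer():
    """Return (observed, fast_delivery, rating) for one hypothetical customer."""
    observed = random.random() < 0.5  # True = rating collected "with observation"
    # Assumption: observed customers are also more likely to get fast delivery.
    fast = random.random() < (0.8 if observed else 0.3)
    rating = 7 + TRUE_EFFECT * fast + OBSERVATION_LIFT * observed + random.gauss(0, 1)
    return observed, fast, rating

customers = [simulate_customer() for _ in range(20_000)]

def fast_delivery_gap(rows):
    """Mean rating with fast delivery minus mean rating without it."""
    fast = [rating for _, is_fast, rating in rows if is_fast]
    slow = [rating for _, is_fast, rating in rows if not is_fast]
    return mean(fast) - mean(slow)

# Ignoring how the data was generated:
naive_gap = fast_delivery_gap(customers)

# Including the context: compare only within the same observation setting,
# then average the two within-context gaps.
by_context = [[c for c in customers if c[0] == flag] for flag in (False, True)]
adjusted_gap = mean(fast_delivery_gap(rows) for rows in by_context)

print(f"Naive gap:    {naive_gap:.2f}")
print(f"Adjusted gap: {adjusted_gap:.2f}")
```

Under these assumptions the naive comparison credits fast delivery with much of the lift that actually comes from the observation, while the comparison within each context recovers a value close to the assumed 0.5.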

Be Aware of How You Infer Truth

True facts: global warming correlates strongly and negatively with the number of pirates. The number of people drowning by falling into pools correlates with Nicolas Cage’s appearances in movies. And shoe size correlates with career success.

We have all heard it: “Correlation is not causation”. The intuition to take correlation as causation is hard-wired in our brains. It is tough to resist this conclusion.

Correlation works great where just one or two things impact an outcome, AND the effect happens shortly after the action. Beyond these cases, correlation is largely misleading.

Beware the Cobra Effect

When marketers around the world need to hit their sales numbers for the month, they run price promotions. This is bound to work, as sales numbers react immediately.

Still, it causes more harm than good: The Cobra Effect.

A good share of the additional sales is simply sales that would have happened anyway, but later, and at a higher price.

The net sales effect is much lower, and the profit effect questionable as margin suffers.
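
A back-of-the-envelope sketch of this arithmetic follows. The function name, its parameters, and all numbers are hypothetical; it ignores competitor reactions and any other long-term effects.

```python
def promo_effect(base_units, uplift_units, pulled_forward_share,
                 regular_price, promo_price, unit_cost):
    """Return (truly incremental units, profit change versus running no promotion).

    The comparison covers the promotion month plus the later months from which
    the pulled-forward sales were borrowed.
    """
    regular_margin = regular_price - unit_cost
    promo_margin = promo_price - unit_cost

    # Part of the uplift is just later sales happening earlier.
    pulled_forward = uplift_units * pulled_forward_share
    incremental_units = uplift_units - pulled_forward

    # With promotion: every unit sold in the promo month earns the lower margin.
    profit_with_promo = (base_units + uplift_units) * promo_margin
    # Without promotion: base units now and pulled-forward units later, at full margin.
    profit_without_promo = (base_units + pulled_forward) * regular_margin

    return incremental_units, profit_with_promo - profit_without_promo

# Made-up example: 1,000 regular units, 400 extra units during the promotion,
# 60% of the extra units assumed to be pulled forward from later months.
incremental_units, profit_change = promo_effect(
    base_units=1000, uplift_units=400, pulled_forward_share=0.6,
    regular_price=10.0, promo_price=8.0, unit_cost=6.0,
)
print(f"Truly incremental units: {incremental_units:.0f}")
print(f"Profit change:           {profit_change:.0f}")
```

With these made-up numbers, the sales report shows 400 extra units, but only about 160 of them are truly incremental, and profit ends up roughly 2,160 lower than without the promotion.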

On top of this, the competition reacts to defend its market share and pushes its own price promotions. This harms your sales, lowers the overall market price level, and leads you to the next price promotion. It’s a vicious cycle.

In the 1800s, so the story goes, the British Empire wanted to reduce cobra bite deaths in India. They offered a financial incentive for every cobra skin brought to them to motivate cobra hunting. But instead of hunting, people began farming cobras. When the government realized the incentive wasn’t working, they removed it, so cobra farmers released their snakes, increasing the population.

WHY is the Cobra Effect so successful at fooling us?

We see the immediate effect, e.g. of a price promotion. The indirect and long-term effects are not that obvious because, in the long term, other factors influence the outcome as well. Also, the effect can spread out over time.

When people do not know a solution to an obvious problem, they take the obvious solution: price promotion.

Actionism is always a “good” strategy in complex environments. Nobody can accuse you of not doing something. Also, nobody can easily prove that you are wrong.

Managing a complex system takes complex analytics to understand it and self-organizing measures to address it.

Beware Assumptions

There is a joke among data scientists: “if you shoot past the deer once on the left and once on the right, on average, it is dead”. Believing that an average represents everyone well can be misleading.

It’s all around…

Once, we ran a marketing mix model for a pharma sales force to determine which marketing and sales actions drive prescriptions. Conventional (linear) modeling “found” that “giving product samples” to doctors drives prescriptions.

When we applied a more flexible machine learning methodology, it turned out that, beyond a certain point, more samples REDUCE prescriptions.

After the fact, this is clear. The doctors give the samples away. If they have too many, they hand out samples first instead of prescribing.
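
The pattern can be reproduced on simulated data. This is a sketch under stated assumptions, not the original project: the inverted-U response curve is invented, and simple per-level averages stand in for the “more flexible” machine learning method.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is reproducible

def expected_prescriptions(samples):
    # Assumed inverted-U response, peaking at 6 samples (illustrative only).
    return 10 + 6 * samples - 0.5 * samples ** 2

# 400 simulated doctors for each sample count from 0 to 8.
data = [(s, expected_prescriptions(s) + random.gauss(0, 1))
        for s in range(9) for _ in range(400)]

def ols_slope(points):
    """Slope of a straight-line least-squares fit of y on x."""
    x_bar = mean(x for x, _ in points)
    y_bar = mean(y for _, y in points)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    return sxy / sxx

# The linear model summarizes the whole relationship in one number.
linear_slope = ols_slope(data)

# Flexible view: average prescriptions at each sample count, no shape assumed.
mean_by_samples = {s: mean(y for x, y in data if x == s) for s in range(9)}
best_samples = max(mean_by_samples, key=mean_by_samples.get)

print(f"Linear slope: {linear_slope:.2f} prescriptions per extra sample")
for s, avg in mean_by_samples.items():
    print(f"{s} samples -> {avg:.1f} prescriptions on average")
```

The positive linear slope says “more samples, more prescriptions”. The per-level averages show that, under the assumed curve, this holds only up to the peak and turns negative afterwards.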

Summarizing always comes with assumptions. Those assumptions are in many (if not most) cases WRONG.

To demonstrate the effect, statistician Francis Anscombe put together four example data sets in the 1970s, known as Anscombe’s Quartet. The data sets have nearly identical means, variances, and correlations.

However, when graphed, it becomes clear that the data sets are totally different. Anscombe wanted to make clear that the shape of the data is as important as the summary metrics and cannot be ignored in the analysis.

It can be misleading to only look at the summary metrics of data sets. This applies to parametric statistical modeling as well: the parameters summarize a pre-assumed property (most often “a linear relationship”).
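
The quartet is small enough to check by hand. The sketch below uses the published values of Anscombe’s four data sets and computes the summary metrics with the Python standard library; plotting is left out to keep it short.

```python
from statistics import mean, variance

# Anscombe's Quartet: data sets I to III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# One row of summary metrics per data set.
summary = {
    name: {
        "mean_x": mean(xs), "var_x": variance(xs),
        "mean_y": mean(ys), "var_y": variance(ys),
        "corr": pearson(xs, ys),
    }
    for name, (xs, ys) in quartet.items()
}

for name, stats in summary.items():
    print(name, {k: round(v, 2) for k, v in stats.items()})
```

The four rows of summary metrics are practically indistinguishable. A scatter plot of each data set is what reveals a line with noise, a curve, a line with one outlier, and a vertical cluster with one extreme point.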

WHY is this strategy so successfully fooling us all? Our world is complicated enough. We have a desire to make it simple. Simple is beautiful to us. We believe what we want to believe: a simple, plausible explanation.

Confounder at work

When you take customers’ NPS ratings and correlate them with how those customers develop later (whether they churn or even buy more), you will be surprised again and again.

What we see is that the two often hardly correlate, for several reasons. One reason is the so-called Simpson’s Paradox.

When customer segments with a higher upsell potential also give more critical ratings, it will mess up your correlation.

In the 1970s, the University of California, Berkeley was accused of sexism because female applicants were less likely to be accepted than male ones. However, when they tried to identify the source of the problem, it was found that for individual subjects, the acceptance rates were generally better for women than men.

The paradox was caused by a difference in what subjects men and women were applying for. A greater proportion of female applicants applied to highly competitive subjects, where acceptance rates were much lower for both genders.

Simpson’s Paradox is a phenomenon in which a trend appears in different groups of data but disappears or reverses when the groups are combined.
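
A tiny worked example shows the reversal. The counts below are hypothetical and chosen only to make the effect visible; they are not the actual Berkeley figures.

```python
# Hypothetical counts per department: group -> (applicants, admitted).
admissions = {
    "Dept A (less competitive)":   {"men": (800, 500), "women": (100, 70)},
    "Dept B (highly competitive)": {"men": (200, 40),  "women": (900, 225)},
}

# Acceptance rate of each group within each department.
per_department = {
    dept: {group: admitted / applicants
           for group, (applicants, admitted) in groups.items()}
    for dept, groups in admissions.items()
}

def overall_rate(group):
    """Acceptance rate of a group with all departments pooled together."""
    applicants = sum(groups[group][0] for groups in admissions.values())
    admitted = sum(groups[group][1] for groups in admissions.values())
    return admitted / applicants

overall = {group: overall_rate(group) for group in ("men", "women")}

for dept, rates in per_department.items():
    print(f"{dept}: men {rates['men']:.1%}, women {rates['women']:.1%}")
print(f"Overall: men {overall['men']:.1%}, women {overall['women']:.1%}")
```

In each department women have the higher acceptance rate, yet pooled across departments men come out far ahead, simply because most women in this example applied to the department that rejects most applicants.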

It works because humans are hardwired to believe in correlations: “When something consistently happens along with something else (correlation), there must be a cause-effect relationship of some kind.”

Correlation leads us astray for several reasons. One reason is highlighted by Simpson’s Paradox: the influence of a confounding effect.

If something influences the cause (e.g. the NPS rating) and the effect (e.g. customer value, churn, upsell) at the same time, then correlation (as well as any modeling that excludes the confounder) can be wrong.

WHAT can we do about it?

In a business context, whenever possible, avoid jumping from correlation to a conclusion.

Instead, use methods designed to infer causality. They are known as “causal analysis” and, in their latest form, “causal machine learning”.

Where it is not possible to run proper analytics, at least make yourself aware of how fragile your learning is. Try to hypothesize other explanations for the correlation. Evaluate possible A/B testing options.
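
For the A/B testing option, a minimal evaluation sketch could look like this. It assumes customers were assigned to the two groups at random (which is what keeps confounders balanced) and uses a standard two-proportion z-test; the function name and the counts are made up for illustration.

```python
from math import erf, sqrt

def two_proportion_z_test(conversions_a, n_a, conversions_b, n_b):
    """Two-sided z-test for the difference between two conversion rates.

    Returns (z, p_value). Assumes random assignment to groups A and B
    and samples large enough for the normal approximation.
    """
    rate_a = conversions_a / n_a
    rate_b = conversions_b / n_b
    # Pooled rate under the null hypothesis that both groups behave the same.
    pooled = (conversions_a + conversions_b) / (n_a + n_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / standard_error
    # Standard normal CDF via the error function, then the two-sided tail.
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - normal_cdf)
    return z, p_value

# Made-up example: 200 of 2,000 control customers bought more (A),
# versus 260 of 2,000 customers who received the new treatment (B).
z, p_value = two_proportion_z_test(200, 2000, 260, 2000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

Because the assignment is random, a difference this unlikely to arise by chance can be attributed to the treatment rather than to some hidden third factor.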

It is always a warning sign when not only the correlation but also a non-correlation can be wrapped in a nice story.

My advice on “Indirect (Cobra) Effects”: Beware actionism. If you are not sure, doing something can be more harmful than “wait & see”. The latter is an established strategy in medicine and should be in business too.

There are well-established methods that can bring light into the darkness. If decision-makers are educated that it takes proper analytics to see the root causes of effects, then causal models will become standard practice.

My advice on “Beware Assumptions”: Take a look at raw data first. It will not answer your overall question but can quickly spotlight wrong assumptions you are making.

Practice humility. Humans tend to overestimate the validity of what they know, big time. Be aware that most stuff we know about business will turn out to be wrong (or oversimplified) in the future.

Machine learning is made to model input-output relations with the fewest assumptions. Causal machine learning has the framework and algorithms to get the insights you are looking for.

My advice on “Confounder at work”: The Berkeley example suggests that it is enough to split up the KPI comparison into a two-dimensional table. But this is deceptive.

It takes a good deal of luck to find the hidden confounder this way. Mostly, you don’t know what you don’t know.

There can be dozens of variables that may turn out to be confounders. That’s why causal machine learning is the way to go.

This article further elaborates on how you can spot causation

Truth-hacking – the art & science of the 21st century

Being a Truth-hacker can be hard. Don’t become one if you cannot handle uncertainty. Don’t become one if you do not have a passion for truth.

My passion is based on my conviction that it’s unethical and unfair not to aim for truth.

It’s unfair to your colleagues to have hidden agendas, it’s unfair to shareholders who invest hard-earned money, and it’s unfair to customers, who are the ones paying your paycheck.

Truth-hacking can be learned. It is a “simple” three-step process:

  1. Control How Your Data Is Sampled
  2. Understand What Your Data Really Mean
  3. Be Aware of How You Infer Truth

Think about how this would make the world a better place!

What if you don’t have to fool yourself anymore with ludicrous fact-based stories? Wouldn’t it feel better too?

What if middle management can’t trick company leaders and investors anymore with questionable fact-based explanations? Investments would find uses that thrive instead of making the shady rich.

What if politics can’t blind voters anymore with cherry-picked facts? Voters would elect politicians who truly drive prosperity.

If you are now passionate about truth-hacking too, please spread the word.

Share this article not just with your friends but with those you really want to adopt this art too.

Share this article and sign up to get my upcoming articles elaborating on these topics.

– Frank