Liam Moroney 4 min

Causation vs. Correlation


Data can be very easy to misinterpret if you don't know its limitations. In this episode, Liam talks about one of the most common ways to misinterpret data: confusing correlation with causation.



0:00

In marketing, we love to talk about being data-driven, or at least, in recent times, data-informed.

0:07

But one way or the other, we love to see ourselves as marketers who use data to make better decisions.

0:13

One thing that's very important to go along with that, though, is being data-literate.

0:18

That can sound like a slightly insulting phrase, but the truth is that data can be very easy to misinterpret if you don't know where its limitations are.

0:28

A really big example of that is the difference between correlation and causation.

0:34

If you look at marketing data, confusing the two is a much bigger problem than we often think.

0:39

Now, the simplest example of correlation being mixed up with causation is this: if you were to plot a graph of all the people who eat ice cream and all the people who wear shorts, you would find a very strong correlation between the two.

0:52

Does that mean that wearing shorts makes people eat ice cream? No. Nor does eating ice cream make people wear shorts.

1:02

What it means is that the people who are eating ice cream and wearing shorts are usually doing both because it's hot out. So the two are correlated, but neither one causes the other; the hot weather drives both.

1:10

And this comes up in marketing programs quite a lot.

1:14

A very common example is in paid search programs. If you look at unbranded keywords versus branded keywords, you will find that the branded keywords get much higher click-through rates, a much lower cost per lead, and a much higher overall ROI.

1:31

But of course they do, because that performance is correlation, not causation.

1:34

The people who clicked on branded keywords were already looking for you. The people who clicked on unbranded keywords were looking for something you might do, but not necessarily for you.

1:43

So of course the branded keywords were going to perform better. Those searchers are a cohort that is more likely to convert to begin with. It's not because the branded keywords were the better campaign.

1:52

But it could very easily lead you to say, "Let's put more money into branded and less into unbranded."

1:59

What you might miss in all of this is whether the people who clicked on branded keywords would really have been unlikely to buy from you if you hadn't put that ad in front of them. A lot of the data would suggest they were going to buy anyway. You simply paid for visits that were going to happen regardless.

2:18

Another example of this that doesn't get talked about as often is retargeting.

2:22

When you look at a retargeting list versus a cold audience list, you see the same thing: much higher click-through rates, much lower cost per lead, much higher ROI.

2:32

So it sometimes leads people to abandon cold audiences entirely and focus only on the retargeting list.

2:39

But again, this is correlation. The people in a retargeting list happen to be people who are already familiar with you, whereas the people in a cold audience list are not.

2:49

So if you abandon the cold audience list, you may not be bringing in any new people who aren't yet familiar with you. You are simply focusing on the audience that already is.

2:58

In the long term, you can end up generating fewer deals by focusing on that group's conversion rate.

3:05

Another version of this mistake is trying to move a cold audience into a retargeting list as quickly as possible, assuming the same conversion rate will hold. It generally doesn't, because that higher conversion rate came from people already being familiar with you, not from simply being on the list.

3:17

And this is the difference between correlation and causation.

3:20

What really matters is that if you accidentally look at correlation and assume it is causation, it can lead you to think you are generating better and better outcomes.

3:31

But at the end of the day, if you're betting only on things like branded keywords and retargeting lists, then you're not actually bringing in new, unfamiliar future buyers. Down the line, you could end up with less revenue, even though you went with the highest-ROI options available.

3:48

That's the difference between correlation and causation.

3:52

And that's the importance of data literacy.

3:54

[MUSIC]
