Week 6 Preparation and Overview: Correlation vs. Causation

Learning Outcomes

Upon successful completion of this week, you will be able to:

Describe what a correlation is and how it is established.
Explain the meaning of “correlation does not imply causation.”
Critically evaluate correlational claims.
Explain the danger of confusing correlation with causation.
Identify extraneous or confounding variables.

Reading

Complete the following reading assignments before class:

Read Chapter 8 and Chapter 9 in Battersby (2016).

Key Points

PART I: What is a correlation and how is it established?

Per capita consumption of cheese (US) correlates with the Number of people who died by becoming tangled in their bedsheets.

	2000	2001	2002	2003	2004	2005	2006	2007	2008	2009
*Per capita consumption of cheese (US)* Pounds (USDA)	29.8	30.1	30.5	30.6	31.3	31.7	32.6	33.1	32.7	32.8
*Number of people who died by becoming* *tangled in their bedsheets* Deaths (US) (CDC)	327	456	509	497	596	573	661	741	809	717
Correlation: 0.947091

Adapted from http://www.tylervigen.com/(For the full introduction, go to Spurious Correlations: https://youtu.be/g-g0ovHjQxs)

As the above graph shows, per capita consumption of cheese (US) correlates with the number of people who died by becoming tangled in their bedsheets—that’s what the graphed data (Correlation: 0.947091) shows! But, would eating cheese really cause people to become tangled in their bedsheets and die? Of course not, since the data shows a correlation, not a causal link. One may argue, instead, that both increases in consumption of cheese (US) and incidents of death by bedsheet entanglement constitute a spurious or false relationship.

Likewise, getting an immunization vaccine against measles, mumps, and rubella would mean babies would get autism. But, as we shall see in greater detail later on in the course, this couldn’t be right either, since to say that there is a correlation between a MMR vacine and autism, is to confuse correlation with causation.

All this suggests that some have difficulty determining what a correlation is and how it is established. And, unfortunately, this, in many cases, leads some to reason incorrectly (i.e., commit a fallacy) that correlation by itself implies causation. Of course, what is missing here is the skill (and desire) to critically evaluate a correlational claim to avoid the confused notion that correlation (always) implies causation.

But, before we look at these two concerns, we need to find out what a correlation is. A (bivariate) correlation is an association, relationship, or a correspondence between two variables (or changing events, factors, or things).

For example, there is a widespread belief that fat consumption is linked to heart attacks (this is a correlational claim). In other words, much fat in your diet is a risk factor for getting a heart attack. But, do you really believe that fat consumption is linked to heart attacks? Could it be that there is no positive correlation, association, relationship, or a correspondence between these two changing things? Consider, for instance, the paradox of high-fat diets that are associated with reduced heart disease (this is a negative correlation). The well known “Mediterranean diet,” for example, is simply high-fat Greek food that is good for your health. Or, how about consumption of French cuisine that consists of much fat, but is associated with a relatively low rate of heart attack (Battersby, p. 99, adapted)?

What is crucial here to realize is that only by comparing rates of the effect in those who are in the target category (e.g., high-fat diets that are associated with heart disease) with those who are not (e.g., high-fat diets that are associated with reduced or no heart disease), can we know if being in a particular category is correlated with some possible effect (e.g., getting a heart attack). Accordingly, when searching for correlations, we “…need to compare two samples before making claims about the population generally” (Battersby, p. 105; p. 103, adapted).

Further examples from our textbook and the links below will help you fine-tune your understanding of what a correlation is and how it is established.

LINKS TO LEARN ABOUT CORRELATION

Dancing Statistics: Explaining the Statistical Concept of Correlation Through Dance (4:11)
Correlational Analysis - An Overview (7:18)
Research Design - Correlational Studies (2:32)
Correlations (5:21)
Correlations in Psychology Tutorial (6:16)
What Is Correlation? (6:56)
The Correlation Coefficient - Explained in Three Steps (6:53)
Interpreting the Correlation Coefficient (9:51)

PART II: What does “correlation does not imply causation” mean?

Big Data Analytics/April 13th, 2014 • data, market research (http://tomfishburne.com/2014/04/big-data-analytics.html?utm_medium=twitter&utm_source=twitterfeed)

The argument form below is fallacious.

1. X is correlated with Y.

------------------------------------

2. Thus, X caused Y.

So, for example, one may argue (as depicted by the cartoon above) that because an increase in shaved heads (X) is (positively) correlated with an increase in sales (Y), an increase in shaved heads (X) caused an increase in sales (Y). This is a fallacy.

A fallacy is an argument that has gone wrong, but can often be mistaken for a good argument. Unfortunately, there are many people that accept such a conclusion that something must be accompanied by something else (whether it’s actually true or not) based on informal observation. Such correlation usually serves as the basis for superstitious belief or magical thinking.

As we learned from the key points of WEEK #1, our common core understanding of critical thinking is about taking some argument apart using analysis and evaluating whether we should be convinced that some derived conclusion follows from the evidence. When we look critically at the argument form above, we can see that the derived conclusion (X caused Y) need not always follow from the premise (X is correlated with Y). Correlation does not require causation (i.e., empirically observed correlation is a necessary, but not sufficient condition for causation). This means that the certitude of the conclusion varies and is probabilistic at best. Accordingly, we have to be skeptical about reasoning like this all the time. If we’re not skeptical about this reasoning, we could then commit the fallacy of correlation equals causation (or, fallacy of false or questionable cause). A variation of this is called Post hoc ergo propter hoc (Latin: “after this, therefore because of this”). For our purposes, this simply is interpreted as the argument form:

1. X occurs before Y.

---------------------------------

2. Thus, X is the cause of Y.

LINKS TO LEARN THAT CORRELATION DOES NOT IMPLY CAUSATION

Junk Science Episode 10: Correlation/Causeation (2:26)
Correlation Does Not Imply Causation (3:36)
Correlational Studies--Correlation Does Not Imply Causation (5:47)
Mini Lecture: Correlational Research (11:22)
Notes/Tests - Correlation vs Causation (19:54)

PART III: How do we critically evaluate correlational claims?

Cross Validated (http://stats.stackexchange.com/questions/36/examples-for-teaching-correlation-does-not-mean-causation)

In general, to critically evaluate correlational claims we must begin by determining whether correlation plays a role in the type of study or research under consideration—Descriptive, Correlational, or Experimental. In a descriptive study, correlation plays no part (so correlational claims cannot be evaluated by default); in a correlational study, correlation plays a principal role (so correlation must be critically evaluated); and, in an experimental study, correlation plays an underlying role (so correlation must be critically evaluated as a necessary, but not sufficient condition for causation—we will discuss this in greater detail in WEEK #9 of the course). Accordingly, for an experimental study we must ask what other criteria are also required for establishing a causal claim (we will discuss this in greater detail in WEEK #10 of the course).

The next step to critically evaluate correlational claims is asking our four basic CRITICAL QUESTIONS applied to correlation (p. 118):

What does the claim of correlation mean? Which two variables, changing events, factors, or things co-vary? Do they exhibit a positive or negative relation?
How good is the evidence? Are two relevant groups being compared? Is the difference between the groups large enough (i.e., outside the margin of error of both samples) so that it is unlikely that these differences are the result of chance sampling variation? Were the groups being compared appropriately selected?
What other information is relevant? What is the context? Have other researchers found similar correlations? Of similar strength? Did other researchers use different types of samples and groups?
Are relevant fallacies avoided? For example, consider the fallacies of No comparison, Biased Sampling, Small Sample, Unclear Target Population, and of Significance.

These fallacies are clearly described in our textbook. Since most have been already covered in the previous chapters of our textbook, corresponding online links, and in the Keynotes, we need only introduce the new fallacy of Significance. The error of reasoning here for this fallacy is to argue that the difference between two (sample) groups, in a strict statistical or scientific sense, is important—relying on the common usage of the word “significant.” In contrast, the “[d]ifferences are said to be ‘statistically significant’ when…we can theoretically be 95% confident that the differences are not due to chance” (according to what we learned about statistical reasoning in Chapter 3 of our textbook; p. 105, emphasis added). This, therefore, merely provides a probabilistic judgement about a result that is basically not significant or important in any ordinary sense. As Mark Battersby notes, “[a] ‘statistically significant difference’ between two groups means that it’s very likely that there’s a correlation; but this says nothing about the strength of the correlation or about whether the correlation is of any human, scientific, or personal significance” (pp. 114-115, emphasis added).

The following links about research or study methods and correlation will help you connect the dots with everything you have learned so far.

LINKS TO LEARN ABOUT RESEARCH OR STUDY METHODS AND CORRELATION

Research and Correlation (18:52)
Research Methods - Experiments & Correlations (9:09)
Selecting a Research Design (9:55)
Comparing Descriptive, Correlational, and Experimental Studies (10:44)
Correlation Does Not Imply Causation (20:16)

PART III: What is the danger of confusing correlation with causation?

F11 PSY1001 Sections 14 & 15 - A blog for discussion sections 14 & 15 of Introduction to Psychology (PSY 1001) in Fall 2011 (http://blog.lib.umn.edu/farre212/f11psy1001ds1415/2011/12/)

One way we can think about how events or actions are connected is by appealing to the notion of causation. Causation is the act or process of causing something to happen. So, it is concerned with causes and effects. A cause is the reason why something happens—it is an event or action that directly makes something happen; an effect is what happens because of the cause—it is what happens as a result of the cause.

The danger of confusing correlation with causation is that this can lead to bad assumptions, false (or bizarre) conclusions, and/or ignoring other possible factors (e.g., lurking, extraneous, spurious, or confounding variables—more about this in PART II below). Moreover, when we sidestep how (i.e., analysis) and why (i.e., evaluation) events or actions are really connected, we settle on mere informal observation. As a result, we run the danger of believing that something must be accompanied by something else—whether the association or pattern is actually real or not.

As we learned in Science 210, the belief that an action or an event can have an effect on something even though there is no causal relation between the two is called a superstition. Accordingly, a superstition is a false belief based on a misguided identification of causation. At best, we can say that the two events are associated or correlated. In the extreme, such correlation usually serves as the basis for supernatural, paranormal, or pseudoscientific beliefs. (See 'Science' of the Gaps, The Threat of Pseudoscience, and The Pattern Behind Self-Deception for more information.)

Besides making bad connections (mistaking correlation for causation), let’s consider some other reasons why these pseudoscientific, supernatural, or paranormal beliefs persist (Adapted from Top 5 Ways We Suck):

(2) We don't know how dumb we are (the Dunning-Kruger Effect)— In general, uninformed people overestimate their own knowledge of the facts, don’t recognize actual knowledge or expertise in other people, and don’t realize how ignorant they are. (For more information, watch Dunning-Kruger Effect and Why People Don't Realize They're Incompetent.)

(3) We are incompetent (the Peter Principle) — This is the principle that in hierarchies (particularly in a business setting), people tend to rise to their level of incompetence when they are promoted. Applied to an epistemological hierarchy (where what is at issue is who really knows something), people tend to rise to “their level of ignorance” as they self-promote themselves as competent or knowledgeable in a field outside their limited knowledge base or belief system (e.g., an arm chair philosopher or a self-declared expert in science)—revealing their level of ignorance or incompetence because a “knowledge” base (which just might simply be a hardened belief) in one area does not secure a level of knowledge or competence in another. (For more information, watch Peter Principle, Trading Psychology - The Peter Principle in Trading, and The Myth of Upward Promotion.)

(4) We don't even realize we’re wrong (confirmation bias) —This is the tendency of people to prefer and accept information confirming an already-held belief, but ignore information showing the belief is false. (For more information, watch Confirmation Bias, What is Confirmation Bias?, Pitfalls of Thinking: Confirmation Bias (1/2), and Cognitive Biases and the Authority of Science.)

(5) We lie to ourselves (cognitive dissonance) —We try to make our inconsistent beliefs or anecdotal stories fit with competing reality, although uncomfortably in conflict with our hardened belief system, dogmatic faith, or motivated reasoning. (For more information, watch Cognitive Dissonance, Cognitive Dissonance by Bo. Bennett, TedxCanberra - Ash Donaldson - Cognitive Dissonance, and Telling Stories.)

In contrast, causation is important in empirical science because, when successfully attributed, the experimenter can establish beliefs that are true (i.e., the experimenter can establish that claims correspond with reality) and are justified (i.e., the experimenter can provide good arguments for believing each claim or deduced implication). So, a goal of science is to figure out which patterns are real—one way of achieving this is to figure out which correlations are really causations.

But the process of successfully attributing causation is itself not without problems, for the causal chain (or pathway) of events is often not that clear. For instance, can we be sure that A causes B, or is it, in fact, the other way around that B causes A? Or, is there a common factor, C that can cause one or the other or both? Or, could it be the case that the causal chain (or pathway) of events loops so that both A causes B and B causes A?

LINKS TO LEARN ABOUT CAUSATION

Correlation vs. Causality: Freakonomics Movie (1:29)
Correlation vs. Causation (1:25)
The Danger of Mixing Up Causality and Correlation (5:57)
Correlation Does NOT Imply Causation (3:17)
Correlation vs. Cause and Effect (9:59)
Correlation vs. Causation (2:13)
International Relations 101 (#34): Correlation versus Causation (9:03)

PART IV: What exactly are Extraneous or Confounding Variables?

By Pass Publishing's Difficult Topics Explained (Posted on YouTube as Research Methods: Extraneous and Confounding Variables)

As we have learned, a correlation is a mutual relationship that is thought to exist between two events or types of events.

Furthermore, we also learned that in correlational studies, correlation must be critically evaluated; for although correlation plays a principal role, it does not necessarily imply causation. In contrast, we learned that in experimental studies, correlation plays an underlying role. Correlation must be critically evaluated as a necessary, but not sufficient, condition for causation.

Causation, as the capacity of one variable to directly influence another, is important in experimental studies because it is the bridge that links the independent and dependent variables of the hypothesis (If A, then B), enabling the experimenter to transcend mere correlation.

But, can we be sure that variable A causes variable B, or is it, in fact, the other way around that variable B causes variable A? Or, is there a hidden third, extraneous, or confounding factor C that can cause one or the other or both? This hidden or lurking third factor (another variable at play/an alternative explanation) is called the extraneous, spurious, or confounding variable.

It is important to control the extraneous or confounding variable because if we don’t, we will not be able to establish cause and effect. However, because it would be impossible to find everything that could influence one or the other or both variables, a causal relationship cannot be conclusively proven—although a very likely causal relationship between the variables can be proven by conducting an experiment.

LINKS TO LEARN MORE ABOUT EXTRANEOUS OR CONFOUNDING VARIABLES

Independent, Dependent and Confounding Variables in Quantitative Research (7:47)
Causation Structure - Intro to Statistics (1:11)
Confounding (10:27)
Correlation, Causation, and Lurking Variables (10:33)
AP Stat 4.3 (9:34)

References

Bassham, G., Irwin, W., Nardone, H., & Wallace, J. M. (2008). Critical thinking: A student’s introduction (3rd ed.). New York, NY: McGraw-Hill.

Battersby, M. (2010). Is that a fact? A field guide to statistical and scientific information. Ontario, CA: Broadview Press.

Bronowski, J. (1978). The common sense of science. Cambridge, MA: Harvard University Press.

Caste, N. J., & Kiersky, J. H. (1995). Thinking critically: Techniques for logical reasoning. St. Paul, MN: West Publishing Co.

Epstein, R. L., & Kernberger, C. (2006). The pocket guide to critical thinking. Belmont, CA: Wadsworth.

Huxley, T. H. (1880). The crayfish: An introduction to the study of zoology. New York, NY: D. Appleton.

Lee, J. A. (2000). The scientific endeavor: A primer on scientific principles and practice. San Francisco, CA: Addison Wesley Longman.

Mayfield, M. (2001). Thinking for yourself: Developing critical thinking skills through reading and writing. Belmont, CA: Thomson Learning, Inc. (Cengage Learning).

Nolt, J., & Rohatyn, D. (1988). Schaum’s outline of theory and problems of logic. New York, NY: McGraw-Hill, Inc.

Popper, K. R. (1963). Conjectures and refutations: The growth of scientific knowledge. London, UK: Routledge.

Rutherford, F. J., & Ahlgren, A. (1990). Science for all Americans. New York, NY: Oxford University Press.

Schick Jr., T., & Vaughn, L. (2008). How to think about weird things: Critical thinking for a new age (5th ed.). New York, NY: McGraw-Hill.

Talavera, I. (2011). Beyond ethical codes: A call for critical thinking in religious culture. The Forum on Public Policy: A Journal of the Oxford Round Table, (2011)3, 1-14.

Talavera, I. (2012). Science and religion: Drawing the line. The Forum on Public Policy: A Journal of the Oxford Round Table, (2012)2, 1-25.

Tarski, A. (1994). Introduction to logic and to the methodology of deductive sciences (4th ed.). New York, NY: Oxford University Press.

Weston, A. (2009). A rulebook for arguments. Indianapolis, IN: Hackett Publishing Co., Inc.

Blog