How Do You Know? Rethinking the Way Psychologists Analyze Data

July 27, 2017

When Richard Morey was a doctoral student in cognitive psychology, he had something of a crisis of faith in his chosen profession. He had tried to replicate a famous experiment, but was frustrated that he couldn’t get the same findings, and then shocked to discover that other researchers had been having the same problem for years. He quickly learned that this replication problem was plaguing psychology. Too often, findings from a single study become the definitive conclusion on a topic, because when other researchers find no similar effects, it is extremely difficult to get a journal to publish negative or “null” findings or print a correction. Concerned about the reliability of studies he had been reading, Morey could have given up on research, but instead he found a way to help correct the problem by pursuing a specialty he hadn’t expected: statistics and research methodology. Today, Morey is an advocate for improving the quality of psychology research, and he teaches students and faculty around the world how to use reliable methods and avoid the pitfalls of previous research.

When Morey began to look into the replicability problem, he discovered that “when you look at some parts of the literature, you realize that the methods are shakier” than it seems. A big part of the problem is in the way researchers often use statistics. Traditionally, social scientists have relied on a procedure called null hypothesis significance testing. They start their statistical testing with the assumption that any apparent relationship between the variables being studied occurs by chance. Then they look at the patterns in their data to determine the probability of obtaining data at least as extreme as what they have if that assumption were correct. The result is a statistic called a p-value; the smaller the p-value, the lower the probability they would have obtained their data if there were no real relationship between the variables. Psychologists then compare the p-value against a conventional cut-off (usually .05) to decide whether a finding is “significant.”
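To make the procedure concrete, here is a minimal sketch, not taken from the article or from Morey’s own materials, of a null hypothesis significance test: an exact two-sided binomial test of whether a coin is fair.

```python
from math import comb

def binom_two_sided_p(k, n, p0=0.5):
    """Two-sided binomial test: the probability, assuming the null
    hypothesis (true success rate p0), of a count at least as far
    from the expected value as the k actually observed."""
    expected = n * p0
    observed_dev = abs(k - expected)
    return sum(
        comb(n, i) * p0**i * (1 - p0)**(n - i)  # P(X = i) under the null
        for i in range(n + 1)
        if abs(i - expected) >= observed_dev    # outcomes at least as extreme
    )

# 60 heads in 100 flips of a supposedly fair coin: the p-value lands
# just above the conventional .05 cut-off
print(binom_two_sided_p(60, 100))
```

A small p-value here would not mean the coin is probably biased; it would only mean that data this extreme are unlikely if the coin is fair, which is exactly the kind of inference the article describes.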

The method is fairly straightforward and widely used, but it has real problems. One is that, for any real effect, however small, p-values shrink as sample sizes grow, so researchers with large datasets can cross the conventional cut-off and declare a finding important even when the underlying effect is trivially small. There is also a tendency to go “fishing” in the data until one finds a relationship with a “significant” p-value, even one that is unexpected and possibly untrue. And once that finding is published, the misconception rarely gets corrected, because it is so difficult to publish negative findings. That can lead to bad science – and ultimately to a lack of confidence in research.
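The sample-size problem is easy to demonstrate. In this hypothetical sketch (a standard normal-approximation z-test, not anything from the article), a coin that comes up heads 52% of the time, a trivially small deviation from fairness, sails past the conventional cut-off once the sample is large enough:

```python
from math import sqrt, erfc

def z_test_p(k, n, p0=0.5):
    """Two-sided p-value from the normal approximation to the binomial:
    how surprising are k successes in n trials if the true rate is p0?"""
    z = (k / n - p0) / sqrt(p0 * (1 - p0) / n)
    return erfc(abs(z) / sqrt(2))  # two-sided tail probability

# The same 52%-heads result at ever larger sample sizes
for n in (100, 1_000, 100_000):
    print(n, z_test_p(int(0.52 * n), n))
# the p-value falls from well above .05 to vanishingly small,
# even though the effect (2 percentage points) never changes
```

The effect size is identical in every row; only the sample size changes, yet the verdict flips from “not significant” to overwhelmingly “significant.”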

Morey wanted a different way to approach the problem, and he found one in Bayesian statistics. Bayesian inference uses the observed data to update the probability of a hypothesis. Unlike null hypothesis testing, which judges the data against a single chance assumption, it compares how well the data are predicted by multiple competing hypotheses. It doesn’t rely on a p-value cut-off, and it isn’t as sensitive to sample size. Morey explains that the two approaches rely on different types of inference, and while he stops short of saying that the Bayesian way is more reliable, he is confident that it can help increase the replicability of psychological studies because it allows researchers to be clearer about their goals and assumptions.
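One common Bayesian tool is the Bayes factor, which weighs how well the data are predicted by each of two hypotheses. The sketch below is a hypothetical illustration, not Morey’s own method: it compares a “fair coin” hypothesis against an “unknown bias” hypothesis with a uniform prior, for 60 heads in 100 flips.

```python
from math import comb

def bayes_factor_01(k, n):
    """Bayes factor for H0 (fair coin, rate exactly 0.5) against
    H1 (unknown rate with a uniform prior over [0, 1]).  Under the
    uniform prior, every count k out of n is equally likely, so the
    marginal likelihood of the data under H1 is 1 / (n + 1)."""
    likelihood_h0 = comb(n, k) * 0.5**n  # P(data | fair coin)
    likelihood_h1 = 1 / (n + 1)          # P(data | unknown bias)
    return likelihood_h0 / likelihood_h1

# 60 heads in 100 flips: the Bayes factor comes out close to 1,
# meaning the data barely favor either hypothesis over the other
print(bayes_factor_01(60, 100))
```

Note the contrast with significance testing: the Bayes factor states directly how strongly the data favor one hypothesis over another, rather than how surprising the data would be under a single chance assumption.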

“I see my role as trying to inform people about the tension between these different statistical approaches. I try to educate them that the two types of inference are irreconcilable – you have to make a choice.” He conveys that message through the many workshops he conducts at universities and institutions, which often lead to requests for instruction in Bayesian statistics, because many universities still focus on null hypothesis significance testing. He is encouraged by the interest he sees among young psychologists in Bayesian methods, many of whom, like him, never expected to specialize in statistics, and he hopes his teaching will contribute to researchers making more informed decisions about methods and improving the quality of science. He is passing on his belief that discussions about methods are at the heart of psychology. After all, he says, “statistics are so closely related to the very thing all scientists care about: how do you know something?”

Richard Morey is a recipient of the Federation of Associations in Behavioral & Brain Sciences (FABBS) Foundation Early Career Impact Award, to be presented during the annual meeting of the Psychonomic Society in Vancouver, British Columbia.