For accuracy, brain studies of complex behavior require thousands of people

Findings will encourage more data sharing, collaboration among researchers

by Tamara Bhandari • March 16, 2022



As brain scans have become more detailed and informative in recent decades, neuroimaging has seemed to promise a way for doctors and scientists to “see” what’s going wrong inside the brains of people with mental illnesses or neurological conditions. Such imaging has revealed correlations between brain anatomy or function and illness, suggesting potential new ways to diagnose and treat psychiatric, psychological and neurological conditions. But the promise has yet to turn into reality, and a new study explains why: The results of most studies are unreliable because they involved too few participants.

Scientists rely on brainwide association studies to measure brain structure and function — using MRI brain scans — and link them to complex characteristics such as personality, behavior, cognition, neurological conditions, and mental illness. But a study by researchers at Washington University School of Medicine in St. Louis and the University of Minnesota, published March 16 in Nature, shows that most published brainwide association studies are performed with too few participants to yield reliable findings.

Using publicly available datasets involving a total of nearly 50,000 participants, the researchers analyzed a range of sample sizes and found that brainwide association studies need thousands of individuals to produce reproducible results. Typical brainwide association studies enroll just a couple dozen people.

Such so-called underpowered studies are susceptible to uncovering strong but spurious associations by chance while missing real but weaker associations. Routinely underpowered brainwide association studies result in a glut of astonishingly strong yet irreproducible findings that slow progress toward understanding how the brain works, the researchers said.

“Our findings reflect a systemic, structural problem with studies that are designed to find correlations between two complex things, such as the brain and behavior,” said senior author Nico Dosenbach, MD, PhD, an associate professor of neurology at Washington University. “It’s not a problem with any individual researcher or study. It’s not even unique to neuroimaging. The field of genomics discovered a similar problem about a decade ago with genomic data and took steps to address it. The NIH (National Institutes of Health) began funding larger data-collection efforts and mandating that data must be shared publicly, which reduces bias, and as a result, genome science has gotten much better. Sometimes you just have to change the research paradigm. Genomics has shown us the way.”

First author Scott Marek, PhD, an instructor in psychiatry at Washington University, and co-first author Brenden Tervo-Clemmens, PhD, a postdoctoral researcher at Massachusetts General Hospital/Harvard Medical School, realized something was wrong with how brainwide association studies typically are conducted when they could not replicate the results of their own study.

“We were interested in finding out how cognitive ability is represented in the brain,” Marek said. “We ran our analysis on a sample of 1,000 kids and found a significant correlation and were like, ‘Great!’ But then we thought, ‘Can we reproduce this in another thousand kids?’ And it turned out we couldn’t. It just blew me away because a sample of a thousand should have been plenty big enough. We were scratching our heads, wondering what was going on.”

To identify problems with brainwide association studies, the research team (including Dosenbach, Marek, Tervo-Clemmens, co-senior author Damien A. Fair, PhD, director of the Masonic Institute for the Developing Brain at the University of Minnesota, and others) began by accessing the three largest neuroimaging datasets: the Adolescent Brain Cognitive Development Study (11,874 participants), the Human Connectome Project (1,200 participants) and the UK Biobank (35,375 participants). Then they analyzed the datasets for correlations between brain features and a range of demographic, cognitive, mental health and behavioral measures, using subsets of various sizes. Using separate subsets, they attempted to replicate any identified correlations. In total, they ran billions of analyses, supported by the powerful computing resources of Fair’s Masonic Institute for the Developing Brain.

The researchers found that brain-behavior correlations identified using a sample size of 25 — the median sample size in published papers — usually failed to replicate in a separate sample. As the sample size grew into the thousands, correlations became more likely to be reproduced.
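The link between sample size and replication can be illustrated with a small simulation. This is a sketch under stated assumptions, not the researchers' analysis: it uses synthetic data with an assumed true correlation of 0.1 (chosen so the simulation runs quickly), an approximate significance test based on the Fisher z-transform, and hypothetical function names. A finding counts as replicated if a second, independent sample of the same size also yields a significant correlation of the same sign.

```python
import math
import random


def draw_sample(rng, n, rho):
    """Draw n (x, y) pairs whose true correlation is rho (synthetic data)."""
    xs, ys = [], []
    noise_scale = math.sqrt(1.0 - rho * rho)
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(rho * x + noise_scale * rng.gauss(0.0, 1.0))
    return xs, ys


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def is_significant(r, n, z_crit=1.96):
    """Approximate two-sided test at the 0.05 level via the Fisher z-transform."""
    return abs(math.atanh(r)) * math.sqrt(n - 3) > z_crit


def replication_rate(n, rho=0.1, trials=500, seed=0):
    """Fraction of significant 'discoveries' that replicate in a fresh sample.

    rho=0.1 is an assumed true effect, not a value taken from the study.
    Returns None if no discovery sample reached significance.
    """
    rng = random.Random(seed)
    discoveries = 0
    replicated = 0
    for _ in range(trials):
        r_first = pearson_r(*draw_sample(rng, n, rho))
        if not is_significant(r_first, n):
            continue
        discoveries += 1
        r_second = pearson_r(*draw_sample(rng, n, rho))
        if is_significant(r_second, n) and (r_first > 0) == (r_second > 0):
            replicated += 1
    return replicated / discoveries if discoveries else None


if __name__ == "__main__":
    print("n = 25:  ", replication_rate(25, trials=2000))
    print("n = 2000:", replication_rate(2000, trials=200))
```

Under these assumptions, most significant results found with 25 participants do not reappear in a second sample of 25, while nearly all of those found with 2,000 participants do.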

Further, the estimated strength of the correlation, a measure known as the effect size, tended to be largest for the smallest samples. Effect sizes are scaled from 0 to 1, with 0 being no correlation and 1 being perfect correlation. An effect size of 0.2 is considered quite strong. As sample sizes increased and correlations became more reproducible, the effect sizes decreased. The median reproducible effect size was 0.01. Yet published papers on brainwide association studies routinely report effect sizes of 0.2 or more.
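One way to see why small samples go hand in hand with large reported effect sizes is to ask how big a correlation must be before it counts as statistically significant at a given sample size. The sketch below is an illustration, not the study's method: it assumes a two-sided test at the 0.05 level and uses the Fisher z-transform approximation, and the function name is hypothetical.

```python
import math


def min_significant_r(n, z_crit=1.96):
    """Smallest |r| that passes an approximate two-sided 0.05 test with n participants.

    Uses the Fisher z-transform approximation: atanh(r) * sqrt(n - 3) > z_crit.
    """
    return math.tanh(z_crit / math.sqrt(n - 3))


if __name__ == "__main__":
    for n in (25, 100, 1000, 4000):
        print(f"n = {n:>4}: |r| must exceed {min_significant_r(n):.3f}")
```

With 25 participants the threshold is about 0.4, so any correlation that clears it is at least that large, however small the true effect is. With thousands of participants, correlations well below 0.1 can be detected, so the reported values can sit much closer to the truth.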

In retrospect, it should have been obvious that the reported effect sizes were too high, Marek said.

“You can find effect sizes of 0.8 in the literature, but nothing in nature has an effect size of 0.8,” Marek said. “The correlation between height and weight is 0.4. The correlation between altitude and daily temperature is 0.3. Those are strong, obvious, easily measured correlations, and they’re nowhere near 0.8. So why did we ever think that the correlation between two very complex things, like brain function and depression, would be 0.8? That doesn’t pass the sniff test.”

Neuroimaging studies are expensive and time-consuming. An hour on an MRI machine can cost $1,000. No individual investigator has the time or money to scan thousands of participants for each study. But if all of the data from multiple small studies were pooled and analyzed together, including statistically insignificant results and minuscule effect sizes, the result probably would approximate the correct answer, Dosenbach said.
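The pooling argument can be sketched with synthetic data. The example below is a simplified illustration under stated assumptions, not an analysis from the paper: it simulates 200 hypothetical studies of 25 participants each with an assumed true correlation of 0.1, then compares two summaries. One pools every study, including the nonsignificant ones, by averaging Fisher z-transformed correlations. The other averages the absolute effect size of only the studies that reached significance.

```python
import math
import random


def simulate_study_r(rng, n, rho):
    """Simulate one study of n participants and return its sample correlation."""
    noise_scale = math.sqrt(1.0 - rho * rho)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [rho * x + noise_scale * rng.gauss(0.0, 1.0) for x in xs]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def pooled_and_significant_only(n_studies=200, n=25, rho=0.1, seed=0):
    """Return (pooled estimate from all studies, mean |r| of significant studies).

    rho=0.1 is an assumed true effect. The second value is None if no study
    reached significance.
    """
    rng = random.Random(seed)
    rs = [simulate_study_r(rng, n, rho) for _ in range(n_studies)]

    # All studies have the same size, so an unweighted mean of Fisher z values
    # is equivalent to the usual sample-size-weighted pooling.
    pooled = math.tanh(sum(math.atanh(r) for r in rs) / len(rs))

    # Approximate two-sided 0.05 significance threshold for a single study.
    r_threshold = math.tanh(1.96 / math.sqrt(n - 3))
    significant = [abs(r) for r in rs if abs(r) > r_threshold]
    significant_mean = sum(significant) / len(significant) if significant else None
    return pooled, significant_mean


if __name__ == "__main__":
    pooled, significant_mean = pooled_and_significant_only()
    print("pooled estimate from all studies:", round(pooled, 3))
    print("mean |r| among significant studies only:", significant_mean)
```

In this setup the pooled estimate lands near the assumed true value, while the significant-only average is several times larger, because a single study of 25 people can reach significance only with a correlation of roughly 0.4 or more.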

“The future of the field is now bright and rests in open science, data sharing and resource sharing across institutions in order to make large datasets available to any scientist who wants to use them,” Fair said. “This very paper is an amazing example of that.”

Dosenbach, also an associate professor of biomedical engineering, of occupational therapy, of pediatrics and of radiology, added: “There’s a lot of promise to this kind of work in terms of finding solutions for mental illnesses and just understanding how the mind works. The great news is that we’ve identified a main reason why brain imaging has yet to deliver on its promise to revolutionize mental health care. The work represents a major turning point for linking brain activity and behavior by clearly defining not just the prior roadblocks but also the promising new paths forward.”
