Measurement in Behavioral Sciences: Moving Beyond the “Sum-and-Alpha” Approach

A recent article urging behavioral scientists to re-evaluate measurement standards suggests that many studies may be drawing conclusions their data do not support. According to Dr. Daniel McNeish's "Limitations of the Sum-and-Alpha Approach to Measurement in Behavioral Research," published in Policy Insights from the Behavioral and Brain Sciences, only 2 to 26 percent of empirical articles report thoroughly testing whether their measurements accurately capture the constructs of interest, a trend that has not improved in three decades despite its implications for the policymakers and members of the public who use these findings to inform decisions.

The behavioral scientist aims to concretize invisible and abstract patterns of human experience into something measurable: something that can be assessed, compared, and analyzed across groups of people. Unlike quantities in the physical sciences, many of the constructs behavioral scientists study lack an inherent structure, so scientists must create tools for measuring them. To be useful, these tools must be reliable and valid. A measurement is reliable if it performs as intended, with limited error, in a way that is consistent across time. Validity refers to whether the measurement's components (a scale's individual items, e.g., "Do you enjoy public speaking?") accurately assess the target construct (e.g., social anxiety).

As McNeish points out, assumptions that a measure is valid and reliable are often made without being thoroughly tested. Methods like factor analysis can reveal how strongly each individual item within a measure maps onto a construct. For example, "I like being alone" and "I feel anxious around people" may not be equally relevant to the construct of social anxiety, though many studies use methods that assume they are. Among such methods are "sum-and-alpha" approaches. "Sum" refers to a single score, created by summing (or averaging) a scale's items, that reflects an individual's standing on a construct (e.g., a social anxiety score), while "alpha" refers to Cronbach's alpha, a common index of internal reliability. Many researchers rely on alpha values as evidence that a measure is sound, though alpha itself rests on assumptions, such as every item relating equally to the construct, that are rarely tested.
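To make the mechanics concrete, here is a minimal sketch in Python of how the two ingredients are typically computed. The four-item scale and the responses are hypothetical, invented purely for illustration:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) response array."""
    item_variances = items.var(axis=0, ddof=1)       # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)   # variance of the sum score
    k = items.shape[1]
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical responses to a 4-item social anxiety scale (1-5 Likert).
responses = np.array([
    [4, 5, 4, 5],
    [2, 1, 2, 1],
    [3, 3, 4, 3],
    [5, 4, 5, 4],
    [1, 2, 1, 2],
])

sum_scores = responses.sum(axis=1)   # the "sum": one score per respondent
alpha = cronbach_alpha(responses)    # the "alpha": internal consistency
print(sum_scores, round(alpha, 2))
```

Note that the sum score weights every item identically; that equal-weighting assumption is exactly what factor analysis can put to the test.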

McNeish argues that popular "sum-and-alpha" approaches are sometimes insufficient to support the conclusions researchers then report. These approaches do not account for the possibility that certain items are more relevant to the construct of interest, nor do they test whether a measure is capturing multiple constructs simultaneously. Without more rigorous testing, it is possible that findings are driven by random noise rather than real-world trends.
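A hedged sketch of what such a check might look like, using scikit-learn's FactorAnalysis on hypothetical data (the items and loadings are illustrative, not drawn from McNeish's paper): items with weak loadings contribute mostly noise to a sum score, signaling that the items are not interchangeable.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Hypothetical responses: the first three items target social anxiety,
# while the fourth ("I like being alone") may tap something else.
responses = np.array([
    [4, 5, 4, 2],
    [2, 1, 2, 5],
    [3, 3, 4, 3],
    [5, 4, 5, 1],
    [1, 2, 1, 4],
    [4, 4, 5, 2],
])

fa = FactorAnalysis(n_components=1, random_state=0).fit(responses)
for item, loading in enumerate(fa.components_[0]):
    print(f"item {item + 1}: loading = {loading:+.2f}")
# Unequal (or near-zero) loadings indicate the items do not relate
# equally to the construct, violating the sum score's assumption.
```

Refitting with n_components=2 and checking whether a second factor absorbs meaningful variance is one way to probe whether a scale is capturing more than one construct at once.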

How, then, has "sum-and-alpha" become so prevalent? McNeish points to two factors: graduate training models and collaborative research trends. Few graduate programs offer in-depth coursework on measurement, and behavioral scientists are frequently expected to be "jacks of all trades" with mastery of every aspect of the research process. This expectation is not the norm in other fields, where researchers often work in tandem with statistical experts.

McNeish encourages consumers of empirical research, policymakers and members of the public alike, to review measurement methodology when contextualizing research findings. Findings from studies that report going beyond the "sum-and-alpha" methodology should be prioritized when informing decision making.
