RESEARCH Methods / Methodological Features and Effect Sizes

As evidence-based reform becomes increasingly important in educational policy, it is becoming essential to understand how research design might contribute to reported effect sizes in experiments evaluating educational programs. The purpose of this article is to examine how methodological features such as types of publication, sample sizes, and research designs affect effect sizes in experiments. A total of 645 studies from 12 recent reviews of evaluations of reading, mathematics, and science programs were studied. The findings suggest that effect sizes are roughly twice as large for published articles, small-scale trials, and experimenter-made measures as for unpublished documents, large-scale studies, and independent measures, respectively. In addition, effect sizes are significantly higher in quasi-experiments than in randomized experiments. Explanations for the effects of methodological features on effect sizes are discussed, as are implications for evidence-based policy.

Technical Report

Cheung, A., & Slavin, R. (2016). How methodological features affect effect sizes in education. Baltimore, MD: Center for Research and Reform in Education, Johns Hopkins University.

Published Report

Cheung, A., & Slavin, R. (2016). How methodological features affect effect sizes in education. Educational Researcher, 45 (5), 283-292.

Rigorous evidence of program effectiveness has become increasingly important with the 2015 passage of the Every Student Succeeds Act (ESSA). One question that has not yet been fully explored is whether program evaluations carried out or commissioned by developers produce larger effect sizes than evaluations conducted by independent third parties. Using study data from the What Works Clearinghouse, we find evidence of a “developer effect,” where program evaluations carried out or commissioned by developers produced average effect sizes that were substantially larger than those identified in evaluations conducted by independent parties. We explore potential reasons for the existence of a “developer effect” and provide evidence that interventions evaluated by developers were not simply more effective than those evaluated by independent parties. We conclude by discussing plausible explanations for this phenomenon as well as providing suggestions for researchers to mitigate potential bias in evaluations moving forward.

Technical Report

Wolf, R., Morrison, J.M., Inns, A., Slavin, R. E., & Risman, K. (2020). Average effect sizes in developer-commissioned and independent evaluations. Baltimore, MD: Center for Research and Reform in Education, Johns Hopkins University.

Published Report

Wolf, R., Morrison, J.M., Inns, A., Slavin, R. E., & Risman, K. (2020). Average effect sizes in developer-commissioned and independent evaluations. Journal of Research on Educational Effectiveness. DOI: 10.1080/19345747.2020.1726537

The Every Student Succeeds Act has made the use of evidence even more relevant to policymakers and practitioners. The What Works Clearinghouse (WWC) serves as a reviewer of evidence as policymakers and practitioners attempt to navigate the broad expanse of educational research. For this reason, it is of vital interest to understand whether WWC policies lead to fair estimates of effect sizes. Previous research (e.g., Cheung & Slavin, 2016) has indicated that small studies are associated with greatly inflated effect sizes. This study examined whether this could be explained by methodological and program factors of studies in the WWC database. Using the outcomes from all accepted studies in the reading and mathematics areas, we found a non-linear trend of effect size decreasing as the sample size increases. Even controlling for research design, level of assignment, type of intervention, and type of outcome, there was still an effect of sample size. While publication bias, intervention strength, methodological rigor, and superrealization may each contribute partially to this sample size effect, neither individually nor together do they explain the full impact of sample size on effect size. We suggest that the WWC institute a policy of weighting studies by inverse variance before averaging effect sizes, and set minimum sample size criteria to reduce the inflationary effect of small study bias.

Conference Paper

Neitzel, A., Pellegrini, M., Lake, C., & Slavin, R. (2018, August 1). Do small studies add up in the What Works Clearinghouse? The 128th meeting of the American Psychological Association, San Francisco, CA. https://osf.io/preprints/edarxiv/qy7ez
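
The inverse-variance weighting suggested in the abstract above can be illustrated with a minimal sketch. It is not the WWC's or the authors' procedure: the function names and the two studies are hypothetical, the variance formula is the standard large-sample approximation for a standardized mean difference, and clustering is ignored.

```python
def smd_variance(d, n_treat, n_ctrl):
    """Approximate sampling variance of a standardized mean difference d.

    Standard large-sample approximation; ignores clustering, which a real
    review of school-level studies would need to account for.
    """
    n_total = n_treat + n_ctrl
    return n_total / (n_treat * n_ctrl) + d ** 2 / (2 * n_total)


def inverse_variance_mean(effects, variances):
    """Average effect sizes, weighting each study by 1 / variance."""
    weights = [1.0 / v for v in variances]
    weighted_sum = sum(w * d for w, d in zip(weights, effects))
    return weighted_sum / sum(weights)


# Hypothetical studies (illustrative numbers only): (effect size, n_treat, n_ctrl)
studies = [(0.60, 20, 20), (0.10, 1000, 1000)]

effects = [d for d, _, _ in studies]
variances = [smd_variance(d, nt, nc) for d, nt, nc in studies]

simple_mean = sum(effects) / len(effects)
weighted_mean = inverse_variance_mean(effects, variances)

print(f"simple mean:   {simple_mean:.3f}")
print(f"weighted mean: {weighted_mean:.3f}")
```

In this made-up case the small study pulls the simple mean up sharply, while the weighted mean stays close to the estimate from the large study, because the small study's larger variance gives it a much smaller weight.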

Large-scale randomized studies provide the best means of evaluating practical, replicable approaches to improving educational outcomes. This article discusses the advantages, problems, and pitfalls of these evaluations, focusing on alternative methods of randomization, recruitment, ensuring high-quality implementation, dealing with attrition, and data analysis. It also discusses means of increasing the chances that large randomized experiments will find positive effects, and interpreting effect sizes.

Technical Report

Slavin, R., & Cheung, A. (2017). Lessons learned from large-scale randomized experiments. Baltimore, MD: Center for Research and Reform in Education, Johns Hopkins University.

Published Report

Slavin, R., & Cheung, A. (2017). Lessons learned from large-scale randomized experiments. Journal of Education for Students Placed at Risk. DOI: http://dx.doi.org/10.1080/10824669.2017.1360774

Program effectiveness reviews in education seek to provide educators with scientifically valid and useful summaries of evidence on achievement effects of various interventions. Different reviewers have different policies on measures of content taught in the experimental group but not the control group, called here treatment-inherent measures. These are contrasted with treatment-independent measures of content emphasized equally in experimental and control groups. The What Works Clearinghouse (WWC) averages effect sizes from such measures with those from treatment-independent measures, while the Best Evidence Encyclopedia (BEE) excludes treatment-inherent measures. This article contrasts effect sizes from treatment-inherent and treatment-independent measures in WWC reading and math reviews to explore the degree to which these measures produce different estimates. In all comparisons, treatment-inherent measures produce much larger positive effect sizes than treatment-independent measures. Based on these findings, it is suggested that program effectiveness reviews exclude treatment-inherent measures, or at least report them separately.

Technical Report

Slavin, R.E., & Madden, N.A. (2011). Measures inherent to treatments in program effectiveness reviews. Baltimore, MD: Center for Research and Reform in Education, Johns Hopkins University.

Published Report

Slavin, R.E., & Madden, N.A. (2011). Measures inherent to treatments in program effectiveness reviews. Journal of Research on Educational Effectiveness, 4 (4), 370-380.

Research in fields other than education has found that studies with small sample sizes tend to have larger effect sizes than those with large samples. This article examines the relationship between sample size and effect size in education. It analyzes data from 185 studies of elementary and secondary mathematics programs that met the standards of the Best-Evidence Encyclopedia. As predicted, there was a significant negative correlation between sample size and effect size. The differences in effect sizes between small and large experiments were much greater than those between randomized and matched experiments. Explanations for the effects of sample size on effect size are discussed.

Technical Report

Slavin, R.E., & Smith, D. (2009). The relationship between sample sizes and effect sizes in systematic reviews in education. Baltimore, MD: Center for Research and Reform in Education, Johns Hopkins University.

Published Report

Slavin, R.E., & Smith, D. (2009). The relationship between sample sizes and effect sizes in systematic reviews in education. Educational Evaluation and Policy Analysis, 31 (4), 500-506.
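
The negative correlation between sample size and effect size described in the abstract above can be checked with a short sketch. It does not reproduce the authors' analysis: the data are made up for illustration, the helper name is hypothetical, and correlating effect size with the logarithm of sample size is an assumption made here, not a detail taken from the article.

```python
from math import log, sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical studies (illustrative numbers only, not from the review).
sample_sizes = [40, 80, 150, 400, 1200, 3000]
effect_sizes = [0.45, 0.38, 0.30, 0.18, 0.10, 0.06]

# Assumption: use log sample size so a few very large studies do not dominate.
log_sizes = [log(n) for n in sample_sizes]
r = pearson_r(log_sizes, effect_sizes)

print(f"correlation between log(sample size) and effect size: {r:.2f}")
```

A negative value of r in real review data would correspond to the pattern the article reports: smaller studies tending to show larger effect sizes.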
