Key Challenges and Some Guidance on Using Strong Quantitative Methodology in Education Research


  • Robin K. Henson, University of North Texas
  • Genéa Stewart, University of North Texas
  • Lee A. Bedford, University of North Texas



Keywords: doctoral training, educational research, effect sizes, evidence-based practice, quantitative methods


The current article reviews several common areas of focus in quantitative methods with the hope of providing Journal of Urban Mathematics Education (JUME) readers and researchers with guidance on conducting and reporting quantitative analyses. After providing some background for the discussion, the article reviews the methodological nature of recent JUME articles and then offers commentary on key challenges and recommendations for strong practice in quantitative methodology. The review addresses causal inferences, measurement issues, handling missing data, testing assumptions, dealing with nested data, and providing evidence for outcomes. Enhanced quantitative training and resources for doctoral students, authors, reviewers, and editors are recommended.
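As a minimal illustration of one way to provide evidence for outcomes, namely reporting an effect size together with its uncertainty rather than a bare p value, the sketch below computes a standardized mean difference (Cohen's d with a pooled standard deviation) and an approximate 95% confidence interval. The function name `cohens_d_with_ci`, the large-sample normal approximation for the standard error, and the example scores are all assumptions for illustration; this is not presented as the authors' procedure.

```python
import math
from statistics import mean, stdev


def cohens_d_with_ci(group1, group2, z=1.96):
    """Hypothetical helper: Cohen's d (pooled SD) with an approximate CI.

    Assumes two independent groups and uses a common large-sample
    normal approximation for the standard error of d.
    """
    n1, n2 = len(group1), len(group2)
    m1, m2 = mean(group1), mean(group2)
    s1, s2 = stdev(group1), stdev(group2)  # sample SDs (n - 1 denominator)

    # Pool the two sample variances, weighting by degrees of freedom.
    pooled_sd = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / pooled_sd

    # Approximate standard error of d; exact intervals based on the
    # noncentral t distribution are preferable for small samples.
    se = math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    return d, (d - z * se, d + z * se)


# Made-up scores for two small groups, for illustration only.
treatment = [78, 85, 90, 72, 88]
comparison = [70, 75, 80, 68, 74]
d, (lower, upper) = cohens_d_with_ci(treatment, comparison)
print(f"d = {d:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

With only five cases per group, the interval is very wide, which is exactly the kind of information a lone significance test hides from readers.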
