Key Challenges and Some Guidance on Using Strong Quantitative Methodology in Education Research

Robin Henson; Genéa Stewart; Lee Bedford

doi:10.21423/jume-v13i2a382

Authors

Robin K. Henson University of North Texas https://orcid.org/0000-0002-0656-585X
Genéa Stewart University of North Texas
Lee A. Bedford University of North Texas

Keywords:

doctoral training, educational research, effect sizes, evidence-based practice, quantitative methods

Abstract

The current article reviews several common areas of focus in quantitative methods with the hope of providing Journal of Urban Mathematics Education (JUME) readers and researchers with some guidance on conducting and reporting quantitative analyses. After providing some background for the discussion, the methodological nature of recent JUME articles is reviewed, followed by commentary on key challenges and recommendations for strong practice in quantitative methodology. The review addresses causal inferences, measurement issues, handling missing data, testing for assumptions, dealing with nested data, and providing evidence for outcomes. Enhanced quantitative training and resources for doctoral students, authors, reviewers, and editors are recommended.
