Automatic Scoring System for English Writing Based on Natural Language Processing: Assessment of Accuracy and Educational Effect
DOI: https://doi.org/10.30564/fls.v6i6.7135

Abstract
This study investigates the effectiveness and educational impact of a novel NLP-based English writing auto-scoring system. Utilizing advanced machine learning techniques, including BERT and Graph Neural Networks, the system demonstrates high consistency with human raters (Quadratic Weighted Kappa of 0.92) across multiple dimensions of writing quality. A longitudinal study involving 500 students over a 16-week semester revealed significant improvements in writing abilities, with the most substantial gains observed in grammar and mechanics (28.5% increase) and organization and structure (23.7% increase). Through comprehensive system evaluation using multiple metrics, including Adjacent Agreement Rate and Root Mean Square Error, our system consistently outperformed existing baseline approaches, including commercial off-the-shelf solutions. The implementation of our system significantly enhanced teacher efficiency, reducing essay grading time by 62% and increasing time for individualized feedback by 45%. The system's architecture integrates cutting-edge NLP technologies with a user-friendly interface, facilitating real-time feedback and adaptive assessment capabilities. Our evaluation framework encompasses both technical accuracy and educational effectiveness, addressing a critical gap in current literature. While the system shows limitations in assessing highly creative writing and faces potential risks of student gaming, its overall impact on writing instruction and assessment is overwhelmingly positive. The study demonstrates that NLP-based auto-scoring systems can effectively scale writing assessment, provide timely feedback, and potentially democratize access to high-quality writing instruction. These findings suggest a path toward more efficient, personalized, and equitable writing education.
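The agreement metrics cited above are standard and easy to reproduce. The sketch below is illustrative only (not the authors' implementation): Quadratic Weighted Kappa measures rater agreement on an ordinal score scale with a quadratic penalty for larger disagreements, Adjacent Agreement Rate is the fraction of machine scores within one point of the human score, and RMSE is the usual root mean square error.

```python
import numpy as np

def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """Agreement between two raters on an ordinal scale 0 .. n_classes-1."""
    O = np.zeros((n_classes, n_classes))            # observed rating matrix
    for t, p in zip(y_true, y_pred):
        O[t, p] += 1
    idx = np.arange(n_classes)
    # Quadratic disagreement penalty: 0 on the diagonal, 1 at the extremes.
    W = (idx[:, None] - idx[None, :]) ** 2 / (n_classes - 1) ** 2
    # Expected matrix under chance agreement (outer product of marginals).
    E = np.outer(O.sum(axis=1), O.sum(axis=0)) / O.sum()
    return 1.0 - (W * O).sum() / (W * E).sum()

def adjacent_agreement_rate(y_true, y_pred, tolerance=1):
    """Fraction of predictions within `tolerance` points of the human score."""
    diffs = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    return float((diffs <= tolerance).mean())

def rmse(y_true, y_pred):
    """Root mean square error between machine and human scores."""
    err = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.sqrt(np.mean(err ** 2)))
```

Perfect agreement yields a QWK of 1.0; a value of 0.92, as reported here, indicates near-human consistency on the ordinal scoring scale.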
Keywords:
NLP-Based Auto-Scoring; English Writing Assessment; Educational Technology; Machine Learning in Education
License
Copyright © 2024 Moussa Diagne Faye, Vini Yves Bernadin Loyara, Amadou Keita, Mamadou Diop, Angelbert Chabi Biaou, Mahamadou Koita, Hamma Yacouba
This is an open access article under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) License.