Automated Assessment of Text Complexity through the Fusion of AutoML and Psycholinguistic Models

Herianah Herianah; Engeline Chelsya Setiawan; Adri Adri; Tamrin Tamrin; Loso Judijanto; Diah Supatmiwati; Djoko Sutrisno; Musfeptial Musfeptial; Wahyu Damayanti; Martina Martina

DOI: https://doi.org/10.30564/fls.v7i3.8788

Authors

Herianah Herianah
National Research and Innovation Agency, Jakarta Pusat, Indonesia
Engeline Chelsya Setiawan
Department of Health Policy and Administration, Universitas Airlangga, Surabaya, Indonesia
Adri Adri
National Research and Innovation Agency, Jakarta Pusat, Indonesia
Tamrin Tamrin
National Research and Innovation Agency, Jakarta Pusat, Indonesia
Loso Judijanto
IPOSS Jakarta, Indonesia
Diah Supatmiwati
English Literature Study Program, Universitas Bumigora, Mataram, Indonesia
Djoko Sutrisno
Universitas Ahmad Dahlan, Yogyakarta, Indonesia
Musfeptial Musfeptial
National Research and Innovation Agency, Jakarta Pusat, Indonesia
Wahyu Damayanti
National Research and Innovation Agency, Jakarta Pusat, Indonesia
Martina Martina
National Research and Innovation Agency, Jakarta Pusat, Indonesia


Received: 18 February 2025 | Revised: 24 February 2025 | Accepted: 26 February 2025 | Published Online: 27 February 2025

Abstract

The complexity of written texts poses significant challenges for comprehension, affecting education, literacy, and communication across many fields. As demand for advanced text assessment tools grows, this study integrates Automated Machine Learning (AutoML) with psycholinguistic models to improve the automated assessment of text complexity, with the aim of supporting educational practice and content development. A mixed-methods approach combined quantitative analysis of text complexity metrics with qualitative insights from psycholinguistic models. The AutoML framework automated model selection and hyperparameter tuning, while psycholinguistic features were extracted to inform the model. This research addresses a critical gap in existing automated text assessment methods, which often lack a nuanced understanding of language complexity and rely on simplistic heuristics that fail to capture its intricacies. Integrating AutoML and psycholinguistic models offers a more accurate, efficient, and contextually relevant assessment of text complexity, which is crucial for educational tools and content creation. The fusion model achieved 92% accuracy, outperforming traditional models (77%) and large language models (82%), with a response time of 0.5 s that makes it suitable for real-time applications. These findings highlight the potential of combining AutoML with psycholinguistic insights to enhance automated text complexity assessment, with implications for educational outcomes and communication strategies across domains.
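The abstract states that psycholinguistic features were extracted and that the AutoML framework automated model selection and hyperparameter tuning, but it does not specify the feature set, the candidate models, or the search procedure. The following is a minimal, hypothetical sketch of that general pattern using only the Python standard library. Three surface proxies (mean sentence length, mean word length, type-token ratio) stand in for psycholinguistic features, and an exhaustive search over two assumed model families (k-nearest neighbours and nearest centroid), their hyperparameters, and an optional standardization step keeps the configuration with the best leave-one-out accuracy. All function names, features, models, and the search space are illustrative assumptions, not the authors' implementation; production AutoML systems typically search much larger spaces, often with smarter strategies than a full grid.

```python
import re
import statistics
from itertools import product


def extract_features(text):
    """Hypothetical surface proxies for psycholinguistic complexity:
    [mean sentence length in words, mean word length in characters, type-token ratio]."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return [0.0, 0.0, 0.0]
    mean_sentence_len = len(words) / max(len(sentences), 1)
    mean_word_len = sum(len(w) for w in words) / len(words)
    type_token_ratio = len(set(words)) / len(words)
    return [mean_sentence_len, mean_word_len, type_token_ratio]


def standardize(fit_rows, rows):
    """Z-score `rows` using means and standard deviations fitted on `fit_rows` only."""
    columns = list(zip(*fit_rows))
    means = [statistics.mean(c) for c in columns]
    sds = [statistics.pstdev(c) or 1.0 for c in columns]  # guard against zero variance
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_predict(train, x, params):
    """Majority vote among the k nearest examples; vote ties go to the alphabetically first label."""
    nearest = sorted(train, key=lambda pair: sq_dist(pair[0], x))[: params["k"]]
    votes = [label for _, label in nearest]
    return max(sorted(set(votes)), key=votes.count)


def centroid_predict(train, x, params):
    """Assign the label whose class centroid is closest to x (params unused)."""
    groups = {}
    for features, label in train:
        groups.setdefault(label, []).append(features)
    centroids = {label: [statistics.mean(c) for c in zip(*rows)] for label, rows in groups.items()}
    return min(sorted(centroids), key=lambda label: sq_dist(centroids[label], x))


MODELS = {"knn": knn_predict, "centroid": centroid_predict}

# Hypothetical search space: model family -> hyperparameter grid.
SEARCH_SPACE = {
    "knn": {"k": [1, 3], "scale": [False, True]},
    "centroid": {"scale": [False, True]},
}


def loo_accuracy(X, y, model, params):
    """Leave-one-out accuracy of one model/hyperparameter configuration."""
    correct = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        test_x = X[i]
        if params["scale"]:
            # Fit scaling on the training fold only, then apply it to the held-out text.
            scaled = standardize(train_X, train_X + [test_x])
            train_X, test_x = scaled[:-1], scaled[-1]
        prediction = MODELS[model](list(zip(train_X, train_y)), test_x, params)
        correct += prediction == y[i]
    return correct / len(X)


def auto_select(X, y):
    """Score every configuration in SEARCH_SPACE; return (accuracy, model, params) of the best."""
    best = None
    for model, grid in SEARCH_SPACE.items():
        keys = sorted(grid)
        for values in product(*(grid[k] for k in keys)):
            params = dict(zip(keys, values))
            score = loo_accuracy(X, y, model, params)
            if best is None or score > best[0]:
                best = (score, model, params)
    return best


if __name__ == "__main__":
    texts = [
        "The cat sat. The dog ran. We had fun.",
        "I like red. You like blue. It is hot.",
        "Notwithstanding considerable methodological heterogeneity, contemporary investigations consistently demonstrate substantial interdependencies.",
        "Psycholinguistic characterizations of syntactic ambiguity presuppose computationally intensive incremental parsing mechanisms.",
    ]
    labels = ["simple", "simple", "complex", "complex"]
    X = [extract_features(t) for t in texts]
    print(auto_select(X, labels))
```

Leave-one-out validation is used here only because the toy corpus is tiny; the abstract does not say how the reported 92% accuracy or the 0.5 s response time were measured, so neither figure should be read into this sketch.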

Keywords:

Text Complexity Assessment; Automated Machine Learning (AutoML); Psycholinguistic Models; Educational Technology; Natural Language Processing (NLP)
