The Study on Sentence Difficulty of the Chinese Components

Authors

  • Jia-Fei Hong

    Department of Chinese as a Second Language, National Taiwan Normal University, Taipei 106, Taiwan

  • Jia-Ni Chen

    Department of Chinese as a Second Language, National Taiwan Normal University, Taipei 106, Taiwan

  • Yao-Ting Sung

    Department of Educational Psychology and Counseling, National Taiwan Normal University, Taipei 106, Taiwan

DOI:

https://doi.org/10.30564/fls.v6i4.6737
Received: 11 June 2024 | Revised: 29 July 2024 | Accepted: 31 July 2024 | Published Online: 14 October 2024

Abstract

In recent years, reading comprehension has increasingly been used as an indicator of proficiency in lexis and grammar. Because sentences are the basic units of discourse structure, sentence difficulty is often used in studies of text difficulty. Although a number of studies have examined sentence difficulty, inconsistency in the indicators chosen, or a narrow focus on specific grammatical issues, has limited this line of research. This study therefore adopts a corpus-based approach, using corpora as an objective and scientific data source. The Digital Platform for Chinese Grammar and the 8000 Chinese Words list serve as key reference sources, and CRIE 3.0 is employed to validate the texts and establish sentence difficulty indicators. Because certain indicators in the Digital Platform for Chinese Grammar are not yet fully developed, the study also draws on the Chinese Proficiency Grading Standards for International Chinese Language Education and the Hanyu Shuiping Kaoshi (HSK) to establish comprehensive difficulty indicators for sentence structures and sentence components. The established indicators are then validated through comparative corpus analyses: native speaker corpora serve as the benchmark, Mandarin learner corpora are used for comparison, and the results are further validated objectively with a machine learning model. These validations examine the validity and reliability of the selected indicators and establish a calculation method, "grammar level × grammar point distribution ratio", for determining the difficulty of Chinese sentences. Finally, expert reliability is assessed to ensure the credibility of the indicators.
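The "grammar level × grammar point distribution ratio" calculation named in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes that each grammar point detected in a sentence contributes its level multiplied by its distribution ratio, and that these contributions are summed into one sentence score. The abstract does not say how the terms are aggregated or how the ratio is derived, so the ratio is simply taken as an input here. The class, function, grammar-point names, and numbers below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GrammarPoint:
    """A grammar point tagged with a proficiency level (hypothetical scale)."""
    name: str
    level: int  # assumed: a higher level means a harder grammar point


def sentence_difficulty(points: list[GrammarPoint], ratios: dict[str, float]) -> float:
    """Assumed aggregation: sum of level * distribution ratio over the
    grammar points found in one sentence."""
    return sum(p.level * ratios[p.name] for p in points)


# Hypothetical distribution ratios; how they are derived is not specified here.
ratios = {"ba_construction": 0.25, "le_perfective": 0.5, "shi_de": 0.25}

# Hypothetical grammar points detected in a single sentence.
sentence = [GrammarPoint("ba_construction", 3), GrammarPoint("le_perfective", 1)]

print(sentence_difficulty(sentence, ratios))  # 3*0.25 + 1*0.5 = 1.25
```

Summing is only one possible aggregation. A mean or a maximum over the detected grammar points would be equally easy to substitute if the indicators were defined that way.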

Keywords:

Corpus-based; Sentence difficulty; Indicators; Lexical complexity; Syntactic complexity

How to Cite

Hong, J.-F., Chen, J.-N., & Sung, Y.-T. (2024). The Study on Sentence Difficulty of the Chinese Components. Forum for Linguistic Studies, 6(4), 434–448. https://doi.org/10.30564/fls.v6i4.6737

Article Type

Article