
A Research on General Computerized Composition Scoring and Feedback for College English Teaching in China

Author GeShiLi
Tutor SongRou
School Beijing Language and Culture University
Course Linguistics and Applied Linguistics
Keywords English writing; Automated essay scoring; Composition feedback output; Natural language processing
CLC H319
Type PhD thesis
Year 2008
Downloads 557
Quotes 0

Because the number of teachers is far out of proportion to the number of EFL students in China, a composition scoring and feedback system is urgently needed for the teaching of college English writing. Such a system should relieve teachers of the heavy workload of evaluating compositions and boost students’ motivation to write, ultimately improving their writing ability. So far, research has focused mainly on scoring compositions written to specific prompts, and few studies address general computerized composition scoring and feedback for college English teaching in China.

Non-English majors make up a large share of English learners in China. Generally speaking, their English writing skills are weak, and they need a large amount of practice to improve their command of the language. College English teachers are relatively few, and they lack the time and energy to correct and score large numbers of compositions. A reasonably accurate computerized scoring method could therefore solve a major problem.

"General" here refers to a set of approaches designed to score compositions that are not tied to a specific prompt. Daily writing exercises and tests in college English teaching can involve a wide variety of prompts. Training a scoring model on annotated data for each and every exercise or test would impose an enormous manual annotation workload on teachers. Moreover, because each such sample would be small and its features not statistically significant, the training effect might not be optimal. These two factors affect the feasibility of computerized scoring. It is therefore necessary to study a general computerized scoring system for compositions with non-specific prompts.

Feedback is even more important than scoring in the teaching practice of college English writing.
A score shows only the final quality evaluation of a finished composition, whereas valuable feedback tells writers what problems exist and makes them aware of their errors in language use, so that they can consciously correct or avoid them in the future.

With these research goals in mind, this thesis analyzes the limitations of, and solutions for, general computerized scoring and feedback for Chinese college English writing. For essay scoring, the first difficulty is scoring reliability. Because essay scoring is inherently subjective, the best available objective standard is the agreement of several raters on the target essay. The second difficulty lies in natural language processing, which involves problems of both writing language and writing content. Given the limitations of natural language processing technology and the research goal of a general scoring method, the research can only center on writing language, with writing content secondary. The last difficulty is that the "interlanguage" between Chinese and English produced by Chinese students writing in English contains large numbers of errors of many types, whereas the only relatively accurate processing techniques available are lexical statistical analysis and pattern matching. The key to providing feedback on writing is to provide accurate feedback on language errors.

This research collects writing data from the ST3 sub-corpus of CLEC and from College English Composition. Based on the scoring of three experienced raters, 660 compositions are selected, covering 257 different prompts and 5 score ranges (2, 5, 8, 11, and 14), to form the writing collection of the present research. The collection is further divided into a training set of 440 compositions and a testing set of 220 compositions.
The training set is used to construct the computerized scoring and feedback models, and the testing set is used to validate their performance.

This research covers two aspects: computerized scoring and feedback output.

For computerized scoring, the goal is general scoring of non-specific-prompt compositions in which a model is trained once and used many times. Based on previous research and confirmation in the present study, the following independent variables are selected: three key lexical features (essay length, lexical diversity, and lexical distribution), one feature counting verb phrases, and one feature recording whether each specific phrase is used. Writing scores serve as the dependent variable. The scoring model is constructed by integrating multiple regression with feature-based probabilistic classification. Validating the scoring model on the testing set yields the precision, recall, and false rate for each score range, the overall precision and false rate, and a reliability matrix of scoring results.

The results show an overall precision of 75.45% and an overall false rate of only 10%. Precision per score range is as high as 100% (score range 2) and at least 65% (score range 11). Recall rises markedly with the score, from 30% for the lowest score range 2 to 94% for the highest score range 14. The false rate shows a similar tendency: 0 for score ranges 2 and 5, and 16% for score range 14. Although the goal of general scoring means that only features unrelated to content can be selected, the scoring model already has reference value in the daily teaching practice of college English writing.

Writing feedback covers two aspects: lexical co-occurrence errors and phrase usage errors.
Lexical co-occurrence errors are recognized using lexical bigram knowledge extracted from a large corpus of native English. If two words appear adjacently few or zero times in the large corpus but do appear adjacently in students’ compositions, their co-occurrence is regarded as a suspected error and is fed back to teachers and students for final judgment. Phrase usage errors are recognized by studying common phrases in college English writing, constructing usage patterns for phrase errors, and matching these patterns against sentences in compositions so as to locate errors in phrase use.

For error detection feedback, the researcher examines co-occurrence for the 1,000 most frequently used words. Bigrams whose co-occurrence frequency in the large corpus is lower than 10 have an error rate of over 70% in students’ compositions, while those with a frequency lower than 30 have an error rate close to 57%. In phrase pattern matching, sampling statistics give a recall of 84.77% and a precision of 96.45% for phrase recognition. These results show that, among high-frequency words, both feedback precision and recall are high. Because high-frequency words are precisely the basics and key points of learning for non-English majors, this feedback will play an important role in correcting the corresponding errors in students’ language use.

While targeting general scoring, the research focuses mainly on language use in students’ compositions, but writing content is not neglected. Through automatic clustering of the compositions written to one prompt, the few compositions whose wording is distinct from that of most others, e.g. writings that are off the prompt, can be discovered.
Experiments show that this method has some capability to recognize writings with a similar prompt but different content.

The innovations of this research are as follows:

(1) Limitation analysis
To analyze the limitations of computers in composition scoring and feedback output. Feasible solutions are offered for the problems that can be resolved, and reasons are analyzed for those that cannot, which can serve as a valuable reference for future research.

(2) Research goals
To explore a general, train-once, use-many-times scoring method for interlanguage writing without preset prompts by non-English-major college students; to explore the feasibility of, and a concrete method for, automatic computerized error detection and feedback for interlanguage writing; to explore the feasibility of, and a concrete method for, scoring the content of interlanguage writing.

(3) Scoring technique
a) A small lexical feature set (composition length, lexical diversity, and lexical distribution). For lexical distribution, the prompt lines are deleted and the word lists adapted. For the goal of general scoring in the teaching practice of college English writing, a small but accurate lexical feature set is more pertinent, and its effect is good enough.
b) Phrase features, including the number of verb phrases and whether each phrase is used. The selection of phrase features is basically unrelated to content. Both features contribute greatly to composition scoring, and the use of phrase patterns provides relatively high precision for studying phrase use in students’ compositions.
c) A bigram feature for word list one.
The detection of lexical co-occurrence errors for the most frequently used words in students’ compositions is pertinent to the errors in students’ language use, and the precision of the error reports is reasonably high.
d) An automatic clustering method for discovering off-topic writing.

This research shows that computers greatly exceed humans in capabilities such as statistics, matching, and storage. Once an appropriate application purpose is selected and a well-designed method is set up, computers can accomplish work that seemingly demands intelligence. Computers have strong potential in the field of train-once, use-many-times scoring of non-specific-prompt interlanguage writing.

On the other hand, even after collecting various features of writing, the precision of automatic scoring is only 75%. Error detection feedback is limited to the co-occurrence of high-frequency words and the usage of common phrases. Even within such a narrow scope, feedback precision and recall are relatively limited. The research practice shows that fully automatic computerized scoring of interlanguage writing is far from realistic, owing to the dual complexity of natural language processing and interlanguage processing. Therefore, future research on scoring interlanguage compositions should study the optimum interface between human and machine, so that each can build on the other’s strengths to the greatest degree.
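
The lexical co-occurrence check described in the abstract (flagging adjacent word pairs that are rare in a native-English corpus) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the tokenizer, the toy reference corpus, and the use of a single cutoff are all assumptions made for illustration. The default threshold of 10 echoes the lower of the two frequency bands reported above, and the `vocabulary` argument stands in for the high-frequency word list. Bigrams are counted across the whole text, ignoring sentence boundaries, which is a simplification.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase a text and split it into word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def bigram_counts(reference_texts):
    """Count adjacent word pairs in a reference corpus of native English."""
    counts = Counter()
    for text in reference_texts:
        tokens = tokenize(text)
        counts.update(zip(tokens, tokens[1:]))
    return counts


def suspected_errors(composition, counts, vocabulary, threshold=10):
    """Return adjacent word pairs from a composition that are rare in the reference corpus.

    Only pairs whose two words are both in `vocabulary` are checked. A pair is
    flagged when its reference frequency is below `threshold`; flagged pairs are
    suspects to be passed to a teacher or student for final judgment.
    """
    tokens = tokenize(composition)
    flagged = []
    for pair in zip(tokens, tokens[1:]):
        if pair[0] in vocabulary and pair[1] in vocabulary and counts[pair] < threshold:
            flagged.append(pair)
    return flagged


# Toy reference corpus (hypothetical); a real one would be far larger.
reference = ["I have a lot of knowledge about this", "we learn a lot of things"] * 10
counts = bigram_counts(reference)
vocabulary = {word for pair in counts for word in pair}
flagged = suspected_errors("we learn knowledge", counts, vocabulary)
print(flagged)
```

In this toy run, "we learn" is attested in the reference corpus often enough to pass, while "learn knowledge" never occurs there and is returned as a suspect. As in the approach described above, a flagged pair is only a candidate error, so the final decision stays with a human reader.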
