Paper title

Machine learning approach of Japanese composition scoring and writing aided system's design

Author

Huang, Wanhong

Abstract

Automatic scoring is extremely complex for any language, because natural language is itself a complex model. Evaluating a written article requires looking at it along many dimensions, such as word features, grammatical features, semantic features, and text structure. Even human raters sometimes cannot grade a composition accurately, because different people hold different opinions about the same article. Nevertheless, a composition scoring system can greatly assist language learners by helping them improve through the process of producing output. Although it is still difficult for machines to evaluate a composition directly at the semantic and pragmatic levels, especially for Japanese, Chinese, and other languages of high-context cultures, a machine can evaluate a passage at the word and grammar levels and so assist a composition rater or a language learner. For foreign language learners in particular, lexical and syntactic content is usually what they care about most. In our experiments, we did the following work: 1) We use word segmentation tools and dictionaries to segment an article into words, extract word features, and generate a word-complexity feature for the article; a bag-of-words (BoW) technique is used to extract theme features. 2) We designed a Turing-complete automata model and created 300+ automata for the grammar points that appear in the JLPT examination, and we use these automata to extract grammar features. 3) We propose a statistical approach for scoring compositions on a specified theme, in which the final score depends on all the writings submitted to the system. 4) We design a grammar hint function for language learners, so that they can see which grammar points they can currently use.
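To make the pipeline in the abstract more concrete, the following is a minimal Python sketch under several stated assumptions; it is not the author's implementation. It assumes the text has already been segmented into tokens, uses plain regular expressions as a much simpler, finite-state stand-in for the paper's Turing-complete automata, and uses a percentile rank as one possible reading of "the final score depends on all the writings submitted". The three grammar patterns, the function names, and the cosine-similarity theme measure are all hypothetical choices made for illustration.

```python
import re
from collections import Counter
from math import sqrt

# Hypothetical grammar patterns. The paper's 300+ automata are not
# reproduced here; each regex is a finite-state stand-in for one
# JLPT-style grammar point.
GRAMMAR_PATTERNS = {
    "permission": re.compile(r"てもいい"),
    "obligation": re.compile(r"なければ(?:ならない|なりません)"),
    "ability": re.compile(r"ことができ(?:る|ます)"),
}


def bag_of_words(tokens):
    """Count token occurrences; tokens are assumed to be pre-segmented."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def grammar_features(text):
    """Count how often each assumed grammar pattern occurs in raw text."""
    return {name: len(p.findall(text)) for name, p in GRAMMAR_PATTERNS.items()}


def theme_similarity(tokens, all_submissions):
    """Assumed theme measure: similarity to the pooled bag of words
    of every submission on the same theme."""
    pooled = Counter()
    for other in all_submissions:
        pooled.update(other)
    return cosine(bag_of_words(tokens), pooled)


def percentile_rank(value, population):
    """Score a raw value relative to all submitted values (0 to 100)."""
    if not population:
        return 0.0
    below = sum(1 for v in population if v < value)
    equal = sum(1 for v in population if v == value)
    return 100.0 * (below + 0.5 * equal) / len(population)
```

Because the score is a rank within the submitted population, it shifts as new compositions arrive, which is the property the abstract describes for its statistical approach. A grammar hint function could, under the same assumptions, list the keys of `GRAMMAR_PATTERNS` whose count in `grammar_features` is still zero.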
