Title
Natural language processing for word sense disambiguation and information extraction
Author
Abstract
This research work deals with Natural Language Processing (NLP) and the extraction of essential information in an explicit form. The most common information management strategies are Document Retrieval (DR) and Information Filtering. DR systems may work like combine harvesters, bringing back useful material from vast fields of raw material. With a large amount of potentially useful information in hand, an Information Extraction (IE) system can then transform the raw material by refining it and reducing it to the germ of the original text. A Document Retrieval system collects the relevant documents carrying the required information from a repository of texts. An IE system then transforms them into information that is more readily digested and analyzed: it isolates relevant text fragments, extracts the relevant information from those fragments, and assembles the targeted information into a coherent framework. The thesis presents a new approach to Word Sense Disambiguation using a thesaurus, and illustrative examples support the effectiveness of this approach for fast and effective disambiguation. A Document Retrieval method based on Fuzzy Logic is described and its application illustrated. A question-answering system illustrates the operation of information extraction from the retrieved text documents. The process of extracting information to answer a query is considerably simplified by a Structured Description Language (SDL) based on the cardinal query forms who, what, when, where, and why. The thesis concludes by presenting a novel strategy for document retrieval and information extraction based on the Dempster-Shafer theory of evidential reasoning. This strategy relaxes many of the limitations inherent in the Bayesian probabilistic approach.
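The abstract names thesaurus-based Word Sense Disambiguation but does not spell out the procedure. As a rough illustration only, the sketch below assumes that each candidate sense of an ambiguous word is linked to a thesaurus category listing related words, and picks the sense whose category shares the most words with the surrounding context. This is a simple overlap heuristic in the spirit of Lesk-style methods, not necessarily the thesis's algorithm; `TOY_THESAURUS` and `disambiguate` are hypothetical names with invented data.

```python
from __future__ import annotations

import re

# Hypothetical toy thesaurus: each sense of an ambiguous word maps to the set of
# related words in its thesaurus category. Illustrative data only.
TOY_THESAURUS: dict[str, dict[str, set[str]]] = {
    "bank": {
        "finance": {"money", "loan", "deposit", "account", "interest", "cash"},
        "river": {"water", "shore", "stream", "fishing", "mud", "river"},
    },
}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def disambiguate(word: str, context: str) -> str | None:
    """Return the sense whose thesaurus category overlaps most with the context.

    Returns None when the word is not in the thesaurus or no sense overlaps.
    Ties keep the sense listed first.
    """
    senses = TOY_THESAURUS.get(word.lower())
    if not senses:
        return None
    # Ignore the ambiguous word itself when measuring overlap.
    context_words = set(tokenize(context)) - {word.lower()}
    best_sense, best_score = None, 0
    for sense, related in senses.items():
        score = len(related & context_words)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


if __name__ == "__main__":
    print(disambiguate("bank", "She opened an account at the bank to deposit money."))
    print(disambiguate("bank", "They went fishing on the bank of the river."))
```

The first call selects the `finance` sense (overlap on "account", "deposit", "money") and the second selects `river` (overlap on "fishing", "river"). A real system would draw categories from a full thesaurus and would likely weight words rather than simply count them.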
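The fuzzy-logic retrieval method is only named in the abstract. One common way to realize fuzzy document retrieval, shown here as an assumption rather than as the thesis's own formulation, treats each document as a fuzzy set of index terms with membership degrees in [0, 1] and evaluates Boolean queries with Zadeh's standard operators: AND as minimum, OR as maximum, and NOT as 1 - μ. Documents are then ranked by the resulting degree. The collection `DOCS`, its membership values, and the helper names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union

# Hypothetical collection: each document is a fuzzy set of index terms, i.e. a
# mapping from term to membership degree in [0, 1]. Illustrative values only.
DOCS: dict[str, dict[str, float]] = {
    "d1": {"retrieval": 0.9, "fuzzy": 0.7, "logic": 0.6},
    "d2": {"retrieval": 0.4, "fuzzy": 0.2, "probability": 0.8},
    "d3": {"extraction": 0.9, "retrieval": 0.1},
}


@dataclass(frozen=True)
class Term:
    name: str


@dataclass(frozen=True)
class And:
    left: Query
    right: Query


@dataclass(frozen=True)
class Or:
    left: Query
    right: Query


@dataclass(frozen=True)
class Not:
    operand: Query


Query = Union[Term, And, Or, Not]


def degree(query: Query, doc: dict[str, float]) -> float:
    """Membership degree of a document in the fuzzy set described by the query."""
    if isinstance(query, Term):
        return doc.get(query.name, 0.0)  # absent terms have membership 0
    if isinstance(query, And):
        return min(degree(query.left, doc), degree(query.right, doc))
    if isinstance(query, Or):
        return max(degree(query.left, doc), degree(query.right, doc))
    if isinstance(query, Not):
        return 1.0 - degree(query.operand, doc)
    raise TypeError(f"unsupported query node: {query!r}")


def rank(query: Query, docs: dict[str, dict[str, float]],
         threshold: float = 0.0) -> list[tuple[str, float]]:
    """Return (doc_id, degree) pairs whose degree exceeds the threshold, best first."""
    scored = [(doc_id, degree(query, terms)) for doc_id, terms in docs.items()]
    return sorted((p for p in scored if p[1] > threshold),
                  key=lambda p: p[1], reverse=True)


if __name__ == "__main__":
    print(rank(And(Term("retrieval"), Term("fuzzy")), DOCS))
```

For the query `retrieval AND fuzzy`, d1 scores 0.7 and d2 scores 0.2, while d3 drops out because its membership in `fuzzy` is 0. Unlike strict Boolean matching, partial matches still receive a graded score that can be used for ranking.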
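The abstract states that the Structured Description Language is built around the cardinal questions who, what, when, where, and why, but gives no syntax. The sketch below is a hypothetical stand-in: it stores each extracted event as a frame with one slot per cardinal question and answers a wh-question by reading the slot named by its leading word from records that share content words with the question. `SDLRecord`, `answer`, the stop-word list, and the sample records are all invented for illustration and are not the thesis's SDL.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# The five cardinal query types named in the abstract.
CARDINALS = ("who", "what", "when", "where", "why")

# Small illustrative stop-word list (an assumption, not part of any SDL spec).
STOPWORDS = {"a", "an", "the", "did", "do", "does", "is", "was", "were",
             "of", "in", "on", "to"}


@dataclass
class SDLRecord:
    """Hypothetical structured description of one event extracted from a text.

    Each slot holds the text fragment answering the corresponding cardinal
    question, or None if no such fragment was extracted.
    """

    who: str | None = None
    what: str | None = None
    when: str | None = None
    where: str | None = None
    why: str | None = None


def answer(question: str, records: list[SDLRecord]) -> list[str]:
    """Answer a wh-question by reading one slot from the matching records.

    The leading wh-word selects the slot; a record matches when its other
    filled slots share at least one content word with the question.
    """
    words = re.findall(r"[a-z]+", question.lower())
    if not words or words[0] not in CARDINALS:
        raise ValueError("question must start with who, what, when, where or why")
    slot = words[0]
    keywords = set(words[1:]) - STOPWORDS
    answers = []
    for record in records:
        value = getattr(record, slot)
        if value is None:
            continue
        # Text of the remaining slots, used to check the record is on topic.
        context = " ".join(getattr(record, s) or "" for s in CARDINALS if s != slot)
        if keywords & set(re.findall(r"[a-z]+", context.lower())):
            answers.append(value)
    return answers


RECORDS = [
    SDLRecord(who="the committee", what="approved the budget", when="on Monday",
              where="in the town hall", why="to fund road repairs"),
    SDLRecord(who="the council", what="closed the bridge", when="in May",
              where="on the river road", why="for safety inspections"),
]

if __name__ == "__main__":
    print(answer("When did the committee approve the budget?", RECORDS))
    print(answer("Why did the council close the bridge?", RECORDS))
```

Once text has been reduced to such frames, answering a query becomes a slot lookup rather than a fresh pass over the document, which is the kind of simplification the abstract attributes to SDL.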
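Dempster's rule of combination is standard, so it can be shown concretely, even though the abstract does not say how the thesis assigns its masses. The sketch assumes a two-hypothesis frame Θ = {relevant, irrelevant} for a single document and two hypothetical sources of evidence with invented mass values. Unlike a Bayesian prior, a mass function may leave belief uncommitted by assigning it to Θ as a whole, which is one commonly cited way the Dempster-Shafer approach relaxes Bayesian assumptions.

```python
from itertools import product
from typing import Dict, FrozenSet

# A mass function maps focal elements (subsets of the frame) to masses summing to 1.
Mass = Dict[FrozenSet[str], float]


def combine(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule of combination for two mass functions on the same frame.

    Products of masses are assigned to the intersection of the focal elements;
    mass falling on the empty set (the conflict K) is discarded and the rest is
    renormalised by 1 - K.
    """
    combined: Mass = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


def belief(m: Mass, hypothesis: FrozenSet[str]) -> float:
    """Bel(A): total mass of focal elements contained in A."""
    return sum(w for s, w in m.items() if s <= hypothesis)


def plausibility(m: Mass, hypothesis: FrozenSet[str]) -> float:
    """Pl(A): total mass of focal elements that intersect A."""
    return sum(w for s, w in m.items() if s & hypothesis)


R = frozenset({"relevant"})
I = frozenset({"irrelevant"})
THETA = R | I  # the whole frame: mass here expresses ignorance

# Hypothetical evidence, e.g. from term matching and from document metadata.
m1: Mass = {R: 0.6, THETA: 0.4}
m2: Mass = {R: 0.5, I: 0.2, THETA: 0.3}

if __name__ == "__main__":
    m = combine(m1, m2)
    print(round(belief(m, R), 3), round(plausibility(m, R), 3))
```

Here the conflict is K = 0.6 × 0.2 = 0.12, so after renormalisation the belief in relevance is about 0.773 and its plausibility about 0.909. The gap between the two is the mass still left on Θ, an explicit measure of uncommitted belief that a single Bayesian probability cannot express.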