Implementing Deep Learning-Based Approaches for Article Summarization in Indian Languages

Paper Title

Implementing Deep Learning-Based Approaches for Article Summarization in Indian Languages

Authors

Rahul Tangsali, Aabha Pingle, Aditya Vyawahare, Isha Joshi, Raviraj Joshi

Abstract

Research on text summarization for low-resource Indian languages has been held back by the limited availability of relevant datasets. This paper presents a summary of various deep-learning approaches used for the ILSUM 2022 Indic language summarization datasets. The ILSUM 2022 dataset consists of news articles written in Indian English, Hindi, and Gujarati, together with their ground-truth summaries. In our work, we explore different pre-trained seq2seq models and fine-tune them on the ILSUM 2022 datasets. In our case, the fine-tuned SoTA PEGASUS model worked best for English, the fine-tuned IndicBART model with augmented data worked best for Hindi, and a fine-tuned PEGASUS model combined with a translation mapping-based approach worked best for Gujarati. The generated summaries were scored using ROUGE-1, ROUGE-2, and ROUGE-4 as the evaluation metrics.
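
To make the evaluation metrics named in the abstract concrete, the following is a minimal sketch of ROUGE-N as clipped n-gram overlap between a generated summary and a reference summary. It is not the authors' evaluation code: the function names `ngrams` and `rouge_n`, the lowercased whitespace tokenization, and the example sentences are all assumptions made for illustration. Official ROUGE implementations typically add their own tokenization and optional stemming, so their scores may differ from this sketch.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n):
    """Compute ROUGE-N precision, recall, and F1 for one summary pair.

    Assumption: text is lowercased and split on whitespace. This is a
    simplification and is not suitable as-is for every script or language.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)

    # Clipped overlap: each n-gram counts at most as often as it appears
    # in both the candidate and the reference.
    overlap = sum((cand & ref).values())
    cand_total = sum(cand.values())
    ref_total = sum(ref.values())

    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical example pair, not taken from the ILSUM 2022 data.
    generated = "the cat sat on the mat"
    gold = "the cat lay on the mat"
    for n in (1, 2, 4):
        print(f"ROUGE-{n}:", rouge_n(generated, gold, n))
```

Higher-order variants such as ROUGE-4 reward only long exact phrase matches, so they are usually much lower than ROUGE-1 and ROUGE-2 for abstractive summaries.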
