Paper Title

Implementing Deep Learning-Based Approaches for Article Summarization in Indian Languages

Authors

Rahul Tangsali, Aabha Pingle, Aditya Vyawahare, Isha Joshi, Raviraj Joshi

Abstract

Research on text summarization for low-resource Indian languages has been limited by the scarcity of relevant datasets. This paper presents an overview of the deep-learning approaches we applied to the ILSUM 2022 Indic language summarization datasets. The ILSUM 2022 datasets consist of news articles written in Indian English, Hindi, and Gujarati, together with their ground-truth summaries. In our work, we explore different pre-trained seq2seq models and fine-tune them on the ILSUM 2022 datasets. In our experiments, the fine-tuned SoTA PEGASUS model worked best for English, the IndicBART model fine-tuned with augmented data worked best for Hindi, and a fine-tuned PEGASUS model combined with a translation mapping-based approach worked best for Gujarati. The generated summaries were evaluated using ROUGE-1, ROUGE-2, and ROUGE-4 as the evaluation metrics.
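The abstract names ROUGE-N as the evaluation metric but does not say how it was computed. As a rough illustration only, ROUGE-N can be sketched as clipped n-gram overlap between a generated summary and its reference. The function names `ngrams` and `rouge_n` below are hypothetical, and whitespace tokenization is an assumption made for brevity; it is not the authors' evaluation script, and official scorers may tokenize, normalize, or stem differently, which matters for Hindi and Gujarati.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return ROUGE-N precision, recall and F1 for two strings.

    Assumption: tokens are obtained by splitting on whitespace.
    """
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)

    # Counter intersection keeps the minimum count of each shared n-gram,
    # which is the "clipped" overlap used by ROUGE.
    overlap = sum((cand & ref).values())
    cand_total = sum(cand.values())
    ref_total = sum(ref.values())

    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    generated = "the cat sat on the mat"
    gold = "the cat lay on the mat"
    for n in (1, 2, 4):
        print(f"ROUGE-{n}:", rouge_n(generated, gold, n))
```

Higher-order variants such as ROUGE-4 reward longer exact matches, so they tend to be much lower than ROUGE-1 for abstractive summaries.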
