Paper Title

Embedding Compression for Text Classification Using Dictionary Screening

Authors

Jing Zhou, Xinru Jing, Muyu Liu, Hansheng Wang

Abstract

In this paper, we propose a dictionary screening method for embedding compression in text classification tasks. The key purpose of this method is to evaluate the importance of each keyword in the dictionary. To this end, we first train a pre-specified recurrent neural network-based model using a full dictionary. This leads to a benchmark model, which we then use to obtain the predicted class probabilities for each sample in a dataset. Next, to evaluate the impact of each keyword in affecting the predicted class probabilities, we develop a novel method for assessing the importance of each keyword in a dictionary. Consequently, each keyword can be screened, and only the most important keywords are reserved. With these screened keywords, a new dictionary with a considerably reduced size can be constructed. Accordingly, the original text sequence can be substantially compressed. The proposed method leads to significant reductions in terms of parameters, average text sequence, and dictionary size. Meanwhile, the prediction power remains very competitive compared to the benchmark model. Extensive numerical studies are presented to demonstrate the empirical performance of the proposed method.
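The screening pipeline the abstract describes (train a benchmark model, perturb each keyword, measure the shift in predicted class probabilities, and reserve the top keywords) can be sketched as below. This is a minimal illustration, not the paper's exact criterion: the importance score here is a hypothetical proxy (mean absolute change in predicted probabilities when a keyword's occurrences are dropped), and `toy_proba` stands in for the trained benchmark model.

```python
def keyword_importance(predict_proba, sequences, vocab_size):
    """Score each keyword by how much dropping it shifts the benchmark
    model's predicted class probabilities (a proxy importance measure)."""
    base = [predict_proba(s) for s in sequences]
    scores = []
    for k in range(vocab_size):
        total = 0.0
        for s, b in zip(sequences, base):
            # Re-predict with every occurrence of keyword k removed.
            p = predict_proba([t for t in s if t != k])
            total += sum(abs(pi - bi) for pi, bi in zip(p, b))
        scores.append(total / len(sequences))
    return scores

def screen_dictionary(scores, keep):
    # Reserve the `keep` most important keywords and re-index them,
    # yielding a considerably smaller dictionary.
    kept = sorted(range(len(scores)), key=lambda k: -scores[k])[:keep]
    return {k: i for i, k in enumerate(sorted(kept))}

def compress(sequence, remap):
    # Keep only screened keywords, so text sequences shrink as well.
    return [remap[t] for t in sequence if t in remap]

# Toy benchmark "model": probability of class 1 grows with the count of token 0.
def toy_proba(seq):
    p = min(1.0, 0.2 * sum(1 for t in seq if t == 0))
    return [1.0 - p, p]

sequences = [[0, 1, 2], [0, 0, 3], [1, 2, 3]]
scores = keyword_importance(toy_proba, sequences, vocab_size=4)
remap = screen_dictionary(scores, keep=2)
```

In this toy setting, token 0 drives the predictions, so it receives the highest score and survives screening, while unscreened tokens are simply dropped from the compressed sequences (one could instead map them to a shared out-of-vocabulary id).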
