Paper title
Words ranking and Hirsch index for identifying the core of the hapaxes in political texts
Paper authors
Paper abstract
This paper presents a quantitative analysis of the content of official political speeches. We study a set of about one thousand speeches delivered by US Presidents, ranging from Washington to Trump. In particular, we investigate the relevance of rare words, i.e. those said only once in each speech, the so-called hapaxes. We implement a rank-size procedure of Zipf-Mandelbrot type to discuss the regularity of the hapaxes' frequencies over the whole set of speeches. Starting from the obtained rank-size law, we define and detect the core of the set of hapaxes by means of a procedure based on a variant of the Hirsch index. We discuss the resulting list of words in the light of the overall set of US Presidents' speeches. We further show that this core of hapaxes can itself be well fitted by a Zipf-Mandelbrot law, and that it contains elements producing deviations at the low ranks between the scatter plot and the fitted curve, the so-called king and vice-roy effect. Some socio-political insights into the US Presidents' messages are derived from these findings.
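The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions: tokenization is a naive lowercase whitespace split, the "size" of a word is taken to be the number of speeches in which it is a hapax, and the core is cut with the standard Hirsch criterion (the largest rank h whose word has size at least h), since the abstract does not specify the variant actually used. The three-parameter Zipf-Mandelbrot form and all function names are likewise illustrative.

```python
from collections import Counter


def hapaxes(speech: str) -> set[str]:
    """Words occurring exactly once in one speech (naive whitespace tokens)."""
    counts = Counter(speech.lower().split())
    return {word for word, n in counts.items() if n == 1}


def rank_size(speeches: list[str]) -> list[tuple[str, int]]:
    """Rank words by the number of speeches in which they are hapaxes.

    Assumption: this count is the 'size' used in the rank-size analysis.
    Ties are broken alphabetically so the ranking is deterministic.
    """
    totals: Counter = Counter()
    for speech in speeches:
        totals.update(hapaxes(speech))
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


def h_core(ranked: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Standard Hirsch cut (assumed, not the paper's variant):
    keep the first h words, where h is the largest rank with size >= rank."""
    h = 0
    for rank, (_, size) in enumerate(ranked, start=1):
        if size < rank:
            break
        h = rank
    return ranked[:h]


def zipf_mandelbrot(rank: float, a: float, b: float, c: float) -> float:
    """Zipf-Mandelbrot law in the common form size = c / (rank + b) ** a."""
    return c / (rank + b) ** a


# Toy corpus, purely illustrative.
speeches = [
    "liberty and union",
    "liberty and justice and peace",
    "union liberty peace",
]
core = h_core(rank_size(speeches))
```

On real data, the parameters `a`, `b` and `c` would be estimated by a nonlinear least-squares fit of `zipf_mandelbrot` to the ranked sizes; that step is omitted here to keep the sketch dependency-free.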