Paper Title
How Does That Sound? Multi-Language SpokenName2Vec Algorithm Using Speech Generation and Deep Learning
Authors
Abstract
Searching for information about a specific person is an online activity frequently performed by many users. In most cases, users submit a query containing a name to a web search engine to find what they are looking for. Typically, web search engines return only a few accurate results for a name-containing query. Currently, most solutions for suggesting name synonyms in online search are based on pattern matching and phonetic encoding; however, their performance is very often less than optimal. In this paper, we propose SpokenName2Vec, a novel and generic approach that addresses the similar name suggestion problem by using automated speech generation and deep learning to produce spoken name embeddings. These embeddings capture the way people pronounce names in any language and accent. Utilizing name pronunciation can help both to detect names that sound alike but are written differently and to differentiate between them. The proposed approach was demonstrated on a large-scale dataset of 250,000 forenames and evaluated using a machine learning classifier and 7,399 names with their verified synonyms. It outperformed the 10 other algorithms evaluated in this study, including widely used phonetic and string similarity algorithms and two recently proposed algorithms. The results suggest that the proposed approach could serve as a useful and valuable tool for solving the similar name suggestion problem.
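The abstract frames similar name suggestion as comparing spoken name embeddings. The following is a minimal sketch of only that final retrieval step, under stated assumptions: every name already has a fixed-length embedding vector (in SpokenName2Vec these come from generated speech encoded by a deep model, which is not reproduced here), candidates are ranked by cosine similarity (an assumed choice, not stated in the abstract), and the vectors in `toy_embeddings` are made-up illustrative values rather than real model outputs. The helper names `cosine` and `suggest_similar_names` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def suggest_similar_names(query, embeddings, k=3):
    """Return the k names whose embeddings are most similar to the query's.

    `embeddings` maps name -> vector. The vectors are assumed to be
    precomputed spoken-name embeddings (hypothetical here).
    """
    q = embeddings[query]
    # Score every other name against the query.
    scored = [(cosine(q, vec), name) for name, vec in embeddings.items() if name != query]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


# Hypothetical toy vectors: names that sound alike are placed close together.
toy_embeddings = {
    "Catherine": [0.90, 0.10, 0.00],
    "Katherine": [0.88, 0.12, 0.01],
    "Kathryn": [0.80, 0.20, 0.05],
    "John": [0.00, 0.10, 0.95],
    "Jon": [0.02, 0.12, 0.93],
}

if __name__ == "__main__":
    print(suggest_similar_names("Catherine", toy_embeddings, k=2))
```

With these toy vectors, `suggest_similar_names("Catherine", toy_embeddings, k=2)` returns `["Katherine", "Kathryn"]`. The choice to keep the retrieval step separate from embedding generation means any embedding model could be swapped in without changing the ranking logic.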