Paper title
A Text-Embedding-based Approach to Measure Patent-to-Patent Technological Similarity -- Workflow, Code, and Applications
Authors
Abstract
This paper describes an efficiently scalable approach to measuring technological similarity between patents by combining embedding techniques from natural language processing with nearest-neighbor approximation. Using this methodology, we are able to compute existing similarities between all patents, which in turn enables us to represent the whole patent universe as a technological network. We validate both the technological signature and the similarity measure in various ways and demonstrate, using the case of electric vehicle technologies, their usefulness for measuring knowledge flows, mapping technological change, and creating patent quality indicators. The paper thereby contributes to the growing literature on text-based indicators for patent analysis. We provide thorough documentation of the method, including all code, indicators, and intermediate outputs, at https://github.com/daniel-hain/patent_embedding_research.
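The core idea in the abstract (embed each patent as a vector, find its most similar patents, and treat the result as a network) can be illustrated with a minimal sketch. It is not the authors' code: the toy embeddings, the function names `top_k_similar` and `to_edge_list`, and the choice of `k` are all assumptions made for illustration. It also uses exact brute-force cosine search, which is only feasible for small collections; at the scale of the full patent universe, an approximate nearest-neighbor index would take the place of this step.

```python
import numpy as np


def top_k_similar(embeddings, k):
    """Return (indices, similarities) of the k most cosine-similar rows for each row.

    Exact brute-force search, used here as a stand-in for an
    approximate nearest-neighbor index.
    """
    # Normalize rows so that a dot product equals cosine similarity.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / norms
    sims = unit @ unit.T
    # Exclude self-matches by pushing the diagonal to -inf.
    np.fill_diagonal(sims, -np.inf)
    # Sort each row by descending similarity and keep the first k columns.
    idx = np.argsort(-sims, axis=1)[:, :k]
    return idx, np.take_along_axis(sims, idx, axis=1)


def to_edge_list(idx, sims):
    """Flatten neighbor lists into (source, target, weight) edges of a similarity network."""
    return [
        (i, int(j), float(s))
        for i in range(idx.shape[0])
        for j, s in zip(idx[i], sims[i])
    ]


# Hypothetical data: four "patents" with 3-dimensional embeddings.
# Real text embeddings would have far more dimensions.
toy_embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.1],
])

neighbors, scores = top_k_similar(toy_embeddings, k=1)
edges = to_edge_list(neighbors, scores)
for source, target, weight in edges:
    print(f"patent {source} -> patent {target}: similarity {weight:.3f}")
```

Each edge links a patent to its nearest neighbor in embedding space, weighted by cosine similarity; the resulting edge list is the kind of input a network analysis of technological relatedness would start from.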