Paper Title

Improving Segmentation for Technical Support Problems

Authors

Kushal Chauhan, Abhirut Gupta

Abstract

Technical support problems are often long and complex. They typically contain user descriptions of the problem, the setup, and steps for attempted resolution. Often they also contain various non-natural language text elements like outputs of commands, snippets of code, error messages or stack traces. These elements contain potentially crucial information for problem resolution. However, they cannot be correctly parsed by tools designed for natural language. In this paper, we address the problem of segmentation for technical support questions. We formulate the problem as a sequence labelling task, and study the performance of state of the art approaches. We compare this against an intuitive contextual sentence-level classification baseline, and a state of the art supervised text-segmentation approach. We also introduce a novel component of combining contextual embeddings from multiple language models pre-trained on different data sources, which achieves a marked improvement over using embeddings from a single pre-trained language model. Finally, we also demonstrate the usefulness of such segmentation with improvements on the downstream task of answer retrieval.
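As an illustration of the sequence-labelling formulation and of combining embeddings from several language models, here is a minimal Python sketch. It rests on assumptions the abstract does not confirm: the BIO tagging scheme, the segment labels `COMMAND` and `ERROR`, the helper names `segments_to_bio` and `concat_embeddings`, and plain per-token concatenation as the way to combine embeddings are all hypothetical choices made for illustration, not the authors' method.

```python
from typing import List, Tuple

# A segment is (start_token, end_token_exclusive, label). Labels are hypothetical.
Segment = Tuple[int, int, str]


def segments_to_bio(num_tokens: int, segments: List[Segment]) -> List[str]:
    """Turn non-overlapping token spans into one BIO tag per token."""
    tags = ["O"] * num_tokens
    for start, end, label in segments:
        if not (0 <= start < end <= num_tokens):
            raise ValueError(f"invalid span ({start}, {end})")
        tags[start] = f"B-{label}"          # first token of the segment
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the segment
    return tags


def concat_embeddings(per_model: List[List[List[float]]]) -> List[List[float]]:
    """per_model[m][t] is the vector for token t from model m.

    Returns one vector per token, built by concatenating the vectors
    from every model (an assumed combination strategy).
    """
    lengths = {len(model) for model in per_model}
    if len(lengths) != 1:
        raise ValueError("all models must embed the same number of tokens")
    num_tokens = lengths.pop()
    return [sum((model[t] for model in per_model), []) for t in range(num_tokens)]


tokens = ["I", "ran", "ls", "-la", "and", "got", "Permission", "denied"]
segments = [(2, 4, "COMMAND"), (6, 8, "ERROR")]
tags = segments_to_bio(len(tokens), segments)

# Toy vectors standing in for two language models pre-trained on different data.
model_a = [[0.1, 0.2] for _ in tokens]
model_b = [[0.3] for _ in tokens]
combined = concat_embeddings([model_a, model_b])
```

In this sketch each token receives exactly one tag, so a tagger trained on `(tokens, tags)` pairs recovers segment boundaries from the `B-`/`I-` transitions, and each combined vector has a width equal to the sum of the per-model widths.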
