Paper Title

Scalable Statistical Root Cause Analysis on App Telemetry

Authors

Vijayaraghavan Murali, Edward Yao, Umang Mathur, Satish Chandra

Abstract

Despite engineering workflows that aim to prevent buggy code from being deployed, bugs still make their way into the Facebook app. When symptoms of these bugs, such as user-submitted reports and automatically captured crashes, are reported, finding their root causes is an important step in resolving them. However, at Facebook's scale of billions of users, a single bug can manifest as several different symptoms, depending on the user and execution environments in which the software is deployed. Root cause analysis (RCA) therefore requires tedious manual investigation and domain expertise to extract the common patterns observed in groups of reports and use them for debugging. We propose Minesweeper, a technique for RCA that moves towards automatically identifying the root cause of bugs from their symptoms. The method is based on two key aspects: (i) a scalable algorithm to efficiently mine patterns from the telemetric information that is collected along with the reports, and (ii) statistical notions of precision and recall of patterns that help point towards root causes. We evaluate Minesweeper's scalability and effectiveness in finding root causes from symptoms on real-world bug and crash reports from Facebook's apps. Our evaluation demonstrates that Minesweeper can perform RCA for tens of thousands of reports in less than 3 minutes, and is more than 85% accurate in identifying the root cause of regressions.
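
To make the idea of precision and recall of a pattern concrete, here is a minimal Python sketch. It is not the paper's implementation, and the abstract does not give the exact definitions, so everything below is an assumption for illustration: a report is modelled as a set of telemetry events, reports are split into a hypothetical "test" group (showing the symptom) and a "control" group (not showing it), precision is taken as the share of matching reports that belong to the test group, and recall as the share of test reports that match. The brute-force enumeration and the F1-based ranking in `rank_patterns` are stand-ins chosen for brevity, not the scalable mining algorithm the abstract refers to.

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple

# Assumption: a report is modelled as a set of telemetry event strings.
Report = FrozenSet[str]
Pattern = FrozenSet[str]


def matches(pattern: Pattern, report: Report) -> bool:
    """A pattern matches a report if all of its events occur in the report."""
    return pattern <= report


def precision_recall(
    pattern: Pattern, test: List[Report], control: List[Report]
) -> Tuple[float, float]:
    """Return (precision, recall) of a pattern under the assumed definitions.

    precision = matching test reports / all matching reports
    recall    = matching test reports / all test reports
    """
    test_hits = sum(matches(pattern, r) for r in test)
    control_hits = sum(matches(pattern, r) for r in control)
    total_hits = test_hits + control_hits
    precision = test_hits / total_hits if total_hits else 0.0
    recall = test_hits / len(test) if test else 0.0
    return precision, recall


def rank_patterns(
    test: List[Report], control: List[Report], max_size: int = 2
) -> List[Tuple[Pattern, float, float]]:
    """Brute-force enumeration of small patterns, ranked by F1 (illustrative only)."""
    events = sorted(set().union(*test)) if test else []
    scored = []
    for size in range(1, max_size + 1):
        for combo in combinations(events, size):
            pattern = frozenset(combo)
            p, r = precision_recall(pattern, test, control)
            if r > 0:  # r > 0 implies p > 0, so F1 below is well defined
                scored.append((pattern, p, r))
    # Highest F1 first; ties broken by smaller pattern, then alphabetically.
    scored.sort(key=lambda t: (-(2 * t[1] * t[2] / (t[1] + t[2])), len(t[0]), sorted(t[0])))
    return scored
```

Under these assumptions, a pattern with high precision rarely appears outside the symptomatic group, and one with high recall covers most of that group; a pattern scoring well on both is a plausible pointer towards the root cause.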
