Paper title
Scanner data in inflation measurement: from raw data to price indices
Paper authors
Abstract
Scanner data offer new opportunities for calculating the CPI or HICP. They can be obtained from a wide variety of retailers (supermarkets, home electronics stores, Internet shops, etc.) and provide information at the barcode level. One advantage of scanner data is that they contain complete transaction information, i.e. prices and quantities for every item sold. Before scanner data can be used, they must be carefully processed. After the data are cleaned and product names unified, products should be carefully classified (e.g. into COICOP 5 or below), matched, filtered and aggregated. These procedures often require building new IT tools or writing custom scripts (in R, Python, Mathematica, SAS or other environments). One of the new challenges connected with scanner data is the appropriate choice of index formula. In this article we present a proposal for implementing the individual stages of scanner data handling. We also point out potential problems that arise during scanner data processing and their solutions. Finally, we compare a large number of price index methods on real scanner datasets and verify their sensitivity to the data filtering and aggregation methods adopted.
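To make the aggregation, matching and index-formula stages named in the abstract concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation. The transaction records, the field layout (month, barcode, turnover, quantity) and the function names are hypothetical. The Fisher formula is used only as one example of the many index formulas such a comparison might cover.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical transaction records: (month, barcode, turnover, quantity).
transactions = [
    ("2023-01", "A", 20.0, 10),
    ("2023-01", "A", 10.0, 5),
    ("2023-01", "B", 30.0, 10),
    ("2023-01", "C", 5.0, 1),   # sold only in the base month
    ("2023-02", "A", 33.0, 15),
    ("2023-02", "B", 24.0, 8),
    ("2023-02", "D", 7.0, 2),   # sold only in the current month
]


def aggregate(records):
    """Sum turnover and quantity for each (month, barcode) pair."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for month, barcode, turnover, quantity in records:
        totals[(month, barcode)][0] += turnover
        totals[(month, barcode)][1] += quantity
    return totals


def unit_values(totals, month):
    """Return {barcode: (unit value price, quantity)} for one month."""
    return {
        barcode: (turnover / quantity, quantity)
        for (m, barcode), (turnover, quantity) in totals.items()
        if m == month and quantity > 0
    }


def fisher_index(records, base, current):
    """Bilateral Fisher index over barcodes sold in both months."""
    totals = aggregate(records)
    p0 = unit_values(totals, base)
    p1 = unit_values(totals, current)
    matched = sorted(p0.keys() & p1.keys())  # matching step
    if not matched:
        raise ValueError("no barcodes are sold in both months")
    # Laspeyres uses base-month quantities, Paasche current-month quantities.
    laspeyres = (sum(p1[b][0] * p0[b][1] for b in matched)
                 / sum(p0[b][0] * p0[b][1] for b in matched))
    paasche = (sum(p1[b][0] * p1[b][1] for b in matched)
               / sum(p0[b][0] * p1[b][1] for b in matched))
    return sqrt(laspeyres * paasche)


index = fisher_index(transactions, "2023-01", "2023-02")
print(f"Fisher index, 2023-02 vs 2023-01: {index:.4f}")
```

In this sketch barcodes C and D drop out because each is sold in only one of the two months. That is the simplest possible matching rule. Filters of the kind the abstract mentions, such as thresholds on extreme price changes or low sales, would be applied before the index is computed.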