Blocked Clusterwise Regression

Title

Blocked Clusterwise Regression

Author

Cytrynbaum, Max

Abstract

A recent literature in econometrics models unobserved cross-sectional heterogeneity in panel data by assigning each cross-sectional unit a one-dimensional, discrete latent type. Such models have been shown to allow estimation and inference by regression clustering methods. This paper is motivated by the finding that the clustered heterogeneity models studied in this literature can be badly misspecified, even when the panel has significant discrete cross-sectional structure. To address this issue, we generalize previous approaches to discrete unobserved heterogeneity by allowing each unit to have multiple, imperfectly correlated latent variables that describe its response type to different covariates. We give inference results for a k-means style estimator of our model and develop information criteria to jointly select the number of clusters for each latent variable. Monte Carlo simulations confirm our theoretical results and give intuition about the finite-sample performance of estimation and model selection. We also contribute to the theory of clustering with an over-specified number of clusters and derive new convergence rates for this setting. Our results suggest that over-fitting can be severe in k-means style estimators when the number of clusters is over-specified.
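
To make the idea of "multiple latent variables, one per covariate" concrete, here is a minimal sketch of a k-means style estimator for a blocked model. It is not the paper's estimator, whose exact form is not given in the abstract. It rests on our own assumptions: a balanced panel y[i, t] = b1[g1_i] * x1[i, t] + b2[g2_i] * x2[i, t] + noise with two scalar covariates and no intercept or fixed effects, where each unit has a separate latent type g1_i in {0..K1-1} and g2_i in {0..K2-1}. The fit uses a Lloyd-style alternation (pooled OLS for the slopes given labels, then unit-by-unit reassignment over all K1*K2 type pairs given slopes) with random restarts. The function names blocked_kmeans, _fit_slopes and _unit_ssr are hypothetical.

```python
import numpy as np


def _fit_slopes(y, x1, x2, g1, g2, K1, K2):
    """Given labels the model is linear in the K1 + K2 slopes: pooled OLS on label-interacted regressors."""
    N, T = y.shape
    rows = np.arange(N * T)
    X = np.zeros((N * T, K1 + K2))
    # Row i*T + t lines up with y.ravel() because ravel is row-major.
    X[rows, np.repeat(g1, T)] = x1.ravel()
    X[rows, K1 + np.repeat(g2, T)] = x2.ravel()
    # lstsq returns the minimum-norm solution, so an empty cluster gets slope 0.
    coef, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
    return coef[:K1], coef[K1:]


def _unit_ssr(y, x1, x2, b1, b2):
    """Sum of squared residuals of every unit under every (k1, k2) type pair; shape (N, K1, K2)."""
    pred = (b1[None, :, None, None] * x1[:, None, None, :]
            + b2[None, None, :, None] * x2[:, None, None, :])
    return ((y[:, None, None, :] - pred) ** 2).sum(axis=-1)


def blocked_kmeans(y, x1, x2, K1, K2, n_starts=10, n_iter=100, seed=0):
    """Hypothetical Lloyd-style estimator with random restarts; keeps the start with the lowest total SSR."""
    rng = np.random.default_rng(seed)
    N = y.shape[0]
    best = None
    for start in range(n_starts):
        g1 = rng.integers(K1, size=N)
        g2 = rng.integers(K2, size=N)
        for it in range(n_iter):
            b1, b2 = _fit_slopes(y, x1, x2, g1, g2, K1, K2)
            # Joint reassignment: each unit picks the (k1, k2) pair that minimizes its own SSR.
            ssr = _unit_ssr(y, x1, x2, b1, b2).reshape(N, -1)
            new_g1, new_g2 = np.unravel_index(ssr.argmin(axis=1), (K1, K2))
            if np.array_equal(new_g1, g1) and np.array_equal(new_g2, g2):
                break
            g1, g2 = new_g1, new_g2
        b1, b2 = _fit_slopes(y, x1, x2, g1, g2, K1, K2)  # slopes consistent with final labels
        total = _unit_ssr(y, x1, x2, b1, b2)[np.arange(N), g1, g2].sum()
        if best is None or total < best[0]:
            best = (total, g1, g2, b1, b2)
    return best[1:]  # (g1, g2, b1, b2)
```

Labels are identified only up to permutation within each block, so comparisons with true types should be made up to relabeling. Restarts are used because the alternation only reaches a local minimum of the total SSR. Also note that the globally minimized SSR cannot increase when K1 or K2 is raised, so in-sample fit alone cannot choose the number of clusters; this is one reason a penalized information criterion is needed.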
