Title
Single Unit Status in Deep Convolutional Neural Network Codes for Face Identification: Sparseness Redefined
Authors
Abstract
Deep convolutional neural networks (DCNNs) trained for face identification develop representations that generalize over variable images, while retaining subject (e.g., gender) and image (e.g., viewpoint) information. Identity, gender, and viewpoint codes were studied at the "neural unit" and ensemble levels of a face-identification network. At the unit level, identification, gender classification, and viewpoint estimation were measured by deleting units to create variably sized, randomly sampled subspaces at the top network layer. Identification of 3,531 identities remained high (area under the ROC curve approximately 1.0) as dimensionality decreased from 512 units to 16 (0.95), 4 (0.80), and 2 (0.72) units. Individual identities separated statistically on every top-layer unit. Cross-unit responses were minimally correlated, indicating that units code non-redundant identity cues. This "distributed" code requires only a sparse, random sample of units to identify faces accurately. Gender classification declined gradually, and viewpoint estimation fell steeply, as dimensionality decreased. Individual units were weakly predictive of gender and viewpoint, but ensembles proved effective predictors. Therefore, distributed and sparse codes co-exist in the network units to represent different face attributes. At the ensemble level, principal component analysis of face representations showed that identity, gender, and viewpoint information separated into high-dimensional subspaces, ordered by explained variance. Identity, gender, and viewpoint information contributed to all individual unit responses, undercutting a neural-tuning analogy for face attributes. Neural-like codes in DCNNs, and by analogy high-level visual codes, cannot be interpreted from single-unit responses. Instead, "meaning" is encoded by directions in the high-dimensional space.
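The unit-deletion experiment described above can be sketched in a few lines: sample a random k-unit subspace of top-layer descriptors, then score identification by the area under the ROC curve over same-identity versus different-identity similarity pairs. This is a minimal illustration on simulated Gaussian "descriptors", not the paper's network or data; the names (`subspace_auc`, `feats`), the noise level, and the cosine-similarity verification protocol are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for top-layer DCNN descriptors: each identity has a
# 512-d mean vector, and its images are noisy samples around that mean.
n_ids, dim, imgs_per_id = 50, 512, 4
means = rng.normal(size=(n_ids, dim))
feats = means[:, None, :] + 0.6 * rng.normal(size=(n_ids, imgs_per_id, dim))


def auc(pos, neg):
    """Area under the ROC curve via the Mann-Whitney rank statistic."""
    pos, neg = np.asarray(pos), np.asarray(neg)
    return (pos[:, None] > neg[None, :]).mean()


def subspace_auc(k):
    """Identification AUC using a random k-unit subspace of the top layer."""
    units = rng.choice(dim, size=k, replace=False)      # "delete" all other units
    f = feats[..., units]
    f = f / np.linalg.norm(f, axis=-1, keepdims=True)   # cosine similarity prep
    flat = f.reshape(-1, k)
    ids = np.repeat(np.arange(n_ids), imgs_per_id)
    sims = flat @ flat.T                                # all pairwise similarities
    iu = np.triu_indices(len(flat), k=1)                # each image pair once
    match = ids[iu[0]] == ids[iu[1]]                    # same-identity pairs
    return auc(sims[iu][match], sims[iu][~match])


# Identification stays well above chance (0.5) even in tiny random subspaces.
for k in (512, 16, 4, 2):
    print(k, round(subspace_auc(k), 2))
```

Even this toy version reproduces the qualitative pattern: near-perfect identification with all 512 units, and graceful degradation rather than collapse as the random subspace shrinks.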