$MC^2RAM$: Markov Chain Monte Carlo Sampling in SRAM for Fast Bayesian Inference

Paper Title

$MC^2RAM$: Markov Chain Monte Carlo Sampling in SRAM for Fast Bayesian Inference

Authors

Priyesh Shukla, Ahish Shylendra, Theja Tulabandhula, Amit Ranjan Trivedi

Abstract

This work discusses the implementation of Markov Chain Monte Carlo (MCMC) sampling from an arbitrary Gaussian mixture model (GMM) within SRAM. We present a novel SRAM architecture that embeds random number generators (RNGs), digital-to-analog converters (DACs), and analog-to-digital converters (ADCs), so that SRAM arrays can be used for high-performance MCMC sampling based on the Metropolis-Hastings (MH) algorithm. Most of the expensive computations are performed within the SRAM and can be parallelized for high-speed sampling. Our iterative compute flow minimizes data movement during sampling. We characterize the power-performance trade-off of our design through simulation in 45 nm CMOS technology. For a two-dimensional, two-component GMM, the implementation consumes ~91 µW of power per sampling iteration and produces 500 samples in 2000 clock cycles on average at a 1 GHz clock frequency. Our study offers insights into how low-level hardware non-idealities can affect high-level sampling characteristics, and recommends ways to operate SRAM optimally within area/power constraints for high-performance sampling.
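For readers unfamiliar with the algorithm being accelerated, here is a plain software sketch of Metropolis-Hastings sampling from a two-dimensional, two-component GMM. It is not the paper's in-SRAM implementation. The abstract does not specify the proposal distribution, so this sketch assumes a symmetric Gaussian random-walk proposal. The GMM parameters (`WEIGHTS`, `MEANS`, `COVS`) and the function names are hypothetical and chosen only for illustration.

```python
import math
import random

# Hypothetical 2-D, two-component GMM (weights, means, full 2x2 covariances).
# These parameters are illustrative assumptions, not values from the paper.
WEIGHTS = [0.5, 0.5]
MEANS = [(-2.0, -2.0), (2.0, 2.0)]
COVS = [((1.0, 0.3), (0.3, 1.0)), ((0.8, -0.2), (-0.2, 0.8))]


def gaussian_pdf_2d(x, mean, cov):
    """Density of a 2-D Gaussian with a full 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form [dx dy] * inv(cov) * [dx dy]^T, using
    # inv(cov) = (1/det) * [[d, -b], [-c, a]].
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def gmm_density(x):
    """Target density: weighted sum of the component densities."""
    return sum(w * gaussian_pdf_2d(x, m, s) for w, m, s in zip(WEIGHTS, MEANS, COVS))


def metropolis_hastings(n_samples, step=1.0, x0=(0.0, 0.0), seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal.

    Because the proposal is symmetric, the MH acceptance ratio reduces to
    p(x') / p(x). Returns the chain and its acceptance rate.
    """
    rng = random.Random(seed)
    x = x0
    px = gmm_density(x)
    samples, accepted = [], 0
    for _ in range(n_samples):
        # Propose a move by perturbing each coordinate with Gaussian noise.
        proposal = (x[0] + rng.gauss(0.0, step), x[1] + rng.gauss(0.0, step))
        p_prop = gmm_density(proposal)
        # Accept with probability min(1, p(x') / p(x)); otherwise keep x.
        if px == 0.0 or rng.random() < p_prop / px:
            x, px = proposal, p_prop
            accepted += 1
        samples.append(x)
    return samples, accepted / n_samples


if __name__ == "__main__":
    samples, rate = metropolis_hastings(500, step=1.5)
    mean_x = sum(s[0] for s in samples) / len(samples)
    print(f"acceptance rate: {rate:.2f}, sample mean of x: {mean_x:.2f}")
```

Each loop iteration evaluates the mixture density at the proposed point and compares the density ratio with a uniform random number. Per the abstract, the hardware design moves most of this expensive computation into the SRAM array and uses the embedded RNGs, DACs, and ADCs to support it.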
