Paper title
Sampling Techniques in Bayesian Target Encoding
Authors
Abstract
Target encoding is an effective technique for encoding categorical variables and is often used in machine learning systems that process tabular data sets with mixed numeric and categorical variables. Recently, an enhanced version of this encoding technique was proposed using conjugate Bayesian modeling. This paper presents a further development of the Bayesian encoding method using sampling techniques, which helps extract information from the intra-category distribution of the target variable, improves generalization, and reduces target leakage.
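As a rough illustration of the idea summarized above, the following sketch encodes a binary target with a conjugate Beta-Binomial model and replaces each row's category with one draw from that category's posterior, rather than with the posterior mean. This is only one possible reading of "sampling" in this context, not the authors' exact procedure. The function name `sample_beta_target_encoding` and the prior parameters `alpha0` and `beta0` are hypothetical choices made for this example.

```python
import numpy as np

def sample_beta_target_encoding(categories, targets, alpha0=1.0, beta0=1.0, rng=None):
    """Encode each row with one draw from its category's Beta posterior.

    Assumptions (not taken from the paper): the target is binary (0/1) and
    the prior is Beta(alpha0, beta0), shared by all categories.
    """
    rng = np.random.default_rng() if rng is None else rng
    categories = np.asarray(categories)
    targets = np.asarray(targets, dtype=float)

    # Map every row to the index of its category level.
    levels, inverse = np.unique(categories, return_inverse=True)

    # Per-category sufficient statistics: number of positives and row count.
    successes = np.bincount(inverse, weights=targets, minlength=len(levels))
    counts = np.bincount(inverse, minlength=len(levels))

    # Conjugate update: Beta(alpha0 + positives, beta0 + negatives).
    alpha = alpha0 + successes
    beta = beta0 + (counts - successes)

    # One independent posterior draw per row, so rows in the same category
    # receive different encoded values that reflect posterior uncertainty.
    return rng.beta(alpha[inverse], beta[inverse])

# Usage: small categories get noisier encodings than large ones.
cats = ["a", "a", "b", "b", "b", "c"]
y = [1, 0, 1, 1, 1, 0]
encoded = sample_beta_target_encoding(cats, y, rng=np.random.default_rng(0))
```

Because each training row receives a random draw instead of a deterministic statistic computed from its own label, the encoded feature carries less of a direct imprint of the target, which is one intuition for why sampling may reduce target leakage. At prediction time a deterministic summary such as the posterior mean could be used instead; that choice is also an assumption of this sketch.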