Machine Learning Models Evaluation and Feature Importance Analysis on NPL Dataset

Paper title

Machine Learning Models Evaluation and Feature Importance Analysis on NPL Dataset

Authors

Rufael Fekadu, Anteneh Getachew, Yishak Tadele, Nuredin Ali, Israel Goytom

Abstract

Predicting the probability that an individual's loan becomes non-performing helps banks reduce credit risk and make sound decisions before granting a loan. Such decisions are currently based on credit studies carried out in accordance with generally accepted standards, on loan payment history, and on clients' demographic data. In this work, we evaluate how different machine learning models, namely Random Forest, Decision Tree, KNN, SVM, and XGBoost, perform on a dataset provided by a private bank in Ethiopia. Motivated by this evaluation, we further explore different feature selection methods to identify the features that matter most to the bank. Our findings show that XGBoost achieves the highest F1 score on the KMeans SMOTE over-sampled data. We also find that, in evaluating credit risk, the most important features are the applicant's age, years of employment, and total income rather than collateral-related features.
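The abstract reports results as F1 scores. The sketch below is a minimal illustration of why F1 is often preferred to accuracy when non-performing loans are a small minority of the data. It is not the authors' code: the toy labels, the 10% NPL rate, and the two hypothetical predictors are assumptions made purely for illustration.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Assumed toy data: 1 = non-performing loan, 0 = performing loan.
y_true = [1] * 10 + [0] * 90

# Hypothetical baseline that labels every loan as performing.
always_performing = [0] * 100

# Hypothetical model: finds 6 of the 10 NPLs and raises 4 false alarms.
model_pred = [1] * 6 + [0] * 4 + [1] * 4 + [0] * 86

print("baseline accuracy:", accuracy(y_true, always_performing))
print("baseline F1:      ", f1_score(y_true, always_performing))
print("model accuracy:   ", accuracy(y_true, model_pred))
print("model F1:         ", f1_score(y_true, model_pred))
```

On this toy data the two predictors are nearly indistinguishable by accuracy, while F1 separates them clearly because the baseline never identifies a non-performing loan.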
