Paper Title
An accurate IoT Intrusion Detection Framework using Apache Spark
Authors
Abstract
The internet has caused tremendous changes since its appearance in the 1980s, and the Internet of Things (IoT) now seems to be doing the same. The potential of IoT has made it a center of attention, but where some see an opportunity to contribute, others may see IoT networks as a target to be exploited. The sheer number of IoT devices makes them an ideal platform for staging denial-of-service (DoS) attacks that can have devastating consequences, which makes the need for cybersecurity measures such as intrusion detection systems (IDSs) evident. The aim of this paper is to build an IDS using the big data platform Apache Spark, together with its ML library (MLlib) and the BoT-IoT dataset. The IDS was tested and evaluated using the F-measure (F1), as is standard when evaluating imbalanced data. Two rounds of tests were performed: one on a partial dataset to minimize bias, and one on the full BoT-IoT dataset to explore big data and ML capabilities in a security setting. For the partial dataset, the Random Forest algorithm had the highest performance for binary classification, with an average F1 measure of 99.7%, and also reached 99.6% for main category classification and 88.5% for sub-category classification. For the complete dataset, the Decision Tree algorithm scored the highest F1 measures in all conducted tests: 97.9% for binary classification, 79% for main category classification, and 77% for sub-category classification.
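To illustrate why the F-measure is preferred over accuracy on imbalanced data, here is a minimal sketch in plain Python. It is not the paper's Spark MLlib pipeline. The helper names (`f1_per_class`, `macro_f1`), the toy labels, and the use of an unweighted (macro) average are assumptions made for illustration, since the abstract does not say how its "average" F1 is computed.

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Return {label: F1} for paired sequences of true and predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label p, but it was wrong
            fn[t] += 1  # true label t was missed
    scores = {}
    for label in set(y_true) | set(y_pred):
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (an assumed averaging scheme)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Toy labels (hypothetical, not taken from BoT-IoT): 95 attack records, 5 normal.
y_true = ["attack"] * 95 + ["normal"] * 5
# A degenerate classifier that labels every record as an attack.
y_pred = ["attack"] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
scores = f1_per_class(y_true, y_pred)

print(f"accuracy = {accuracy:.3f}")
print(f"per-class F1 = {scores}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

In this toy case the classifier never identifies a normal record, yet its accuracy still looks high. The per-class F1 for the minority class is zero, which pulls the macro average down and exposes the failure.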