
A Deterministic Comparison of Classical Machine Learning and Hybrid Deep Representation Models for Intrusion Detection on NSL-KDD and CICIDS2017

Research output: Contribution to journal › Article › peer-review

Abstract

Intrusion detection systems (IDSs) must balance detection quality with operational transparency. We present a deterministic, leakage-free comparison of three classical classifiers: Naïve Bayes (NB), Logistic Regression (LR), and Linear Discriminant Analysis (LDA). We also propose a hybrid pipeline (AE+LR) that trains LR on autoencoder (AE) embeddings. Experiments use NSL-KDD and CICIDS2017 under two regimes: with and without the Synthetic Minority Oversampling Technique (SMOTE), which is applied only to the training data. All preprocessing (one-hot encoding, scaling, and imputation) is fitted on the training split; fixed seeds and deterministic TensorFlow settings ensure exact reproducibility. We report a complete metric set: Accuracy, Precision, Recall, F1, Area Under the Curve (AUC), and False Alarm Rate (FAR). We also release a replication package (code, preprocessing artifacts, and saved prediction scores) to regenerate all reported tables and metrics. On NSL-KDD, AE+LR yields the highest AUC (≈0.904) and the strongest F1 among the evaluated models (e.g., 0.7583 with SMOTE), while LDA slightly edges LR on Accuracy/F1. NB attains very high Precision (≈0.98) but low Recall (≈0.24), which gives it the weakest F1 but, because its decisions are conservative, a low FAR. On CICIDS2017, LR delivers the best Accuracy/F1 (0.9878/0.9752 without SMOTE), with AE+LR close behind; both approach ceiling AUC (≈0.996). SMOTE provides modest gains on NSL-KDD and limited benefits on CICIDS2017. Overall, LR/LDA remain strong, interpretable baselines, while AE+LR improves separability (AUC) without sacrificing a simple, auditable decision layer for practical IDS deployment.
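The metric set named in the abstract can be illustrated with a minimal sketch. This is not the authors' replication code: the function names `binary_metrics` and `pairwise_auc` are hypothetical, the labels are assumed to be binary with 1 = attack and 0 = benign, FAR is assumed to follow the common definition FP / (FP + TN), and AUC is assumed to be the area under the ROC curve, computed here by brute-force pair counting.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, Precision, Recall, F1 and FAR from binary labels.

    Assumes 1 = attack (positive class) and 0 = benign.
    FAR is taken as the false positive rate, FP / (FP + TN).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def safe_div(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
        "far": safe_div(fp, fp + tn),
    }


def pairwise_auc(y_true, scores):
    """ROC AUC as the fraction of (attack, benign) pairs ranked correctly.

    Ties count as half. Quadratic in the number of samples, so this is
    only suitable for small illustrative inputs.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# A deliberately conservative toy detector: it flags only one of four attacks.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
```

On this toy input the detector raises no false alarms, so Precision is high and FAR is zero, but Recall is low and F1 suffers with it. That is the same pattern the abstract describes for NB: plugging the approximate reported values (Precision ≈0.98, Recall ≈0.24) into the F1 formula gives roughly 0.39, a back-of-envelope figure rather than a number reported in the abstract.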

Original language: English
Article number: 749
Journal: Algorithms
Volume: 18
Issue number: 12
State: Published - Dec 2025

Bibliographical note

Publisher Copyright:
© 2025 by the authors.

Keywords

  • AUC
  • autoencoder
  • CICIDS2017
  • false alarm rate
  • intrusion detection system (IDS)
  • Linear Discriminant Analysis
  • Logistic Regression
  • Naïve Bayes
  • NSL-KDD
  • SMOTE
