TY - JOUR
T1 - A Deterministic Comparison of Classical Machine Learning and Hybrid Deep Representation Models for Intrusion Detection on NSL-KDD and CICIDS2017
AU - Arcos-Argudo, Miguel
AU - Bojorque, Rodolfo
AU - Torres, Andrés
N1 - Publisher Copyright:
© 2025 by the authors.
PY - 2025/12
Y1 - 2025/12
N2 - Intrusion detection systems (IDSs) must balance detection quality with operational transparency. We present a deterministic, leakage-free comparison of three classical classifiers: Naïve Bayes (NB), Logistic Regression (LR), and Linear Discriminant Analysis (LDA). We also propose a hybrid pipeline that trains LR on Autoencoder embeddings (AE). Experiments use NSL-KDD and CICIDS2017 under two regimes (with/without SMOTE (Synthetic Minority Oversampling Technique) applied only on training data). All preprocessing (one-hot encoding, scaling, and imputation) is fitted on the training split; fixed seeds and deterministic TensorFlow settings ensure exact reproducibility. We report a complete metric set—Accuracy, Precision, Recall, F1, Area Under the Curve (AUC), and False Alarm Rate (FAR)—and release a replication package (code, preprocessing artifacts, and saved prediction scores) to regenerate all reported tables and metrics. On NSL-KDD, AE+LR yields the highest AUC (≈0.904) and the strongest F1 among the evaluated models (e.g., 0.7583 with SMOTE), while LDA slightly edges LR on Accuracy/F1. NB attains very high Precision (≈0.98) but low Recall (≈0.24), resulting in the weakest F1, yet a low FAR due to conservative decisions. On CICIDS2017, LR delivers the best Accuracy/F1 (0.9878/0.9752 without SMOTE), with AE+LR close behind; both approach ceiling AUC (≈0.996). SMOTE provides modest gains on NSL-KDD and limited benefits on CICIDS2017. Overall, LR/LDA remain strong, interpretable baselines, while AE+LR improves separability (AUC) without sacrificing a simple, auditable decision layer for practical IDS deployment.
AB - Intrusion detection systems (IDSs) must balance detection quality with operational transparency. We present a deterministic, leakage-free comparison of three classical classifiers: Naïve Bayes (NB), Logistic Regression (LR), and Linear Discriminant Analysis (LDA). We also propose a hybrid pipeline that trains LR on Autoencoder embeddings (AE). Experiments use NSL-KDD and CICIDS2017 under two regimes (with/without SMOTE (Synthetic Minority Oversampling Technique) applied only on training data). All preprocessing (one-hot encoding, scaling, and imputation) is fitted on the training split; fixed seeds and deterministic TensorFlow settings ensure exact reproducibility. We report a complete metric set—Accuracy, Precision, Recall, F1, Area Under the Curve (AUC), and False Alarm Rate (FAR)—and release a replication package (code, preprocessing artifacts, and saved prediction scores) to regenerate all reported tables and metrics. On NSL-KDD, AE+LR yields the highest AUC (≈0.904) and the strongest F1 among the evaluated models (e.g., 0.7583 with SMOTE), while LDA slightly edges LR on Accuracy/F1. NB attains very high Precision (≈0.98) but low Recall (≈0.24), resulting in the weakest F1, yet a low FAR due to conservative decisions. On CICIDS2017, LR delivers the best Accuracy/F1 (0.9878/0.9752 without SMOTE), with AE+LR close behind; both approach ceiling AUC (≈0.996). SMOTE provides modest gains on NSL-KDD and limited benefits on CICIDS2017. Overall, LR/LDA remain strong, interpretable baselines, while AE+LR improves separability (AUC) without sacrificing a simple, auditable decision layer for practical IDS deployment.
KW - AUC
KW - autoencoder
KW - CICIDS2017
KW - false alarm rate
KW - intrusion detection system (IDS)
KW - Linear Discriminant Analysis
KW - Logistic Regression
KW - Naïve Bayes
KW - NSL-KDD
KW - SMOTE
UR - https://www.scopus.com/pages/publications/105026061338
U2 - 10.3390/a18120749
DO - 10.3390/a18120749
M3 - Article
AN - SCOPUS:105026061338
SN - 1999-4893
VL - 18
JO - Algorithms
JF - Algorithms
IS - 12
M1 - 749
ER -