ENSEMBLE MACHINE LEARNING APPROACH FOR IDENTIFYING THREATS IN SECURITY OPERATIONS CENTER
Abstract
Description
Cyberattacks can be prevented by identifying threats before they cause damage, requiring
robust cybersecurity measures. However, recent years have seen an increase in cyber threats
and data breaches, often exploiting infrastructure weaknesses. These attacks lead to significant
financial losses and compromised personal information, necessitating proactive defence
strategies. Traditionally, detecting threats involves laborious log analysis, but machine learning
can automate this process in intrusion detection systems (IDS). This study aims to implement
a blended ensemble approach for cyberattack detection in security operations centers,
combining predictions from base classifiers such as Random Forest, XGBoost, Hidden Markov Model (HMM), and Long Short-Term Memory (LSTM).
Feature selection was performed by aggregating importance scores from these classifiers, with
selected features used to improve the model's performance. A web application interface was
developed using the Python Flask framework. The integration of trained models into the
application programming interface (API) facilitated model training and dependency
management. Testing and evaluation were performed on real production network
traffic flows, the test set of the CICIDS2017 Thursday-WorkingHours-Morning.pcap_ISCX.csv
dataset, and the generated real-time network traffic dataset.
Real web attacks were intentionally executed on the server where the API/Intrusion Detection
System was implemented, and these unlabelled attack network flows were accurately labelled
by the IDS. To implement the ensemble model, the file
"Thursday-WorkingHours-Morning-WebAttacks.pcap_ISCX.csv", extracted from the CICIDS2017 Thursday
morning dataset, was used to train the model. To enhance the diversity of network
traffic patterns and potential security incidents, real-time network traffic was generated using
SQLite, Zenmap/Nmap, ID2T, and Python. The generated real-time network traffic was also
used to train the model to detect unseen attacks. The proposed model performed well on the
balanced Thursday Morning Dataset. With precision, recall, and F1-score all at 0.99, the model
achieved an overall accuracy of 99% across the binary classification task, highlighting its
robustness and effectiveness in handling real-time malicious traffic. These findings validate
the model's ability to detect real-time network traffic patterns, particularly in the context of
potential security incidents. The proposed model also performed strongly on the
generated dataset, achieving a precision of 1.00 for the malicious class, meaning that no
legitimate flow was flagged as malicious. A recall of 1.00 showed that every actual
instance of malicious activity was detected. An F1-score of 1.00 for
legitimate traffic reflected equally high precision and recall on that class, ensuring reliable
classification across categories. Additionally, five-fold cross-validation yielded consistently
high accuracy, averaging approximately 0.999 across the folds. This
outcome confirms the model's robustness and generalizability across various data subsets,
highlighting its potential for reliable real-time threat detection and enhanced cybersecurity in
practical applications.
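
The feature-aggregation, blending, and scoring steps described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the study's implementation. The feature names, importance scores, and probabilities are hypothetical, each model's scores are assumed to be normalised before averaging, and simple probability averaging stands in for the blending step, whose exact form the abstract does not state.

```python
from statistics import mean


def aggregate_importances(per_model_scores):
    """Average each feature's importance across models.

    Each model's scores are first normalised to sum to 1 (an assumption),
    so that a model reporting larger raw values does not dominate.
    """
    features = sorted({f for scores in per_model_scores for f in scores})
    aggregated = {}
    for feature in features:
        normalised = []
        for scores in per_model_scores:
            total = sum(scores.values())
            # A feature that a model did not score counts as zero importance.
            normalised.append(scores.get(feature, 0.0) / total if total else 0.0)
        aggregated[feature] = mean(normalised)
    return aggregated


def select_top_k(aggregated, k):
    """Return the k features with the highest aggregated importance."""
    return sorted(aggregated, key=aggregated.get, reverse=True)[:k]


def blend(prob_by_model, threshold=0.5):
    """Average per-flow malicious probabilities across models, then threshold."""
    blended = [mean(probs) for probs in zip(*prob_by_model)]
    return [1 if p >= threshold else 0 for p in blended]


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical importance scores from two base classifiers.
rf_scores = {"flow_duration": 0.5, "fwd_packets": 0.3, "bwd_packets": 0.2}
xgb_scores = {"flow_duration": 2.0, "fwd_packets": 6.0, "bwd_packets": 2.0}
aggregated = aggregate_importances([rf_scores, xgb_scores])
selected = select_top_k(aggregated, k=2)

# Hypothetical malicious-class probabilities for four flows (1 = malicious).
model_a = [0.9, 0.2, 0.6, 0.1]
model_b = [0.8, 0.4, 0.3, 0.2]
y_true = [1, 0, 1, 0]
y_pred = blend([model_a, model_b])
precision, recall, f1 = precision_recall_f1(y_true, y_pred)

print(selected)
print(y_pred, precision, recall, round(f1, 3))
```

In this toy data the blended prediction misses one malicious flow while raising no false alarm, which shows why precision and recall are reported separately: precision counts false positives, recall counts missed attacks.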
Keywords
QA Mathematics, QA75 Electronic computers. Computer science