Skip to content

Latest commit

 

History

History
109 lines (67 loc) · 6.84 KB

README.md

File metadata and controls

109 lines (67 loc) · 6.84 KB

Military Network Intrusion Dataset

Overview

This dataset was curated by simulating a wide variety of intrusions in a military network environment. The simulation aimed to replicate a typical US Air Force LAN, providing an authentic environment to acquire raw TCP/IP dump data. The LAN was designed to mirror real-world conditions and was subjected to numerous attack scenarios.

Dataset Description

  • Environment: The dataset was created by simulating a typical US Air Force LAN.
  • Data Type: Raw TCP/IP dump data.
  • Connection: A connection in this dataset refers to a sequence of TCP packets that start and end over a certain time duration. During this duration, data flows from a source IP address to a target IP address under a specific protocol.
  • Labels: Each connection is labeled as either:
    • Normal: Indicating typical, non-malicious traffic.
    • Anomalous: Indicating traffic that is indicative of an attack. Each anomalous connection is further labeled with a specific attack type.

Features

The dataset comprises 41 features for each TCP/IP connection. These features are a mix of quantitative and qualitative data:

  • Quantitative Features: 38
  • Qualitative Features: 3

The features provide insights into the nature of the traffic, helping in distinguishing between normal and anomalous connections.

Class Variable

The class variable in the dataset categorizes the connections into two main categories:

  1. Normal: Connections that represent regular, non-malicious traffic.
  2. Anomalous: Connections that represent potential threats or attacks.

Model Performance

We trained several machine learning models on this dataset and evaluated their performance using two key metrics: Train Score and Test Score. Here are the results:

Model Train Score Test Score
K-Nearest Neighbors (KNN) 0.977 0.977
Logistic Regression 0.916 0.908
Decision Tree 1.000 0.995
Random Forest 0.999 0.996
Gradient Boosting Machine (GBM) 0.996 0.993
XGBoost 1.000 0.996
Adaboost 0.986 0.986
Light GBM 1.000 0.995
CatBoost 0.999 0.995
Naive Bayes Model 0.896 0.897
Voting Ensemble 0.999 0.996
Support Vector Machine (SVM) 0.964 0.963

Insights

  • The models have been evaluated using Train and Test Scores, with Test Scores ranging from 0.895 to 0.996.
  • Several models, such as Decision Tree, Random Forest, XGBoost, Light GBM, and the Voting Ensemble, have achieved high Test Scores, indicating strong performance in distinguishing between normal and anomalous network connections.
  • The dataset's class distribution and the diversity of features have likely contributed to the model's performance.
  • Further analysis and domain-specific considerations may be necessary for selecting the most suitable model for real-world network intrusion detection.

Usage

You can use the code and data in this repository to:

  1. Train and evaluate machine learning models on similar network intrusion detection datasets.
  2. Modify and fine-tune existing models or develop new models for improved performance.
  3. Gain insights into the effectiveness of different machine learning algorithms for network security tasks.

Further Approaches and Model Fine-Tuning

While the initial model evaluation has yielded promising results, there are several additional approaches and strategies you can explore to enhance the performance of network intrusion detection models:

1. Hyperparameter Tuning

Fine-tuning model hyperparameters can significantly improve model performance. Consider using techniques like grid search, random search, or Bayesian optimization to find optimal hyperparameter settings for your chosen models. This can lead to better generalization and increased accuracy.

2. Feature Engineering

Feature engineering plays a crucial role in network intrusion detection. Experiment with different feature selection methods, dimensionality reduction techniques, or the creation of new features to capture more relevant information from network connections.

3. Ensembling

Ensemble methods, such as stacking or blending, can further boost model performance. Combining the predictions of multiple well-tuned models can often yield better results than individual models. Explore different ensemble strategies to find the most effective combination.

4. Anomaly Detection Algorithms

Consider incorporating specialized anomaly detection algorithms into your model pipeline. These algorithms are designed to detect unusual patterns and can complement traditional classification models. Techniques like Isolation Forests, One-Class SVM, or autoencoders may be worth exploring.

5. Cross-Validation and Robust Evaluation

Ensure robust model evaluation by using cross-validation techniques. This helps assess the model's generalization performance and reduces the risk of overfitting. Stratified K-Fold cross-validation is a common choice for imbalanced datasets.

6. Imbalanced Data Handling

Since network intrusion datasets often suffer from class imbalance, employ techniques to handle this issue. Options include oversampling the minority class, undersampling the majority class, or using synthetic data generation methods like SMOTE (Synthetic Minority Over-sampling Technique).

7. Deep Learning Architectures

Consider experimenting with deep learning architectures, such as convolutional neural networks (CNNs) or recurrent neural networks (RNNs), especially if the dataset is large and complex. Deep learning models can automatically learn hierarchical features from raw data.

8. Model Interpretability

Enhance model interpretability to gain insights into the features and decisions made by the models. Techniques like SHAP (SHapley Additive exPlanations) values or LIME (Local Interpretable Model-agnostic Explanations) can help explain model predictions.

Feel free to explore these approaches and adapt them to your specific use case. Remember to keep track of your experiments, document your findings, and iterate on model improvements to achieve even better results in network intrusion detection.

Contributing

Contributions to this project are highly encouraged. If you have insights, enhancements, or new approaches to share, please open an issue or submit a pull request. Collaboration can lead to more robust and effective network intrusion detection models.

Feel free to contribute to this project by providing enhancements, additional models, or domain-specific expertise.