Wired/Wireless Intrusion Detection System Using Heuristic Based
Principal Component Analysis
Abstract
Principal Component Analysis (PCA) is commonly used to detect intrusions by transforming a set of multivariate observations into a lower-dimensional space while retaining most of the variability of the original data. Although PCA is successful in reducing dimensionality, it does not take class labels into account and fails to present the data in a form that is easy to analyze; moreover, wireless traffic is non-linear, which limits the applicability of PCA. In this research, Latent Semantic Analysis (LSA) is proposed to reveal the relationships among the variables in the data.
We intend to introduce a superior algorithm that frames Dynamic Principal Component Analysis (DPCA) in a heuristic fashion. This will be explored on the properties of emerging platforms, such as smartness and mobility, and DPCA will be merged with LSA to reveal the semantics carried by the variables.
So far, a group of algorithms has been created, and the testing and analysis involved transmitting both friendly and intruding packets. Simulation using data mining and Artificial Intelligence (AI) showed how the intruding packets were detected and analyzed; this analysis considered stationary networks only. The next stage of this research will take mobility into account, covering different speeds and directions.
Motivation of this Proposal
1- PCA works under the following restrictive assumptions:
• The distribution of events occurring within data flows is normal.
• No auto-correlation exists among observations.
• Variables are stationary.
Mobile and wireless networks produce a dynamic environment in which auto-correlation among variables is possible; thus, time lags of the time series are incorporated within the vectors describing the observations.
2- Wireless and mobile network traffic contains observations that carry semantics among their variables, and these semantics can be exploited to produce a smart PCA of the data set.
3- Wireless and mobile data flows are non-stationary, especially at handoff and resumption points, which weakens the reliability of the results obtained by PCA and DPCA.
4- PCA and dynamic PCA do not take into account the semantic interpretation of the variables describing an observation, while newly emerged wireless and mobile networks work in a smart environment; this smart environment imposes semantic relationships among the variables.
Introduction
With the rapid, explosive expansion of computers over the last decade or so, their security has become an important issue; security matters in any environment. Since a large amount of information is available on the network and can be shared through it, this data must be secured. Security is reasonably well defined for wired networks, but wireless networks face the great challenge of many different attacks. The process of monitoring the events occurring in a computer system and analyzing them to identify intrusions is known as intrusion detection, and the corresponding system is known as an intrusion detection system (IDS). [1]
An Intrusion Detection System (IDS) is an important detection mechanism used as a countermeasure to preserve data integrity and system availability in the face of attacks. An IDS is a combination of software and hardware that attempts to perform intrusion detection: the process of gathering intrusion-related knowledge by monitoring events and analyzing them for signs of intrusion. It raises an alarm when a possible intrusion occurs in the system. The network data sources used for intrusion detection consist of large amounts of textual information, which is difficult to comprehend and analyze. [2]
Intrusion detection in wireless networks has gained considerable attention in the last few years.
Wireless networks are not only susceptible to TCP/IP-based attacks native to wired networks,
they are also subject to a wide array of 802.11-specific threats. Such threats range from passive
eavesdropping to more devastating denial of service attacks. To detect these intrusions, classifiers are built to distinguish between normal and anomalous traffic. [3]
Principal Component Analysis (PCA) is a multivariate statistical method that models the linear correlation structure of a multivariate process from nominal historical data. PCA transforms a set of multivariate observations into a lower-dimensional orthogonal space, retaining most of the variability of the original data. Because of the simplification and the orthogonality obtained with PCA, it has been used successfully for fault diagnosis problems. [2]
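As an illustration of this dimensionality reduction, the following sketch (a minimal example, assuming a hypothetical matrix of network connection feature vectors) projects the observations onto the principal components that retain most of the variance; the random data, the 41-feature shape, and the 95% variance threshold are illustrative assumptions only, not part of the proposed system.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical matrix of observations: each row is one connection,
# each column a numeric attribute (duration, bytes sent, bytes received, ...).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 41))          # stand-in for 41 KDD-style features

# PCA assumes roughly comparable scales, so standardize first.
X_std = StandardScaler().fit_transform(X)

# Keep enough principal components to explain about 95% of the variance.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_std)

print("original dimension:", X_std.shape[1])
print("reduced dimension :", X_reduced.shape[1])
print("variance retained :", pca.explained_variance_ratio_.sum())
```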
Theoretical Background
๐‘‰ฬ…๐‘– = โˆ‘๐‘
๐‘—=1 ๐‘Ž๐‘ก๐‘ก๐‘Ÿ๐‘–๐‘๐‘— โˆ— ๐‘ข๐‘—
---- 1
๐‘† = {๐‘ฃ1 , ๐‘ฃ2 , โ€ฆ , ๐‘ฃ๐‘€ }
---- 2
๐ถ = {๐‘1 , ๐‘2 , โ€ฆ , ๐‘๐‘€ }
----3
ฬ… ๐’Š : Vector of attributes collected due to event occurred within the problem world
๐‘ฝ
๐’‚๐’•๐’•๐’“๐’Š๐’ƒ๐’‹ : Scalar value of an attribute in ๐‘ข๐‘— direction
๐‘บ: Set of ๐‘‰ฬ…๐‘–
Such that:
โˆ€(๐‘ฃ โˆˆ ๐‘†) โˆƒ(๐‘ โˆˆ ๐ถ) ๐ถ๐‘™๐‘Ž๐‘ ๐‘ ๐‘–๐‘“๐‘ฆ(๐‘, ๐‘ฃ)
Let
3
๐ท๐‘–๐‘š(๐‘ฃ๐‘– ) = ๐ฟ
๐‘ฆ๐‘–๐‘’๐‘™๐‘‘๐‘ 
โ†’
and ๐ท๐‘–๐‘š(๐‘ฃ๐‘– )๐‘ƒ๐ถ๐ด = ๐‘ƒ
๐‘ƒ๐ถ๐ด: ๐ฟ โ†’ ๐‘ƒ , ๐‘คโ„Ž๐‘’๐‘Ÿ๐‘’ ๐‘ƒ < ๐ฟ
๐‘ฆ๐‘–๐‘’๐‘™๐‘‘๐‘ 
โ†’
โˆ€๐‘Ž๐‘ก๐‘ก๐‘Ž๐‘๐‘˜ โˆƒ๐‘Ž๐‘ก๐‘ก๐‘Ÿ๐‘–๐‘๐‘ข๐‘ก๐‘’ (๐‘‰๐‘Ž๐‘Ÿ๐‘–๐‘’๐‘›๐‘๐‘’(๐‘Ž๐‘ก๐‘ก๐‘Ÿ๐‘–๐‘๐‘ข๐‘ก๐‘’) > ๐‘กโ„Ž๐‘Ÿ๐‘’๐‘ โ„Ž๐‘œ๐‘™๐‘‘)
โ†’ ๐‘†๐‘–๐‘”๐‘›๐‘Ž๐‘ก๐‘ข๐‘Ÿ๐‘’(๐‘Ž๐‘ก๐‘ก๐‘Ž๐‘๐‘˜, ๐‘Ž๐‘ก๐‘ก๐‘Ÿ๐‘–๐‘๐‘ข๐‘ก๐‘’)
and
๐‘ƒ๐ถ๐ดโ„Ž๐‘’๐‘ข๐‘Ÿ๐‘–๐‘ ๐‘ก๐‘–๐‘๐‘  : ๐ฟ โ†’ ๐พ , ๐‘คโ„Ž๐‘’๐‘Ÿ๐‘’ ๐พ < ๐‘ƒ < ๐ฟ ๐‘‚๐‘… (๐‘ƒ < ๐พ < ๐ฟ โ†’ ๐ผ๐ท๐‘†๐พโ„Ž๐‘’๐‘ข๐‘Ÿ๐‘ ๐‘ก๐‘–๐‘๐‘  > ๐ผ๐ท๐‘†๐‘ƒ )
Facts and Axioms
- PCA is a statistical orthogonal transformation.
- PCA is combined with guiding knowledge to reduce noise and probabilistic behavior (e.g., PCA + ANN).
- PCA is successful in reducing the dimensionality of data sets, but it does not take the parameters or labels into account; as a result, it fails to represent the data in a way that simplifies the interpretation of the underlying parameters.
- Attributes in KDD are divided into three groups: basic features, content features, and statistical features of the network connection.
- Classes in the KDD dataset are mainly categorized into five classes: Normal, denial of service (DoS), remote to local (R2L), user to root (U2R), and probing.
- New classes of threats are added by emerging platforms such as the cloud and smartphones.
Hypotheses to be Investigated and Pursued
1- Hypothesis 1: The reduced set of attributes is a local, non-complete set over the network attack domain (hint: PCA and heuristic methodology are domain specific).
2- Hypothesis 2: Mobile networks add new dimensions to the vector of security attributes due to the dynamic architecture they impose.
3- Hypothesis 3: Time-series analysis of occurrences is a crucial element in perceiving network threats and events (hint: dynamic principal component analysis).
4- Hypothesis 4: Application-level attributes are crucial for detecting intrusions in application-level distributed systems (e.g., web services over the cloud).
5- Hypothesis 5: LSA (Latent Semantic Analysis) increases the performance of the DPCA algorithm and produces more reliable results (hint: LSA is executed in parallel with DPCA).
The Proposed Scheme
Considering the limitations of conventional PCA, Figure 1 presents the proposed scheme, where dynamic PCA (DPCA) is suggested to monitor the non-stationary network data and conduct on-line mean estimation; this is combined with LSA (Latent Semantic Analysis), which is proposed to reveal semantics over the variables and to change the objective function of DPCA according to the revealed semantics. DPCA extracts time-dependent relationships in the measurements by augmenting the measured data matrix with time-lagged measured variables, as sketched below.
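The following minimal sketch illustrates the time-lag augmentation step only, not the full proposed scheme: given a matrix of time-ordered traffic observations, it stacks each observation with its previous lagged observations before applying ordinary PCA, which is how a dynamic PCA model can capture auto-correlation. The lag count and the random feature matrix are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

def augment_with_lags(X, lags=2):
    """Build the DPCA-style augmented matrix: row t becomes
    [x_t, x_{t-1}, ..., x_{t-lags}] so that PCA can capture
    auto-correlation between successive observations."""
    n, m = X.shape
    blocks = [X[lags - k : n - k] for k in range(lags + 1)]
    return np.hstack(blocks)               # shape: (n - lags, m * (lags + 1))

# Hypothetical traffic feature matrix: 500 time-ordered observations, 10 attributes.
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 10))

X_aug = augment_with_lags(X, lags=2)
dpca = PCA(n_components=5).fit(X_aug)      # ordinary PCA on the lagged matrix
print(X_aug.shape, dpca.explained_variance_ratio_.round(3))
```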
[Figure 1: Proposed Intrusion Detection System based on hPCA. The scheme connects a Traffic Tracer and Sampler (traffic capture, attribute vector) with a Configuration & Control block, an Attribute Estimator, an Ontology, Heuristic Dynamic PCA combined with LSA and an on-line mean estimator, and an ID3 decision-tree generator trained from initial KDD training samples.]
1-KDD 99 Data Set
The KDD Cup 1999 intrusion detection contest data was prepared from the DARPA intrusion detection evaluation program by MIT Lincoln Laboratory. They operated the LAN as if it were a true Air Force environment, but peppered it with multiple attacks. The raw data was processed into connection records. Most researchers use this KDD99 data set as input to their approaches. There are four main attack categories in the KDD99 dataset:
1) Denial of Service Attack (DoS): an attack in which the attacker makes some computing or memory resource too busy or too full to handle legitimate requests, or denies legitimate users access to a machine.
2) Remote to Local Attack (R2L): occurs when an attacker who can send packets to a machine over a network, but who does not have an account on that machine, exploits some vulnerability to gain local access as a user of that machine.
3) User to Root Attack (U2R): an attack in which the attacker starts out with access to a normal user account on the system and is able to exploit some vulnerability to gain root access to the system.
4) Probe Attack: an attempt to gain access to a computer and its files through a known or probable weak point in the computer system. [4,5]
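As a small illustration of how these four categories are typically used during preprocessing, the sketch below maps raw KDD99 connection labels to the coarse categories described above; the label list is a partial, illustrative subset, not the complete mapping used in any experiment here.

```python
# Partial, illustrative mapping from raw KDD99 connection labels (which end with
# a dot in the original files) to the four attack categories plus "normal";
# the full dataset contains more labels than listed here.
CATEGORY = {
    "normal.": "normal",
    "smurf.": "dos", "neptune.": "dos", "back.": "dos",
    "guess_passwd.": "r2l", "ftp_write.": "r2l",
    "buffer_overflow.": "u2r", "rootkit.": "u2r",
    "portsweep.": "probe", "nmap.": "probe", "satan.": "probe",
}

def category_of(label: str) -> str:
    """Map a raw KDD99 label (the last field of a connection record) to its category."""
    return CATEGORY.get(label, "unknown")

# Toy usage on a few example labels; in practice the label is the last
# comma-separated field of each record in the KDD99 data file.
for label in ["normal.", "neptune.", "rootkit.", "satan."]:
    print(label, "->", category_of(label))
```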
2-Data Mining
Data mining is the art and science of intelligent data analysis. The aim is to discover meaningful insights and knowledge from data. Discoveries are often expressed as models, and we often describe data mining as the process of building models. A model captures, in some formulation, the essence of the discovered knowledge. A model can be used to assist in our understanding of the world, and models can also be used to make predictions.
For the data miner, the discovery of new knowledge and the building of models that predict the future well can be quite rewarding. Indeed, data mining should be exciting and fun as we watch new insights and knowledge emerge from our data. With growing enthusiasm, we meander through our data analyses, following our intuitions and making new discoveries all the time, discoveries that will continue to help change our world for the better. Data mining has been applied in most areas of endeavor: there are data mining teams working in business, government, financial services, biology, medicine, risk and intelligence, science, and engineering. Anywhere we collect data, data mining is being applied and feeding new knowledge into human endeavor. One of the important data mining methods is the decision tree. [6]
2.1 Decision tree
A decision tree is one of the most widely used supervised learning methods for data exploration. It is easy to interpret and can be re-represented as if-then-else rules. A decision tree consists of nodes and branches connecting the nodes; the nodes located at the bottom of the tree are called leaves and indicate classes.
A decision tree aids in data exploration in the following manner:
• It reduces the volume of data by transforming it into a more compact form that preserves the essential characteristics and provides an accurate summary.
• It discovers whether the data contains well-separated classes of patterns, such that the classes can be interpreted meaningfully in the context of a substantive theory.
• It maps data into the form of a tree so that prediction values can be generated by backtracking from the leaves to the root. This may be used to predict the outcome for new data or a query, as illustrated below. [7]
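For instance, a tiny hand-written tree over hypothetical connection attributes can be read directly as if-then-else rules, and prediction amounts to walking from the root to a leaf; the attribute names, thresholds, and class names below are illustrative assumptions only.

```python
def classify(conn: dict) -> str:
    """A toy decision tree written as if-then-else rules; each return is a leaf/class."""
    if conn["protocol"] == "icmp":
        return "probe"
    if conn["failed_logins"] > 3:              # hypothetical attribute and threshold
        return "r2l"
    if conn["bytes_per_second"] > 1_000_000:   # hypothetical attribute and threshold
        return "dos"
    return "normal"

print(classify({"protocol": "tcp", "failed_logins": 0, "bytes_per_second": 120}))
```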
The most popular decision tree algorithm is ID3. The following subsections explain the basic concepts of the ID3 algorithm.
2.1.1 ID3 Algorithm
Based on Hunt's algorithm, Quinlan developed an algorithm called ID3, in which he used Shannon's entropy as a criterion for selecting the most significant/discriminatory feature:

Entropy(S) = Σ_{i=1}^{c} −p_i log2(p_i)        (1)

where p_i is the proportion of the patterns belonging to the i-th class.
The uncertainty in each node is reduced by choosing the feature that most reduces its entropy (via the split). To achieve this, the Information Gain (InfoGain), which measures the expected reduction in entropy caused by knowing the value of a feature F_j, is used:

InfoGain(S, F_j) = Entropy(S) − Σ_{v_i ∈ V_{F_j}} (|S_{v_i}| / |S|) · Entropy(S_{v_i})        (2)

where V_{F_j} is the set of all possible values of feature F_j and S_{v_i} is the subset of S for which feature F_j has value v_i.
The InfoGain is used to select the best feature (the one reducing the entropy by the largest amount) at each step of growing a decision tree. To compensate for the bias of the InfoGain towards features with many outcomes, a measure called the Gain Ratio is used:

GR(S, F_j) = InfoGain(S, F_j) / SplitInformation(S, F_j)        (3)

where

SplitInformation(S, F_j) = Σ_{i=1}^{C} −(|S_i| / |S|) · log2(|S_i| / |S|)        (4)

The Split Information is the entropy of S with respect to the values of feature F_j. When two or more features have the same value of InfoGain, the feature that has the smaller number of values is selected. Use of the GR results in the generation of smaller trees. [8]
Algorithm (2.2) ID3
Input: S, a set of training examples.
Output: A decision tree.
Steps:
1. Create the root node containing the entire set S.
2. If all examples are positive, or all are negative, then stop: the decision tree has one node.
3. Otherwise (the general case):
Select the feature F_j that has the largest GR value.
For each value v_i from the domain of feature F_j:
(a) add a new branch corresponding to this best feature value v_i, and a new node that stores all the examples having value v_i for feature F_j;
(b) if the node stores examples belonging to one class only, then it becomes a leaf node; otherwise, below this node add a new subtree, and go to step 3.
4. End
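The following is a minimal Python sketch of equations (1)-(4) and the ID3 steps above, restricted to categorical features; the toy connection records and class labels are illustrative assumptions rather than KDD experiments.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = sum_i -p_i * log2(p_i), Eq. (1)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, labels, j):
    """GR(S, F_j) = InfoGain / SplitInformation, Eqs. (2)-(4)."""
    n = len(labels)
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[j], []).append(label)
    remainder = sum(len(s) / n * entropy(s) for s in subsets.values())
    info_gain = entropy(labels) - remainder
    split_info = -sum(len(s) / n * math.log2(len(s) / n) for s in subsets.values())
    return info_gain / split_info if split_info > 0 else 0.0

def id3(rows, labels, features):
    """Recursive ID3 following steps 1-4 above (categorical features only)."""
    if len(set(labels)) == 1:                  # step 2: pure node -> leaf
        return labels[0]
    if not features:                           # no feature left -> majority-class leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(features, key=lambda j: gain_ratio(rows, labels, j))
    tree = {"feature": best, "branches": {}}
    for value in {row[best] for row in rows}:  # step 3: one branch per feature value
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        tree["branches"][value] = id3(
            [rows[i] for i in idx],
            [labels[i] for i in idx],
            [f for f in features if f != best],
        )
    return tree

# Toy usage with hypothetical connection records: (protocol, flag) -> class.
rows = [("tcp", "SF"), ("tcp", "REJ"), ("udp", "SF"), ("icmp", "SF")]
labels = ["normal", "dos", "normal", "probe"]
print(id3(rows, labels, features=[0, 1]))
```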
3-Principal Component Analysis
Principal component analysis (PCA) is an effective statistical method for data analysis. Its aim is to find a projection matrix that explains as much of the data variance as possible, projecting the original high-dimensional data into a lower-dimensional space while retaining the main information of the data, so that the data can be handled more easily. PCA is both a feature selection and a feature extraction process: its main goal is to map a large input feature space onto a suitable set of vectors from which the principal features are extracted. The key to this process is to select the eigenvectors onto which all input features are projected, so that the projections achieve feature extraction with the smallest error variance. For a given m-dimensional random vector X = [x_1, x_2, …, x_m]^T with mean E[X] = 0, the covariance is expressed as follows:
C_X = E[(X − E[X])(X − E[X])^T]        (1)

Because E[X] = 0, the covariance matrix is therefore the autocorrelation matrix:

C_X = E[X X^T]        (2)

Compute the eigenvalues λ_1, λ_2, …, λ_m of C_X and the corresponding normalized eigenvectors ω_1, ω_2, …, ω_m, which satisfy the following equation:

C_X ω_i = λ_i ω_i,   i = 1, 2, …, m        (3)

where ω_i = [ω_{i1}, ω_{i2}, …, ω_{im}]^T. The eigenvectors ω_1, ω_2, …, ω_m satisfy the required conditions on the input features, and the eigenvalues are ordered λ_1 ≥ λ_2 ≥ … ≥ λ_m. The projections Y_i = ω_i^T X, i = 1, 2, …, m, of the input onto the eigenvectors can be expressed in matrix form as follows:

Y = ω^T X        (4)

X can be reconstructed with a linear combination of the eigenvectors, as in the following formula:

X = ω Y = Σ_{i=1}^{m} Y_i ω_i        (5)

All principal components are obtained through the feature selection step; in the feature extraction step, the main components are then selected to achieve dimensionality reduction. Analyzing the mean of the vector Y:

E[Y] = E[ω^T X] = ω^T E[X] = 0        (6)

Since E[Y] = 0, the covariance matrix C_Y is the autocorrelation matrix of Y:

C_Y = E[Y Y^T] = E[ω^T X X^T ω] = ω^T E[X X^T] ω        (7)

Because ω is the matrix of eigenvectors of C_X, it follows that

C_Y = diag(λ_1, λ_2, …, λ_m)        (8)

When truncating Y, it is necessary to ensure that the cut-off is optimal in the mean-square sense. Of λ_1, λ_2, …, λ_m, only the first L largest eigenvalues are kept; reconstructing X with these components gives the following estimate:

X̂ = Σ_{i=1}^{L} ω_i Y_i        (9)

The reconstruction error variance is:

e_L = E[(X − X̂)^2] = Σ_{i=L+1}^{m} λ_i        (10)

According to formula (10), the larger the number L of retained components, the smaller the mean square error. We also have the following formula:

Σ_{i=1}^{m} λ_i = Σ_{i=1}^{m} q_ii        (11)

where q_ii are the diagonal elements of C_X. The contribution rate of variance is then:

φ(L) = (Σ_{i=1}^{L} λ_i) / (Σ_{i=1}^{m} λ_i)

When φ(L) is large enough, the first L eigenvectors ω_1, ω_2, …, ω_L can be taken as a low-dimensional projection space, thus completing the dimensionality reduction. [9]
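As a compact numerical illustration of formulas (1)-(11) and the contribution rate φ(L), the sketch below, which assumes a hypothetical zero-mean data matrix, computes the covariance matrix, its eigen-decomposition, the projection onto the first L eigenvectors, and the reconstruction error of formula (10).

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 8))          # hypothetical observations, one per row
X = X - X.mean(axis=0)                  # enforce E[X] = 0

C = X.T @ X / len(X)                    # covariance / autocorrelation matrix, Eq. (2)
eigvals, eigvecs = np.linalg.eigh(C)    # C ω_i = λ_i ω_i, Eq. (3)
order = np.argsort(eigvals)[::-1]       # sort so that λ_1 >= λ_2 >= ...
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

L = 3                                   # number of retained components
W = eigvecs[:, :L]                      # ω_1 ... ω_L
Y = X @ W                               # projections, Eq. (4)
X_hat = Y @ W.T                         # reconstruction, Eq. (9)

phi = eigvals[:L].sum() / eigvals.sum()            # contribution rate φ(L)
err = ((X - X_hat) ** 2).sum(axis=1).mean()        # empirical reconstruction error
print(f"phi({L}) = {phi:.3f}, error = {err:.3f}, "
      f"sum of discarded eigenvalues = {eigvals[L:].sum():.3f}")   # compare with Eq. (10)
```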
4-Latent Semantic Analysis
LSA is an algebraic model of information retrieval proposed by Landauer, Dumais, et al. It is a computational theory and method for knowledge acquisition and representation that has been applied to information retrieval and question answering systems. [10]
LSA has been widely used to analyze the latent semantics of documents in an unsupervised way by exploring the relationships between a set of terms, a corpus of documents, and a set of latent topics. The mathematical formulation of LSA is based on the singular value decomposition (SVD) of matrices, which imposes the restriction that all latent topics are mutually orthogonal; this is not always proper or reasonable for real-world applications. [11]
LSA derives the meaning of terms by approximating the structure of term usage among documents through SVD. This underlying relationship between terms is believed to be mainly due to transitive relationships between terms; that is, terms are similar if they co-occur with the same terms within files.
Constructing a latent semantic space model relies on the process of Singular Value Decomposition (SVD), which can be expressed by the formula:
X = V S D^T        (1)

where X is an m-by-n matrix whose rows denote the features and whose columns denote the documents; m is the number of features and n is the number of documents in the training corpus. The three matrices V, S, and D on the right of the equation are the results of the SVD of the matrix X. The matrix S is a diagonal matrix whose diagonal values are the singular values (the positive square roots of the eigenvalues of the X X^T (or X^T X) matrix), arranged on the diagonal in descending order. The matrix V is composed of the eigenvectors of the X X^T matrix, ordered to correspond to the singular values, and the matrix D is composed of the eigenvectors of the X^T X matrix, also ordered to correspond to the singular values. If we use r to denote the number of positive eigenvalues of X X^T (or X^T X), V will be an m-by-r matrix and D will be an n-by-r matrix. We view V as the matrix that describes the features in the latent semantic space and D as the matrix that describes the documents in the latent semantic space. When we truncate the three matrices to k dimensions, we obtain a model with lower dimensionality.
The dimensions of the latent semantic space model represent latent concepts, so both the features and the documents are described by latent concepts.
When we describe a new document Q in this space, we can use formulas (2) ~ (5):
X = V S D^T → D^T = (V S)^{-1} X        (2)

Because both V and D have orthonormal columns, we have

V^{-1} = V^T        (3)

Then

D^T = S^{-1} V^T X        (4)

So, a new document with term vector q (a column of term frequencies) can be mapped into the latent semantic space through (5):

Q = q^T V S^{-1}        (5)

It is clear that Q can be mapped directly into the latent semantic space by using the V S^{-1} matrix. [12]
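A minimal numerical sketch of this construction is given below; it assumes a hypothetical random term-document matrix as a stand-in for the traffic-attribute matrices envisaged in the proposed scheme, truncates the SVD to k latent concepts, and folds a new document into the space using Q = q^T V S^{-1} as derived above.

```python
import numpy as np

# Hypothetical term-document matrix X (rows = features/terms, columns = documents);
# treating traffic attributes as "terms" is an assumption of this sketch.
rng = np.random.default_rng(4)
X = rng.random((50, 20))                           # 50 terms, 20 documents

V, s, Dt = np.linalg.svd(X, full_matrices=False)   # X = V S D^T, Eq. (1)

k = 5                                              # truncate to k latent concepts
V_k, S_k, D_k = V[:, :k], np.diag(s[:k]), Dt[:k, :].T

# Fold a new document (term-count vector q) into the k-dimensional space:
# q_hat = q^T V S^{-1}, following Eqs. (2)-(5).
q = rng.random(50)
q_hat = q @ V_k @ np.linalg.inv(S_k)

# Cosine similarity between the new document and each training document.
sims = (D_k @ q_hat) / (np.linalg.norm(D_k, axis=1) * np.linalg.norm(q_hat))
print("most similar training document:", int(np.argmax(sims)))
```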
Related Works
In [1], two drawbacks of PCA are investigated: the first is the assumption that linear relationships exist among the process variables, and the second is the challenge of process dynamics, which is not considered because PCA was created for analyzing steady-state processes and is therefore unable to handle process dynamics. The authors present a PCA-based multivariate time-series segmentation method that addresses the first drawback, and a dynamic extension of the multivariate time-series segmentation is developed to segment these series based on changes in the process dynamics.
In [2], a modification of the DPCA algorithm for fault detection is proposed, in which an appropriate standardization with respect to on-line estimated statistical parameters is carried out when simple healthy relations between variables can be obtained. This idea makes it possible to deal with non-stationary signals and to significantly reduce the rate of false alarms. A series of tests showed the effectiveness of the proposed fault detection algorithm in distinguishing between normal changes in signals and variations due to the presence of faults.
In [3], Baig M. N. et al. present a model for feature selection that uses the information gain ratio measure to compute the relevance of each feature and the k-means classifier to select the optimal set of MAC layer features, which can improve the accuracy of intrusion detection systems while reducing the learning time of their learning algorithm. They study the effect of optimizing the feature set for wireless intrusion detection systems on the performance and learning time of different types of classifiers based on neural networks. Experimental results with three types of neural network architectures clearly show that the optimization of a wireless feature set has a significant impact on the efficiency and accuracy of the intrusion detection system. In [13], Neelakantan N. P. et al. note that, in 802.11 networks, the features used for training and testing intrusion detection systems consist of basic information related to the TCP/IP header, with no considerable attention to the features associated with lower-level protocol frames. The resulting detectors were efficient and accurate in detecting network attacks at the network and transport layers, but unfortunately not capable of detecting 802.11-specific attacks such as deauthentication attacks or MAC-layer DoS attacks. In [14], Al-Janabi S. T. et al. develop an anomaly-based intrusion detection system (IDS) that can promptly detect and classify various attacks. Anomaly-based IDSs need to be able to learn the dynamically changing behavior of users or systems, and the authors experiment with packet behavior as parameters in anomaly intrusion detection. There are several methods to help IDSs learn a system's behavior; their proposed IDS uses a back-propagation artificial neural network (ANN) to learn the system's behavior. They used the KDD'99 data set in their experiments, and the obtained results satisfy the work objective.
In [15], Reddy E. K. et al. observe that network security technology has become crucial in protecting government and industry computing infrastructure, and that modern intrusion detection applications face complex problems: they are required to be reliable, extensible, easy to manage, and to have low maintenance cost. In recent years, data mining-based intrusion detection systems (IDSs) have demonstrated high accuracy, good generalization to novel types of intrusion, and robust behavior in a changing environment. Still, significant challenges exist in the design and implementation of production-quality IDSs; instrumenting components such as data transformations, model deployment, and cooperative distributed detection remains a complex engineering endeavor. In [16], Suebsing A. et al. note that, in previous research on feature selection, the criteria and procedures for selecting features from the raw data are mostly difficult to implement. Their work therefore presents an easy and novel method for feature selection that can correctly separate normal and attack patterns of computer network connections. Their goal is to effectively apply the Euclidean distance for selecting a subset of robust features, using smaller storage space and obtaining higher intrusion detection performance. Experimental results show that the proposed approach based on the Euclidean distance can improve the true-positive intrusion detection rate, especially for detecting known attack patterns. In [17], Bensefia H. et al. propose a new approach for IDS adaptability by integrating a Simple Connectionist Evolving System (SECOS) and a Winner-Takes-All (WTA) hierarchy of XCS (eXtended Classifier System). This integration yields an adaptive hybrid intrusion detection core that makes adaptability an intrinsic and native functionality of the IDS. In [18], Saad K. Majeed et al. present a proposed Hybrid Multilevel Network Intrusion Detection System (HMNIDS), a hybrid multilevel IDS: hybrid because it uses both misuse and anomaly techniques for intrusion detection, and multilevel since it applies the two detection techniques hierarchically in two levels. The first level applies anomaly detection using a Support Vector Machine (SVM) to classify traffic as either normal or intrusive; normal traffic is passed on, while intrusive traffic is fed to the second level, which applies misuse detection using Artificial Neural Networks (ANN) to determine the intrusion class. The proposal is a data mining-based HMNIDS, since mining provides an iterative process: if the results do not reach a satisfactory solution, the mining steps continue until the results correspond to the intended results. For training and testing of the HMNIDS, the NSL-KDD data set was used; it solves some of the inherent problems of KDD'99. Like KDD99, its connections contain 41 features and are labeled as either normal or an attack type, and many of these features are irrelevant to the classification process. Principal Component Analysis (PCA) is used for feature extraction to reduce the number of features and avoid time consumption during training and real-time detection; PCA yields a subset of 8 correlated intrinsic features that form the basis of the classification. The feature set resulting from PCA and the full feature set are both fed to the HMNIDS. The results obtained from the HMNIDS show that the accuracy rates of the SVM and ANN classifiers are each high, but they are higher with the 8 PCA features than with all 41 features. The confusion matrix of the HMNIDS gives high detection rates and a low false alarm rate, and these are also better with the 8 PCA features than with all 41. In [19], Saad K. Majeed et al. present a proposed Wireless Network Intrusion Detection System (WNIDS) that uses both misuse and anomaly techniques for intrusion detection. The proposal is likewise a data mining-based WNIDS, relying on the same iterative mining process. For training and testing of the WNIDS, a collected dataset called Wdataset was used; the collection was done on an organized 802.11 WLAN consisting of 5 machines and involved frames of all types (normal, the four known intrusions, and unknown intrusions). The collected connections contain the features that appear directly in the header of 802.11 frames, plus one added feature (casting), since it is critical in distinguishing among intrusions. These connections are labeled as either normal or an attack type, and many of these features are irrelevant to the classification process. A Support Vector Machine (SVM) classifier is proposed for feature extraction to reduce the number of features and avoid time consumption during training and real-time detection; the SVM yields a subset of 8 correlated intrinsic features that form the basis of the classification. The feature set resulting from the SVM and the full feature set are both fed to the WNIDS. The results obtained from the WNIDS show that the accuracy rates of the ANN and ID3 classifiers are both higher with the 8 SVM-selected features than with the set of all features, and ANN accuracy is higher than that of ID3 with both feature sets.
References
1- Zoltan Banko, Laszlo Dobos, and Janos Abonyi, "Dynamic Principal Component Analysis in Multivariate Time-Series Segmentation", 2011.
2- Jesus Mina and Cristina Verde, "Fault Detection for Large Scale Systems Using Dynamic Principal Components Analysis with Adaptation", International Journal of Computers, Communications & Control, Vol. II, 2007.
3- Baig M. N. and Kumar K. K., "Intrusion Detection in Wireless Networks Using Selected Features", (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 2 (5), 2011, pp. 1887-1893.
4- Vidit Pathak and Ananthanarayana V. S., "A Novel Multi-Threaded K-Means Clustering Approach for Intrusion Detection", IEEE, 2012.
5- "KDD Cup 1999 Data", The UCI KDD Archive, Information and Computer Science, University of California, Irvine, 1999, available at: http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.
6- Chetan R and Ashoka D.V., "Data Mining Based Network Intrusion Detection System: A Database Centric Approach", 2012 International Conference on Computer Communication and Informatics (ICCCI-2012), Jan. 10-12, 2012, Coimbatore, India.
7- Krzysztof J. Cios, Witold Pedrycz, Roman W. Swiniarski, and Lukasz A. Kurgan, "Data Mining: A Knowledge Discovery Approach", Springer, 2007.
8- Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", Morgan Kaufmann Publishers, 2006.
9- Chen Yu, Zhang Jian, Yi Bo, and Chen Deyun, "A Novel Principal Component Analysis Neural Network Algorithm for Fingerprint Recognition in Online Examination System", 2009 Asia-Pacific Conference on Information Processing.
10- Wei Song and Soon Cheol Park, "Analysis of Web Clustering Based on Genetic Algorithm with Latent Semantic Indexing Technology", Sixth International Conference on Advanced Language Processing and Web Information Technology, IEEE, 2007, DOI 10.1109/ALPIT.2007.77.
11- Sheng-Yi Kong and Lin-Shan Lee, "Semantic Analysis and Organization of Spoken Documents Based on Parameters Derived From Latent Topics", IEEE Transactions on Audio, Speech, and Language Processing, Vol. 19, No. 7, September 2011.
12- Dongfeng Cai, Liwei Chang, and Duo Ji, "Latent Semantic Analysis Based on Space Integration", Proceedings of IEEE CCIS 2012, IEEE, 2012.
13- Neelakantan N. P., Nagesh C., and Tech M., "Role of Feature Selection in Intrusion Detection Systems for 802.11 Networks", International Journal of Smart Sensors and Ad Hoc Networks (IJSSAN), Volume 1, Issue 1, 2011.
14- Al-Janabi S. T. and Saeed H. A., "A Neural Network Based Anomaly Intrusion Detection System", IEEE Computer Society, 2011 Developments in E-systems Engineering, pp. 221-226.
15- Reddy E. K., Reddy V. N., and Rajulu P. G., "A Study of Intrusion Detection in Data Mining", Proceedings of the World Congress on Engineering 2011, Vol. III, WCE 2011, July 6-8, 2011, London, U.K.
16- Suebsing A. and Hiransakolwong N., "Euclidean-based Feature Selection for Network Intrusion Detection", 2009 International Conference on Machine Learning and Computing, IPCSIT Vol. 3, IACSIT Press, Singapore, 2011.
17- Bensefia H. and Ghoualmi N., "A New Approach for Adaptive Intrusion Detection", 2011 Seventh International Conference on Computational Intelligence and Security, 2011.
18- Saad K. Majeed, Soukaena H. Hashem, and Ikhlas K. Gbashi, "Propose HMNIDS Hybrid Multilevel Network Intrusion Detection System", IJCSI International Journal of Computer Science Issues, Vol. 10, Issue 5, No. 2, September 2013.
19- Saad K. Majeed, Soukaena H. Hashem, and Ikhlas K. Gbashi, "Proposal to WNIDS Wireless Network Intrusion Detection System", IJSR - International Journal of Scientific Research, Volume 2, Issue 10, October 2013, ISSN No. 2277-8179.