As computer network security threats increase, many organizations implement multiple Network Intrusion Detection Systems (NIDS) to maximize the likelihood of intrusion detection and provide a comprehensive understanding of intrusion activities. However, NIDS trigger a massive number of alerts on a daily basis. This can be overwhelming for computer network security analysts because manually analysing each alert is slow and tedious. Thus, automated and intelligent clustering of alerts is important to reveal the structural correlation of events by grouping alerts with common features. As the nature of computer network attacks, and therefore alerts, is not known in advance, unsupervised alert clustering is a promising approach to achieve this goal. We propose a joint optimization technique for feature selection and clustering to aggregate similar alerts and to reduce the number of alerts that analysts have to handle individually. More precisely, each identified feature is assigned a binary value, which reflects the feature's saliency. This value is treated as a hidden variable and incorporated into a likelihood function for clustering. Since computing the optimal solution of the likelihood function directly is analytically intractable, we use the Expectation-Maximisation (EM) algorithm to iteratively update the hidden variable and use it to maximize the expected likelihood. Our empirical results, using a labelled Defense Advanced Research Projects Agency (DARPA) 2000 reference dataset, show that the proposed method achieves higher clustering accuracy than EM clustering without feature selection.
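To make the idea of a hidden binary saliency variable concrete, here is a minimal sketch of EM with per-feature saliency. The abstract does not give the form of the likelihood, so everything specific here is an assumption: features are numeric and modelled with diagonal Gaussians, a feature that is not salient follows one common density fixed to the global mean and variance, and the names `em_feature_saliency`, `farthest_point_init` and `rho` are hypothetical. It illustrates the general technique on synthetic data, not the authors' implementation.

```python
import math
import random

def gauss(x, mean, var):
    """Univariate normal density."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def farthest_point_init(data, k):
    """Pick k well-spread rows as initial means (a heuristic assumed for this sketch)."""
    centres = [list(data[0])]
    while len(centres) < k:
        def gap(row):
            return min(sum((a - b) ** 2 for a, b in zip(row, c)) for c in centres)
        centres.append(list(max(data, key=gap)))
    return centres

def em_feature_saliency(data, k, n_iter=50, var_floor=1e-3):
    """Return (labels, rho); rho[l] estimates the probability that feature l is salient."""
    n, d = len(data), len(data[0])
    # Common density for a non-salient feature: fixed to global statistics (assumption).
    g_mean = [sum(row[l] for row in data) / n for l in range(d)]
    g_var = [max(sum((row[l] - g_mean[l]) ** 2 for row in data) / n, var_floor)
             for l in range(d)]
    mu = farthest_point_init(data, k)
    var = [list(g_var) for _ in range(k)]
    alpha = [1.0 / k] * k
    rho = [0.5] * d
    for _ in range(n_iter):
        # E-step: w[i][j] is the cluster responsibility; u[i][j][l] is the joint
        # posterior that alert i is in cluster j AND feature l is salient.
        w = [[0.0] * k for _ in range(n)]
        u = [[[0.0] * d for _ in range(k)] for _ in range(n)]
        for i, row in enumerate(data):
            for j in range(k):
                joint = alpha[j]
                for l in range(d):
                    a = rho[l] * gauss(row[l], mu[j][l], var[j][l])
                    b = (1.0 - rho[l]) * gauss(row[l], g_mean[l], g_var[l])
                    c = max(a + b, 1e-300)
                    u[i][j][l] = a / c
                    joint *= c
                w[i][j] = joint
            total = max(sum(w[i]), 1e-300)
            for j in range(k):
                w[i][j] /= total
                for l in range(d):
                    u[i][j][l] *= w[i][j]
        # M-step: re-estimate mixing weights, cluster parameters and saliencies.
        for j in range(k):
            alpha[j] = sum(w[i][j] for i in range(n)) / n
            for l in range(d):
                mass = max(sum(u[i][j][l] for i in range(n)), 1e-12)
                mu[j][l] = sum(u[i][j][l] * data[i][l] for i in range(n)) / mass
                spread = sum(u[i][j][l] * (data[i][l] - mu[j][l]) ** 2 for i in range(n))
                var[j][l] = max(spread / mass, var_floor)
        for l in range(d):
            s = sum(u[i][j][l] for i in range(n) for j in range(k)) / n
            rho[l] = min(max(s, 1e-6), 1.0 - 1e-6)
    labels = [max(range(k), key=lambda j: w[i][j]) for i in range(n)]
    return labels, rho

# Synthetic demo: feature 0 separates two groups, feature 1 is pure noise.
rng = random.Random(0)
demo = [[rng.uniform(-1, 1), rng.gauss(0, 1)] for _ in range(60)]
demo += [[10 + rng.uniform(-1, 1), rng.gauss(0, 1)] for _ in range(60)]
demo_labels, demo_rho = em_feature_saliency(demo, k=2)
print("estimated saliencies:", demo_rho)
```

In this sketch the saliency estimate for a feature rises when the cluster-specific densities explain that feature better than the common density does. Real alert attributes are often categorical, so a practical version would likely need a different per-feature distribution.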
As a follow-up to our earlier model Autocorrel I, we have implemented a two-stage event correlation approach
with improved performance. Like Autocorrel I, the new model correlates intrusion detection system (IDS) alerts
to automate alert and incident management and reduce the workload on an IDS analyst. We achieve this
correlation by clustering similar alerts, thus allowing the analyst to only consider a few clusters rather than
hundreds or thousands of alerts. The first stage uses an artificial neural network (ANN)-based autoassociator
(AA). The AA attempts to reproduce each alert at its output. In the process, it uses an
error metric, the reconstruction error (RE) between its input and output, to cluster similar alerts. To
improve the accuracy of the system, we add another machine-learning stage that takes into account the RE as
well as raw attribute information from the input alerts. This stage uses the Expectation-Maximisation (EM)
clustering algorithm. The performance of this approach is tested with intrusion alerts generated by a Snort IDS
on DARPA's 1999 IDS evaluation data as well as incidents.org alerts.
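The first stage described above can be sketched as a small autoassociator that is trained to reproduce its input and then scored by reconstruction error. The abstract does not state the network architecture, the alert encoding or the training settings, so the single sigmoid hidden layer, the linear output, the learning rate and the assumption that alerts are already encoded as numeric vectors scaled to [0, 1] are all illustrative choices. The function names `train_autoassociator`, `reconstruction_error` and `stage_two_vector` are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(model, x):
    """Return (hidden activations, reconstructed output) for one encoded alert."""
    w1, b1, w2, b2 = model
    h = [sigmoid(sum(w * xi for w, xi in zip(w1[j], x)) + b1[j]) for j in range(len(w1))]
    y = [sum(w * hj for w, hj in zip(w2[o], h)) + b2[o] for o in range(len(w2))]
    return h, y

def train_autoassociator(rows, hidden=2, epochs=300, lr=0.1, seed=0):
    """Train by plain stochastic gradient descent on squared reconstruction error."""
    rng = random.Random(seed)
    d = len(rows[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(d)]
    b2 = [0.0] * d
    model = (w1, b1, w2, b2)
    for _ in range(epochs):
        for x in rows:
            h, y = forward(model, x)
            err = [yo - xo for yo, xo in zip(y, x)]
            # Back-propagate through the output layer before updating its weights.
            grad_h = [sum(err[o] * w2[o][j] for o in range(d)) * h[j] * (1.0 - h[j])
                      for j in range(hidden)]
            for o in range(d):
                for j in range(hidden):
                    w2[o][j] -= lr * err[o] * h[j]
                b2[o] -= lr * err[o]
            for j in range(hidden):
                for i in range(d):
                    w1[j][i] -= lr * grad_h[j] * x[i]
                b1[j] -= lr * grad_h[j]
    return model

def reconstruction_error(model, x):
    """Squared distance between an alert vector and its reconstruction."""
    _, y = forward(model, x)
    return sum((yo - xo) ** 2 for yo, xo in zip(y, x))

def stage_two_vector(model, x):
    """Input for the second stage: the RE followed by the raw attributes."""
    return [reconstruction_error(model, x)] + list(x)
```

The vectors returned by `stage_two_vector` are what a second-stage clusterer such as EM would receive in this sketch, since the abstract says that stage uses both the RE and the raw attribute information.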
Intrusion detection analysts are often swamped by multitudes of alerts originating from installed intrusion detection
systems (IDS) as well as logs from routers and firewalls on the networks. Properly managing these alerts
and correlating them with previously seen threats is critical to effectively protecting a network from
attacks. Manually correlating events can be a slow, tedious task prone to human error. We present a two-stage
alert correlation approach involving an artificial neural network (ANN) autoassociator and a single-parameter
decision threshold-setting unit. By clustering closely matched alerts together, this approach would be beneficial
to the analyst. In this approach, alert attributes are extracted from each alert's content and used to train an
autoassociator. Based on the reconstruction error determined by the autoassociator, closely matched alerts are
grouped together. Whenever a new alert is received, it is automatically categorised into one of the alert clusters,
which identify the type of attack and its severity level as previously known by the analyst. If the attack is
entirely new and matches none of the existing clusters, this is reported to the analyst.
There are several advantages to using an ANN-based approach. First, ANNs acquire knowledge straight from
the data without the need for a human expert to build sets of domain rules and facts. Second, once trained,
ANNs can be very fast and accurate, with high precision, in near real-time applications. Finally, while learning,
ANNs perform a type of dimensionality reduction, allowing a user to input large amounts of information without
fearing an efficiency bottleneck. Thus, rather than storing the data in TCP Quad format (which stores only
seven event attributes) and performing a multi-stage query on reduced information, the user can input all the
relevant information available and instead allow the neural network to organise and reduce this knowledge in an
adaptive and goal-oriented fashion.
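The grouping and categorisation steps in this abstract can be illustrated with a single-threshold rule over reconstruction errors. The abstract does not say how the decision threshold is applied, so the rule below is an assumption: sorted errors are split into clusters wherever the gap between neighbours exceeds the threshold, and a new alert joins the nearest cluster only if it lies within the threshold of that cluster's error range. The function names and example values are hypothetical.

```python
def group_by_reconstruction_error(errors, threshold):
    """Group alert indices whose sorted REs are chained by gaps <= threshold."""
    order = sorted(range(len(errors)), key=lambda i: errors[i])
    clusters = []
    for idx in order:
        if clusters and errors[idx] - errors[clusters[-1][-1]] <= threshold:
            clusters[-1].append(idx)   # close to the previous alert: same cluster
        else:
            clusters.append([idx])     # gap too large: start a new cluster
    return clusters

def categorise(new_error, errors, clusters, threshold):
    """Return the index of the matching cluster, or None for an unseen kind of alert."""
    best, best_dist = None, None
    for c, members in enumerate(clusters):
        lo, hi = errors[members[0]], errors[members[-1]]
        # Distance from the new RE to this cluster's RE interval (0 if inside it).
        dist = max(lo - new_error, 0.0, new_error - hi)
        if best_dist is None or dist < best_dist:
            best, best_dist = c, dist
    if best_dist is not None and best_dist <= threshold:
        return best
    return None

# Made-up reconstruction errors for six alerts, purely for illustration.
example_errors = [0.10, 0.12, 0.11, 0.50, 0.52, 0.95]
example_clusters = group_by_reconstruction_error(example_errors, threshold=0.05)
print(example_clusters)
print(categorise(0.51, example_errors, example_clusters, threshold=0.05))
print(categorise(0.70, example_errors, example_clusters, threshold=0.05))
```

A `None` result stands for the case where no existing cluster matches. Attack type and severity labels assigned by the analyst could be stored in a mapping keyed by cluster index.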