Department of Computer Science, The University of Houston

Intrusion Detection
Module 3. Network Intrusion Detection
Stephen Huang
Department of Computer Science
University of Houston

Overview
• Motivation
• Previous work
• Thumbprints
• Local thumbprints
• Conclusion and future work
• Reference

Intrusion Detection
• The goal: to develop means by which intruders can be traced efficiently.
• Terminology: extended connection, connection chain

Tracing mechanisms
• Proactive: keep track of all individuals on the network and attribute all activity to network-wide user IDs.
• Reactive: no global accounting of users is attempted until a problem arises. Then the activity is traced back to its source.

Proactive Method
• The DIDS system developed at UC Davis is one example.
• Ideal for
  – A local area network, or
  – A wide area network under a central administration
• Not feasible for the Internet.

Reactive Method
• CIS (Caller Identification System) is an example.
• One tracing system per network host (each host is responsible for finding its predecessor in the chain).
• Tracing is accomplished by the hosts communicating in some way to establish the whole chain. (Global coordination?)

Caller ID
• Host-based system.
• If an intruder hops through intermediate hosts prior to making an attack, there is a high probability that these systems have known vulnerabilities which the intruder used to access them.
• Knowing the same attack methods that the intruder used, Caller ID works back along the attack chain in reverse.

Host based solutions
• The difficulty with all host-based tracing systems is that, when an extended connection crosses a host which is not running the system, accountability is lost altogether at that point.
• This severely limits their usefulness as a general-purpose tracing mechanism on the Internet.

Thumbprint Method
• Assumptions:
  – The content of an extended connection is invariant at all points of the chain (once protocol details are abstracted out).
  – We can compute summaries (thumbprints) of the content of each connection.
  – Similarity in content implies similarity in summaries.

Thumbprints
• Main idea of thumbprints
• Properties
• Difficulties
• Applicability

Main idea of thumbprints
• A thumbprint is a small quantity of data that summarizes a certain section of a connection.
• Suppose C1 and C2 are two different connections. With T1 = summary(C1) and T2 = summary(C2), we have T1 ≠ T2.
• If C1 and C2 are two different sections of one connection, then T1 = T2.

Thumbprints
• The ideal is to find a summary function of the connection which uniquely distinguishes a given connection from all unrelated connections, but has the same value over two connections which are links in the same connection chain.
• By comparing the summaries, we can find all pieces of a chain.

How does it work?
• Every component of the system must routinely store thumbprints of all connections going through it.
• When an intrusion is detected, it is possible to trace back by comparing the thumbprints from the hosts or network.

Discussions
• TCP only in this paper.
• May be extended to UDP.
• Lengthy connections should be broken up into time intervals, and each interval thumbprinted separately. An interval size of 1 minute is suggested.

Main idea of thumbprints
• Suppose we focus on TCP connections (Telnet or Rlogin).
[Figure: a connection chain through CP5, CP4, CP3, CP2, CP1, with the links labeled C1, annotated with the intruder direction and the trace-back direction.]

Desirable Properties
• Small
• Sensitive
• Robust
• Additive
• Light

Small Space
• A log must be kept of all connections.
• Each thumbprint should take little space, to minimize storage needs.

Sensitive
• The probability that two unrelated pieces of connection will be close together in thumbprint space should be as small as possible.
• A small change in the content results in a different thumbprint (the same idea as hashing).

Robust
• Thumbprints should change as little as possible when the connection is distorted by the kinds of errors that are likely in practice.

Additive
• Successive thumbprints can be combined into a thumbprint for a longer interval.
• If a 1-minute thumbprint is not enough, merge two 1-minute thumbprints into a 2-minute thumbprint.
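
The additive property can be illustrated with a minimal sketch. It assumes, purely for illustration, that a thumbprint is stored as a character count together with a per-character average vector; the record type `IntervalThumbprint` and the function `merge` are hypothetical names, not part of the slides.

```python
from dataclasses import dataclass


@dataclass
class IntervalThumbprint:
    """Hypothetical record: character count plus per-character mean vector."""
    n_chars: int
    mean: list[float]


def merge(first: IntervalThumbprint, second: IntervalThumbprint) -> IntervalThumbprint:
    """Combine two successive interval thumbprints into one for the longer interval.

    Assumption: each mean is an average over the interval's characters, so the
    merged mean is the count-weighted average of the two means.
    """
    total = first.n_chars + second.n_chars
    if total == 0:
        # Both intervals were empty; keep an all-zero vector.
        return IntervalThumbprint(0, [0.0] * len(first.mean))
    merged = [
        (first.n_chars * a + second.n_chars * b) / total
        for a, b in zip(first.mean, second.mean)
    ]
    return IntervalThumbprint(total, merged)
```

Under this assumption, two 1-minute records merge into a 2-minute record without revisiting the raw traffic, which is what makes short logging intervals practical.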

Light
• It should not cost too much to
  – Create the thumbprints
  – Compare thumbprints

Difficulties
• Clock skew
  – It is essential that synchronization errors be much smaller than the thumbprint interval.
• Propagation delays
  – Thumbprints may contain slightly different data in different places because the connections they measure are delayed by propagation times.
• Loss of characters
  – A thumbprint station has no access to TCP's error and flow control, so if it loses some characters, the thumbprint cannot recover them.
• Packetization variation
  – Packetization and the timing of packet transmission vary at different points in the connection, which makes thumbprinting difficult.

Some possible solutions
• Checksum
• Compression

Applicability
• Inside intruder capture
• Decide whether this site is being used as a stepping-stone

Local Thumbprints
• Definitions of Local thumbprints
• Overview of experiments
• Concept Experiments
• Thumbprint function
• Comparison Algorithm
• Tests of Thumbprints
• Applicability Beyond Ethernet

Definitions of Local thumbprints
• Sequence of characters: $a_1, a_2, a_3, \ldots, a_n$
• Function $\phi$: takes a character as an argument and returns a short vector of real numbers, $\phi(a) \in \mathbb{R}^K$
• Thumbprint: $T = \frac{1}{n} \sum_{i=1}^{n} \phi(a_i)$
• $T$ is a short vector of fixed length $K$.
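
The definition above, an average of a per-character function over the sequence, can be sketched as follows. The choice of the per-character function as a one-hot indicator over 128 character codes is an illustrative assumption, and the names `phi`, `local_thumbprint`, and `ALPHABET_SIZE` are hypothetical.

```python
ALPHABET_SIZE = 128  # assumed K: one component per 7-bit character code


def phi(ch: str) -> list[float]:
    """Illustrative per-character function: a one-hot vector of length K."""
    vec = [0.0] * ALPHABET_SIZE
    vec[ord(ch) & 0x7F] = 1.0  # keep only the low 7 bits of the character code
    return vec


def local_thumbprint(chars: str) -> list[float]:
    """Average of phi(a_i) over the n characters of one interval."""
    n = len(chars)
    if n == 0:
        # No characters seen in this interval; return an all-zero vector.
        return [0.0] * ALPHABET_SIZE
    total = [0.0] * ALPHABET_SIZE
    for ch in chars:
        for k, value in enumerate(phi(ch)):
            total[k] += value
    return [s / n for s in total]
```

For example, `local_thumbprint("abca")` puts 0.5 in the component for 'a' and 0.25 in each of the components for 'b' and 'c'. Because each character contributes one independent term to the sum, losing a character removes only that term.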

Frequency (K=26)
$\phi$('a') = (1, 0, …, 0)
$\phi$('b') = (0, 1, …, 0)
…
$\phi$('z') = (0, 0, …, 1)
K = 128 for 7-bit ASCII

Count (K=1)
$\phi$('a') = 1
$\phi$('b') = 1
…
$\phi$('z') = 1

Definitions of Local thumbprints
• Comments
  – Locality: the thumbprint depends only locally on the character stream.
  – Robustness: if $a_2$ is lost, only the $\phi(a_2)$ term is affected.
  – Additivity: obviously satisfied.
  – Small: just a few real numbers.
  – Light: cheap to compute, because $\phi$ can be stored in a lookup table.
  – Is it sensitive?

Definitions of Local thumbprints
• Other forms of definition, including higher orders, are also possible. A digram thumbprint:
  $T = \frac{1}{n-k} \sum_{i=1}^{n-k} \phi(a_i, a_{i+k})$
• For k = 1, we are counting the frequencies of aa, ab, ac, …, zz ($26^2$ of them).

Definitions of Local thumbprints
• Comments
  – Digram, trigram, or even longer multi-character thumbprints are more sensitive than single-character ones, because they capture the order of the characters in a sequence.
  – We still use the single-character scheme because experiments suggest that it makes little difference.

Overview of Experiments
• Settings
  – Programs written in C++.
  – Sun 4/280 on Ethernet LANs.
  – Monitor each packet and associate it with a pair of machines and ports.
  – Reconstruct the data flows of the connections.

Overview of Experiments
• Several key points
  – Mask all characters down to 7 bits.
  – Set the weight of ASCII 24 to 0.
  – Run a program that simulates a human typing commands.
  – Analyze one week's data.

Concept Experiments
Thumbprints in concept experiment (total count of characters in each minute)

        1      2         3        4        5        6
P1      38.2   6741.1    6975.7   2587.2   3446.5   2451.2
P2      11.8   13505.5   92.8     2569.7   3388.7   2446.0
        ?      ×         ×        √        √        √

Principal Component Analysis
• Given a series of vectors, find a set of linear combinations of the components which explains the maximal proportion of the variance of the vectors.
  – Compute the covariance matrix of the vectors.
  – Get its eigenvalues and eigenvectors.
  – The eigenvector corresponding to the largest eigenvalue gives the linear combination of the data which has the most variance, and that eigenvalue is the variance.
  – Repeating this for the next largest eigenvalues yields K principal components.

Thumbprint Function
• Given the vector of character frequencies for a particular period of some connection,
  – f = (f1, f2, f3, …, fL)
  – the thumbprint can be written as a linear combination: $T_j = \sum_{a=1}^{L} \phi_j(a) f_a$

Thumbprint Function
• So we condense the vector of L character counts into a vector of K thumbprint components.
• Which linear combinations of the $f_i$ should be used?
  – Use the statistical method of principal component analysis (PCA).

PCA
[Figure: PCA maps the L character frequencies (1, 2, …, L) to K thumbprint components (1, 2, …, K).]

Thumbprint Function
[Figure; 32 = space]

Comparison Algorithm
• Comparison is complicated for two reasons:
  – Some characters are displaced across interval boundaries.
  – Dropped packets introduce noise into the data.

Comparison Algorithm
$\delta_i(C, C') = \sum_{k=1}^{K} \log\left( \left| T_k(C', t) - T_k(C, t) \right| \right)$
• Dead hit: if the thumbprints are exactly the same, the difference is zero.
• Generally, any dead hits are very strong grounds for suspecting that the two connections have identical content.
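
A minimal sketch of the comparison step follows. It reads the measure on the slide as the sum, over the K thumbprint components, of the logarithm of the absolute difference between the two connections' components. Since the logarithm of zero is undefined, the sketch counts dead hits separately instead of adding them to the sum; that handling, and the function name `compare`, are assumptions made for illustration.

```python
import math


def compare(tp_a: list[float], tp_b: list[float]) -> tuple[float, int]:
    """Compare two thumbprints taken over the same interval.

    Returns (score, dead_hits): score is the sum of log absolute differences
    over the components that differ; dead_hits counts identical components.
    """
    if len(tp_a) != len(tp_b):
        raise ValueError("thumbprints must have the same number of components")
    score = 0.0
    dead_hits = 0
    for a, b in zip(tp_a, tp_b):
        diff = abs(b - a)
        if diff == 0.0:
            dead_hits += 1  # assumption: log(0) is skipped and counted instead
        else:
            score += math.log(diff)
    return score, dead_hits
```

A lower score means the components are closer together, and any nonzero dead-hit count is reported on its own because exact agreement is the strongest evidence the slide describes.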

Tests of Thumbprints
• Experiment I
  – toadflax → k2 → toadflax
• Experiment II
  – k2 → toadflax → k2
• Experiment III
  – toadflax → k2 → helvellyn → alps.cc.gatech.edu → k2 → toadflax
• Experiment IV
  – toadflax → k2 → helvellyn → po.csc.liv.ac.uk → alps.cc.gatech.edu → k2 → toadflax

Tests of Thumbprints
[Table: experiment results, errors (%)]

Conclusions
• It is easily possible, on an Ethernet, to save summaries of interactive connections which take only a few tens of bytes per minute per connection.
• We are also studying ways to break up the connection into pieces that depend not on time but on content-based triggers.
• Success at this would obviate the need to synchronize geographically separated thumbprint stations.