MINDS: Data Mining Based Network Intrusion Detection System

Vipin Kumar
Army High Performance Computing Research Center (AHPCRC)
University of Minnesota
http://www.cs.umn.edu/research/minds/

Team Members: Eric Eilertson, Paul Dokas, Levent Ertoz, Ben Mayer, Aleksandar Lazarevic, Michael Steinbach, George Simon, Varun Chandola, Mark Shaneck, Jaideep Srivastava, Zhi-Li Zhang, Yongdae Kim, Vipin Kumar

Information Assurance
• The sophistication and severity of cyber attacks are increasing
  [Figure: Incidents reported to the Computer Emergency Response Team/Coordination Center (CERT/CC), 1990-2002]
• ARL, the Army, DoD and other U.S. Government agencies are major targets for sophisticated, state-sponsored cyber terrorists
  – Cyber strategies can be a major force multiplier and equalizer
  – Across DoD, computer assets have been compromised and information has been stolen, putting technological advantage and battlefield superiority at risk
• Security mechanisms always have inevitable vulnerabilities
  – Firewalls are not sufficient to ensure security in computer networks
  – Insider attacks
  [Figure: Spread of the SQL Slammer worm 10 minutes after its deployment]
• Intrusion Detection System (IDS)
  – A combination of software and hardware that attempts to perform intrusion detection
  – Raises an alarm when a possible intrusion happens
• Traditional IDS tools are based on signatures of known attacks
  – Example of a SNORT rule for the MS-SQL "Slammer" worm (abridged; see www.snort.org):
      any -> udp port 1434 (content:"|81 F1 03 01 04 9B 81 F1 01|"; content:"sock"; content:"send")
• Limitations
  – The signature database has to be manually revised for each new type of discovered intrusion
  – Substantial latency in deployment of newly created signatures across the computer system
  – They cannot detect emerging cyber threats
  – Not suitable for detecting policy violations and insider abuse
  – Do not provide understanding of network traffic
  – Generate too many false alarms

Data Mining for Intrusion Detection
• Increased interest in data mining based intrusion detection
  – Attacks for which it is difficult to build signatures
  – Unforeseen/unknown/emerging attacks
• Misuse detection
  – Building predictive models from labeled data sets (instances are labeled as "normal" or "intrusive") to identify known intrusions
  – High accuracy in detecting many kinds of known attacks
  – Cannot detect unknown and emerging attacks
• Anomaly detection
  – Detect novel attacks as deviations from "normal" behavior
  – Potentially high false alarm rate: previously unseen (yet legitimate) system behaviors may also be recognized as anomalies

Misuse Detection: Building Predictive Models
Training Set
Tid  SrcIP          Start time  Dest IP         Dest Port  Number of bytes  Attack
1    206.135.38.95  11:07:20    160.94.179.223  139        192              No
2    206.163.37.95  11:13:56    160.94.179.219  139        195              No
3    206.163.37.95  11:14:29    160.94.179.217  139        180              No
4    206.163.37.95  11:14:30    160.94.179.255  139        199              No
5    206.163.37.95  11:14:32    160.94.179.254  139        19               Yes
6    206.163.37.95  11:14:35    160.94.179.253  139        177              No
7    206.163.37.95  11:14:36    160.94.179.252  139        172              No
8    206.163.37.95  11:14:38    160.94.179.251  139        285              Yes
9    206.163.37.95  11:14:41    160.94.179.250  139        195              No
10   206.163.37.95  11:14:44    160.94.179.249  139        163              Yes

Test Set
Tid  SrcIP          Start time  Dest IP         Number of bytes  Attack
1    206.163.37.81  11:17:51    160.94.179.208  150              ?
2    206.163.37.99  11:18:10    160.94.179.235  208              ?
3    206.163.37.55  11:34:35    160.94.179.221  195              ?
4    206.163.37.37  11:41:37    160.94.179.253  199              ?
5    206.163.37.41  11:55:19    160.94.179.244  181              ?

The diagram shows three uses of the data. A classifier model is learned from the training set and applied to the test set. The data also feeds anomaly detection. Finally, attacks are summarized using association rules.
Rules discovered:
  {Src IP = 206.163.37.95, Dest Port = 139, Bytes [150, 200]} --> {ATTACK}

Key Technical Challenges
• Large data size
• High dimensionality
• Temporal nature of the data
• Skewed class distribution
• Data preprocessing
• On-line analysis

MINDS: Minnesota INtrusion Detection System
[Figure: MINDS system architecture. Components: network data capturing device (tcpdump, flow tools), filtering, feature extraction, known attack detection (detected known attacks), anomaly detection (anomaly scores), association pattern analysis (summary and characterization of attacks), and a human analyst who labels the detected novel attacks]
• Data mining based intrusion detection system
• Incorporated into the Interrogator architecture at the ARL Center for Intrusion Monitoring and Protection (CIMP)
  – Helps analyze data from multiple sensors at DoD sites around the country
  – MINDS anomalies are used as the primary key when viewing related alerts from other tools (SNORT, Jids, etc.)
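The abridged Slammer rule quoted under Information Assurance can be illustrated with a minimal Python sketch of signature matching. It checks a decoded UDP packet for destination port 1434 and for all three content patterns in the payload. The `Packet` type and `matches_slammer_rule` function are hypothetical. This is not how SNORT itself evaluates rules: it ignores offsets, depth and other rule options, and it assumes the packet has already been decoded.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    """Hypothetical decoded packet holding only the fields the rule inspects."""
    protocol: str   # e.g. "udp" or "tcp"
    dst_port: int
    payload: bytes

# Content patterns from the abridged rule; |..| in SNORT syntax denotes raw hex bytes.
SLAMMER_CONTENTS = [
    bytes.fromhex("81F10301049B81F101"),
    b"sock",
    b"send",
]

def matches_slammer_rule(pkt: Packet) -> bool:
    """Return True only if the packet meets every condition of the abridged rule."""
    if pkt.protocol != "udp" or pkt.dst_port != 1434:
        return False
    # Each content pattern must appear somewhere in the payload (no offsets checked).
    return all(pattern in pkt.payload for pattern in SLAMMER_CONTENTS)
```

A variant that changes any one of the three byte patterns would no longer match. This is the "cannot detect emerging cyber threats" limitation listed above.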
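How well labeled data backs a discovered rule is commonly measured by its support and confidence. The following Python sketch encodes the training set from the misuse-detection example and evaluates the rule {Src IP = 206.163.37.95, Dest Port = 139, Bytes [150, 200]} --> {ATTACK} against it. The `Record` type and function names are hypothetical. Treating the byte range as inclusive is an assumption.

```python
from typing import Callable, List, NamedTuple, Tuple

class Record(NamedTuple):
    src_ip: str
    dest_port: int
    n_bytes: int
    attack: bool

# Training set from the example above (Tid order; start time and Dest IP omitted).
TRAINING: List[Record] = [
    Record("206.135.38.95", 139, 192, False),
    Record("206.163.37.95", 139, 195, False),
    Record("206.163.37.95", 139, 180, False),
    Record("206.163.37.95", 139, 199, False),
    Record("206.163.37.95", 139, 19, True),
    Record("206.163.37.95", 139, 177, False),
    Record("206.163.37.95", 139, 172, False),
    Record("206.163.37.95", 139, 285, True),
    Record("206.163.37.95", 139, 195, False),
    Record("206.163.37.95", 139, 163, True),
]

def discovered_rule(r: Record) -> bool:
    """Left-hand side of the rule; the byte range [150, 200] is assumed inclusive."""
    return r.src_ip == "206.163.37.95" and r.dest_port == 139 and 150 <= r.n_bytes <= 200

def support_confidence(records: List[Record],
                       antecedent: Callable[[Record], bool]) -> Tuple[float, float]:
    """Support and confidence of the rule `antecedent --> {ATTACK}`."""
    covered = [r for r in records if antecedent(r)]
    hits = sum(1 for r in covered if r.attack)
    support = hits / len(records) if records else 0.0
    confidence = hits / len(covered) if covered else 0.0
    return support, confidence
```

Calling `support_confidence(TRAINING, discovered_rule)` scores the rule on the toy table. In practice the candidate rules would come from an association-pattern miner rather than being written by hand.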
• MINDS is the first effective anomaly-based intrusion detection system used by ARL
  – Routinely detects attacks and intrusive behavior not detected by widely used intrusion detection systems
  – Insider abuse / policy violations / worms / scans

Feature Extraction Module
• Three groups of features
  – Basic features of individual TCP connections
    • Source & destination IP: Features 1 & 2
    • Source & destination port: Features 3 & 4
    • Protocol: Feature 5
    • Duration: Feature 6
    • Bytes per packet: Feature 7
    • Number of bytes: Feature 8
  – Time-based features
    • For the same source (destination) IP address, number of unique destination (source) IP addresses inside the network in the last T seconds: Features 9 (13)
    • Number of connections from the source (destination) IP to the same destination (source) port in the last T seconds: Features 11 (15)
  – Connection-based features
    • For the same source (destination) IP address, number of unique destination (source) IP addresses inside the network in the last N connections: Features 10 (14)
    • Number of connections from the source (destination) IP to the same destination (source) port in the last N connections: Features 12 (16)

Detection of Anomalies on Real Network Data
Anomalies/attacks picked up by MINDS include scanning activities, worms, and non-standard behavior such as policy violations and insider attacks. Many of the attacks detected by MINDS have already appeared on the CERT/CC list of recent advisories and incident notes.
Some illustrative examples of intrusive behavior detected using MINDS at the U of M:
• Scans
  – Detected scanning for the Microsoft DS service on port 445/TCP
    • Undetected by SNORT since the scanning was non-sequential (very slow).
      A rule was added to SNORT in September 2002.
  – Detected scanning for an Oracle server
    • Undetected by SNORT because the scanning was hidden within another Web scan
  – Detected a distributed Windows networking scan from multiple source locations
• Policy violations
  – Identified a machine running a Microsoft PPTP VPN server on non-standard ports
    • Undetected by SNORT since the collected GRE traffic was part of the normal traffic
  – Identified compromised machines running FTP servers on non-standard ports, which is a policy violation
    • Example of anomalous behavior following a successful Trojan horse attack
  – Detected computers on the network apparently communicating with outside computers over a VPN or on IPv6
• Worms
  – Detected several instances of the Slapper worm that were not identified by SNORT since they were variations of existing worm code
  – Detected unsolicited ICMP ECHOREPLY messages to a computer previously infected with the Stacheldraht worm (a DDoS agent)

Typical Anomaly Detection Output: January 26, 2003 (48 hours after the "slammer" worm)
Features 1-8 are 0 in every row and are omitted. Columns 9-16 refer to the features defined under Feature Extraction Module.

score     srcIP          sPort  dstIP          dPort  proto  flags  packets  bytes     9     10  11    12  13  14  15  16
37674.69  63.150.X.253   1161   128.101.X.29   1434   17     16     [0,2)    [0,1829)  0.81  0   0.59  0   0   0   0   0
26676.62  63.150.X.253   1161   160.94.X.134   1434   17     16     [0,2)    [0,1829)  0.81  0   0.59  0   0   0   0   0
24323.55  63.150.X.253   1161   128.101.X.185  1434   17     16     [0,2)    [0,1829)  0.81  0   0.58  0   0   0   0   0
21169.49  63.150.X.253   1161   160.94.X.71    1434   17     16     [0,2)    [0,1829)  0.81  0   0.58  0   0   0   0   0
19525.31  63.150.X.253   1161   160.94.X.19    1434   17     16     [0,2)    [0,1829)  0.81  0   0.58  0   0   0   0   0
19235.39  63.150.X.253   1161   160.94.X.80    1434   17     16     [0,2)    [0,1829)  0.81  0   0.58  0   0   0   0   0
17679.1   63.150.X.253   1161   160.94.X.220   1434   17     16     [0,2)    [0,1829)  0.81  0   0.58  0   0   0   0   0
8183.58   63.150.X.253   1161   128.101.X.108  1434   17     16     [0,2)    [0,1829)  0.82  0   0.58  0   0   0   0   0
7142.98   63.150.X.253   1161   128.101.X.223  1434   17     16     [0,2)    [0,1829)  0.82  0   0.57  0   0   0   0   0
5139.01   63.150.X.253   1161   128.101.X.142  1434   17     16     [0,2)    [0,1829)  0.82  0   0.57  0   0   0   0   0
4048.49   142.150.Y.101  0      128.101.X.127  2048   1      16     [2,4)    [0,1829)  0.83  0   0.56  0   0   0   0   0
4008.35   200.250.Z.20   27016  128.101.X.116  4629   17     16     [2,4)    [0,1829)  0     0   0     0   0   0   1   0
3657.23   202.175.Z.237  27016  128.101.X.116  4148   17     16     [2,4)    [0,1829)  0     0   0     0   0   0   1   0
3450.9    63.150.X.253   1161   128.101.X.62   1434   17     16     [0,2)    [0,1829)  0.82  0   0.57  0   0   0   0   0
3327.98   63.150.X.253   1161   160.94.X.223   1434   17     16     [0,2)    [0,1829)  0.82  0   0.57  0   0   0   0   0
2796.13   63.150.X.253   1161   128.101.X.241  1434   17     16     [0,2)    [0,1829)  0.82  0   0.57  0   0   0   0   0
2693.88   142.150.Y.101  0      128.101.X.168  2048   1      16     [2,4)    [0,1829)  0.83  0   0.56  0   0   0   0   0
2683.05   63.150.X.253   1161   160.94.X.43    1434   17     16     [0,2)    [0,1829)  0.82  0   0.57  0   0   0   0   0
2444.16   142.150.Y.236  0      128.101.X.240  2048   1      16     [2,4)    [0,1829)  0.83  0   0.56  0   0   0   0   0
2385.42   142.150.Y.101  0      128.101.X.45   2048   1      16     [0,2)    [0,1829)  0.83  0   0.56  0   0   0   0   0
2114.41   63.150.X.253   1161   160.94.X.183   1434   17     16     [0,2)    [0,1829)  0.82  0   0.57  0   0   0   0   0
2057.15   142.150.Y.101  0      128.101.X.161  2048   1      16     [0,2)    [0,1829)  0.83  0   0.56  0   0   0   0   0
1919.54   142.150.Y.101  0      128.101.X.99   2048   1      16     [2,4)    [0,1829)  0.83  0   0.56  0   0   0   0   0
1634.38   142.150.Y.101  0      128.101.X.219  2048   1      16     [2,4)    [0,1829)  0.83  0   0.56  0   0   0   0   0
1596.26   63.150.X.253   1161   128.101.X.160  1434   17     16     [0,2)    [0,1829)  0.82  0   0.57  0   0   0   0   0
1513.96   142.150.Y.107  0      128.101.X.2    2048   1      16     [0,2)    [0,1829)  0.83  0   0.56  0   0   0   0   0
1389.09   63.150.X.253   1161   128.101.X.30   1434   17     16     [0,2)    [0,1829)  0.82  0   0.57  0   0   0   0   0
1315.88   63.150.X.253   1161   128.101.X.40   1434   17     16     [0,2)    [0,1829)  0.82  0   0.57  0   0   0   0   0
1279.75   142.150.Y.103  0      128.101.X.202  2048   1      16     [0,2)    [0,1829)  0.83  0   0.56  0   0   0   0   0
1237.97   63.150.X.253   1161   160.94.X.32    1434   17     16     [0,2)    [0,1829)  0.83  0   0.56  0   0   0   0   0
1180.82   63.150.X.253   1161   128.101.X.61   1434   17     16     [0,2)    [0,1829)  0.83  0   0.56  0   0   0   0   0

• UDP connections to dPort 1434 from 63.150.X.253: anomalous connections that correspond to the "slammer" worm
• ICMP connections (protocol 1) from 142.150.Y.*: anomalous connections that correspond to the ping scan
• Connections with sPort 27016: UM machines connecting to "half-life" game servers

Summarization Using Association Patterns
[Figure: The anomaly detection system passes ranked connections to a Discriminating Association Pattern Generator. The generator contrasts attack and normal connections and updates a Knowledge Base of rules such as R1: TCP, DstPort=1863 --> Attack, ..., R100: TCP, DstPort=80 --> Normal]
1. Build normal profile
2. Study changes in normal behavior
3. Create attack summary
4. Detect misuse behavior
5. Understand nature of the attack

Typical MINDS Output
[Table: ranked summaries of anomalous connections with columns score, c1, c2, src IP, sPort, dst IP, dPort, protocol, flags, packets, bytes, and feature columns 1-16. Entries such as xxx.xxx.xxx.xxx and ----- stand for fields that a summary leaves unspecified]
• UM computer connecting to a remote FTP server running on port 5002
• Summarized TCP reset packets received from 64.156.X.74, which is a victim of a DoS attack; we were observing backscatter, i.e., replies to spoofed packets
• Summarization of an FTP scan from a computer in Colombia, 200.75.X.2
• Summary of IDENT lookups, where a remote computer tries to get the user name
• Summarization of a USENET server transferring a large amount of data

Typical MINDS Output
[Table: ranked summaries of anomalous connections with columns score, c1, c2, src IP, sPort, dst IP, dPort, protocol, packets, bytes, flags, and feature columns 1-16]
• UM computers doing bulk transfers
• Attack on a Real-Media server (reported by CERT on September 9, 2003: RealNetworks media server RTSP protocol parser buffer overflow)
• 8200/tcp traffic related to gotomypc.com, which allows users to remotely control a desktop (involves a third party)
• Mysterious traffic currently being investigated

Typical MINDS Output
[Table: ranked summaries of anomalous connections with columns score, c1, c2, src IP, sPort, dst IP, dPort, protocol, flags, packets, bytes, and feature columns 1-16]
• UMN computers doing bulk transfers
• 160.94.122.142 is running a rogue FTP server on 60000/TCP
• UMN computers doing large transfers via BitTorrent to many outside hosts
• This computer is scanning for computers on port 139/TCP. The majority of the packets are 192 bytes or 144 bytes, except for the second summary (score 88.2)
• UMN computer running a RealMedia server that was not known to the analyst
• Odd-looking P2P traffic to/from a UMN computer (potentially KaZaA or Gnutella)
• The remote computer was scanning for 57/TCP; RESET packets are sent back from computers that do not have 57/TCP open
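The time-based features in the Feature Extraction Module are sliding-window counts. The following Python sketch computes Feature 9 and Feature 11 for each connection. Feature 9 is the number of unique destination IPs inside the network contacted by the same source IP in the last T seconds. Feature 11 is the number of connections from the same source IP to the same destination port in that window. Several choices here are assumptions for illustration, not the MINDS implementation:
- the `Conn` record
- the half-open window (t - T, t]
- counting the current connection
- the `inside_net` CIDR test

```python
import ipaddress
from collections import deque
from dataclasses import dataclass

@dataclass
class Conn:
    start: float      # connection start time, in seconds
    src_ip: str
    dst_ip: str
    dst_port: int

def time_window_features(conns, T, inside_net):
    """Return [(feature9, feature11), ...] for the connections in start-time order.

    feature9:  unique destination IPs inside `inside_net` contacted by the same
               source IP in the window (t - T, t], current connection included
    feature11: connections from the same source IP to the same destination port
               in that window, current connection included
    """
    net = ipaddress.ip_network(inside_net)
    window = deque()                       # connections still inside the time window
    out = []
    for c in sorted(conns, key=lambda x: x.start):
        window.append(c)
        while window[0].start <= c.start - T:   # evict connections older than T seconds
            window.popleft()
        same_src = [w for w in window if w.src_ip == c.src_ip]
        f9 = len({w.dst_ip for w in same_src if ipaddress.ip_address(w.dst_ip) in net})
        f11 = sum(1 for w in same_src if w.dst_port == c.dst_port)
        out.append((f9, f11))
    return out
```

The connection-based Features 10 and 12 follow the same pattern. The only change is that the window holds the last N connections, for example `deque(maxlen=N)` instead of time-based eviction. The destination-side Features 13-16 swap the roles of source and destination.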
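The summarization step can be approximated with a small Python sketch of discriminating association patterns. These are itemsets (attribute=value combinations) that are frequent among the top-ranked anomalous connections but rare among normal ones, in the spirit of rules like R1: TCP, DstPort=1863 --> Attack. The function names, the `min_support` and `min_ratio` thresholds, and the support-ratio criterion are assumptions. This is not the MINDS Discriminating Association Pattern Generator itself.

```python
from collections import Counter
from itertools import combinations

def itemsets(conn, max_size=2):
    """Yield every attribute=value itemset of size 1..max_size for one connection (a dict)."""
    items = sorted(conn.items())
    for k in range(1, max_size + 1):
        yield from combinations(items, k)

def discriminating_patterns(anomalous, normal, min_support=0.5, min_ratio=3.0):
    """Return (itemset, anomalous support, normal support) for itemsets that are frequent
    among anomalous connections and at least min_ratio times rarer among normal ones."""
    a_counts = Counter(s for c in anomalous for s in itemsets(c))
    n_counts = Counter(s for c in normal for s in itemsets(c))
    floor = 1 / (len(normal) + 1)   # keeps the ratio finite for itemsets unseen in normal data
    result = []
    for s, count in a_counts.items():
        a_sup = count / len(anomalous)
        n_sup = n_counts[s] / len(normal) if normal else 0.0
        if a_sup >= min_support and a_sup / max(n_sup, floor) >= min_ratio:
            result.append((s, a_sup, n_sup))
    # Most frequent anomalous patterns first; ties broken by lower normal support.
    return sorted(result, key=lambda r: (-r[1], r[2]))
```

With connections given as dicts such as {"proto": "tcp", "dport": 1863}, a returned itemset like (("dport", 1863), ("proto", "tcp")) plays the role of rule R1 in the knowledge base.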
Scan Detection
• Despite the importance of scan detection, its value is often overlooked
  – Lack of good tools for scan detection
• Existing methods either miss stealth scans or give too many false alarms
• Fast scans are easy to catch using existing schemes, but stealth scans are very difficult to recognize
• MINDS employs our new methodology for detecting network scans
  – Makes use of powerful new heuristics
    • Only considers flows with a small number of packets
    • Only considers scans in a subnet (not the whole Internet)
  – Makes effective use of usage information
    • Touches to rare IP/port combinations are more suspicious than others
    • A scanner will hit machines where the service is not available, resulting in a low usage count for those IP/port combinations
  – Very low false alarm rate
    • Evaluation of 36 million flows over a 30-minute window at the University of Minnesota showed 2583 alarms but only 22 false alarms
    • Evaluation on an hour of data at ARL reported 1150 scans, with only 5 false alarms
• Routinely finds compromised machines at ARL-CIMP

Detecting Suspicious Ports for Possible Worm Activity
• We find destinations located within the network for which there is a high connection failure rate on specific ports for inbound, non-scan connections
• Then we find ports on which there are many such destinations
• The existence of these ports indicates a potential worm or slow scan
• This warrants targeted, more detailed data collection and analysis that cannot easily be done on the entire data
  – Packet content analysis
  – Signature generation

[Figure: IP/port pairs for which a large percentage of connections failed]
[Figure: IP/port pairs for which a large percentage of connections failed (only for ports with many hits)]

[Figure: Map of the IPv4 address space by first octet (0-255), labeled by owner or registry, e.g. HP, Apple, GE, MIT, IBM, AT&T, Merit Networks, Halliburton, Ford, CSC, Prudential, DuPont, Chrysler, PSI, Merck, Xerox, Eli Lilly, Nortel, AOL, Interop Show Net, US Military, USPS, UK Government, Japan Inet, SITA (French), Public Data Network, APNIC (Asia), RIPE (Europe), LACNIC (Lat. Am.), ARIN, IANA Reserved, Private Use, Loopback, Multicast]
Port 80 panel:
• 999 unique sources (min: 1, max: 28, avg: 1)
• 1126 unique destinations (min: 1, max: 55, avg: 1)
• 1516 total flows involved
• 1472 scan flows on port 80 (found by the scan detector)
Port 445 panel:
• 7982 unique sources (min: 1, max: 16, avg: 1)
• 6184 unique destinations (min: 1, max: 28, avg: 1)
• 9930 total flows involved
• 9406 scan flows on port 445 (found by the scan detector)

Clustering
• Useful for detecting modes of behavior
  – Shared Nearest Neighbor (SNN) clustering works quite well at determining modes of behavior
    • Not distracted by "noise" in the data
• SNN is CPU intensive, O(N^2)
• Requires storing an N x K matrix
  – K (number of neighbors) is typically between 10 and 20
  – K should be about the size of the smallest expected mode
• Clustered 850,000 connections collected over one hour at one US Army fort
• Took 10 hours using 3 quad-CPU 2.8 GHz servers and 4 2 GHz workstations (16 CPUs in total)
• Required around 100 MB of memory per PE for the distance calculations
  – 500 MB of memory for the final clustering step on a single PE
• Found 3135 clusters
  – Largest clusters around 500 records, smallest cluster 10 records

Detecting Large Modes of Network Traffic Using Clustering
Large clusters of VPN traffic (hundreds of connections), used between forts for secure sharing of data and working remotely

Start Time                Duration  Src IP  Src Port  Dst IP  Dst Port  Proto  TTL  Packets  Bytes
20040407.10:00:00.428036  0:00:00   A       -1        B       -1        gre    237  1        556
20040407.10:00:00.685520  0:00:03   A       -1        B       -1        gre    237  1        556
20040407.10:00:00.748920  0:00:00   A       -1        B       -1        gre    237  1        556
20040407.10:01:44.138057  0:00:00   A       -1        B       -1        gre    237  1        556
20040407.10:01:59.267932  0:00:00   A       -1        B       -1        gre    237  1        96
20040407.10:02:44.937575  0:00:01   A       -1        B       -1        gre    237  1        556
20040407.10:04:00.717395  0:00:00   A       -1        B       -1        gre    237  1        556
20040407.10:04:30.976627  0:00:01   A       -1        B       -1        gre    237  1        556
20040407.10:04:46.106233  0:00:00   A       -1        B       -1        gre    237  1        556
20040407.10:05:46.715539  0:00:00   A       -1        B       -1        gre    237  1        556
20040407.10:06:16.975202  0:00:01   A       -1        B       -1        gre    237  1        556
20040407.10:06:32.105013  0:00:00   A       -1        B       -1        gre    237  1        556

Start Time                Duration  Src IP  Src Port  Dst IP  Dst Port  Proto  TTL  Packets  Bytes
20040407.10:00:40.685522  0:00:03   B       -1        A       -1        gre    237  1        96
20040407.10:00:58.748922  0:00:00   B       -1        A       -1        gre    237  1        96
20040407.10:01:44.138059  0:00:00   B       -1        A       -1        gre    237  1        96
20040407.10:02:14.678442  0:00:00   B       -1        A       -1        gre    237  1        96
20040407.10:02:44.937577  0:00:01   B       -1        A       -1        gre    237  1        96
20040407.10:03:15.308206  0:00:00   B       -1        A       -1        gre    237  1        96
20040407.10:04:30.976629  0:00:01   B       -1        A       -1        gre    237  1        96
20040407.10:06:16.975204  0:00:01   B       -1        A       -1        gre    237  1        96
20040407.10:06:32.105015  0:00:00   B       -1        A       -1        gre    237  1        96
20040407.10:06:47.234837  0:00:00   B       -1        A       -1        gre    237  1        96
20040407.10:07:02.367471  0:00:00   B       -1        A       -1        gre    237  1        96
20040407.10:07:17.494574  0:00:00   B       -1        A       -1        gre    237  1        96

Detecting Unusual Modes of Network Traffic Using Clustering
Clusters involving GoToMyPC.com (Army data): a policy violation, since it allows remote control of a desktop

Start Time                Duration  Src IP  Src Port  Dst IP  Dst Port  Proto  TTL  Flags     Packets  Bytes
20040407.10:00:10.428036  0:00:00   A       4125      B       8200      tcp    123  ***AP*SF  5        248
20040407.10:00:40.685520  0:00:03   A       4127      B       8200      tcp    123  ***AP*SF  5        248
20040407.10:00:58.748920  0:00:00   A       4138      B       8200      tcp    123  ***AP*SF  5        248
20040407.10:01:44.138057  0:00:00   A       4141      B       8200      tcp    123  ***AP*SF  5        248
20040407.10:01:59.267932  0:00:00   A       4143      B       8200      tcp    123  ***AP*SF  5        248
20040407.10:02:44.937575  0:00:01   A       4149      B       8200      tcp    123  ***AP*SF  5        248
20040407.10:04:00.717395  0:00:00   A       4163      B       8200      tcp    123  ***AP*SF  5        248
20040407.10:04:30.976627  0:00:01   A       4172      B       8200      tcp    123  ***AP*SF  5        248
20040407.10:04:46.106233  0:00:00   A       4173      B       8200      tcp    123  ***AP*SF  5        248
20040407.10:05:46.715539  0:00:00   A       4178      B       8200      tcp    123  ***AP*SF  5        248
20040407.10:06:16.975202  0:00:01   A       4180      B       8200      tcp    123  ***AP*SF  5        248
20040407.10:06:32.105013  0:00:00   A       4181      B       8200      tcp    123  ***AP*SF  5        248

Start Time                Duration  Src IP  Src Port  Dst IP  Dst Port  Proto  TTL  Flags     Packets  Bytes
20040407.10:00:40.685522  0:00:03   B       8200      A       4127      tcp    123  ***AP*SF  4        211
20040407.10:00:58.748922  0:00:00   B       8200      A       4138      tcp    123  ***AP*SF  4        211
20040407.10:01:44.138059  0:00:00   B       8200      A       4141      tcp    123  ***AP*SF  4        211
20040407.10:02:14.678442  0:00:00   B       8200      A       4145      tcp    123  ***AP*SF  4        211
20040407.10:02:44.937577  0:00:01   B       8200      A       4149      tcp    123  ***AP*SF  4        211
20040407.10:03:15.308206  0:00:00   B       8200      A       4153      tcp    123  ***AP*SF  4        211
20040407.10:04:30.976629  0:00:01   B       8200      A       4172      tcp    123  ***AP*SF  4        211
20040407.10:06:16.975204  0:00:01   B       8200      A       4180      tcp    123  ***AP*SF  4        211
20040407.10:06:32.105015  0:00:00   B       8200      A       4181      tcp    123  ***AP*SF  4        211
20040407.10:06:47.234837  0:00:00   B       8200      A       4182      tcp    123  ***AP*SF  4        211
20040407.10:07:02.367471  0:00:00   B       8200      A       4183      tcp    123  ***AP*SF  4        211
20040407.10:07:17.494574  0:00:00   B       8200      A       4184      tcp    123  ***AP*SF  4        211

Detecting Unusual Modes of Network Traffic Using Clustering
Clusters involving mysterious ping and SNMP traffic

Start Time                Duration  Src IP  Src Port  Dst IP  Dst Port  Proto  TTL  ICMP Type  ICMP Code  Packets  Bytes
20040407.10:01:00.181261  0:00:00   A       1176      B       161       udp    123  -          -          1        95
20040407.10:01:23.183183  0:00:00   A       -1        B       -1        icmp   123  8          0          1        84
20040407.10:02:54.182861  0:00:00   A       1514      B       161       udp    123  -          -          1        95
20040407.10:03:03.196850  0:00:00   A       -1        B       -1        icmp   123  8          0          1        84
20040407.10:04:45.179841  0:00:00   A       -1        B       -1        icmp   123  8          0          1        84
20040407.10:06:27.180037  0:00:00   A       -1        B       -1        icmp   123  8          0          1        84
20040407.10:09:48.420365  0:00:00   A       -1        B       -1        icmp   123  8          0          1        84
20040407.10:11:04.420353  0:00:00   A       3013      B       161       udp    123  -          -          1        95
20040407.10:11:30.420766  0:00:00   A       -1        B       -1        icmp   123  8          0          1        84
20040407.10:12:47.421054  0:00:00   A       3329      B       161       udp    123  -          -          1        95
20040407.10:13:12.423653  0:00:00   A       -1        B       -1        icmp   123  8          0          1        84
20040407.10:14:53.420635  0:00:00   A       -1        B       -1        icmp   123  8          0          1        84

Start Time                Duration  Src IP  Src Port  Dst IP  Dst Port  Proto  TTL  ICMP Type  ICMP Code  Packets  Bytes
20040407.10:01:00.181488  0:00:00   B       161       A       1176      udp    63   -          -          1        103
20040407.10:01:23.183291  0:00:00   B       -1        A       -1        icmp   254  0          0          1        84
20040407.10:01:55.180590  0:00:00   B       161       A       1326      udp    63   -          -          1        234
20040407.10:02:54.184537  0:00:00   B       161       A       1514      udp    63   -          -          1        134
20040407.10:03:03.196958  0:00:00   B       -1        A       -1        icmp   254  0          0          1        84
20040407.10:04:45.179965  0:00:00   B       -1        A       -1        icmp   254  0          0          1        84
20040407.10:05:09.180542  0:00:00   B       161       A       1927      udp    63   -          -          1        234
20040407.10:06:27.180159  0:00:00   B       -1        A       -1        icmp   254  0          0          1        84
20040407.10:09:48.420410  0:00:00   B       -1        A       -1        icmp   254  0          0          1        84
20040407.10:11:30.420773  0:00:00   B       -1        A       -1        icmp   254  0          0          1        84
20040407.10:13:12.423663  0:00:00   B       -1        A       -1        icmp   254  0          0          1        84
20040407.10:14:53.421019  0:00:00   B       -1        A       -1        icmp   254  0          0          1        84

Detecting Unusual Modes of Network Traffic Using Clustering
Clusters involving unusual repeated FTP sessions. Further investigation revealed a misconfigured Army computer that was trying to contact Microsoft.

Start Time                Duration  Src IP  Src Port  Dst IP  Dst Port  Proto  TTL  Flags     Packets  Bytes
20040407.10:10:57.097108  0:00:00   A       3004      B       21        tcp    123  ***AP*SF  7        318
20040407.10:11:27.113230  0:00:00   A       3007      B       21        tcp    123  ***AP*SF  7        318
20040407.10:11:37.111176  0:00:00   A       3008      B       21        tcp    123  ***AP*SF  7        318
20040407.10:11:57.118231  0:00:00   A       3011      B       21        tcp    123  ***AP*SF  7        318
20040407.10:12:17.125220  0:00:00   A       3013      B       21        tcp    123  ***AP*SF  7        318
20040407.10:12:37.132428  0:00:00   A       3015      B       21        tcp    123  ***AP*SF  7        318
20040407.10:13:17.146391  0:00:00   A       3020      B       21        tcp    123  ***AP*SF  7        318
20040407.10:13:37.153713  0:00:00   A       3022      B       21        tcp    123  ***AP*SF  7        318
20040407.10:14:47.178228  0:00:00   A       3031      B       21        tcp    123  ***AP*SF  7        318
20040407.10:15:47.199100  0:00:00   A       3040      B       21        tcp    123  ***AP*SF  7        318
20040407.10:16:07.206450  0:00:00   A       3042      B       21        tcp    123  ***AP*SF  7        318

Start Time                Duration  Src IP  Src Port  Dst IP  Dst Port  Proto  TTL  Flags     Packets  Bytes
20040407.10:00:06.627895  0:00:01   B       21        A       2924      tcp    123  ***AP*SF  7        449
20040407.10:00:16.633872  0:00:01   B       21        A       2925      tcp    123  ***AP*SF  7        449
20040407.10:00:36.638794  0:00:01   B       21        A       2927      tcp    123  ***AP*SF  7        449
20040407.10:01:16.652664  0:00:01   B       21        A       2932      tcp    123  ***AP*SF  7        449
20040407.10:01:26.659694  0:00:01   B       21        A       2933      tcp    123  ***AP*SF  7        449
20040407.10:01:56.666816  0:00:01   B       21        A       2937      tcp    123  ***AP*SF  7        449
20040407.10:02:06.670680  0:00:01   B       21        A       2938      tcp    123  ***AP*SF  7        449
20040407.10:02:56.687932  0:00:01   B       21        A       2944      tcp    123  ***AP*SF  7        449
20040407.10:03:26.698413  0:00:01   B       21        A       2947      tcp    123  ***AP*SF  7        449
20040407.10:04:06.712495  0:00:01   B       21        A       2952      tcp    123  ***AP*SF  7        449
20040407.10:05:06.733731  0:00:01   B       21        A       2961      tcp    123  ***AP*SF  7        449
20040407.10:06:16.758442  0:00:01   B       21        A       2969      tcp    123  ***AP*SF  7        449

MINDS: Critical to Complete Functionality
[Figure: Detection layers (Header Analysis, Packet-Based Signature Detection, Session-Based Signature Detection, and Behavior Analysis (MINDS)) mapped against threat types (Simple Scans, Scans with Target Responses, Scans with Automatic Virus Attacks, Viruses and Worms, Compromises, New and Variant Attacks, Anomaly Detection and New Attacks)]
The Army Research Laboratory (ARL), supported by the AHPCRC and the MINDS initiative, successfully monitors and analyzes network data to protect ARL and its Army and DoD customer infospace.

Current MINDS Research and Development Work
• Correlation of suspicious events across network sites
  – Helps detect sophisticated attacks not identifiable by single-site analyses
  – Scalable anomaly detection
  – Distributed correlation algorithms
  – Grids & middleware
• Analysis of long-term data (months/years)
  – Uncover suspicious stealth activities (e.g. insiders leaking/modifying information)
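The scan-detection heuristics listed under Scan Detection can be combined in a minimal Python sketch. The heuristics are: consider only flows with few packets, consider only a single monitored subnet, and treat touches to rarely used IP/port combinations as suspicious. The flow tuple layout and all thresholds are hypothetical. Scoring a source by the number of rarely used IP/port pairs it touches is also an assumption; the slides do not give the MINDS scoring function.

```python
import ipaddress
from collections import Counter, defaultdict

def scan_candidates(flows, subnet, max_packets=3, rare_usage=2, min_touches=5):
    """Flag source IPs that touch many rarely used IP/port pairs inside `subnet`.

    flows: iterable of (src_ip, dst_ip, dst_port, packets) tuples.
    A "touch" is a flow of at most `max_packets` packets to an in-subnet IP/port
    pair that appears in at most `rare_usage` flows overall (hypothetical thresholds).
    Returns {src_ip: number of distinct rare IP/port pairs touched}.
    """
    net = ipaddress.ip_network(subnet)
    flows = [f for f in flows if ipaddress.ip_address(f[1]) in net]   # monitored subnet only
    usage = Counter((dst, port) for _, dst, port, _ in flows)         # usage count per IP/port
    touches = defaultdict(set)
    for src, dst, port, packets in flows:
        if packets <= max_packets and usage[(dst, port)] <= rare_usage:
            touches[src].add((dst, port))
    return {src: len(pairs) for src, pairs in touches.items() if len(pairs) >= min_touches}
```

Because a scanner mostly probes machines where the service is not running, the pairs it touches keep very low usage counts. Popular services accumulate high counts and drop out of consideration.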
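The two-step procedure under Detecting Suspicious Ports for Possible Worm Activity can be sketched in Python. Step one finds in-network destinations with a high connection failure rate on a port. Step two keeps the ports that have many such destinations. The input format and thresholds are hypothetical. Scan flows are assumed to have been removed beforehand, since the slide specifies non-scan connections.

```python
from collections import defaultdict

def suspicious_ports(conns, min_conns=5, failure_rate=0.8, min_dests=3):
    """conns: iterable of (dst_ip, dst_port, failed) for inbound, non-scan connections.

    Returns {port: number of in-network destinations with a high failure rate on it}.
    """
    stats = defaultdict(lambda: [0, 0])              # (dst_ip, port) -> [total, failed]
    for dst, port, failed in conns:
        stats[(dst, port)][0] += 1
        stats[(dst, port)][1] += int(failed)
    # Step 1: destinations with a high connection failure rate on a specific port.
    bad = [(dst, port) for (dst, port), (total, fails) in stats.items()
           if total >= min_conns and fails / total >= failure_rate]
    # Step 2: ports on which there are many such destinations.
    per_port = defaultdict(int)
    for _, port in bad:
        per_port[port] += 1
    return {port: n for port, n in per_port.items() if n >= min_dests}
```

Ports returned here would be the targets for the more detailed collection the slide describes, such as packet content analysis and signature generation.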
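The cost figures on the Clustering slide (O(N^2) work, an N x K neighbor matrix) come from the neighbor-finding and similarity steps of SNN clustering. A small Python sketch of those steps follows. It uses brute-force Euclidean k-nearest-neighbor search and the Jarvis-Patrick convention of scoring only pairs that appear in each other's neighbor lists. Both choices are assumptions, not necessarily what MINDS uses. The final step of forming clusters from the SNN graph is omitted.

```python
import math

def knn_lists(points, k):
    """Brute-force k nearest neighbors: O(N^2) distance computations, N x k result."""
    n = len(points)
    table = []
    for i in range(n):
        others = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        table.append([j for _, j in others[:k]])
    return table

def snn_similarity(table):
    """SNN similarity = number of shared neighbors, computed only for pairs (i, j)
    that appear in each other's k-nearest-neighbor lists."""
    neighbor_sets = [set(row) for row in table]
    sim = {}
    for i, row in enumerate(table):
        for j in row:
            if i < j and i in neighbor_sets[j]:
                sim[(i, j)] = len(neighbor_sets[i] & neighbor_sets[j])
    return sim
```

Only the N x k neighbor table has to be kept after the distance pass, which matches the slide's memory profile. The distance pass is where the O(N^2) cost and the per-PE memory go.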