Slide 1: Crafting Geolocational Deviance as an Indicator of Aggression
Andrew Hunt, Bechtel CIRT
08 October 2014

Slide 2: Beautiful Data Delivery
[diagram: forwarders and syslog receivers feeding indexers]

Slide 3: Old Query Architecture
[diagram: search heads querying indexers distributed across Oregon, England, Chicago, Virginia, Singapore, Australia, the Middle East, and Chile]

Slide 4: New Query Architecture
[diagram: search heads querying consolidated indexers in Oregon, Chicago, and Virginia]

Slide 5: Data Distribution
[diagram only]

Slide 6: The Problem
• Dude has a grudge
• Attacks senior personnel of target corporations and their families
• Seeks notoriety through embarrassment
• Potential loss of sensitive data
  – Blackmail
  – Loss of business
  – In general… bad

Slide 7: A Potential Solution
• Track high-value persons (HVPs) likely to be targeted
• Watch when their account access locations vary from their normal places
  – May indicate an external party trying to access their account
  – Will definitely show when HVPs are moving
  – Gives us a heads-up to watch when they might be targeted

Slide 8: Employ a Mathematical Model
• Access attempts produce traceable artifacts
  – Latitude and longitude of the accessing IP address
  – Time associated with the connection
• Gaussian curves can calculate variance of values in one dimension (one variable)
  – Covariant theorems can be used to apply Gaussian mathematics to multiple dimensions
• Model a user's normal (aka "average") access behavior. This becomes their "center" over the time horizon queried
• Look for deviances from that normal behavior. This could be the remote attacker attempting access

Slide 9: Assumptions
• Attacker will not tunnel to an exit point at the specific location of the user (within a city for US)
• Attacker will not become the majority player of the targeted user's account
  – Could throw the center of location, but would still likely alert
  – Not if the user doesn't access their account that day
• One day is large enough to generate a nexus of location
• Efficacy of this solution is not validated. It's just a cool idea that might work. We'll see
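To make slide 8 concrete, here is one way to write the model in symbols (my notation, not the deck's): per user, treat each day's access coordinates as samples, estimate a center and a spread, and call anything beyond one deviation from center "deviant". For a single dimension x (latitude or longitude):

\mu = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}, \qquad
\text{deviant} \iff |x - \mu| > \sigma

The deck applies this test independently to latitude and longitude, then refines it with occurrence weighting in the Final Draft query later on.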
Slide 10: Geolocating Source IPs

earliest=-30d index=bro_http ("public_domain1" OR "public_domain2" OR "public_domain.co.uk") (hv_user1 OR hv_user2 OR hv_user3 OR hv_user4)
| fields src_ip, _time
| geoip src_ip
| rex field=_raw "(?mi)(username|user|login)=(?<uname>.{0,35}?)&"
| table *

Slide 11: Break it Down

earliest=-30d index=bro_http ("public_domain1" OR "public_domain2" OR "public_domain.co.uk") (hv_user1 OR hv_user2 OR hv_user3 OR hv_user4)

• Search Bro HTTP logs
• Limit to a 30-day time horizon
• First parenthetical set: filter to just Bechtel login domains
• Second parenthetical set: filter for the high-value persons

Slide 12: Break it Down (continued)

| fields src_ip, _time
| geoip src_ip
| rex field=_raw "(?mi)(username|user|login)=(?<uname>.{0,35}?)&"
| table *

• Increase performance by limiting the number of extracted fields with 'fields'
• Order is important to reduce drag on the system. Perform the 'geoip' lookup only after filtering the result set to a minimum number of events, just before you must have something that geoip provides
• Extract the username with 'rex'
• Present the remaining fields as a table so the Google Map visualizer will render it correctly
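As a hypothetical illustration of the rex step (the URL shape is invented; only the regex comes from the deck), a raw event carrying a login parameter would extract like this:

_raw:  POST /login ... username=hv_user1&password=...   (hypothetical)
uname: hv_user1

The lazy quantifier .{0,35}? makes the capture stop at the first '&' after the parameter name.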
Slide 13: Visualize Centers of Activity
[screenshot: Google Map visualization of geolocated access points]

Slide 14: Visualize Centers of Activity
[same map, with callout: "That's not one of our sites"]

Slide 15: Count Locations to Find Outliers
[screenshot: location counts]

Slide 16: Count Locations to Find Outliers
[screenshot: location counts, outlier highlighted]
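A minimal sketch of the counting behind slides 15-16 (my own reconstruction; it reuses only commands and field names that appear elsewhere in the deck):

earliest=-30d index=bro_http ("public_domain1" OR "public_domain2") (hv_user1 OR hv_user2)
| rex field=_raw "(?mi)(username|user|login)=(?<uname>.{0,35}?)&"
| geoip src_ip
| stats count by uname, src_ip_city, src_ip_country_code
| sort uname, -count

Rare cities at the bottom of each user's list are the outlier candidates the following slides formalize.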
Slide 17: Statistics Apply to Geolocation, Too
• We can count by location
• Can also calculate standard deviation
• Can calculate a threshold
• Which means… WE CAN CALCULATE WHAT DOESN'T FIT

Slide 18: Standard Deviation Applies
[chart: Open/High/Low/Close series, values 0-70, dates 1/5/02-1/9/02]

Slide 19: Standard Deviation Applies
[same chart, callout on an in-range point: "That's OK"]

Slide 20: Standard Deviation Applies
[same chart, callout on an out-of-range point: "THAT'S WEIRD!"]

Slide 21: Meshing Matrices
• A multi-dimensional Gauss curve can be computed to compare latitude and longitude
• It's hard to visualize in two dimensions …but I'll try…

Slide 22: Gauss in Multiple Dimensions
[radar chart: users merideth, bob, judy, mary, juan; series lat, long, threshold lat, threshold long; scale 50 to -150]

Slide 23: Gauss in Multiple Dimensions
[same radar chart]
Slide 24: Gauss in Multiple Dimensions
[same radar chart, callout: "BLUE is the latitude value"]

Slide 25: Gauss in Multiple Dimensions
[same radar chart, callout: "Where that blue area exceeds green is where latitude is DEVIANT from the norm"]

Slide 26: Gauss in Multiple Dimensions
[same radar chart]

Slide 27: Gauss in Multiple Dimensions
[same radar chart]

Slide 28: Gauss in Multiple Dimensions
[same radar chart, callout: "The area where the series exceeds purple is DEVIANT from normal longitude"]

Slide 29: Gauss in Multiple Dimensions
[same radar chart]
It's hard for humans to see, but is no problem for a statistics engine
Slide 30: Calculate Outliers With Stats
• Calculate the average
• Calculate the standard deviation
• Create a calculated threshold = avg + stdev
• Do this for both latitude and longitude, by user
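As a first cut, slide 30 can be done with Splunk's built-in aggregates (a sketch only; the Final Draft below replaces these built-ins with manual weighted sums, since the stock average function does not support the occurrence weighting the deck wants):

... | geoip src_ip
| eventstats avg(src_ip_latitude) as lat_avg, stdev(src_ip_latitude) as lat_stdev, avg(src_ip_longitude) as lon_avg, stdev(src_ip_longitude) as lon_stdev by uname
| where src_ip_latitude > lat_avg + lat_stdev OR src_ip_latitude < lat_avg - lat_stdev OR src_ip_longitude > lon_avg + lon_stdev OR src_ip_longitude < lon_avg - lon_stdev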
Slide 31: But Wait, There's More
• Distribute the Gaussian curves to allow for per-location, per-user deviance
  – Allows weighting of the averages and deviations
  – Allows 'centers of gravity' where users spend most of their time to have greater weight
  – Fixes some issues with aggregate deviation not being enough to alert on even a significant area shift
• Add a filter after aggregation to only show when the number of locations returned for a user is > 1
  – Removes stationary results from display

Slide 32: Final Draft

index=bro_http ("public_domain1" OR "public_domain2" OR "public_domain.co.uk") (hv_user1 OR hv_user2 OR hv_user3 OR hv_user4)
| fields src_ip
| bin span=1d _time
| rex field=_raw "(?mi)(username|user|login)=(?<uname>.{0,35}?)(@|&)"
| eventstats count as multi_ip by _time, uname, src_ip
| where multi_ip > 1
| stats count as count by _time, uname, src_ip
| geoip src_ip
| fillnull value="-"
| where src_ip_region_name != "-"
| eventstats sum(count) as count_per_user by _time, uname
| eval src_ip_lat_w_avg = src_ip_latitude * (count/count_per_user)
| eval src_ip_lon_w_avg = src_ip_longitude * (count/count_per_user)
| eval src_ip_lat_w_stdev = sqrt((src_ip_latitude * src_ip_latitude)/count_per_user) * (count/count_per_user)
| eval src_ip_lon_w_stdev = sqrt((src_ip_longitude * src_ip_longitude)/count_per_user) * (count/count_per_user)
| eventstats sum(src_ip_lat_w_avg) as src_ip_lat_avg2, sum(src_ip_lon_w_avg) as src_ip_lon_avg2, sum(src_ip_lat_w_stdev) as src_ip_lat_stdev2, sum(src_ip_lon_w_stdev) as src_ip_lon_stdev2 by _time, uname
| eval src_ip_lat_threshold_low2 = src_ip_lat_avg2 - src_ip_lat_stdev2
| eval src_ip_lat_threshold_high2 = src_ip_lat_avg2 + src_ip_lat_stdev2
| eval src_ip_lon_threshold_low2 = src_ip_lon_avg2 - src_ip_lon_stdev2
| eval src_ip_lon_threshold_high2 = src_ip_lon_avg2 + src_ip_lon_stdev2
| where src_ip_latitude > src_ip_lat_threshold_high2 OR src_ip_latitude < src_ip_lat_threshold_low2 OR src_ip_longitude > src_ip_lon_threshold_high2 OR src_ip_longitude < src_ip_lon_threshold_low2
| stats count as geo_count by _time, uname, src_ip_city, src_ip_region_name, src_ip_country_code

Slide 33: Wha?? Let's Bust That Down a Bit

index=bro_http ("public_domain1" OR "public_domain2" OR "public_domain.co.uk") (hv_user1 OR hv_user2 OR hv_user3 OR hv_user4)
| fields src_ip
| bin span=1d _time

• Search the Bro HTTP log
• No time horizon is specified, so obey the time widget
• 1st parens: filter for only Bechtel domains
• 2nd parens: filter for only high-value persons
• Reduce load by selecting only the field used for extraction, src_ip
• Aggregate the returned events into one-day buckets. This method is more efficient than aggregating by the date_day and date_month tags
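To make the bucketing concrete (timestamps are hypothetical): 'bin span=1d _time' floors each event's _time to its day boundary, so grouping by _time afterwards means grouping by day:

_time 2014-10-08 09:14:22  ->  2014-10-08 00:00:00
_time 2014-10-08 17:40:05  ->  2014-10-08 00:00:00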
Slide 34: Bust-Down (continued)

| rex field=_raw "(?mi)(username|user|login)=(?<uname>.{0,35}?)(@|&)"
| eventstats count as multi_ip by _time, uname, src_ip
| where multi_ip > 1
| stats count as count by _time, uname, src_ip

• Locate and extract the user name from the returned records. This is needed for the next step
• Use the 'eventstats' function to perform statistics that get added as a column to all records. Using _time as one of the grouping aggregators limits the scope to the 1-day buckets defined earlier. This provides a per-day count of events from each IP address the user used
• Use 'where' to filter out singular addresses. This reduces superfluous returns that do not present multiple locations that would generate a difference in later steps
• Perform the count again, only with the 'stats' command. The difference is that 'stats' does full aggregation, returning only the fields defined in the grouping statement; other data is removed. It is also map-reduced to the indexers, making it an efficient way to consolidate data
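A hypothetical three-event day for one user shows the difference between the two commands:

bob 1.2.3.4  ->  eventstats: multi_ip=2 (row kept, column appended)
bob 1.2.3.4  ->  eventstats: multi_ip=2 (row kept, column appended)
bob 5.6.7.8  ->  eventstats: multi_ip=1 (row dropped by the where)
                 stats then collapses the survivors: bob 1.2.3.4 count=2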
Slide 35: Bust-Down (continued)

| geoip src_ip
| fillnull value="-"
| where src_ip_region_name != "-"
| eventstats sum(count) as count_per_user by _time, uname

• Look up the geolocation information for the source IP address. This provides regional as well as specific latitude and longitude data for each IP address in the event
• Use 'fillnull' to provide a default value to ensure all fields are represented in the result set. This sets up the dataset for complementary set filtering
• Use 'where' to apply the complementary set. In this case, filter out unmapped IP addresses – those that did not get at least regional granularity
• Count the hits per user and append the total as a new column. This provides the total we will use to manually calculate the averages later, because the provided average function is insufficient for multi-dimensional (covariant) averaging

Slide 36: Bust-Down (continued)

| eval src_ip_lat_w_avg = src_ip_latitude * (count/count_per_user)
| eval src_ip_lon_w_avg = src_ip_longitude * (count/count_per_user)
| eval src_ip_lat_w_stdev = sqrt((src_ip_latitude * src_ip_latitude)/count_per_user) * (count/count_per_user)
| eval src_ip_lon_w_stdev = sqrt((src_ip_longitude * src_ip_longitude)/count_per_user) * (count/count_per_user)

• Because we are using a covariant calculation for the average and standard deviation values, they must have their own 'eval' statements
  – First multiply each value by its ratio of occurrence to weight it
  – Then sum all of the weighted values to provide the average
• Weighted Average = sum( value * (relative occurrence ratio) )
• Weighted Standard Deviation = sum( sqrt( (value * value) / total count per user ) * (relative occurrence ratio) )
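In symbols (my rendering of the slide's formulas; x_i is the latitude or longitude of location i, c_i its hit count, and N the user's total hits for the day, so c_i/N is the relative occurrence ratio):

\bar{x} = \sum_i x_i \,\frac{c_i}{N}
\qquad\qquad
\tilde{\sigma}_x = \sum_i \sqrt{\frac{x_i^2}{N}} \,\frac{c_i}{N}

The per-location terms are what these evals compute; the sums are taken by the eventstats on the next slide. Note this is the deck's occurrence-weighted estimator, not the textbook standard deviation.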
Slide 37: Bust-Down (continued)

| eventstats sum(src_ip_lat_w_avg) as src_ip_lat_avg2, sum(src_ip_lon_w_avg) as src_ip_lon_avg2, sum(src_ip_lat_w_stdev) as src_ip_lat_stdev2, sum(src_ip_lon_w_stdev) as src_ip_lon_stdev2 by _time, uname

• This finishes off the summation of the four needed values
  – Weighted average for latitude
  – Weighted average for longitude
  – Weighted standard deviation for latitude
  – Weighted standard deviation for longitude
• Now we have usable values for the body of data for each user, weighted by occurrence
Slide 38: Bust-Down (continued)

| eval src_ip_lat_threshold_low2 = src_ip_lat_avg2 - src_ip_lat_stdev2
| eval src_ip_lat_threshold_high2 = src_ip_lat_avg2 + src_ip_lat_stdev2
| eval src_ip_lon_threshold_low2 = src_ip_lon_avg2 - src_ip_lon_stdev2
| eval src_ip_lon_threshold_high2 = src_ip_lon_avg2 + src_ip_lon_stdev2

• The per-user model is taking shape
• The 'eval' statements calculate the thresholds per user as a new column (field)
• Calculates high and low thresholds for each user at a distance of one standard deviation from their average
  – Because we are using lat/long values, this average can be considered the user's "center of presence"
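A hypothetical user whose weighted center works out to (38.9, -77.0) with weighted deviations of (2.1, 3.4) would get these bands:

lat:  36.8 .. 41.0    (38.9 ± 2.1)
lon:  -80.4 .. -73.6  (-77.0 ± 3.4)

Any login geolocating outside either band that day survives the final 'where' on the next slide and shows up in the report.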
Slide 39: Bust-Down (continued)

| where src_ip_latitude > src_ip_lat_threshold_high2 OR src_ip_latitude < src_ip_lat_threshold_low2 OR src_ip_longitude > src_ip_lon_threshold_high2 OR src_ip_longitude < src_ip_lon_threshold_low2
| stats count as geo_count by _time, uname, src_ip_city, src_ip_region_name, src_ip_country_code

• Use 'where' to isolate the remaining events where the geolocated lat/long exceeds variance from the expected "center". Find values that are greater than the high, or less than the low, thresholds
• Finally, use 'stats' to count the number of times the geolocational variance was exceeded. Using stats with the 'by' clause to include the desired field names is an efficient way to de-duplicate the records while creating some useful analytical data (the count)

Slide 40: Reduced Dataset Only Shows Outliers
[screenshot: final report of outlier logins]

Slide 41: Productize Flexibility
• This search might be useful in other instances
• Convert the search into a parameterized macro
  – Which monitored players can be a parameter
  – The range of aggregation (binning) may also be usefully adjustable as a parameter
  – Adjust the search to obey the time selector widget
• Code, test, distribute to Splunk search heads

Slide 42: Setup Alerting With Sensible Report
• On the saved-searches head, invoke the macro with the HVP string 'person1 OR person2 OR …' and the desired aggregation time frame (1 day)
• Set the saved search for the prior day (-1d@d)
• Execute daily, sometime in the morning
• Email any result set (>0) to the SOC for follow-up

Slide 43: Add Process to SOC
• Upon alert, verify with Physical Security
  – Did the high-value user in question go to the location on the alerted date?
• If legit, close immediately
• If not legit, investigate further
  – Tunnels?
  – Routing via multiple Internet exit points?
  – Satellite ground stations?
• Is there any indication of malice? → Correlation

Slide 44: Learn, share and hack
Security office hours: 11:00 AM – 2:00 PM @ Room 103, every day. Geek out, share ideas with Enterprise Security developers
Red Team / Blue Team: challenge your skills and learn new tricks. Mon–Wed 3:00 PM – 6:00 PM @ Splunk Community Lounge; Thurs 11:00 AM – 2:00 PM
Birds of a feather: collaborate and brainstorm with security ninjas. Thurs 12:00 PM – 1:00 PM @ Meal Room

Slide 45: Questions?
Andrew Hunt
[email protected]