Semantics for Big Data (,) Security and Privacy
Tim Finin and Anupam Joshi
University of Maryland, Baltimore County, Baltimore MD
NSF Workshop on Big Data Security and Privacy
2014-09-16, University of Texas at Dallas
http://ebiq.org/r/363

The plot outline
• Big data → Variety → Need for integration & fusion → Must understand data semantics → Use semantic languages & tools (reasoners, ML) → Have shared ontologies & background knowledge
• Relevance to security and privacy
  – Protect personal information, especially in mobile/IoT scenarios
  – Better intrusion detection systems

Use Case Examples
We’ve used semantic technologies in support of assured information tasks, including
  – Representing & enforcing information sharing policies
  – Negotiating for cloud services respecting organizational constraints (e.g., data privacy, location, …)
  – Modeling context for mobile users and using this to manage information sharing
  – Acquiring, using and sharing knowledge for situationally-aware intrusion detection systems
Key technologies include Semantic Web languages (OWL, RDF) and tools, and information extraction from text

Context-Aware Privacy and Security
• Smart mobile devices know a great deal about their users, including their current context
• Acquiring and using this knowledge helps them provide better services
• Sharing the information with other users, organizations and service providers can also be beneficial (Mobile Ad-Hoc Knowledge Networks)
• Context-aware policies can be used to limit information sharing as well as to control the actions and information access of mobile apps
[Figure callouts: “We’re in a two-hour budget meeting at X with A, B and C”; “We’re in an important meeting”; “We’re busy”]
http://ebiq.org/p/589

Context-aware power management
• Maintaining the context model uses power
• We empirically determine power usage for a phone’s sensors and use this for optimization
• We developed accurate power models for a phone’s sensors and use these for optimization
When updating the context model:
1. Only enable sensors required by policy; reuse recent sensor readings whenever appropriate
   e.g., disable the GPS sensor when at home in the evening
2. Prefer sensors with a lower energy footprint, or ones already in use, when several are available
   e.g., choose Wi-Fi over GPS for location at the office during the day
3. Reorder rule conditions to reduce energy use
   e.g., check conditions requiring no sensor access first
http://ebiq.org/p/632

Intrusion Detection Systems
• Current intrusion detection systems are poor for zero-day and “low and slow” attacks, and APTs
• Sharing information from heterogeneous data sources can provide useful information even when an attack signature is unavailable
• Implemented prototypes that integrate and reason over data from IDSs, host and network scanners, and text at the knowledge level
• We’ve established the feasibility of the approach in simple evaluation experiments

From dashboards & watchstanding … to situational awareness
[Figure labels: (Simple) Analysis; Context/Situation; Facts / Information; Policies; Rules; Analytics; Alerts; Traditional Sensors]

Assertions shown in the figure:
  [ a IDPS:text_entity;
    IDPS:has_vulnerability_term "true";
    IDPS:has_security_exploit "true";
    IDPS:has_text "Internet Explorer";
    IDPS:has_text "arbitrary code";
    IDPS:has_text "remote attackers". ]
  [ a IDPS:system; IDPS:host_IP "130.85.93.105". ]
  [ a IDPS:scannerLog; IDPS:scannerLogIP "130.85.93.105"; … ]
  [ a IDPS:gatewayLog; IDPS:gatewayLogIP "130.85.93.105"; … ]

Rule shown in the figure:
  [ IDPS:scannerLog IDPS:hasBrowser ?Browser
    IDPS:gatewayLog IDPS:hasURL ?URL
    ?URL IDPS:hasSymantecRating "unsafe"
    IDPS:scannerLog IDPS:hasOutboundConnection "true"
    IDPS:WiresharkLog IDPS:isConnectedTo ?IPAddress
    ?IPAddress IDPS:isZombieAddress "true" ]
  =>
  [ IDPS:system IDPS:isUnderAttack "use-after-free vulnerability"
    IDPS:attack IDPS:hasMeans "Backdoor"
    IDPS:attack IDPS:hasConsequence "UnauthorizedRemoteAccess" ]

Text shown in the figure: Use-after-free vulnerability in Microsoft Internet Explorer 6 through 8 ….
[Figure label: Non-Traditional “Sensors”]
http://ebiq.org/p/604

Maintaining the vulnerability KB
• Our approach requires us to keep the KB of software products and known or suspected vulnerabilities and attacks up to date
• Resources like NVD are great, but tapping into text can enrich their information and give earlier warnings of problems
[Figure: vulnerability timeline. Labels as extracted: Attacker finds vuln. & exploits it (01/10/13); Vuln. analyzed & included in NVD feed; CVE disclosed (01/14/13); (02/16/2013); Analysis; Vendor deploys software; System update (03/04/2013); Vendor; Analysis; Patch development; Resolution; Patch released (Critical Patch Update); Exploit reported in mailing list (01/10/13); Threat disclosed in vendor bulletin (06/18/2013); Vuln. reported in NVD RSS feed]

Information extraction from text
[Figure: annotated vulnerability description]
  CVE-2012-0150: Buffer overflow in msvcrt.dll in Microsoft Windows Vista SP2, Windows Server 2008 SP2, R2, and R2 SP1, and Windows 7 Gold and SP1 allows remote attackers to execute arbitrary code via a crafted media file, aka “Msvcrt.dll Buffer Overflow Vulnerability.”
  Identify relationships: ebqids:hasMeans, ebqids:affectsProduct
  Link concepts to entities: http://dbpedia.org/resource/Buffer_overflow, http://dbpedia.org/resource/Arbitrary_code_execution, http://dbpedia.org/resource/Windows_7
• We use information extraction techniques to identify entities, relations and concepts in security-related text
• These are mapped to terms in our ontology and the DBpedia LOD KB (based on Wikipedia)
• Google’s slogan: “Things, not strings”

Maintaining the vulnerability KB
[Figure: processing pipeline. Labels as extracted: NVD dataset; Structured Data (XML); Unstructured Data (Vuln. Summaries); Security Bulletins; Blogs; Web Text; Entity & Concept Spotter; Extracted Concepts <Concept, Class>; RDF Generation; Linking & Mapping Entities; Triple Store; IDS Ontology; Consumers; Linked Cybersecurity Data]
http://ebiq.org/p/629

Faceblock
[Video: 80-second demo, also available on YouTube]
http://ebiq.org/p/666

Faceblock Ontology
Faceblock’s (OWL) ontology lets one write context policy rules using predefined activity and place types

Faceblock Protocols
User device maintains context, reasons with policy rules and informs glass devices of the Faceblock property: True or False

Taming Wild Big Data
• WBD is structured or semi-structured data for which we lack schema-level understanding
  – e.g., raw tables, graphs, XML, logs
• Developed tools to generate semantic data from background ontologies & KBs, e.g., for clinical trial tables
• It’s harder when the domain is not even known. We’re developing systems that use large background KBs (e.g., Google’s Freebase) to predict types/subtypes of data instances
http://ebiq.org/p/661
http://ebiq.org/p/672

Conclusion
• Google’s new slogan: things, not strings
• We also need: measurements, not numbers
• Common ontologies in semantic representations enable big data integration at a “knowledge level”
  – data, meta-data, provenance, certainty, rules
• Many advantages:
  – Enhancing discovery, integration and interoperability
  – Enabling inference and knowledge-level analytics
  – Expressing policy constraints in common semantic terms
http://ebiq.org/r/363
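
A minimal sketch of what integration at the “knowledge level” might look like, loosely modeled on the intrusion detection rule shown earlier. Facts from a scanner log, a gateway log and extracted text are expressed as triples in one shared vocabulary, and a single rule joins them. The predicate names, the example URL and the helper functions (unify, query, infer_alerts) are hypothetical simplifications for illustration; they are not the IDPS ontology or the authors’ reasoner.

```python
# Hypothetical sketch: a tiny conjunctive rule matcher over (subject, predicate, object) triples.
# All names and facts below are illustrative assumptions, not the authors' system.

# Facts from three heterogeneous sources, stated in one shared vocabulary.
FACTS = {
    ("scannerLog1", "hasIP", "130.85.93.105"),            # host scanner
    ("scannerLog1", "hasBrowser", "Internet Explorer"),
    ("scannerLog1", "hasOutboundConnection", "true"),
    ("gatewayLog1", "hasIP", "130.85.93.105"),            # network gateway
    ("gatewayLog1", "hasURL", "http://example.org/page"),
    ("http://example.org/page", "hasRating", "unsafe"),
    ("textEntity1", "hasText", "Internet Explorer"),      # extracted from text
    ("textEntity1", "hasVulnerabilityTerm", "true"),
}

# Rule body: terms starting with "?" are variables shared across patterns.
RULE_BODY = [
    ("?scan", "hasIP", "?ip"),
    ("?scan", "hasBrowser", "?browser"),
    ("?scan", "hasOutboundConnection", "true"),
    ("?gw", "hasIP", "?ip"),
    ("?gw", "hasURL", "?url"),
    ("?url", "hasRating", "unsafe"),
    ("?text", "hasText", "?browser"),
    ("?text", "hasVulnerabilityTerm", "true"),
]


def unify(pattern, fact, bindings):
    """Return bindings extended so that pattern matches fact, or None on mismatch."""
    new = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            # Bind the variable, or check it against an earlier binding.
            if new.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return new


def query(patterns, facts, bindings=None):
    """Yield every variable binding that satisfies all patterns (naive backtracking)."""
    bindings = bindings or {}
    if not patterns:
        yield bindings
        return
    first, rest = patterns[0], patterns[1:]
    for fact in facts:
        extended = unify(first, fact, bindings)
        if extended is not None:
            yield from query(rest, facts, extended)


def infer_alerts(facts):
    """Apply the rule: each satisfying binding produces an alert triple for the host IP."""
    return {(b["?ip"], "isUnderAttack", "true") for b in query(RULE_BODY, facts)}


if __name__ == "__main__":
    print(infer_alerts(FACTS))
```

No single source triggers the alert here; it only fires once the scanner, gateway and text facts are joined on the shared IP address and browser name. A production system would more likely use an RDF store and an OWL or rule reasoner than a hand-written matcher like this one.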