Semantics for Big Data (,) Security and Privacy

Tim Finin and Anupam Joshi
University of Maryland, Baltimore County
Baltimore MD
NSF Workshop on Big Data Security and Privacy
2014-09-16, University of Texas at Dallas
http://ebiq.org/r/363
The plot outline
• Big data
→ Variety
→ Need for integration & fusion
→ Must understand data semantics
→ Use semantic languages & tools (reasoners, ML)
→ Have shared ontologies & background knowledge
• Relevance to security and privacy
–Protect personal information, especially in
mobile/IoT scenarios
–Better intrusion detection systems
Use Case Examples
We’ve used semantic technologies in support of
assured information tasks including
– Representing & enforcing information sharing policies
– Negotiating for cloud services respecting organizational
constraints (e.g., data privacy, location, …)
– Modeling context for mobile users and using this to
manage information sharing
– Acquiring, using and sharing knowledge for
situationally-aware intrusion detection systems
Key technologies include Semantic Web languages (OWL,
RDF) and tools, and information extraction from text
Context-Aware Privacy and Security
• Smart mobile devices know a great deal about
their users, including their current context
• Acquiring and using this knowledge
helps them provide better services
• Sharing the information with other users,
organizations and service providers can also be
beneficial (Mobile Ad-Hoc Knowledge Networks)
–e.g., “We’re in a two-hour budget meeting at X with A, B and C”
–or “We’re in an important meeting”
–or “We’re busy”
• Context-aware policies can be used to limit
information sharing as well as to control the
actions and information access of mobile apps
http://ebiq.org/p/589
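One way to picture a context-aware sharing policy is as a mapping from the requester to the level of detail released. The sketch below is illustrative only: the `Context` fields, the requester groups in `POLICY`, and the three detail levels are assumptions chosen to mirror the three example messages on this slide. It is not the OWL-based policy language the authors use.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Context:
    activity: str       # e.g. "budget meeting"
    duration: str       # e.g. "two-hour"
    place: str          # e.g. "X"
    companions: tuple   # e.g. ("A", "B", "C")
    important: bool

# Hypothetical policy: requester group -> level of detail to release.
POLICY = {"team": "full", "colleague": "summary"}

def join_names(names):
    """Render ("A", "B", "C") as "A, B and C"."""
    names = list(names)
    if len(names) <= 1:
        return "".join(names)
    return ", ".join(names[:-1]) + " and " + names[-1]

def describe(ctx, requester_group, policy=POLICY):
    """Return the context description this requester is allowed to see."""
    level = policy.get(requester_group, "minimal")  # unknown groups get the least detail
    if level == "full":
        return (f"We're in a {ctx.duration} {ctx.activity} at {ctx.place} "
                f"with {join_names(ctx.companions)}")
    if level == "summary":
        return "We're in an important meeting" if ctx.important else "We're in a meeting"
    return "We're busy"

meeting = Context("budget meeting", "two-hour", "X", ("A", "B", "C"), True)
for group in ("team", "colleague", "stranger"):
    print(group, "->", describe(meeting, group))
```

Defaulting unknown requesters to the least detailed message keeps the sketch conservative: detail is released only when the policy explicitly grants it.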
Context-aware power management
• Maintaining the context model uses power
• We empirically determine power usage for a
phone’s sensors and use this for optimization
When updating the context model:
1. Only enable sensors required by policy; reuse
recent sensor readings whenever appropriate
e.g., disable the GPS sensor when at home in the evening
2. Prefer sensors with a lower energy footprint or
already in use when several are available
e.g., choose Wi-Fi over GPS for location at the office during the day
3. Reorder rule conditions to reduce energy use
e.g., check conditions requiring no sensor access first
http://ebiq.org/p/632
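Strategies 2 and 3 can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the energy costs in `SENSOR_COST` are made-up placeholder values (real ones would be measured per device), and the representation of a rule as a list of `(sensor, predicate)` pairs is an assumption. Strategy 1 (reusing recent readings) is not covered here.

```python
# Hypothetical per-reading energy costs in arbitrary units.
SENSOR_COST = {"gps": 10.0, "wifi": 3.0, "calendar": 0.1}

def pick_sensor(candidates, active, cost=SENSOR_COST):
    """Strategy 2: prefer a sensor already in use, otherwise the cheapest one."""
    in_use = [s for s in candidates if s in active]
    pool = in_use or list(candidates)
    return min(pool, key=lambda s: cost[s])

def reorder_conditions(conditions, cost=SENSOR_COST):
    """Strategy 3: sort (sensor, predicate) pairs so the cheapest checks run first.

    A sensor of None marks a condition that needs no sensor access.
    """
    return sorted(conditions, key=lambda c: 0.0 if c[0] is None else cost[c[0]])

def rule_holds(conditions, read, cost=SENSOR_COST):
    """Evaluate a conjunctive rule, stopping at the first failed condition.

    Returns (result, energy_spent).
    """
    spent = 0.0
    for sensor, predicate in reorder_conditions(conditions, cost):
        value = None
        if sensor is not None:
            value = read(sensor)      # only now do we pay for the sensor
            spent += cost[sensor]
        if not predicate(value):
            return False, spent
    return True, spent

readings = {"gps": "office", "wifi": "office"}
at_office = ("gps", lambda v: v == "office")
sharing_off = (None, lambda _: False)   # a setting lookup, no sensor needed
print(rule_holds([at_office, sharing_off], readings.get))
print(pick_sensor(["gps", "wifi"], active=set()))
```

Because the sensor-free condition is checked first and fails, the GPS is never read in the first call, which is the saving that condition reordering is meant to provide.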
Intrusion Detection Systems
• Current intrusion detection systems perform poorly on
zero-day and “low and slow” attacks, and APTs
• Sharing information from heterogeneous data
sources can provide useful information even
when an attack signature is unavailable
• Implemented prototypes that integrate and
reason over data from IDSs, host and network
scanners, and text at the knowledge level
• We’ve established the feasibility of the
approach in simple evaluation experiments
From dashboards & watchstanding (simple analysis) … to situational awareness
[ a IDPS:text_entity;
  IDPS:has_vulnerability_term "true";
  IDPS:has_security_exploit "true";
  IDPS:has_text "Internet Explorer";
  IDPS:has_text "arbitrary code";
  IDPS:has_text "remote attackers" . ]
Context/Situation
[ a IDPS:system;
  IDPS:host_IP "130.85.93.105" . ]
[ a IDPS:scannerLog;
  IDPS:scannerLogIP "130.85.93.105";
  … ]
[ a IDPS:gatewayLog;
  IDPS:gatewayLogIP "130.85.93.105";
  … ]
Facts / Information
Policies
[ IDPS:scannerLog IDPS:hasBrowser ?Browser
  IDPS:gatewayLog IDPS:hasURL ?URL
  ?URL IDPS:hasSymantecRating "unsafe"
  IDPS:scannerLog IDPS:hasOutboundConnection "true"
  IDPS:WiresharkLog IDPS:isConnectedTo ?IPAddress
  ?IPAddress IDPS:isZombieAddress "true" ]
=>
[ IDPS:system IDPS:isUnderAttack "use-after-free vulnerability"
  IDPS:attack IDPS:hasMeans "Backdoor"
  IDPS:attack IDPS:hasConsequence "UnauthorizedRemoteAccess" ]
Figure labels: Alerts; Rules; Analytics; Traditional Sensors; Non-Traditional “Sensors”
Example text: Use-after-free vulnerability in Microsoft Internet Explorer 6 through 8 ….
http://ebiq.org/p/604
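To make the rule on this slide concrete, here is a small sketch of conjunctive pattern matching over triples. It is an illustration under stated assumptions, not the reasoner used in the prototype: the `match` and `solve` helpers are hypothetical, the rule is transcribed from the slide with spelling normalized, and the values in `FACTS` (the URL and the zombie IP address) are invented placeholders.

```python
def match(pattern, fact, bindings):
    """Unify one triple pattern with one fact; return extended bindings or None."""
    new = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if new.setdefault(p, f) != f:   # variable already bound to something else
                return None
        elif p != f:
            return None
    return new

def solve(patterns, facts, bindings=None):
    """Yield every variable binding that satisfies all patterns."""
    bindings = {} if bindings is None else bindings
    if not patterns:
        yield bindings
        return
    for fact in facts:
        extended = match(patterns[0], fact, bindings)
        if extended is not None:
            yield from solve(patterns[1:], facts, extended)

RULE_IF = [
    ("IDPS:scannerLog", "IDPS:hasBrowser", "?Browser"),
    ("IDPS:gatewayLog", "IDPS:hasURL", "?URL"),
    ("?URL", "IDPS:hasSymantecRating", "unsafe"),
    ("IDPS:scannerLog", "IDPS:hasOutboundConnection", "true"),
    ("IDPS:WiresharkLog", "IDPS:isConnectedTo", "?IPAddress"),
    ("?IPAddress", "IDPS:isZombieAddress", "true"),
]
RULE_THEN = [
    ("IDPS:system", "IDPS:isUnderAttack", "use-after-free vulnerability"),
    ("IDPS:attack", "IDPS:hasMeans", "Backdoor"),
    ("IDPS:attack", "IDPS:hasConsequence", "UnauthorizedRemoteAccess"),
]

# Placeholder facts; the URL and IP address are invented for the example.
FACTS = [
    ("IDPS:scannerLog", "IDPS:hasBrowser", "Internet Explorer"),
    ("IDPS:gatewayLog", "IDPS:hasURL", "http://example.test/page"),
    ("http://example.test/page", "IDPS:hasSymantecRating", "unsafe"),
    ("IDPS:scannerLog", "IDPS:hasOutboundConnection", "true"),
    ("IDPS:WiresharkLog", "IDPS:isConnectedTo", "192.0.2.7"),
    ("192.0.2.7", "IDPS:isZombieAddress", "true"),
]

def alerts(facts):
    """Return the rule's conclusions if its conditions are all satisfied."""
    satisfied = next(solve(RULE_IF, facts), None) is not None
    return list(RULE_THEN) if satisfied else []

print(alerts(FACTS))
```

The shared variables `?URL` and `?IPAddress` are what tie the gateway log, the reputation rating, and the packet capture together, so no single source has to carry an attack signature on its own.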
Maintaining the vulnerability KB
• Our approach requires us to keep the KB of
software products and known or suspected
vulnerabilities and attacks up to date
• Resources like NVD are great, but tapping into
text can enrich their info and give earlier
warnings of problems
Timeline figure:
–Phases: Analysis; Vendor Analysis; Patch development; Resolution
–Vendor deploys software
–Attacker finds vuln. & exploits it (01/10/13)
–Exploit reported in mailing list (01/10/13)
–CVE disclosed; vuln. reported in NVD RSS feed; vuln. analyzed &
included in NVD feed (dates shown: 01/14/13, 02/16/2013)
–Threat disclosed in vendor bulletin (06/18/2013)
–Patch released (Critical Patch Update)
–System update (03/04/2013)
Information extraction from text
CVE-2012-0150
Buffer overflow in msvcrt.dll in Microsoft Windows Vista SP2, Windows Server
2008 SP2, R2, and R2 SP1, and Windows 7 Gold and SP1 allows remote
attackers to execute arbitrary code via a crafted media file, aka “Msvcrt.dll
Buffer Overflow Vulnerability.”
Identify relationships
–ebqids:hasMeans
–ebqids:affectsProduct
Link concepts to entities
–http://dbpedia.org/resource/Buffer_overflow
–http://dbpedia.org/resource/Arbitrary_code_execution
–http://dbpedia.org/resource/Windows_7
• We use information extraction techniques to identify
entities, relations and concepts in security related text
• These are mapped to terms in our ontology and the
DBpedia LOD KB (based on Wikipedia)
• Google’s slogan: “Things, not strings”
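The extraction and linking step can be sketched as a lookup from phrases in the text to DBpedia resources, emitted as triples. This is a deliberately simplified stand-in for real information extraction: the phrase lexicon, the namespace URIs for `EBQIDS` and `CVE`, and the pairing of each relation with a particular resource are all assumptions made for the example (the slide gives only the `ebqids:` prefix and the DBpedia URLs). The link to Arbitrary_code_execution is left out because the slide does not show which relation it uses.

```python
DBPEDIA = "http://dbpedia.org/resource/"
EBQIDS = "http://example.org/ebqids#"   # hypothetical namespace URI
CVE = "http://example.org/cve/"         # hypothetical namespace URI

# Hypothetical lexicon: phrase -> (relation, DBpedia resource name).
LEXICON = {
    "buffer overflow": ("hasMeans", "Buffer_overflow"),
    "windows 7": ("affectsProduct", "Windows_7"),
}

def extract_triples(cve_id, summary, lexicon=LEXICON):
    """Spot known phrases in a summary and link them to DBpedia resources."""
    text = summary.lower()
    triples = []
    for phrase, (relation, resource) in lexicon.items():
        if phrase in text:
            triples.append((CVE + cve_id, EBQIDS + relation, DBPEDIA + resource))
    return triples

def to_ntriples(triples):
    """Serialize URI-only triples in N-Triples syntax."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)

SUMMARY = (
    "Buffer overflow in msvcrt.dll in Microsoft Windows Vista SP2, Windows Server "
    "2008 SP2, R2, and R2 SP1, and Windows 7 Gold and SP1 allows remote attackers "
    "to execute arbitrary code via a crafted media file"
)
triples = extract_triples("CVE-2012-0150", SUMMARY)
print(to_ntriples(triples))
```

Linking to a shared resource URI rather than keeping the raw phrase is what the “things, not strings” slogan refers to: two reports that mention the same product in different words end up pointing at the same node.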
Maintaining the vulnerability KB
Pipeline figure:
–Sources: NVD dataset (structured data in XML; unstructured
vulnerability summaries), security bulletins, blogs, web text
–Components: Entity & Concept Spotter; Extracted Concepts
<Concept, Class>; Linking & Mapping Entities; RDF Generation;
Triple Store; IDS Ontology
–Output: Linked Cybersecurity Data for consumers
http://ebiq.org/p/629
Faceblock
http://ebiq.org/p/666
Faceblock Ontology
Faceblock’s (OWL) ontology lets one write context
policy rules using predefined activity and place types
Faceblock Protocols
User device maintains context, reasons with policy rules and
informs glass devices of Faceblock property: True or False
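The decision the user device makes can be pictured as checking the current context against rules written over a type hierarchy. The sketch below is hypothetical throughout: the place and activity types in `SUBCLASS_OF`, the single rule in `RULES`, and the dictionary-based context are invented stand-ins for the Faceblock OWL ontology and its reasoner, which the slides do not spell out.

```python
# Hypothetical type hierarchy: child type -> parent type.
SUBCLASS_OF = {
    "Restaurant": "PublicPlace",
    "Bar": "PublicPlace",
    "Dining": "SocialActivity",
    "Party": "SocialActivity",
}

def is_a(type_name, ancestor):
    """True if type_name equals ancestor or is a (transitive) subclass of it."""
    while type_name is not None:
        if type_name == ancestor:
            return True
        type_name = SUBCLASS_OF.get(type_name)
    return False

# Hypothetical rule: block pictures while socializing in a public place.
RULES = [{"place": "PublicPlace", "activity": "SocialActivity"}]

def faceblock(context, rules=RULES):
    """Return the Faceblock property (True or False) for the current context."""
    return any(
        is_a(context["place"], rule["place"]) and is_a(context["activity"], rule["activity"])
        for rule in rules
    )

print(faceblock({"place": "Restaurant", "activity": "Dining"}))
print(faceblock({"place": "Office", "activity": "Dining"}))
```

Writing the rule against general types means it applies to any more specific place or activity without being restated, which is the benefit of having predefined types in the ontology.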
Taming Wild Big Data
• WBD is structured or semi-structured data for
which we lack schema-level understanding
–e.g., raw tables, graphs, XML, logs
• Developed tools to generate semantic
data from background ontologies
& KBs, e.g. for clinical trial tables
• It’s harder when the domain is not even known.
We’re developing systems that use large
background KBs (e.g., Google’s Freebase) to
predict types/subtypes of data instances
http://ebiq.org/p/661
http://ebiq.org/p/672
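A simple way to picture type prediction from a background KB is majority voting over the types of the values in a column. This is a toy sketch, not the system under development: the `KB` dictionary is a tiny invented stand-in for a large knowledge base, and exact string lookup replaces the entity linking a real system would need.

```python
from collections import Counter

# Hypothetical background KB: normalized entity name -> known types.
KB = {
    "aspirin": ("Drug",),
    "ibuprofen": ("Drug",),
    "baltimore": ("City",),
    "dallas": ("City",),
    "paris": ("City", "Person"),
}

def predict_column_type(values, kb=KB):
    """Guess a column's type by letting each recognized value vote for its types."""
    votes = Counter()
    for value in values:
        votes.update(kb.get(value.strip().lower(), ()))
    if not votes:
        return None   # nothing in the column was recognized
    # Sort first so ties are broken alphabetically and the result is deterministic.
    return max(sorted(votes), key=votes.get)

print(predict_column_type(["Baltimore", "Dallas", "Paris", "Unknownville"]))
print(predict_column_type(["Aspirin", "Ibuprofen"]))
```

Voting across the whole column lets unambiguous values resolve ambiguous ones: "Paris" alone could be a city or a person, but alongside "Baltimore" and "Dallas" the city reading wins.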
Conclusion
• Google’s new slogan: things, not strings
• We also need: measurements, not numbers
• Common ontologies in semantic representations
enable big data integration at a “knowledge level”
–data, meta-data, provenance, certainty, rules
• Many advantages:
–Enhancing discovery, integration and interoperability
–Enabling inference and knowledge-level analytics
–Expressing policy constraints in common semantic terms
http://ebiq.org/r/363