Searching with Privacy: a survey

Distributing a Classified Search*
Rafail Ostrovsky
William Skeith
Stealth Software Technologies, LLC
http://www.stealthsoftwareinc.com
Topics We’ll Cover

Motivational example (“no-fly” list)
  Just one of many applications, but it illustrates the ideas well
The process:
  Generating an encrypted search
  Distributed execution/result monitoring
  Decryption/analysis of data
The benefits:
  Savings in computation
  Savings in communication
  Simple and efficient monitoring
The implementation
  High-performance
  Parallelized design
Demonstration

Motivational example: “No-fly” list

Search for classified names and aliases of suspected terrorists
Knowledge of aliases must be kept secret
  If not, the advantage derived from this intelligence may become void
Until now, this has precluded a distributed search
  Without our technology, one must rely on an “import, then process” method
Our technology allows any willing and able party to help perform the search

Problems with Import, then Process

Expensive in communication
  Averse to dynamic data
  Difficult to manage and synchronize data from vast and disparate sources
Expensive in processing
  Processing must be done locally
Not entirely respectful of citizens’ privacy

Our Technology

Allows data to be searched where it naturally resides, despite the criteria being sensitive or classified
Attractive alternative to the import, then process paradigm
Ideal for dynamic, distributed, streaming data
Creates savings in communication and processing
Enables low-latency, low-complexity monitoring
Symmetrically preserves privacy
  The records of “uninteresting” citizens will not be collected

Process Outline

Step 1: (secure environment) given sensitive or classified search criteria, create an encrypted search
Step 2: (any environment, may be unclassified) migrate encrypted search to multiple machines on any network
  Every machine runs encrypted search on (local) data, writing output to small encrypted buffers
  Migrate encrypted buffers to a classified machine *as needed* using real-time monitoring
Step 3: (secure environment) decrypt buffers and analyze results

Step 1: Create Encrypted Search

[Figure: the search terms “Mohamed Atta”, “Hani Hanjour”, “Ziad Jarrah” shown next to a block of seemingly random bits.]
Encrypted version of search is indistinguishable from a random distribution.
Encrypted Search:

Provably reveals no information about search terms
The guarantee of security holds even if an adversary acquires:
  The encrypted search description
  The program’s output (which is also encrypted)
  The program’s source code
Therefore, it can be distributed outside of a classified environment

Step 2: Distribute Search

[Figure: six identical blocks of bits.]

Any willing and able parties may now participate
  The outside participants know they are helping with a search, but remain oblivious as to what they are searching for
Generic program (distributed only once) executes encrypted search descriptions on plaintext data
  Results are collected in small encrypted buffers

How does it work?

Based on homomorphic encryption
  Given E(x), E(y) it holds that E(x)+E(y) = E(x+y)
  Allows a party without the decryption key to still do something “useful” with encrypted data, although it remains unreadable
Allows us to conditionally encrypt only matching documents
  Process outputs E(0) for a non-matching document, and outputs E(D) if the document D in fact matches the query

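The homomorphic property and the conditional-encryption step above can be sketched with a toy additively homomorphic scheme. The sketch assumes Paillier encryption, which the slides do not name; in Paillier the “+” on ciphertexts is multiplication modulo n², and raising a ciphertext to a power k multiplies the plaintext by k. The primes, the helper names (`encrypt`, `decrypt`, `add`, `make_query`, `process`), and the public name dictionary are all illustrative assumptions, and the key is far too small for real use.

```python
import math
import secrets

# Toy key: two known Mersenne primes (assumption; far too small for real use).
P, Q = 2**89 - 1, 2**107 - 1
N = P * Q                                            # public key
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)    # private: lcm(p-1, q-1)
MU = pow(LAM, -1, N)                                 # private: LAM^-1 mod N (Python 3.8+)


def encrypt(m):
    """Paillier encryption with g = n + 1: c = (1 + m*n) * r^n mod n^2."""
    # r is coprime to n except with negligible probability.
    r = secrets.randbelow(N - 1) + 1
    return (1 + (m % N) * N) * pow(r, N, N2) % N2


def decrypt(c):
    """Recover m from c using the private values LAM and MU."""
    return (pow(c, LAM, N2) - 1) // N * MU % N


def add(c1, c2):
    """E(x) "+" E(y): multiplying ciphertexts adds the plaintexts."""
    return c1 * c2 % N2


def encode(text):
    """Pack a short string (up to 24 bytes with this key) into an integer below n."""
    return int.from_bytes(text.encode("utf-8"), "big")


def decode(value):
    """Inverse of encode."""
    return value.to_bytes((value.bit_length() + 7) // 8, "big").decode("utf-8")


def make_query(dictionary, watch_list):
    """Secure side: E(1) for watched names, E(0) for all others."""
    return {name: encrypt(1 if name in watch_list else 0) for name in dictionary}


def process(query, record):
    """Untrusted side: E(b)^D = E(b*D), i.e. E(D) on a match and E(0) otherwise."""
    if record not in query:
        return encrypt(0)          # names outside the dictionary never match
    return pow(query[record], encode(record), N2)


query = make_query(["John Doe", "Jane Lane", "Ziad Jarrah"], {"Ziad Jarrah"})
hit = process(query, "Ziad Jarrah")
miss = process(query, "Jane Lane")
print(decrypt(add(encrypt(3), encrypt(4))))   # 7
print(decode(decrypt(hit)))                   # Ziad Jarrah
print(decrypt(miss))                          # 0
```

The party running `process` never touches `LAM` or `MU`, and every query entry looks like a random number to it, so it cannot tell which names are watched or whether a given record matched.
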
Real-Time Monitoring

Traditional methods are unpleasant: typically very complex and communication-intensive
Constant downloads / synchronization
  High complexity, high communication
Waiting for batches
  Reduces complexity, but increases latency and still involves unnecessary communication

Real-Time Monitoring – Our Solution

[Figure: three speakers (“I’m John Doe.”, “Mohamed Atta.”, “I’m Jane Lane.”) and a small encrypted 0/1 flag.]
A small encrypted flag can be periodically transmitted indicating the presence or absence of any search results. This provides a simple mechanism for real-time monitoring.

Real-Time Monitoring

The encrypted flags can be aggregated so that one small value can indicate the presence or absence of results for an entire airport, if desired.
Rather than monitoring a constant stream of thousands of names, one small value can be periodically checked.

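The aggregation described above follows from the additive property: combining encrypted flags yields an encryption of their sum, so neither the gates nor the airport need a key. A minimal sketch, assuming a toy Paillier scheme (not named in the slides) and hypothetical helpers `local_flag` and `aggregate`:

```python
import math
import secrets
from functools import reduce

# Toy Paillier key from two Mersenne primes (assumption; not secure at this size).
P, Q = 2**89 - 1, 2**107 - 1
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)


def encrypt(m):
    """Paillier encryption with g = n + 1."""
    r = secrets.randbelow(N - 1) + 1
    return (1 + (m % N) * N) * pow(r, N, N2) % N2


def decrypt(c):
    """Paillier decryption with the private values LAM and MU."""
    return (pow(c, LAM, N2) - 1) // N * MU % N


def local_flag(query, names):
    """One gate: combine the query entries of the names it saw -> E(hits at this gate)."""
    flag = encrypt(0)                        # fresh randomness, plaintext 0
    for name in names:
        # Ciphertext 1 is a trivial encryption of 0, used for unknown names.
        flag = flag * query.get(name, 1) % N2
    return flag


def aggregate(flags):
    """Airport: combine gate flags into one small value, E(total hits)."""
    return reduce(lambda a, b: a * b % N2, flags)


watch_list = {"Ziad Jarrah"}
query = {n: encrypt(int(n in watch_list)) for n in ["John Doe", "Jane Lane", "Ziad Jarrah"]}
gates = [["John Doe", "Jane Lane"], ["Ziad Jarrah", "John Doe"], ["Jane Lane"]]
airport_flag = aggregate([local_flag(query, g) for g in gates])
quiet_flag = aggregate([local_flag(query, g) for g in (gates[0], gates[2])])
print(decrypt(airport_flag))   # 1: at least one hit, so download the buffers
print(decrypt(quiet_flag))     # 0: nothing to download
```

In this sketch the decrypted flag is a hit count; restricting it to a strict yes/no value would need an extra step that is not shown here.
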
Real-Time Monitoring

Saves communication: only download data when needed
  Furthermore, you only download what you need
Low-overhead, low-complexity method for monitoring vast data sources
Ideal for highly dynamic data
Ideal for situations where long knowledge latency is unacceptable

A Note on Encrypted Flags

Encrypted flags can contain a lot of information or only a little, depending on the application
They can give additional information, e.g. a more specific location where a hit was found and the number of hits
If desired, a flag can be guaranteed to take only the values “yes” or “no”

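One way a flag could carry both a location and a hit count is to give each location its own digit in the plaintext. This is an illustrative assumption, not a construction stated in the slides: the sketch uses a toy Paillier scheme, and `BASE`, `positioned`, and `unpack` are hypothetical names.

```python
import math
import secrets

# Toy Paillier key from two Mersenne primes (assumption; not secure at this size).
P, Q = 2**89 - 1, 2**107 - 1
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)

BASE = 1000   # assumed bound: fewer than 1000 hits per gate


def encrypt(m):
    """Paillier encryption with g = n + 1."""
    r = secrets.randbelow(N - 1) + 1
    return (1 + (m % N) * N) * pow(r, N, N2) % N2


def decrypt(c):
    """Paillier decryption with the private values LAM and MU."""
    return (pow(c, LAM, N2) - 1) // N * MU % N


def positioned(flag, gate_index):
    """E(h)^(BASE^i) = E(h * BASE^i): moves gate i's count into digit i."""
    return pow(flag, BASE ** gate_index, N2)


def unpack(total, gate_count):
    """Secure side: split the decrypted total back into per-gate counts."""
    return [(total // BASE**i) % BASE for i in range(gate_count)]


hits_per_gate = [0, 2, 0, 1]                  # stand-ins for real gate flags
flags = [encrypt(h) for h in hits_per_gate]
combined = 1                                  # trivial encryption of 0
for i, flag in enumerate(flags):
    combined = combined * positioned(flag, i) % N2
print(unpack(decrypt(combined), len(flags)))  # [0, 2, 0, 1]
```

The combined flag is still a single ciphertext, but after decryption it reveals which gate saw hits and how many.
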
Step 3: Decryption

[Figure: six identical blocks of bits.]

Once it has been determined that “interesting data” has been collected:
  Download the small buffers
  Transfer to a classified environment
  Then, decrypt buffers to obtain results

Summary of Benefits

Strong security guarantees enable distribution of a sensitive/classified search
  Massive parallelism
  Process data where it naturally resides
  Creates savings in processing
  Creates vast savings in storage and communication
Low-latency monitoring on highly dynamic data and low-latency searching
Preserves privacy in both directions

Other applications

Google-like search service for the intelligence community, using an unclassified server farm to perform searches
Distributed intelligence search, similar to SETI@home
Federal to state interactions
Agency to agency interactions
Ship/Truck manifests, routes, anomalies
Truck driver information
Private aircraft flight plan/pilot/cargo information
Financial data mining
Monitor news feeds, etc…
Auditing financial data in private
Immigration/visa data-mining

Implementation: Design and Performance

Parallelism: a growing industry trend
  Intel now ships nearly 100% of its servers with multi-core processors, and over 90% of its desktops
  “Multi-core processors represent a major evolution in computing technology… they will eventually become the pervasive computing model” – AMD
Our software dynamically takes advantage of all processors on the client system
  No modification of code or input parameters is necessary

Implementation: Design and Performance

Based on an independently developed high-performance library for long integers and number theory
  64-bit library outperforms 64-bit optimized NTL (a well-respected high-performance library) by more than a factor of 7 for multiplication of 1024-bit integers
  Most arithmetic routines are close to optimal, approaching the theoretical limits of the Intel Core 2 microarchitecture

Implementation: Design and Performance

Makes use of special-purpose arithmetic algorithms, ideal for the task
Processes documents at ≈ 100 KB/s (for smaller documents) and ≈ 120 KB/s (for larger documents) on a 2 GHz Intel Core 2 Duo
  For comparison, the original prototype (based on NTL) processed documents at ≈ 1 KB/s

Up next…


Demonstration
Questions