Distributing a Classified Search*

Rafail Ostrovsky, William Skeith
Stealth Software Technologies, LLC
http://www.stealthsoftwareinc.com

Topics We’ll Cover
- Motivational example (“no-fly” list)
  - Just one of many applications, but it illustrates the ideas well
- The process:
  - Generating an encrypted search
  - Distributed execution/result monitoring
  - Decryption/analysis of data
- The benefits:
  - Savings in computation
  - Savings in communication
  - Simple and efficient monitoring
- The implementation
  - High-performance
  - Parallelized design
- Demonstration

Motivational example: “No-fly” list
- Search for classified names and aliases of suspected terrorists
- Knowledge of aliases must be kept secret
  - If not, the advantage derived from this intelligence may become void
- Until now, this has precluded a distributed search
  - Without our technology, one must rely on an “import, then process” method
  - Our technology allows any willing and able party to help perform the search

Problems with Import, then Process
- Expensive in communication
- Expensive in processing
- Averse to dynamic data
- Difficult to manage and synchronize data from vast and disparate sources
- Processing must be done locally
- Not entirely respectful of citizens’ privacy

Our Technology
- Allows data to be searched where it naturally resides, despite the criteria being sensitive or classified
- Attractive alternative to the “import, then process” paradigm
- Ideal for dynamic, distributed, streaming data
- Creates savings in communication and processing
- Enables low-latency, low-complexity monitoring
- Symmetrically preserves privacy
  - The records of “un-interesting” citizens will not be collected

Process Outline
- Step 1: (secure environment) given sensitive or classified search criteria, create an encrypted search
- Step 2: (any environment, may be unclassified) migrate the encrypted search to multiple machines on any network
  - Every machine runs the encrypted search on (local) data, writing output to small encrypted buffers
  - Migrate encrypted buffers to a classified machine *as needed*
    using real-time monitoring
- Step 3: (secure environment) decrypt buffers and analyze results

Step 1: Create Encrypted Search
[Figure: the search terms “Mohamed Atta”, “Hani Hanjour”, and “Ziad Jarrah” shown alongside an encrypted bit string]
- Encrypted version of search is indistinguishable from a random distribution.

Encrypted Search:
- Provably reveals no information about search terms
- The guarantee of security holds even if an adversary acquires:
  - The encrypted search description
  - The program’s output (which is also encrypted)
  - The program’s source code
- Therefore, it can be distributed outside of a classified environment

Step 2: Distribute Search
[Figure: repeated identical copies of the encrypted bit string]

Step 2: Distribute Search
- Any willing and able parties may now participate
- The outside participants know they are helping with a search, but remain oblivious as to what they are searching for
- Generic program (distributed only once) executes encrypted search descriptions on plaintext data
- Results are collected in small encrypted buffers

How does it work?
- Based on homomorphic encryption
  - Given E(x), E(y) it holds that E(x)+E(y) = E(x+y)
- Allows a party without the decryption key to still do something “useful” with encrypted data, although it remains unreadable
- Allows us to conditionally encrypt only matching documents
  - Process outputs E(0) for a non-matching document, and outputs E(D) if the document D in fact matches the query

Real-Time Monitoring
- Traditional methods are unpleasant: typically very complex and communication-intensive
  - Constant downloads / synchronization
    - High complexity, high communication
  - Waiting for batches
    - Reduces complexity, but increases latency and still involves unnecessary communication

Real-Time Monitoring: Our Solution
[Figure: travelers identify themselves (“I’m John Doe.” “Mohamed Atta.” “I’m Jane Lane.”); output is a small 0/1 flag (encrypted)]
- A small encrypted flag can be periodically transmitted indicating the presence or absence of any search results.
- This provides a simple mechanism for real-time monitoring.

Real-Time Monitoring
- The encrypted flags can be aggregated so that one small value can indicate the presence or absence of results for an entire airport, if desired.
- Rather than monitoring a constant stream of thousands of names, one small value can be periodically checked.

Real-Time Monitoring
- Saves communication: only download data when needed
  - Furthermore, you only download what you need
- Low-overhead, low-complexity method for monitoring vast data sources
- Ideal for highly dynamic data
- Ideal for situations where long knowledge latency is unacceptable

A Note on Encrypted Flags
- Encrypted flags can contain a lot, or only a little information, depending on the application
- They can give additional information, e.g.
  a more specific location where a hit was found and the number of hits
- If desired, a flag can be guaranteed to take only the values “yes” or “no”

Step 3: Decryption
[Figure: the encrypted buffers, shown as repeated bit strings]

Step 3: Decryption
- Once it has been determined that “interesting data” has been collected:
  - Download the small buffers
  - Transfer to a classified environment
- Then, decrypt buffers to obtain results

Summary of Benefits
- Strong security guarantees enable distribution of a sensitive/classified search
- Massive parallelism
- Process data where it naturally resides
- Creates savings in processing
- Creates vast savings in storage and communication
- Low-latency monitoring on highly dynamic data and low-latency searching
- Preserves privacy in both directions

Other applications
- Google-like search service for the intelligence community, using an unclassified server farm to perform searches
- Distributed intelligence search, similar to SETI@home
- Federal to state interactions
- Agency to agency interactions
- Ship/truck manifests, routes, anomalies
- Truck driver information
- Private aircraft flight plan/pilot/cargo information
- Financial data mining
- Monitor news feeds, etc.
- Auditing financial data in private
- Immigration/visa data mining

Implementation: Design and Performance
- Parallelism: a growing industry trend
  - Intel now ships nearly 100% of its servers with multi-core processors, and over 90% of its desktops
  - “Multi-core processors represent a major evolution in computing technology… they will eventually become the pervasive computing model” (AMD)
- Our software dynamically takes advantage of all processors on the client system
  - Absolutely no modification of code or of input parameters is necessary

Implementation: Design and Performance
- Based on an independently developed high-performance library for long
  integers and number theory
- The 64-bit library outperforms 64-bit optimized NTL (a well-respected high-performance library) by more than a factor of 7 for multiplication of 1024-bit integers.
- Most arithmetic routines are close to optimal, approaching the theoretical limits of the Intel Core 2 microarchitecture

Implementation: Design and Performance
- Makes use of special-purpose arithmetic algorithms, ideal for the task
- Processes documents at ≈ 100 KB/s (for smaller documents) and ≈ 120 KB/s (for larger documents) on a 2 GHz Intel Core 2 Duo
- It may be of interest to note that the original prototype (based on NTL) processed documents at ≈ 1 KB/s.

Up next…
- Demonstration

Questions
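
Supplementary sketch: conditional encryption and encrypted flags

As a supplement to the “How does it work?” and “Real-Time Monitoring” slides, the following is a minimal sketch of how an additively homomorphic scheme can output E(0) for a non-matching record and E(D) for a matching one, and how per-record flags can be folded into one small encrypted value. The slides name only “homomorphic encryption”; the Paillier cryptosystem used here is an assumption chosen for illustration, and in Paillier the “+” on ciphertexts is realized as multiplication modulo N². The toy key (two Mersenne primes), the public word dictionary, the sample records, and every function name (`make_query`, `process`, `decode`) are hypothetical and not taken from the authors' software.

```python
import math
import secrets

# Toy key built from two Mersenne primes; far too small for real security.
P, Q = 2**89 - 1, 2**107 - 1
N, N2 = P * Q, (P * Q) ** 2      # public modulus and its square
PHI = (P - 1) * (Q - 1)          # private: stays in the secure environment
MU = pow(PHI, -1, N)             # private: inverse of PHI modulo N


def encrypt(m):
    """Paillier encryption of integer m (0 <= m < N) with generator N + 1."""
    while True:
        r = secrets.randbelow(N - 1) + 1      # random blinding factor
        if math.gcd(r, N) == 1:
            break
    return (1 + m * N) % N2 * pow(r, N, N2) % N2


def decrypt(c):
    """Recover the plaintext integer; requires the private values PHI and MU."""
    return (pow(c, PHI, N2) - 1) // N * MU % N


def make_query(dictionary, secret_terms):
    """Step 1 (secure side): one ciphertext per public dictionary word."""
    return {w: encrypt(1 if w in secret_terms else 0) for w in dictionary}


def process(query, record):
    """Step 2 (any machine, no key): returns (E(k), E(k * D)) for k matches."""
    hits = 1                                   # 1 is a valid encryption of 0
    for word in record.split():
        if word in query:
            hits = hits * query[word] % N2     # homomorphic addition
    doc = int.from_bytes(record.encode(), "big")
    return hits, pow(hits, doc, N2)            # homomorphic scaling by D


def decode(m):
    """Turn a decrypted integer back into text (0 decodes to '')."""
    return m.to_bytes((m.bit_length() + 7) // 8, "big").decode()


dictionary = ["john", "doe", "jane", "lane", "sam", "alias"]
query = make_query(dictionary, secret_terms={"alias"})

records = ["john doe", "sam alias", "jane lane"]
buffers = [process(query, r) for r in records]

# Aggregate every per-record flag into one small encrypted counter.
flag = 1
for hits, _ in buffers:
    flag = flag * hits % N2

# Step 3 (secure side): check the flag, then decrypt the buffers.
total_hits = decrypt(flag)
recovered = [decode(decrypt(payload)) for _, payload in buffers]
print(total_hits, recovered)
```

In this sketch a record that matches k search terms yields E(k·D) rather than E(D), which is why the encrypted count E(k) is returned alongside the payload; it also doubles as the monitoring flag. A record must encode to an integer smaller than N, so a realistic key size would be needed for anything beyond short strings.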