Introducing Apache Pirk - Linux Foundation Events

@ApachePirk
Presented By: Ellison Anne Williams
Apache Pirk PPMC Member; Founder, EN|VEIL
Outline
What is Apache Pirk?
What is PIR?
Why Apache Pirk?
Pirk Basics
Roadmap
Get Involved
Appendix: Wideskies
What is Apache Pirk?
Framework for Scalable Private Information Retrieval (PIR)
Beautiful Blend of Mathematics & Computer Science
Developed at the National Security Agency
Donated to the Apache Software Foundation in July 2016
Undergoing Incubation within the Apache Incubator
Two ASF Releases To-Date - 0.3.0 Release Coming Soon
What is PIR?
PIR - Private Information Retrieval
Field of Theoretical Mathematics and Computer Science - ~20 years
Ability to Privately Retrieve Information from a Dataset
Without Revealing Any Information Regarding the Questions Asked
OR the Results Obtained to the Dataset Owner or an Observer
Powered by Homomorphic Encryption
Without PIR
With PIR
Querier: I have a private question Q
Querier: I’m going to use PIR…
Querier: I form E(Q)
Querier sends E(Q) to the Responder
Responder (holds the Data): Ask E(Q)
Responder: Produce E(A)
Responder sends E(A) to the Querier
Querier: Answer A = D(E(A))
Querier: PIR is awesome!
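The exchange in the diagram can be sketched in a few lines of Python. This is a generic textbook-style PIR over a toy additively homomorphic scheme, not Pirk's Wideskies algorithm and not Pirk code: the function names (`keygen`, `make_query`, `respond`) and the tiny hard-coded primes are assumptions made purely for illustration, and the key size is far too small to be secure. The query E(Q) is an encrypted one-hot vector, so the responder cannot tell which row is being asked for.

```python
import math
import random

def keygen(p=1789, q=1867):
    """Toy key pair from two small primes (insecure, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+); valid for g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """E(m) = (n+1)^m * r^n mod n^2 for a random r coprime to n."""
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    """D(c) = L(c^lam mod n^2) * mu mod n, with L(x) = (x - 1) // n."""
    lam, mu = priv
    return ((pow(c, lam, n * n) - 1) // n * mu) % n

def make_query(n, index, size):
    """Querier: E(Q) is an encrypted one-hot vector selecting `index`."""
    return [encrypt(n, 1 if i == index else 0) for i in range(size)]

def respond(n, query, data):
    """Responder: E(A) = prod E(q_i)^(d_i) = E(sum q_i * d_i), computed blind."""
    n2 = n * n
    acc = 1
    for c, d in zip(query, data):
        acc = (acc * pow(c, d, n2)) % n2
    return acc

# Usage: the responder holds `data`; the querier privately wants row 2.
data = [17, 42, 99, 7]
n, priv = keygen()
e_q = make_query(n, 2, len(data))   # I form E(Q)
e_a = respond(n, e_q, data)         # Ask E(Q), produce E(A)
answer = decrypt(n, priv, e_a)      # Answer A = D(E(A))
```

The responder touches every row and only ever sees ciphertexts, which is what keeps both the question and the answer hidden from it.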
Why Apache Pirk?
PIR Historically Largely Theoretical
Need for
  Practical PIR
  Robust and Deployable PIR Implementations
Apache Pirk
  Provides a Landing Place for Robust, Scalable PIR
  Fosters a Community Around Scalable PIR
Pirk Basics
Querier
  Generates Encrypted Query Vectors
  Generates Necessary Decryption Items for Each Query Vector
  Decrypts Encrypted Results
Responder
  Performs Encrypted Queries
  Forms Encrypted Query Results
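Querier: I have a private question Q
Querier: I’m going to use PIRK…
Querier: I form E(Q)
Querier sends E(Q) to the Responder
Responder: Ask E(Q)
Responder: Produce E(A)
Responder sends E(A) to the Querier
Querier: Answer A = D(E(A))
Querier: PIRK is awesome!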
Beyond the Querier and Responder
Encryption Library
  Paillier Cryptosystem Currently Implemented
Data Schema Framework
Query Schema Framework
Generic Data Filter
Testing - Distributed and In-Memory Test Suites
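To make the role of the Paillier cryptosystem concrete, here is a minimal sketch of its additive homomorphism: multiplying ciphertexts adds the plaintexts, and raising a ciphertext to a power scales the plaintext. The `ToyPaillier` class and its method names are hypothetical and are not Pirk's encryption library; the primes are tiny and hard-coded, so this is for illustration only.

```python
import math
import random

class ToyPaillier:
    """Paillier with tiny hard-coded primes; insecure, illustration only."""

    def __init__(self, p=1789, q=1867):
        self.n = p * q
        self.n2 = self.n * self.n
        # Private key: lam = lcm(p-1, q-1), mu = lam^-1 mod n (valid for g = n + 1).
        self._lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
        self._mu = pow(self._lam, -1, self.n)  # requires Python 3.8+

    def encrypt(self, m):
        # Fresh randomness r makes encryption probabilistic.
        r = random.randrange(1, self.n)
        while math.gcd(r, self.n) != 1:
            r = random.randrange(1, self.n)
        return (pow(self.n + 1, m, self.n2) * pow(r, self.n, self.n2)) % self.n2

    def decrypt(self, c):
        l_value = (pow(c, self._lam, self.n2) - 1) // self.n
        return (l_value * self._mu) % self.n

    def add(self, c1, c2):
        """E(a) * E(b) mod n^2 decrypts to a + b (mod n)."""
        return (c1 * c2) % self.n2

    def scale(self, c, k):
        """E(a)^k mod n^2 decrypts to k * a (mod n)."""
        return pow(c, k, self.n2)

ph = ToyPaillier()
a, b = ph.encrypt(20), ph.encrypt(22)
total = ph.decrypt(ph.add(a, b))      # sum computed on ciphertexts
tripled = ph.decrypt(ph.scale(a, 3))  # scalar multiple computed on a ciphertext
```

These two operations are all a responder needs to evaluate a linear function of encrypted query values without ever decrypting them.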
Data Schema
{"date": "2016-02-20T23:29:05.000Z",
 "src_ip": "55.55.55.55",
 "event_type": "dns-hostname-query",
 "query_id": "9cef5344-3dee-41f9aa32da72d9f74778",
 "qtype": [1, 0],
 "dest_ip": "1.2.3.6",
 "ip": ["10.20.30.40", "10.20.30.60"],
 "qname": "a.b.c.com",
 "rcode": 0}
<schema>
  <schemaName>name of the schema</schemaName>
  <element>
    <name>element name</name>
    <type>class name or type name (if Java primitive type) of the element</type>
    <isArray>true or false -- whether or not the schema element is an array within the data</isArray>
    <partitioner>optional - Partitioner class for the element; defaults to primitive java type partitioner</partitioner>
  </element>
</schema>
Data Schema
<schema>
  <schemaName>awesomeDataSchema</schemaName>
  <element>
    <name>date</name>
    <type>string</type>
    <isArray>false</isArray>
    <partitioner>org.apache.pirk.schema.data.partitioner.PrimitiveTypePartitioner</partitioner>
  </element>
  <!-- …. Lots more elements …. -->
</schema>
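As a rough illustration of how a data schema in this format relates to a record, the sketch below parses a schema with Python's standard `xml.etree.ElementTree` and checks a JSON record against it. The helper names (`load_data_schema`, `check_record`) are hypothetical and are not part of Pirk; the schema lists only an assumed subset of the record's fields, and the `int` type name for `rcode` is an assumption.

```python
import json
import xml.etree.ElementTree as ET

# Illustrative subset of a data schema; element choices are assumptions.
SCHEMA_XML = """
<schema>
  <schemaName>awesomeDataSchema</schemaName>
  <element><name>date</name><type>string</type><isArray>false</isArray></element>
  <element><name>qname</name><type>string</type><isArray>false</isArray></element>
  <element><name>ip</name><type>string</type><isArray>true</isArray></element>
  <element><name>rcode</name><type>int</type><isArray>false</isArray></element>
</schema>
"""

RECORD = json.loads("""
{"date": "2016-02-20T23:29:05.000Z",
 "ip": ["10.20.30.40", "10.20.30.60"],
 "qname": "a.b.c.com",
 "rcode": 0}
""")

def load_data_schema(xml_text):
    """Return (schema name, {element name: {"type", "is_array"}})."""
    root = ET.fromstring(xml_text.strip())
    elements = {}
    for el in root.findall("element"):
        name = el.findtext("name").strip()
        is_array = el.findtext("isArray", default="false").strip().lower() == "true"
        elements[name] = {"type": el.findtext("type").strip(), "is_array": is_array}
    return root.findtext("schemaName").strip(), elements

def check_record(record, elements):
    """List schema elements that are missing or whose array-ness disagrees."""
    problems = []
    for name, spec in elements.items():
        if name not in record:
            problems.append(f"missing element: {name}")
        elif isinstance(record[name], list) != spec["is_array"]:
            problems.append(f"array mismatch: {name}")
    return problems

schema_name, schema_elements = load_data_schema(SCHEMA_XML)
problems = check_record(RECORD, schema_elements)
```

Here `problems` is empty because every listed element is present and `ip` is the only array, matching its `isArray` flag.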
Query Schema
<schema>
  <schemaName>myAwesomeQuerySchema</schemaName>
  <dataSchemaName>superAwesomeDataSchema</dataSchemaName>
  <selectorName>name of the element in the data schema that will be the selector</selectorName>
  <elements>
    <name>element name</name>
  </elements>
  <filterNames>
    <name>(optional) element name of element in the data schema to apply pre-processing filters</name>
  </filterNames>
  <additional>(optional) additional fields for the query schema, in &lt;key,value&gt; pairs
    <field>
      <key>key corresponding to the field</key>
      <value>value corresponding to the field</value>
    </field>
  </additional>
</schema>
Query Schema
<schema>
  <schemaName>myAwesomeQuerySchema</schemaName>
  <dataSchemaName>superAwesomeDataSchema</dataSchemaName>
  <selectorName>qname</selectorName>
  <elements>
    <name>src_ip</name>
    <name>dest_ip</name>
  </elements>
  <filterNames>
    <name>google.com</name>
  </filterNames>
</schema>
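To show which parts of a record this query schema refers to, the following sketch reads the selector and the returned elements from the XML and pulls them out of a plaintext record. It is only a plaintext view of the field selection: in Pirk the query itself is encrypted and the responder never sees the selector value being searched for. The function names (`load_query_schema`, `plaintext_view`) are hypothetical, and the `filterNames` block is left out of the sketch.

```python
import xml.etree.ElementTree as ET

QUERY_SCHEMA_XML = """
<schema>
  <schemaName>myAwesomeQuerySchema</schemaName>
  <dataSchemaName>superAwesomeDataSchema</dataSchemaName>
  <selectorName>qname</selectorName>
  <elements>
    <name>src_ip</name>
    <name>dest_ip</name>
  </elements>
</schema>
"""

# Subset of the sample record shown with the schema.
RECORD = {
    "src_ip": "55.55.55.55",
    "dest_ip": "1.2.3.6",
    "qname": "a.b.c.com",
    "rcode": 0,
}

def load_query_schema(xml_text):
    """Parse the selector and the elements to return from a query schema."""
    root = ET.fromstring(xml_text.strip())
    return {
        "name": root.findtext("schemaName").strip(),
        "data_schema": root.findtext("dataSchemaName").strip(),
        "selector": root.findtext("selectorName").strip(),
        "elements": [n.text.strip() for n in root.findall("elements/name")],
    }

def plaintext_view(record, query_schema):
    """Return (selector value, {element: value}) for one record."""
    selector_value = record[query_schema["selector"]]
    returned = {name: record[name] for name in query_schema["elements"]}
    return selector_value, returned

query_schema = load_query_schema(QUERY_SCHEMA_XML)
selector_value, returned = plaintext_view(RECORD, query_schema)
```

For the sample record, the selector is the `qname` value and the returned fields are `src_ip` and `dest_ip`; fields such as `rcode` that the query schema does not list are not part of the response.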
Algorithms & Implementations
Algorithms
  Wideskies with Paillier
Querier
  Standalone, Multi-threaded
Responder
  Standalone, Multi-threaded
  Distributed Batch
    MapReduce, Spark
    Data from HDFS, Elasticsearch
  Distributed Streaming
    Storm, Spark Streaming
    Data from Kafka
Roadmap
Implementation Roadmap
  Input Adaptors - NoSQL Databases: HBase, Accumulo; Kafka, NiFi
  Streaming - Storm and Heron, Spark Streaming, Flink
  Batch - Flink, Beam
Algorithmic Roadmap
  Secure Multiparty Computation, Private Set Intersection
  Fully Homomorphic Encryption
Always on the Roadmap
  Improvements/Optimizations to Existing Code
  Benchmarking
Get Involved
We ❤ Mathematicians and Computer Scientists
You don’t have to code to contribute!
Apache Pirk Website
  http://pirk.incubator.apache.org
Mailing Lists - Submit and Discuss Ideas/Issues
  Dev: [email protected]
  Commits: [email protected]
@ApachePirk
Thanks!
[email protected]
Wideskies Appendix