Data you can trust Technology that works for you

DATA61’s Future Science Vision v1.4
Robert C. Williamson, 2 November 2016
Preamble
Research is at the heart of Data61. Our research is undertaken with a purpose in mind – to create a positive data-driven future. This document outlines our vision[1] regarding what we aim to achieve by focusing our research on what the world needs in areas where we have world-leading capability.
Data61 plays two complementary roles in the Australian innovation system. We are “L-shaped” (see the schematic on the right):
1) We conduct market driven research (end-use driven projects) in a range of industry sectors; these contribute to the horizontal part of Data61’s mission – solving problems in other CSIRO business units (and leveraging their capability and connections) and the community more broadly.
2) We are the home to fundamental research advancing the science and technology of data (the vertical part of the picture).
These two parts mutually support each other[2]. Both are essential. The market component, by definition, is not for us to plan, but to adapt to in an agile manner. The scientific, technological and engineering research we propose to do is ours to plan and shape; that is what this document does.
The purpose of this document[3] is to focus our work on the vertical part of the L-shaped schematic. The document captures the bold and ambitious areas of science and technology we wish to advance[4]. It should be seen as a way of focusing what we do, and allowing us to say “yes” or “no” in a more informed fashion[5]. The goal is not simply to “put more wood behind fewer arrows” but rather to get most of the arrows pointing in one direction, and to describe the target they are aiming to hit – namely the four goals listed in the callout box. This will help shape our future capability investments.
Measuring the World
Improving the whole lifecycle of data capture, analysis and use.
Delivering Trustworthy Analytics
Changing the way analytics is delivered; guaranteeing trust in the entire process.
Building Software you can Trust
Creating technologies that allow the construction of software that can be trusted.
Shaping Societal Transformations
Developing better data technologies through improved understanding of their potential social impact.
Explicitly articulating the larger technical challenges is especially important for Data61 because it is often (mistakenly) believed that data and information technology research merely supports other sciences – a sort of glorified IT helpdesk. In fact, the contrary is arguably the case, with physics[6], chemistry[7], biology[8], social science[9], and economics[10] all having the science of data and information at their core, and information technology precepts, such as modularity, being essential for the understanding of many natural systems[11].
Ultimately, as recently witnessed by social science, any field immersed in a properly organised bath of data progressively becomes computationally based, or develops a computational subfield[12]. The science of information and data is arguably the most fundamental research topic of the century, situated not only at the centre of mathematical research[13], but underpinning the nature of randomness and complexity[14], and situated at the very core of all the mature sciences.
Context
Technologies for data are general purpose technologies[15] that will have a transformative impact on Australian society, although what those impacts will be is neither predictable nor pre-determined[16]. These technologies are often described as “artificial intelligence”[17] and include machine learning and big data analytics, automated reasoning, computer vision, natural language understanding, and robotics. Data61’s focus is on the advancement of technologies for data in a manner that provides national benefit (economic, social and environmental). Thus, a deep understanding of the context of technology use, the potential impacts they can have, and shaping what those impacts are, is a central part of our research vision.
Data61 lives inside an organisation dedicated to the discovery of scientific knowledge, knowledge distinguished by the high degree of trust one can place in it: trust in the conclusions; trust in the evidence that is derived from data; and, trust in the processes to revise the knowledge when it is found to be false. Science has always been data-driven and will remain so. We propose to exploit the scientific enterprise within CSIRO as a testbed for ideas that can, and will, have much broader impact.
General principles
The scientific vision is informed by the following five principles[18]:
• P1. Lead: Strive for a greater proportion of world leading research. We should focus our efforts on areas where we are, or realistically could be, world leading.
• P2. Multiply: Aim for multiplicative (compositional) effects rather than additive, else we cannot scale. This implies clever “platformisation” of our technology.
• P3. Unique: Do what only we can do, else let others do it[19].
• P4. Bold: Aim high. We really do want to change the world (through use-inspired fundamental research).
• P5. Antidisciplinary[20]: Data traverses existing discipline boundaries. We ignore disciplinary boundaries and follow the problems wherever they take us.
Headline Visions
Data61’s goal is to create our data-driven future – a future where technologies for data will play a positive role for society at large. New technologies provoke many reactions. Fear and uncertainty are common, with a belief that the precise forms of new technology are inevitable and not open to being shaped[21]. A counter to this is trust, which can be viewed as being at the core of all that we do. All of our work revolves around building trust in technologies for data: in automation; in security and privacy; that your software only does what it claims to do; that your personal identity is not stolen from you; and trust in all things that matter to people.
By saying “data you can trust” we do not mean that you trust it blindly, and especially we do not mean that you trust it raw – data needs to be processed and manipulated to be useful, and it is the processes of manipulation that need to be trusted. This involves both designing systems that do indeed facilitate trust in data, as well as building trustworthy technologies for doing things with the data. And in all of this “trust” itself is complex, multidimensional, and is always ultimately grounded in human needs and society[22].
We are using the apparently simple notion of “trust” metaphorically[23]. Without attempting to make a canonical definition of trust[24], we can say we have “trust” as the anchor, or point of departure, for much of what we propose to do, including:
• Trustworthy software – not software that you trust absolutely, but software in which you can have quantifiable degrees of trust for sound reasons
• Trust in data – not data you trust without cause, but data you can trust for your purpose because of the evidence provided regarding its management, provenance and what was done to it (analytics that has quantifiable effect)
• Trust in systems – trust that you know to what degree you can rely on data-centric systems, including communications, not that you trust it absolutely
• Trust in data technology enabled socio-technical systems – trust that these systems will benefit you and that any harms are manifest and controlled.
Understanding the complex interface between data, its management, manipulation and processing, and the impacts it can have on people is central to building trust around data and technologies for data. Trust in data (and its associated processes) can also underpin trust in institutions, interventions and policies.
The means of manipulating and processing data are data technologies. When we say “technologies that work for you” we mean they do what they are supposed to do, they don’t do anything else, and they are usable and useful (and implicitly we recognise the importance of who the “you” is – technologies that help one group can harm others).
While these sentiments might be taken for granted, history shows they are often absent, and improving the degree to which the technologies we develop achieve these goals helps to shape what we do. Examples are: the construction of software that has an adequately high guarantee of securely doing only what it is supposed to do; or, statistical machine learning methods you trust because of mathematical theories that provide adequate guarantees regarding their behaviour and uncertainty.
Both these examples illustrate the necessity for deep scientific and mathematical knowledge as well as a quantitative notion of performance. This scientific depth differentiates what Data61 does from much of the data technology in the wider world.
The headline visions and scientific challenges serve as a rallying point for not only the scientific research we do, but also the shorter term end-use driven projects delivered by our engineering team. Ideally the majority of such projects, in addition to delivering on customer expectations, will further the goals below.
H1. Measuring the World[25]
Thus is by geometrye mesured alle thingis
– William Caxton, Myrrour of the Worlde (1481)
The world becomes better understood, and thus interventions are more effective and acceptable, through the development of methods for data capture and model building that put trust at the center.
Background: Humans try to improve the world, but often fail. Their interventions don’t work, or have unintended consequences. One reason for these failures is poor models of the world – it is different from what we expect. By measuring the world (i.e. capturing data about the world), one learns more about the world and thus interventions can be better designed. This is the vision of empirical science. We propose to improve how data is captured and used to advance our understanding of the world.
The world is full of data, but only a small fraction is known to us. Rather than being given to us (“data” comes from the Latin dare meaning “to give”), it is necessary to take the data – to actively select and gather it, and then, of course, to do something with it. It is thus useful to distinguish data from capta[26] (from the Latin capere meaning “to take, seize, obtain, get, enjoy or reap”[27]). This terminology signals that data collection is an active process, not passive.
Data is traditionally seen as the lowest level of a hierarchy that runs from data to information to knowledge to wisdom[28]. Implicit in this is that in order to attain knowledge (or wisdom) one needs to start with data. While clearly true at one level, this does not capture Data61’s perspective, which inverts the hierarchy[29] and has knowledge (or the decision, action or intervention required for a particular problem) as the end point, thus focussing the needs of data collection and analytics from the reverse perspective. Data becomes useful once it is both captured (capta) and then made sense of through models. The models can also provide guidance regarding desirable capta.
Models and modelling are central to making use of capta. Much of the work that Data61 does is modelling based on capta. The distinction between models and data or capta is blurred[30]; abstractly a model is always a function of the capta – whether it has a small number of “parameters” or not is irrelevant – what matters is the stability of the model (or more precisely, the stability and reliability of the conclusions drawn, and actions taken from the model) under data variations.
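
One way to make "stability under data variations" concrete is to refit a model on resampled versions of the capta and look at how much its conclusion moves. The sketch below uses a plain bootstrap for this; the bootstrap is our choice of illustration, not a method the document prescribes, and the function name `bootstrap_stability` and the sample readings are hypothetical.

```python
import random
import statistics


def bootstrap_stability(capta, model, n_resamples=200, seed=0):
    """Refit `model` on bootstrap resamples of `capta`.

    `model` is any function from a list of observations to a single number
    (the "conclusion"). Returns the mean and standard deviation of the
    conclusions across resamples; a small spread suggests the conclusion
    is stable under this kind of data variation.
    """
    rng = random.Random(seed)  # fixed seed so the analysis is reproducible
    conclusions = []
    for _ in range(n_resamples):
        # Draw a resample of the same size, with replacement.
        resample = [rng.choice(capta) for _ in capta]
        conclusions.append(model(resample))
    return statistics.mean(conclusions), statistics.stdev(conclusions)


# Hypothetical sensor readings; the "model" here is simply the sample mean.
readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
centre, spread = bootstrap_stability(readings, statistics.mean)
```

Whether the model has few parameters or many does not matter to this check: it treats the model purely as a function of the capta, which is the view taken in the paragraph above.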
The important point is that it is the models that are ultimately manipulated and used for action. While much is made of a “fourth paradigm”[31] (so called “data-driven science”) and “the unreasonable effectiveness of data”[32], the fact remains that all data-driven intervention remains based upon models; they are just more complex than the models of old.
We thus embrace the “primacy of method”[33] or a “method deluge” (with methods as “first class citizens”[34]) over a mere “data deluge”, and certainly do not envisage “making the scientific method obsolete”[35]. For science, data alone (however it is linked or presented) is not enough[36]. Neither data nor facts are ever entirely raw – they are constructed and theory-laden[37]. It is indeed true that “‘Raw data’ is both an oxymoron and a bad idea”[38].
Some of the greatest contributions to the recent explosion of interest in data-driven everything come from new methods[39] with refined notions of trust (better quantification of errors). The blurred boundary between “data” and “method” drives how methods (analysis) are being pushed towards the data (embedded analytics[40]), as well as the propagation of all aspects of the data (such as its provenance) through the entire modelling process, in order to better inform interventions.
The real promise of a data-driven society is that it is an “experimenting society”[41] that allows decisions, actions or interventions to be closely tied to capta.
We will develop new methods for achieving this universal “captafication”[42] of the physical world, the biological world and the social world:
• From modelling of materials and biological organisms at the molecular and macro level to the design of new materials and food
• From sensors measuring anything through to trusted data from those sensors and the associated trusted interventions and policy
• From all the geospatial data in the country to the rich set of services that can exploit this information
• From people’s identity and reputation to systems that can guarantee the security, privacy and fairness of using this information
• From the captafication of the law and public policy to make the machinery of government transparent to the user to the very development of new policy in a trustworthy evidence driven manner, and
• From transforming how science is done (tracking data and evidence and the analytical conclusions drawn) to the empiricisation of business (doing proper experiments aided by technologies for data).
Our vision is that by developing new and better methods we will be able to better model the world, and thus act better. Central to this is the notion of trust:
• Trust in the source of the data (collected the right capta) and that it was reliably captured, transmitted and not tampered with (else skeptics will challenge the result, or worse, wrong actions will be taken)
• Trust in the models underpinning the capture of the data (such models always leave something out – how does one know if the omissions do harm?)
• Trust in the methods used for analysis (that it is known what the methods actually do from a user’s perspective and that the posterior uncertainty is properly calibrated)
• Trust in how the capta and conclusions are presented and used (if one ignores this human element, then the best methods can still lead to terrible outcomes), and
• Trust that legal and moral rights and notions of fairness are not infringed (else society will disdain the power of data analytics because of concerns regarding its abuse).
H2. Trustworthy Analytics Delivered[43]
New methods for data analytics that offer high degrees of trust, and new methods of delivering these trustworthy methods will increase their use, reduce economic friction and speed up the process from invention to deployment. This will accelerate scientific discovery and business improvement, and improve public policy outcomes.
The impact of new technologies comes from their use. We will change the way analytics is delivered to broaden its use. We will build trust into the core of how we create and deliver analytics technologies: from the mathematical foundations of trust in data-driven conclusions and the quantification of certainty; to embedded analytics at the source of data capture; and, to web services that allow the flexible composition of analysis methods in a reproducible and scalable manner, and which build in key elements of trust from the outset (provenance and traceability, management of legal and moral rights, and management and preservation of uncertainty)[44].
Background: “Data Analytics” means the computational processing of capta with the goal being to derive insights suitable for comprehension, decision or action. It includes mathematical or algorithmic methods as well as visualisation and presentation of the results in a manner suitable for human consumption. Analytics is not only used by a (human) statistician; many socio-technical systems have analytics embedded into their core operation, and all the points made below apply there too.
Presently analytics is implemented primarily in a manner that makes its composition (gluing together components) difficult.
The current model leads to various problems:
• Vendors of large software packages have an interest in locking in customers to their platform (so there is relatively little incentive to enable composability with other systems)
• Many of the implementations presume the capta is all in one place (either local or in a cloud). Much capta cannot be moved. It might be too large (the analysis has to be actually done at the source), or there is not the legal right to move it
• Provenance, traceability, legal and moral rights and uncertainty are poorly managed, resulting in outputs of analytics that lose sight of the reliability and trustworthiness of the original data (and thus the results are less trustworthy)
• It is difficult to redo analyses when mistakes are discovered (a consequence of the point above). Often not all of the “state information” is stored to enable the re-running of analyses
• Closed ecosystems make it hard to import new techniques as they are invented.
There are potential solutions to all of these problems, all of which we envisage developing:
• By embedding analytics at the source of the data, the burden of moving large amounts of data is removed. Being able to reach all the way back to the original data source (typically embedded in a cyber-physical system) through composable data ingestion schemes allows better tracking of provenance
• If systems deliver analytics as a RESTful web service, it becomes more readily composable. This can remove the downside (lock-in) of proprietary systems
• By taking the computation to the data (in data centers for example), we can avoid the problem of not being able to move the data (for reasons of scale or jurisdictional constraints). This necessitates advances not only in the secure encapsulation of analytics code, but also in trusted means to control information flow (so private information is not exfiltrated from the capta bases)
• The ultimate delivery involves presentation to users. By improving the user experience of data analytics it will be more widely and reliably used. This requires development of visualisation as a service that represents uncertainty and provenance as first class objects
• Composable provenance of data (including legal rights such as licenses) and analytics across walled gardens allows increased trust, reliability and repeatability of analytics
• Systems that are designed to federate data from different sources can bypass jurisdictional and practical problems of extracting insights from distributed capta
• Late binding schemas or ontologies minimise the deleterious effects of past decisions regarding data categorisation and organisation
• Systems that capture and re-execute entire workflows to facilitate late-binding, rapid prototyping and the automation of translation from exploratory to production systems
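
As an illustration of treating uncertainty and provenance as first class objects that survive composition, the following is a minimal sketch in Python. Everything in it is an assumption made for the example: the `Capta` record, the helper names `source`, `scale` and `add`, and the choice of one-sigma Gaussian error propagation with independent errors. It is not a description of any Data61 system.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Capta:
    """A value that carries its uncertainty and its provenance with it."""
    value: float
    std: float               # one-sigma uncertainty (assumed Gaussian)
    provenance: tuple = ()   # ordered record of sources and analysis steps


def source(name, value, std):
    """Wrap a raw measurement, recording where it came from."""
    return Capta(value, std, (f"source:{name}",))


def scale(c, factor):
    """Multiply by a constant; the uncertainty scales by |factor|."""
    return Capta(c.value * factor, abs(factor) * c.std,
                 c.provenance + (f"scale:{factor}",))


def add(a, b):
    """Add two quantities, assuming their errors are independent."""
    return Capta(a.value + b.value, math.hypot(a.std, b.std),
                 a.provenance + b.provenance + ("add",))


# Composing steps keeps both the error bar and the audit trail intact.
a = source("sensor_a", 3.0, 0.3)
b = source("sensor_b", 4.0, 0.4)
total = add(a, scale(b, 2.0))
```

Because every step returns a new immutable record, a result can be traced back to its sources, and a workflow can be re-executed step by step when a mistake is found upstream.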
The creation of technologies as above will not only accelerate the use of data analytics for its own sake, but will play a central role in our vision for cyber-security – securing data-driven business operations through ensuring trustworthiness in the data. This is especially important for critical infrastructure protection[45].
H3. Building Software[46] you can Trust
We will develop new ways of creating software that will be the global benchmark in terms of quality, security and trust. Widespread adoption will make software companies more productive, improve cybersecurity (by addressing the root cause of one of the main problems) and enable higher degrees of trust in data-centric systems.
Technologies for data are underpinned by software, which is the means by which data is processed and transformed. Building better technologies requires building better software. We will develop the science and technology stacks to build software that provably does what it is supposed to do and nothing else – we will be able to say precisely and with strong evidence when software will be bug-free, provably secure, and will deliver guaranteed results. This will address one of the major causes of problems in cyber-security (vulnerabilities that are introduced when software does more than, or other than what it is supposed to do). We will also develop better methods to quantify risks associated with software and understand the human factors that contribute to trustworthy software.
In addition to increasing the reliability of software against attacks that cause it to do things other than what it should, the same technologies can be used to provide improved guarantees for the trustworthiness of data, whether it is that the data has not been manipulated, or that sensitive information has not been exfiltrated. Thus improving the trustworthiness of software is not only essential for making technologies that work for you, but also for ensuring that you can trust data and entrust your data to such technological systems.
H4. Shaping Societal Transformations
Technology … is not destiny[47]
– Jason Furman, July 2016
Technologies shape society, and technologies for data will shape the future of Australian society, but there is the opportunity to choose what these effects are. By developing better understandings of the complex relationships between data technology and people, we will be able to influence the development and use of technologies for data to lead to better societal outcomes. The research necessary to attain this understanding can (and needs to) be done in concert with the more narrowly technical aspects of our work.
New technologies for data will transform society, but there is much freedom regarding how. Our interest in technology does not stop with the technology itself, but extends to its use. Technologies such as UAVs and autonomous vehicles will obviously shape society, and their use will be shaped by what society finds acceptable. Collectively, as technologists and scientists, we cannot ignore the societal implications of our work. The same basic technological principles can be used in many different ways, some of which are more usable, helpful and beneficial to people than others. We will develop new ways of envisaging and influencing these societal transformations.
This will involve new approaches to the ethnography of technology (better understanding people’s relationship with data-driven technology, especially in terms of trust) and deriving technological foresights. This goal aligns with strategy 2 of the recently released US National Artificial Intelligence Research and Development Strategic Plan[48]: “Develop effective methods for human-AI collaboration. Rather than replace humans, most AI systems will collaborate with humans to achieve optimal performance. Research is needed to create effective interactions between humans and AI systems.”
We will reimagine what it means to be human in a data-driven world. We will develop new technologies for ensuring rich notions of privacy and transparency in a data-driven and algorithmic world. We will develop new understandings of the complex technical tradeoffs between usability, security, privacy, efficiency and fairness. We will study how to build data-driven societal institutions that citizens can trust. We will design new computational mechanisms to enhance social welfare, enabled by pervasive technologies for data.
We will develop new methodologies that exploit data-technologies to better understand how data-technologies themselves end up being used (including the derivation of qualitative insights from quantitative data). This will extend the reach of user-experience design to new areas, and advance its state of the art. And we will develop new economic and business models enabled by data-technologies in a manner that seeks to maximise benefit for Australia as a whole.
Scientific Challenges and Foci
Theories are nets: only he who casts will catch.
– Novalis
In this section are listed some scientific[49] challenges arising from the above visions. These are not all the scientific challenges we will try to solve, but they capture much of what we aim to do. In all cases the timeline is roughly 5-10 years. While each of these challenges is motivated and inspired by broader societal challenges, the particular impacts one can expect of scientific advances are notoriously difficult to predict on such a timescale (impact can be predicted more reliably for shorter term projects). Thus, apart from some rather general statements, there is no specific prediction of impact arising from the scientific challenges.
I have tried to state a high level challenge (in red) followed by some explication. It would be impossible to outline all the possibilities, and those listed are not meant to be too prescriptive.
In all cases they are stated as “How to…”. This is both a scientific challenge (development of new knowledge and understanding) as well as a technological one (development of techniques and methods and systems that achieve the goal).
Areas of Scientific Challenge
• Materials and Data
• Physical/Biological Systems and Data
• Institutions and Data
• Trustworthy Software Construction
• Architecture for composability, compartmentalisation and resilience
• Distributed Trust Mechanisms
• Analysing, Representing and Modelling Data
• Quantification of and reasoning with risk and uncertainty
S1. Materials and Data
How to turn materials into data so they can be manipulated and designed?
To work with materials (so they can be synthesised, manipulated and changed) one needs to understand them and trust that understanding (modelling and synthesis). Materials are not systems (for the purpose of this document). The question applies to both non-organic and organic materials (including for example food).
How to design materials in a data-driven manner – from quantum Monte Carlo (for engineering materials) through to food designed in response to genetic information?
S2. Physical/biological systems and data
How to embed data into physical systems; understand physical systems through data-driven models; and design, build and control physical systems by using data?
This includes challenges in robotics and sensor networks and in the processing of visual data – how to embed trusted analytics into physical, biological and environmental systems. How to use data to increase trust in data-centric systems (such as the internet of things), for example by better management of privacy? How to better model physical systems using data (or more precisely, how to improve that modelling, which is the core business of all scientists, using modern technologies for data)?
How to control physical systems with data in a manner that you can trust? How to turn physical or biological objects (e.g. scientific specimens, or aspects of living systems) into data cheaply and at scale in a manner that can be trusted? How to map the world more reliably (using spatial data as a testbed for analytics pipelines)? How to build autonomous systems for data gathering in the field? How to manage the ingestion of semi-structured sensor data? How to manage the provenance of data gathered in the world?
S3. Institutions and Data
How to represent, augment, understand, manage and control institutions better using data?
I use “institutions” in the economist’s sense[50] which includes government, the legal system (statute law, regulation), business processes, and contracts, etc. The challenge is to represent these societal systems using data that can be processed and reasoned with by a machine. Solving this involves advancing the state of the art of natural language processing (e.g. targeted at specialised uses of English, as in statute law and contracts) and the development of tools that allow the crafting of legal instruments in a manner similar to a modern programming development environment that will guarantee properties such as consistency, but will also emit human readable versions of the instruments.
Another challenge is how to use technologies for data to improve institutions, for example by data-driven experimentation for policy development[51]. Part of the solution is likely to be aiding the change of role of government from owner of assets, or deliverer of retail services to wholesaler and architect of modular systems.
S4. Trustworthy software construction
How to construct software that does what it is supposed to and nothing else?
How to make technologies that construct software that guarantees its correctness, invulnerability and other properties (e.g. real-time guarantees)? One can ask similar questions regarding interaction and communication protocols. Particular challenges include: mixed-criticality, real-time, multicore, side-channels; information flow; concurrent systems verification; protocol verification (as a means to deal with composition and break the back of concurrency); automation of proof effort. How to specify and quantify dimensions of security (turning it from a binary property to a real-valued property you can reason about from a risk sensitive perspective)? How to ensure trustworthiness of mobile code (especially for analytics)?
S5. Architecture for composability, compartmentalisation and resilience
How to build data-centric systems that can be reliably composed and compartmentalised and which are resilient, robust and trustworthy?
Data-centric systems are the most complex artefacts designed by man. The challenge is to design them (including cyber-physical and cyber-societal) in a manner that facilitates composition, compartmentalisation and resilience. This is necessary in order to improve the reliability and trustworthiness of such systems.
This challenge is architectural (including questions such as how to compose trust – just because you have trusted components does not guarantee their composition can be trusted) but includes questions such as how to monitor and manage such large systems (supervisory control and diagnostics). Examples that are worthy of attack include how to architect large distributed data analytics systems. How can trust in such systems be quantified, measured and managed?
S6. Distributed trust mechanisms
How to manage trust in distributed data-centric systems?
Trust underpins human interaction, and thus data-technologies that mediate such interactions must manage trust. The challenges include how to ensure trustworthy provenance of data and operations on data (provenance is a kind of dual to security: provenance tells you reliably where the data came from and who did what to it; data security reliably ensures where the data can go and who can do what with it). Thus we will study both provenance and security together. This needs to be done in a risk sensitive manner (see S8). How to build richer, better and more applicable distributed ledgers and allied technologies? How to understand and quantify their security and reliability? How to build social choice mechanisms that can be trusted? How to build the communications technology that underpins distributed trust?
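
To give a flavour of what "trustworthy provenance" can mean mechanically, here is a minimal sketch of a hash-chained provenance log, the basic building block behind many ledger designs. It is an illustration under our own assumptions (the record fields, the function names `append_entry` and `verify`, and the use of SHA-256 over canonical JSON are all hypothetical choices). On its own it only detects edits to an existing log; a real system would also need signatures or a trusted anchor for the latest hash, since anyone able to rewrite the whole chain could recompute it.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def _digest(actor, operation, prev):
    """Hash the entry body in a canonical (sorted-key) JSON encoding."""
    body = {"actor": actor, "operation": operation, "prev": prev}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(log, actor, operation):
    """Return a new log with one more entry linked to the previous one."""
    prev = log[-1]["hash"] if log else GENESIS
    entry = {"actor": actor, "operation": operation, "prev": prev,
             "hash": _digest(actor, operation, prev)}
    return log + [entry]


def verify(log):
    """Check that every entry hashes correctly and links to its predecessor."""
    prev = GENESIS
    for entry in log:
        expected = _digest(entry["actor"], entry["operation"], entry["prev"])
        if entry["prev"] != prev or entry["hash"] != expected:
            return False
        prev = entry["hash"]
    return True


# Who did what to the data, in order.
log = append_entry([], "sensor-17", "capture")
log = append_entry(log, "analyst", "remove outliers")
```

Calling `verify(log)` on the untouched log succeeds, while changing any recorded actor or operation without recomputing the chain makes it fail.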
S7. Analysing, Representing and Modelling data
How to derive insight from data that can inform action?
How do you make sense of data? How to make sense of all the methods that do so? How to build models that are usable and re-usable? How to exploit complex, structured data with all of the mess of the world in the way? How to model complex phenomena (ecologies, language, societies) using data? How to make such models trusted and reliable and composable? How to best communicate such models to people for action? How to act and decide upon models of data? How to manipulate data representations of the world? Tools for managing multiple representations of data and manipulating them (music, law, biology). How to exploit computational and algorithmic advances to build better technologies for data analysis?
This all needs to be done in the context of the structure of data; data is not merely a string of bits. Many of the types of data that will have the largest impacts are highly structured (natural languages, video, social networks, etc). Advancing the stated goal with respect to these data types requires deep science and technology stacks (that can be used across diverse application domains).
S8. Quantification of and reasoning with risk and uncertainty
How to quantitatively represent the rich sources of risk and uncertainty represented by data, and how to reliably reason with this?
Whilst data can sometimes reduce uncertainty, it does not remove it; decisions still need to be made in the face of uncertainty. Furthermore, the increasing complexity of data-driven systems means that the management of partial information, uncertainty and ambiguity is essential. How can this be done in a risk-sensitive manner? How can all aspects of data technology be made resilient to uncertainty? How can different notions of uncertainty be combined (relative to the inference or decision task at hand), and how can it be reasoned with in an effective manner? How can uncertainty and risk be effectively communicated and visualised? How can legal rights, security and privacy be made risk sensitive?
S9. Fundamental limits of data
How to determine the limits of what can be done with technologies for data?
All technologies for data have limits. How can these be determined and catalogued? And how can we approach these limits? Without knowing what the fundamental limits are it is not possible to know when a technology may break down and where to put effort to prevent this from happening.
This challenge cuts across everything we do, is a fundamental differentiator, and provides credibility for our status as part of a scientific research organisation. It also sets a target for other, less “fundamental”, work by setting a gold standard to approach.
Challenges include what is possible with data analytics, optimisation, distributed trust mechanisms, and indeed all data technologies we examine. Challenges include characterising the difficulty of learning from data, inferring causality, dealing with noise, protecting privacy, transmitting and sharing data, and solving computational problems.
There are limits in terms of data, knowledge, computation, energy, time and space. As well as limits to technical components, there are also limits (which need to be determined) to composite systems (such as trust, stability, and ability to control). There are also limits to socio-technical systems built with data technologies (for example computational social choice, limits to “fairness” and other synthetic properties) and limits arising from human abilities or inabilities.
S10. Shaping data-driven society
How to understand what it means to be human in a data-driven world?
What does it mean to be human in a data-driven world? How can our humanity be enhanced by data-driven technologies; how can we prevent harm? How can we build data-technologies that are meaningful and valuable to society at large? How can we encourage and assist communities in their adoption of technologies for data to improve their lives?
Solving this challenge will require the development of new ethnographic methods for data-centric technologies. It will also require ongoing research on how people interact with data-technologies from the perspective of decision theory (social choice, bounded rationality, etc.).
Such new methods will enable us to attack challenges such as how to design data-technologies that better protect usability, privacy, security and confidentiality. They could also provide scientific underpinnings for the practice of UX design.
Impacts
Data61’s L-shaped model (see page 1) means that our impacts are the product of our scientific capabilities with market forces and opportunities. These impacts are managed through our business development and product management processes. A given scientific capability can deliver impact in many end-use problems[52]; a given market need can be satisfied by many different scientific capabilities[53] – see the schematic to the right.
[Schematic: Scientific Capabilities and Market Driven Projects]
The science driven challenges are our view of where technology needs to move. The end-use projects we do will largely be driven by the market’s view of this. It will be primarily through these projects that the science will have its larger impact. This impact can be categorized in many overlapping ways. Three are given below:
General categories:
• Improvement in the efficiency of Australian businesses
• Improvement in the efficiency of Australian governments
• Improved reliability, safety and security of data-technologies
• Generation of new industries, especially platform centric ones
• Improvement in the speed and effectiveness of scientific discovery.
Data61 market focus categories (in partnership with other BUs where possible):
• Safety and Security
• Health & Communities
• Future Cities
• IoT / Industrial Internet
• Agri-business
• Spatial Intelligence
• Data-driven Government
• Enterprise Services + Fintech
• Defence
Whole of CSIRO categories54
• Food security and quality
• Clean energy and resources
• Health and wellbeing
• Conservation and use of our natural environment
• Innovative industries
• A safer Australia
Data61’s research in support of the scientific vision of the present document will support projects in these
impact areas, and will thus find pathways to impact through them. Individual projects are responsible for
analysing, shaping and articulating what those pathways and impacts will be. This needs to be done in an
agile manner, adapting to opportunities, but building upon our focused scientific capability.
Endnotes
1
It is deliberately called a “vision”, and not (metaphorically) a “roadmap” – a roadmap is a two-dimensional graphical
representation of something that already exists (roads), and is rarely something inspiring and exciting; at best a
“science/technology roadmap” is a visual depiction of the expected temporal evolution of a technological product
family (Ronald N. Kostoff and Robert R. Schaller, Science and Technology Roadmaps, IEEE Transactions on Engineering
Management, 48(2), 132-143 (2001); Lianne Simonse, Jan Buijs & Erik Jan Hultink, Roadmap grounded as ‘visual
portray’: Reflecting on an artifact and metaphor, Helsinki EGOS 2012 Sub-theme 09: (SWG) Artifacts in Art, Design, and
Organization (2012)) which suffers by being constrained to a two-dimensional visual form. Conversely, a “vision” can be
of something that does not exist, and can inspire and excite and is not constrained to fit any particular format. It tells
where we want to go, and outlines in broad strokes how we might get there, without actually pinning the exact path
down. It is a science vision in the general sense of the word “science” – systematised knowledge; see endnote 4. We
expect to develop more traditional technology roadmaps (i.e. temporally linear expectations and plans) for particular
product and service offerings which we develop.
2
At different times in computing’s evolution, either the demand (market) or the technology push side has been
dominant; but it is never just one or the other; see Jan van den Ende and Wilfred Dolfsma, Technology push, demand
pull and the shaping of technological paradigms – Patterns in the development of computing technology, Journal of
Evolutionary Economics 15, 83-99 (2005). The reality is, of course, complex, and recombination (the mixing up of
different ideas) plays an essential part (Cristiano Antonelli, Jackie Krafft, Francesco Quatraro, Recombinant Knowledge
and Growth: The Case of ICTs, Structural Change and Economic Dynamics, Elsevier, 21(1), 50-69 (2010)) and the
“demand-pull” model seems to be losing favor as a satisfactory explanation (Benoit Godin and Joseph P. Lane, “Pushes
and Pulls”: The Hi(story) of the Demand Pull Model of Innovation, Project on the Intellectual History of Innovation,
working paper No 13 (2013); Benoit Godin, Innovation Contested: The Idea of Innovation over the Centuries, Routledge
(2015)).
3
The document has multiple intended audiences:
• DATA61 talent (existing and potential future) – to align what we do, to help us say “no” to opportunities that
do not align, and to achieve large impact multiplicatively.
• Rest of CSIRO and external partners – to articulate our own longer term research goals to serve as one of the
filters we will apply in considering engaging in joint projects.
• Wider public – to explain what we do.
4
It would be unfortunate, and unhelpful, to get hung up on the distinction between science, engineering and
technology. This document presents an aspiration for the new knowledge we will create – novum scientia. While
engineering knowledge is different from scientific knowledge (Walter G. Vincenti, What Engineers Know and How They
Know It: Analytical Studies from Aeronautical History, The Johns Hopkins University Press (1990)) and technology is
more than mere scientific knowledge (W. Brian Arthur, The Nature of Technology: What it is and How it Evolves, Simon
and Schuster (2009)), the essence of engineering research (the improvement of technology) remains the production of
new knowledge (Edwin T. Layton Jr, Technology as Knowledge, Technology and Culture 15(1), 31-41 (January 1974)).
The research Data61 does spans all of these headings, and more, such as “design-driven innovation” – the phrase is
from Roberto Verganti’s book Design-Driven Innovation: Changing the Rules of Competition by Radically Innovating
What Things Mean, Harvard Business Press (2009) – new business models, and ethnographic approaches to data
technologies.
We should aspire to seek new knowledge (motivated by real problems and the desire to improve our current
technologies) wherever it takes us, in the spirit of the great researchers of the past (Lisa Jardine, Ingenious Pursuits:
Building the Scientific Revolution, Little Brown, London, 1999; Jenny Uglow, The Lunar Men: The Friends Who Made the
Future, Faber and Faber 2002). Our inspirations and role models should be polymaths such as Robert Hooke (Lisa
Jardine, The Curious Life of Robert Hooke: The Man who Measured London, HarperCollins (2003); Stephen Inwood,
The Man Who Knew Too Much: The Strange and Inventive Life of Robert Hooke 1635-1703, MacMillan (2002); Robert
D. Purrington, The First Professional Scientist: Robert Hooke and the Royal Society of London, Birkhauser (2009); Jim
Bennett, Michael Cooper, Michael Hunter and Lisa Jardine, London’s Leonardo – The Life and Work of Robert Hooke,
Oxford University Press (2003)) or Charles Babbage (Laura J. Snyder, The Philosophical Breakfast Club: Four
Remarkable Friends who Transformed Science and Changed the World, Broadway Books (2011)) both of whom freely
moved between science and technology.
As noted long ago (Robert P. Multhauf, The Scientist and the “Improver” of Technology, Technology and Culture 1(1),
38-47 (1959)), there is no perfect word for the improver of technology: “engineer” is widely used, but it still primarily
refers to the expert practitioner and not necessarily the improver. Perhaps we, as improvers of technologies for data,
should not worry whether what we do is adequately described as “science”, “engineering” or anything else, and just
refer to ourselves by Hilary Cinis’ elegant neologism: “datanauts”.
5
It is common that vision statements become all-encompassing, excluding nothing. That the present vision does not
aim to cover everything can be tested by comparing it to the substantially broader set of goals in Future Science –
Computer Science: Meeting the Scale Challenge, Australian Academy of Science (2013), or President’s Council of
Advisors on Science and Technology, Report to the President and Congress. Designing a Digital Future: Federally
Funded Research and Development in Networking and Information Technology, Executive Office of the President
(December 2010).
6
See John Archibald Wheeler, Information, Physics, Quantum: The Search for Links, in Proceedings of the 3rd
International Symposium on the Foundations of Quantum Mechanics, Tokyo, (1989); Hector Zenil (Ed.), A computable
universe: understanding and exploring nature as computation, World Scientific (2013); Rolf Landauer, Uncertainty
principle and minimal energy dissipation in the computer, International Journal of Theoretical Physics 21(3/4),
283-297, (1982); Rolf Landauer, The physical nature of information, Physics Letters A, 217, 188-193 (1996); Antoine Berut
et al., Experimental verification of Landauer’s principle linking information and thermodynamics, Nature 483, 187-190,
(8 March 2012); Juan M. R. Parrondo, Jordan M. Horowitz and Takahiro Sagawa, Thermodynamics of Information,
Nature Physics, 11, 131-139, (February 2015); Gilles Brassard, Is information the key? Nature Physics 1, 2-4, (October
2005).
7
Jean-Marie Lehn, Perspectives in Supramolecular Chemistry – From Molecular Recognition towards Molecular
Information Processing and Self-Organization, Angewandte Chemie International Edition in English, 29(11),
1304-1319, (November 1990); Jean-Marie Lehn, Supramolecular chemistry – scope and perspectives – molecules –
supermolecules – molecular devices, Nobel Prize Lecture, (8 December 1987).
8
John Maynard Smith, The concept of information in biology, Philosophy of Science 67(2), 177-194 (2000); confer
Ladislav Kovac, Information and knowledge in biology: time for reappraisal, Plant Signaling & Behavior 2(2), 65-73
(2007).
9
David Easley and Jon Kleinberg, Networks, crowds and markets: reasoning about a highly connected world,
Cambridge University Press (2010).
10
Friedrich A. Hayek, The use of knowledge in society, The American Economic Review, 35(4), 519-530 (1945); George
J. Stigler, The Economics of Information, The Journal of Political Economy 69(3), 213-225 (1961); Joseph E. Stiglitz,
Information and the change in the paradigm in economics, Nobel Prize Lecture (8 December 2001).
11
Werner Callebaut and Diego Rasskin-Gutman, Modularity: Understanding the development and evolution of natural
complex systems, MIT Press, (2005); Jeff Clune, Jean-Baptiste Mouret and Hod Lipson, The evolutionary origins of
modularity, Proceedings of the Royal Society (series B), 280, 20122863 (2013).
12
David Lazer, Alex Pentland, Lada Adamic, Sinan Aral, Albert-Laszlo Barabasi, Devon Brewer, Nicholas Christakis,
Noshir Contractor, James Fowler, Myron Gutmann, Tony Jebara, Gary King, Michael Macy, Deb Roy and Marshall Van
Alstyne, Computational Social Science, Science 323, 721-723 (2009).
13
Committee on the Mathematical Sciences in 2025, Board on Mathematical Sciences and Their Applications, Division
on Engineering and Physical Sciences, National Research Council of the National Academies, The Mathematical
Sciences in 2025, The National Academies Press, (2013).
14
Cristian S. Calude (Ed), Randomness and Complexity: From Leibniz to Chaitin, World Scientific, (2007).
15
Richard G. Lipsey, Kenneth I. Carlaw and Clifford T. Bekar, Economic Transformations: General Purpose Technologies
and Long-Term Economic Growth, Oxford University Press (2005).
16
Robert C. Williamson, Michelle Nic Raghnaill, Kirsty Douglas and Dana Sanchez, Technology and Australia’s future:
New technologies and their role in Australia’s security, cultural, democratic, social and economic systems, Australian
Council of Learned Academies, September 2015.
17
National Science and Technology Council, Networking and Information Technology Research and Development
Subcommittee, The National Artificial Intelligence Research and Development Strategic Plan, (October 2016).
18
These complement other broader principles underpinning everything we do, such as national benefit; see the
Data61 operating model document.
19
“We” here refers to the broader Data61+ network. This principle implies avoiding NIH (Not Invented Here)
syndrome; we do not need to invent everything ourselves. We should focus on the things that we, and we alone, can
do; and then network with others in a rich and complex manner. It would be supremely ironic if our organisation that
underpins the information society does not embrace all of its implications (Manuel Castells, The Rise of the Network
Society (2nd Edition), Wiley-Blackwell (2010)).
20
The word is pinched from a suitably inspiring institution: The MIT Media Lab, which so describes itself
(https://www.media.mit.edu/about). The principle, of course, implies much collaboration with other disciplines, but
goes beyond the traditional “multi-disciplinary” to a stronger problem-oriented perspective – “There are no subject
matters; no branches of learning – or, rather, of inquiry: only problems and the urge to solve them. A science such as
botany or chemistry … is, I contend, merely an administrative unit” (Karl Popper, Realism and the Aim of Science,
Rowman and Littlefield (1983)). Such a stance implies widespread collaboration without fear of crossing boundaries. It
does not imply a lack of “canon” or core; our canon is primarily that of cybernetics broadly construed.
21
This viewpoint is given the fancy name of “technological determinism” with the concomitant fear of “autonomous
technology” (Langdon Winner, Autonomous technology: Technics-out-of-control as a theme in political thought, MIT
Press, 1978). The counter is that technologies can be, and are, shaped by society. The reality is that while technologies
do indeed have “momentum” (Thomas P. Hughes, "The evolution of large technological systems." Pages 51-82 in
Wiebe E. Bijker et al. (eds), The social construction of technological systems: New directions in the sociology and
history of technology (1987)) and “drive history” (Merritt Roe Smith and Leo Marx, Does technology drive history? The
dilemma of technological determinism, MIT Press (1994)) there remains a huge freedom of choice in terms of how
they are used and their precise form. Like all technologies of the past, technologies for data can also be shaped for
social and national benefit.
22
Russell Hardin, Trust and Trustworthiness, Russell Sage Foundation, New York, (2002); Francis Fukuyama, Trust: The
Social Virtues and the Creation of Prosperity, Simon and Schuster (1995); Eric M. Uslaner, The Moral Foundations of
Trust, Cambridge University Press (2002). An excellent short summary of the social side of trust is chapter 21 of Jon
Elster, Explaining Social Behaviour: More Nuts and Bolts for the Social Sciences, Cambridge University Press (2007).
People’s trust in technology is a complex matter (Karen Clarke, Gillian Hardstone, Mark Rouncefield and Ian
Sommerville, Trust in Technology: A Socio-Technical Perspective, Springer (2006); Meinolf Dierkes and Claudia von
Grote (eds), Between Understanding and Trust: The Public, Science and Technology, Routledge (2000)); and trust in
technological experts (as opposed to the technology itself) is surprisingly weakly correlated with perceptions of risk
(Lennart Sjoberg, Limits of Knowledge and the Limited Importance of Trust, Risk Analysis 21(1), 189-198 (2001)).
23
In the sense of George Lakoff and Mark Johnson, Metaphors We Live By, The University of Chicago Press (1980) –
not as a mere rhetorical flourish, but as an essential way in which to make sense of what we do.
24
Trust is a very complex notion, and means different things to different people (D. Harrison McKnight and Norman L.
Chervany, The Meanings of Trust, University of Minnesota, (1996); Donna M. Romano, The Nature of Trust:
Conceptual and Operational Clarification, PhD thesis, Louisiana State University (2003)).
The complexity is illustrated as follows:
Trust has not only been described as an “elusive” concept, but the state of trust definitions has been called a “conceptual
confusion”, a “confusing potpourri”, and even a “conceptual morass”. For example, trust has been defined as both a
noun and a verb, as both a personality trait and a belief, and as both a social structure and a behavioral intention. Some
researchers, silently affirming the difficulty of defining trust, have declined to define trust, relying on the reader to ascribe
meaning to the term. (D. Harrison McKnight and Norman L. Chervany, Trust and Distrust Definitions: One Bite at a Time,
in R. Falcone, M. Singh, and Y.-H. Tan (Eds.): Trust in Cyber-societies, LNAI 2246, pp. 27–54, Springer-Verlag (2001)).
Perhaps, like “culture” (confer Kroeber’s 164 definitions of culture: Alfred L. Kroeber and Clyde Kluckhohn, Culture: A
critical review of concepts and definitions, Peabody Museum of American Archeology and Anthropology, (1952)) or
“technology” (confer Robert C. Williamson, Michelle Nic Raghnaill, Kirsty Douglas and Dana Sanchez, Technology and
Australia’s future: New technologies and their role in Australia’s security, cultural, democratic, social and economic
systems, Australian Council of Learned Academies, (September 2015)), it makes little sense to attempt to define trust,
but rather we should focus upon the technological and scientific problems we want to solve (as done in the main text).
Attempts to formalise the notion of trust as a computational concept go back at least 20 years
(Stephen Paul Marsh, Formalising Trust as a Computational Concept, PhD thesis, University of Stirling, (1994)),
with conferences on the topic starting over a decade ago (Sokratis Katsikas, Javier Lopez and Gunther Pernul (eds),
Trust and Privacy in Digital Business: First International Conference, Trustbus 2004, Springer (2005); Thorsten Holz and
Sotiris Ioannidis, Trust and Trustworthy Computing: 7th International Conference TRUST 2014, Springer (2014)).
One reason for the complexity is the many threats to trust (in the same way there are many threats to
security, which need to be explicitly declared or modelled: Adam Shostack, Threat Modeling: Designing for Security,
Wiley (2014)). But primarily the complexity comes simply from the diverse elements of trust in data-centric systems
including, but not limited to:
• Trust in the reliability of software (never absolute: see Donald MacKenzie, Mechanizing Proof: Computing,
Risk and Trust, MIT Press (2001); Juan C. Bicarregui and Brian M. Matthews, Proof and Refutation in Formal
Software Development, 3rd Irish Workshop on Formal Methods (1999));
• Trust in security (e.g. Jeffrey J. P. Tsai, Philip S. Yu (eds), Machine Learning in Cyber Trust: Security, Privacy,
and Reliability, Springer (2009));
• Trust in data management (Milan Petkovic and Willem Jonker (eds), Security, Privacy, and Trust in Modern
Data Management, Springer (2007));
• Trust in the credibility of information, such as which scientific results one can rely upon (Christine L.
Borgman, Scholarship in the Digital Age: Information, Infrastructure and the Internet, MIT Press (2007)) and
what sensor measurements one can trust (J. C. Wallis, C. L. Borgman, Matthew Mayernik, Alberto Pepe,
Nithya Ramanathan and Mark Hansen, Know thy Sensor: Trust, Data Quality, and Data Integrity in Scientific
Digital Libraries, 11th European Conference on Research and Advanced Technology for Digital Libraries,
September 16–21, 2007, Budapest, Hungary (2007)). This is already front-of-mind in work such as “bees with
backpacks” that Data61 has done. It is hardly a new concern – the (apparently simple) notion of a scientific
measurement is deeply entangled with notions of trust, as is evident from the history of Victorian science
(Graeme J. N. Gooday, The Morals of Measurement: Accuracy, Irony, and Trust in Late Victorian Electrical
Practice, Cambridge University Press (2004));
• Trust that social mechanisms built with data-technologies cannot be manipulated (see Eric Friedman, Paul
Resnick and Rahul Sami, Manipulation-Resistant Reputation Systems, Chapter 27 in Noam Nisan, Tim
Roughgarden, Eva Tardos and Vijay V. Vazirani, Algorithmic Game Theory, Cambridge University Press
(2007));
• Trust that sensitive information is not leaked (Guillermo Lafuente, The big data security challenge, Network
Security 2015, 12-14 (2015));
• Trust that data-analytics are fair (Solon Barocas and Andrew D. Selbst, Big data's disparate impact, California
Law Review 104 (2016); Danah Boyd and Kate Crawford, Six provocations for big data. In A decade in internet
time: Symposium on the dynamics of the internet and society (pp. 1-17). Oxford Internet Institute,
(September 2011));
• Trust in the communication system underpinning data technologies (White House: "Cyberspace policy
review: Assuring a trusted and resilient information and communications infrastructure." White House,
United States of America (2009)). There is no perfectly trustable communication system, and so, as with all other
elements of the trust chain, a risk-sensitive approach will be warranted;
• Trust that the overall systems constructed can be sufficiently relied upon (Piotr Cofta, Trust, Complexity and
Control: Confidence in a Convergent World, John Wiley and Sons (2007)).
25
The phrase alludes to an admirable novel about two famous scientists who are further (in addition to Hooke and
Babbage – see endnote 4) great role models for Data61 – Alexander von Humboldt and Carl Friedrich Gauss (Daniel
Kehlmann, Measuring the World, Pantheon (2006)). Humboldt is one of the most important creators of modern
science, who undertook outstandingly painstaking data gathering and analysis (Andrea Wulf, The Invention of Nature:
The Adventures of Alexander von Humboldt, Lost Hero of Science, John Murray, (2015)). Gauss is famously credited as
the originator of least squares data analysis (Stephen M. Stigler, Gauss and the invention of least squares, The Annals
of Statistics, 9(3), 465-474 (1981)) and thus one of the fathers of modern data analytics.
In an earlier version of this document, I used the awkward polysyllabic neologism “datafication”, apparently coined in
the article by Kenneth Cukier and Viktor Mayer-Schoenberger: The Rise of Big Data, Foreign Affairs 28–40, May/June,
(2013). It is already widely used, but it is an ugly word that many Data61 folks reacted negatively to, and, crucially, it
misses the distinction between data and capta (see below).
26
This distinction is quite old, but rarely used. See Rob Kitchin, The Data Revolution: Big data, open data, data
infrastructures and their consequences, Sage, Los Angeles (2014); this explains some of the history of the word;
Christopher Chippindale, Capta and data: on the true nature of archaeological information, American Antiquity 65(4),
605-612 (2000); Bettina Berendt, Big Capta, Bad Science? On two recent books on “Big Data” and its revolutionary
potential, Department of Computer Science, KU Leuven,
https://people.cs.kuleuven.be/~bettina.berendt/Reviews/BigData.pdf (March 2015).
27
Quoted from the entry for captus in A Latin Dictionary. Founded on Andrews' edition of Freund's Latin dictionary,
revised, enlarged, and in great part rewritten by Charlton T. Lewis, Ph.D. and Charles Short, LL.D. Oxford. Clarendon
Press (1879).
28
The traditional view is widespread; e.g. Paul Cooper, Data, information and knowledge, Anaesthesia and Intensive
Care Medicine, 11(12), 505-506 (2010).
29
Ashley Braganza, Rethinking the data-information-knowledge hierarchy: towards a case based model, International
Journal of Information Management, 24, 347-356 (2004); Ilkka Tuomi, Data is more than Knowledge: Implications of
the Reversed Knowledge Hierarchy for Knowledge Management and Organizational Memory, Journal of Management
Information Systems 16(3), 103-117 (1999).
30
It is sometimes claimed to be a clearer distinction than it really is: Sreenivas Rangan Sukumar, Machine learning for
data-driven discovery: thoughts on the past, present and future, Oak Ridge National Laboratory, (2014).
31
Tony Hey, Stewart Tansley and Kristin Tolle, The Fourth Paradigm: Data-intensive scientific discovery, Microsoft
Research, (2009).
32
Alon Halevy, Peter Norvig and Fernando Pereira, The Unreasonable Effectiveness of Data, IEEE Intelligent Systems
Magazine, 8-12 (March/April 2009).
33
Carole Goble and David De Roure, The impact of workflow tools on data-centric research,
http://www.myexperiment.org/files/215/download/workflows-v8-05May2009.pdf (May 2009).
34
David De Roure and Carole Goble, Anchors in Shifting Sand: The Primacy of Method in the Web of Data, Web Science
Conference, (April 2010).
35
This (entirely wrong) phrase is due to Chris Anderson: “The end of theory: the data deluge makes the scientific
method obsolete,” Wired (23 June 2008). It does no such thing! It simply allows for more sophisticated models.
36
Sean Bechhofer, Iain Buchan, David De Roure, Paolo Missier, John Ainsworth, Jiten Bhagat, Philip Couch et al., Why
linked data is not enough for scientists, Future Generation Computer Systems 29(2), 599-611, (2013).
37
Ludwik Fleck, Genesis and Development of a Scientific Fact, University of Chicago Press (1979); Bruno Latour and
Steve Woolgar, Laboratory Life: The Construction of Scientific Facts, Sage Publications (1979); Karl Popper, The Logic of
Scientific Discovery, Hutchinson, (1959).
38
Geoffrey C. Bowker, Memory Practices in the Sciences, MIT Press, (2005).
39
Mark Stalzer and Chris Mentzel, A preliminary review of influential works in data-driven discovery, SpringerPlus
5:1266, (August 2016).
40
There are other reasons that serve to push for embedding analytics, especially latency and bandwidth limitations.
41
William N. Dunn (ed), The Experimenting Society: Essays in honour of Donald T. Campbell, Transaction Publishers,
(1997); Donald T. Campbell, Methods for the Experimenting Society, American Journal of Evaluation 12, 223-260,
(1991); Donald T. Campbell, Reforms as Experiments, American Psychologist, 24, 409-429, (1969).
42
As explained elsewhere in this document, such a phrase (“universal captafication”) does not imply that it is done once,
without a theoretical stance, or that the data “speak for themselves.” What is meant here is simply the push towards
more pervasive (hence approaching “universal”) translation of the data in the world into capta that can be
manipulated.
43
“Delivered” in the title of this headline is the right word – we propose to change the delivery modality, and to
actually build systems that literally deliver the results.
44
Confer strategy 4 of the National Science and Technology Council, Networking and Information Technology
Research and Development Subcommittee, The National Artificial Intelligence Research and Development Strategic
Plan, October 2016: it articulates the need for explainable and transparent systems that are trusted by their users,
perform in a manner that is acceptable to the users, and can be guaranteed to act as the user intended.
45
Patrick McDaniel et al, Towards a Secure and Efficient System for End-to-End Provenance. 2nd workshop on the
theory and practice of provenance (2010).
46
Data technologies are made up of hardware and software, the boundary between which is somewhat blurred. Our primary
(but not exclusive) focus here is on the software because that is where we have a global competitive
advantage. One could use the more general phrase “systems you can trust” but that misses the specificity that I
currently have in mind. And all of the research I am alluding to here is indeed on software.
47
Jason Furman, Is this Time Different? The Opportunities and Challenges of Artificial Intelligence, Remarks at AI Now:
The Social and Economic Implications of Artificial Intelligence Technologies in the Near Term, New York University,
(July 7, 2016).
48
National Science and Technology Council, Networking and Information Technology Research and Development
Subcommittee, The National Artificial Intelligence Research and Development Strategic Plan, (October 2016).
49
“Scientific” is meant in the broad sense described in endnote 4.
50
E.g. Nathan Rosenberg and L. E. Birdzell Jr., How the West Grew Rich: The Economic Transformation of the Industrial
World, Basic Books (1986).
51
Huw T. O. Davies, Sandra M. Nutley and Peter C. Smith, What Works? Evidence-based policy and practice in public
services, The Policy Press (2000).
52
Pleiotropy (genetically), or non-injectivity of the inverse map (mathematically).
53
Genetic heterogeneity or non-injectivity of the forward map.
54
Elizabeth Eastland, Future Australia – Market Vision: Unlocking a more prosperous and sustainable future for all
Australians, PowerPoint presentation (2 November 2016).
CONTACT US
t 1300363400
+61395452176
e [email protected]
w www.data61.csiro.au
AT CSIRO WE SHAPE THE FUTURE
We do this by using science and
technology to solve real issues. Our
research makes a difference to industry,
people and the planet.
FOR FURTHER INFORMATION
Bob Williamson
Chief Scientist, Data61
t +61262183712
m +61404053877
e [email protected]
w www.data61.csiro.au
Adrian Turner
CEO, Data61
t +6193724202
m +61475981219
e [email protected]
w www.data61.csiro.au