Data you can trust
Technology that works for you

DATA61’s Future Science Vision v1.4
Robert C. Williamson, 2 November 2016

Preamble

Research is at the heart of Data61. Our research is undertaken with a purpose in mind – to create a positive data-driven future. This document outlines our vision [1] regarding what we aim to achieve by focusing our research on what the world needs in areas where we have world leading capability.

Data61 plays two complementary roles in the Australian innovation system. We are “L-shaped” (see the schematic on the right):

1) We conduct market driven research (end-use driven projects) in a range of industry sectors; these contribute to the horizontal part of Data61’s mission – solving problems in other CSIRO business units (and leveraging their capability and connections) and the community more broadly.
2) We are the home to fundamental research advancing the science and technology of data (the vertical part of the picture).

These two parts mutually support each other [2]. Both are essential. The market component, by definition, is not for us to plan, but to adapt to in an agile manner. The scientific, technological and engineering research we propose to do is ours to plan and shape; that is what this document does.

The purpose of this document [3] is to focus our work on the vertical part of the L-shaped schematic. The document captures the bold and ambitious areas of science and technology we wish to advance [4]. It should be seen as a way of focusing what we do, and allowing us to say “yes” or “no” in a more informed fashion [5]. The goal is not simply to “put more wood behind fewer arrows” but rather to get most of the arrows pointing in one direction, and to describe the target they are aiming to hit – namely the four goals listed in the callout box. This will help shape our future capability investments.

[Callout box]
• Measuring the World: Improving the whole lifecycle of data capture, analysis and use.
• Delivering Trustworthy Analytics: Changing the way analytics is delivered; guaranteeing trust in the entire process.
• Building Software you can Trust: Creating technologies that allow the construction of software that can be trusted.
• Shaping Societal Transformations: Developing better data technologies through improved understanding of their potential social impact.
Explicitly articulating the larger technical challenges is especially important for Data61 because it is often (mistakenly) believed that data and information technology research merely supports other sciences – a sort of glorified IT helpdesk. In fact, the contrary is arguably the case, with physics [6], chemistry [7], biology [8], social science [9], and economics [10] all having the science of data and information at their core, and information technology precepts, such as modularity, are essential for the understanding of many natural systems [11].

Ultimately, as recently witnessed by social science, any field immersed in a properly organised bath of data progressively becomes computationally based, or develops a computational subfield [12]. The science of information and data is arguably the most fundamental research topic of the century, situated not only at the centre of mathematical research [13], but underpinning the nature of randomness and complexity [14], and situated at the very core of all the mature sciences.

Context

Technologies for data are general purpose technologies [15] that will have a transformative impact on Australian society, although what those impacts will be is neither predictable nor pre-determined [16]. These technologies are often described as “artificial intelligence” [17] and include machine learning and big data analytics, automated reasoning, computer vision, natural language understanding, and robotics. Data61’s focus is on the advancement of technologies for data in a manner that provides national benefit (economic, social and environmental). Thus, a deep understanding of the context of technology use, the potential impacts they can have, and shaping what those impacts are, is a central part of our research vision.

Data61 lives inside an organisation dedicated to the discovery of scientific knowledge, knowledge distinguished by the high degree of trust one can place in it: trust in the conclusions; trust in the evidence that is derived from data; and, trust in the processes to revise the knowledge when it is found to be false. Science has always been data-driven and will remain so. We propose to exploit the scientific enterprise within CSIRO as a testbed for ideas that can, and will, have much broader impact.
General principles

The scientific vision is informed by the following five principles [18]:

• P1. Lead: Strive for a greater proportion of world leading research. We should focus our efforts on areas where we are, or realistically could be, world leading.
• P2. Multiply: Aim for multiplicative (compositional) effects rather than additive, else we cannot scale. This implies clever “platformisation” of our technology.
• P3. Unique: Do what only we can do, else let others do it [19].
• P4. Bold: Aim high. We really do want to change the world (through use-inspired fundamental research).
• P5. Antidisciplinary [20]: Data traverses existing discipline boundaries. We ignore disciplinary boundaries and follow the problems wherever they take us.

Headline Visions

Data61’s goal is to create our data-driven future – a future where technologies for data will play a positive role for society at large. New technologies provoke many reactions. Fear and uncertainty are common, with a belief that the precise forms of new technology are inevitable and not open to being shaped [21]. A counter to this is trust, which can be viewed as being at the core of all that we do. All of our work revolves around building trust in technologies for data: in automation; in security and privacy; that your software only does what it claims to do; that your personal identity is not stolen from you; and trust in all things that matter to people.

By saying “data you can trust” we do not mean that you trust it blindly, and especially we do not mean that you trust it raw – data needs to be processed and manipulated to be useful, and it is the processes of manipulation that need to be trusted. This involves both designing systems that do indeed facilitate trust in data, as well as building trustworthy technologies for doing things with the data. And in all of this “trust” itself is complex, multidimensional, and is always ultimately grounded in human needs and society [22].
We are using the apparently simple notion of “trust” metaphorically [23]. Without attempting to make a canonical definition of trust [24], we can say we have “trust” as the anchor, or point of departure, for much of what we propose to do, including:

• Trustworthy software – not software that you trust absolutely, but software in which you can have quantifiable degrees of trust for sound reasons
• Trust in data – not data you trust without cause, but data you can trust for your purpose because of the evidence provided regarding its management, provenance and what was done to it (analytics that has quantifiable effect)
• Trust in systems – trust that you know to what degree you can rely on data-centric systems, including communications, not that you trust them absolutely
• Trust in data technology enabled socio-technical systems – trust that these systems will benefit you and that any harms are manifest and controlled.

Understanding the complex interface between data, its management, manipulation and processing, and the impacts it can have on people is central to building trust around data and technologies for data. Trust in data (and its associated processes) can also underpin trust in institutions, interventions and policies.

The means of manipulating and processing data are data technologies. When we say “technologies that work for you” we mean they do what they are supposed to do, they don’t do anything else, and they are usable and useful (and implicitly we recognise the importance of who the “you” is – technologies that help one group can harm others).

While these sentiments might be taken for granted, history shows they are often absent, and improving the degree to which the technologies we develop achieve these goals helps to shape what we do. Examples are: the construction of software that has an adequately high guarantee of securely doing only what it is supposed to do; or, statistical machine learning methods you trust because of mathematical theories that provide adequate guarantees regarding their behaviour and uncertainty.

Both these examples illustrate the necessity for deep scientific and mathematical knowledge as well as a quantitative notion of performance. This scientific depth differentiates what Data61 does from much of the data technology in the wider world.
The headline visions and scientific challenges serve as a rallying point for not only the scientific research we do, but also the shorter term end-use driven projects delivered by our engineering team. Ideally the majority of such projects, in addition to delivering on customer expectations, will further the goals below.

H1. Measuring the World [25]

Thus is by geometrye mesured alle thingis
– William Caxton, Myrrour of the Worlde (1481)

The world becomes better understood, and thus interventions are more effective and acceptable, through the development of methods for data capture and model building that put trust at the center.

Background: Humans try to improve the world, but often fail. Their interventions don’t work, or have unintended consequences. One reason for these failures is poor models of the world – it is different from what we expect. By measuring the world (i.e. capturing data about the world), one learns more about the world and thus interventions can be better designed. This is the vision of empirical science. We propose to improve how data is captured and used to advance our understanding of the world.

The world is full of data, but only a small fraction is known to us. Rather than being given to us (“data” comes from the Latin dare meaning “to give”), it is necessary to take the data – to actively select and gather it, and then, of course, to do something with it. It is thus useful to distinguish data from capta [26] (from the Latin capere meaning “to take, seize, obtain, get, enjoy or reap” [27]). This terminology signals that data collection is an active process, not passive.

Data is traditionally seen as the lowest level of a hierarchy that runs from data to information to knowledge to wisdom [28]. Implicit in this is that in order to attain knowledge (or wisdom) one needs to start with data. While clearly true at one level, this does not capture Data61’s perspective, which inverts the hierarchy [29] and has knowledge (or the decision, action or intervention required for a particular problem) as the end point, thus focussing the needs of data collection and analytics from the reverse perspective. Data becomes useful once it is both captured (capta) and then made sense of through models. The models can also provide guidance regarding desirable capta.
Models and modelling are central to making use of capta. Much of the work that Data61 does is modelling based on capta. The distinction between models and data or capta is blurred [30]; abstractly a model is always a function of the capta – whether it has a small number of “parameters” or not is irrelevant – what matters is the stability of the model (or more precisely, the stability and reliability of the conclusions drawn, and actions taken from the model) under data variations.

The important point is that it is the models that are ultimately manipulated and used for action. While much is made of a “fourth paradigm” [31] (so called “data-driven science”) and “the unreasonable effectiveness of data” [32], the fact remains that all data-driven intervention remains based upon models; they are just more complex than the models of old.

We thus embrace the “primacy of method” [33] or a “method deluge” (with methods as “first class citizens” [34]) over a mere “data deluge”, and certainly do not envisage “making the scientific method obsolete” [35]. For science, data alone (however it is linked or presented) is not enough [36]. Neither data nor facts are ever entirely raw – they are constructed and theory-laden [37]. It is indeed true that “‘Raw data’ is both an oxymoron and a bad idea” [38].

Some of the greatest contributions to the recent explosion of interest in data-driven everything come from new methods [39] with refined notions of trust (better quantification of errors). The blurred boundary between “data” and “method” drives how methods (analysis) are being pushed towards the data (embedded analytics [40]), as well as the propagation of all aspects of the data (such as its provenance) through the entire modelling process, in order to better inform interventions.

The real promise of a data-driven society is that it is an “experimenting society” [41] that allows decisions, actions or interventions to be closely tied to capta.
We will develop new methods for achieving this universal “captafication” [42] of the physical world, the biological world and the social world:

• From modelling of materials and biological organisms at the molecular and macro level to the design of new materials and food
• From sensors measuring anything through to trusted data from those sensors and the associated trusted interventions and policy
• From all the geospatial data in the country to the rich set of services that can exploit this information
• From people’s identity and reputation to systems that can guarantee the security, privacy and fairness of using this information
• From the captafication of the law and public policy to make the machinery of government transparent to the user to the very development of new policy in a trustworthy evidence driven manner, and
• From transforming how science is done (tracking data and evidence and the analytical conclusions drawn) to the empiricisation of business (doing proper experiments aided by technologies for data).

Our vision is that by developing new and better methods we will be able to better model the world, and thus act better. Central to this is the notion of trust:

• Trust in the source of the data (collected the right capta) and that it was reliably captured, transmitted and not tampered with (else skeptics will challenge the result, or worse, wrong actions will be taken)
• Trust in the models underpinning the capture of the data (such models always leave something out – how does one know if the omissions do harm?)
• Trust in the methods used for analysis (that it is known what the methods actually do from a user’s perspective and that the posterior uncertainty is properly calibrated)
• Trust in how the capta and conclusions are presented and used (if one ignores this human element, then the best methods can still lead to terrible outcomes), and
• Trust that legal and moral rights and notions of fairness are not infringed (else society will disdain the power of data analytics because of concerns regarding its abuse).
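One way to make the first of these trust requirements concrete (that capta was not tampered with, together with a record of who did what to it) is a hash-chained provenance log. The following is a minimal illustrative sketch in Python, not a description of any Data61 system: the function names `append_step` and `verify`, the record fields and the sensor name are all hypothetical. It only detects changes made to a log after the fact; a real deployment would also need digital signatures and trusted storage, which are assumed away here.

```python
import hashlib
import json


def _digest(payload):
    """Return the SHA-256 hex digest of a JSON-serialisable dict."""
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_step(log, actor, action, data):
    """Return a new log with one more entry, chained to the previous entry."""
    previous = log[-1]["entry_hash"] if log else ""
    body = {
        "actor": actor,                         # who did it (hypothetical field)
        "action": action,                       # what was done
        "data_hash": _digest({"data": data}),   # fingerprint of the resulting data
        "previous_hash": previous,              # link to the preceding entry
    }
    return log + [{**body, "entry_hash": _digest(body)}]


def verify(log):
    """Check that no entry was altered and that the chain is unbroken."""
    previous = ""
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        if entry["previous_hash"] != previous:
            return False
        if _digest(body) != entry["entry_hash"]:
            return False
        previous = entry["entry_hash"]
    return True


# Illustrative capta: three made-up temperature readings.
readings = [20.1, 19.8, 20.4]
log = append_step([], "sensor-17", "capture", readings)
cleaned = [round(x) for x in readings]
log = append_step(log, "analyst", "round", cleaned)
```

With this sketch, `verify(log)` returns `True` for the untouched log, and returns `False` if any field of any entry is edited afterwards, because the stored hashes no longer match. The design choice of chaining each entry to its predecessor means a single check covers the whole history of the capta, not just its latest state.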
H2. Trustworthy Analytics Delivered [43]

New methods for data analytics that offer high degrees of trust, and new methods of delivering these trustworthy methods, will increase their use, reduce economic friction and speed up the process from invention to deployment. This will accelerate scientific discovery and business improvement, and improve public policy outcomes.

The impact of new technologies comes from their use. We will change the way analytics is delivered to broaden its use. We will build trust into the core of how we create and deliver analytics technologies: from the mathematical foundations of trust in data-driven conclusions and the quantification of certainty; to embedded analytics at the source of data capture; and, to web services that allow the flexible composition of analysis methods in a reproducible and scalable manner, and which build in key elements of trust from the outset (provenance and traceability, management of legal and moral rights, and management and preservation of uncertainty) [44].

Background: “Data Analytics” means the computational processing of capta with the goal being to derive insights suitable for comprehension, decision or action. It includes mathematical or algorithmic methods as well as visualisation and presentation of the results in a manner suitable for human consumption. Analytics is not only used by a (human) statistician; many socio-technical systems have analytics embedded into their core operation, and all the points made below apply there too.

Presently analytics is implemented primarily in a manner that makes its composition (gluing together components) difficult.
The current model leads to various problems:

• Vendors of large software packages have an interest in locking in customers to their platform (so there is relatively little incentive to enable composability with other systems)
• Many of the implementations presume the capta is all in one place (either local or in a cloud). Much capta cannot be moved. It might be too large (the analysis has to be actually done at the source), or there is not the legal right to move it
• Provenance, traceability, legal and moral rights and uncertainty are poorly managed, resulting in outputs of analytics that lose sight of the reliability and trustworthiness of the original data (and thus the results are less trustworthy)
• It is difficult to redo analyses when mistakes are discovered (a consequence of the point above). Often not all of the “state information” is stored to enable the re-running of analyses
• Closed ecosystems make it hard to import new techniques as they are invented.

There are potential solutions to all of these problems, all of which we envisage developing:

• By embedding analytics at the source of the data, the burden of moving large amounts of data is removed. Being able to reach all the way back to the original data source (typically embedded in a cyber-physical system) through composable data ingestion schemes allows better tracking of provenance
• When systems deliver analytics as a RESTful web service, it becomes more readily composable. This can remove the downside (lock-in) of proprietary systems
• By taking the computation to the data (in data centers for example), we can avoid the problem of not being able to move the data (for reasons of scale or jurisdictional constraints). This necessitates advances in not only the secure encapsulation of analytics code, but also trusted means to control information flow (so private information is not exfiltrated from the capta bases)
• The ultimate delivery involves presentation to users. By improving the user experience of data analytics it will be more widely and reliably used. This requires development of visualisation as a service that represents uncertainty and provenance as first class objects
• Composable provenance of data (including legal rights such as licenses) and analytics across walled gardens allows increased trust, reliability and repeatability of analytics
• Systems that are designed to federate data from different sources can bypass jurisdictional and practical problems of extracting insights from distributed capta
• Late binding schemas or ontologies minimise the deleterious effects of past decisions regarding data categorisation and organisation
• Systems that capture and re-execute entire workflows to facilitate late-binding, rapid prototyping and the automation of translation from exploratory to production systems.

The creation of technologies as above will not only accelerate the use of data analytics for its own sake, but will play a central role in our vision for cyber-security – securing data-driven business operations through ensuring trustworthiness in the data. This is especially important for critical infrastructure protection [45].

H3. Building Software [46] you can Trust

We will develop new ways of creating software that will be the global benchmark in terms of quality, security and trust. Widespread adoption will make software companies more productive, improve cybersecurity (by addressing the root cause of one of the main problems) and enable higher degrees of trust in data-centric systems.

Technologies for data are underpinned by software, which is the means by which data is processed and transformed. Building better technologies requires building better software. We will develop the science and technology stacks to build software that provably does what it is supposed to do and nothing else – we will be able to say precisely and with strong evidence when software will be bug-free, provably secure, and will deliver guaranteed results. This will address one of the major causes of problems in cyber-security (vulnerabilities that are introduced when software does more than, or other than, what it is supposed to do). We will also develop better methods to quantify risks associated with software and understand the human factors that contribute to trustworthy software.
In addition to increasing the reliability of software against attacks that cause it to do things other than that which it should, the same technologies can be used to provide improved guarantees for the trustworthiness of data, whether it is that the data has not been manipulated, or that sensitive information has not been exfiltrated. Thus improving the trustworthiness of software is not only essential for making technologies that work for you, but also for ensuring that you can trust data and entrust your data to such technological systems.

H4. Shaping Societal Transformations

Technology … is not destiny [47]
– Jason Furman, July 2016

Technologies shape society, and technologies for data will shape the future of Australian society, but there is the opportunity to choose what these effects are. By developing better understandings of the complex relationships between data technology and people, we will be able to influence the development and use of technologies for data to lead to better societal outcomes. The research necessary to attain this understanding can (and needs to) be done in concert with the more narrowly technical aspects of our work.

New technologies for data will transform society, but there is much freedom regarding how. Our interest in technology does not stop with the technology itself, but extends to its use. Technologies such as UAVs and autonomous vehicles will obviously shape society, and their use will be shaped by what society finds acceptable. Collectively, as technologists and scientists, we cannot ignore the societal implications of our work. The same basic technological principles can be used in many different ways, some of which are more usable, helpful and beneficial to people than others. We will develop new ways of envisaging and influencing these societal transformations.

This will involve new approaches to the ethnography of technology (better understanding people’s relationship with data-driven technology, especially in terms of trust) and deriving technological foresights.
This goal aligns with strategy 2 of the recently released US National Artificial Intelligence Research and Development Strategic Plan [48]: “Develop effective methods for human-AI collaboration. Rather than replace humans, most AI systems will collaborate with humans to achieve optimal performance. Research is needed to create effective interactions between humans and AI systems.”

We will reimagine what it means to be human in a data-driven world. We will develop new technologies for ensuring rich notions of privacy and transparency in a data-driven and algorithmic world. We will develop new understandings of the complex technical tradeoffs between usability, security, privacy, efficiency and fairness. We will study how to build data-driven societal institutions that citizens can trust. We will design new computational mechanisms to enhance social welfare, enabled by pervasive technologies for data.

We will develop new methodologies that exploit data-technologies to better understand how data-technologies themselves end up being used (including the derivation of qualitative insights from quantitative data). This will extend the reach of user-experience design to new areas, and advance its state of the art. And we will develop new economic and business models enabled by data-technologies in a manner that seeks to maximise benefit for Australia as a whole.

Scientific Challenges and Foci

Theories are nets: only he who casts will catch.
– Novalis

In this section are listed some scientific [49] challenges arising from the above visions. These are not all the scientific challenges we will try to solve, but they capture much of what we aim to do. In all cases the timeline is roughly 5-10 years.

While each of these challenges is motivated and inspired by broader societal challenges, the particular impacts one can expect of scientific advances are notoriously difficult to predict on such a timescale (impact can be predicted more reliably for shorter term projects). Thus, apart from some rather general statements, there is no specific prediction of impact arising from the scientific challenges.

I have tried to state a high level challenge (in red) followed by some explication. It would be impossible to outline all the possibilities, and those listed are not meant to be too prescriptive.
In all cases they are stated as “How to …”. This is both a scientific challenge (development of new knowledge and understanding) as well as a technological one (development of techniques and methods and systems that achieve the goal).

Areas of Scientific Challenge
• Materials and Data
• Physical/Biological Systems and Data
• Institutions and Data
• Trustworthy Software Construction
• Architecture for composability, compartmentalisation and resilience
• Distributed Trust Mechanisms
• Analysing, Representing and Modelling Data
• Quantification of and reasoning with risk and uncertainty

S1. Materials and Data

How to turn materials into data so they can be manipulated and designed?

To understand materials (so they can be synthesised, manipulated and changed) one needs to understand them and trust that understanding (modelling and synthesis). Materials are not systems (for the purpose of this document). The question applies to both non-organic and organic materials (including for example food).

How to design materials in a data-driven manner – from quantum Monte Carlo (for engineering materials) through to food designed in response to genetic information?

S2. Physical/biological systems and data

How to embed data into physical systems; understand physical systems through data-driven models; and design, build and control physical systems by using data?

This includes challenges in robotics and sensor networks and in the processing of visual data – how to embed trusted analytics into physical, biological and environmental systems. How to use data to increase trust in data-centric systems (such as the internet of things), for example by better management of privacy. How to better model physical systems using data (or more precisely, how to improve that modelling, which is the core business of all scientists, using modern technologies for data).
How to control physical systems with data in a manner that you can trust? How to turn physical or biological objects (eg scientific specimens, or aspects of living systems) into data cheaply and at scale in a manner that can be trusted? How to map the world more reliably (using spatial data as a testbed for analytics pipelines)? How to build autonomous systems for data gathering in the field? How to manage the ingestion of semi-structured sensor data? How to manage the provenance of data gathered in the world?

S3. Institutions and Data

How to represent, augment, understand, manage and control institutions better using data?

I use “institutions” in the economist’s sense [50] which includes government, the legal system (statute law, regulation), business processes, and contracts, etc. The challenge is to represent these societal systems using data that can be processed and reasoned with by a machine. Solving this involves advancing the state of the art of natural language processing (eg, targeted at specialised uses of English, as in statute law and contracts) and the development of tools that allow the crafting of legal instruments in a manner similar to a modern programming development environment that will guarantee properties such as consistency, but will also emit human readable versions of the instruments.

Another challenge is how to use technologies for data to improve institutions, for example by data-driven experimentation for policy development [51]. Part of the solution is likely to be aiding the change of role of government from owner of assets, or deliverer of retail services, to wholesaler and architect of modular systems.

S4. Trustworthy software construction

How to construct software that does what it is supposed to and nothing else?
How to make technologies that construct software that guarantees its correctness, invulnerability and other properties (eg real-time guarantees)? One can ask similar questions regarding interaction and communication protocols. Particular challenges include: mixed-criticality, real-time, multicore, side-channels; information flow; concurrent systems verification; protocol verification (as a means to deal with composition and break the back of concurrency); automation of proof effort. How to specify and quantify dimensions of security (turning it from a binary property to a real-valued property you can reason about from a risk sensitive perspective)? How to ensure trustworthiness of mobile code (especially for analytics)?

S5. Architecture for composability, compartmentalisation and resilience

How to build data-centric systems that can be reliably composed and compartmentalised and which are resilient, robust and trustworthy?

Data-centric systems are the most complex artefacts designed by man. The challenge is to design them (including cyber-physical and cyber-societal) in a manner that facilitates composition, compartmentalisation and resilience. This is necessary in order to improve the reliability and trustworthiness of such systems.

This challenge is architectural (including questions such as how to compose trust – just because you have trusted components does not guarantee their composition can be) but also includes questions such as how to monitor and manage such large systems (supervisory control and diagnostics). Examples that are worthy of attack include how to architect large distributed data analytics systems. How can trust in such systems be quantified, measured and managed?

S6. Distributed trust mechanisms

How to manage trust in distributed data-centric systems?
Trust underpins human interaction, and thus data-technologies that mediate such interactions must manage trust. The challenges include how to ensure trustworthy provenance of data and operations on data (provenance is a kind of dual to security: provenance tells you reliably where the data came from and who did what to it; data security reliably ensures where the data can go and who can do what with it). Thus we will study both provenance and security together. This needs to be done in a risk sensitive manner (see S8). How to build richer, better and more applicable distributed ledgers and allied technologies? How to understand and quantify their security and reliability? How to build social choice mechanisms that can be trusted? How to build the communications technology that underpins distributed trust?

S7. Analysing, Representing and Modelling data

How to derive insight from data that can inform action?

How do you make sense of data? How to make sense of all the methods that do so? How to build models that are usable and re-usable? How to exploit complex, structured data with all of the mess of the world in the way? How to model complex phenomena (ecologies, language, societies) using data? How to make such models trusted and reliable and composable? How to best communicate such models to people for action? How to act and decide upon models of data? How to manipulate data representations of the world? Tools for managing multiple representations of data and manipulating them (music, law, biology). How to exploit computational and algorithmic advances to build better technologies for data analysis?

This all needs to be done in the context of the structure of data; data is not merely a string of bits. Many of the types of data that will have the largest impacts are highly structured (natural languages, video, social networks, etc). Advancing the stated goal with respect to these data types requires deep science and technology stacks (that can be used across diverse application domains).

S8. Quantification of and reasoning with risk and uncertainty

How to quantitatively represent the rich sources of risk and uncertainty represented by data, and how to reliably reason with this?
Whilst data can sometimes reduce uncertainty, it does not remove it; decisions still need to be made in the face of uncertainty. Furthermore, the increasing complexity of data-driven systems means that the management of partial information, uncertainty and ambiguity is essential. How can this be done in a risk sensitive manner? How can all aspects of data technology be made resilient to uncertainty? How can different notions of uncertainty be combined (relative to the inference or decision task at hand), and how can they be reasoned with in an effective manner? How can uncertainty and risk be effectively communicated and visualised? How can legal rights, security and privacy be made risk sensitive?

S9. Fundamental limits of data

How to determine the limits of what can be done with technologies for data?

All technologies for data have limits. How can these be determined and catalogued? And how can we approach these limits? Without knowing what the fundamental limits are it is not possible to know when a technology may break down and where to put effort to prevent this from happening.

This challenge cuts across everything we do, is a fundamental differentiator, and provides credibility for our status as part of a scientific research organisation. It also sets a target for other, less “fundamental”, work by setting a gold standard to approach.

Challenges include what is possible with data analytics, optimisation, distributed trust mechanisms, and indeed all data technologies we examine. Challenges include characterising the difficulty of learning from data, inferring causality, dealing with noise, protecting privacy, transmitting and sharing data, and solving computational problems.

There are limits in terms of data, knowledge, computation, energy, time and space. As well as limits to technical components, there are also limits (which need to be determined) to composite systems (such as trust, stability, and ability to control). There are also limits to socio-technical systems built with data technologies (for example computational social choice, limits to “fairness” and other synthetic properties) and limits arising from human abilities or inabilities.

S10. Shaping data-driven society

How to understand what it means to be human in a data-driven world?
What does it mean to be human in a data-driven world? How can our humanity be enhanced by data-driven technologies; how can we prevent harm? How can we build data-technologies that are meaningful and valuable to society at large? How can we encourage and assist communities in their adoption of technologies for data to improve their lives?

Solving this challenge will require the development of new ethnographic methods for data-centric technologies. It will also require ongoing research on how people interact with data-technologies from the perspective of decision theory (social choice, bounded rationality, etc.).

Such new methods will enable the attacking of challenges such as how to design data-technologies that better protect usability, privacy, security and confidentiality. They could also provide scientific underpinnings for the practice of UX design.

Impacts

Data61’s L-shaped model (see the Preamble) means that our impacts are the product of our scientific capabilities with market forces and opportunities. These impacts are managed through our business development and product management processes. A given scientific capability can deliver impact in many end-use problems [52]; a given market need can be satisfied by many different scientific capabilities [53] – see the schematic to the right.

[Schematic: Scientific Capabilities and Market Driven Projects]

The science driven challenges are our view of where technology needs to move. The end-use projects we do will largely be driven by the market’s view of this. It will be primarily through these projects that the science will have its larger impact. This impact can be categorised in many overlapping ways. Three are given below:

General categories:
• Improvement in the efficiency of Australian businesses
• Improvement in the efficiency of Australian governments
• Improved reliability, safety and security of data-technologies
• Generation of new industries, especially platform centric ones
• Improvement in the speed and effectiveness of scientific discovery.
Data61 market focus categories (in partnership with other BUs where possible):
• Safety and Security
• Health & Communities
• Future Cities
• IoT / Industrial Internet
• Agri-business
• Spatial Intelligence
• Data-driven Government
• Enterprise Services + Fintech
• Defence

Whole of CSIRO categories (see endnote 54):
• Food security and quality
• Clean energy and resources
• Health and wellbeing
• Conservation and use of our natural environment
• Innovative industries
• A safer Australia

Data61’s research in support of the scientific vision of the present document will support projects in these impact areas, and will thus find pathways to impact through them. Individual projects are responsible for analysing, shaping and articulating what those pathways and impacts will be. This needs to be done in an agile manner, adapting to opportunities, but building upon our focused scientific capability.

Endnotes

1 It is deliberately called a “vision”, and not (metaphorically) a “roadmap” – a roadmap is a two-dimensional graphical representation of something that already exists (roads), and is rarely something inspiring and exciting; at best a “science/technology roadmap” is a visual depiction of the expected temporal evolution of a technological product family (Ronald N. Kostoff and Robert R. Schaller, Science and Technology Roadmaps, IEEE Transactions on Engineering Management, 48(2), 132-143 (2001); Lianne Simonse, Jan Buijs & Erik Jan Hultink, Roadmap grounded as ‘visual portray’: Reflecting on an artifact and metaphor, Helsinki EGOS 2012 Sub-theme 09: (SWG) Artifacts in Art, Design, and Organization (2012)) which suffers by being constrained to a two-dimensional visual form. Conversely, a “vision” can be of something that does not exist, and can inspire and excite and is not constrained to fit any particular format. It tells where we want to go, and outlines in broad strokes how we might get there, without actually pinning the exact path down. It is a science vision in the general sense of the word “science” – systematised knowledge; see endnote 4. We expect to develop more traditional technology roadmaps (i.e. temporally linear expectations and plans) for particular product and service offerings which we develop.
2 At different times in computing’s evolution, either the demand (market) or the technology push side has been dominant; but it is never just one or the other; see Jan van den Ende and Wilfred Dolfsma, Technology push, demand pull and the shaping of technological paradigms – Patterns in the development of computing technology, Journal of Evolutionary Economics 15, 83-99 (2005). The reality is, of course, complex, and recombination (the mixing up of different ideas) plays an essential part (Cristiano Antonelli, Jackie Krafft, Francesco Quatraro, Recombinant Knowledge and Growth: The Case of ICTs, Structural Change and Economic Dynamics, Elsevier, 21(1), 50-69 (2010)) and the “demand-pull” model seems to be losing favor as a satisfactory explanation (Benoit Godin and Joseph P. Lane, “Pushes and Pulls”: The Hi(story) of the Demand Pull Model of Innovation, Project on the Intellectual History of Innovation, working paper No 13 (2013); Benoit Godin, Innovation Contested: The Idea of Innovation over the Centuries, Routledge (2015)).

3 The document has multiple intended audiences:
• DATA61 talent (existing and potential future) – to align what we do, to help us say “no” to opportunities that do not align, and to achieve large impact multiplicatively.
• Rest of CSIRO and external partners – to articulate our own longer term research goals to serve as one of the filters we will apply in considering engaging in joint projects.
• Wider public – to explain what we do.

4 It would be unfortunate, and unhelpful, to get hung up on the distinction between science, engineering and technology. This document presents an aspiration for the new knowledge we will create – novum scientia. While engineering knowledge is different from scientific knowledge (Walter G. Vincenti, What Engineers Know and How They Know It: Analytical Studies from Aeronautical History, The Johns Hopkins University Press (1990)) and technology is more than mere scientific knowledge (W. Brian Arthur, The Nature of Technology: What it is and How it Evolves, Simon and Schuster (2009)), the essence of engineering research (the improvement of technology) remains the production of new knowledge (Edwin T. Layton Jr, Technology as Knowledge, Technology and Culture 15(1), 31-41 (January 1974)).
The research Data61 does spans all of these headings, and more, such as “design-driven innovation” – the phrase is from Roberto Verganti’s book Design-Driven Innovation: Changing the Rules of Competition by Radically Innovating What Things Mean, Harvard Business Press (2009) – new business models, and ethnographic approaches to data technologies.

We should aspire to seek new knowledge (motivated by real problems and the desire to improve our current technologies) wherever it takes us, in the spirit of the great researchers of the past (Lisa Jardine, Ingenious Pursuits: Building the Scientific Revolution, Little Brown, London, 1999; Jenny Uglow, The Lunar Men: The Friends Who Made the Future, Faber and Faber 2002). Our inspirations and role models should be polymaths such as Robert Hooke (Lisa Jardine, The Curious Life of Robert Hooke: The Man who Measured London, Harper Collins (2003); Stephen Inwood, The Man Who Knew Too Much: The Strange and Inventive Life of Robert Hooke 1635-1703, MacMillan (2002); Robert D. Purrington, The First Professional Scientist: Robert Hooke and the Royal Society of London, Birkhauser (2009); Jim Bennett, Michael Cooper, Michael Hunter and Lisa Jardine, London’s Leonardo – The Life and Work of Robert Hooke, Oxford University Press (2003)) or Charles Babbage (Laura J. Snyder, The Philosophical Breakfast Club: Four Remarkable Friends who Transformed Science and Changed the World, Broadway Books (2011)) both of whom freely moved between science and technology.

As noted long ago (Robert P. Multhauf, The Scientist and the “Improver” of Technology, Technology and Culture 1(1), 38-47 (1959)), there is no perfect word for the improver of technology: “engineer” is widely used, but it still primarily refers to the expert practitioner and not necessarily the improver. Perhaps we, as improvers of technologies for data, should not worry whether what we do is adequately described as “science”, “engineering” or anything else, and just refer to ourselves by Hilary Cinis’ elegant neologism: “datanauts”.
5 It is common that vision statements become all-encompassing, excluding nothing. That the present vision does not aim to cover everything can be tested by comparing it to the substantially broader set of goals in Future Science – Computer Science: Meeting the Scale Challenge, Australian Academy of Science (2013), or President’s Council of Advisors on Science and Technology, Report to the President and Congress. Designing a Digital Future: Federally Funded Research and Development in Networking and Information Technology, Executive Office of the President (December 2010).

6 See John Archibald Wheeler, Information, Physics, Quantum: The Search for Links, in Proceedings of the 3rd International Symposium on the Foundations of Quantum Mechanics, Tokyo, (1989); Hector Zenil (Ed.), A computable universe: understanding and exploring nature as computation, World Scientific (2013); Rolf Landauer, Uncertainty principle and minimal energy dissipation in the computer, International Journal of Theoretical Physics 21(3/4), 283-297, (1982); Rolf Landauer, The physical nature of information, Physics Letters A, 217, 188-193 (1996); Antoine Berut et al., Experimental verification of Landauer’s principle linking information and thermodynamics, Nature 483, 187-190, (8 March 2012); Juan M. R. Parrondo, Jordan M. Horowitz and Takahiro Sagawa, Thermodynamics of Information, Nature Physics, 11, 131-139, (February 2015); Gilles Brassard, Is information the key? Nature Physics 1, 2-4, (October 2005).

7 Jean-Marie Lehn, Perspectives in Supramolecular Chemistry – From Molecular Recognition towards Molecular Information Processing and Self-Organization, Angewandte Chemie International Edition in English, 29(11), 1304–1319, (November 1990); Jean-Marie Lehn, Supramolecular chemistry – scope and perspectives – molecules – supermolecules – molecular devices, Nobel Prize Lecture, (8 December 1987).

8 John Maynard Smith, The concept of information in biology, Philosophy of Science 67(2), 177-194 (2000); confer Ladislav Kovac, Information and knowledge in biology: time for reappraisal, Plant Signalling and Behaviour 2(2), 65-73 (2007).

9 David Easley and Jon Kleinberg, Networks, crowds and markets: reasoning about a highly connected world, Cambridge University Press (2010).
10 Friedrich A. Hayek, The use of knowledge in society, The American Economic Review, 35(4), 519-530 (1945); George J. Stigler, The Economics of Information, The Journal of Political Economy 69(3), 213-225 (1961); Joseph E. Stiglitz, Information and the change in the paradigm in economics, Nobel Prize Lecture (8 December 2001).

11 Werner Callebaut and Diego Rasskin-Gutman, Modularity: Understanding the development and evolution of natural complex systems, MIT Press, (2005); Jeff Clune, Jean-Baptiste Mouret and Hod Lipson, The evolutionary origins of modularity, Proceedings of the Royal Society (series B), 280, 20122863 (2013).

12 David Lazer, Alex Pentland, Lada Adamic, Sinan Aral, Albert-Laszlo Barabasi, Devon Brewer, Nicholas Christakis, Noshir Contractor, James Fowler, Myron Gutmann, Tony Jebara, Gary King, Michael Macy, Deb Roy and Marshall Van Alstyne, Computational Social Science, Science 323, 721-723 (2009).

13 Committee on the Mathematical Sciences in 2025, Board on Mathematical Sciences and Their Applications, Division on Engineering and Physical Sciences, National Research Council of the National Academies, The Mathematical Sciences in 2025, The National Academies Press, (2013).

14 Cristian S. Calude (Ed), Randomness and Complexity: From Leibniz to Chaitin, World Scientific, (2007).

15 Richard G. Lipsey, Kenneth I. Carlaw and Clifford T. Bekar, Economic Transformations: General Purpose Technologies and Long-Term Economic Growth, Oxford University Press (2005).

16 Robert C. Williamson, Michelle Nic Raghnaill, Kirsty Douglas and Dana Sanchez, Technology and Australia’s future: New technologies and their role in Australia’s security, cultural, democratic, social and economic systems, Australian Council of Learned Academies, September 2015.

17 National Science and Technology Council, Networking and Information Technology Research and Development Subcommittee, The National Artificial Intelligence Research and Development Strategic Plan, (October 2016).

18 These complement other broader principles underpinning everything we do, such as national benefit; see the Data61 operating model document.
19 “We” here refers to the broader Data61+ network. This principle implies avoiding NIH (Not Invented Here) syndrome; we do not need to invent everything ourselves. We should focus on the things that we, and we alone, can do; and then network with others in a rich and complex manner. It would be supremely ironic if our organisation that underpins the information society does not embrace all of its implications (Manuel Castells, The Rise of the Network Society (2nd Edition), Wiley-Blackwell (2010)).

20 The word is pinched from a suitably inspiring institution: The MIT Media Lab, which so describes itself https://www.media.mit.edu/about. The principle, of course, implies much collaboration with other disciplines, but goes beyond the traditional “multi-disciplinary” to a stronger problem-oriented perspective – “There are no subject matters; no branches of learning – or, rather, of inquiry: only problems and the urge to solve them. A science such as botany or chemistry … is, I contend, merely an administrative unit” (Karl Popper, Realism and the Aim of Science, Rowman and Littlefield (1983)). Such a stance implies widespread collaboration without fear of crossing boundaries. It does not imply a lack of “canon” or core; our canon is primarily that of cybernetics broadly construed.

21 This viewpoint is given the fancy name of “technological determinism” with the concomitant fear of “autonomous technology” (Langdon Winner, Autonomous technology: Technics-out-of-control as a theme in political thought, MIT Press, 1978). The counter is that technologies can be, and are, shaped by society. The reality is that while technologies do indeed have “momentum” (Thomas P. Hughes, "The evolution of large technological systems." Pages 51-82 in Wiebe E. Bijker et al. (eds), The social construction of technological systems: New directions in the sociology and history of technology (1987)) and “drive history” (Merritt Roe Smith and Leo Marx, Does technology drive history? The dilemma of technological determinism, MIT Press (1994)) there remains a huge freedom of choice in terms of how they are used and their precise form. Like all technologies of the past, technologies for data can also be shaped for social and national benefit.
22 Russell Hardin, Trust and Trustworthiness, Russell Sage Foundation, New York, (2002); Francis Fukuyama, Trust: The Social Virtues and the Creation of Prosperity, Simon and Schuster (1995); Eric M. Uslaner, The Moral Foundations of Trust, Cambridge University Press (2002). An excellent short summary of the social side of trust is chapter 21 of Jon Elster, Explaining Social Behaviour: More Nuts and Bolts for the Social Sciences, Cambridge University Press (2007). People’s trust in technology is a complex matter (Karen Clarke, Gillian Hardstone, Mark Rouncefield and Ian Sommerville, Trust in Technology: A Socio-Technical Perspective, Springer (2006); Meinolf Dierkes and Claudia von Grote (eds), Between Understanding and Trust: The Public, Science and Technology, Routledge (2000)); and trust in technological experts (as opposed to the technology itself) is surprisingly weakly correlated with perceptions of risk (Lennart Sjoberg, Limits of Knowledge and the Limited Importance of Trust, Risk Analysis 21(1), 189-198 (2001)).

23 In the sense of George Lakoff and Mark Johnson, Metaphors We Live By, The University of Chicago Press (1980) – not as a mere rhetorical flourish, but as an essential way in which to make sense of what we do.

24 Trust is a very complex notion, and means different things to different people (D. Harrison McKnight and Norman L. Chervany, The Meanings of Trust, University of Minnesota, (1996); Donna M. Romano, The Nature of Trust: Conceptual and Operational Clarification, PhD thesis, Louisiana State University (2003)).

The complexity is illustrated as follows:

Trust has not only been described as an “elusive” concept, but the state of trust definitions has been called a “conceptual confusion”, a “confusing potpourri”, and even a “conceptual morass”. For example, trust has been defined as both a noun and a verb, as both a personality trait and a belief, and as both a social structure and a behavioral intention. Some researchers, silently affirming the difficulty of defining trust, have declined to define trust, relying on the reader to ascribe meaning to the term. (D. Harrison McKnight and Norman L. Chervany, Trust and Distrust Definitions: One Bite at a Time, in R. Falcone, M. Singh, and Y.-H. Tan (Eds.): Trust in Cyber-societies, LNAI 2246, pp. 27–54, Springer-Verlag (2001)).
Perhaps, like “culture” (confer Kroeber’s 164 definitions of culture: Alfred L. Kroeber and Clyde Kluckhohn, Culture: A critical review of concepts and definitions, Peabody Museum of American Archeology and Anthropology, (1952)) or “technology” (confer Robert C. Williamson, Michelle Nic Raghnaill, Kirsty Douglas and Dana Sanchez, Technology and Australia’s future: New technologies and their role in Australia’s security, cultural, democratic, social and economic systems, Australian Council of Learned Academies, (September 2015)), it makes little sense to attempt to define trust, but rather we should focus upon the technological and scientific problems we want to solve (as done in the main text).

Attempts to formalise the notion of trust as a concept in computing go back at least 20 years (Stephen Paul Marsh, Formalising Trust as a Computational Concept, PhD thesis, University of Stirling, (1994)), with conferences on the topic starting over a decade ago (Sokratis Katsikas, Javier Lopez and Gunther Pernul (eds), Trust and Privacy in Digital Business: First International Conference, Trustbus 2004, Springer (2005); Thorsten Holz and Sotiris Ioannidis, Trust and Trustworthy Computing: 7th International Conference TRUST 2014, Springer (2014)).
One reason for the complexity is the many threats to trust (in the same way there are many threats to security, which need to be explicitly declared or modelled: Adam Shostack, Threat Modelling: Designing for Security, Wiley (2014)). But primarily the complexity comes simply from the diverse elements to trust in data-centric systems including, but not limited to:
• Trust in the reliability of software (never absolute: see Donald MacKenzie, Mechanizing Proof: Computing, Risk and Trust, MIT Press (2001); Juan C. Bicarregui and Brian M. Matthews, Proof and Refutation in Formal Software Development, 3rd Irish Workshop on Formal Methods (1999));
• Trust in security (e.g. Jeffrey J. P. Tsai, Philip S. Yu (eds), Machine Learning in Cyber Trust: Security, Privacy, and Reliability, Springer (2009));
• Trust in data management (Milan Petkovic and Willem Jonker (eds), Security, Privacy, and Trust in Modern Data Management, Springer (2007));
• Trust in the credibility of information, such as which scientific results one can rely upon (Christine L. Borgman, Scholarship in the Digital Age: Information, Infrastructure and the Internet, MIT Press (2007)) and what sensor measurements one can trust (J. C. Wallis, C. L. Borgman, Matthew Mayernik, Alberto Pepe, Nithya Ramanathan and Mark Hansen, Know thy Sensor: Trust, Data Quality, and Data Integrity in Scientific Digital Libraries, 11th European Conference on Research and Advanced Technology for Digital Libraries, September 16–21, 2007, Budapest, Hungary (2007)). This is already front-of-mind in work such as “bees with backpacks” that Data61 has done. It is hardly a new concern – the (apparently simple) notion of a scientific measurement is deeply entangled with notions of trust, as is evident from the history of Victorian science (Graeme J. N. Gooday, The Morals of Measurement: Accuracy, Irony, and Trust in Late Victorian Electrical Practice, Cambridge University Press (2004));
• Trust that social mechanisms built with data-technologies cannot be manipulated (see Eric Friedman, Paul Resnick and Rahul Sami, Manipulation-Resistant Reputation Systems, Chapter 27 in Noam Nisan, Tim Roughgarden, Eva Tardos and Vijay V. Vazirani, Algorithmic Game Theory, Cambridge University Press (2007));
• Trust that sensitive information is not leaked (Guillermo Lafuente, The big data security challenge, Network Security 2015, 12-14 (2015));
• Trust that data-analytics are fair (Solon Barocas and Andrew D. Selbst, Big data's disparate impact, California Law Review 104 (2016); Danah Boyd and Kate Crawford, Six provocations for big data, in A decade in internet time: Symposium on the dynamics of the internet and society (pp. 1-17), Oxford Internet Institute, (September 2011));
• Trust in the communication system underpinning data technologies (White House: "Cyberspace policy review: Assuring a trusted and resilient information and communications infrastructure." White House, United States of America (2009)). There is no perfectly trustable communication system, and so, like all other elements of the trust chain, a risk-sensitive approach will be warranted;
• Trust that the overall systems constructed can be sufficiently relied upon (Piotr Cofta, Trust, Complexity and Control: Confidence in a Convergent World, John Wiley and Sons (2007)).

25 The phrase alludes to an admirable novel about two famous scientists who are further (in addition to Hooke and Babbage – see endnote 4) great role models for Data61 – Alexander von Humboldt and Carl Friedrich Gauss (Daniel Kehlmann, Measuring the World, Pantheon (2006)). Humboldt is one of the most important creators of modern science, who undertook outstandingly painstaking data gathering and analysis (Andrea Wulf, The Invention of Nature: The Adventure of Alexander von Humboldt, Lost Hero of Science, John Murray, (2015)). Gauss is famously credited as the originator of least squares data analysis (Stephen M. Stigler, Gauss and the invention of least squares, The Annals of Statistics, 9(3), 465-474 (1981)) and thus one of the fathers of modern data analytics.
In an earlier version of this document, I used the awkward polysyllabic neologism “datafication”, apparently coined in the article by Kenneth Cukier and Viktor Mayer-Schoenberger: The Rise of Big Data, Foreign Affairs 28–40, May/June, (2013). It is already widely used, but it is an ugly word that many Data61 folks reacted negatively to, and, crucially, it misses the distinction between data and capta (see below).

26 This distinction is quite old, but rarely used. See Rob Kitchin, The Data Revolution: Big data, open data, data infrastructures and their consequences, Sage, Los Angeles (2014); this explains some of the history of the word; Christopher Chippindale, Capta and data: on the true nature of archaeological information, American Antiquity 65(4), 605-612 (2000); Bettina Berendt, Big Capta, Bad Science? On two recent books on “Big Data” and its revolutionary potential, Department of Computer Science, KU Leuven, https://people.cs.kuleuven.be/~bettina.berendt/Reviews/BigData.pdf (March 2015).

27 Quoted from the entry for captus in A Latin Dictionary. Founded on Andrews' edition of Freund's Latin dictionary, revised, enlarged, and in great part rewritten by Charlton T. Lewis, Ph.D. and Charles Short, LL.D., Oxford, Clarendon Press (1879).

28 The traditional view is widespread; e.g. Paul Cooper, Data, information and knowledge, Anaesthesia and Intensive Care Medicine, 11(12), 505-506 (2010).

29 Ashley Braganza, Rethinking the data-information-knowledge hierarchy: towards a case based model, International Journal of Information Management, 24, 347-356 (2004); Ilkka Tuomi, Data is more than Knowledge: Implications of the Reversed Knowledge Hierarchy for Knowledge Management and Organizational Memory, Journal of Management Information Systems 16(3), 103-117 (1999).

30 It is sometimes claimed to be a clearer distinction than it really is: Sreenivas Rangan Sukumar, Machine learning for data-driven discovery: thoughts on the past, present and future, Oak Ridge National Laboratory, (2014).

31 Tony Hey, Stewart Tansley and Kristin Tolle, The Fourth Paradigm: Data-intensive scientific discovery, Microsoft Research, (2009).
32 Alon Halevy, Peter Norvig and Fernando Pereira, The Unreasonable Effectiveness of Data, IEEE Intelligent Systems Magazine, 8-12 (March/April 2009).

33 Carole Goble and David De Roure, The impact of workflow tools on data-centric research, http://www.myexperiment.org/files/215/download/workflows-v8-05May2009.pdf (May 2009).

34 David De Roure and Carole Goble, Anchors in Shifting Sand: The Primacy of Method in the Web of Data, Web Science Conference, (April 2010).

35 This (entirely wrong) phrase is due to Chris Anderson: “The end of theory: the data deluge makes the scientific method obsolete,” Wired (23 June 2008). It does no such thing! It simply allows for more sophisticated models.

36 Sean Bechhofer, Iain Buchan, David De Roure, Paolo Missier, John Ainsworth, Jiten Bhagat, Philip Couch et al., Why linked data is not enough for scientists, Future Generation Computer Systems 29(2), 599-611, (2013).

37 Ludwik Fleck, Genesis and Development of a Scientific Fact, University of Chicago Press (1979); Bruno Latour and Steve Woolgar, Laboratory Life: The Construction of Scientific Facts, Sage Publications (1979); Karl Popper, The Logic of Scientific Discovery, Hutchinson, (1959).

38 Geoffrey C. Bowker, Memory Practices in the Sciences, MIT Press, (2005).

39 Mark Stalzer and Chris Mentzel, A preliminary review of influential works in data-driven discovery, SpringerPlus 5:1266, (August 2016).

40 There are other reasons that serve to push for embedding analytics, especially latency and bandwidth limitations.

41 William N. Dunn (ed), The Experimenting Society: Essays in honour of Donald T. Campbell, Transaction Publishers, (1997); Donald T. Campbell, Methods for the Experimenting Society, American Journal of Evaluation 12, 223-260, (1991); Donald T. Campbell, Reforms as Experiments, American Psychologist, 24, 409-429, (1969).

42 As explained elsewhere in this document, such a phrase (“universal captafication”) does not imply it is done once, without a theoretical stance, and the data “speak for themselves.” What is meant here is simply the push towards more pervasive (hence approaching “universal”) translation of the data in the world into capta that can be manipulated.
43 “Delivered” in the title of this headline is the right word – we propose to change the delivery modality, and to actually build systems that literally deliver the results.

44 Confer strategy 4 of the National Science and Technology Council, Networking and Information Technology Research and Development Subcommittee, The National Artificial Intelligence Research and Development Strategic Plan, October 2016: it articulates the need for explainable and transparent systems that are trusted by their users, perform in a manner that is acceptable to the users, and can be guaranteed to act as the user intended.

45 Patrick McDaniel et al., Towards a Secure and Efficient System for End-to-End Provenance, 2nd Workshop on the Theory and Practice of Provenance (2010).

46 Data technologies are made up of hardware and software, the boundary of which is somewhat blurred. Our primary (but not exclusive) focus here is on the software because it is with regard to that that we have a global competitive advantage. One could use the more general phrase “systems you can trust” but that misses the specificity that I currently have. And all of the research I am alluding to here is indeed on software.

47 Jason Furman, Is this Time Different? The Opportunities and Challenges of Artificial Intelligence, Remarks at AI Now: The Social and Economic Implications of Artificial Intelligence Technologies in the Near Term, New York University, (July 7, 2016).

48 National Science and Technology Council, Networking and Information Technology Research and Development Subcommittee, The National Artificial Intelligence Research and Development Strategic Plan, (October 2016).

49 “Scientific” is meant in the broad sense described in endnote 4.

50 E.g. Nathan Rosenberg and L. E. Birdzell Jr., How the West Grew Rich: The Economic Transformation of the Industrial World, Basic Books (1986).

51 Huw T. O. Davies, Sandra M. Nutley and Peter C. Smith, What Works? Evidence-based policy and practice in public services, The Policy Press (2000).

52 Pleiotropy (genetically), or non-injectivity of the inverse map (mathematically).

53 Genetic heterogeneity or non-injectivity of the forward map.
54 Elizabeth Eastland, Future Australia – Market Vision: Unlocking a more prosperous and sustainable future for all Australians, Powerpoint presentation (2 November 2016).

CONTACT US
t 1300363400
+61395452176
e [email protected]
w www.data61.csiro.au

AT CSIRO WE SHAPE THE FUTURE
We do this by using science and technology to solve real issues. Our research makes a difference to industry, people and the planet.

FOR FURTHER INFORMATION
Bob Williamson
Chief Scientist, Data61
t +61262183712
m +61404053877
e [email protected]
w www.data61.csiro.au

Adrian Turner
CEO, Data61
t +6193724202
m +61475981219
e [email protected]
w www.data61.csiro.au