Innovation in a Complex World: Examples and Challenges
www.microsoft.com/science
Dr Daron Green
Senior Director, Microsoft Research

Overview
• Context
• Innovation in action
  – Data deluge
  – Data visualization
  – Data sharing
• Challenges/impediments
  – Things we haven’t worked out
  – What’s stopping us making progress
  – Areas of concern

Microsoft Research At A Glance
• Redmond, Washington (Sep 1991)
• San Francisco, California (Jun 1995)
• Cambridge, United Kingdom (July 1997)
• Beijing, China (Nov 1998)
• Silicon Valley, California (July 2001)
• Bangalore, India (Jan 2005)
• Cambridge, Massachusetts (July 2008)

Microsoft Research Mission Statement
• Expand the state of the art in each of the areas in which we do research
• Rapidly transfer innovative technologies into Microsoft products
• Ensure that Microsoft products have a future

Context: Science @ Microsoft
• Earth Sciences
• Multidisciplinary Research
• Computer & Information Sciences
• Life Sciences
• Social Sciences
• New Materials, Technologies & Processes
• Math and Physical Science

A Data Deluge in Science
• Data collection
  – Sensor networks, satellite surveys, high throughput laboratory instruments, astronomical telescopes, supercomputers, LHC …
• Data processing, analysis, visualization
  – Legacy codes, workflows, data mining, indexing, searching, graphics …
• Archiving
  – Digital repositories, libraries, preservation, …
[Figure: SensorMap. Functionality: map navigation. Data: sensor-generated temperature, video camera feed, traffic feeds, etc.]
[Figure: Scientific visualizations. NSF Cyberinfrastructure report, March 2007]

Emergence of a New Research Paradigm?
• Thousand years ago – Experimental Science
  – Description of natural phenomena
• Last few hundred years – Theoretical Science
  – Newton’s Laws, Maxwell’s Equations…
• Last few decades – Computational Science
  – Simulation of complex phenomena
• Today – eScience or Data-centric Science
  – Unify theory, experiment, and simulation
  – Using data exploration and data mining
    • Data captured by instruments
    • Data generated by simulations
    • Data generated by sensor networks
Scientists are overwhelmed with data… Computer scientists and IT companies have technologies that will help them innovate.
[Equation shown on slide: $\left(\frac{\dot{a}}{a}\right)^2 = \frac{4\pi G\rho}{3} - K\frac{c^2}{a^2}$]

Implications
• Data management along the research pipeline:
  • Capture (inc metadata)
  • Processing
  • Visualization
  • Storage
  • Retrieval
  • Sharing
  • Publication
  • Archival

Handling the data deluge…
Three examples:
• Machine Learning and HIV/AIDS research
• Advanced Database technologies and Environmental Science
• Oceanographic Workflows

Fighting HIV with Computer Science
• A major problem: over 40 million infected
  – Drug treatments are effective but are an expensive life commitment
• Vaccine needed for third world countries
  – An effective vaccine could eradicate the disease
• Methods from computer science are helping with the design of a vaccine
  – Machine learning: finding biological patterns that may stimulate the immune system to fight HIV
  – Optimization methods: compressing these patterns into a small, effective vaccine

Computational Biology Web Tools
Better vaccine design through improved understanding of HIV evolution
Goals
• Use machine learning and visualization tools developed at Microsoft, which require HPC, to build maps of within-individual evolution of HIV
Progress so far
• Discovered ‘decoy epitopes’ that could have predicted the recent failure of the Merck vaccine
• Algorithms and medical results published in Science and Nature Medicine
• MSR Computational Biology Tools published (Open Source on CodePlex)

Handling the data deluge…
Three examples:
• Machine Learning and HIV/AIDS research
• Advanced Database technologies and Environmental Science
• Oceanographic Workflows

Carbon-Climate Data
• What is the role of photosynthesis in global warming?
  – Measurements of CO2 in the atmosphere show 16-20% less than emissions estimates predict
  – The difference is due either to plants or to ocean absorption
• Communal field science: each investigator acts independently
• Cross-site studies and integration with modeling are increasingly important
[Figure: LaThuile_NEE (gC m-2 yr-1) plotted against Pub_NEE (gC m-2 yr-1)]

Ameriflux Data
In collaboration with Berkeley Water Center
• 149 Ameriflux sites across the Americas reporting a minimum of 22 common measurements
• Carbon-Climate Data published to and archived at Oak Ridge
• Total data reported to date on the order of 192M half-hourly measurements since 1994
• SharePoint site www.fluxnet.org
  – 921 site-years of data from 240 sites around the world; 80+ site-years now being added
  – 60+ paper writing teams
  – American data subset is public and served more widely
  – Summary data products greatly simplify initial data discovery
• Used modern relational database technologies
  – Scientists can access data through Data Cubes
  – Allows simple data viewing without the need for knowledge of SQL

Scientific Data Servers for Hydrology
[Figure: Ameriflux Data Availability: All Data, by site (Brazil, Canada and USA) and year, 1991 to 2006]
[Figure: Mashup of Ameriflux Sites]

Handling the data deluge…
Three examples:
• Machine Learning and HIV/AIDS research
• Advanced Database technologies and Environmental Science
• Oceanographic Workflows

Trident – Scientific Workbench
Trident Scientific Workflow Workbench
What it provides to the scientists
• Visually program workflows, through a web browser.
• Libraries of activities and workflows, to save and reuse workflows.
• Abstract parallelism for HPC, to test on desktop and then run on cluster.
• Adaptive workflows, to detect and respond to events in real-time.
• Automatic provenance capture, for all workflows and data products.
• Costing model, estimating the resources required to run a workflow.
• Integrated data storage and access, allowing researchers to store data in a SQL database, in local files or in the cloud (Microsoft SDS, Amazon S3).
• Fault tolerance, to facilitate smart reruns and what-if analysis.
• Reproducible research

However… Challenges/Impediments
• Three dominant issues:
  – People: lack of alignment in benefits, incentives and budget…or, put another way, the way we respond to money, process, metrics, measurement and recognition…
  – Technology: Transition to many/multi-core
  – Privacy: risk of exposing personal information

Remote management of long-term conditions
The underlying challenge…
• Thousands of successful(?) pilots but none ‘make it big’
• Many, many papers published
• It has been shown† that:
  – Largely no motivation for adoption by health practitioners because there is…
  – no alignment of benefits, incentives and budgets
• Or, stated another way, it is dangerous to assume people will adopt an innovation just because it is ‘obviously’ the right thing to do.
• Consider the whole context for the innovation (people, money, metrics, reward structures, process, skills etc.); it’s not just the technology.
• Sometimes the key innovation is in the business design

† Dr Daron G Green and Prof Terry Young, “Value Propositions for Information Systems in Healthcare”, HICSS: Proceedings of the 41st Annual Hawaii International Conference on System Sciences, p257, 2008

Challenges/Impediments
• Three dominant issues:
  – People: lack of alignment in benefits, incentives and budget…or, put another way, the way we respond to money, process, metrics, measurement and recognition…
  – Technology: Multi-Core Transition
  – Privacy: inadvertently exposing personal information

CPU Architecture
• Heat becoming an unmanageable problem
[Figure: power density (W/cm2) of Intel processors from the 4004 to the Pentium®, 1970 to 2010, compared with a hot plate, a nuclear reactor, a rocket nozzle and the Sun’s surface. Source: Intel Developer Forum, Spring 2004, Pat Gelsinger]

The End of Moore’s Law as We Know It
• Future of silicon chips
  – “100’s of cores on a chip in 2015” (Justin Rattner, Intel)
• Challenge for the IT industry and the Computer Science community
  – How can we make parallel computing on a chip easy for developers of consumer applications?
• Challenge for the Scientific Community
  – How will the Multi-Core transition affect scientific computing?

Challenges/Impediments
• Three dominant issues:
  – People: lack of alignment in benefits, incentives and budget…or, put another way, the way we respond to money, process, metrics, measurement and recognition…
  – Technology: Multi-Core Transition
  – Privacy: inadvertently exposing personal information

Challenge: Data for Open Innovation
• With web users becoming producers of information…
• We leave the footprint of our lives in digital trails…
• It is becoming easier for “data snoopers” to reconstruct the identity of an individual or an organization by cross-linking information from different sources.

A face is exposed for searcher no. 4417749
• “Search query data can contain the sum total of our work, interests, associations, desires, dreams, fantasies, and even darkest fears.”
The New York Times, Aug 2006: Thelma Arnold's identity was betrayed by the records of her Web searches

Online Privacy
• We leave our traces online at multiple sites such as social networks, blogs, forums etc.
  – Users can be re-identified by linking movie mentions in forums to user ratings of movies [Frankowski’06]
• However, researchers seek to gain insights and undertake experiments with real-world data, and businesses need tools and analysis to understand market trends and needs…

In need of a framework for open innovation
• Research and innovation are inhibited by the lack of a framework to disseminate information in a safe way
• Open innovation roadblocks due to shortcomings in
  – Data confidentiality/privacy
  – Different data regulations per country
• More research needed on technical (semantics), legal and societal solutions and processes to enable open innovation in an information-based society

Challenges/Impediments
• Three dominant issues:
  – People: lack of alignment in benefits, incentives and budget…what is the business design that underpins your innovation?
  – Technology: Multi-Core Transition…just how will this work out?
  – Privacy: inadvertently exposing personal information…what personal/business risks are we prepared to accept?
Context: Science @ Microsoft
• Earth Sciences
• Multidisciplinary Research
• Computer & Information Sciences
• Life Sciences
• Social Sciences
• New Materials, Technologies & Processes
• Math and Physical Science
www.microsoft.com/science

Starting point
Comprehensive analysis of:
- NHS Stakeholder vs benefit
- NHS Stakeholder vs incentives
- NHS Stakeholder vs budget availability
- defining the scope of the service
1) BT originally tried to sell here
2) …then we aspired to be here…
3) …and needed to understand what functionality/value was required

Simplified benefits
Plays into political agenda:
- Access
- Choice
- Increased private sector involvement in patient care
- New role of pharmacies
No significant benefit to these care providers
PCT sees benefit and dis-benefit:
- Benefits of service are extremely diffuse
- Medication and strips costs ↑
- GP visits and A&E admissions ↓ over time
- Compliance increases: Yr 1 <£10k benefit growing to £225k by Yr 10 (payback over very long timescales)
- Near term: BT CDM solution roughly cash neutral to PCT
Patients clearly benefit provided they are motivated to use the service

Incentives summary
• Incentives dominated by financial imperatives
• Current incentives operate against adoption of service
• Implementation of service largely irrelevant given current incentives
• Requires regular updates to ensure personal motivation

Budget availability summary
• Lack of incentives and appropriate metrics leads to no real acknowledgement of the problem and no defined budget
• Patients see costs for diabetes (and other LTCs) as being the responsibility of the NHS

Summary overlay [benefits/incentives/budget]
• Accrual of benefits at upper levels in NHS/DoH encourages a national-scale service...however, all management of long-term conditions is devolved to ‘lower’ levels of the NHS
• Alignment of benefits, incentives and budget availability does not appear at lower levels of the stakeholder stack. This explains why many hospital/PCT/SHA pilots and other initiatives in this area have failed.
This is a ‘no profit zone’ for a CDM service in the UK.