Unlocking the UK’s Data Science Potential with the Alan Turing Institute
Anthony Lee
Strategic Programme Director
Intel—Turing Strategic Partnership
The Alan Turing Institute
The UK’s National Institute for Data Science
‘We will found the Alan Turing Institute
to ensure Britain leads the way again in
the use of big data and algorithm research’
George Osborne
Budget Speech, March 2014
Faculty Fellows
Spread of expertise across the Joint Venture Universities
[Bar chart: count per area of expertise]
•  Other: 20
•  Applications of data science: 36
•  Digital humanities: 8
•  Ethics for data science: 8
•  Social data science: 21
•  Pure and applied mathematics: 12
•  Statistical methodology: 30
•  Statistical theory: 18
•  Privacy and security: 8
•  Systems: 9
•  Data-centric engineering: 5
•  Architectures: 5
•  Algorithms: 16
•  Numerical methods and optimization: 12
•  Machine learning: 33
Strategic priorities
Strategic partnerships
•  Data-centric engineering
•  Defence & security
•  High-performance computing & data analytics
•  Health & life science
•  Economics & finance
Strengths of the Institute:
Bridging the gap between industry and academia
We connect academics with real-world industry problems
A research community without disciplinary boundaries
Many academic disciplines, plus a team of software engineers
and industry partners, all working together in a shared space
to drive data science
National leadership
A national institute operating in a complex ecosystem, with a
mandate to provide leadership in this emerging science
Why “Big Data”?
•  We want computers to recognize whether an
image should be assessed by a pathologist.
•  Building an effective algorithm from scratch
using information about biology, imaging
equipment, the means of digitization, etc. is
almost impossible.
•  Instead, we can take a statistical approach:
learn a function mapping images to “yes” or
“no” by considering huge numbers of labelled
examples.
•  One very popular approach now is to model
functions using deep neural networks.
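The statistical approach above can be sketched in a few lines. This is a minimal illustration, not the method used in any particular pathology system: it uses logistic regression rather than a deep neural network, and the "images" are synthetic 16-dimensional feature vectors with labels generated from a hidden linear rule. The names `train_logistic` and `predict`, the learning rate, and the step count are all assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, lr=0.5, steps=500):
    """Fit weights w and bias b by gradient descent on the mean log-loss."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        p = sigmoid(X @ w + b)        # predicted probability of "yes"
        w -= lr * (X.T @ (p - y) / n) # gradient step for the weights
        b -= lr * np.mean(p - y)      # gradient step for the bias
    return w, b

def predict(X, w, b):
    """Return 1 ("yes") or 0 ("no") for each row of X."""
    return (sigmoid(X @ w + b) >= 0.5).astype(int)

# Synthetic stand-in for labelled images (assumption: 16 features per example).
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 16))
hidden_rule = rng.normal(size=16)
y = (X @ hidden_rule > 0).astype(float)

# Learn from 1500 labelled examples, then check on 500 unseen ones.
w, b = train_logistic(X[:1500], y[:1500])
accuracy = np.mean(predict(X[1500:], w, b) == y[1500:])
print(f"held-out accuracy: {accuracy:.3f}")
```

Nothing about the hidden rule is given to the learner; it is recovered only from the labelled examples, which is the point of the bullet above.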
Why “High Performance Computing”?
•  In order to successfully learn a good function, one needs to process
millions of examples.
•  The algorithms used to train are computationally intensive: some of the
recent resurgence of interest in neural networks is due to the emergence
of many-core processors, like graphics processing units and Intel’s Xeon
Phi series of processors.
•  In many situations, one will have data collected by wearable devices and
processed or analyzed in data centers.
•  In order to maximize efficiency, HPC systems need to be optimized for
the types of algorithms that are run, and take all aspects into account:
processors, memory, network, etc.
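One reason many-core processors suit training can be shown with a small sketch. For a loss that is a sum over examples, the gradient is also a sum over examples, so the data can be split into chunks that are processed independently and then combined. The example below assumes a linear model with log-loss and simulates 8 workers sequentially; `logloss_grad` and `n_workers` are hypothetical names, and a real system would run the chunks on separate cores or devices.

```python
import numpy as np

def logloss_grad(X, y, w):
    """Sum of per-example log-loss gradients for a linear model."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return X.T @ (p - y)

rng = np.random.default_rng(1)
X = rng.normal(size=(10_000, 32))
y = (rng.random(10_000) < 0.5).astype(float)
w = rng.normal(size=32) * 0.1

# Gradient computed over the whole dataset in one go.
full = logloss_grad(X, y, w)

# Same gradient computed chunk by chunk, as if each chunk went to its own core.
n_workers = 8
chunks = zip(np.array_split(X, n_workers), np.array_split(y, n_workers))
partial = [logloss_grad(Xc, yc, w) for Xc, yc in chunks]
combined = np.sum(partial, axis=0)

print(np.allclose(full, combined))
```

The only communication needed is the final sum of the partial gradients, which is why memory and network behaviour matter alongside raw processor speed.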
Why is this so important now?
•  We are at a critical point where data science is becoming increasingly
important to society, and requires significant computational resources.
•  Classical statistics: simple model. Collecting data is what is hard and
inference is trivial. Emphasis is on statistical rather than computational
efficiency.
•  Modern statistics / machine learning: sophisticated models for complex
data. Data still expensive and model hard to derive. Inference is
nontrivial and computationally challenging. Emphasis starts to shift from
statistical to computational efficiency.
•  Huge datasets: data is plentiful and we want to answer fairly difficult
questions. Models are flexible and inference is very computationally
expensive. Motivates / necessitates algorithm & architecture co-design to
improve speed and energy consumption.
turing.ac.uk
@turinginst
28/01/17