From e-Science to Publication@Source
Jeremy Frey
School of Chemistry, University of Southampton, UK
Comb-e-Chem, Sept 2003
[Title-slide image collage: Grid in a combinatorial laboratory; X-ray single crystal; Mol; Raman; STM; Ocean; Monolayer]
e-Science
• ‘e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it.’
• ‘e-Science will change the dynamic of the way science is undertaken.’
John Taylor, Director General of the UK Office of Science and Technology (OST)
• ‘[The Grid] intends to make access to computing power, scientific data repositories and experimental facilities as easy as the Web makes access to information.’
Tony Blair, 2002
19 Feb 2004 OAI Meeting
Jeremy G. Frey & Mike Hursthouse
The Collaboratory Concept
• In 1989, William Wulf, then with the U.S. National Science Foundation, defined a collaboratory as "a center without walls, in which the nation's researchers can perform their research without regard to geographical location, interacting with colleagues, accessing instrumentation, sharing data and computational resources, and accessing information in digital libraries."
The Comb-e-Chem Project
• The exponential world of combinatorial synthesis and high-throughput analysis meets the exponentially growing power of computing
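To make "exponential" concrete: a library built by choosing one building block at each of k positions, with n blocks available per position, has n^k members. The sketch below enumerates a small library as a Cartesian product; the building-block names are hypothetical placeholders, not Comb-e-Chem data.

```python
from itertools import product
from math import prod


def enumerate_library(positions: list[list[str]]) -> list[tuple[str, ...]]:
    """Every combination that takes one building block from each position."""
    return list(product(*positions))


def library_size(blocks_per_position: int, positions: int) -> int:
    """Library size grows exponentially with the number of positions: n ** k."""
    return blocks_per_position ** positions


# Hypothetical three-position library with 3 x 2 x 2 building blocks.
positions = [["A1", "A2", "A3"], ["B1", "B2"], ["C1", "C2"]]
library = enumerate_library(positions)
print(len(library), prod(len(p) for p in positions))  # 12 12
print(library_size(10, 3), library_size(10, 6))  # 1000 1000000
```

Doubling the number of positions from three to six, with ten blocks each, takes the library from a thousand to a million compounds, which is why the analysis side needs matching growth in computing.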
Comb-e-Chem Partners
• IBM
• IT Innovation
• NCS
• CCDC
• ECS
• Chemistry
• Stats
• Combi Centre
• Pfizer
• Bristol Chemistry
• GSK
• Southampton
• AZ
• IUPAC
• RSC
The Comb-e-Chem Vision
[Diagram elements: Structure + Properties; Structures DB; Automation & remote interaction; Knowledge + Prediction; Properties DB; Simulation and calculation; Co-Laboratory: interaction between users & “Dark Labs”]
Comb-e-Chem Project Automation
[Diagram elements: X-Ray e-Lab (diffractometer); Properties e-Lab; video; simulation; properties analysis; Structures Database; Grid]
Scientist at the Centre of an Information Web
Access is variable and difficult
[Diagram: the scientist connects separately to HPC, experiment, analysis, computing and storage resources]
The Future
The Grid Model: Information Utilities
[Diagram: the scientist reaches experiment, analysis, computing and storage resources through a common MIDDLEWARE layer that gives uniform access]
Remember that you contribute to other people’s information web.
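The "uniform access" idea can be sketched as one interface that every resource implements, with a middleware layer that routes requests by name, so the scientist uses a single call style whatever sits behind it. This is a minimal illustration under stated assumptions: the class names, the `handle` method and the request fields are hypothetical, not a real Grid middleware API.

```python
from abc import ABC, abstractmethod


class Resource(ABC):
    """Hypothetical common interface that every resource exposes via the middleware."""

    @abstractmethod
    def handle(self, request: dict) -> dict: ...


class Storage(Resource):
    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def handle(self, request: dict) -> dict:
        if request["op"] == "put":
            self._objects[request["key"]] = request["data"]
            return {"ok": True}
        # Any other op is treated as a read.
        return {"ok": True, "data": self._objects.get(request["key"])}


class Computing(Resource):
    def handle(self, request: dict) -> dict:
        # Local stand-in for a remote job: apply a function to the inputs.
        return {"ok": True, "result": request["fn"](*request["args"])}


class Middleware:
    """Routes every request by resource name behind one uniform call."""

    def __init__(self) -> None:
        self._resources: dict[str, Resource] = {}

    def register(self, name: str, resource: Resource) -> None:
        self._resources[name] = resource

    def request(self, name: str, **fields) -> dict:
        return self._resources[name].handle(fields)


mw = Middleware()
mw.register("storage", Storage())
mw.register("compute", Computing())
mw.request("storage", op="put", key="run-1", data=b"frames")
print(mw.request("compute", fn=sum, args=([1, 2, 3],))["result"])  # 6
```

Adding an experiment or analysis service would mean writing one more `Resource` subclass; nothing on the scientist's side changes.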
End-to-end connectivity
• Provide a smooth connection between the sources of data & information
• From literature to the laboratory bench and back, via all stages of analysis and discussion
• Hence the need for a Data Grid or Grids
• All steps need to be Grid aware
Generate information within & for the grid context
[Diagram elements: Goal; Literature; Knowledge; Plan & COSHH; Smart Laboratory; Synthesis; Analysis; Digital Model; Information Integration; Report]
Variety of data
[Figure: a molecular structure drawing beside a spectrum plotted as intensity (0 to 1.2) against an axis running from 1 to 239]
The Grid
• The Grid is needed because of:
– Complexity of data
– Volume of data (real-time data, images, video)
– Scale of computation (analysis, simulation)
– Complexity of process (automation)
– Variable demands on computation
– Provenance (audit trails, timestamps, process)
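One way to make an audit trail with timestamps tamper-evident is a hash chain: each entry stores the SHA-256 digest of the previous entry, so editing any earlier record breaks every later link. The slide does not say how Comb-e-Chem recorded provenance; this is a swapped-in technique shown as a sketch, and the actor, process and URI values are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def _digest(entry: dict) -> str:
    # Canonical JSON (sorted keys) so the same entry always hashes identically.
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()


class AuditTrail:
    """Append-only log; each entry carries the hash of the entry before it."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def record(self, actor: str, process: str, data_uri: str) -> dict:
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "actor": actor,
            "process": process,
            "data_uri": data_uri,
            "prev_hash": self.entries[-1]["hash"] if self.entries else None,
        }
        entry["hash"] = _digest(entry)  # covers every field set above
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute each hash and check each back-link to the previous entry."""
        prev = None
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True


trail = AuditTrail()
trail.record("diffractometer", "collect frames", "http://example.org/raw/1")
trail.record("analysis", "integrate frames", "http://example.org/pattern/1")
print(trail.verify())  # True
```

Setting `trail.entries[0]["actor"]` to a different value afterwards makes `verify()` return `False`.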
Dissemination & Publication
• A different approach is required to provide data to the community
• The grid provides the necessary medium
• What do we want to make available, and how?
Journals: Publication@Source
[Diagram elements: Database; Journal; Paper; Materials; Laboratory Data; Multimedia]
“Full” record
Data Trail
• Drill down through the analysis path
• Look at increasingly raw data
• Quantity and variety of data often expand greatly at each stage
• Need URIs for everything
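Drilling down a data trail amounts to walking a graph in which every item, however raw, has its own URI and records the URIs it was derived from. A minimal sketch, assuming hypothetical `example.org` URIs and stage names borrowed from the crystallography workflow; it is not a description of the project's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class DataItem:
    uri: str  # every item, however raw, gets its own URI
    stage: str  # e.g. "raw images", "structure (CIF)"
    derived_from: list[str] = field(default_factory=list)


class DataTrail:
    def __init__(self) -> None:
        self._items: dict[str, DataItem] = {}

    def add(self, item: DataItem) -> None:
        self._items[item.uri] = item

    def drill_down(self, uri: str) -> list[DataItem]:
        """Depth-first walk from a result back towards the rawest data."""
        seen, order, stack = set(), [], [uri]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            item = self._items[current]
            order.append(item)
            stack.extend(reversed(item.derived_from))
        return order


trail = DataTrail()
trail.add(DataItem("http://example.org/raw/1", "raw images"))
trail.add(DataItem("http://example.org/pattern/1", "processed diffraction pattern",
                   ["http://example.org/raw/1"]))
trail.add(DataItem("http://example.org/cif/1", "structure (CIF)",
                   ["http://example.org/pattern/1"]))
print([i.stage for i in trail.drill_down("http://example.org/cif/1")])
# ['structure (CIF)', 'processed diffraction pattern', 'raw images']
```

Because every link is a URI rather than a file path, the same walk works whether the raw images sit in a local archive or on another site.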
Publication@Source
• Must be able to track back to the original data
• The primary reason is to allow new analysis in the future by other researchers
• In a university environment this may be viewed as a public responsibility; in a business environment, as ensuring maximum value from investment
• Does have implications for provenance and even fraud!
Publication Chain
[Diagram elements: Laboratory; Student; Institution; Professional Body Archive; Journal; Bibliography]
Smart Labs
[Diagram flow: Synthesis; Sample; NCS automated structure determination (raw images, processed diffraction pattern, structure, CIF + metadata); Archive; CCDC validation; Database; Journal]
Chemical Crystallography: A Suitable Case for OA Therapy
Mike Hursthouse
Department of Chemistry and Combinatorial Centre of Excellence, EPSRC National Service for Crystallography
University of Southampton, UK
ChemCryst
• Characterisation technique for chemical structure
• Uses X-ray diffraction (XRD)
• Provides a high level of chemical knowledge
• Structure: molecular or crystal
• Previously focussed on molecular structure (chemical properties)
• Now focused on crystal structure (physical properties)
• Change in interest facilitated by the availability of a database archive
• However, the archive is woefully incomplete
ChemCryst
• Database archive: ca. 300,000 entries, covering all published structures
• >10M chemical compounds known
• Probably 1.5M structures known
• Why the shortfall? Archaic publishing methods.
• Solution?
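Taking the slide's own estimates at face value (about 300,000 archived entries against roughly 1.5M known structures), the size of the shortfall works out as:

```latex
\frac{3\times10^{5}\ \text{archived}}{1.5\times10^{6}\ \text{known}} = 0.2,
\qquad
1.5\times10^{6} - 3\times10^{5} = 1.2\times10^{6}\ \text{structures not in the archive}
```

That is, on these figures only about one known structure in five reaches the database archive.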
ChemCryst
• ChemCryst results: a new dissemination strategy
• E-Prints of “Structure Reports”:
– Can be created automatically
– Work can be validated automatically
– All data (raw, processed, meta…) included
– Hence bypass journal-sponsored “refereeing”
• Still need to decide on “publication” of “science”
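Automatic validation of a structure report can start with simple machine checks on the CIF: are the required data items present, and is the refinement residual acceptable? The sketch below uses real core-CIF data names, but the choice of required items and the 0.10 R-factor threshold are illustrative assumptions, and the parser only reads single-valued `_tag value` lines (no loops or multi-line text). Real validation services such as the IUCr checkCIF test far more.

```python
import re

# Hypothetical minimal checklist; the real core-CIF dictionary defines many more items.
REQUIRED_ITEMS = [
    "_cell_length_a", "_cell_length_b", "_cell_length_c",
    "_cell_angle_alpha", "_cell_angle_beta", "_cell_angle_gamma",
    "_symmetry_space_group_name_H-M",
    "_refine_ls_R_factor_gt",
]
MAX_R_FACTOR = 0.10  # assumed acceptance threshold, for illustration only


def parse_simple_cif(text: str) -> dict[str, str]:
    """Collect single-valued `_tag value` lines; loops and text fields are ignored."""
    items = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("_"):
            parts = line.split(None, 1)
            if len(parts) == 2:
                items[parts[0]] = parts[1].strip().strip("'\"")
    return items


def to_float(value: str) -> float:
    """Drop a trailing standard uncertainty, e.g. '5.4321(3)' -> 5.4321."""
    return float(re.sub(r"\(\d+\)$", "", value))


def validate(text: str) -> list[str]:
    """Return the problems found; an empty list means these checks passed."""
    items = parse_simple_cif(text)
    problems = [f"missing {tag}" for tag in REQUIRED_ITEMS if tag not in items]
    r = items.get("_refine_ls_R_factor_gt")
    if r is not None and to_float(r) > MAX_R_FACTOR:
        problems.append(f"R factor {r} exceeds {MAX_R_FACTOR}")
    return problems


# Hypothetical example report, not real experimental data.
example = """
data_example
_cell_length_a 5.4321(3)
_cell_length_b 7.1
_cell_length_c 9.2
_cell_angle_alpha 90
_cell_angle_beta 101.5(2)
_cell_angle_gamma 90
_symmetry_space_group_name_H-M 'P 21/c'
_refine_ls_R_factor_gt 0.041
"""
print(validate(example))  # []
```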
e-Bank Project: JISC project with UKOLN
• Link Comb-e-Chem and other semantic grid science projects to the e-print system at Southampton
• Provide dissemination and provenance
Changing the way we work
[Diagram elements: E-Lab: X-Ray Crystallography; E-Lab: Combinatorial Synthesis; E-Lab: Properties Measurement; samples and laboratory processes; Structures DB; Properties DB; quantum mechanical analysis; properties prediction; data mining, QSAR, etc.; design of experiment; data streaming; visualisation; agent assistant; data provenance; authorship/submission]