Assembling the EVOp Infrastructure

Yehia El-khatib, Gordon S. Blair
School of Computing & Communications
Lancaster University
Outline
• EVOp: An introduction
• Assembling the infrastructure
• Developing the web interface
• Why cloud?
• Issues influencing cloud uptake
– What could be done
• Summary
EVOp
• Environmental Virtual Observatory pilot.
– 2 years from the start of 2011.
• Funded by the UK Natural Environment Research Council
(NERC) to help tackle big environmental science questions
through:
– Enabling the integration of a variety of information sources at different
granularities and scales.
– Facilitating the handling of large data sets from different sources.
– Providing simple access tools to increase engagement from policy
makers, local communities and the general public.
• Focus on hydrology.
 Grant reference NE/I002200/1
EVOp: 4 main user groups
Scientists
• Worry less about some of the repetitive tasks.
• Share and reuse datasets and workflows.
Policy Makers
• An open decision support system.
Local Communities
• Explore the impact of different practices
(farming, water management, etc.).
General Public
• Raise awareness of current environmental
issues and encourage a wider discussion.
EVOp: 4 main user groups
[Figure: the four user groups (Scientists, Policy Makers, Local Communities, General Public) shown above the layers Web Interface, Web Services, Models, Virtual Resources and Hardware Resources. Annotation: "Processes, not design."]
• Hybrid model where private resources are normally
used, and public resources are used at times of
increased load.
• Developed our own load balancer to manage
resource usage to reduce costs while maintaining
user experience.
• Might adjust in the future to run experimental
services (e.g. tailored workflows) on private
resources, and move more streamlined services (e.g.
models) to run on public resources.
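The hybrid placement policy described above can be illustrated with a minimal Python sketch. It is not the EVOp load balancer: the Pool class, the capacities and the relative costs are all hypothetical, and the policy is simplified to "fill private first, burst to public when full".

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """A pool of virtual machine slots (names and fields are illustrative)."""
    name: str
    capacity: int          # maximum concurrent jobs the pool can host
    cost_per_job: float    # hypothetical relative cost of running one job
    running: int = 0

    def has_room(self) -> bool:
        return self.running < self.capacity


def place_job(private: Pool, public: Pool) -> str:
    """Prefer the private pool; burst to the public pool only when it is full."""
    for pool in (private, public):
        if pool.has_room():
            pool.running += 1
            return pool.name
    raise RuntimeError("no capacity left in either pool")


def total_cost(pools) -> float:
    """Sum the (hypothetical) cost of all jobs currently running."""
    return sum(p.running * p.cost_per_job for p in pools)


# Assumed numbers: a small private pool that is free to use, a larger paid public one.
private = Pool("private", capacity=2, cost_per_job=0.0)
public = Pool("public", capacity=10, cost_per_job=1.0)
placements = [place_job(private, public) for _ in range(4)]
print(placements)
```

With these assumed numbers, the first two jobs stay on private resources and only the overflow incurs public cloud cost. A real balancer would also release slots when jobs finish and take user experience into account.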
Infrastructure
• Public cloud: Amazon Web Services.
• Private cloud managed by Eucalyptus Community Cloud.
– Provides an open source alternative to EC2 and S3 (similar interfaces).
– However, moving between Eucalyptus and AWS is not always easy, as
images need a lot of preparation beforehand.
– Moreover, recent versions (1.6+) had stability issues.
– Also, community support is weak.
• Currently testing OpenStack as an alternative, which is also AWS-compatible.
• The use of jClouds is very important to us to minimise
portability overheads (and to avoid being locked in to one cloud
provider).
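The portability idea above can be sketched as a thin provider interface that application code depends on. This Python sketch is illustrative only: ComputeProvider, InMemoryProvider and deploy_model are hypothetical names and do not reflect the jClouds API. The stand-in backend only records launches so that the example runs offline.

```python
from abc import ABC, abstractmethod


class ComputeProvider(ABC):
    """Hypothetical minimal interface; this is not the jClouds API."""

    @abstractmethod
    def launch(self, image: str) -> str:
        """Start an instance from an image and return its identifier."""


class InMemoryProvider(ComputeProvider):
    """Stand-in backend that only records launches, so the sketch runs offline."""

    def __init__(self, name: str):
        self.name = name
        self.instances = []

    def launch(self, image: str) -> str:
        instance_id = f"{self.name}-{len(self.instances)}"
        self.instances.append((instance_id, image))
        return instance_id


def deploy_model(provider: ComputeProvider, image: str, copies: int):
    """Application code depends only on the interface, not on a provider."""
    return [provider.launch(image) for _ in range(copies)]


# The same deployment code targets either backend; the image name is assumed.
private_ids = deploy_model(InMemoryProvider("eucalyptus"), "hydrology-model", 2)
public_ids = deploy_model(InMemoryProvider("aws"), "hydrology-model", 2)
print(private_ids, public_ids)
```

Swapping providers then means swapping the object passed to deploy_model, which is the kind of overhead reduction a portability layer aims for. Image preparation differences between providers are not addressed by this sketch.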
Multifaceted Web Interface
• We cater to different user groups of varying backgrounds and
experience levels.
– Users (including scientists) are not IT experts, or at least would rather
not be!
– They do not want to tussle with compatibility issues, security
restrictions, strict rules about citing/sharing, etc.
• Developed an intuitive user interface that is tested repeatedly
with stakeholders to ensure a low entry barrier for all targeted
user groups.
– Easy to use (find your way around)
– Easy to understand (comprehend what this offers)
– Easy to relate to (tweak-ability, reproducibility, reuse, sharing)
Multifaceted Web Interface
• General interface allows users to do things like:
– Learn about the risk of flooding in their local area.
– Explore how different farming practices affect such risk.
• Authenticated government / local council officials could:
– Learn about polluting nutrients diffused from different catchments.
– Examine how policies would affect pollution levels at different scales.
• An “advanced path” allows scientists to compose workflows:
– A workflow is a pipeline of basic execution units (executables, scripts,
web services, etc.).
– Done in the browser. No programming prerequisites.
– Allows the sharing of workflows and datasets to promote reuse, citing
and collaboration.
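The notion of a workflow as a pipeline of execution units can be sketched in a few lines of Python. The step functions and the sample readings below are hypothetical; in practice each unit could wrap an executable, a script or a web service call.

```python
from functools import reduce
from typing import Any, Callable, Sequence

Step = Callable[[Any], Any]


def run_workflow(steps: Sequence[Step], data: Any) -> Any:
    """Feed the output of each step into the next, in order."""
    return reduce(lambda value, step: step(value), steps, data)


# Hypothetical execution units; real ones could wrap scripts or web services.
def drop_missing(readings):
    """Discard readings that were not recorded."""
    return [r for r in readings if r is not None]


def to_metres(readings_cm):
    """Convert water levels from centimetres to metres."""
    return [r / 100 for r in readings_cm]


def peak(readings):
    """Return the highest reading."""
    return max(readings)


workflow = [drop_missing, to_metres, peak]
result = run_workflow(workflow, [120, None, 250, 180])
print(result)
```

Because a workflow here is just an ordered list of steps, sharing or reusing it amounts to passing that list around, and a prefix such as workflow[:2] is itself a valid workflow.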
Why Cloud?
• Flexibility (Virtualisation)
– Allows the dynamic provisioning of bespoke environments.
– Everything from the hardware, platform, libraries, etc. can
be customised to suit the exact needs of an application.
– Very few limitations on what the application should be.
→ Build what we want!
– To draw a comparison: Grid users are tied to many
specifics of the grid environment: hardware
architecture, runtime environment, scheduling interface,
and supported application interface. As such, only certain
types of jobs can be submitted, and precompiling is
sometimes needed to ensure compatibility.
Why Cloud?
• Versatile resource management (SOA)
– All resources have a uniform view.
– Allows us to support data assets of different origins: from
in situ gauging stations, warehoused data stores, user
provided, and external sources.
– Facilitates sharing and reuse (e.g. workflows) which
promotes a culture of collaboration.
– Provides abstraction so that data can be used in models
and simulations without necessarily giving it away.
– Provides transparency: details of where and how the data is
held are hidden without affecting user experience.
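The uniform view and the abstraction points above can be sketched together in Python. DataAsset, the origin labels and the sample values are hypothetical. Note that Python does not enforce the hiding shown here; a real deployment would enforce it at the service boundary.

```python
from typing import Callable, Dict, List


class DataAsset:
    """Wraps a dataset behind one interface, whatever its origin."""

    def __init__(self, origin: str, loader: Callable[[], List[float]]):
        self.origin = origin
        self._loader = loader  # where and how the data is held stays hidden

    def apply(self, model: Callable[[List[float]], float]) -> float:
        """Run a model over the data and return only the model's output."""
        return model(self._loader())


def mean_level(levels: List[float]) -> float:
    """A trivial stand-in for a model: the average of the readings."""
    return sum(levels) / len(levels)


# Hypothetical origins and values, for illustration only.
assets = [
    DataAsset("gauging-station", lambda: [1.0, 2.0, 3.0]),
    DataAsset("user-upload", lambda: [4.0, 6.0]),
]
results: Dict[str, float] = {a.origin: a.apply(mean_level) for a in assets}
print(results)
```

The caller sees the same apply interface for every origin and receives only model output, which mirrors the idea of using data in models without giving the data itself away.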
Why Cloud?
• Easy access to resources (IaaS)
– IaaS: hardware resources as a utility.
– Allows the infrastructure to scale to meet user demand
and maintain quality of service.
– Peace of mind: issues of reliability, performance, and
security at the hardware level are outsourced.
– Allows us to focus on solving domain-specific problems.
– No usage quotas (unless you want them).
– Very few AAA hoops to jump through.
…as long as you can pay for that!
Issues Influencing Cloud Uptake
• Users see the advantage straight away.
– Previously a scientist needed to have the data on their computer,
develop & calibrate a model, run it. Check output. Rinse & repeat.
– They identify ease of use, universal access, and abundance of resources.
• Some data producers are reluctant to provide their data
through what they perceive as new, untested means.
– Some communities are more advanced than others.
• Easier to get funding based on the PAYG economic model.
– Cut upfront costs. Reduce money spent on unused resources.
• Funding bodies still perceive security to be a concern.
– A cloud is just a computer system.
– Public cloud service providers have whole teams working on security.
What more could be done?
• Introduce national (or even regional) initiatives to
regulate cloud service provisioning.
– This should ease a lot of the worry about trust.
• Educating data owners about cloud computing.
– Difficult.
– Hopefully success stories (e.g. NGS cloud, EduServ) could
alter attitudes.
• Educating research communities about available
cloud solutions.
– Teach students cloudnumbers.com rather than MATLAB.
Summary
• Cloud resources are easy to steer in order to serve
the needs of a scientific application without imposing
development restrictions, integration boundaries or
deployment difficulties.
• Public cloud is convenient, but a hybrid one offers
more options.
• There are concerns surrounding trust and security
(such as data licensing) that affect the uptake of
cloud computing in research communities.
– Some measures could be taken to alleviate such concerns.
Distributed Computing Paradigms
                       | HPC               | Grid                     | P2P                        | Cloud (public)               | Cloud (private)
Ownership (management) | My university     | Our universities         | Our partners               | 3rd party                    | My university
Trust                  | Very High         | High                     | Trust in partners          | Perceived by some as problem | Very High
Reliability            | High              | High                     | Depends on size & partners | Very high                    | High?
Accounting             | Individual quotas | Individual / Org. quotas | Difficult…                 | Pay per use                  | Homebrewed Access Control
Customisation          | Very bad          | Bad                      | Fairly flexible            | Very flexible                | Very flexible
Access                 | Easy              | Complicated              | Complicated                | Easy                         | Easy
Support                | Local sysadmin    | Remote sysadmin          | Local/Remote sysadmin      | 24x7 support                 | Local sysadmin
Thank you!
Questions
Yehia El-khatib: http://www.comp.lancs.ac.uk/~elkhatib/ (@yelkhatib)
Gordon S. Blair: http://www.comp.lancs.ac.uk/department/staff.php?name=gordon
EVOp: http://www.evo-uk.org/ (@EVOpilot)