Governance Models for Research Computing in the UC

University of California
Governance Models for Research Computing
Western Educause Conference
San Francisco
May 2007
Copyright University of California 2007. This work is the
intellectual property of the Regents of the University of
California. Permission is granted for this material to be shared
for non-commercial, educational purposes, provided that this
copyright statement appears on the reproduced materials and
notice is given that the copying is by permission of the
Regents. To disseminate otherwise or to republish requires
written permission from the Regents.
Presenters
David Walker, UC Office of the President
Director, Advanced Technologies
Information Resources & Communications
Heidi Schmidt, UC San Francisco
Director, Customer Support Services
Office of Academic & Administrative Information Systems
Ann Dobson, UC Berkeley
Associate Director, Client Services
Information Services & Technology
Jason Crane, PhD, UC San Francisco
Programmer Analyst
Radiology
Perspectives
• System-wide initiatives
• Campus models
• Central campus IT services
• Shared research computing facility
UC-Wide Activities in Support of Research and Scholarship
David Walker
Office of the President
University of California
[email protected]
Information Technology Guidance Committee (ITGC)
• Identify strategic directions for IT investments that enable campuses to meet their distinctive needs more effectively while supporting the University’s broader mission, academic programs, and strategic goals.
• Promote the deployment of information technology services to support innovation and the enhancement of academic quality and institutional competitiveness.
• Leverage IT investment and expertise to fully exploit collective and campus-specific IT capabilities.
Planning Inputs
• Broad consultation with:
  • UC stakeholders
  • Campus and system-wide governing bodies
• Coordination with related academic and administrative planning processes
• Environmental scans and competitive analysis
ITGC Timetable
• Launch the ITGC: Feb 2006
• Interim work group reports: Nov 2006
• Summary report to Provost: Jun 2007
• Review and comment: Oct 2007
• Presentations to President, COC, Regents, Academic Council: Nov 2007
Areas Addressed by the ITGC
• Research and Scholarship
• Teaching, Learning, and Student Experience
• University-Wide Administrative and Business Systems
• Critical Success Factors (e.g., common architecture, end-user support, collaboration infrastructure)
Potential Recommendations for Research and Scholarship
• Advanced Network Services
• UC Grid
• Academic Cyberinfrastructure
Advanced Network Services
• Upgrade all campus routed Internet connections to 10 Gbps
• Pilot new network services
  • Non-routed interconnects
  • Lightpath-based, application-dedicated bandwidth
• End-to-end performance tools, instrumentation, and support
UC Grid
• Enable resource sharing, based on UCTrust
• Implement comprehensive storage services
  • Large-scale computation, project collaboration, (very) long-term preservation
• Explore UC-provided resources
  • Base-level compute and storage
  • Data center space
  • Support services
Academic Cyberinfrastructure
• Ubiquitous access to services critical to research, scholarship, and instruction
  • Collaboration tools and services
  • Tools for creation and dissemination of electronic information
  • Digital preservation
  • Grant application / administration tools
  • End-user support services
UCTrust
• A unified identity and access management infrastructure for the University of California
• Based on InCommon and Shibboleth
More Information
• IT Guidance Committee
  • www.universityofcalifornia.edu/itgc
• UCTrust
  • www.ucop.edu/irc/itlc/uctrust
Campus Governance Models
Heidi Schmidt
University of California, San Francisco
Office of Academic & Administrative Information Systems
University of California = Diversity
Campus governance bodies that may influence research computing include:
• Academic Senate
• IT advisory boards & committees
• Research advisory boards & committees
• Discipline-based governance groups
Campus-wide Perspectives
Ann Dobson
University of California, Berkeley
Information Services & Technology
UC Berkeley
• Desire to provide central services
• Desire to meet needs of less technical, less resource-rich researchers (e.g., social scientists)
• Tension between one-time grant funding and ongoing expenses
• Desire to optimize use of resources
• Need commodity model (one size fits all)
• If we build it, will they come?
Requirements for LBNL Clusters
Systems in the SCS Program must meet the following requirements to be eligible for support:
• IA32 or AMD64 architecture
• Participating cluster must have a minimum of 8 compute nodes
• Dedicated cluster architecture; no interactive logins on compute nodes
• Red Hat Linux operating system
• Warewulf cluster implementation toolkit
• Sun Grid Engine scheduler
• All slave nodes reachable only from the master node
Clusters that will be located in the computer room must meet the following additional requirements:
• Rack-mounted hardware required; desktop form-factor hardware not allowed
• Equipment to be installed in APC NetShelter VX computer racks; prospective cluster owners should include the cost of these racks in their budgets
General Purpose Cluster
• Hardware donated by Sun
• 29 Sun v20z servers (1 head node, 28 compute nodes)
• Per node: 2 CPUs, 2 GB RAM, one 73 GB hard drive
• Gigabit Ethernet interconnect
• NFS file storage on SAN
• Housed in central campus data center
General Purpose Cluster (cont.)
• Cluster management provided by LBNL’s Cluster Team
• Operating system: CentOS 4.4 (x86_64)
• MPI version: Open MPI 1.1
• Scheduler: Torque 2.1.6
• Compilers: GCC 3.4.6 and GCC 4.1.0
• Cluster provided on a recharge basis
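For readers unfamiliar with this stack, the sketch below shows how a researcher might submit a small Open MPI job to the cluster’s Torque scheduler. It is a minimal illustration under stated assumptions, not the cluster’s actual tooling: the queue name, node counts, walltime, and the binary name my_mpi_app are all hypothetical.

#!/usr/bin/env python
# Minimal sketch: build a Torque/PBS job script for an Open MPI program and
# submit it with qsub. Queue name, node counts, walltime, and "my_mpi_app"
# are hypothetical placeholders.

import subprocess
import tempfile

PBS_SCRIPT = """#!/bin/bash
#PBS -N sample_mpi_job
#PBS -q batch
#PBS -l nodes=4:ppn=2
#PBS -l walltime=01:00:00
cd $PBS_O_WORKDIR
# 4 nodes x 2 CPUs per node = 8 MPI ranks
mpirun -np 8 ./my_mpi_app
"""

def submit():
    # Write the job script to a temporary file and hand it to qsub.
    with tempfile.NamedTemporaryFile("w", suffix=".pbs", delete=False) as f:
        f.write(PBS_SCRIPT)
        script_path = f.name
    job_id = subprocess.check_output(["qsub", script_path]).decode().strip()
    print("Submitted Torque job:", job_id)

if __name__ == "__main__":
    submit()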
Collocation and Support
• Collocation in central data center
  • $8/RU/month + power charge
• Cluster management support
  • Varies based on number of nodes
  • About $1500/month for a 30-node cluster
• Assistance in preparing grant requests
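To make the recharge figures concrete, here is a rough budgeting sketch using the rates above. The rack-unit count is an assumption (1U compute nodes plus a few units of overhead), and the separately billed power charge is left out.

# Rough recharge estimate using the quoted rates. Assumes 1U compute nodes
# plus 4U of switch/head-node overhead (an assumption, not from the slide)
# and omits the separately billed power charge.

RACK_UNITS = 30 * 1 + 4           # 30 compute nodes at 1U each + 4U overhead
COLOCATION_PER_RU_MONTH = 8       # $8 per rack unit per month
MANAGEMENT_PER_MONTH = 1500       # about $1500/month for a 30-node cluster

monthly = RACK_UNITS * COLOCATION_PER_RU_MONTH + MANAGEMENT_PER_MONTH
print("Estimated monthly recharge: $%d" % monthly)         # -> $1772/month
print("Estimated annual recharge:  $%d" % (monthly * 12))  # -> $21264/year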
Audience Poll
• Does your campus provide central research computing facilities?
• Are these services provided on a recharge basis?
• Are these services centrally funded?
Departmental Clusters
• Survey revealed 24 clusters
• Half are in EECS
  • Data center space provided at no cost
  • Grant funding for support FTE
  • Hardware from donations or from grants
  • Charge for storage and network connections
• Others in a variety of departments: biology, geography, statistics, space science, optometry, seismology
• Intel or AMD, many flavors of Linux
Departmental Clusters (cont.)
• Chemistry Model
  • Chemistry provides machine room space
  • Chemistry FTE helps configure and get started
  • PI must have a grad student sysadmin
  • 4-5 clusters owned by faculty, supporting the research of 5-10 grad students
  • All Linux, running Rocks or Warewulf
Audience Poll
• Do departments on your campus provide research computing support to their PIs?
• On a recharge basis?
• Subsidized?
Other UC Campuses/Labs
• UC San Francisco
  • Completely decentralized
• UC Irvine
  • Research Computing Support Group (1.6 FTE)
  • Data center space ($200/month/rack)
  • Shared clusters for researchers and grad students
  • High-speed networking
  • Backup service
  • System administration on recharge basis
Other UC Campuses/Labs (cont.)
• UCLA
  • Data center space
  • High-speed networking
  • Shared clusters
  • Storage
  • Cluster hosting and management
  • Charge for in-depth consulting, long-term projects, and a nominal one-time node charge
Other UC Campuses/Labs (cont.)
• LBNL
  • Data center space
  • High-speed networking
  • 3 FTE
  • Pre-purchase consulting and procurement assistance
  • Setup and configuration
  • System administration and cybersecurity
  • Charge for incremental costs to support clusters
Other UC Campuses/Labs (cont.)
• UC Riverside
  • Data center space
  • Funds for seed clusters
    • Researchers without funds to buy their own
    • Researchers with the ability to purchase but who will use the central service
  • Ongoing support of systems will be recharged
Other UC Campuses/Labs (cont.)
• UC San Diego
  • Decentralized services on a recharge basis
    • Server room space
    • Hosting
    • System administration
    • Consulting
  • Central services: network infrastructure
    • Supported by “knowledge worker” fee
  • San Diego Supercomputer Center
Audience Poll
• Does your campus have a “knowledge worker” fee?
Challenges
• Provide a useful central resource
• Optimize use of clusters
• Encourage PIs to use central resources even if it costs money
• Develop funding model that works well with grants
Shared Research Computing
Jason Crane, PhD
University of California, San Francisco
Department of Radiology
Case Study: UCSF Radiology Department Shared Research Computing Resources
• UCSF Radiology computing
• Center for Quantitative Biomedical Research (QB3 Institute) computational cluster
• Incentives and disincentives for sharing computing resources
• Advice about building collaborations and consensus
UCSF Radiology Department Computing
[Diagram: Radiology Research Computing Recharge administers machines owned by Group A, Group B, …]
Organization and Structure
– Ownership: individual research groups (~150 desktop workstations + Linux cluster)
– Administration: Radiology Research Computing Recharge
– Cost Structure: hardware + support recharged from PIs’ research grant direct costs
Computational Needs and Problems
– Underutilized CPUs
– Some researchers have computationally demanding problems
– Serial processing on individual desktop machines takes hours to days
– Manual cycle stealing
– Embarrassingly parallel problems
UCSF Radiology Department Computing
Solution:
– Deploy resource management software (RMS, Sun Grid Engine) to enable parallel computing on idle desktop machines supported by the recharge.
– Group-specific queues: users submit parallel processing jobs to idle machines within their research group (see the submission sketch below).
[Diagram: research group users in Group A, Group B, … submit jobs to the RMS job scheduler, which dispatches them to idle machines administered by the Radiology Research Computing Recharge]
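As a concrete illustration of the group-queue model, the sketch below submits an embarrassingly parallel workload as a Sun Grid Engine array job restricted to one group’s queue. The queue name, task count, and worker script are hypothetical; only the qsub options reflect standard SGE usage.

#!/usr/bin/env python
# Hypothetical sketch: fan an embarrassingly parallel workload across idle
# machines in one research group's Sun Grid Engine queue. Queue name, task
# count, and worker script are illustrative, not actual UCSF names.

import subprocess

GROUP_QUEUE = "groupA.q"      # hypothetical group-specific queue
N_TASKS = 200                 # e.g., one task per dataset

def submit_array_job(worker_script="process_one_case.sh"):
    cmd = [
        "qsub",
        "-q", GROUP_QUEUE,          # restrict the job to the group's queue
        "-t", "1-%d" % N_TASKS,     # SGE array job; each task reads $SGE_TASK_ID
        "-cwd",                     # run tasks in the current working directory
        "-N", "radiology_batch",    # job name shown in qstat
        worker_script,
    ]
    subprocess.check_call(cmd)

if __name__ == "__main__":
    submit_array_job()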
UCSF Radiology Department Computing
Observations
Clustering
– Increased intra-group CPU utilization
– Increased adoption of computationally demanding software
– Improved research capabilities and throughput
However,
– Inter-group CPU sharing was under-utilized
– Higher-end storage needed to support IO requirements
– Recharge cost for an underutilized dedicated cluster doesn’t scale well
– Time sharing is more cost-effective than a dedicated, partially utilized cluster
Interdepartmental Shared Computational Cluster
Organization and Structure
– Users: Interdepartmental within QB3 institute
– Cost Structure:
• PIs’ research grant direct costs: compute nodes (time share); fits well with one-time sources of funding.
• Institute grants and endowments: shared admin., high-end shared hardware
– Governance:
• Technical: cluster admin, technical users
• Policy: committee of representative PIs
– Hardware: 1200 cores (Linux), 13TB NAS
Radiology’s Requirements
– Real-time & interactive apps benefit from a large number of CPUs for short bursts
– Access to shared high-end storage for I/O-intensive apps
– Lower cost structure for cluster support by using institute-supported administration
– HIPAA compliance
Interdepartmental Shared Computational Cluster
Experiences to date
– High-end cost-effective resource for institute’s research
– Varied use patterns benefit all users
– Frees research group time for research
– Radiology’s unique requirements (HIPAA, workflow, accessibility) were slow to be implemented
– Evaluate requirements and consider application interoperability: use of Grid standards may have eased the transition for Radiology (cluster design, software porting).
Incentives for Sharing
– Reduce Costs
• Share administrative costs
• Leverage bulk buying power
• Increase hardware utilization
– Increase Performance and QOS
• Justify high-end hardware: shared cost, efficient utilization
• Greater hardware redundancy
• Design input from larger expertise pool
Disincentives for Sharing
– Sharing isn’t equitable
– Use cases vary from norm
– Sharing may impact my resources
Advice for Sharing
– Establish guidelines for collaboration:
• Equitable cost structure
• Voting rights/governance
– Develop applications/services to support accepted Grid standards
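One way to follow that last piece of advice is to write submission tooling against DRMAA, an Open Grid Forum standard API implemented by Sun Grid Engine and other schedulers, instead of scheduler-specific qsub flags. The sketch below assumes the Python drmaa binding is installed and uses a hypothetical worker script; it illustrates the idea rather than describing the QB3 deployment.

# Sketch of scheduler-neutral submission through DRMAA, an Open Grid Forum
# standard API implemented by Sun Grid Engine and other schedulers. Assumes
# the Python "drmaa" binding is installed; the worker script is hypothetical.

import drmaa

def run_portable_job():
    session = drmaa.Session()
    session.initialize()
    try:
        jt = session.createJobTemplate()
        jt.remoteCommand = "./process_one_case.sh"   # hypothetical worker script
        jt.args = ["case001"]
        job_id = session.runJob(jt)
        print("Submitted job:", job_id)
        # Block until the job finishes; the same code runs against any
        # DRMAA-compliant scheduler, easing moves between clusters.
        info = session.wait(job_id, drmaa.Session.TIMEOUT_WAIT_FOREVER)
        print("Exit status:", info.exitStatus)
        session.deleteJobTemplate(jt)
    finally:
        session.exit()

if __name__ == "__main__":
    run_portable_job()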