Compute Canada Technology Briefing

November 2016
Overview
This technology briefing is intended for Compute Canada stakeholders and suppliers. It provides
a snapshot of the status of the technology refresh program resulting from CFI’s cyberinfrastructure
initiative, planned for implementation from 2015-2019, and looks ahead to planning for future
growth.
About Compute Canada
Compute Canada, in partnership with regional organizations ACENET, Calcul Québec, Compute
Ontario and WestGrid, leads the acceleration of research innovation by deploying state-of-the-art
advanced research computing (ARC) systems, storage and software solutions. Together we
provide essential digital research services and infrastructure for Canadian researchers and their
collaborators in all academic and industrial sectors. Our world-class team of more than 200
experts employed by 37 partner universities and research institutions across the country provides
direct support to research teams and industrial partners.
Advanced research computing accelerates research and discovery and helps solve today’s grand
scientific challenges. Using Compute Canada resources, research teams and their international
partners work with industry giants in the automotive, ICT, life sciences, aerospace and
manufacturing sectors to drive innovation and new products to market. Canadian researchers
leverage their access to expert support and infrastructure to participate in international
initiatives. Researchers using Compute Canada’s advanced research computing resources achieve
significantly higher citation rates than the average for Canada’s top research universities and
any international discipline average.
Technology Investment Key Facts
• The “Stage 1” investment, valued at $75 million in funding from the Canada Foundation
  for Innovation (CFI), provincial and industry partners, is underway. These investments are
  addressing urgent and pressing needs and replacing aging high performance computing
  systems across Canada.
• Planning is underway for the outcomes of the “Stage 2” proposal for a further $50 million,
  which will continue to address capacity needs, as well as providing expansions of secure
  cloud and other services.
• Compute Canada and its regional partners have more than 18 years of experience in
  accelerating results from industrial partnerships in advanced research computing and
  Canada’s major science investments.
• Compute Canada currently manages more than 20 legacy systems, which are being
  replaced in 2017-2018 by new systems and storage. Compute Canada operates these
  resources and supports all of Canada’s major science investments and programs.
• With the implementation of the Stage 1 and 2 combined technology deployments,
  Compute Canada anticipates capacity of over 100 petabytes of persistent storage and 20
  petaflops of computing resources.
Investment Impacts on Canada’s Research Community
• Stage 1 and Stage 2 improvements will allow Compute Canada to continue to support
  the full range of excellent Canadian research. The purchase of significantly more storage,
  deployed as part of an enhanced national data cyberinfrastructure, will accelerate
  data-intensive research in Canada. The ability to purchase a single Large Parallel (LP) machine
  of over 65,000 cores will provide Canada’s largest compute-intensive users with a new
  resource that far exceeds any machine in the Compute Canada fleet today.
• Investments in technology refresh are more than an opportunity to increase the size of
  storage systems and the number of cores. The new systems replace old technology with
  new, and will be deployed with national services, coherent policies and a new operational
  model for the organization. This enhanced service level will allow more researchers to
  exploit the new systems in an efficient and effective way.
New Systems at Four Stage 1 National Hosting Sites
Through a formal competition among Compute Canada member institutions, four sites were
selected to host the Stage 1 systems and associated services. They are the University of Victoria
(UVic), Simon Fraser University (SFU), the University of Waterloo (Waterloo), and the University
of Toronto (UofT). System specification, procurement, and deployment are ongoing through 2016-2017.
SYSTEM                               IN PRODUCTION
National Data Cyberinfrastructure    Fall 2016 (ongoing delivery)
ARBUTUS - UVic Cloud                 Fall 2016
CEDAR - SFU General Purpose          Early 2017
GRAHAM - Waterloo General Purpose    Spring 2017
NIAGARA - UofT Large Parallel        Late 2017

Table 1: Stage 1 procurement status
Stage 1 Computational Systems
University of Victoria: The ARBUTUS system (previously known as “GP1”) is an OpenStack
cloud, with emphasis on hosting virtual machines and other cloud workloads. The system,
provided by Lenovo, has 6,944 CPU cores across 248 nodes, each with on-node storage and 10Gb
networking. It accesses 1.6PB of persistent storage, primarily via Ceph in a triple-redundant
configuration. The system became operational in September 2016, as an expansion to the
Compute Canada “Cloud West” system.
Figure 3.1: The ARBUTUS system in operation at the University of Victoria (September 2016)
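For illustration, the sketch below shows how a research group might launch a virtual machine on an OpenStack cloud such as ARBUTUS, using the Python openstacksdk library. The cloud entry, image, flavor, network and keypair names are placeholders, not actual ARBUTUS resource names.

```python
import openstack

# Credentials are read from a clouds.yaml file or OS_* environment variables;
# "arbutus" is a placeholder cloud entry, not an official profile name.
conn = openstack.connect(cloud="arbutus")

# Look up an image, flavor and tenant network (all names are hypothetical).
image = conn.compute.find_image("Ubuntu-16.04")
flavor = conn.compute.find_flavor("p2-3gb")
network = conn.network.find_network("my-tenant-net")

# Boot a small VM for an analysis workload.
server = conn.compute.create_server(
    name="analysis-vm",
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{"uuid": network.id}],
    key_name="my-keypair",
)
server = conn.compute.wait_for_server(server)
print("VM", server.name, "is", server.status)
```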
Simon Fraser University: The CEDAR system (previously known as “GP2”) is a heterogeneous
cluster, suitable for a variety of workloads. The system will be liquid cooled, and will be installed
in the newly renovated SFU Water Tower data centre. It is expected to include over 20,000 CPU
cores. Node types are anticipated to include “base” and “large” compute nodes with 128GB
and 256GB of memory, as well as bigmem nodes with 512GB, 1.5TB and 3TB of memory. GPUs
will be included in approximately 15% of nodes. When deployed in early 2017 this will be one of
Canada’s most powerful research computing systems.
University of Waterloo: The GRAHAM system (previously known as “GP3”) will have a similar
design to CEDAR, and it is anticipated that CEDAR and GRAHAM together will provide features
for workload portability and resiliency. Both will have a small OpenStack partition, and both
include local storage on nodes. Anticipated specifications for GRAHAM include over 20,000 CPU
cores across a diverse set of node types, including GPU nodes. The system is anticipated for
deployment in early 2017.
University of Toronto: The NIAGARA system (previously known as “LP”) will be deployed by
approximately mid-2017, anticipated to have some 66,000 CPU cores¹. This will be a balanced,
tightly coupled high performance computing resource, designed mainly for large parallel
workloads.
¹ All future plans for nodes, CPUs and other specifications are intended as conservative estimates. CPU core counts are based on "Haswell" technology.
Figure 3.2: Stage 1 National Hosting Sites
National Data Cyberinfrastructure
A new national data cyberinfrastructure (NDC) will span all Stage 1 and Stage 2 sites, providing
a variety of data access mechanisms and performance levels. Major components of the NDC
were purchased in late 2016, and will be expanded over time.
A. Storage Building Blocks (SBBs). Commodity storage systems that are flexible,
configurable, and will evolve over time as technology improves.
a. Provider: Scalar Decisions, Inc. (Toronto).
b. Technologies: SBB systems from Seagate and Dell.
c. Configurations to be provided: Multiple, for different performance tiers and
capacities.
B. Object Storage Software. To provide automated, efficient data replication across the wide-area
network, an S3-compatible interface to data objects, and POSIX-style access to object
storage (a minimal access sketch follows this list).
a. Provider: DDN Storage.
b. Technologies: Web Object Storage (WOS) software.
c. Configurations to be provided: Software to be installed at all four Stage 1 sites, and
other future technology hosting sites.
C. Backup capabilities. To provide cost-efficient bulk storage of data copies, including
archives and near-line storage.
a. Provider: IBM Canada and Glasshouse
b. Technologies: Spectrum Protect software; TS3500 tape silos and LTO7 tapes+drives;
supporting infrastructure systems
c. Configurations to be provided: Multi-site redundant backups to SFU & Waterloo;
other configurations and uses as needed.
D. Parallel filesystem software. To provide persistent filesystem-based capacity on SBBs.
a. Provider: TBD (RFP closed in October 2016).
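As an illustration of the S3-compatible interface mentioned in item B, the sketch below uses the standard boto3 client pointed at a generic S3-compatible endpoint. The endpoint URL, credentials, bucket and object names are placeholders rather than actual NDC or WOS service details.

```python
import boto3

# Point the standard S3 client at an S3-compatible object store.
# Endpoint and credentials are placeholders, not real service values.
s3 = boto3.client(
    "s3",
    endpoint_url="https://object-store.example.org",
    aws_access_key_id="MY_ACCESS_KEY",
    aws_secret_access_key="MY_SECRET_KEY",
)

# Create a bucket, upload a result file, and list the bucket contents.
s3.create_bucket(Bucket="mylab-dataset")
s3.upload_file("results.h5", "mylab-dataset", "run42/results.h5")
for obj in s3.list_objects_v2(Bucket="mylab-dataset").get("Contents", []):
    print(obj["Key"], obj["Size"])
```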
Per-site deployed usable online capacity that is likely to be based on software-defined storage
software and SBBs:

                                                     2016   2017   2018   2019   2020
Total deployed online capacity (PB) across sites       40     62    100    150    225
Number of national hosting sites                        4      4    TBD    TBD    TBD

Table 3.1: Persistent online storage capacity projections
The national data cyberinfrastructure capacity does not include temporary storage on
computational systems (i.e., /scratch). NDC systems will be linked via a high-speed network,
described below. In addition to capacity growth, other components are under consideration
for the NDC. These may include capacity management systems, to automatically migrate
data among performance tiers, from online to nearline and back again. Mechanisms for data
management, information lifecycle management, and data resiliency may also be of interest.
Compute Canada strives to achieve the maximum value from its investments, and may seek to
balance purchased solutions with self-developed or self-supported solutions.
Networking
Wide-area connectivity among sites is undergoing major upgrades in capacity and features.
Each Stage 1 and Stage 2 site will connect to the CANARIE wide-area research network,
via regional area networks, at 100Gb or greater speeds. A Science DMZ design will be used to
enable rapid and reliable transit of data among sites. Key uses of the new network will include:
• Stage-in and stage-out of data sets for HPC computations;
• Backups, including redundant multi-site backups;
• Data replication, notably via WOS for access via S3;
• Cross-mounted filesystems for ease of access to persistent data, such as /project;
• Workload portability, including for virtual machine migration among hosts, and for
  metascheduler placement of HPC jobs;
• Guaranteed quality of service and availability for license servers and other critical services.
A single provider of networking equipment will be identified by the end of 2016, to support the
Science DMZ and other networking needs for the hosting sites.
Figure 2: Science DMZ Concept with National Data Cyberinfrastructure Components for Stage 1 Hosting Institution
Status and Planning for Stage 2 Investments
Compute Canada submitted a proposal on May 20, 2016 to the Canada Foundation for
Innovation (CFI) for the Cyberinfrastructure Challenge 2 Stage 2 competition. Stage 2 has a
similar structure to Stage 1, with a total value of $50 million (including $20M from CFI, $20M
from provinces and partners, and $10M in vendor in-kind). The results of Stage 2 have not yet
been announced publicly. In this section, highlights of the submission are described. The Stage
2 program is planned for implementation over approximately a two-year period, beginning by
mid-2017.
Stage 2 Hosting Site Selection
As with Stage 1, an open solicitation invited proponents for national hosting sites. It is
anticipated that several hosting sites will join the four Stage 1 sites. In addition, Stage 1 sites
were eligible to seek additional Stage 2 funding. Final selection of Stage 2 sites, along with
the specific technology mix and level of investment for each site, will occur during the Stage 2
finalization process.
Stage 2 Components
The systems, storage and software which will comprise Stage 2 build directly on Stage 1, with
enhancements identified during user needs analysis and consideration of Stage 2 proponent site
strengths.
GP1x: OpenStack cloud system. Building on the successes of Cloud East (Sherbrooke) and
ARBUTUS (Cloud West at UVic), the goal is to extend the federated on-premises private research
cloud across Compute Canada. The cloud primarily provides Infrastructure as a Service (IaaS)
and Platform as a Service (PaaS), to a rapidly growing constituency. Software as a Service (SaaS)
is also available, and expected to grow. Cloud federation will benefit users with workload and
storage portability and resiliency, single sign-on and namespace, and a common software stack.
The GP1x component was updated during the hosting site evaluation process and proposal
writing, to instead refer to Elastic Secure Cloud (ESC) systems, which are described in more
detail below.
GPx: Heterogeneous cluster with elastic OpenStack partitions. Clusters with a variety of node
types, including nodes suitable for OpenStack, big memory nodes, and nodes with GPUs;
most nodes will have local storage. The high-performance interconnect might not be fully non-blocking for all nodes, but will have some partitions suitable for multiple jobs of at least 1,024
cores. The systems will grow over time as funding allows, including via contributed systems.
The Stage 2 site selection RFP solicited GP4/GP5 hosts, as well as possible expansion of Stage 1
systems at SFU and Waterloo.
Experimental systems: Compute Canada strives to provide access to new resource types.
With this in mind, it is envisioned that a number of relatively small experimental systems will
be deployed. These may be purchased, loaned, or developed, over different durations. Some
experimental systems may become production resources, or guide future procurements of larger
systems. Stage 2 hosting proponents could select this as an additional option to the main three
system types. Compute Canada has been in communication with numerous vendors who may
participate in an experimental system program.
A component of experimental systems is commercial cloud hosting. Compute Canada is
often asked about outsourcing to commercial hosting services. To explore this area, Compute
Canada may run an open RFP to select one or more in-Canada cloud providers, and then
work to develop easy mechanisms for users to span their workload among Compute Canada
resources and commercial clouds. Companies with cloud offerings based entirely in Canada
have expressed interest in working with Compute Canada on this initiative. The cost/node for
individual purchases of cloud computing is high (at least 4x greater than Compute Canada’s
in-house systems at retail pricing); therefore, this resource must be deployed carefully. This
may be mitigated via an RFP for bulk purchase and partnership. At the same time, this will add
capabilities of interest from commercial clouds, which tend to be more feature-rich than our
OpenStack environment. Emphasis would be on providing ease-of-use for constituents who wish
to move between Compute Canada’s cloud and a commercial cloud, or in the other direction.
This will include situations where users pay for the commercial cloud capacity themselves, but
Compute Canada enables workload and feature portability.
Deep storage and persistent storage: One new site will be identified to augment SFU
and Waterloo in hosting backups and other nearline storage. These will consist mainly of
tape libraries and associated software and infrastructure, building on the National Data
Cyberinfrastructure procurement outcomes.
Local/regional data caches: There will be relatively small resources to provide local/regional
access to the National Data Cyberinfrastructure. These would be distributed roughly in
proportion to storage need (as readers or as writers), and would extend the efficiencies of large-scale procurement and operations from the National Data Cyberinfrastructure to smaller sites.
Services infrastructure: Further investments in the service infrastructure development efforts
funded through Stage 1 are envisioned. Stage 1 investments are focused on personnel to
develop or adapt common services. The philosophy is that if multiple users/groups express a
need for a service, as identified via user surveys or white papers, then Compute Canada should
consider making it a national offering. Investments to date have included a software partnership
with Globus, to develop Globus data publication services to better serve the needs of the
Canadian research community for data curation, preservation and discovery. Major areas of
effort for services infrastructure include:
• Identification and Authorization Service: Provide common login across systems.
• Software Distribution Service: Version-controlled software distribution to multiple sites.
• Data Transfer Service: To move datasets among collaborators and their repositories (see the sketch following this list).
• Monitoring Service: Track uptime and availability of services and platforms.
• Resource Publishing Service: Current information about available resources.
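To make the Data Transfer Service concrete, the sketch below shows a file transfer between two Globus endpoints using the Globus Python SDK (globus_sdk). The client ID, endpoint UUIDs and paths are placeholders; the actual service configuration will be determined as the national services are deployed.

```python
import globus_sdk

CLIENT_ID = "YOUR-NATIVE-APP-CLIENT-ID"    # placeholder Globus app registration
SRC_ENDPOINT = "source-endpoint-uuid"      # placeholder endpoint UUIDs
DST_ENDPOINT = "destination-endpoint-uuid"

# Interactive native-app login to obtain a transfer token.
auth = globus_sdk.NativeAppAuthClient(CLIENT_ID)
auth.oauth2_start_flow()
print("Please log in at:", auth.oauth2_get_authorize_url())
tokens = auth.oauth2_exchange_code_for_tokens(input("Paste auth code: "))
transfer_token = tokens.by_resource_server["transfer.api.globus.org"]["access_token"]

# Submit a recursive, checksum-verified transfer between the two endpoints.
tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(transfer_token))
tdata = globus_sdk.TransferData(tc, SRC_ENDPOINT, DST_ENDPOINT,
                                label="dataset replication", sync_level="checksum")
tdata.add_item("/project/mylab/run42/", "/backup/mylab/run42/", recursive=True)
task = tc.submit_transfer(tdata)
print("Submitted transfer task:", task["task_id"])
```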
Elastic Secure Cloud Services
In Compute Canada’s Stage 2 proposal, notions of GP1x, federated cloud sites, and local/
regional data caches were expanded to incorporate elastic secure cloud services. Stage 2 site
selection RFP responses indicated strong need, as well as existing capabilities, for secure
cloud. The main current use case for these services is hosting of health information, including
personally identifiable information (PII). PII is reflected in one of the largest data growth areas
(genetic sequences and brain imaging, which are also major elements of other CFI-funded
projects). Emergent use cases are in the social sciences, where controlled access to datasets is
the norm. Researchers in criminology, labour statistics, and other areas have similar needs.
Compute Canada plans to build a secure multi-tenant environment based on the concepts of
OpenStack cloud and local/regional data caches. The intention is that the same OpenStack
cloud environment as other Compute Canada cloud resources (with the same storage
environment) will implement logical partitioning such that the needed levels of isolation for
data and compute are enforced. This design is informed by the highly successful HPC4Health
implementation by Compute Canada member institutions in Ontario (www.hpcforhealth.ca).
This model will be expanded and enhanced to meet the needs of other provinces. It is proposed
that secure cloud capabilities will be part of all OpenStack systems or partitions on Stage 1 and
Stage 2 sites.
The “elastic secure cloud services” label is chosen to convey several qualities. First, any of
the cloud partitions on GPx systems are intended to be resized as needed in response to
user demand, with allocation of appropriate computational/storage resources. As mentioned
above, all cloud systems will be able to provide a secure environment, via logical partitioning
of compute and storage resources. Such logical partitioning is used by HPC4Health and some
other current implementations by Compute Canada members, and is in most cases adequate
(i.e., physical partitioning and air gaps are not necessary, but separate filesystem mount points
and VLANs are). The secure partitions within a cloud will, generally, be assigned to a particular
tenant (such as a hospital department, or a data analysis research portal). The tenant would
have the needed control over authentication, authorization, logging, etc. Those secure partitions
would also be elastic as needed over time, so that they can expand, shrink, or gain access to a
different resource mix.
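As a rough sketch of the kind of logical partitioning described above, the example below uses the Python openstacksdk to create an isolated tenant project with its own network and a deny-by-default security group. The project, network and address values are hypothetical, and the real ESC design (VLANs, mount points, logging and authorization controls) would involve considerably more than this.

```python
import openstack

# Administrative connection; "esc-admin" is a placeholder clouds.yaml entry.
conn = openstack.connect(cloud="esc-admin")

# 1. A dedicated project (tenant) for the secure workload.
project = conn.identity.create_project(
    name="hospital-genomics", description="Isolated secure-cloud tenant (example)")

# 2. A tenant-private network and subnet; the underlying VLAN/segmentation is
#    handled by the cloud's network backend.
net = conn.network.create_network(name="hospital-genomics-net", project_id=project.id)
conn.network.create_subnet(network_id=net.id, ip_version=4,
                           cidr="10.20.30.0/24", project_id=project.id)

# 3. A restrictive security group allowing SSH only from a designated bastion host.
sg = conn.network.create_security_group(name="bastion-ssh-only", project_id=project.id)
conn.network.create_security_group_rule(
    security_group_id=sg.id, direction="ingress", protocol="tcp",
    port_range_min=22, port_range_max=22, remote_ip_prefix="192.0.2.10/32")

print("Created isolated project", project.id, "with network", net.id)
```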
Anticipated Level of Investment
The Stage 2 investment is anticipated to have approximately the following financial investment
levels:
SYSTEM/SERVICE TYPE          CASH EXPENSE   NOTES
Deep storage                   $5,000,000   One additional deep storage site, plus additional
                                            capacity for the current two sites.
Experimental systems           $1,500,000   Small experimental systems at some Stage 2 sites;
                                            modest investment in commercial cloud.
Services infrastructure          $500,000   1 FTE for 2 years, plus small purchases of existing
                                            software and/or services.
Elastic secure cloud (ESC)     $1,500,000   One standalone ESC site.
GPx                           $31,500,000   Expansion of one or more GPx systems, and addition
                                            of one or more new GPx systems. All GPx systems
                                            will have ESC partitions.
TOTAL                         $40,000,000   Value includes provincial/partner match; does not
                                            include vendor in-kind, which brings the value to
                                            $50M.
ESTIMATED CAPACITY                     STAGE 1           STAGE 2     TOTAL
Ncores (Elastic Secure Cloud)          (GP1) 8,500²        5,486    13,986
Ncores (LP)                            (LP) 66,000             -    66,000
Ncores (GPx)                           (GP2+3) 52,000     89,250   141,250
Total cores                            126,500            94,736   221,236
New persistent storage (PB online)     62                     38       100
In this planning, investment is focused on general purpose computing (i.e., GPx-type systems,
with multiple node types), with the addition of a standalone ESC site. The GPx systems
will address the needs of the majority of users/projects, adding needed capacity. The node
configurations of new GPx systems and expansion of Stage 1 GP systems will be adjusted to
reflect early experiences with the Stage 1 systems - for example, it may be desirable to have
larger partitions for tightly-coupled workloads, or to have larger bigmem nodes, or different
configurations for local storage, alternate GPU configurations or quantities, or variations on the
cloud partition sizes or node configurations.
The strength of the ESC addition is to develop a new model for local/provincial/regionally-focused
systems, at a relatively low cost but with very high value. ESC systems highlight
capabilities of Compute Canada’s systems and staff, give needed features to stakeholders, and provide
on-ramps to larger computational and storage resources.
Compute Canada’s Need for New Infrastructure
Compute Canada supports a vibrant program of research spanning all disciplines and regions
of Canada. This support is delivered by providing Canadian researchers access to world-class
cyberinfrastructure and expert personnel.
The advanced research computing (ARC) needs of the Canadian research community
are growing. Growth comes from new scientific instruments and experiments, from
cyberinfrastructure use by a broadening list of disciplines, from generation and access to
new datasets and the innovative analysis and mining of those datasets, and from the mutual
reinforcement of technological and scientific advances that inspire researchers to construct
ever-more precise models of the world around us. Canada’s ARC infrastructure requires constant
updating, to keep pace with the needs of its researchers.
² Includes planned Arbutus expansion in 2017.
Existing Usage Information
Compute Canada has studied usage information from the past 5 years. For example, the chart
below shows CPU usage from 2010 through the end of 2015. CPU usage and allocations of
computational resources are measured in core years, representing a single CPU core’s utilization
for one calendar year.
The different colours show the usage broken down by discipline. The decrease in 2015
was expected, due to a reduction in available compute resources as Compute Canada
decommissioned older systems that exceeded their normal life span (the largest single system
contributing to this supply is now 7 years old). This chart illustrates that a significant number of
different disciplinary areas share the Compute Canada facility, each bringing their own resource
needs.
Figure 5.1: CPU usage by discipline as a function of time
The Compute Canada federation supports a wide range of computational needs on its shared
infrastructure. One way to examine this is through the number of cores used in a single batch
job, which is the dominant method for use of these resources, and through the types of jobs for which
resource allocations are granted (contributed systems, cloud systems, platforms & portals, and
other modalities are not included here). The chart shows the number of core years used in Compute
Canada resources per year. The colours illustrate the fractions of those core years in bins of
cores-per-job. It shows, for example, that the largest single category in 2015 is serial or low-parallel
computation (fewer than 32 cores), which represents about 30% of the total. Meanwhile, nearly
50% of CPU consumption in 2015 was by jobs using at least 128 cores.
Figure 5.2: CPU usage binned by number of cores used per job as a function of time.
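To illustrate how such a breakdown can be produced, the toy sketch below converts per-job accounting records into core-years and bins them by cores-per-job. The job records and bin edges are invented for illustration and do not correspond to actual Compute Canada accounting data.

```python
from collections import defaultdict

HOURS_PER_YEAR = 365 * 24   # one core-year = 8,760 core-hours

# Invented accounting records: (cores used, wall-clock hours) per job.
jobs = [(1, 5000), (16, 2200), (64, 900), (256, 400), (1024, 120)]

# Cores-per-job bins similar in spirit to those in the chart.
bins = [(1, 31), (32, 127), (128, 1023), (1024, None)]
usage = defaultdict(float)

for cores, hours in jobs:
    core_years = cores * hours / HOURS_PER_YEAR
    for lo, hi in bins:
        if cores >= lo and (hi is None or cores <= hi):
            usage[(lo, hi)] += core_years
            break

total = sum(usage.values())
for (lo, hi), cy in sorted(usage.items()):
    label = f"{lo}+" if hi is None else f"{lo}-{hi}"
    print(f"{label:>9} cores: {cy:8.2f} core-years ({100 * cy / total:4.1f}%)")
```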
It should be noted that the size and configuration of Compute Canada’s current systems limits
the ability of Canadian researchers to submit jobs at the largest scales, and this has limited
the growth of the highly parallel bins. Even for the larger resources, queue wait times (via the
“fairshare” workload management policy in effect for most systems) create challenges for
completing large multi-job computational campaigns.
As noted, the overall capacity within Compute Canada is currently inadequate to meet the
growing need of the Canadian research and innovation community. After technical validation,
for 2016 Compute Canada was only able to allocate 54% of the requested computational
resources (down from 85% in 2012) and 20% of GPU requests. With respect to storage, 93% of
requests were granted in 2016, although this was enabled by deferred allocation of storage to as-yet-uninstalled Stage 1 resources. Without Stage 1 storage, the allocation rate for storage would
have been 65%.
Projecting Future Needs of the Canadian ARC Community
Extensive community consultation was undertaken to ensure that the Stage 2 proposal was
anchored in the anticipated future needs of the Canadian ARC community. Consultation
included in-person community meetings, online surveys, collection of community white papers,
and user interviews.
The aggregated community need for computational resources has been projected based on this
input. Survey analysis predicts 12x growth in computational need over 5 years, while the white
paper analysis predicts a 7x increase over the same period, with different annual increase rates
among submissions. The chart below shows these need projections assuming an exponential
growth profile, in units of allocatable Haswell-equivalent core years. The shaded band covers
the range between the 7x and 12x 5-year projections. Three supply curves are shown: 1) (red)
assuming only the Stage 1 award, 2) (light blue) assuming the Stage 1 award and success of the
Stage 2 proposal, and 3) (dark blue) projecting a $50M Stage 3 award in 2018.
This chart shows that Stage 1 alone leads to a short-term increase in core count (by about 50%),
followed by a marked decrease based on the decommissioning of older pre-existing systems by
2018. Stage 2 funding will lead to an approximate doubling of allocatable cores by 2019 with
respect to the baseline. Stage 3 funding would be required to allow the supply to approach the
need curve in 2019.
Figure 5.3: Supply and Need projections for compute (in core-years/year).
The aggregated need for storage resources has also been projected. The survey analysis
predicts 19x growth in storage need over 5 years, while the white paper analysis predicts a 15x
increase over the same period. The storage projection chart below shows the 15x-19x range for
the three stages of investment described above. The storage supply and need both represent
allocatable disk storage. However, replication factors, potential future object storage adoption
rates and usage of tape (for nearline storage) to alleviate disk need are difficult to predict prior
to significant Stage 1 storage experiences. As a result, in the chart below we scale raw storage
supply downward by a factor of 1.4 (i.e., we assume approximately 70% disk usage efficiency).
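For reference, the sketch below converts the five-year growth multipliers above into implied compound annual growth rates, and applies the 1.4 derating factor used for the storage supply curves; the raw-capacity figure is an arbitrary example.

```python
def annual_rate(total_growth, years=5):
    """Compound annual growth rate implied by a total growth factor over `years`."""
    return total_growth ** (1.0 / years) - 1.0

projections = [("compute, white papers", 7), ("compute, survey", 12),
               ("storage, white papers", 15), ("storage, survey", 19)]
for label, factor in projections:
    print(f"{label:22s}: {factor:2d}x over 5 years ~ {100 * annual_rate(factor):.0f}% per year")

# Derating raw disk capacity to allocatable capacity (factor of 1.4, ~70% efficiency).
raw_pb = 100  # example raw deployed capacity, in PB
print(f"{raw_pb} PB raw -> about {raw_pb / 1.4:.0f} PB allocatable")
```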
In addition to aggregated need information, the user survey responses revealed specific requests
for additional features, new architectures and special node types. They include requests for:
• Overall increased compute capacity,
• Better support for Big Data use-cases,
• Encrypted cloud storage and other steps to enable research on sensitive datasets,
• Increased access to large memory nodes,
• Specialized resources to support bioinformatics,
• Greater accelerator (e.g. GPU) capacity,
• Better support for interactive and visualization-focused use-cases,
• Better support for long-term data storage and enterprise-class data backup,
• Platforms to support new hardware development (IT and computer engineering-related
  research),
• Increased training and improved documentation.
Figure 5.4: Supply and Need projections for storage (in PBs).
White paper submissions also revealed a number of emerging trends that help to drive the
technology choices laid out in this proposal. These include need for:
• Large data storage driven by improved instrumentation in genomics, neuroimaging,
  astronomy, light microscopy and subatomic physics,
• Large memory nodes (at least 512GB) from astronomy, theoretical subatomic physics,
  quantum chemistry, some use-cases in bioinformatics, humanities, some use-cases in
  AMO physics, and institutional responses,
• Expanded accelerator capacity (primarily GPUs) from subatomic physics, chemistry,
  artificial intelligence,
• Robust, secure storage options from the digital humanities,
• Expanded cloud services from digital humanities and astronomy,
• Expanded capacity for tightly coupled processing, including jobs that exceed 1,024 cores.
There is evidence of need for systems with far larger homogeneous partitions than reflected in
Stage 1 planning for the LP system. This includes researchers who have offshored or outsourced
their computation away from Compute Canada resources. In 2016, the SciNet consortium
(Ontario) contacted 58 Canadian faculty members who run large parallel jobs. Respondents
were primarily users who had submitted at least one job requiring at least 1,024 cores, either
on Compute Canada resources, the 66,000 core Blue Gene/Q at SOSCIP, or international
facilities. Of these, 26 were interviewed to discuss their usage patterns and future needs. If
resources were available today, in total they would use approximately 250,000 cores per year on
a homogeneous, tightly-coupled, large parallel machine - with much larger jobs, and requiring
many more cores, than the LP system planned via Stage 1 for mid- to late-2017 (below). One
individual within the group had already run a 330,000 core job on the Tianhe-2 machine (China),
and expressed need to scale to 1M cores in the future.
Because such capacity is not currently available within Compute Canada, it can reasonably be
assumed that LP demand is significantly underestimated and that researchers are tailoring their
areas of investigation and ARC usage to maximize their productivity. It is envisioned by Compute
Canada that, over time, larger systems with larger homogeneous partitions will be provided,
thereby enhancing users’ ability to pursue larger-scale investigations.
Information Not Guaranteed
Plans described here will be modified as needed, based on discussions among the hosting
institutions, CFI, Compute Canada and its partners, and provincial funding agencies. There
will be ongoing assessment of anticipated user demand, including for new technologies or
configurations. Consultation will be via the SPARC process described above, as well as through
discussions with funding agencies and their researchers. Planning will also be responsive to any
new information concerning additional funding, the selection of additional hosting sites, shifts
to Canada’s digital research infrastructure strategy, or other factors.
Procurement Processes
All hosting institutions are working with Compute Canada to ensure open and fair acquisition
processes. Resources will be purchased and owned by each site. Formation of specifications
and evaluation of bids will be led by Compute Canada’s national teams, with full engagement by site
procurement officers.
Vision 2020
Compute Canada, as a leading provider of digital research infrastructure (DRI), is taking an
integrated approach to data and computational infrastructure in order to benefit all sectors of
society. As a result of the technology refresh and modernization supported by CFI’s Challenge 2
Stages 1 and 2, world-class Canadian science will benefit from modern and capable resources
for computationally-based and data-focused research.
Compute Canada is cooperating with government funding agencies and with other DRI
providers to deliver the world’s most advanced, integrated and
capable systems, services and support for research. Future researchers will have seamless
access to DRI resources, integrated together for maximum efficiency and performance, without
needing to be concerned with artificial boundaries based on different geographical locations or
providers.
By 2020, Compute Canada will offer a comprehensive catalog of resources to support the full
data research cycle, allowing researchers and their industrial and international partners to
compete at a global scale. In cooperation with Canada’s other DRI providers, Compute Canada’s
systems and services will facilitate workflows that easily span different resources: from the lab
or campus to national computational resources, analytical facilities, publication archives, and
collaborators. Local support and engagement will remain a hallmark of delivering excellent
service to all users. The pathway to this future has begun, with the modernization of Compute
Canada’s national data cyberinfrastructure through the CFI Challenge 2 investments.
155 University Avenue, Suite 302, Toronto, Ontario, Canada M5H 3B7
www.computecanada.ca | www.calculcanada.ca | @ComputeCanada
416-228-1234 | 1-800-716-9417