From Photons to Petabytes

Astronomy of the Next Decade:
From Photons to Petabytes
R. Chris Smith
AURA Observatory in Chile
CTIO/Gemini/SOAR/LSST
Classical Astronomy
still dominates new facilities
•  Even new large facilities (VLT, Gemini,
ALMA, GMT, E-ELT) are and will be
scheduled for “individual projects”
Ø  In units of nights, sometimes hours!
•  But methods are changing…
Ø  Sloan Digital Sky Survey led the way
•  Statistical analyses -> new discoveries
Ø  Surveys and science with massive datasets are
growing, filling an important need
Photons to Petabytes 2014
Today’s BIG Questions:
Dark Energy & Dark Matter
Dark Energy is the dominant
constituent of the Universe.
Dark Matter is next.
About 95% of the mass-energy of the Universe is in Dark
Energy and Dark Matter, for which
we have little or no detailed
understanding.
1998 and 2003 Science
breakthroughs of the year,
2011 Nobel Prize
Attacking the Question of
Dark Energy & Others
•  “Classical” approach won’t work
Ø  Not enough telescope time in 2-5 night “chunks”
•  LARGE SURVEYS
Ø  Goal: Provide large, uniform, well calibrated,
controlled, and documented datasets to allow for
advanced statistical analyses
Ø  Larger and broader collaborations provide both
manpower and diverse expertise
•  NEED…
Ø NEW INSTRUMENTS
Ø NEW TELESCOPES
Ø NEW METHODS
Sociology of Dark Energy
•  While Dark Energy is pushing the universe APART
•  It is pulling the Astronomy, Physics,
Mathematics, and Computer Science
communities TOGETHER
Ø New physics
Ø New facilities, creating LARGE datasets
Ø New access methods (fast networks, databases)
Ø New processing capabilities (h/w & s/w)
Ø New analysis methods, New algorithms
AURA Introduction 2014
Selected Examples:
Coming soon to nearby mountaintops…
New Instruments (DECam)
New Telescopes (LSST)
Dark Energy Survey (DES)
•  5-year project to improve our understanding of
Dark Energy
Ø  Key DOE/NSF collaboration: Fermilab/NOAO/NCSA
Ø  International collaboration: Brazil, UK, Spain, Germany
•  Characterize Dark Energy with four methods
Ø Supernovae
Ø Weak Lensing (also measure Dark Matter)
Ø Galaxy clustering
Ø Baryon Acoustic Oscillations
•  All depend on careful statistical analyses of large
datasets
Dark Energy Camera
CAMERA:
•  62 2048 x 4096 pixel CCDs
•  570 Megapixel camera
•  The largest focal plane for
astronomy in the Southern Hemisphere
Optical
Lenses
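The camera numbers above can be cross-checked with a little arithmetic. A minimal sketch: the 62 science CCDs alone give about 520 Mpix, so the ~570 Mpix total presumably also counts smaller auxiliary detectors. Here we assume twelve 2048 x 2048 guide/focus CCDs, which the slide does not list.

```python
# Pixel budget for DECam from the CCD count and format on the slide.
# ASSUMPTION: the ~570 Mpix total also includes 12 auxiliary
# 2048 x 2048 guide/focus CCDs (not listed on the slide).

SCIENCE_CCDS = 62
SCIENCE_FORMAT = (2048, 4096)   # pixels per science CCD (from the slide)
AUX_CCDS = 12                   # assumed guide + focus CCDs
AUX_FORMAT = (2048, 2048)       # assumed format of each auxiliary CCD

def total_pixels(n_ccds, fmt):
    """Pixels in n_ccds detectors of shape fmt = (nx, ny)."""
    nx, ny = fmt
    return n_ccds * nx * ny

science = total_pixels(SCIENCE_CCDS, SCIENCE_FORMAT)
aux = total_pixels(AUX_CCDS, AUX_FORMAT)

print(f"science array: {science / 1e6:.0f} Mpix")
print(f"with guide/focus CCDs: {(science + aux) / 1e6:.0f} Mpix")
```

Under the stated assumption, the auxiliary CCDs account for the roughly 50 Mpix gap between the science array and the quoted total.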
DECam is here TODAY
•  First light images:
September 12, 2012
•  Fornax galaxy
cluster
A “modest” data challenge
•  Each image 1 GB; up to ~1 TB of raw data/night
Ø Data must be moved from Chile to NCSA before the next
night begins (<18 hours), preferably in real time
Ø YEAR 1: Each image transferred in <120 sec
Ø Data must be processed in <24 hours to inform
the next night’s observing, using NCSA resources
Ø YEAR 1: Real-time pipeline processing on Cerro Tololo with
the LIneA QuickReduce pipeline: robust and reliable
Ø Initial processing completed at NCSA in <24 hours,
still with only limited data quality specifications
•  TOTAL 5-year project dataset will be ~5 PB
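The two transfer deadlines above translate into minimum sustained link rates. A back-of-the-envelope sketch, assuming decimal units (1 GB = 10^9 bytes) and ignoring compression and protocol overhead:

```python
# Minimum average link rates implied by the DECam transfer deadlines.
# ASSUMPTIONS: decimal units; no compression or protocol overhead.

GB = 1e9    # bytes
TB = 1e12   # bytes

def sustained_rate_mbps(n_bytes, seconds):
    """Average rate (megabits/s) needed to move n_bytes within `seconds`."""
    return n_bytes * 8 / seconds / 1e6

per_image = sustained_rate_mbps(1 * GB, 120)          # one image in <120 s
per_night = sustained_rate_mbps(1 * TB, 18 * 3600)    # ~1 TB in <18 hours

print(f"per image:   {per_image:.0f} Mbps")
print(f"whole night: {per_night:.0f} Mbps")
```

With these inputs the nightly total, not the single-image deadline, sets the higher average rate (about 123 Mbps versus 67 Mbps).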
The next step…
ca. 1950 POSS
(Photographic)
ca. 2000 SDSS
(Digital)
ca. 2012 DES
(Digital + Depth)
ca. 2020 LSST
(Digital Sky
+Time Domain)
Next Step = LSST:
Creating a “Digital Universe”
•  8.4 m Telescope
–  3.5 Degree Field Of View
–  Telescope Located in Chile on
Cerro Pachón
•  3.2 Billion Pixel Camera
•  ~40 Second Cadence
–  Two 15 second exposures
–  Full sky coverage every few
nights
•  Advanced Data
Management Systems
•  Public Data
–  Alerts of new events
–  Catalogs of objects
–  Archives of images
LSST is designed to image the whole sky
every few nights for 10 years, giving us
a movie-like window into our dynamic
Universe.
The Large Synoptic Survey Telescope –
Massively Parallel Astrophysics
Survey the entire sky every 3-4 nights, to
simultaneously detect and study:
Ø  Dark Matter via weak gravitational lensing
Ø  Dark Energy via thousands of SNe per year
Ø  Potentially hazardous near-Earth asteroids
Ø  Tracers of the formation of the solar system
Ø  Fireworks in the heavens – GRBs, quasars…
Ø  Periodic and transient phenomena
Ø  …the unknown
What makes LSST unique?
Primary mirror diameter and field of view:
•  Gemini South Telescope: 8 m mirror, 0.2 degree field of view
•  LSST: 8.4 m mirror, 3.5 degree field of view
(Full moon is 0.5 degrees)
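The field-of-view comparison above is easier to appreciate as sky area. A minimal sketch, treating each quoted angle as the diameter of a circular field (an assumption; real fields are not always circular) and using the flat-sky approximation, which is adequate for fields a few degrees across:

```python
import math

def circular_fov_deg2(diameter_deg):
    """Area (square degrees) of a circular field of the given diameter.
    Flat-sky approximation: ignores curvature of the celestial sphere."""
    return math.pi * (diameter_deg / 2) ** 2

# Quoted field diameters in degrees (treated as circles by assumption).
fields = {"Gemini South": 0.2, "Full moon": 0.5, "LSST": 3.5}
areas = {name: circular_fov_deg2(d) for name, d in fields.items()}

for name, area in areas.items():
    print(f"{name:12s} {area:7.3f} deg^2")
print(f"LSST / Gemini South: {areas['LSST'] / areas['Gemini South']:.0f}x")
```

The LSST value, just under 10 square degrees, is in line with the ~10 square degree FOV quoted for the camera, and covers roughly 300 times the area of the 0.2 degree field.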
Telescope and Site
30 m diameter dome
1.2 m diameter
atmospheric telescope
Control room and heat
producing equipment
(lower level)
1,380 m² service and
maintenance facility
Base Facility
350 ton telescope
Includes the facilities and hardware to collect the
light, control the survey, calibrate conditions, and
support all LSST summit and base operations.
Camera
•  3.2 Gigapixel science array – 10 square degree FOV!
•  Wavefront and guide sensors
•  2 second readout
•  5 filters in camera
Utility Trunk—houses
support electronics and
utilities
Cryostat—contains focal
plane & its electronics
L3 Lens
1.65 m
(5’-5”)
Filter
Focal plane
L2 Lens
L1 Lens
Camera ¾ Section
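The readout figures above imply a substantial data rate straight off the focal plane. A back-of-the-envelope sketch, assuming 16-bit (2-byte) raw pixels, which the slide does not state:

```python
# Data rate off the LSST focal plane from the slide's pixel count and readout time.
# ASSUMPTION: raw pixels are stored as 16-bit (2-byte) values.

PIXELS = 3.2e9          # science array, from the slide
READOUT_S = 2.0         # readout time in seconds, from the slide
BYTES_PER_PIXEL = 2     # assumed raw pixel depth

pixel_rate = PIXELS / READOUT_S             # pixels per second during readout
image_bytes = PIXELS * BYTES_PER_PIXEL      # raw size of one exposure
byte_rate = image_bytes / READOUT_S         # bytes per second during readout

print(f"pixel rate:   {pixel_rate / 1e9:.1f} Gpix/s")
print(f"image size:   {image_bytes / 1e9:.1f} GB")
print(f"readout rate: {byte_rate / 1e9:.1f} GB/s")
```

Under that assumption one exposure is about 6.4 GB, in line with the roughly 6.5 GB per image quoted for data management.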
Petascale Data Management
•  Each image roughly 6.5 GB
•  Cadence: ~1 image every 18 s
•  15 to 18 TB per night, 30 TB “reduced”!
Ø  ALL must be transferred to NCSA archive center
•  within the image timescale (17 s): >>10 Gbps
•  REAL TIME reduction, analysis, & alerts
Ø  Send out alerts of transient sources within 60 s
•  ~2 million events per night every night for 10 years
Ø  Provide automatic data quality evaluation, alert to
problems
Ø  Change survey observing strategy on the fly based on
conditions, last field visited, etc.
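Two of the numbers above can be turned into rough requirements. This sketch (decimal units assumed; no compression, protocol overhead, or retransmission) computes the bare minimum average rate for moving one image inside the 17 s window, and the total event count over the survey:

```python
# Rough LSST data-management requirements from the slide's figures.
# ASSUMPTIONS: decimal units; no compression, protocol overhead, or retries.

GB = 1e9  # bytes

def min_gbps(n_bytes, seconds):
    """Lowest average rate (Gbit/s) that moves n_bytes within `seconds`."""
    return n_bytes * 8 / seconds / 1e9

image_rate = min_gbps(6.5 * GB, 17)       # one ~6.5 GB image inside 17 s

events_per_night = 2_000_000              # from the slide
nights = 365 * 10                         # 10 years x 365 nights (leap days ignored)
total_events = events_per_night * nights

print(f"per-image floor: {image_rate:.1f} Gbps")
print(f"10-year event total: {total_events:.2e}")
```

The floor comes out near 3 Gbps; a production link needs headroom well above that, since it must keep pace image after image while absorbing any delays.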
LSST:
“Data Science” in real time
TRANSIENT SCIENCE (Data Stream)
Ø  >3 Terabytes per hour (reduced) that must be mined in real
time for alerts.
Ø  20 billion objects will be monitored for important
variations in real time.
Ø  ~2 million events per night every night for 10 years
New approaches must be developed for knowledge
extraction in real time
NON-TRANSIENT SCIENCE
Ø  >1010 objects in a 20 PB final database catalog, backed by
a 100 PB final image archive
New approaches to data mining needed to sift through data
to identify samples, or individual objects, of interest
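Real-time alerting on the transient-science stream described above generally reduces to a cheap per-measurement test. A minimal sketch of one common approach, assuming each new image has already been differenced against a template and measured; the `DiffSource` record, the `alerts` function, and the 5σ cut are illustrative choices, not LSST's actual pipeline:

```python
from dataclasses import dataclass

@dataclass
class DiffSource:
    """One measurement on a difference image (hypothetical schema)."""
    object_id: int
    diff_flux: float       # flux in (new image - template)
    diff_flux_err: float   # 1-sigma uncertainty on diff_flux

def alerts(stream, threshold=5.0):
    """Yield object_ids whose difference flux is at least `threshold` sigma from zero.

    Consumes any iterable one item at a time, so it can sit on a live stream."""
    for src in stream:
        if src.diff_flux_err > 0 and abs(src.diff_flux) / src.diff_flux_err >= threshold:
            yield src.object_id

batch = [
    DiffSource(1, 120.0, 10.0),   # 12 sigma brightening -> alert
    DiffSource(2, 30.0, 10.0),    # 3 sigma -> treated as noise
    DiffSource(3, -80.0, 10.0),   # 8 sigma fading -> alert
]
print(list(alerts(batch)))  # [1, 3]
```

Because `alerts` is a generator, it processes one measurement at a time and never needs the whole night's stream in memory.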
Data Management Sites and Centers
•  Summit Site (Summit Facility)
Ø  Telescope and Camera
Ø  Data Acquisition
Ø  Crosstalk Correction
•  Base Site (Base Facility)
Ø  Long-term storage (copy 1)
Ø  Data Access Center: Data Access and User Services
•  Archive Site (Archive Center)
Ø  Alert Production
Ø  Data Release Production
Ø  Calibration Products Production
Ø  EPO Infrastructure
Ø  Long-term Storage (copy 2)
Ø  Data Access Center: Data Access and User Services
•  HQ Site (HQ Facility)
Ø  Observatory Management
Ø  Science Operations
Ø  Education and Public Outreach
•  French Site (Proposed Center)
LSST Data Management:
Baseline Solutions
•  High-speed connectivity
Ø  Mountain to Base: >100 Gbps
Ø  Base to Archive: >10 Gbps (hopefully 100Gbps)
Ø  Archive to User: variable, UI challenge
•  Supercomputer processing & storage
Ø  Base in La Serena, NCSA, Others? (France?, Brazil?)
Ø  100 PB final image archive
Ø  Distributed (Grid) analysis facilities
•  Petascale DB (~20 PB final catalog)
Ø  Based on open source RDBMS
LSST:
Strategic Partnerships
•  Distributed Computing Systems
Ø  Supercomputer center(s) to provide bulk storage, large
scale processing (e.g., NCSA, NLHPC in Chile)
Ø  Grid processing, storage, advanced DB
Ø  Data Access for member countries/institutions
•  Connectivity
Ø  High-speed Chilean bandwidth (REUNA)
Ø  International bandwidth (AmLIGHT, RedCLARA)
•  Scientific Analysis Challenges: Data Mining &
Astro-Informatics or Astro-Statistics
Ø Separating small signals from systematic effects
Ø  Automatically finding unique objects: one in billions
LSST Outreach Data will be used in
classrooms, science museums, and online
Classroom Emphasis on:
•  Data-enabled research
experiences
•  Citizen Science
•  College classes
•  Collaboration through
Social Networking
Integrated Project Schedule with Key Milestones
FUNDING STARTS
NOW!
The Science of Big Data
•  Data growing exponentially, in all sciences
•  Changes the nature of science
from hypothesis-driven to data-driven discovery
•  Cuts across all sciences
•  Industry and government face the same challenges
•  Convergence of physical and life sciences through Big Data (statistics and computing)
•  A new scientific revolution
Construction NOW
First light in 2019
Operations in 2022
DOE/NSF Joint Interface and Management Review • Tucson, Arizona • May 30-June 1, 2012