Measuring Galaxy Shapes for Cosmology

Laboratoire d'Astrophysique
Ecole Polytechnique Fédérale de Lausanne
Switzerland
http://lastro.epfl.ch
Measuring Galaxy Shapes for Cosmology
Impact of Big Data & Machine Learning
Marc Gentile, Sept. 29, 2016
Standard cosmology: successes and challenges
standard model is very successful in reproducing the
• The
dynamics and observed structures of the Universe
• But it is confronted to the evidence for a “dark” Universe
Planck Collaboration
Credit: NASA, ESA and R. Massey
2
Weak Gravitational Lensing and the Cosmic Shear
from distant sources is slightly deflected by the
• Light
gravitational field of intervening matter along the line of sight
main applications
• TwoStudy
dark matter on various scales
-
Constrain cosmology: using “Cosmic Shear”
Shear”: weak lensing of background galaxies by
• “Cosmic
foreground large-scale structures
NASA/ESA
...measure the slight
modification of observed
galaxy shapes for ~2 billions of galaxies
Unknown
Original galaxy
Magnification
Shear
3
Measuring the Cosmic Shear: difficult!
Measuring gravitational shear
fr
From H. Hoekstra
Cartesian components of galaxy ellipticity
Observed galaxies: small, blurred, noisy, undersampled!
• 'noise bias' in ellipticity estimates
→ need to calibrate on simulations
4
Measuring the Cosmic Shear: challenges
shear measurement is difficult
• Accurate
Billions of galaxies / large sky areas to reduce statistical errors
- Very weak signal, easily corrupted by many systematic effect
=> requires extremely tight control on systematic errors
lowand
S/N
ratio
- Observed galaxies are faint, under-sampled,
4.1. Principles
challenges
- An inverse problem: no unique solution
Intrinsic Galaxy
(shape unknown)
Gravitational Lensing
causes a shear g
Image is convolved
with the PSF
Image produced by
detector is pixelated
Various sources of noise
degrade the image
Bridle et al., 2008
5
Shear measurement methods: main approaches
• Estimation from second-brightness moments
-
KSB/KSB+
• Analytical decomposition of galaxy images/shapes
-
Shapelets, Reglens
• Fit parametric model to galaxy profile
- im3shape, lensfit, StackFit, gFit
community: STEP, GREAT initiatives to help
• Lensing
improve algorithms through blind simulations
6
Typical shear measurement issues
Mandelbaum, Rowe, et al.
•
Shear measurement biases
•
Applying the method on real data
, Heymans et al. 2012; Hamana et al. 2013). In cont,
the PSF due
to et
theal.optics varies relatively slowly
ndelbaum,
Rowe,
time. The optical PSF is commonly described as
mbination
of di↵raction plus aberrations (possibly
n conoslowly
quite high order). Both the atmospheric and opPSF
bed ashave some spatial coherence, qualitatively like
ng shear, though the scaling with separation is not
ossibly
tical.
nd ophe
ly e↵ect
like of the PSF on the galaxy shapes that we
to
measure isMandelbaum
twofold: first,
et al., applying a roughly ciris not
r blurring kernel tends to dilute the galaxy shapes,
ing them
hat
we appear rounder by an amount that depends
he ratio
Fig. 3.— Real galaxies from the HST as observed by the Adhly
cir- of galaxy and PSF sizes. Correction for this
tion
can easily be a factor of 2 for typical galaxies,
Camera for Surveys (ACS) in the COSMOS survey (KoekeKacprzak vanced
et al, 2013
hapes,
moer
et
al. 2007; Scoville et al. 2007b,a). The top left shows a
which we wish to measure shears to 1%. Second,
epends
Figure 1. A demonstration of the concept
of noise and
modelis
biaswell-fit
interaction. by
Left-hand
side postage
stamps show the images
which
are going
to be fit with
galaxy
that
a simple
parametric
model
from
Lackner
small
PSF anisotropiesFigure
can1. Aleak
intoofRight-hand
demonstration
the conceptside
of noise
andstamps
modelare
biastheinteraction.
postage
stamps
show
the
imagesthe
which
areof
going
to
be fit with
a parametric
galaxy model.
postage
images
ofLeft-hand
best fittingside
models.
In
this work
wegalaxy
compare
results
two
simulations.
&
Gunn
(2012).
The
bottom
left
shows
a
that
is
reasonor
thisbut coherent
Fig.
3.—
Real
galaxies
from
the
HST
as
observed
by
the
AdaSimulation
parametric1 galaxy
model.
Right-hand
postage
stamps are
the bias,
images
of best
models.
In this jointly
work we
the results
of two simulations.
uses
the
real galaxy
imageside
directly
to measure
noise
model
biasfitting
and their
interaction
in compare
a single step
(blue dot-dashed
ellipse).
galaxy
shapes
if
not
removed,
mimicking
a
lensing
ably
well-fit
but
with
additional
substructure
evident.
The
right
Simulation
uses
the real
image
directly (Koeketo
measure
bias, model
and
theirgalaxy
interaction
jointly
a single
step (blue
dot-dashed
ellipse).
laxies,
vanced Camera for Surveys (ACS)
in 12the
COSMOS
survey
Simulation
first finds
the galaxy
best fitting
parametric
model
to a noise
real galaxy
image bias
(black
solid
cartoon),
and in
create
it’s model
image
(dashed magenta
side
aaareal
true
“irregular”
galaxy
that
iscreate
not
by
al.
Simulation
2 first
findsintroduces
the
best fitting
parametric
model
image
(blackbias
solid
galaxy
cartoon),
it’swell-fit
model
image
(dashed
magenta
ellipse).
This
process
the
model
bias.shows
The
next to
step
is
togalaxy
measure
the noise
using
this best
fittingand
image
as the
true
image.
The
fitsimple
to the
noisy
moer
et
al.
2007;
Scoville
et
al.
2007b,a).
The
top
left
shows
econd,
ellipse).
process by
introduces
theellipse.
model
bias.
next step
is toand
measure
the
noise
bias
thisare
best
fittingThe
image
as the true
image.
The
to the noisy
image sources
isThis
represented
red dotted
In
thisThe
simulation
noise
model
bias
interaction
terms
absent.
difference
of the
results
of fit
Simulations
1
parametric
models
with
⇠
10 using
parameters.
ars in the images
areise↵ectively
begalaxy that
well-fit by apoint
simple
parametric
model
from
Lackner
image
is represented
by red dotted
In thisbias
simulation
noise
and model bias interaction terms are absent. The difference of the results of Simulations 1
and 2 measures
the strength
of noiseellipse.
and model
interaction
terms.
k into
Gunn
(2012).
The bottom
left
shows
a the
galaxy
that
and
2 measures
theof
strength
of noise and
modelisbiasreasoninteraction terms.
blurring by &the
PSF,
and hence
are measures
ensing
ably
well-fit
but
with
additional
substructure
evident.
The
right
. However, the PSF must be estimatedIt from
them
shear
couples
moments
to interaction
the higher-order
sions for the
noise bias, including
terms with the model
is worth
that some
shear
measurement
methodsthe
ap- second
side shows a true “irregular” galaxy
thatto note
is not
well-fit
by simple
sions for the noise bias, including interaction terms with the model
is worth
toFor
note
that
methods
apthen interpolated
to the
positions
of
galaxies.
a some
bias.
ply aItfully
Bayesian
formalism
andshear
use
ameasurement
full posterior(Massey
shear probamoments
et
al.
2007a;
Bernstein
2010;
Zhang &
parametric
models
with
⇠
10
parameters.
es be-of some common methods of PSF
bias.
ply
a fully
Bayesian
formalism 2013)
and use
full posterior
shear probability
(Bernstein
& Armstrong
in asubsequent
analyses.
These
mary
estimation
2011).
bility
(Bernstein
& Armstrong
2013)
in subsequent
analyses.
TheseThis issue is particularly pressing given
methods
are considered
to be free
ofKomatsu
noise
and model
related
biases
of the
interpolation,
see Kitching et al. (2013).
methods
are aconsidered
be free ofthat
noise
and
model
related
biases
and present
promisingtoalternative
approach
to problems
studied
several
state-of-the-art
shape measurement methods
them
present
approach to problems studied
2.1 Shear and ellipticity
inmoments
this
paper.a promising
shear couples the secondand
to alternative
the higher-order
(see Appendix A) are
fitting relatively simple
2.1 based
Shear andon
ellipticity
in thisThis
paper.
2.5. Summary
of e↵ects
paper is organised as follows. Section 2 contains the prinFor a
Cosmic shear as cosmological observable can be related to the
moments
(Massey
et al. 2007a;
Bernstein
Zhang
&thethe
paper
isshear
organised
as2010;
follows.
Section
2models
contains
pringalaxy
or
are
based
on
a
decomposition
into
ciplesThis
of cosmic
analysis.
In Section
3 we present
analytic
Cosmic shearpotential
as cosmological
observable
can be
relatedbato the
gravitational
between the
distant source
galaxy
and
an
mation
ciples
ofparticularly
cosmic
In interaction,
Section 3 weasgiven
present
the analytic
g. 1 summarizes
the main
e↵ects
goisinto
a shear
weak
Komatsu
2011).
Thisthat
issue
pressing
formulae
for noise
andanalysis.
model bias
a generalisation
of
gravitational
potential
between
the
distant
source
galaxy
and an
observernecessarily
(see Bernstein & Jarvis
2002, for reviews).
Ellipticity
is
sis
functions
that
cannot
describe
galaxy
proformulae
for
and model
bias
as a generalisation
of
the noise
biasnoise
equations
derived
ininteraction,
R12. We
present
a toy model for
observeras(see
Bernstein
& Jarvis 2002, for reviews). Ellipticity is
ng observation.
galaxy
image is distorted
as
it
is
defined
a complex
number
that The
several
state-of-the-art
shape
measurement
methods
files
in
detail
(Voigt
&
Bridle
2010;
Melchior
et
al.
2010).
the noise
bias
equations
derived
in
R12.
We
present
a
toy
model
for
problem in section 4. In Section 5 we use the COSMOS sample
defined as a complex number
ected by mass
along
the line-of-sight
from
thethe
galaxy
the
problem
infitting
section
4. Inmodel
Section
5 weand
use simple
the
COSMOS
sample
(see
Appendix
A) are based
on
relatively
to evaluate
noise and
biases,
show
the significance
a − b functions
2iφ
More
complex
decompositions
intoe =basis
often(1)
to
the noise
andWe
model
biases,
show
the significance
a+
− bb e 2iφ,
of evaluate
theon
interaction
terms.
conclude
in and
Section
6.baa
s. This is the
desired
signal.
It isbased
then
further
disgalaxy
models
or are
a
decomposition
into
e = but
eat, the expense(1)
can
describe
more
complex
galaxies,
of the interaction terms. We conclude in Section 6.
a+b
aedweak
by the atmosphere
(forthat
a ground-based
telescope),
where a and b are the semi-major and semi-minor axes, respec- 7
sis functions
cannot necessarily
describe of
galaxy
prointroducing
many
tens
or
>
100
parameters,
making
where and
a and
areangle
the semi-major
and semi-minorbetween
axes, respectively,
φ isb the
(measured anticlockwise)
the xas
it isoptics,files
cope
andin
pixelation
on the
these
efdetail (Voigt
& detector;
Bridle 2010;
Melchior
et al. 2010).
•
-
Images are never perfectly clean, sensitivity to artefacts
Account for color, eliminate galaxy blends, etc.
Processing time per galaxy
-
About 1sec / galaxy, too large even on a parallel cluster
Next-generation Weak Lensing Surveys
•
•
In a few years time massive WL surveys will begin
From Space:
-
ESA Euclid, 1.2m telescope (~2020)
NASA WFIRST, 2.4m telescope (~2020)
-
LSST 8.4m Telescope (~2023)
SKA (Square Kilometre Array) Radio Telescope (~2020)
• From the ground:
•
•
Goal: measure cosmological parameters within 1% error
Will shear measurement methods be ready?
8
Bias Requirements for Euclid-like surveys
true
•
and that the error on the shear, i
i , and the true
obeys a linear relationship of the form
Additive and Multiplicative Biases
i
•
•
true
i
= mi
true
i
+ ci
(i = 1, 2)
ESA Euclid:
Bias requirements
to this
estimate
cosmological
Although
approximate,
metric
is generally
parameters
~1%
literaturewithin
to quantify
the accuracy of shear measure
ods. Table 1 gives approximate upper limits of al
for current, upcoming and planned weak lensing sur
on the calculations done in e.g. Amara & Réfrég
Kitching
et al. (2009); Cropper et al. (2013); Ma
(Euclid)
(2013).
Examples of surveys in the “current” ca
A tenfold improvement in bias required till the launch of
(CFHTLenS) (Heymans et al. 2012), the Sloan D
Euclid telescope in ~2020…
Survey (SDSS) (Lin et al. 2012; Mandelbaum et9 al.
Big Data in Weak Lensing
•
•
•
•
Volume
-
Euclid imaging survey: 15’000 deg2 ~4 times in mult. bands
~10 Petabytes raw space data
Velocity
-
Large Synoptic Telecope (LSST)
20’000 deg2 in 6 bands, ~15 Terabytes / day raw data rate
SKA radio survey: ~3 Terabytes / second
Variety
-
Heterogeneous datasets, ground+space, variable depth,
multiple bands, different cameras, etc.
Veracity
-
Low S/N, missing/incomplete/abnormal data, etc.
10
Big Data challenges for shear measurement
•
•
•
Algorithms currently too slow / high volumes
-
Typical method processes ~1 galaxy per second
Either speed-up algorithms or apply massively distributed
computing techniques (e.g. map-reduce)
Algorithms must cope with data variety
-
Handle missing/incomplete/abnormal information
Exploit value in data from different source/nature
But, Big Data also provides opportunities
-
Accessing more data can potentially yield better accuracy
More high S/N deep data, better calibration => improved bias correction
Applying new technologies/paradigms - Lessons from industry
11
Machine Learning (ML) can help
•
•
•
All methods rely on calibrations for correcting biases
-
Requires expert knowledge in astrophysics
Requires expert knowledge of the method itself
Potential for improvements poorly explored so far
Possibly the use of Machine Learning can help
-
Entirely new approaches for shape measurement
Help improve existing methods: speed, accuracy
Machine Learning for calibration shown to work
Mapping Dark
12
Machine Learning (ML) can help
• Example: “Pure” ML approach based on simulations
Machine Learning as Direct Measurement
13
Machine Learning (ML) can help
• Example: “Hybrid” ML approach: automated bias calibration
Machine Learning as Calibration Scheme
14
Summary
potential of weak lensing and shear measurement for
• High
cosmology
• But accurate shear measurement is extremely difficult
algorithms are still insufficiently accurate to help
• Current
discriminate cosmological model within 1% precision
are too slow, especially with upcoming massive
• Algorithms
surveys (Big Data)
• One can learn lessons from the industry or other fields
Learning techniques can help improving algorithms
• Machine
accuracy and speed
15