Chapter 4 - UvA-DARE - University of Amsterdam

UvA-DARE (Digital Academic Repository)
Reading the maps: Organization and function of chromatin types in Drosophila
Braunschweig, U.
Link to publication
Citation for published version (APA):
Braunschweig, U. (2010). Reading the maps: Organization and function of chromatin types in Drosophila
Amsterdam: Nederlands Kanker Instituut - Antoni van Leeuwenhoek Ziekenhuis
General rights
It is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s),
other than for strictly personal, individual use, unless the work is under an open content license (like Creative Commons).
Disclaimer/Complaints regulations
If you believe that digital publication of certain material infringes any of your rights or (privacy) interests, please let the Library know, stating
your reasons. In case of a legitimate complaint, the Library will make the material inaccessible and/or remove it from the website. Please Ask
the Library: http://uba.uva.nl/en/contact, or a letter to: Library of the University of Amsterdam, Secretariat, Singel 425, 1012 WP Amsterdam,
The Netherlands. You will be contacted as soon as possible.
UvA-DARE is a service provided by the library of the University of Amsterdam (http://dare.uva.nl)
Download date: 17 Jun 2017
Chapter 4
Systematic protein location mapping reveals
five principal chromatin types in Drosophila cells
Guillaume J. Filion1,*, Joke G. van Bemmel1,*, Ulrich Braunschweig1,*,
Wendy Talhout1, Jop Kind1, Wim Brugman2, Ines de Castro Genebra de Jesus1, 3,
Ron M. Kerkhoven2, Bas van Steensel1,$
Accepted for publication in Cell
*
Equal contribution
1
Division of Gene Regulation and
2
Central Microarray Facility,
Netherlands Cancer Institute, Plesmanlaan 121,
1066 CX Amsterdam, the Netherlands
3
Present address: Genome Function Group,
MRC Clinical Sciences Centre, Imperial College School of Medicine, Hammersmith Hospital Campus, Du Cane Road,
London W12 0NN, United Kingdom
$
Corresponding author: [email protected]
111
Abstract
The local protein composition of chromatin is important for the regulation of transcription and
other functions, yet the diversity of chromatin composition and the distribution along chromosomes is still poorly characterized. By integrative analysis of genome-wide binding maps of
53 broadly selected chromatin components in Drosophila cells, we show that the genome is
segmented into five principal chromatin types that are defined by unique, yet overlapping combinations of proteins, and form domains that can extend over >100 kb. We identify a novel silent
chromatin type that covers about half of the genome and lacks classic heterochromatin markers. Furthermore, transcriptionally active euchromatin consists of two distinct types that differ in
molecular organization and H3K36 methylation, and regulate distinct classes of genes. Finally, we
find that the five chromatin types are ordered in a highly non-random fashion along the chromosome arms. These results provide a global view of chromatin diversity and domain organization
in a metazoan cell. .
Introduction
Chromatin consists of DNA and all associated
proteins. The scaffold of chromatin is formed by
nucleosomes, which are histone octamers in a
tight complex with DNA. This scaffold serves as
the docking platform for hundreds of structural
and regulatory proteins. Furthermore, histones
carry a variety of post-translational modifications
that form recognition sites for specific proteins
(Berger 2007; Rando and Chang 2009). The local
composition of chromatin is a major determinant
of the transcriptional activity of a gene; some chromatin proteins enhance transcription, while others
have repressive effects.
Traditionally, chromatin is divided into heterochromatin and euchromatin. This division
was originally based on differential staining by
DNA-binding dyes, visualized by light microscopy.
Extensive biochemical, molecular and genetic
studies over the past two decades have indicated
that a more refined classification is warranted. For
example, in Drosophila at least two types of heterochromatin exist that have distinct regulatory
functions and consist of different proteins.
The first type is marked by Polycomb Group (PcG)
proteins and methylation of lysine 27 of histone
H3 (H3K27). PcG chromatin forms large continuous domains (sometimes more than 100 kb in
length) that encompass one or multiple genes; it is
a repressive type of chromatin that primarily regulates genes with developmental functions (Spar-
112
mann and van Lohuizen 2006; Ringrose 2007).
The second type is marked by Heterochromatin
Protein 1 (HP1) and several associated proteins,
combined with methylation of H3K9. This type
of heterochromatin can also cover large genomic
segments, particularly around centromeres.
Reporter genes integrated in or near HP1 heterochromatin tend to be repressed, but paradoxically
many genes that are naturally bound by HP1 are
transcriptionally active (Hediger and Gasser 2006;
Yasuhara and Wakimoto 2006). Direct comparison
of genome-wide binding maps indicates that PcG
and HP1 heterochromatin are non-overlapping
(de Wit et al. 2007).
HP1 and PcG chromatin illustrate two important
principles of chromatin organization: each type is
marked by unique combinations of proteins, and
can cover long stretches of DNA. But are there
other major types of chromatin that follow these
same principles? For example, is euchromatin
also organized into domains with distinct protein
compositions? Are there perhaps additional types
of repressive chromatin that have remained unnoticed?
In order to construct a “big picture” of chromatin
type diversity and domain organization, we conducted a systematic survey of 53 broadly selected
chromatin proteins and four key histone modifications in Drosophila cells.
We generated genome-wide location maps of each
component, providing a rich description of chromatin composition along the genome. By integrative computational analysis of this large dataset
we identified, besides PcG and HP1 chromatin,
three additional principal chromatin types, which
are defined by unique combinations of proteins.
One of these is a novel type of silent chromatin
that covers ~50% of the genome. In addition, we
identified two types of transcriptionally active
euchromatin that are bound by different proteins
and harbor distinct classes of genes.
A
We constructed a database of high-resolution
binding profiles of 53 chromatin proteins in the
Drosophila melanogaster cell line Kc167 (Fig. 1A;
Supplementary Data File 1). This embryo-derived
cell line has a gene expression signature that resembles that of embryos (Greil et al. 2003).
16200
16400
16600
B
16800
17000
Principal component analysis
MRG15
SU(VAR)3−7
SU(VAR)3−9
HP6
HP1
LHR
CAF1
ASF1
MUS209
TOP1
RPII18
SIR2
RPD3
CDK7
DSP1
DF31
MAX
PCAF
ASH2
HP1c
CtBP
JRA
BRM
ECR
BCD
MED31
SU(VAR)2−10
LOLAL
GAF
CG31367
ACT5C
TIP60
MNT
SIN3A
TBP
DWG
PHOL
PROD
BEAF32b
SU(HW)
LAM
D1
H1
SUUR
EFF
IAL
GRO
PHO
CTCF
PC
E(Z)
PCL
SCE
Genes
PC2
−5
0
5
10
15
20
−15 −10
PC3
0
−10
−5
0
5
10
−20
−10
PC1
10
4
−15
53 chromatin proteins
Genome-wide location maps of 53 chromatin
proteins
Position on chr2L (kb)
16000
C
Results
+
-
−15 −10
−5
0
5
10
15
PC2
type
PC1
PC2
PC3
16000
Hidden Markov model
16200
16400
16600
16800
17000
Position on chr2L (kb)
Figure 1
Overview of protein binding profiles and derivation of the 5-type chromatin segmentation. (A) Sample plot of all
53 DamID profiles (log2 enrichment over Dam-only control). Positive values are plotted in black, negative values
in grey for contrast. Below the profiles, genes on the top and bottom strand are depicted as lines with blocks
indicating exons. (B) Two-dimensional projections of the data onto the first three principal components. Colored
dots indicate the chromatin type of probed loci as inferred by a 5-state HMM. (C) Values of the first three principal
components along the region shown in (A), with domains of the different chromatin types after segmentation by
the 5-state HMM highlighted by the same colors as in (B).
113
For a representative cross-section of the chromatin
proteome, we selected proteins from most known
chromatin protein complexes, including a variety
of histone-modifying enzymes, proteins that bind
specific histone modifications, general transcription machinery components, nucleosome
remodelers, insulator proteins, heterochromatin
proteins, structural components of chromatin, and
a selection of DNA binding factors (DBFs)
(Supplementary Table 1).
For ~40 of these proteins, full-genome highresolution binding maps have not previously been
reported in any Drosophila cell type or tissue.
While chromatin immunoprecipitation (ChIP)
is widely used to map protein-genome interactions (Collas 2009), large-scale application of this
method is hampered by the limited availability of
highly specific antibodies.
Moreover, at least for some chromatin proteins,
ChIP results can greatly depend on the choice of
crosslinking reagents (Wang et al. 2009) and can be
unreliable for proteins with short residence times
(Gelbart et al. 2005; Schmiedeberg et al. 2009).
We therefore used the DamID technology, which
does not require crosslinking or antibodies. With
DamID, DNA adenine methyltransferase (Dam)
fused to a chromatin protein of interest deposits
a stable adenine-methylation ‘footprint’ in vivo at
the interaction sites of the chromatin protein (van
Steensel et al. 2001; Greil et al. 2006), so that even
transient interactions may be detected (Wolffe
and Leblanc 2000; Ringrose and Paro 2007). Note
that the fusion protein is expressed at very low levels, averting overexpression artifacts. The DamID
profiles of all 53 proteins were generated in duplicate under standardized conditions and detected
using oligonucleotide microarrays that query the
entire fly genome at ~300 bp intervals.
Comparisons to published and new ChIP data
confirm the overall reliability of the DamID data
(Supplementary Figure 1A: ChIP-DamID comparisons; associated with Fig 1), which was also
reported in previous comparative studies (Moorman et al. 2006; Negre et al. 2006).
For reference purposes, we also generated ChIP
maps of histone H3 and the histone marks
H3K4me2, H3K9me2, H3K27me3 and H3K79me3
on the same array platform.
114
Most of the fly genome interacts with non-histone chromatin proteins
Comparison of the DamID profiles for all 53
proteins shows a variety of binding patterns (Fig.
1A). Nevertheless, several sets of proteins exhibit
profiles that are locally or globally highly similar.
Some similarities were anticipated, such as for
PC, PCL, SCE and E(Z), which are all PcG proteins
(Sparmann and van Lohuizen 2006); and for HP1,
SU(VAR)3-9, LHR and HP6, which are part of classic HP1-type heterochromatin (Greil et al. 2007).
We also observe extensive colocalization of Lamin
(LAM), histone H1 (H1), Effete (EFF), Suppressor
of Underreplication (SUUR) and the AT-hook
protein D1, which have not been linked previously
except for LAM and SUUR (Pindyurin et al. 2007).
Prominent is the overlap in the binding patterns of
a large set of approximately 30 proteins including
histone modifying enzymes (e.g. RPD3 and SIR2),
components of the basal transcription machinery
(e.g., CDK7, TBP), and others detailed below.
In order to identify target and non-target loci
for each protein, we applied a 2-state Hidden
Markov Model (HMM) to each individual binding
map (Supplementary Data File 2; Supplementary Methods). This method identifies the most
likely segmentation into “bound” and “unbound”
probed loci. According to the resulting binary
classifications, the genome-wide occupancy by
individual proteins varies broadly, ranging from
about 2% (GRO) to 79% (IAL).
Interestingly, 99.99% of the probed loci is bound
by at least one protein, and 99.6% by at least
three proteins. This indicates that, at least at the
resolution of our maps, essentially no part of the
fly genome is permanently in a configuration that
consists of nucleosomes only. Approximately 1%
of the genome shows extremely high protein occupancy, being bound by 36 to 44 of the 53 mapped
proteins. This observed overlap does not necessarily derive from the simultaneous presence of all
overlapping proteins; it indicates that all overlapping proteins can at least occupy identical sites.
Principal chromatin types defined by combinations of proteins
Next, we used a computational classification
strategy to identify the major types of chromatin,
defined as distinct combinations of proteins that
are recurrent throughout the genome. To identify
such combinations, we initially performed Principal Component Analysis on the 53 quantitative
DamID profiles to reduce the dimensionality of
the data.
We then focused on the first three principal
components, which together account for 57.7% of
the total variance. By projecting the genomic sites
on the principal components, we could distinguish
five distinct lobes in the three-dimensional scatter
plot (Fig 1B). No additional distinct lobes could
be observed upon further inspection of higherlevel principal components. Importantly, the five
groups were also clearly separated when using
the previously defined binary target definitions
(Supplementary Figure 1B), showing that this result is robust to different quantification methods.
We therefore concluded that five major types of
chromatin can be distinguished, and that further
sub-classification is not supported by the data
structure.
Having established that classification into five
types properly summarizes the data, we fitted a
5-state HMM onto the first three principal components. Thus, every probed sequence in the genome was assigned one of five exclusive chromatin
types (Supplementary Methods).
To avoid semantic confusion, and in line with the
Greek word chroma (which means color), we
labeled each of the five protein signatures with a
color (BLUE, GREEN, BLACK, RED and YELLOW).
The HMM classification produced a mosaic pattern of chromosomal domains that vary widely in
length (Fig 1C). We emphasize that this segmentation is purely data-driven, without using any other
knowledge besides the 53 DamID profiles.
The segmentation is generally robust: removal
of any of the proteins except for PC still yields a
5-state classification that is on average 96.7% identical to the model obtained with all 53 proteins. A
detailed analysis of the robustness is summarized
in Supplementary Figure 1C.
Domain organization of chromatin types
The five types of chromatin differ substantially in
their genome coverage, numbers of domains, and
numbers of genes (Fig 2A). We identified a total
of 8,428 domains that typically range from ~1 to
52 kb (5th-95th percentiles) with a median length
of 6.5 kb, although the size distribution depends
on the chromatin type (Fig 2B). 441 domains are
larger than 50 kb, and 155 are larger than 100
kb, with the largest domain being 737 kb. Many
individual domains include multiple neighboring
genes (Fig 2C); the largest number of which within
a single domain is 139 (for a centromere-proximal
GREEN domain). Taken together, these data indicate that the fly genome is generally organized into
large regions that are covered by specific combinations of proteins.
BLUE and GREEN chromatin correspond to
known heterochromatin types
Visualization of the protein occupancy in each of
the five chromatin types (Fig 3A) shows that most
proteins are not confined to a single chromatin
type. Rather, the five chromatin types are defined
by unique combinations of proteins. Importantly,
BLUE and GREEN chromatin closely resemble
previously identified chromatin types. GREEN
chromatin corresponds to classic heterochromatin
that is marked by SU(VAR)3-9, HP1, and the HP1interacting proteins LHR and HP6. As described
previously (Ebert et al. 2006; Greil et al. 2007), this
type of chromatin is prominent in pericentric
regions and on chromosome 4 (Supplementary
Figure 2A, associated with Fig 3).
To further validate this classification, we conducted genome-wide ChIP of H3K9me2, a
histone mark that is predominantly generated
by SU(VAR)3-9 and bound by HP1 (Schotta et al.
2003). Indeed, H3K9me2 is highly and specifically
enriched in GREEN chromatin (Fig 3B).
BLUE chromatin corresponds to PcG chromatin
as shown by the extensive binding by the PcG proteins PC, E(Z), PCL and SCE. Indeed, well-known
PcG target loci such as the Hox gene clusters
are localized in BLUE domains (Supplementary
Figure 2B). Furthermore, genome-wide ChIP of
H3K27me3, the histone mark that is generated
by E(Z) and recognized by PC (Sparmann and
115
4
van Lohuizen 2006) is highly enriched in BLUE
chromatin (Fig 3B). We emphasize that these histone modification profiles serve as independent
validation because they were not used in the
5-state HMM classification. The fact that two major well-known chromatin types were faithfully
recovered indicates that our chromatin classification strategy is biologically meaningful.
both. For example, moderate degrees of occupancy of the histone deacetylase (HDAC) RPD3
occur in both BLUE and GREEN chromatin, in
accordance with known biochemical and genetic
interactions of RPD3 with PcG proteins as well as
SU(VAR)3-9 (Czermin et al. 2001; Tie et al. 2003).
The presence of EFF in BLUE chromatin is
consistent with a reported role of this protein in
PcG-mediated silencing (Fauvarque et al. 2001).
Furthermore, LAM interacts extensively with
Interestingly, we identified several additional
proteins that mark BLUE or GREEN chromatin, or
Genome coverage
Number of domains
All genes
Silent genes
117 Mb
8428 domains
15145 genes
4229 silent genes
B
C
D
Number of genes per domain
mRNA expression
200
600
400
0
0
400
0
100
0
500
0
10
20
30
40
Domain length (kb)
>50
1500
gene count
0
0
300
count
0
0 200
200
0
0 250
0
0
40
0
200
0
200
0
count
20 40
80
Length of domains
0
1
2
3
4
5
6
7
8
9 10 >10
Number of genes / domain
no tags
A
−1
0
1
2
3
log10(RNA tag count)
Figure 2
Characteristics of the five chromatin types. (A) Coverage and gene content of chromatin domains of each type.
The chromatin type of a gene is defined as the chromatin type at its transcription start site (TSS). Grey sectors
correspond to genes whose TSS maps at the transition between two chromatin types. Silent genes have an average
RNA tag count below 1 per million total tags (see (D)). (B) Length distribution of chromatin domains, i.e. genomic
segments covered contiguously by one chromatin type. (C) Distribution of the number of genes per chromatin
domain. Because some genes overlap with more than one domain, genes are assigned to a chromatin type based
on the type at the transcription start site. (D) Histogram of mRNA expression determined by RNA tag profiling.
Data are represented as log10 (tags per million total tags). Dashed vertical lines in (B)-(D) indicate medians.
116
4
B
H3K27me3
0
2
1
1
H3K9me2
0
4
−3
log2(H3K9me2 ChIP / H3 ChIP)
SU(VAR)3−9
HP1
LHR
HP6
SCE
PCL
E(Z)
PC
SU(HW)
EFF
LAM
SUUR
D1
H1
IAL
CTCF
BRM
ECR
LOLAL
CG31367
GAF
SU(VAR)2−10
BCD
MED31
PHO
GRO
ACT5C
PROD
SU(VAR)3−7
PHOL
DWG
MNT
TBP
TIP60
MUS209
CAF1
CTBP
JRA
HP1C
CDK7
PCAF
DSP1
ASF1
MAX
ASH2
SIN3A
TOP1
BEAF32B
RPII18
DF31
RPD3
SIR2
MRG15
−1
A
By mRNA high-throughput sequencing we detected no transcriptional activity (< 1 mRNA molecule
per 10 million) for 66% of the genes in BLACK
chromatin, while the remaining 34% have very low
−1
BLACK chromatin covers 48% of the probed
genome and is thus by far the most abundant type
(Fig 2A). With a median size of 17 kb and with 134
−2
BLACK chromatin is the prevalent type
of silent chromatin
domains larger than 100 kb, BLACK chromatin domains tend to be longer than domains of the four
other types (Fig 2B). BLACK chromatin is overall
relatively gene-poor (Fig 2A; compare genome
coverage and number of genes), but it nevertheless
harbors 4,162 genes.
log2(H3K27me3 ChIP / H3 ChIP)
BLUE chromatin (consistent with the reported
overlap of Lamin binding and H3K27me2 in
human cells (Guelen et al. 2008)), but not with
GREEN chromatin.
H3K4me2
2
1
0
−1
log2(H3K79me3 ChIP / H3 ChIP)
2
1
0
−1
−2
−2
log2(H3K4me2 ChIP / H3 ChIP)
3
3
H3K79me3
0.0
−0.5
−1.0
log2(H3 ChIP / input)
0.5
Histone H3
0
0.5
1
Fraction of bound loci
Figure 3
Chromatin types are characterized by distinctive protein combinations and histone modifications. (A) Fraction of
all probed genomic loci within each chromatin type that is bound by each protein. Bound loci were determined
separately for each protein as described in the text. (B) Levels of histone H3 and four histone modifications as
determined by genome-wide ChIP. The distribution of values is shown as “violin plots”, which are symmetrized
density plots of binding values per chromatin type: the wider the violin, the more data points are associated to
that value. Dashed horizontal lines indicate the median binding value for each chromatin type. Histone modification ChIP data were normalized to H3 occupancy.
117
activity (Fig 2D). This is in agreement with the low
coverage of BLACK chromatin by RPII18, a subunit
shared by all three RNA polymerases (Fig 3A) and
a lack of the active histone marks H3K4me2 and
H3K79me3 as detected by ChIP (Fig 3B).
We note that the majority of silent genes in the
genome is located in BLACK chromatin (Fig 2A).
Thus, BLACK chromatin is a distinctively silent
type of chromatin that covers a large part of the
genome.
BLACK chromatin is almost universally marked by
four of the 53 mapped proteins: histone H1, D1,
IAL and SUUR, while SU(HW), LAM and EFF are
also frequently present (Fig 3A). We analyzed the
fine distribution of these proteins in more detail.
Close-up views show that H1, D1, IAL, SUUR and
LAM have a broad distribution within BLACK
domains, while SU(HW) exhibits a distinct, more
focal pattern (Fig 4A).
Averaged profiles centered around the 5’ ends of
genes show that within BLACK chromatin these
proteins display an overall enrichment without a
clear preference for upstream regions, promoters,
or transcription units (Fig 4B). In the combined
other chromatin types these proteins are depleted
from promoter regions and to a lesser extent from
gene bodies.
This is less pronounced for IAL, which may be
related to the fact that this protein binds primarily
to mitotic chromosomes (Giet and Glover 2001).
In comparison, TBP, a protein that is not enriched
in BLACK chromatin, is depleted at genes within
BLACK chromatin and exhibits local enrichment
at promoter regions (Fig 4B). Together, these results indicate that BLACK domains consist of relatively homogeneous stretches of chromatin bound
by H1, D1, LAM, IAL, SUUR and EFF, interspersed
by focal sites occupied by SU(HW).
Given that genes in BLACK chromatin in embryonic Kc167 cells are expressed at very low levels,
we investigated whether these genes are perhaps
activated later in development. Indeed, a survey
of expression profiling data (Stolc et al. 2004)
indicates that most genes in BLACK chromatin become active at a later stage (Fig 4C). This suggests
that BLACK chromatin is under strong developmental control.
Consistent with this notion, we found that BLACK
118
chromatin is particularly rich in highly conserved
non-coding elements (HCNEs) (Fig 4D), which are
thought to mediate gene regulation (Engstrom et
al. 2007). The density of HCNEs in BLACK chromatin is comparable to that in BLUE chromatin,
which harbors many developmentally regulated
genes (Tolhuis et al. 2006), and is much higher than
in the other three chromatin types. Together, this
suggests that BLACK chromatin is under strong
developmental control.
YELLOW and RED chromatin are two distinct
types of euchromatin
In contrast to BLACK and BLUE chromatin,
RED and YELLOW chromatin have hallmarks of
transcriptionally active euchromatin: Most genes
in these two chromatin types produce substantial
amounts of mRNA (Fig 2D), and levels of RNA
polymerase (Fig 3A), H3K4me2 and H3K79me3
are typically high, whereas levels of H3K9me2 and
H3K27me3 are low (Fig 3B).
RED and YELLOW chromatin share various
chromatin proteins (Fig 3A). Among these are the
HDACs RPD3 and SIR2, as well as the RPD3-interacting protein SIN3A. HDACs have recently also
been found in transcriptionally active chromatin
in human cells (Wang et al. 2009).
Other proteins that are highly abundant in both
RED and YELLOW chromatin include DF31, a
little-studied protein that drives chromatin decondensation in vitro (Crevel et al. 2001); ASH2, a
homolog of a subunit of a H3K4 methyltransferase
complex in yeast and vertebrate cells (Nagy et al.
2002) and MAX, a DBF that is part of the MYC
network of regulators of growth and proliferation
(Orian et al. 2003).
Besides these similarities, RED and YELLOW chromatin display striking differences. RED chromatin
is abundantly marked by several proteins that
are mostly absent from the four other chromatin
types (Fig 3A). Among these are the nucleosome
remodeling protein Brahma (BRM); the regulator of chromosome structure SU(VAR)2-10; the
Mediator subunit MED31; the 55 kDa subunit
of CAF1, which participates in various histonemodifying complexes (Martinez-Balbas et al. 1998;
Tie et al. 2001); and several DBFs including the
ecdysone receptor (ECR), GAGA factor (GAF), and
Jun-related antigen (JRA).
log2(Dam−fusion/Dam)
−1.0 −0.5
0.0
0.5
1
−1
−3
0.5
log2(Dam−fusion/Dam)
−1.0 −0.5 0.0
0.5
log2(Dam−fusion/Dam)
−1.5
−0.5
0.5 1.0
log2(Dam−fusion/Dam)
−0.8 −0.4
0.4
0.0
EFF
−5
0
5
Relative position from 5' end (kb)
16000
16100
16200
16300
16400
16500
Position on chr2R (kb)
−5
0
5
Relative position from 5' end (kb)
TBP
4
−5
0
5
Relative position from 5' end (kb)
early
late
embryo embryo
60
40
0
−0.4
20
−0.2
0.0
HCNEs per MB
80
0.2
100
D
Genes in BLACK chromatin
Other genes
SU(HW)
IAL
−5
0
5
Relative position from 5' end (kb)
log2(Dam−fusion/Dam)
−0.5 0.0 0.5 1.0 1.5
Type
LAM
−5
0
5
Relative position from 5' end (kb)
log2(Dam−fusion/Dam)
−0.6
−0.2
0.2
−1.5
0.5
0.0
−1.0
0.0
2 −1.0
SU(HW)
Genes
Mean expression (log2)
D1
SUUR
−5
0
5
Relative position from 5' end (kb)
−5
0
5
Relative position from 5' end (kb)
IAL
1
log2(Dam−D1/Dam) log2(Dam−SUUR/Dam) log2(Dam−H1/Dam)
−1.5
−1
0
1
LAM
EFF
C
H1
−5
0
5
Relative position from 5' end (kb)
D1
−1
log2(Dam−SU(HW)/D.) log2(Dam−IAL/Dam) log2(Dam−EFF/Dam) log2(Dam−Lam/Dam)
SUUR
log2(Dam−fusion/Dam)
−2.0
1.0
−1.0
0.0
B
H1
log2(Dam−fusion/Dam)
−1.0
0.0 0.5 1.0 1.5
A
larva
pupa
adult
male
GREEN BLUE
BLACK
RED YELLOW
Figure 4
BLACK chromatin is pervasively bound by distinctive proteins and contains silent, but regulatable genes. (A)
Sample plots of binding profiles of the six proteins that are the most prevalent in BLACK chromatin. Genes on
both strands as well as chromatin types are depicted below the profiles. (B) Average binding of the 6 proteins present in BLACK chromatin and TBP around transcription start sites of genes lying in BLACK chromatin (black line)
and other genes (grey line). Only genes lying entirely within one type have been considered. (C) Average expression
of BLACK genes (black line) and other genes (grey) during fly development (Stolc et al. 2004). For each time point
the values are normalized to the mean of all genes. (D) Density of highly conserved non-coding elements (HCNEs)
per chromatin type.
These differences in protein composition
prompted us to investigate the timing of DNA
replication during S-phase, which is known to
differ between chromatin types (Gilbert 2002).
Analysis of a genome-wide replication timing map
from Kc167 cells (Schwaiger et al. 2009) shows that
DNA in RED and YELLOW chromatin is generally
replicated early in S-phase, as may be expected for
euchromatin. However, RED chromatin tends to
be replicated even earlier than YELLOW chromatin
(Fig 5A).
119
A
B
ORC binding
0.5 1.0 1.5 2.0
3
2
1
−1
MRG15
−0.5
log2(Dam − MRG15 Dam)
C
0
ORC enrichment
20
10
0
−10
−20
log2 (Early / Late)
4
Replication timing
−5
0
5
Relative position from 5' end (kb)
D
−5
0
5
Relative position from 3' end (kb)
3
2
1
0
−1
log2(H3K36me3 input)
4
H3K36me3
−5
0
5
Relative position from 5' end (kb)
−5
0
5
Relative position from 3' end (kb)
Figure 5
RED and YELLOW are two distinct types of euchromatin. (A) Violin plots of replication timing per chromatin type
(Schwaiger et al. 2009). (B) Violin plots of origin of replication complex 2 (ORC2) binding per chromatin type
(MacAlpine et al. 2010). (C) Average binding of MRG15 around 5’ and 3’ ends of genes in RED and YELLOW
chromatin. Left panel, alignment to 5’ end; right panel, alignment to transcript 3’ ends. Only genes that are entirely
within one chromatin type are depicted. (D) Average enrichment of H3K36me3 (Bell et al. 2010), plotted as in (C).
This coincides with a strong enrichment of origin
recognition complex (ORC) binding in RED chromatin as mapped by ChIP (MacAlpine et al. 2010)
(Fig 5B), suggesting that DNA replication is often
initiated in RED chromatin.
120
These observations further underscore that RED
and YELLOW chromatin are distinct types of
euchromatin.
Active genes in YELLOW but not RED chromatin
carry H3K36me3
Only one protein of the dataset is abundant in
YELLOW but not in RED chromatin: MRG15,
which is a chromodomain-containing protein
(Leung et al. 2001). Because human MRG15 has
previously been reported to bind H3K36me3
(Zhang et al. 2006), we compared the fine distribution of MRG15 and H3K36me3 along genes within
the two chromatin types (Bell et al. 2010). Indeed,
both are highly enriched along genes in YELLOW
chromatin, but nearly absent from RED chromatin
(Fig 5C,D).
These data are consistent with binding of MRG15
to H3K36me3 in vivo. Interestingly, H3K36me3
was previously thought to be a universal marker of
elongating transcription units (Lee and Shilatifard
2007; Rando and Chang 2009). Our analysis reveals
that, at least in Drosophila Kc167 cells, this histone
mark is mostly absent from genes lying in RED
chromatin, even though these genes are expressed
at similar levels as genes in YELLOW chromatin
(Fig 2D).
RED and YELLOW chromatin mark
different types of genes
The substantial differences between RED and
YELLOW chromatin suggested that the genes they
harbor may be regulated by two globally distinct
pathways. We therefore investigated whether
genes located in RED and YELLOW chromatin
have different characteristics. We began by comparing the embryonic tissue expression patterns of
genes in the two chromatin types. Strikingly, genes
with a broad expression pattern over many embryonic stages and tissues (Tomancak et al. 2007)
are highly enriched in YELLOW chromatin, while
genes with more restricted expression patterns are
depleted (Fig 6A).
Consistently, Gene Ontology (GO) analysis revealed that annotation terms related to universal
cellular functions such as “ribosome”, “DNA repair”
and “nucleic acid metabolic process” are almost
exclusively found in YELLOW chromatin (Fig 6B),
while genes in RED chromatin are linked to more
specific processes such as “receptor binding”,
“defense response”, “transcription factor activity”
and “signal transduction” (Fig 6C). Taken together,
these results indicate that genes in YELLOW
chromatin tend to be expressed in most cell types,
while the expression of genes in RED chromatin
tends to be restricted to certain cell types only.
Features of RED chromatin point to complex
gene regulation
Several other properties of RED chromatin suggest
extensive gene regulation in these regions. First,
intergenic regions in RED domains contain about
two-fold more HCNEs than YELLOW chromatin
(Fig 4D), although not as much as BLACK and
BLUE chromatin. Furthermore, genome-wide
formaldehyde-assisted identification of regulatory
elements (FAIRE) (Giresi et al. 2007; Braunschweig
et al. 2009) points to a high density of regulatory
chromatin complexes in RED chromatin (Fig 6D).
We find that individual RED chromatin loci can
be locally occupied by as many as 44 of the 53
tested proteins, with an overall median occupancy
of 30 proteins (Supplementary Figure 3, associated with Fig 6). These proteins belong to a wide
variety of functional and structural categories
(e.g., homeobox, leucine zipper, chromodomain,
bromodomain, helix-loop-helix, zinc fingers, BTB/
POZ proteins). In contrast, YELLOW chromatin
has a less complex composition with a median
occupancy of 12 proteins.
We considered that the high occupancy in RED
chromatin is due to intrinsic properties such as
‘openness’ or nucleosome remodeling activity that
would locally facilitate protein binding. Alternatively, proteins might be individually targeted to
RED regions through protein-protein interactions
or other specific mechanisms.
We reasoned that specific targeting mechanisms
should not impact the binding of a non Drosophila DBF to the genome; accordingly, enrichment
in RED chromatin should be observed only under
the first hypothesis. To test this, we generated
a DamID profile for the DNA-binding domain
(DBD) of yeast Gal4. The recognition motif for
this DBD is present at roughly equal densities
among the five chromatin types (data not shown).
We observed no enrichment of Gal4-DBD in
RED chromatin (Figure 6E). This suggests that
RED chromatin does not intrinsically promote
protein-DNA interactions, and that high protein
occupancy is more likely due to specific targeting
mechanisms.
121
4
B
1.0
A
DNA repair
ribosome
structural constituent of ribosome
DNA metabolic process
structural molecule activity
nucleic acid metabolic process
nucleus
intracellular
80
158
0.8
155
177
0.6
726
1098
2926
C
multicellular organismal development
plasma membrane
behavior
transcription factor activity
cellular component movement
receptor binding
extracellular region
proteinaceous extracellular matrix
934
0.4
Fraction of genes
220
285
163
0.2
157
173
74
0.0
120
18
all genes
broad
tissue−specific
0.0
0.2
0.4
0.6
0.8
1.0
Fraction of Genes
E
Gal4 DBD
1.0
0.5
0.0
log2(Gal4 DBD−Dam/Dam)
−0.5
−1.5
1
0
−1
log2 (FAIRE / input)
2
1.5
3
FAIRE
2.0
D
Figure 6
Genes in RED and YELLOW differ in regulation and function. (A) Distribution of genes having “broad” and “tissuespecific” expression patterns (defined in Tomancak et al. 2007) over the five chromatin types. Left bar shows distribution of all genes for comparison. (B)-(C) GO slim categories that are significantly enriched (B) or depleted (C)
in RED compared to YELLOW genes. Bars indicate the fraction of RED and YELLOW genes for the given category
(BLACK, GREEN and BLUE are not considered here). Vertical dotted line represents the distribution expected by
random chance. The total number of RED and YELLOW genes within each category are indicated on the left. (D)
Violin plots of the log2 FAIRE signal per chromatin type (Braunschweig et al. 2009).
Global rules of domain organization.
Finally, to obtain more insight into the relationships between the different chromatin types, we
asked whether the chromatin domains are ordered
in a particular pattern along the chromosome
arms. Specifically, we investigated whether certain
chromatin types have preferential adjacent types.
We calculated for each possible pair of types how
often they are directly adjacent, and compared
this to the frequency that may be expected by
random chance. The resulting observed/expected
122
ratios reveal a remarkable non-random ordering
of chromatin types (Figure 7). In particular, several
combinations are strongly underrepresented as
neighbors, such as BLUE/GREEN, RED/GREEN,
RED/BLACK, and to a lesser extent YELLOW/BLUE.
This suggests that these pairs of chromatin types
are not compatible as neighbors. Other pairs show
significant preferential adjacency, such as YELLOW/GREEN, BLUE/BLACK and BLUE/RED. Thus,
the linear ordering of chromatin domains along
chromosomes is highly non-random.
Green Blue
0.01
0.1 x
0.5 x
1x
2x
Black Yellow Red
obs./exp.
0.70
2.68
0.08
Green
1.65
0.23
1.53
Blue
1.22
0.06
Black
1.39
Yellow
Black
Red
Blue
Green
Red
Yellow
Figure 7
4
Chromatin types are ordered in a non-random manner. Adjacency preferences of all combinations of chromatin
types are represented as lines with widths scaled by the observed/expected frequencies. Associated ratios are
given on the top right. All ratios are significant at p < 0.01 in 1-sided tests on empirical distributions obtained by
randomization (Materials and Methods).
Discussion
Here, we report detailed genome-wide binding
maps of a broad set of 53 chromatin components,
as well as four histone modifications. Systematic
and unbiased integration of the 53 protein maps
indicates that the Drosophila genome is packaged
into a mosaic of five principal chromatin types,
each defined by a unique combination of proteins.
Extensive evidence demonstrates that the five
types differ in a wide range of characteristics
besides protein composition, such as biochemical
properties, transcriptional activity, histone modifications, replication timing, as well as sequence
properties and functions of the embedded genes.
This validates our classification by independent
means and provides important insights into the
functional properties of the five chromatin types.
The number of chromatin states
We emphasize that the five chromatin types
should be regarded as the major types. Some may
be further divided into sub-types, depending on
how fine-grained one wishes the classification to
be. For example, within each of the transcriptionally active chromatin types, promoters and 3’ ends
of genes exhibit (mostly quantitative) differences
in their protein composition (data not shown)
and thus could be regarded as distinct sub-types.
However, these local differences are minor relative
to the differences between the five principal types
that we describe here. The five types are robust to
the choice of mapped proteins (Supplementary
Fig 1C), and establish distinctions that were not
clearly made previously: First, BLACK chromatin
is a silent compartment that is very different from
PcG (BLUE) and HP1 (GREEN) chromatin; second,
YELLOW and RED chromatin are two distinct
types of euchromatin.
123
We cannot exclude that the accumulation of binding profiles of additional proteins would reveal
other novel chromatin types.
We anticipate that the pattern of chromatin types
along the genome will vary between cell types.
For example, many of the inactive genes that are
embedded in BLACK chromatin in embryonic
Kc167 cells, are active in larvae and adult flies (Fig
4C). Thus, the chromatin of these genes is likely to
switch to an active type during development.
While the integration of data for 53 proteins
provides substantial robustness to the classification of chromatin along the genome, a subset of
only five marker proteins (histone H1, PC, HP1,
MRG15 and BRM), which together occupy 97.6%
of the genome, can recapitulate this classification
with 85.5% agreement (Supplementary Fig 1D).
Assuming that no unknown additional principal
chromatin types exist in some cell types, DamID or
ChIP of this small set of markers may thus provide
an efficient means to examine the distribution
of the five chromatin types in various cells and
tissues, with acceptable accuracy. The correspondence between the chromatin states and
histone modifications (Fig 3, 5) suggests that one
would obtain an approximate picture by mapping
H3K9me2, H3K27me3, H3K36me3 and H3K4me2,
but this set lacks exclusive markers for RED and
BLACK chromatin.
BLACK chromatin: a novel type of
silent chromatin
About half of the Drosophila genome is packaged
by BLACK chromatin, which consists of a previously unknown combination of proteins. Essentially all
genes in BLACK chromatin exhibit extremely low
expression levels, indicating that BLACK chromatin either completely lacks activating factors or
constitutes a strongly repressive environment.
The abundant presence of LAM and histone H1,
two proteins previously linked to chromatin compaction and gene repression (Pickersgill et al. 2006;
Reddy et al. 2008; Sancho et al. 2008; Braunschweig
et al. 2009) supports the latter notion. Importantly,
BLACK chromatin is strongly depleted of PcG proteins, HP1, SU(VAR)3-9 and associated proteins,
indicating that BLACK chromatin is different from
previously characterized types of heterochromatin
(which we identified de novo here as BLUE and
GREEN chromatin).
124
The proteins that mark BLACK domains provide
important clues to the molecular biology of this
type of chromatin. Three of these proteins are
essential: loss of LAM, EFF or histone H1 causes
lethality during Drosophila development (Cenci
et al. 1997; Lenz-Bohme et al. 1997; Lu et al. 2009),
suggesting an important role for BLACK chromatin. The enrichment of LAM points to a role of
the nuclear lamina in gene regulation in BLACK
chromatin (Pickersgill et al. 2006).
Indeed, loci that interact with LAM were previously shown to be preferentially located near the
nuclear lamina (Pickersgill et al. 2006; Shevelyov et
al. 2009). Furthermore, depletion of LAM causes
derepression of several LAM-associated genes
(Shevelyov et al. 2009). Recent evidence indicates
that the insulator protein SU(HW) is a regulator
of LAM - genome interactions (J.v.B., U.B, B.v.S,
manuscript submitted).
D1 is a little-studied protein with 11 AT-hook
domains that may contribute to its targeting to
the relatively AT-rich BLACK chromatin (data not
shown). Interestingly, overexpression of D1 causes
ectopic pairing of intercalary heterochromatin
(Smith and Weiler 2010), suggesting a role in the
regulation of higher-order chromatin structure.
SUUR specifically regulates late replication on
polytene chromosomes (Zhimulev et al. 2003)
EFF is highly similar to the yeast and mammalian
ubiquitin ligase Ubc4 that mediates ubiquitination
of histone H3 (Liu et al. 2005; Singh et al. 2009),
raising the possibility that nucleosomes in BLACK
chromatin may carry specific ubiquitin marks.
These insights provide important leads for further
study of this previously unknown yet prevalent
type of silent chromatin.
RED and YELLOW: distinct types of euchromatin
In RED and YELLOW chromatin most genes are
active, and the overall expression levels are similar
between these two chromatin types. However,
RED and YELLOW chromatin differ in many respects. One of the conspicuous distinctions is the
disparate levels of H3K36me3 at active transcription units.
This histone mark is thought to be laid down in
the course of transcription elongation and may
block the activity of cryptic promoters inside the
transcription unit (Li et al. 2007). The absence of
H3K36me3 from genes in RED chromatin is all the
more surprising that these genes are on average
2.4 times larger than genes in YELLOW chromatin
(data not shown). Why active genes in RED chromatin lack H3K36me3 remains to be elucidated.
The genomic organization of the YELLOW and
RED chromatin types is also different. YELLOW
domains typically contain a cluster of genes (Fig
2C) whereas most RED domains contain rarely
more than two genes.
YELLOW chromatin is highly enriched in genes
with a broad expression pattern, i.e., constitutively
expressed genes. Possibly, the aggregation of these
genes into domains of YELLOW chromatin helps
to ensure stable expression of these genes. A similar domain organization of chromatin may explain
the reported clustering of housekeeping genes in
the human genome (Lercher et al. 2002).
A striking feature of RED chromatin is the large
number of proteins that is present, suggesting that
RED chromatin domains are “hubs” of regulatory
activity. This is likely to be related to the predominantly tissue-specific expression of genes in RED
chromatin, which presumably requires an intricate
interplay of many regulatory proteins. Five lines of
evidence led us to reject the possibility that the
high protein occupancy in RED chromatin may
originate from an artifact of DamID, e.g. caused
by a high accessibility of RED chromatin. First, all
DamID data are corrected for accessibility using
parallel Dam-only measurements. Second, several
proteins, such as EFF, SU(VAR)3-9 and histone
H1 exhibit lower occupancies in RED than in any
other chromatin type. Third, FAIRE, which is an
independent technique to identify active regulatory elements, reports an enrichment exclusively
in RED chromatin. Fourth, the ORC binding data
also show a specific enrichment in RED chromatin,
even though these data were acquired by ChIP,
by another laboratory and on another detection
platform (MacAlpine et al. 2010). Fifth, DamID of
Gal4-DBD does not show any enrichment in RED
chromatin.
RED chromatin, with its high occupancy of a
broad variety of proteins, resembles DBF binding hotspots that were previously discovered in a
smaller-scale study in Drosophila cells (Moorman
et al. 2006). Discrete genomic regions targeted
by many DBFs have recently also been found
in mouse ES cells (Chen et al. 2008), hence it is
tempting to speculate that an equivalent of RED
chromatin may also exist in mammalian cells.
Similarities between GREEN and YELLOW
chromatin
GREEN chromatin corresponds to the previously
well-characterized HP1-marked chromatin. Many
genes in GREEN chromatin are expressed (Fig 2D).
This is generally consistent with previous reports
(de Wit et al. 2007; Johansson et al. 2007), despite
the paradoxical fact that HP1-marked chromatin
can cause position effect variegation, i.e., silencing
of certain, susceptible genes when brought in close
proximity of HP1-marked chromatin by chromosomal translocations or transgene insertion
(Girton and Johansen 2008).
It is noteworthy that GREEN chromatin resembles
YELLOW chromatin in several aspects. Aside from
active transcription, both types of chromatin
share the presence of MRG15. Furthermore, both
types lack RED-specific proteins such as BRM,
SU(VAR)2-10 and MED31 (data not shown). These
data indicate that the process of transcription
associated with the YELLOW signature can take
place in the presence of the HP1/H3K9me2 complex in GREEN chromatin, whereas RED chromatin
may be incompatible with the features of GREEN
chromatin.
Non-random ordering may reflect compatibility
of chromatin types.
A striking observation is that the five chromatin
types are ordered in a highly non-random manner
along the chromosome arms (Figure 7). Some
pairs of types are rarely found as neighbors, while
other pairs appear neutral or even show a slight
preference to be neighbors. Interestingly, the
neighbor preferences show striking parallels with
the overall similarities in protein composition
between chromatin types. For example, GREEN
and YELLOW chromatin domains, which have
overlapping protein compositions (as discussed
above) are also preferential neighbors. Likewise,
BLACK and BLUE chromatin domains share several
proteins (Figure 3A) and are frequently adjacent
to each other. Finally, the preferred adjacency of
RED and BLUE chromatin may be linked to shared
proteins, because it has been observed that GAF,
DSP1 and PHO (which are most abundant in RED
125
4
chromatin) also have important functions in PcG
(BLUE) chromatin (Dejardin et al. 2005). Conversely, pairs of chromatin types that differ substantially
in composition, such as RED/BLACK, RED/GREEN
and GREEN/BLUE show strong tendencies not to
be neighbors.
signals in interphase nuclei but were not discarded
because MNT and GRO were successfully mapped
by DamID in previous studies (Orian et al. 2003;
Bianchi-Frias et al. 2004) and IAL binds metaphase
chromosomes (Giet and Glover 2001).
These observations suggest that chromosomes
may have evolved to minimize the occurrence
of adjoining chromatin domains with dissimilar
protein compositions. Proximity of such ‘incompatible’ domains might destabilize one or both
domains and result in loss of robust regulation of
the embedded genes, which could manifest itself
as position effect variegation. Although additional
studies will be needed to further test this model,
our identification of five principal types of chromatin provides a firm basis for future dissection of
the roles of global chromatin organization in gene
regulation.
DamID and Microarrays
Material & Methods
Constructs
DamID constructs used for this study are listed in
Supplementary Table 1. Newly generated constructs were cloned by using TOPO cloning and
GATEWAY recombination as described (Braunschweig et al. 2009) or by Cre-mediated recombination.
We constructed an acceptor vector containing the
Hsp70 promoter upstream of Dam by the Creator
Acceptor Vector Construction Kit (Clontech,
631618). Donor clones in pDNR-Dual vectors
containing the cDNA of interest were obtained
from the Drosphila Genomics Recource Center,
Bloomington. DamID vectors expressing the
Myc-tagged Dam fusion proteins were obtained
by Cre-mediated recombination according to the
CreatorTM DNA Cloning Kits User Manual (Clontech PT3460-1).
Nuclear localization was checked for all Damfusion proteins by immuno-fluorescence microscopy with the 9E10 anti-Myc antibody (Santa Cruz
Biotechnology) after heat-shock induced expression as described previously (van Steensel and Henikoff 2000). Only MNT, GRO and IAL gave weak
126
DamID assays were carried out as described previously (Moorman et al. 2006) with a minor modification: proteins were grouped in sets sharing the
same Dam controls for hybridization purposes. For
each group, 3-5 DamID assays on Dam alone were
carried out in parallel, the product of which was
pooled before labeling. Each profile was the result
of two independent experiments between which
the dye orientation was inverted to minimize dye
bias effects.
Fluorescent labeling was done with Klenow polymerase according to the NimbleGen array user’s
guide, version 4.0. 13 μg labeled DNA per channel,
2.4 μl alignment oligo, 24 μl NimbleGen hybridiziation component A and 1x hybridization buffer in
a total volume of 120 μl were heated to 98°C for 5
min and hybridized in a TECAN hybridization station for 16 hrs. at 42°C. Microarrays were NimbleGen oligo arrays with 385.000 probes (Choksi et
al. 2006). Slides were washed sequentially with
NimbleGen wash buffers I, II and III with 0.1 mM
DTT at 42°C, 25°C, and 23°C, respectively. Slides
were scanned at 5 μm resolution, and raw data
extracted using NimbleScan software.
The identity of the hybridized material was tracked
by the presence of unique oligonucleotide spikes
in each sample. Furthermore, because the Damfusion expression vectors are produced in Dampositive bacteria, small amounts of the transfected
plasmids are co-amplified in the methylation-specific amplification protocol. This leads to a strong
signal in the open reading frame of the mapped
protein, which allows us to verify the identity of
the used vector from the microarray data alone.
Chromatin Immunoprecipitation
All ChIP experiments and the subsequent linear
amplification reactions were performed as described previously (Kind et al. 2008). Labeling and
hybridization were performed as for the DamID
samples. Chromatin was immunoprecipitated
using anti-H3K27me3 (07-449) and anti-H3K4me2
(07-030) from Upstate Biotechnology and antiH3K9me2 (1220), and anti-H3 (1791) antibodies
from Abcam. Histone H1 was immunoprecipitated
with affinity-purified anti-H1 serum (Braunschweig et al. 2009). The H3K79me3 antibody was
kindly provided by Fred van Leeuwen and has
been described in (Schubeler et al. 2004). The
H3K27me3 antibody has been described in (Peters
et al. 2003) to be highly specific, with minor crossreactivity for H3K27me2. The specificity of the
H3K9me2 and H3K4me2 antibodies have been
tested by the manufacturers and are reported
to be highly specific for the respective histonemodifications.
Digital gene expression
Total RNA was isolated from growing Kc cells using
TriZOL (Invitrogen), and remaining DNA was degraded by shearing and DNaseI digestion. Poly(A)
selection, reverse transcription and tag sequencing
was carried out on an Illumina Solexa GAII and
using the tag profiling kit with DpnII. Two RNA
samples were sequenced, yielding a total of 7.4 and
9.0 million reads, respectively. Mapping was carried out by BLAST, requiring at most 2 mismatches
and 11 consecutively matching bases.
Only the tags mapping to the last GATC of a transcript were counted and represented 70.3% and
69.4% of the total number of reads, respectively.
Counts were normalized to the total number of
reads and replicates were averaged. Gene mapping
informations were taken from D. melanogaster
FlyBase release 5.8.
Data analysis
Microarray data normalization and analysis were
performed with R (R Development Core Team
2009). All DamID data were subjected to loess
normalization.
The median correlation between independent
replicate experiments was 0.72. Because Dam can
only methylate adenines in the sequence GATC,
genomic fragments between GATCs (typically
200-300bp in size) constitute the minimal unit of
measurement. These fragments are referred to as
“probed loci” in the text.
Accordingly, binding log-ratios were determined
by averaging the loess-normalized M values (log2
ratio of the channels) of all the microarray probes
mapping to the same probed locus in FlyBase
release 5. The chromatin type of each probed locus
was determined as described in the Supplementary Methods.
Custom R scripts were used to calculate average
binding profiles around 5’ and 3’ ends of genes.
Genomic locations are converted to coordinates
relative to the nearest 5’ (respectively 3’) end of
a gene, before applying a running median with a
window covering 2% of the plotted data. To ensure
that points are aligned only once, windows around
the end of a gene range from the midpoint of the
gene to the mid distance to the next gene.
HCNEs are defined as sequences of at least 50 bp
having at least 98% identity with another species
(Engstrom et al. 2007). Accordingly we mapped
HCNEs by aligning the introns and intergenic
regions of D. melanogaster (FlyBase release 5.17)
to the genome of D. mojavensis (FlyBase release 1)
by exonerate (Slater and Birney 2005).
For GO analysis, the D. melanogaster GO Slim
terms were downloaded from http://www.
geneontology.org/GO_slims/archived_GO_slims/goslim_Drosophila.0200. The obsolete terms were
updated according to the data obtained from
http://www.geneontology.org/
ontology/obo_format_1_2/gene_
ontology_ext.obo (date: 09:11:2009
14:57). Gene associations were downloaded from
http://www.geneontology.org/
cgi-bin/downloadGOGA.pl/gene_
association.fb.gz (CVS version 1.159).
Genes were further associated with all the terms
higher in the GO hierarchy as indicated by the
“is_a” and “part_of” keywords. Enrichment or
depletion for GO terms were tested against the
hypergeometric distribution, at alpha level 0.01
(correcting for multiple testing by the Bonferroni
method).
Expected frequencies of neighborhood of domains
of one type to the four other types were calculated
as relative frequencies of domains of the other
types. To derive significance levels, domains of
one type were retained while randomly permuting type assignments of all other domains 20,000
times. p-values were estimated from the obtained
frequency distributions and corrected for multiple
testing by the Bonferroni method.
127
4
Hidden Markov Models
M values of biological replicates were averaged to
give probe-wise M values, and values from array
probes overlapping with one DpnII restriction
fragment were averaged to a single score. In our experience, after loess normalization, the Student’s t
represents an excellent approximation of the score
distribution.
To identify the binding sites of a protein, we applied a two-state Hidden Markov Model (HMM)
with emissions following a Student’s t distribution.
The parameters were fitted by iteratively maximizing the expected value of the log-likelihood of
the observations with a modified Baum-Welch
procedure.
Instead of the expectation-maximization algorithm, we used an adaptation of the ECME
algorithm that is substantially faster (Liu and Rubin
1994). In short, each cycle consists of two subcycles. In both subcycles, the Forward-Backward
smoothing estimates are computed, and alternatively the means and variances or the degrees of
freedom are updated.
Gaps in the array design were filled by L/(D-1)
virtual positions for which emissions are not available (NA), where L is the gap size and D the mean
distance between consecutive GATC sequences.
When not available, the emission probability was
set to the same value for every state in order not
to discriminate between them. The most likely
sequence of states was computed through the
Viterbi algorithm.
Definition of chromatin types
By compiling the DamID scores of the 53 profiles,
the probed genome can be viewed as a cloud of
points in 53-dimensional space. In this space, two
DpnII restriction fragments are close to each other
if they are bound by the same proteins irrespective
of their genomic coordinates. In this representation recurrent protein binding signatures separate
as distinct lobes of the cloud.
128
We performed Principal Component Analysis
(PCA) to obtain the best representation of the
cloud in lower dimensions. It immediately appeared that at least three principal components
are required to give a qualitative representation of
the data, because a distinct sub-cloud separates in
the third component.
The projection on the first 3 principal components
revealed 5 distinct lobes, and none of the 5 lobes
forked on the fourth or higher principal components. Therefore, we reduced the data to its first 3
principal components. The total variance captured
hereby was 57.8%. By comparison, the median
score of variance captured by applying a twostate HMM to a single DamID profile was 39.4%,
suggesting that the procedure efficiently removes
technical variation from the dataset without causing strong loss of information.
Mapping of chromatin types
Fitting an HMM to the original dataset would
involve estimating 7,441 parameters (5x4 transition parameters, 5x53 means, 5x53 variances,
5x1374 covariances and 1 degree of freedom). We
therefore replaced the dataset with its projections
on the first 3 principal components to reduce
the number of parameters to 66 (5x4 transition
parameters, 5x3 means, 5x3 variance, 5x3 covariances and 1 degree of freedom), representing a
substantial gain in robustness.
To map the chromatin types in the fly genome,
we applied our adaptation of the Baum-Welch
algorithm to the three-dimensional projection of
the data. The initial degree of freedom was set to
6, the initial means of the modified Baum-Welch
algorithm were determined visually. The initial
covariance matrices were set to the covariance
matrix of the 3-dimensional cloud.
Note that the projected DamID values can be
expected to be distributed as a Student’s t variable
because they are weighted averages of the DamID
binding scores.
The output of the modified Baum-Welch algorithm showed some sensitivity to initial parameter values, mainly to the means. For some initial
values, the algorithm failed to identify 5 clearly
separated states. However, there was only one fitted model where the states were clearly separated,
showing that the segmentation is robust to initial
conditions. A detailed description of the HMM
will be provided as supplementary online files. An
implementation of the procedure as an R package
is available upon request.
Acknowledgments
We thank Francesco Russo for help with vector
cloning; Marja Nieuwland and Arno Velds for help
with RNA tag sequencing; Dirk Schübeler’s laboratory for sharing H3K36 methylation data prior to
publication; Luke Ward for data analysis; Reuven
Agami, Fred van Leeuwen, Harmen Bussemaker,
Wouter Meuleman and Ludo Pagie for helpful suggestions. Supported by an EMBO Long-term Fellowship to J.K., and grants from the Netherlands
Genomics Initiative and an EURYI Award to B.v.S.
Data availabilty
DamID, ChIP and expression data, as well as a
list of the coordinates of all identified chromatin
domains are available from NCBI’s Gene Expression
Omnibus, accession number GSE22069.
4
129
References
Bell O, Schwaiger M, Oakeley EJ, Lienert F, Beisel C, Stadler MB, Schubeler D. 2010.
Accessibility of the Drosophila genome discriminates PcG repression, H4K16 acetylation and replication timing.
Nat Struct Mol Biol 17: 894-900.
Berger SL. 2007.
The complex language of chromatin regulation during transcription. Nature 447: 407-412.
Bianchi-Frias D, Orian A, Delrow JJ, Vazquez J, Rosales-Nieves AE, Parkhurst SM. 2004.
Hairy transcriptional repression targets and cofactor recruitment in Drosophila. PLoS Biol 2: E178.
Braunschweig U, Hogan GJ, Pagie L, van Steensel B. 2009.
Histone H1 binding is inhibited by histone variant H3.3. EMBO J 28: 3635-3645.
Bushey AM, Ramos E, Corces VG. 2009.
Three subclasses of a Drosophila insulator show distinct and cell type-specific genomic distributions. Genes Dev 23: 1338-1350.
Cenci G, Rawson RB, Belloni G, Castrillon DH, Tudor M, Petrucci R, Goldberg ML, Wasserman SA, Gatti M. 1997.
UbcD1, a Drosophila ubiquitin-conjugating enzyme required for proper telomere behavior. Genes Dev 11: 863-875.
Chen X, Xu H, Yuan P, Fang F, Huss M, Vega VB, Wong E, Orlov YL, Zhang W, Jiang J et al. 2008.
Integration of external signaling pathways with the core transcriptional network in embryonic stem cells. Cell 133: 1106-1117.
Choksi SP, Southall TD, Bossing T, Edoff K, de Wit E, Fischer BE, van Steensel B, Micklem G, Brand AH. 2006.
Prospero acts as a binary switch between self-renewal and differentiation in Drosophila neural stem cells.
Dev Cell 11: 775-789.
Collas P. 2009.
The state-of-the-art of chromatin immunoprecipitation. Methods Mol Biol 567: 1-25.
Crevel G, Huikeshoven H, Cotterill S. 2001.
Df31 is a novel nuclear protein involved in chromatin structure in Drosophila melanogaster. J Cell Sci 114: 37-47.
Czermin B, Schotta G, Hulsmann BB, Brehm A, Becker PB, Reuter G, Imhof A. 2001.
Physical and functional association of SU(VAR)3-9 and HDAC1 in Drosophila. EMBO Rep 2: 915-919.
de Wit E, Greil F, van Steensel B. 2007.
High-resolution mapping reveals links of HP1 with active and inactive chromatin components. PLoS Genet 3: e38.
Dejardin J, Rappailles A, Cuvier O, Grimaud C, Decoville M, Locker D, Cavalli G. 2005.
Recruitment of Drosophila Polycomb group proteins to chromatin by DSP1. Nature 434: 533-538.
Ebert A, Lein S, Schotta G, Reuter G. 2006.
Histone modification and the control of heterochromatic gene silencing in Drosophila. Chromosome Res 14: 377-392.
Engstrom PG, Ho Sui SJ, Drivenes O, Becker TS, Lenhard B. 2007.
Genomic regulatory blocks underlie extensive microsynteny conservation in insects. Genome Res 17: 1898-1908.
Fauvarque MO, Laurenti P, Boivin A, Bloyer S, Griffin-Shea R, Bourbon HM, Dura JM. 2001.
Dominant modifiers of the polyhomeotic extra-sex-combs phenotype induced by marked P element
insertional mutagenesis in Drosophila. Genet Res 78: 137-148.
Gelbart ME, Bachman N, Delrow J, Boeke JD, Tsukiyama T. 2005.
Genome-wide identification of Isw2 chromatin-remodeling targets by localization of a catalytically inactive mutant.
Genes Dev 19: 942-954.
Giet R, Glover DM. 2001.
Drosophila aurora B kinase is required for histone H3 phosphorylation and condensin recruitment during chromosome condensation
and to organize the central spindle during cytokinesis. J Cell Biol 152: 669-682.
130
References
Gilbert DM. 2002.
Replication timing and transcriptional control: beyond cause and effect. Curr Opin Cell Biol 14: 377-383.
Giresi PG, Kim J, McDaniell RM, Iyer VR, Lieb JD. 2007.
FAIRE (Formaldehyde-Assisted Isolation of Regulatory Elements) isolates active regulatory elements from human chromatin.
Genome Res 17: 877-885.
Girton JR, Johansen KM. 2008.
Chromatin structure and the regulation of gene expression: the lessons of PEV in Drosophila. Adv Genet 61: 1-43.
Greil F, de Wit E, Bussemaker HJ, van Steensel B. 2007.
HP1 controls genomic targeting of four novel heterochromatin proteins in Drosophila. EMBO J 26: 741-751.
Greil F, Moorman C, van Steensel B. 2006.
DamID: mapping of in vivo protein-genome interactions using tethered DNA adenine methyltransferase.
Methods Enzymol 410: 342-359.
Greil F, van der Kraan I, Delrow J, Smothers JF, de Wit E, Bussemaker HJ, van Driel R, Henikoff S, van Steensel B. 2003.
Distinct HP1 and Su(var)3-9 complexes bind to sets of developmentally coexpressed genes depending on chromosomal location.
Genes Dev 17: 2825-2838.
4
Guelen L, Pagie L, Brasset E, Meuleman W, Faza MB, Talhout W, Eussen BH, de Klein A, Wessels L, de Laat W et al. 2008.
Domain organization of human chromosomes revealed by mapping of nuclear lamina interactions.
Nature 453: 948-951.
Hediger F, Gasser SM. 2006.
Heterochromatin protein 1: don’t judge the book by its cover! Curr Opin Genet Dev 16: 143-150.
Johansson AM, Stenberg P, Pettersson F, Larsson J. 2007.
POF and HP1 bind expressed exons, suggesting a balancing mechanism for gene regulation. PLoS Genet 3: e209.
Kind J, Vaquerizas JM, Gebhardt P, Gentzel M, Luscombe NM, Bertone P, Akhtar A. 2008.
Genome-wide analysis reveals MOF as a key regulator of dosage compensation and gene expression in Drosophila.
Cell 133: 813-828.
Lee JS, Shilatifard A. 2007.
A site to remember: H3K36 methylation a mark for histone deacetylation. Mutat Res 618: 130-134.
Lenz-Bohme B, Wismar J, Fuchs S, Reifegerste R, Buchner E, Betz H, Schmitt B. 1997.
Insertional mutation of the Drosophila nuclear lamin Dm0 gene results in defective nuclear envelopes, clustering of nuclear pore complexes, and accumulation of annulate lamellae. J Cell Biol 137: 1001-1016.
Lercher MJ, Urrutia AO, Hurst LD. 2002.
Clustering of housekeeping genes provides a unified model of gene order in the human genome. Nat Genet 31: 180-183.
Leung JK, Berube N, Venable S, Ahmed S, Timchenko N, Pereira-Smith OM. 2001.
MRG15 activates the B-myb promoter through formation of a nuclear complex with the retinoblastoma protein and
the novel protein PAM14. J Biol Chem 276: 39171-39178.
Liu C, Rubin DB. 1994.
The ECME algorithm: A simple extension of EM and ECM with faster monotone convergence. Biometrika 81:633-648.
Li B, Gogol M, Carey M, Pattenden SG, Seidel C, Workman JL. 2007.
Infrequently transcribed long genes depend on the Set2/Rpd3S pathway for accurate transcription. Genes Dev 21: 1422-1430.
Liu Z, Oughtred R, Wing SS. 2005.
Characterization of E3Histone, a novel testis ubiquitin protein ligase which ubiquitinates histones. Mol Cell Biol 25: 2819-2831.
131
References
Lu X, Wontakal SN, Emelyanov AV, Morcillo P, Konev AY, Fyodorov DV, Skoultchi AI. 2009.
Linker histone H1 is essential for Drosophila development, the establishment of pericentric heterochromatin, and
a normal polytene chromosome structure. Genes Dev 23: 452-465.
MacAlpine HK, Gordan R, Powell SK, Hartemink AJ, MacAlpine DM. 2010.
Drosophila ORC localizes to open chromatin and marks sites of cohesin complex loading. Genome Res 20: 201-211.
Martinez-Balbas MA, Tsukiyama T, Gdula D, Wu C. 1998.
Drosophila NURF-55, a WD repeat protein involved in histone metabolism. Proc Natl Acad Sci U S A 95: 132-137.
Moorman C, Sun LV, Wang J, de Wit E, Talhout W, Ward LD, Greil F, Lu XJ, White KP, Bussemaker HJ et al. 2006.
Hotspots of transcription factor colocalization in the genome of Drosophila melanogaster.
Proc Natl Acad Sci U S A 103: 12027-12032.
Nagy PL, Griesenbeck J, Kornberg RD, Cleary ML. 2002.
A trithorax-group complex purified from Saccharomyces cerevisiae is required for methylation of histone H3.
Proc Natl Acad Sci U S A 99: 90-94.
Negre N, Hennetin J, Sun LV, Lavrov S, Bellis M, White KP, Cavalli G. 2006.
Chromosomal distribution of PcG proteins during Drosophila development. PLoS Biol 4: e170.
Orian A, van Steensel B, Delrow J, Bussemaker HJ, Li L, Sawado T, Williams E, Loo LW, Cowley SM, Yost C et al. 2003.
Genomic binding by the Drosophila Myc, Max, Mad/Mnt transcription factor network. Genes Dev 17: 1101-1114.
Peters AH, Kubicek S, Mechtler K, O’Sullivan RJ, Derijck AA, Perez-Burgos L, Kohlmaier A, Opravil S,
Tachibana M, Shinkai Y et al. 2003.
Partitioning and plasticity of repressive histone methylation states in mammalian chromatin. Mol Cell 12: 1577-1589.
Pickersgill H, Kalverda B, de Wit E, Talhout W, Fornerod M, van Steensel B. 2006.
Characterization of the Drosophila melanogaster genome at the nuclear lamina. Nat Genet 38: 1005-1014.
Pindyurin AV, Moorman C, de Wit E, Belyakin SN, Belyaeva ES, Christophides GK, Kafatos FC,
van Steensel B, Zhimulev IF. 2007.
SUUR joins separate subsets of PcG, HP1 and B-type lamin targets in Drosophila. J Cell Sci 120: 2344-2351.
R Development Core Team. 2009.
R: A Language and Environment for Statistical Computing. www.r-project.org
Rando OJ, Chang HY. 2009.
Genome-wide views of chromatin structure. Annu Rev Biochem 78: 245-271.
Reddy KL, Zullo JM, Bertolino E, Singh H. 2008.
Transcriptional repression mediated by repositioning of genes to the nuclear lamina. Nature 452: 243-247.
Ringrose L. 2007.
Polycomb comes of age: genome-wide profiling of target sites. Curr Opin Cell Biol 19: 290-297.
Ringrose L, Paro R. 2007.
Polycomb/Trithorax response elements and epigenetic memory of cell identity. Development 134: 223-232.
Sancho M, Diani E, Beato M, Jordan A. 2008.
Depletion of human histone H1 variants uncovers specific roles in gene expression and cell growth. PLoS Genet 4: e1000227.
Schmiedeberg L, Skene P, Deaton A, Bird A. 2009.
A temporal threshold for formaldehyde crosslinking and fixation. PLoS One 4: e4636.
Schotta G, Ebert A, Reuter G. 2003.
SU(VAR)3-9 is a conserved key function in heterochromatic gene silencing. Genetica 117: 149-158.
132
References
Schubeler D, MacAlpine DM, Scalzo D, Wirbelauer C, Kooperberg C, van Leeuwen F, Gottschling DE,
O’Neill LP, Turner BM, Delrow J et al. 2004. The histone modification pattern of active genes revealed through genome-wide chromatin analysis of a higher eukaryote. Genes Dev 18: 1263-1271.
Schwaiger M, Stadler MB, Bell O, Kohler H, Oakeley EJ, Schubeler D. 2009.
Chromatin state marks cell-type- and gender-specific replication of the Drosophila genome. Genes Dev 23: 589-601.
Shevelyov YY, Lavrov SA, Mikhaylova LM, Nurminsky ID, Kulathinal RJ, Egorova KS, Rozovsky YM, Nurminsky DI. 2009.
The B-type lamin is required for somatic repression of testis-specific gene clusters. Proc Natl Acad Sci U S A 106: 3282-3287.
Singh RK, Kabbaj MH, Paik J, Gunjan A. 2009.
Histone levels are regulated by phosphorylation and ubiquitylation-dependent proteolysis. Nat Cell Biol 11: 925-933.
Slater GS, Birney E. 2005.
Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 6: 31.
Smith MB, Weiler KS. 2010.
Drosophila D1 overexpression induces ectopic pairing of polytene chromosomes and is deleterious to development.
Chromosoma 119: 287-309.
Sparmann A, van Lohuizen M. 2006.
Polycomb silencers control cell fate, development and cancer. Nat Rev Cancer 6: 846-856.
Stolc V, Gauhar Z, Mason C, Halasz G, van Batenburg MF, Rifkin SA, Hua S, Herreman T, Tongprasit W, Barbano PE et al. 2004.
A gene expression map for the euchromatic genome of Drosophila melanogaster. Science 306: 655-660.
Tie F, Furuyama T, Prasad-Sinha J, Jane E, Harte PJ. 2001.
The Drosophila Polycomb Group proteins ESC and E(Z) are present in a complex containing the histone-binding protein p55 and the
histone deacetylase RPD3. Development 128: 275-286.
Tie F, Prasad-Sinha J, Birve A, Rasmuson-Lestander A, Harte PJ. 2003.
A 1-megadalton ESC/E(Z) complex from Drosophila that contains polycomblike and RPD3. Mol Cell Biol 23: 3352-3362.
Tolhuis B, de Wit E, Muijrers I, Teunissen H, Talhout W, van Steensel B, van Lohuizen M. 2006.
Genome-wide profiling of PRC1 and PRC2 Polycomb chromatin binding in Drosophila melanogaster. Nat Genet 38: 694-699.
Tomancak P, Berman BP, Beaton A, Weiszmann R, Kwan E, Hartenstein V, Celniker SE, Rubin GM. 2007.
Global analysis of patterns of gene expression during Drosophila embryogenesis. Genome Biol 8: R145.
van Steensel B, Delrow J, Henikoff S. 2001.
Chromatin profiling using targeted DNA adenine methyltransferase. Nat Genet 27: 304-308.
van Steensel B, Henikoff S. 2000.
Identification of in vivo DNA targets of chromatin proteins using tethered dam methyltransferase. Nat Biotechnol 18: 424-428.
Wang Z, Zang C, Cui K, Schones DE, Barski A, Peng W, Zhao K. 2009.
Genome-wide mapping of HATs and HDACs reveals distinct functions in active and inactive genes. Cell 138: 1019-1031.
Wolffe AP, Leblanc BP. 2000.
Creating molecular clues to uncover gene function. Nat Biotechnol 18: 379-380.
Yasuhara JC, Wakimoto BT. 2006.
Oxymoron no more: the expanding world of heterochromatic genes. Trends Genet 22: 330-338.
Zhang P, Du J, Sun B, Dong X, Xu G, Zhou J, Huang Q, Liu Q, Hao Q, Ding J. 2006.
Structure of human MRG15 chromo domain and its binding to Lys36-methylated histone H3. Nucleic Acids Res 34: 6621-6628.
Zhimulev IF, Belyaeva ES, Makunin IV, Pirrotta V, Volkova EI, Alekseyenko AA, Andreyeva EN, Makarevich GF, Boldyreva LV,
Nanayev RA et al. 2003.
Influence of the SuUR gene on intercalary heterochromatin in Drosophila melanogaster polytene chromosomes.
Chromosoma 111: 377-398.
133
4
Supplementary Material
2
3
4
5
6
7
8
9
12
10
11
13
14
15
16
17
18
19
20
21
22
23
25
26
27
28
24
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
a))
b)
c)
134
ACT5C
ASF1
ASH2
BCD
BEAF32b
BRM
CAF1
CDK7
CG31367
CtBP
CTCF
D1
DF31
DSP1
DWG
E(Z)
ECR
EFF
GAF
GRO
H1
HP1
HP1c
HP6
IAL
JRA
LAM
LHR
LOLAL
MAX
MED31
MNT
MRG15
MUS209
PC
PCAF
PCL
PHO
PHOL
PROD
RPD3
RPII18
SCE
SIN3A
SIR2
SU(HW)
SU(VAR)2-10
SU(VAR)3-7
SU(VAR)3-9
SUUR
TBP
TIP60
TOP1
FBpp0070787 pGWMycDam
X
FBpp0074715 pCreDamMyc
X
FBpp0084040 pCreDamMyc
FBpp0081165
pDamMyc
FBpp0086572
pGWMycDam
FBpp0075279
pDamMyc
FBpp0082511 pGWDamMyc
FBpp0070723
pCreDamMyc
FBpp0292134 pCreDamMyc
not specified
pDamMyc
FBpp0076588
pGWDamMyc
FBpp0081468
pGWDamMyc
This paper
This paper
This paper
X
a)
X
b)
X
X
X
X
This paper
This paper
X
This paper
c)
X
d)
X
X
e)
X
This paper
pCreDamMyc
FBpp0088319
pGWDamMyc
FBpp0070465
pGWMycDam
FBpp0076008 pCreDamMyc
not specified
pDamMyc
FBpp0082477 pCreDamMyc
FBpp0089419
pDamMyc
FBpp0084337
pMycDam
FBpp0085248
pGWDamMyc
FBpp0079251
pDamMyc
FBpp0083702
pMycDam
FBpp0077134
pDamMyc
X
f)
FBpp0079769 pCreDamMyc
X
This paper
FBpp0087499
pDamMyc
FBpp0078733
pDamMyc
FBpp0086073
pDamMyc
FBpp0085956 pCreDamMyc
X
FBpp0074785
pMycDam
X
FBpp0078496 pCreDamMyc
FBpp0070554
pMycDam
FBpp0082579
pCreDamMyc
FBpp0078496 pCreDamMyc
FBpp0078059
pDamMyc
FBpp0075701 pGWDamMyc
FBpp0085914
pCreDamMyc
FBpp0088268 pMycDam
FBpp0088268 pCreDamMyc
FBpp0085785
pCreDamMyc
FBpp0073173
pMycDam
FBpp0078433 pDamMyc
FBpp0084614
pMycDam
FBpp0087003
pDamMyc
X
FBpp0080015
pMycDam
X
FBpp0082404
pGWDamMyc
FBpp0087657 pCreDamMyc
FBpp0082204
pDamMyc
X
FBpp0082583
pMycDam
X
FBpp0075933
pDamMyc
X
FBpp0071596
pCreDamMyc
FBpp0070636
pGWMycDam
FBpp0073822
pGWMycDam
van Steensel et al., 2009
Bianchi-Frias et al., 2004
d)
e)
f)
X
Braunschweig et al., 2009
d)
e)
X
d)
X
X
d)
This paper
X
a)
X
This paper
X
X
X
d)
X
X
X
d)
a)
X
X
f)
X
f)
f)
f)
X
a)
X
d)
X
d)
f)
X
This paper
X
a)
X
This paper
X
h)
X
This paper
X
X
X
This paper
i)
X
This paper
X
X
This paper
X
X
a)
X
X
This paper
This paper
X
a)
X
f)
X
i)
X
h)
c)
b)
X
This paper
X
e)
f)
X
X
j)
This paper
X
van Bemmel et al., submitted, 2010
de Wit et al., 2008
d)
a)
X
FBpp0085273 Moorman et al., 2006
Profile published before
Plasmid Published
Replication
Insulator
Trithorax Group
Structural Protein
Cofactor
Chromatin Assembly/
Remodeling
Binds Histone Modification
Histone Modifying Enzyme
Polycomb Group
Nuclear Envelope
Transcription Machinery
Transcription Factor
Pericentric Heterochromatin
Name
1
Plasmid Backbone
FlyBase Protein Accession
Supplementary Table 1: Overview of chromatin components that were used in this study
This paper
X
X
g)
h)
i)
Greil et al., 2003
Orian et al., 2003
Tolhuis et al., 2006
j)
Pindyurin et al., 2007
b)
f)
Supplementary Material
4
This page was intentionally left blank.
135
Supplementary Material
15
10
H1 ChIP
PC1
5
0.5 −4
−2
0
B
0.0
log2(H1 ChIP/input)
log2(Dam−H1/Dam)
H1 DamID
−1.0
A
0
Genes
11650
11750
11850
11950
12050
12150
12250
12350
12450
12550
12650
0
5
PC2
8
6
SU(HW) ChIP
PC3
4
1 2 3 4
−1
0
1
2
3
−5
SU(HW) DamID
−1
log2(SU(HW) ChIP/input) log2(Dam−SU(HW)/Dam)
Position on chr2L (kb)
2
Genes
6900
7000
7100
7200
7300
7400
7500
7600
7700
7800
0
6800
1.5 2.5
−2
0
5
PC2
CTCF ChIP
2
6
D
3
4 −0.5
−5
4
1
log2(CTCF ChIP/input)
CTCF DamID
−1
log2(Dam−CTCF/Dam)
Position on chr2R (kb)
500
600
700
800
900
1000
1100
1200
1300
1400
−2
Position on chr2L (kb)
0
400
PC1
2
Genes
C
−4
80
−6
−4
−2
0
2
4
6
2
4
6
35
40
45
50
PC3
−4
Subsample size
0
% Faithful segmentations
% Overlap with current segmentation
−2
20
2
4
40
6
60
PC2
0
Estimated percentage
100
Robustness of the classification
−6
−4
−2
0
PC2
136
Supplementary Material
Supplementary Figure 1 (associated with Fig. 1)
Validation of DamID and robustness analysis of the 5-type segmentation. (A) Comparisons of DamID (Braunschweig et al. 2009) and ChIP for histone H1 (this study), SU(HW), and CTCF (Bushey et al. 2009). All datasets were
subjected to running mean smoothing with a window of three data points. Genes on the top and bottom strand
are depicted as lines with blocks indicating exons. (B) The five-state classification is robust to the quantification
method. DamID profiles for each protein were binarized before projection onto the first three principal components. The shape of the cloud is different from the one shown in main Figure 1B, but the five types still form
separated clouds in the first three principal components. (C) Sensitivity analysis of the segmentation. The Principal
Component Analysis and HMM procedure was applied as in main Figure 1 to datasets where one or more proteins
were left out. It is expected that smaller sets will not always exhibit all five states; for example, a subset that lacks
the four PcG proteins will not identify the BLUE state. In that case, the loci formerly assigned to BLUE will be assigned to another color by the HMM. We call such a segmentation “unfaithful”. On the contrary, “faithful” segmentations show no replacement of a color by another. The percentage of overlap between the current segmentation
(based on the complete dataset) and unfaithful segmentations is irrelevant: it mostly reflects the size of the type
that has been replaced. For example if BLUE has been replaced by another color at least 20% of the calls will differ
(i.e. all the former BLUE calls). For subsets of 35-51 proteins, 50 samples were drawn at random without replacement. For subsets of 52 proteins, all possible 53 subsets were tested. For each subsample size, the percentage of
faithful segmentations (thin lines) and their mean percentage of overlap with the current segmentation (thick
lines) were determined. Vertical bars represent ± standard deviations. The plot shows that larger sets of proteins
give an increasingly reliable discovery of the five states. Faithful segmentations show substantial agreement with
the current definition, even for smaller sample sizes. Thus, identification of the states is sensitive to protein choice,
but the classification itself is robust. (D) Minimal set of proteins defining the five types. A carefully selected set of
5 proteins (histone H1, PC, HP1, MRG15 and BRM) summarizes the segmentation in 5 types. The data points were
projected on their first three principal components, showing that the types are clearly visible with this minimal set
of proteins. The coloring was obtained by applying the HMM to the first three principal components, exactly as
was done for the 53 proteins. The agreement with the 53 profile segmentation is 85.5%.
137
4
Supplementary Material
chr2L
A
18
19
20
chr2R
21
22
0
1
Position (Mb)
2
3
4
Position (Mb)
chr4
0
B
1
Antp−C
AmaDfd ftz
lab pb bcd
2.4
Scr
Antp
2.6
2.8
3.0
Position on chr3R (Mb)
BX−C
iab−4
Ubx bxd abd−A
12.4
Abd−B
12.6
12.8
13.0
Position on chr3R (Mb)
Supplementary Figure 2 (associated with Fig. 3)
Localization of GREEN and BLUE chromatin supports their identity with known heterochromatin types. (A) Chromosomal maps of the pericentric region of chromosome 2 and of the entire chromosome 4. GREEN chromatin
domains are shown, and other types are collectively represented in grey. Constrictions symbolize centromeres. (B)
Close-up view of the HOX gene clusters. The HOX genes are known to be Pc targets in Kc167 cells. Genes on the
top and bottom strands are shown as boxes
138
Supplementary Material
0
18
Proteins bound
36
A
16.2
16.4
16.6
16.8
17.0
17.2
Position on chr2L (Mb)
B
4
15000
0 400 1000 0
1500
3000
14000
600
0
Frequency
Frequency
0
Frequency
Frequency
Frequency
Protein count per locus
0
10
20
30
40
Number of bound proteins (out of 53 proteins)
Supplementary Figure 3 (associated with Fig. 6)
Chromatin types differ widely in their total protein occupancy. (A) Sample plot of the total occupancy (out of 53
mapped proteins) on a 1Mb segment of chromosome 2L. The height of each vertical line indicates the number
of proteins bound to a locus. The color of the line indicates the local chromatin type. (B) Histograms of total occupancy distributions per chromatin type. Grey vertical dashed lines indicate median values.
139