SUPPLEMENTARY INFORMATION
Choice of Cell Types Used in This Study. To begin to understand transcription-associated
looping interactions at the TAL1 locus, we studied three different cell types: (i) erythroid cells
which express TAL1, (ii) lymphoid T-ALL cells which do not express TAL1, and (iii) lymphoid
T-ALL cells which express TAL1. The human K562 cell line is a well-documented cell type
which reflects the in vivo properties of the erythroid lineage, including GATA1 occupancy at target
genes,1 and for which we have previously published data on the activity of the TAL1 +51
erythroid enhancer.2,3 Using this cell type would also allow us to understand one of the key
questions related to the complexity of the TAL1 locus – how does the TAL1 +51 erythroid
enhancer communicate with its cognate promoters? Jurkat cells, which express TAL1 in T-ALL (but which do not have a TAL1/STIL deletion), served as a well-characterized T-ALL
cell type4 to understand looping interactions which may regulate inappropriate TAL1
expression during leukemogenesis. HPB-ALL cells, which are of T-ALL origin, but which do
not express TAL1, served as a “control” cell type from which we could determine
transcription-associated looping interactions in TAL1 expressing cell types (either K562 or
Jurkat). To validate that the looping interactions we identified in human cell types were also
found in mouse, we chose an appropriate TAL1 expressing murine erythroid cell line (MEL)
and a TAL1 non-expressing lymphoid cell line (BW5147). The regulatory features (histone
modifications) of the Tal1 locus have been studied extensively in both of these cell types in our
laboratory and showed similar characteristics to those of K562 and HPB-ALL.3 To confirm
that the looping interactions we found in human and mouse cells reflected bona fide in vivo
interactions in the erythroid and lymphoid lineages, we also examined looping in murine
primary erythroblasts and lymphocytes. These two cell types were the obvious choices for
these confirmatory studies.
SUPPLEMENTARY FIGURES AND TABLES
Supplementary Figure S1. Schematic diagrams of the organization of the human TAL1
regulon. A number of cis-regulatory elements have been characterized at the TAL1 locus. In
addition to the TAL1 promoters 1a and 1b, these include the stem cell enhancer (designated
+17/+18/+19 and +19/+20/+21 in mouse and human respectively, based on their distances in
kb from promoter 1a), and the erythroid enhancer (+40 and +51 in mouse and human
respectively). The stem cell and erythroid enhancers are believed to direct TAL1 expression
in the hematopoietic stem cell5,6 and in the erythroid lineage7 respectively. More recent
studies have demonstrated the presence of CTCF-bound elements at the TAL1 locus which
display insulator enhancer-blocking or barrier activity in vitro or in vivo.2,8 Collectively, these
regulatory elements span approximately 88 kb in human – a genomic region which contains
the entire TAL1 “regulon” defined by CTCF-bound elements at both ends (at +57 and -31).2
In all panels, locations and directions of transcription of TAL1 and its flanking genes
PDZK1IP1, STIL and CMPK1 are shown by the horizontal blue arrows. Locations of their
cognate promoters are shown with vertical red arrows. Enhancers of the TAL1 gene studied
here (the +19/+20/+21 stem cell enhancer, the +51 erythroid enhancer and the -10 enhancer)
are shown with vertical green arrows. CTCF-bound elements studied here (+57, +53, +40 and
-31) are shown with vertical blue arrows and labelled with CTCF in the blue hexagon. Scales
(in kb) are shown. (A) The extent of the predicted regulon (horizontal black line with
arrowheads) is 88 kb and defined by CTCF-bound elements (+57 and -31) at its extents.2 (B)
The CTCF-bound element (+40) is juxtaposed between the TAL1 stem cell enhancer and the
PDZK1IP1 promoter and thus prevents communication between these two elements (denoted
by X) and impairs transcription (denoted by X). (C) The CTCF-bound element (+40) is
juxtaposed between the TAL1 erythroid enhancer and the TAL1 promoters and thus prevents
communication between these elements (denoted by X) and impairs transcription (denoted by
X). (D) Transcription of the STIL gene is impaired (denoted by X) by the presence of the
CTCF-bound element (-31) within its transcribed region. This is due to the inability of RNA
polymerase II (Pol II) (brown oval) to transcribe through the region occupied by CTCF
(denoted by X). In the scenarios shown in panels B, C and D, the impediments imposed on
transcription are compatible with known roles of CTCF-bound insulator elements in
preventing communication between regulatory elements by altering chromatin loop formation
[loop domain model9] or by interfering with the movement of Pol II [tracking model10].
Supplementary Figure S2. mRNA expression levels of TAL1 and its flanking genes.
(A) Bar diagram showing mRNA expression levels (log2) of human PDZK1IP1, TAL1, STIL
and CMPK1 in K562 and HPB-ALL cell lines. (B) Bar diagram showing mRNA expression
levels (log2) of murine Pdzk1ip1, Tal1 and Stil in MEL and BW5147 cell lines. All data are
shown with standard error measurements. Expression levels were determined relative to
housekeeping gene ACTB, the mRNA value of which was set at log2 = 16.6 (panel A) and
log2 = 13.3 (panel B) (not shown in the bar diagrams).
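The scaling used in Supplementary Figure S2 can be illustrated with a minimal sketch. It assumes that expression was measured by quantitative PCR as threshold cycle (Ct) values and that each cycle corresponds to a two-fold change in template; both points are assumptions made for illustration. The function name and the Ct values are hypothetical and are not data from this study.

```python
# Sketch only: assumes Ct-based qPCR data and perfect doubling per cycle.
ACTB_LOG2_HUMAN = 16.6  # value assigned to ACTB in panel A
ACTB_LOG2_MOUSE = 13.3  # value assigned to ACTB in panel B


def log2_expression(ct_gene, ct_reference, reference_log2):
    """Place a gene on a log2 scale on which the reference gene equals reference_log2.

    A gene detected n cycles later than the reference is taken to be
    2**n-fold less abundant, i.e. n units lower on the log2 scale.
    """
    return reference_log2 - (ct_gene - ct_reference)


# Hypothetical Ct values: the test gene crosses threshold 6 cycles after ACTB.
example = log2_expression(ct_gene=24.0, ct_reference=18.0,
                          reference_log2=ACTB_LOG2_HUMAN)
print(round(example, 1))
```

With this convention a gene that reaches threshold at the same cycle as ACTB is assigned the ACTB value itself.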
Supplementary Figure S3. Schematic flow diagram of the 3C procedure. 3C uses
formaldehyde cross-linking (A) to covalently fix interacting chromatin segments to proteins
at their sites of occupancy within nuclei of living cells. Subsequently, the crosslinked
chromatin is digested using an appropriate restriction endonuclease (B), followed by intramolecular ligation of cross-linked chromatin fragments (C). The resulting 3C library contains
a large number of ligation products, a proportion of which reflect interactions between
non-adjacent genomic regions which lie in close proximity within the nucleus. These interactions
can be detected by PCR using oligonucleotide primer pairs (one primer originating from each
of the regions which are to be tested) (D). The interaction frequency of two genomic regions
is represented by the abundance of corresponding ligation product amplified by PCR –
quantified by gel electrophoresis (E). The 3C libraries prepared for this study used a 4-bp
cutting restriction endonuclease (Csp6I) which generates restriction fragments of 600 bp, on
average, across both the human and mouse TAL1 loci – thus allowing us to detect differences
in ligation frequencies at sufficient resolution to resolve between regulatory elements. Further
details of library preparation can be found in Methods.
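How restriction fragment sizes follow from the positions of enzyme sites can be sketched with an in silico digest. The sketch assumes that Csp6I recognizes GTAC and cuts after the first base (G^TAC). The toy sequence and function names are hypothetical and are not part of the study. For random sequence with equal base frequencies a 4-bp site is expected once every 4^4 = 256 bp, so the average observed at a particular locus can differ from this expectation.

```python
import re

CSP6I_SITE = "GTAC"  # assumed recognition sequence for Csp6I
CUT_OFFSET = 1       # assumed cut position: after the first base (G^TAC)


def fragment_lengths(sequence, site=CSP6I_SITE, cut_offset=CUT_OFFSET):
    """Return fragment lengths after cutting a linear sequence at every site."""
    seq = sequence.upper()
    # The lookahead reports every occurrence of the site, including overlaps.
    cuts = [m.start() + cut_offset
            for m in re.finditer(f"(?={re.escape(site)})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def expected_fragment_length(site_length):
    """Expected site spacing in random sequence with equal base frequencies."""
    return 4 ** site_length


toy = "AAGTACCCCGTACTT"  # hypothetical 15-bp sequence containing two GTAC sites
lengths = fragment_lengths(toy)
mean_length = sum(lengths) / len(lengths)
print(lengths, mean_length, expected_fragment_length(len(CSP6I_SITE)))
```

Applied to a real locus sequence, the same function would give the distribution of fragment sizes that sets the resolution of the 3C assay.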
Supplementary Figure S4. Looping interactions involving Tal1 promoters and
enhancers in murine erythroid and lymphoid cell lines. (A) Schematic organization of the
murine Tal1 locus. Locations and directions of transcription of Tal1 and its flanking genes
Pdzk1ip1, Stil and Cmpk1 are shown (horizontal blue arrows). Locations of promoters
(vertical red arrows) and enhancers of the Tal1 gene studied here (vertical green arrows) are
shown. The erythroid and the stem cell enhancers are highlighted. Elements are named
according to their distance (in kb) from Tal1 promoter 1a. Looping interactions tested in this
study are denoted by dotted grey lines with arrowheads. (B) Bar diagram of interaction
patterns across the murine Tal1 locus in erythroid (MEL) and lymphoid (BW5147) cell lines
determined by 3C. Interactions, measured as relative ligation frequencies (black bars) at
various locations across the locus, are shown with standard errors. Location of 3C “bait”
region (Tal1 promoter 1b = PTal1) is shown (vertical red arrows). p values are indicated for
relative ligation frequencies which are significantly higher for test regions when compared to
those of control regions (controls defined as regions located between the “bait” and test
regions). Scales (in kb) are shown at the bottom of panel B. (C) Comparison of interaction
patterns at the Tal1 locus in MEL (Tal1 expressing) and BW5147 (Tal1 non-expressing)
cells normalized against ERCC ligation frequencies. p values are indicated for interactions
which are significantly higher in MEL cells. p < 0.01 = **; p < 0.001 = ***; p < 0.0001 =
****; p < 0.00001 = *****.
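The asterisk notation used in these legends can be written as a simple lookup. The sketch below encodes only the thresholds stated in the legend; the function name is hypothetical, and returning an empty string for p values of 0.01 or above is an assumption.

```python
# Thresholds as stated in the legend, ordered from most to least stringent.
THRESHOLDS = [
    (0.00001, "*****"),
    (0.0001, "****"),
    (0.001, "***"),
    (0.01, "**"),
]


def significance_stars(p_value):
    """Return the asterisk label for a p value, or '' if no threshold is met."""
    for cutoff, stars in THRESHOLDS:
        if p_value < cutoff:
            return stars
    return ""  # assumption: p >= 0.01 is left unlabelled


for p in (0.005, 0.0005, 0.00005, 0.000001, 0.05):
    print(p, significance_stars(p))
```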
Supplementary Figure S5. Assessment of 3C library quality in cell types used in this
study. Bar diagrams show the interaction frequencies between two non co-linear Csp6I
fragments at the ERCC3 locus determined by 3C. (A) Human K562, HPB-ALL and Jurkat
cell lines. (B) Murine primary erythrocytes and lymphocytes. (C) Murine MEL and BW5147
cell lines. (D) K562 cells 48 hrs after transfection with siRNA for luciferase (LUC) or
GATA1 (KDGATA1). (E) K562 cells 96 hrs after transfection with siRNA for luciferase (LUC)
or GATA1 (KDGATA1). Interaction levels (grey bars), measured as relative ligation
frequencies (black bars), are shown with standard errors and represent the mean from 2
bioreplicate samples. These ERCC3 data were used to normalize relative ligation frequencies
for comparisons between cell types. Note: restriction enzyme digestion frequencies for these
libraries ranged from 73% to 92% (not shown here). These were measured by assessing
digestion efficiency at a single Csp6I site at the TAL1 locus.11
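The use of the ERCC3 control to compare cell types can be sketched as follows. The sketch assumes that normalization is a simple ratio of each relative ligation frequency to the ERCC3 ligation frequency measured in the same library; the exact calculation is not stated in this legend. The function name, region labels and all numbers are hypothetical.

```python
def normalize_to_ercc3(ligation_frequencies, ercc3_frequency):
    """Divide each relative ligation frequency by the ERCC3 control value.

    ligation_frequencies maps a region label to its relative ligation
    frequency in one library; ercc3_frequency is the control value
    measured in that same library.
    """
    if ercc3_frequency <= 0:
        raise ValueError("ERCC3 ligation frequency must be positive")
    return {region: value / ercc3_frequency
            for region, value in ligation_frequencies.items()}


# Hypothetical values for two libraries of different overall quality.
library_a = normalize_to_ercc3({"+51": 0.8, "+46": 0.2}, ercc3_frequency=0.4)
library_b = normalize_to_ercc3({"+51": 0.3, "+46": 0.3}, ercc3_frequency=0.6)
print(library_a, library_b)
```

Dividing by a control measured in the same library removes differences in overall cross-linking and ligation efficiency, so the remaining differences can be compared between cell types.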
Supplementary Figure S6. Effect of GATA1 siRNA on K562 cell growth, morphology
and apoptosis. For all analyses described below, four conditions were analyzed at both 48
and 96 hr time points. Two conditions were used as negative controls: wild type K562 cells
(WT) and K562 cells electroporated with water only (EP). The two test conditions were K562
cells electroporated with either luciferase siRNA (LUC) or GATA1 siRNA (GATA1). (A)
Growth curve of viable cell numbers relative to input (0 hr) at both 48 and 96 hr. Three
biological replicates were analyzed for each condition and viable cell counts assayed using a
haemocytometer. (B) Bar diagram showing percentages of cells having more than one nucleus
per cell. (C) Bar diagram showing percentages of cells having irregular bulges (blebs) in the
plasma membrane. (D) Bar diagram showing percentages of cells showing
immunofluorescence for annexin V (marker of apoptosis). For the analyses in panels B→D,
three bioreplicates of 100 cells for each condition were examined by microscopy and scored
for nuclei content, blebs and annexin V staining. All data are shown with standard error
measurements. Significant differences of cellular structure, function and viability were only
identified between wild type K562 cells and any of the conditions subjected to
electroporation (electroporation with water; electroporation with luciferase siRNA and
electroporation with GATA1 siRNA) at either the 48 hr or 96 hr time point. This confirmed
that electroporation per se had the most significant detrimental effect on K562 cells.
Introduction of GATA1 siRNA into K562 cells had, however, no significant detrimental
effects on cellular functions and viability above levels detected in the electroporation with
water control.
Supplementary Figure S7. ENCODE ChIP-seq data across the human TAL1 locus.
Publicly available ENCODE12 ChIP-seq data for TAL1 (Snyder, Stanford;
wgEncodeEH001824), GATA1 (Farnham, USC; wgEncodeEH000638), GATA2 (Farnham,
USC; wgEncodeEH000683), CTCF (Snyder, Stanford; wgEncodeEH002797), RAD21
(Snyder, Stanford; wgEncodeEH000649), SMC3 (Snyder, Stanford; wgEncodeEH00184),
and CTCFL (Myers-Hudson Alpha, wgEncodeEH001652) were visualized across the human
TAL1 locus using the UCSC genome browser (http://genome.ucsc.edu/). Genes and their
exon-intron structures are shown at the top of the figure. TAL1 regulatory elements are
annotated at the bottom of the figure. Two CTCF peaks (57-1 and 57-2) are shown for the
CTCF-binding element at +57 (see Supplementary Figure S11). Scale is shown and coordinates are for chromosome 1 (hg19). Note: we examined all publicly available ENCODE
datasets for the K562 cell line to identify other features which may be unique to -31 and aid
in our understanding of its function. The -31 element was the only insulator at the TAL1 locus
which showed occupancy of the CTCF paralogue, CTCFL, in K562 cells. Furthermore,
whilst -31 appeared to bind CTCF and RAD21 in a GATA1-dependent manner, GATA1 was
not directly bound to it in K562 (see manuscript text), nor did it have conserved GATA1
motifs.
Supplementary Figure S8. Schematic models showing all possible looping
configurations involving CTCF and RAD21 occupied elements at the TAL1 locus in
erythroid cells. (A) Interactions between the +40 and -31 elements result in the TAL1
promoters (red box) being placed in a loop containing the +19/+20/+21 stem cell enhancer
(green box). However, the +51 erythroid enhancer (green box) is not contained within this
loop as it lies distal to the +40 element. (B) Interactions between the +57 (or +53) and +40
elements place the erythroid enhancer in a chromatin loop. However, the TAL1 promoters and
the stem cell enhancer are not within this loop. (C) A composite looping pattern containing
the loops from A and B places the TAL1 promoters and the stem cell enhancer in a separate
loop from that containing the erythroid enhancer. (D) Interactions between the +57 (or +53)
and -31 elements place the TAL1 promoters and the stem cell and erythroid enhancers in the
same chromatin loop, thus facilitating their communication through direct contact (green line
with arrows). CTCF and RAD21 occupancy is shown in the colour key.
Supplementary Figure S9. Looping interactions of CTCF/RAD21-bound elements at the
TAL1 locus. Bar diagrams of looping interactions involving the +53, +40 and -31 elements
determined by 3C in K562 and HPB-ALL cell lines. (A) Interactions between +53 (bait) and
+40 and -31. (B) Interactions between -31 (bait) and +53 and +40. (C) Interactions between
+40 (bait) and +53 and -31. Interaction frequencies (black bars), as measured by relative
ligation frequencies, are shown with standard errors and normalized relative to BAC controls.
Locations of 3C “bait” regions are denoted by vertical red arrows. Locations of genes at the
TAL1 locus and their directions of transcription are shown at the top of the figure. p values
are indicated for interaction frequencies which are significantly higher for test regions when
compared to those of control regions. Scales (in kb) are shown at the bottom of the figure. p <
0.0001 = ****. Whilst +53 showed an elevated interaction frequency with +40 in K562
(Supplementary Figure S9A), we did not consider this interaction to be biologically relevant
because: (i) the +46 control region located between +53 and +40 showed even higher
ligation frequencies, suggesting that random ligation events accounted for the data between
+53 and +40; (ii) the interaction could not be validated when +40 was used as a bait in 3C
(Supplementary Figure S9C); and (iii) there was no significant difference in the levels of this
interaction between K562 and HPB-ALL cells which would suggest transcriptional dependence
(Supplementary Figure S10).
Supplementary Figure S10. Comparisons of looping interactions between
CTCF/RAD21-bound elements in K562 and HPB-ALL cells normalized against ERCC
ligation frequencies. (A) Bar diagram showing interaction frequencies between +53 (bait),
+40 and -31. (B) Bar diagram showing interaction frequencies between -31 (bait), +53 and
+40. (C) Bar diagram showing interaction frequencies between +40 (bait), +53 and -31.
Locations of 3C “bait” regions are denoted by vertical red arrows. p values are indicated for
interactions which are significantly higher in K562 (TAL1 expressing) cells. p < 0.0001 =
****; p < 0.00001 = *****.
Supplementary Figure S11. Evolutionary conservation of CTCF motifs at CTCF and
RAD21 bound elements at the TAL1 locus. (A) Schematic diagram of the TAL1 locus.
Locations and directions of transcription of TAL1 and its flanking genes PDZK1IP1, STIL
and CMPK1 are shown by the horizontal blue arrows. Locations of their cognate promoters
and the TAL1 enhancers (+51, +19 → +21, -10) are shown with vertical red or green arrows
respectively. CTCF and RAD21 bound elements studied here (+57, +53, +40 and -31) are
shown with vertical blue arrows. Scale (in kb) is shown. (B) CTCF motifs at +57, +53, +40
and -31 are shown. Two CTCF motifs (italics) were identified at the +57 element, while a
single motif was identified at each of +53, +40 and -31. (C) The ENCODE project12 data
tracks showing ChIP-seq data for CTCF (wgEncodeEH002797) and RAD21
(wgEncodeEH000649) at +57, +53, +40 and -31 were obtained from public ENCODE data
released from the Snyder (Stanford) laboratory. Scales and genome co-ordinates for human
chromosome 1 in bp (hg19) are also shown. CTCF motifs at these elements align to peaks of
CTCF binding and show strong similarity to the canonical CTCF motif13 (shown at the
bottom of each panel). CTCF motifs at these elements are conserved across species at the
DNA sequence level. Sequence conservation across five species (human, mouse, rat, dog and
chicken) is shown for each CTCF motif. When compared to the motifs at +57, +53, and +40,
the CTCF motif at -31 was the most highly conserved through evolution. (D) Composite
showing alignment of all five composite CTCF motifs found at the +57, +53, +40 and -31
elements. The consensus 20 bp CTCF motif13 is also shown. Sequence differences between
CTCF motifs at -31, when compared to those at +57, +53 and +40, are highlighted with the
red arrows.
Supplementary Figure S12. Occupancies of CTCF and RAD21 at insulator elements at
the TAL1 locus. (A) Occupancies for CTCF and RAD21 at the +57, +40 and -31 elements in
K562 cells. (B) Occupancies for CTCF and RAD21 at the +57, +40 and -31 elements in
HPB-ALL cells. ChIP enrichments (log2) are shown with standard errors. Annotation of test
and negative control regions is denoted in black and grey text respectively. Positive control is
a CTCF/RAD21-bound element at the HNF4A locus. The green arrow highlights the lower
levels of RAD21 at the -31 element in HPB-ALL cells. In HPB-ALL, the -31 element does
not participate in looping with other CTCF/RAD21-bound elements at the TAL1 locus (see
Figure 4 and Supplementary Figures S9 and S10).
Supplementary Figure S13. Schematic flow diagram of the 4C-microarray procedure.
4C uses formaldehyde cross-linking (A) to covalently fix interacting chromatin segments to
proteins at their sites of occupancy within nuclei of living cells. Subsequently, the crosslinked
chromatin is digested using an appropriate restriction endonuclease (B), followed by intramolecular ligation of cross-linked chromatin fragments (C). The resulting 3C library contains
a large number of ligation products, a proportion of which reflect interactions between non
co-linear genomic regions which lie in close proximity within the nucleus; these include
products containing the “bait” (i.e., the region of interest) ligated to a range of interacting
DNA “prey” fragments. A → C are steps also used in 3C (see Supplementary Figure S3).
The 3C library is then subjected to sonication (D) which reduces the average size of ligated
fragments, thus avoiding incomplete primer extension in the following step. Primer extension
(E) is with a 5’-biotinylated primer complementary to the “bait” sequence. The fragments
containing the “bait” sequence after primer extension are isolated from the pool of 3C DNA
using streptavidin beads, followed by blunt-ending and blunt adapter ligation (F). Fragments
containing the “bait” sequence are then amplified by PCR using a nested primer
complementary to the “bait” in combination with a nested adapter primer. The resultant
products are the “bait”-specific 4C library (G). The 4C DNA is then fluorescently labelled
and hybridised onto a TAL1 genomic tiling path microarray2 in a competitive hybridization
with fluorescently labelled total genomic DNA from the cell type in question (H). Array
information is obtained and quantified as previously described.2
Supplementary Figure S14. 4C interaction patterns obtained across the TAL1 locus
using TAL1 promoter 1b as the “bait”. (A) K562. (B) HPB-ALL. Y axes in A and B show
the frequencies of interactions expressed as a proportion of the “bait” signals for each
microarray tile. X axis shows location of each microarray tile across the TAL1 locus and its
flanking genes. (C) Organization of the human TAL1 locus. The locations of Csp6I sites are
shown by black bars. The scale is genome co-ordinates (bp) for human chromosome 1
(hg17). Gene names are annotated in black. Exon-intron structures of genes are shown as
joined up blue bars. Directions of transcription of genes are shown as black arrows. The
locations of all promoter, enhancer and CTCF/RAD21 elements at the TAL1 locus previously
described2 are shown at the bottom of panel C. The location of the TAL1 promoter 1b “bait”
is shown by the red line.
Supplementary Figure S15. Looping interactions at the TAL1 locus relevant to T-ALL
biology. (A) Bar diagrams of interaction patterns between the TAL1 promoters and the -81/TALd breakpoint region (intron 1 of STIL) in K562, HPB-ALL and Jurkat cells
determined by 3C. (B) Bar diagrams of interaction patterns across the TAL1 locus in Jurkat
cells determined by 3C. In both A and B, interactions, measured as relative ligation
frequencies (black bars), are shown with standard errors. Location of 3C “bait” region (TAL1
promoter 1b = PTAL1) is shown in each panel (vertical red arrows). p values are indicated for
relative ligation frequencies which are significantly higher for test regions when compared to
those of control regions (controls defined as regions located between the “bait” and test
regions). Scales (in kb) are shown at the bottom of the panels. (C) Comparison of interaction
patterns at the TAL1 locus in K562 and Jurkat cells normalized against ERCC ligation
frequencies. The location of the Jurkat -7 enhancer4 approx. 500 bp downstream of -8 is
shown. p values are indicated for interactions which are significantly higher in Jurkat cells. p
< 0.01 = **; p < 0.001 = ***; p < 0.0001 = ****.
Supplementary Figure S16. Spatial interactions between the TAL1 active hub and
deletion breakpoints found in T-ALL. (A) Schematic diagram shows the location of a
common breakpoint in intron 1 of the STIL gene (TALd) which becomes juxtaposed to any
one of four sites in the 5’-proximal portion of the TAL1 gene in T-ALL patients (TALd1 →
d4).14 Breakpoints are shown as the red arrowheads. Black bar is the genomic region spanning
the TAL1 and STIL genes while the dotted region represents the approximate size of T-ALL
STIL/TAL1 deletions. The TALd1 breakpoint occurs close to the 5’ boundary of TAL1
promoter 1b. The schematic organization of the TAL1 gene is also shown. TAL1 exons lying
adjacent to all three TAL1 promoters are shown as green boxes; other TAL1 exons are shown
as red boxes. The STIL promoter (PSTIL) is also shown. (B) Looping interactions between
TAL1 promoter 1b and intron 1 of the STIL gene occur at the sites of deletion breakpoints in
T-ALL. The schematic diagram shows the organization of the TAL1 and STIL genes (scale
and co-ordinates according to hg17) and the genomic location of the two microarray tiles
which detected signals for the 4C “bait” sequence (containing TAL1 promoter 1b; denoted as
Tile TAL1 P1B) and its interacting “prey” sequence within intron 1 of the STIL gene (denoted
as Tile STIL +1). The location of each of the two T-ALL breakpoints (TALd1 and TALd)
within each tile is also shown. The length of each microarray tile (in DNA bp) is also shown.
Supplementary Figure S17. Cis-regulatory remodelling of vertebrate TAL1 loci during
evolution. Left of the schematic shows the evolutionary tree and divergence of TAL1 across
more than 360 million years of vertebrate evolution. Right of the schematic shows the
organization of the TAL1 (green) and PDZK1IP1 (black) genes with respect to the TAL1
promoters and the +51 erythroid and +19/+20/+21 stem cell enhancers. Ets, GATA and E-box DNA sequence motifs which are evolutionarily conserved within these regulatory
elements are also shown. The “switch”15 in the stem cell enhancer motif from a GATA/E-box
(frog and chicken) to a GATA/Ets box (mammals) is shown. While the protein-coding
content of the TAL1 regulon has remained unchanged throughout 360 million years of
vertebrate evolution,15 the organization of its cis-regulatory circuitry has shown evidence of
evolutionary remodelling. Despite this, vertebrate patterns of TAL1 expression have
remained highly conserved,16-21 suggesting that mechanisms which circumvent remodelling
may allow TAL1 function to be preserved across species. The TAL1 hubs we describe
here may explain how this is achieved. Given that loss of a single GATA factor is
sufficient to abrogate chromatin looping and disassemble the TAL1 active hub, all that may
be required for hub formation are GATA factors bound at evolutionarily conserved GATA
motifs. Such motifs are present at TAL1 promoters and enhancers throughout evolution. Thus,
alterations in the composition of other transcription factor motifs (e.g. Ets or E-box motifs) at
TAL1 cis-regulatory elements through vertebrate evolution15 may not be problematic for hub
assembly and co-ordination of TAL1 transcription.
Supplementary Figure S18. Models of STIL, CMPK1 and PDZK1IP1 transcription
dependence on the TAL1 active hub. (A) Linear schematic diagram showing the
organization of the human TAL1 locus. Details are as described in Figure 1. (B) The
recruitment model. The proximity of the STIL, CMPK1 and PDZK1IP1 promoters to the
TAL1 active hub favours the recruitment of Pol II (shown in blue) and other factors to their
respective promoters (shown by blue arrows connecting the hub to the promoters) in a hub-dependent step (i). Transcription can then occur from these promoters in a hub-independent
manner (ii). In this model, loss of this proximity between the TAL1 promoters and the
flanking genes would result in a decrease of recruitment of these factors to the relevant
promoters (as we observed for STIL). However, our Pol II occupancy data do not support
this model for either PDZK1IP1 or CMPK1. (C) Direct interaction model. The promoters of
STIL, CMPK1 and PDZK1IP1 engage directly with the Pol II machinery within the TAL1
active hub, so that transcription is entirely hub-dependent at all stages. Transcription is
facilitated by the movement of chromatin through the hub with loops becoming larger or
smaller accordingly (steps i to iv shown in this figure with respect to STIL transcription;
however, the same could apply for CMPK1 and PDZK1IP1 depending on the direction in which
DNA within the loops traverses the hub). This model is compatible with the data that
both PDZK1IP1 and STIL show contact with the hub at various points within their gene
bodies. For both models presented in B and C, the production of a full-length STIL mRNA
would require the transient removal of CTCF and RAD21 from the -31 element [B(ii) and
C(iv)]. Consistent with this, we demonstrated that CTCF and RAD21 binding at this element
is dynamic (Figure 5). Locations of promoters, enhancers, CTCF/RAD21 elements, direction
of transcription of relevant genes (grey arrows), GATA1, TEC and Pol II recruitment, and
CTCF/RAD21 occupancies are also shown as in Figures 3, 5 and 7. Note: for simplicity, the TAL1 -10
enhancer is not shown in contact with the hub in the models presented in
this figure. This interaction, however, is shown in the erythroid model presented in Figure 7.
GENE      FORWARD 5' → 3'         REVERSE 5' → 3'
ACTB      AGAAGGAGATCACTGCCCTGG   CACATCTGCTGGAAGGTGGAC
TUBB      GCAGATGCTTAACGTGCAGA    CAATGAAGGTGACTGCCATC
GAPDH     AGGTCCACCACTGACACGTTG   AGCTGAACGGGAAGCTCACT
TAL1      TTTTGTGAAGACGGCACGG     TGAGAGCTGACAACCCCAGG
PDZK1IP1  TTGCAATCGCCTTTGCAGTC    TCCATCTGCCTTGTTTCCGA
STIL      ATGCACATAACGTGGATCACG   TCCATGCTCAAATCCACACC
CMPK1     TCTCATGAAGCCGCTGGT      TCCTGCAGAAAGGTGTGTGT
GATA1     CAAGCTACACCAGGTGAACCG   AGCTGGTCCTTCGGCTGC
LDB1      CCAGCTAGCACCTTCGCC      GTCGTCAATGCCGTTGGC
TCF3      AGGTGCTGTCCCTGGAGGAG    CCGACTTGAGGTGCATCTGG
GATA2     ATCAAGCCCAAGCGAAGACT    CATGGTCAGTGGCCTGTTAAC
Supplementary Table S1. Oligonucleotide primer pairs used to determine the
expression levels of gene transcripts using SYBR Green-based quantitative PCR. First
column shows the gene name. Second and third columns show the DNA sequences for the
forward and reverse primers respectively.
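Basic properties of the primers listed in Supplementary Table S1 can be checked with a short script. The sketch below computes length, GC content and reverse complement for the ACTB forward primer taken from the table; the function names are hypothetical and these checks are an illustration, not a procedure described in this study.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(primer):
    """Return the fraction of G and C bases in a primer sequence."""
    seq = primer.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(primer):
    """Return the reverse complement of a primer, read 5' to 3'."""
    return primer.upper().translate(COMPLEMENT)[::-1]


actb_forward = "AGAAGGAGATCACTGCCCTGG"  # ACTB forward primer from Table S1
print(len(actb_forward), round(gc_content(actb_forward), 3))
print(reverse_complement(actb_forward))
```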
Protein      Epitope          Source                    Catalogue No.
TCF3 (E47)   E47 (N-649)      Santa Cruz Biotechnology  sc-763
LDB1         CLIM-2 (N-18)    Santa Cruz Biotechnology  sc-11198
GATA1        GATA1 (M-20)     Santa Cruz Biotechnology  sc-1234
CTCF         CTCF (C-20)      Santa Cruz Biotechnology  sc-15914
RAD21        Anti-RAD21       Abcam                     ab992
RNA pol II   Anti-RNA pol II  Abcam                     ab5408
Supplementary Table S2. Antibodies used for chromatin immunoprecipitation (ChIP).
First column shows the protein and isoform name. Second column is the epitope to which the
antibody is raised. Third and fourth columns are the commercial source and catalogue
number for each antibody respectively.
GENE   ASSAY      FEATURE                    FORWARD 5' → 3'            REVERSE 5' → 3'
TAL1   TAL1 +137  neg. control               TTTGCAGTGCCCTGTTCTTAG      TGTTGGCTACCTTGATCATGTG
TAL1   TAL1 +57   insulator                  CTGCAATATCTCGAGCAGCCAC     GAACAACACGGGCATGGAGATG
TAL1   TAL1 +51   erythroid enhancer         TGACCTTACAGCCCTTCACCC      AGCTCCCTGCTCCCAGCAC
TAL1   TAL1 +40   insulator                  GTCAATGTCCACCGTCCCTTTC     GGAGCCAGTTTGCTGCTGAAG
TAL1   TAL1 +32   neg. control               GGATTGAGGAGAGGGCATGTG      GCACGGCTGTGGAGCTATG
TAL1   TAL1 +20   stem cell enhancer         TTCGAACGGATCACATCCTG       TTGGTCCGAGCTCTGCCTC
TAL1   PTAL1      promoter 1a                CGCCGCAGAGATAAGGCACT       CCCACTCCCTCCGGTGAAAT
TAL1   TAL1 -28   neg. control               TGTCACGCAGGATATAGTGGCA     TTAGGAGGCTGAAGTAGGAGGAC
TAL1   TAL1 -30   neg. control               GTGCCCTTGAGAGCCTAGGG       CCTCAACAGCCTGTCTTATAATTG
TAL1   TAL1 -31   insulator                  CAACCAGGTGCTGCTTGAGTC      GAGAAGAGCTGCTGGGAAGG
TAL1   TAL1 -35   neg. control               TGGTAACCTGGGAACAAGGTGT     ACTGGCTCCTTCTCATCATTCAGG
TAL1   TAL1 -37   neg. control               CCACTGTGCCCAGCCTATTT       GTGAGCCAAGACAGTGCCATT
TAL1   TAL1 -94   neg. control               CAGGGTATATCTATGTTCCTAGCAC  GATTGATGAATGGTGACAAAGC
CMPK1  PCMPK1     promoter                   GCGCAGAGGTTAGCGTGTC        GCCTCTAACCCAAATCCGC
STIL   PSTIL      promoter                   GCTCCTACCCTGCAAACAGAC      GGAAACCAGGAGCACAAAGC
TBP    PTBP       promoter (pos. control)    GACCTATGCTCACACTTCTCATGG   CGTTGATAATGTCACTTCCGCCAG
HNF4A  HNF4A      CTCF/Rad21 (pos. control)  GATTATCACACCTTGAGGGTAGGG   ACTGTCCTGTACATTGTCCCTG
Supplementary Table S3. Oligonucleotide primer pairs used to determine chromatin
immunoprecipitation (ChIP) enrichment levels across the TAL1 locus. First column
shows the gene name. Second column shows the region assayed relative to the gene locus
(numerical designations refer to distance in kb from the relevant gene promoter; - = upstream
from promoter, + = downstream from promoter). Third column describes the function of the
element assayed. Fourth and fifth columns show the DNA sequences for the forward and
reverse primers respectively.
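BAIT      PREY      SPECIES  BAIT PRIMER 5' → 3'     PREY PRIMER 5' → 3'
PTAL1     TAL1 +64  human    CTCTGTGTCCGAGTGTGGTG    TCTTCCTAGCCTCGATGGTC
PTAL1     TAL1 +51  human    CTCTGTGTCCGAGTGTGGTG    CGCAGAAAAGCAAGGATAGG
PTAL1     TAL1 +46  human    CTCTGTGTCCGAGTGTGGTG    GTGAGAACCAGGACCCAGAA
PTAL1     TAL1 +19  human    CTCTGTGTCCGAGTGTGGTG    CCCACAATGGAGAGGATGAC
PTAL1     TAL1 +15  human    CTCTGTGTCCGAGTGTGGTG    AGCCTGAGTGCTACAAAGGT
PTAL1     TAL1 -8   human    CTCTGTGTCCGAGTGTGGTG    GCGTGAAAGTCAACCATGTG
PTAL1     TAL1 -10  human    CTCTGTGTCCGAGTGTGGTG    CCTGAACCAGGAGTTTGTCAC
PTAL1     TAL1 -25  human    CTCTGTGTCCGAGTGTGGTG    TGGCAAGTAGGCTGGAACTT
PTAL1     TAL1 -31  human    CTCTGTGTCCGAGTGTGGTG    GTTACTGGCACCCCCTGTT
PTAL1     TAL1 -41  human    CTCTGTGTCCGAGTGTGGTG    AGTGGAAGAGCCTCCCTTTG
PTAL1     TAL1 -72  human    CTCTGTGTCCGAGTGTGGTG    GGTGATCCACCTGCCTCAT
PTAL1     TAL1 -81  human    CTCTGTGTCCGAGTGTGGTG    ATGCTCGCTCTTGCATTCCT
PTAL1     TAL1 -85  human    CTCTGTGTCCGAGTGTGGTG    TGCAAAGGCCCTGAGTTACA
TAL1 +57  TAL1 +46  human    GGCAACCATGGGTCTAAAGCAT  GTGAGAACCAGGACCCAGAA
TAL1 +57  TAL1 +40  human    GGCAACCATGGGTCTAAAGCAT  GAAACCTGGGAGTCACCTGAA
TAL1 +57  TAL1 +30  human    GGCAACCATGGGTCTAAAGCAT  TTACAGACGCATGCCACCTC
TAL1 +57  TAL1 -25  human    GGCAACCATGGGTCTAAAGCAT  TGGCAAGTAGGCTGGAACTT
TAL1 +57  TAL1 -31  human    GGCAACCATGGGTCTAAAGCAT  GTTACTGGCACCCCCTGTT
TAL1 +57  TAL1 -41  human    GGCAACCATGGGTCTAAAGCAT  AGTGGAAGAGCCTCCCTTTG
TAL1 +53  TAL1 +46  human    TGGGAAGAAATGGCATCTACGC  GTGAGAACCAGGACCCAGAA
TAL1 +53  TAL1 +40  human    TGGGAAGAAATGGCATCTACGC  GAAACCTGGGAGTCACCTGAA
TAL1 +53  TAL1 +30  human    TGGGAAGAAATGGCATCTACGC  TTACAGACGCATGCCACCTC
TAL1 +53  TAL1 -25  human    TGGGAAGAAATGGCATCTACGC  TGGCAAGTAGGCTGGAACTT
TAL1 +53  TAL1 -31  human    TGGGAAGAAATGGCATCTACGC  GTTACTGGCACCCCCTGTT
TAL1 +53  TAL1 -41  human    TGGGAAGAAATGGCATCTACGC  AGTGGAAGAGCCTCCCTTTG
TAL1 +40  TAL1 +64  human    GAAACCTGGGAGTCACCTGAA   TCTTCCTAGCCTCGATGGTC
TAL1 +40  TAL1 +53  human    GAAACCTGGGAGTCACCTGAA   TGGGAAGAAATGGCATCTACGC
TAL1 +40  TAL1 +46  human    GAAACCTGGGAGTCACCTGAA   GTGAGAACCAGGACCCAGAA
TAL1 +40  TAL1 -25  human    GAAACCTGGGAGTCACCTGAA   TGGCAAGTAGGCTGGAACTT
TAL1 +40  TAL1 -31  human    GAAACCTGGGAGTCACCTGAA   GTTACTGGCACCCCCTGTT
TAL1 +40  TAL1 -41  human    GAAACCTGGGAGTCACCTGAA   AGTGGAAGAGCCTCCCTTTG
TAL1 -31  TAL1 +64  human    GTTACTGGCACCCCCTGTT     TCTTCCTAGCCTCGATGGTC
TAL1 -31  TAL1 +53  human    GTTACTGGCACCCCCTGTT     TGGGAAGAAATGGCATCTACGC
TAL1 -31  TAL1 +46  human    GTTACTGGCACCCCCTGTT     GTGAGAACCAGGACCCAGAA
TAL1 -31  TAL1 +40  human    GTTACTGGCACCCCCTGTT     GAAACCTGGGAGTCACCTGAA
TAL1 -31  TAL1 +30  human    GTTACTGGCACCCCCTGTT     TTACAGACGCATGCCACCTC
TAL1 +51  TAL1 +30  human    CGCAGAAAAGCAAGGATAGG    TTACAGACGCATGCCACCTC
TAL1 +51  TAL1 +19  human    CGCAGAAAAGCAAGGATAGG    CCCACAATGGAGAGGATGAC
TAL1 +51  TAL1 +15  human    CGCAGAAAAGCAAGGATAGG    AGCCTGAGTGCTACAAAGGT
ERCC3               human    CCCTGGACATGTCGGAAA      AGGGGTTTGCTCTTTGAGGT
PTAL1     TAL1 +55  mouse    TGCCCCTTAAGCTTGGTTTC    TGGGAACAGATTGTGGGACT
PTAL1     TAL1 +40  mouse    TGCCCCTTAAGCTTGGTTTC    TGCTGGCTTCCTCTCTTTTC
PTAL1     TAL1 +30  mouse    TGCCCCTTAAGCTTGGTTTC    AAAAGCCTCTCCCTCTCCAG
PTAL1     TAL1 +18  mouse    TGCCCCTTAAGCTTGGTTTC    CCTAGATGAGGGGTGAGAGC
PTAL1     TAL1 +15  mouse    TGCCCCTTAAGCTTGGTTTC    AGCCTTTCCCCTTGATGTTC
PTAL1     TAL1 -5   mouse    TGCCCCTTAAGCTTGGTTTC    CGACCTTCCCTACGTCTTTG
PTAL1     TAL1 -9   mouse    TGCCCCTTAAGCTTGGTTTC    GAGAACAGATGGGCTTGGTC
Ercc3               mouse    AACGGACAGCTTTAGGCAGA    TGGCTGTAGTTGTGCCTTCTC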
Supplementary Table S4. Oligonucleotide primer pairs used for 3C analysis in human
and mouse cells. First column shows the 3C “bait” region and gene locus from which it is
derived. Naming system is as per Supplementary Table S3. Second column shows the “prey”
region used in 3C primer combinations with the “bait”. Third column is the species in which
the assays were performed. The fourth and fifth columns are the “bait” and “prey” primer
sequences respectively.
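Because the region labels encode position relative to the promoter (in kb, with - for upstream and + for downstream), the approximate genomic separation of a bait and prey pair can be read directly from the labels. The following is a minimal Python sketch of that arithmetic and is not part of the original analysis. The function names `region_offset_kb` and `separation_kb` are hypothetical, and the sketch assumes that both labels are given relative to the same promoter (here TAL1) and that a label beginning with "P" denotes the promoter itself (offset 0).

```python
import re


def region_offset_kb(label: str) -> int:
    """Signed distance in kb from the promoter encoded in a region label.

    Assumption: a label whose first token is 'P' is the promoter (offset 0);
    otherwise the label ends in a signed integer, e.g. 'TAL1 +51' or 'TAL1 -31'.
    """
    parts = label.split()
    if parts and parts[0] == "P":
        return 0
    match = re.fullmatch(r".*\s([+-]\d+)", label.strip())
    if match is None:
        raise ValueError(f"unrecognised region label: {label!r}")
    return int(match.group(1))


def separation_kb(bait: str, prey: str) -> int:
    """Approximate bait-to-prey distance in kb (labels are rounded to whole kb)."""
    return abs(region_offset_kb(prey) - region_offset_kb(bait))


# Example bait/prey combinations taken from the table above.
pairs = [("P TAL1", "TAL1 +51"), ("TAL1 +57", "TAL1 -41"), ("TAL1 +40", "TAL1 +64")]
separations = {pair: separation_kb(*pair) for pair in pairs}
```

Since the labels are rounded to the nearest kb, the values are estimates only; exact distances would require the genomic coordinates of the restriction fragments, which are not listed here.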
Primer Name | Sequence 5' → 3'
PTAL1-1b (primer extension) | biot-GGCGGCGTTGGCTGCTTCTAAGTG
PTAL1-1b (nested PCR primer) | GACAGGCTCTGTGTCCGAGT
Blunt-ended adapter (forward) | ACAGGTTCAGAGTTCTACAGTCCGAC
Blunt-ended adapter (reverse) | p-GTCGGACTGTAGAACTCTGAAC
Adapter PCR primer | GGTTCAGAGTTCTACAGTCCGAC
Supplementary Table S5. Oligonucleotide primers used for 4C sample preparation. First
column shows the primer name and its use in constructing the 4C library. Second column
shows the primer sequence. biot = biotinylation; p = 5' phosphate.
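The adapter sequences listed in the table can be sanity-checked computationally. The sketch below (Python, illustrative only and not part of the original protocol) tests whether the reverse adapter strand is the reverse complement of the 3' end of the forward strand, as would be expected if the two strands anneal to give one blunt end, and whether the adapter PCR primer lies within the forward strand. The helper `reverse_complement` and the variable names are assumptions introduced here; the "p-" prefix (5' phosphate) is omitted from the reverse strand sequence.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


forward_adapter = "ACAGGTTCAGAGTTCTACAGTCCGAC"
reverse_adapter = "GTCGGACTGTAGAACTCTGAAC"  # 5' phosphate ("p-") not represented
adapter_pcr_primer = "GGTTCAGAGTTCTACAGTCCGAC"

# The reverse strand should pair with the 3' end of the forward strand.
strands_anneal = reverse_complement(forward_adapter).startswith(reverse_adapter)

# Unpaired bases left at the 5' end of the forward strand after annealing.
overhang_nt = len(forward_adapter) - len(reverse_adapter)

# The adapter PCR primer should match the 3' portion of the forward strand.
primer_in_adapter = forward_adapter.endswith(adapter_pcr_primer)
```

For the sequences listed, both checks hold, and the forward strand is 4 nt longer than the reverse strand, so annealing would leave a 4-nt single-stranded 5' extension on the forward strand at the end opposite the blunt end.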
REFERENCES
1.
Fujiwara T, O'Geen H, Keles S, et al. Discovering hematopoietic mechanisms through
genome-wide analysis of GATA factor chromatin occupancy. Mol Cell. 2009;36(4):667-681.
2.
Dhami P, Bruce AW, Jim JH, et al. Genomic approaches uncover increasing
complexities in the regulatory landscape at the human SCL (TAL1) locus. PLoS One.
2010;5(2):e9059. doi:10.1371/journal.pone.0009059.
3.
Dhami P. The SCL gene and transcriptional control of haematopoiesis. PhD thesis,
University of Cambridge, United Kingdom. 2005.
4.
Sanda T, Lawton LN, Barrasa MI, et al. Core transcriptional regulatory circuit
controlled by the TAL1 complex in human T cell acute lymphoblastic leukemia. Cancer Cell.
2012;22(2):209-221. doi:10.1016/j.ccr.2012.06.007.
5.
Gottgens B, Nastos A, Kinston S, et al. Establishing the transcriptional programme for
blood: the SCL stem cell enhancer is regulated by a multiprotein complex containing Ets and
GATA factors. Embo J. 2002;21(12):3039-3050.
6.
Gottgens B, Broccardo C, Sanchez MJ, et al. The scl +18/19 stem cell enhancer is not
required for hematopoiesis: identification of a 5' bifunctional hematopoietic-endothelial
enhancer bound by Fli-1 and Elf-1. Mol Cell Biol. 2004;24(5):1870-1883.
7.
Ogilvy S, Ferreira R, Piltz SG, Bowen JM, Gottgens B, Green AR. The SCL +40
enhancer targets the midbrain together with primitive and definitive hematopoiesis and is
regulated by SCL and GATA proteins. Mol Cell Biol. 2007;27(20):7206-7219.
doi:10.1128/MCB.00931-07.
8.
Follows GA, Ferreira R, Janes ME, et al. Mapping and functional characterisation of a
CTCF-dependent insulator element at the 3' border of the murine Scl transcriptional domain.
PLoS One. 2012;7(3):e31484. doi:10.1371/journal.pone.0031484.
9.
Kurukuti S, Tiwari VK, Tavoosidana G, et al. CTCF binding at the H19 imprinting
control region mediates maternally inherited higher-order chromatin conformation to restrict
enhancer access to Igf2. Proc Natl Acad Sci U S A. 2006;103(28):10684-10689.
doi:10.1073/pnas.0600326103.
10.
Zhao H, Dean A. An insulator blocks spreading of histone acetylation and interferes
with RNA polymerase II transfer between an enhancer and gene. Nucleic Acids Res.
2004;32(16):4903-4919.
11.
Zhou Y. Transcriptional regulation of the stem cell leukaemia gene (SCL/TAL1) via
chromatin looping. PhD thesis, University of Cambridge, United Kingdom. 2013.
12.
A user's guide to the encyclopedia of DNA elements (ENCODE). PLoS Biol.
2011;9(4):e1001046. doi:10.1371/journal.pbio.1001046.
13.
Essien K, Vigneau S, Apreleva S, Singh LN, Bartolomei MS, Hannenhalli S. CTCF
binding site classes exhibit distinct evolutionary, genomic, epigenomic and transcriptomic
features. Genome Biol. 2009;10(11):R131. doi:10.1186/gb-2009-10-11-r131.
14.
Breit TM, Mol EJ, Wolvers-Tettero IL, Ludwig WD, van Wering ER, van Dongen JJ.
Site-specific deletions involving the tal-1 and sil genes are restricted to cells of the T cell
receptor alpha/beta lineage: T cell receptor delta gene deletion mechanism affects multiple
genes. J Exp Med. 1993;177(4):965-977.
15.
Gottgens B, Ferreira R, Sanchez MJ, et al. cis-Regulatory remodeling of the SCL
locus during vertebrate evolution. Mol Cell Biol. 2010;30(24):5741-5751.
doi:10.1128/MCB.00870-10.
16.
Gottgens B, Barton LM, Gilbert JG, et al. Analysis of vertebrate SCL loci identifies
conserved enhancers. Nat Biotechnol. 2000;18(2):181-186.
17.
Green AR, Lints T, Visvader J, Harvey R, Begley CG. SCL is coexpressed with
GATA-1 in hemopoietic cells but is also expressed in developing brain. Oncogene.
1992;7(4):653-660.
18.
Jaffredo T, Bollerot K, Sugiyama D, Gautier R, Drevon C. Tracing the hemangioblast
during embryogenesis: developmental relationships between endothelial and hematopoietic
cells. Int J Dev Biol. 2005;49(2-3):269-277. doi:10.1387/ijdb.041948tj.
19.
Mead PE, Kelley CM, Hahn PS, Piedad O, Zon LI. SCL specifies hematopoietic
mesoderm in Xenopus embryos. Development. 1998;125(14):2611-2620.
20.
Sinclair AM, Gottgens B, Barton LM, et al. Distinct 5' SCL enhancers direct
transcription to developing brain, spinal cord, and endothelium: neural expression is mediated
by GATA factor binding sites. Dev Biol. 1999;209(1):128-142.
21.
Zhang XY, Rodaway AR. SCL-GFP transgenic zebrafish: in vivo imaging of blood
and endothelial development and identification of the initial site of definitive hematopoiesis.
Dev Biol. 2007;307(2):179-194. doi:10.1016/j.ydbio.2007.04.002.