Genome wide classification and characterisation of CpG
sites in cancer and normal cells.
Mohammadmersad Ghorbani1, Michael Themis2 and Annette Payne1*
1 Department of Computer Science, 2 Department of Biosciences, Brunel University,
Uxbridge, Middlesex, UB8 3PH, UK
* To whom correspondence should be addressed. [email protected]
Key words: motif, pattern identification, methylation in cancer, computational
analysis, pattern searching algorithm, CpG, DNA sequence
Abstract
This study identifies common methylation patterns across different cancer types in
an effort to identify common molecular events in diverse types of cancer cells and
provides evidence for the sequence surrounding a CpG to influence its susceptibility
to aberrant methylation. Using all the freely available microarray data, CpG sites
throughout the genome were divided into four classes: sites that become either hypo-
or hyper-methylated in a variety of cancers (HypoCancer and HyperCancer classes) and
those found in a constant hypo-methylated (Never methylated class) or hyper-methylated
(Always methylated class) state in both normal and cancer cells. Our
data shows that most CpG sites included in the HumanMethylation450K microarray
remain unmethylated in normal and cancerous cells; however, certain sites in all the
cancers investigated become specifically modified. More detailed analysis of the
sites revealed that the majority of those in the Never methylated class were in CpG
islands, whereas those in the HyperCancer class were mostly associated with miRNA
coding regions. The sites in the HyperCancer class are associated with genes
involved in initiating or maintaining the cancerous state, being enriched for
processes involved in apoptosis, and the transcription factors predicted to bind to
these genes are linked to apoptosis and tumourigenesis (notably including E2F). Further,
we show that proportionally more LINE elements are associated with the HypoCancer
class and more Alu repeats are associated with the HyperCancer class. Motifs that
distinguish the classes on the basis of the surrounding DNA sequence alone were
identified, along with DNA sequences that could render sites more prone to aberrant
methylation in cancer cells. This provides evidence that the sequence surrounding a
CpG site influences whether the site becomes hypo- or hyper-methylated.
Author Summary
This study identifies common methylation patterns across different cancer types in
an effort to identify common molecular events in diverse types of cancer cells. In this
paper we describe our meta-analyses, using computational and bioinformatics methods,
of the CpG sites throughout the genome measured in all the available studies of both
normal and cancer cells that used the HumanMethylation450K microarray. We
believe that this work provides evidence that certain CpGs are more likely to be
aberrantly methylated than others. Also we have characterised the properties of the
CpGs that are and are not aberrantly methylated to suggest reasons why they
should be so. Our data shows that most of the CpG sites studied remain
unmethylated in normal and cancerous cells; however, certain sites in all the cancers
investigated become specifically modified. Motifs and features that distinguish the
classes on the basis of the surrounding DNA sequence alone were identified, along
with DNA sequences and features that could render sites more prone to aberrant
methylation in cancer cells. This showed that the sequence surrounding a CpG site
could influence whether a site is aberrantly methylated in the oncogenic state and
whether that aberrant methylation is hypo- or hyper-methylation.
Introduction
DNA methylation, the addition of methyl groups to CpG sequences, is one of
the mechanisms used by cells to control gene expression, gene silencing being a
major biological consequence of DNA methylation. This phenomenon, known as
epigenetic control, has been reported to be important to mammalian development, X
inactivation and genomic imprinting [1]. Epigenetic changes have been shown to
occur both in healthy cells, where they assist in regulating gene expression during
development, and in diseased cells, where they are associated with aberrant gene
expression, most notably in oncogenesis [2]. Many studies have also shown that
differentially methylated CpG sites can act as biomarkers in identifying disease and
that specific CpG site methylation can be a signature for specific types of tumours
[3], [4], [5], [6], [7]. In tumour development global DNA hypomethylation is often followed by
hypermethylation at specific CpGs [8], [9], [10], [11], [12]. Closer inspection of these
studies and the fact that the cancer phenotype is associated with aberrant
expression of a significant number of the same genes e.g. TP53 and RB1, in
different cancer types, would suggest that there are common pathways and
molecular mechanisms that can be identified across the different types of cancer.
Further the differentially methylated CpGs could be informative in discovering
mechanisms leading to malignancy. Factors that have been shown to influence CpG
methylation include chromatin accessibility, DNASE1 footprinting, transcription
factor levels and CTCF binding, where higher levels and the act of binding protect
DNA from methylation [13], [14], [15], [16], [17].
Particular DNA motifs have been identified in previous studies that may be used to
predict the methylation status of DNA sequences in normal cells. Notably
methylation is more prevalent in regions of low CpG density, with regions of
intermediate density being most variably methylated [18]. Yamada and Satou [19]
employed machine learning methods, specifically support vector machine and
random forest methods, using previously reported methylation data, to analyse DNA
sequence features to predict methylation status. They revealed that the frequencies
of sequences containing CG, CT or CA differ between unmethylated and methylated
CpG islands. Ali and Seker [20] used an adapted K-nearest neighbour classifier
method to predict the methylation state on chromosomes 6, 20 and 22 in various
tissues. They identified four feature sub-sets which showed that methylated CpG
islands can be distinguished from unmethylated CpG islands based on DNA sequence.
Lastly, Previti et al. [21] used data mining in the absence of supervised clustering
to predict the methylation status of CpG islands in different tissues. These studies
showed that there are significant differences in the sequences of CpG islands (CGIs)
that predispose them to methylation. Other studies have identified that the density
and spacing of CpGs, the histone code (methylation of histone 3 at lysine 4 (H3K4)),
CTCF protein binding and REST protein binding can influence DNA methylation [22],
[23], [24], [25], [26], [27], [28], [29]. In their review “Computational Epigenetics”,
Bock and Lengauer [18] highlighted the fact that, although much work has been done
to document the epigenetic state of the genome (much of it reported in the ENCODE
project [17]), work in the area of de novo DNA methylation prediction has to date
been limited. One study, however, has shown that aberrant methylation is associated
with mutations: methylation in the MGMT promoter has been demonstrated to be
closely associated with G:C to A:T mutations [30].
Thus, whilst studies have identified motifs associated with normal methylation
patterns, few studies have attempted to search for motifs associated with aberrant
methylation using computational techniques. One study, by Feltus et al. [31], used
Restriction Landmark Genome Scanning software to identify methylation resistant
and methylation prone motifs based on DNA sequence, and another, by Lu et al. [32],
used word composition computation. Ghorbani et al. [33] have suggested that the
sequence surrounding a CpG can be used to predict aberrant methylation in
trinucleotide repeat diseases using a pattern searching algorithm. In another study,
by McCabe et al. [34], patterns were identified using machine learning techniques
and used for pattern matching, where DNA signatures and a co-occurrence with
polycomb binding were found to predict aberrant CpG methylation in cancer cells.
The reason for recruitment of the de novo DNA methyltransferases to specific
genomic targets, however, remains largely unknown. Dnmt3 and certain transcription
factors have been shown to interact with each other to target methylation (Hervouet
et al., 2009 [35]), and recently it has been reported that DNMT3L and the lysine
methyltransferase G9a are required for the initiation of proviral de novo DNA
methylation [36], [37]. Lastly, Rowe et al. [38] have shown that ERV sequences are
sufficient to direct rapid de novo methylation of a flanked promoter in embryonic
stem (ES) cells.
In this study we have used a pattern searching algorithm to identify motifs in the
DNA surrounding aberrantly methylated CpGs in the DNA of cancer cells from
multiple cancer types and tissues so as to investigate whether common patterns of
methylation across these different cancers can be identified. Previous studies have
concentrated on one cancer or tissue type. Further, most former studies that
analysed surrounding DNA sequences were based on the sequences surrounding
CpG islands, or on two classes of islands, methylation prone and methylation
resistant; CpGs not associated with islands were not included [31]. With more data
becoming publicly available about the methylation status of single CpG sites not
associated with islands, it is now possible to investigate increasing numbers of sites
and additional classes of DNA methylation. In this study, we examined the
DNA sequences surrounding CpG sites. We divided the sites into four classes of DNA
methylation: sites that become either hypo- or hyper-methylated in a variety of
cancers (HypoCancer and HyperCancer classes) and those found in a constant
hypo-methylated (Never methylated class) or hyper-methylated (Always methylated
class) state in both normal and cancer cells. The four classes are thus:
1. Never methylated in either cancer or normal cells (class NM)
2. Always methylated in both cancer and normal cells (class AM)
3. Hypomethylated in normal and hypermethylated in cancer (class
HyperCancer)
4. Hypermethylated in normal and hypomethylated in cancer (class
HypoCancer).
Then we investigated the DNA sequence flanking these sites to find out if we
could find common sequences or motifs in each class. We have carried out this
work in an attempt to better understand a possible influence of DNA sequence on
aberrant methylation.
Objectives of this work
1. Identify four classes of CpG sites based on data from diverse cancer types
and normal tissue.
2. Identify methylation sites that could act as biomarkers.
3. Analyse the genes and DNA features associated with differentially methylated
CpG sites to identify any links with carcinogenesis.
4. Identify DNA motifs in the DNA sequence surrounding a CpG that could
render a CpG prone to aberrant methylation in cancer.
5. Using these motifs, suggest prediction criteria that could be used to identify
CpG sites that are differentially methylated in normal and cancer cells in silico.
Results
CpG sites and their classes
Using the method described in the Methods section, 653 CpG sites were identified
that could be divided into the four classes according to their methylation status:
447 CpG sites in the Never methylated class (class NM), 148 sites in the Always
methylated class (class AM), 51 sites hypomethylated in normal and hypermethylated
in cancer (class HyperCancer) and 7 sites hypermethylated in normal and
hypomethylated in cancer (class HypoCancer). We mapped the positional relationship
of the CpG sites to CpG islands in the UCSC browser. 81 CpG sites had no positional
relationship with any CpG island. Never methylated sites are predominantly within
islands. Most of the CpGs in the two classes of variably methylated sites have no
relationship to any CpG island. Always methylated CpGs are spread among the
different positional relationships to UCSC CpG islands. These results are shown in
Figures 1 and 2.
MicroRNA results
The UCSC table browser was used to find out whether methylation of these CpG
sites could interfere with the expression of microRNA coding regions, since miRNAs
are suggested to interact with the epigenetic machinery [39] and are important
regulators of gene expression that are aberrantly regulated in cancer through changes
in methylation [40]. The track “miR Sites High” in table “miRcode Predicted MicroRNA
Target Sites microRNA” was investigated. A total of 241 of the CpG sites were shown
to overlap with microRNA coding sites: 148 NM class sites (33%), 68 AM class sites
(46%), 25 HyperCancer class sites (49%) and no HypoCancer class sites. The results
are depicted in Figure 3.
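The percentages quoted above can be reproduced from the class sizes reported earlier in the Results. The following is a minimal arithmetic check in Python; the variable and function names are illustrative only and are not taken from the authors' software.

```python
# Class sizes and miRNA-overlap counts, both taken from the text.
class_sizes = {"NM": 447, "AM": 148, "HyperCancer": 51, "HypoCancer": 7}
overlaps = {"NM": 148, "AM": 68, "HyperCancer": 25, "HypoCancer": 0}


def overlap_percentages(sizes, hits):
    """Return, per class, the rounded percentage of sites with an overlap."""
    return {name: round(100 * hits[name] / sizes[name]) for name in sizes}


percentages = overlap_percentages(class_sizes, overlaps)
total_overlapping = sum(overlaps.values())

for name, pct in percentages.items():
    print(f"{name}: {overlaps[name]}/{class_sizes[name]} sites = {pct}%")
print("total overlapping sites:", total_overlapping)
```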
Sixty-four of these hits are to microRNAs unique to a class (provided as
supplementary file Table 2). 17 are unique to NM class sites, 4 are unique to
HyperCancer class sites (hypomethylated in normal and hypermethylated in cancer)
and 7 are unique to AM class sites. Table 2 shows the unique microRNA sites and
Figure 4 shows the distribution of microRNA species between the 3 classes in which
they were identified.
Genes neighbouring the CpGs in Class HyperCancer
The genes neighbouring the CpGs found in Class HyperCancer are listed in Table 3
along with their function as identified by Coremine software
(http://www.coremine.com/medical/). This shows that the vast majority have some link
with cancer or tumourigenesis.
On functional clustering analysis, carried out manually and with DAVID and IPA
(http://www.ingenuity.com/products/ipa) software [41, 42], the most enriched gene
cluster was found to be the one with the functional key word “Apoptosis”, indicating
that a large proportion of these genes are involved or predicted to be involved in
apoptosis.
DNA binding protein sites near these genes
The genes listed in Table 3 were analysed for their predicted DNA binding
protein sites, including 5 kbp upstream and downstream of their coding regions, using
oPOSSUM transcription factor binding analysis software
(http://opossum.cisreg.ca/oPOSSUM3/), which looks for and reports DNA protein
binding motifs in gene sequences using their consensus binding sites. The most
enriched predicted binding site according to oPOSSUM [43] was for MZF1_1-4, a zinc
finger transcription factor (TF) which is suspected to be a regulator of
transcriptional events during haemopoietic development and has been implicated in
upregulating apoptosis by interacting with LDOC1 and enhancing the activity of LDOC1
for inducing apoptosis [44]. Thus, if methylation in cancer prevents its binding, this
could affect the cell's ability to enter apoptosis. MZF-1 has also been shown to
suppress tumourigenicity [45].
The second most enriched, KLF4, contributes to the down-regulation of p53/TP53
transcription [46], which is important in tumourigenesis.
These genes are also enriched for the E2F family of transcription factors as
assessed by oPOSSUM software: 19 of the genes (55.88%) are predicted to bind,
compared with 32.77% of all genes in the human genome.
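One simple way to gauge how unusual such a proportion is against the genome-wide background is a one-sided binomial tail probability. The sketch below is an illustration only: it is not the statistic that oPOSSUM itself reports, the size of the gene list is inferred here from the quoted count and percentage rather than stated in the text, and treating genes as independent draws is an assumption.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


observed = 19          # genes with a predicted E2F site (from the text)
fraction = 0.5588      # 55.88% of the gene list (from the text)
background = 0.3277    # 32.77% of all human genes (from the text)

# Size of the gene list implied by the two figures above (an inference).
n_genes = round(observed / fraction)

# Probability of seeing at least `observed` hits by chance under the
# background rate (assumes independent genes; illustrative only).
p_value = binomial_upper_tail(observed, n_genes, background)
print(f"implied gene list size: {n_genes}, upper-tail probability: {p_value:.4f}")
```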
Genes neighbouring the CpGs in Class HypoCancer
The genes neighbouring the CpGs found in Class HypoCancer are listed in Table 4
along with their function as described by Coremine software
(http://www.coremine.com/medical/).
When analysed for transcription factor binding, NOS1AP was the gene which had
the most TF motifs associated with it; these include Sox2, RREB1, Evi1 and NR3C1,
those with the highest z-scores as determined by oPOSSUM, and notably E2F1. None of
the other genes in this list were predicted to bind E2F type transcription factors.
LINE and Alu repeats
Since methylation changes in cancers have been shown to be associated with
repetitive elements, particularly LINE elements and Alu repeats [47, 48], we
analysed 1000 bp of the DNA surrounding the CpGs in classes HyperCancer and
HypoCancer for the presence of LINE and Alu repeats using the UCSC Genome
Browser [49]. Using the “custom annotation tracks” feature and reporting, we
were able to identify and count the number and position of these repeats. The results
show that proportionally more LINE elements are associated with class HypoCancer,
the hypomethylated group of CpGs, and more Alu repeats are associated with class
HyperCancer, the hypermethylated group of CpGs (see Table 5).
Discovered motifs
We used MEME software [50] as described in the methods section to identify motifs
that distinguish the 4 classes of CpG sites. Table 6 shows the top 5 motifs, based on
p-value, which were found near the four classes of CpG site and their length and
sequences, as determined by MEME.
These motifs were then compared to known DNA binding protein motifs. The only
match of significance was for the M3A motif, which resembles the binding motif of
OCT1; OCT1 is methylated in cisplatin resistant cells [51]. Interestingly, STAT3,
which is involved in cell division in cancer cells, is moderated by OCT1 [52].
Classification results
WEKA analysis:
Using 10-fold cross validation, we used 3 algorithms to classify the CpG sites
according to their class, based on their motifs. The percentage of CpG sites
correctly assigned to one of the 4 classes (NM, AM, HyperCancer and HypoCancer)
was 1) 69.5253% with a support vector machine algorithm, 2) 73.9663% with a
logistic algorithm and 3) 71.2098% with a J48 algorithm.
Since the CpGs that distinguish between normal and cancer cells are of particular
interest, we performed a similar classification analysis using only the HyperCancer
(hypomethylated in normal and hypermethylated in cancer) and HypoCancer
(hypermethylated in normal and hypomethylated in cancer) classes.
Again using 10-fold cross validation, the percentage of these distinguishing CpG
sites correctly assigned to one of the 2 classes was 1) 98.2759% with the support
vector machine algorithm, 2) 96.5517% with the logistic algorithm and 3) 94.8276%
with the J48 algorithm.
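The reported accuracies can be translated back into counts of correctly classified sites, given the 653 sites in the four-class analysis and the 51 + 7 = 58 sites in the two-class analysis. The Python sketch below does this arithmetic and, for context, also computes the accuracy that would be obtained by always predicting the largest class; that baseline is our own addition for illustration and is not reported in the text.

```python
def implied_correct(percent, n_sites):
    """Number of correctly classified sites implied by a reported accuracy."""
    return round(percent / 100 * n_sites)


# Accuracies (%) as reported in the text.
four_class = {"SVM": 69.5253, "logistic": 73.9663, "J48": 71.2098}
two_class = {"SVM": 98.2759, "logistic": 96.5517, "J48": 94.8276}

N_FOUR = 447 + 148 + 51 + 7   # all four classes: 653 sites
N_TWO = 51 + 7                # HyperCancer + HypoCancer: 58 sites

four_counts = {k: implied_correct(v, N_FOUR) for k, v in four_class.items()}
two_counts = {k: implied_correct(v, N_TWO) for k, v in two_class.items()}

# Accuracy of always predicting the largest class (illustrative baseline).
baseline_four = 100 * 447 / N_FOUR
baseline_two = 100 * 51 / N_TWO

print(four_counts, two_counts)
print(f"majority-class baselines: {baseline_four:.2f}% and {baseline_two:.2f}%")
```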
Figure 5 illustrates that the m13C (TCCAAGGGACACC) motif does not occur in the
flanking DNA sequences of 50 of the 51 CpGs identified in class HyperCancer but
occurs in all 7 of the sequences surrounding CpGs identified in class
HypoCancer. The J48 algorithm therefore identifies this motif as the most
discriminative for classifying the CpGs into the 2 classes.
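The single-motif rule described above can be sketched as a one-step decision rule. This is a simplification under stated assumptions: MEME motifs are position-specific probability matrices, whereas the sketch uses an exact string match to the quoted sequence; scanning the reverse strand as well is our assumption; and the example flanks are invented for illustration and are not sequences from the study.

```python
M13C = "TCCAAGGGACACC"  # motif sequence quoted in the text


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def contains_motif(flank, motif=M13C):
    """Exact-match test on both strands (a simplification of MAST scoring)."""
    flank = flank.upper()
    return motif in flank or reverse_complement(motif) in flank


def predict_class(flank):
    """One-motif rule: m13C present -> HypoCancer, absent -> HyperCancer."""
    return "HypoCancer" if contains_motif(flank) else "HyperCancer"


# Hypothetical flanking sequences, made up for this example.
examples = ["aacgTCCAAGGGACACCttg", "ACGTACGTACGTACGT"]
for flank in examples:
    print(flank, "->", predict_class(flank))
```

With the counts given in the text (motif absent from 50 of 51 HyperCancer flanks, present in all 7 HypoCancer flanks), such a rule would label 57 of the 58 sites correctly when applied to the same data it was derived from.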
Discussion
In this study we have shown that it is possible to divide the CpG sites in the human
genome into 4 classes based on their methylation status in normal and cancer cells
across many forms of cancer using multiple data sets: sites that are hypomethylated
in normal and hypermethylated in cancer cells (class HyperCancer), sites that are
hypermethylated in normal and hypomethylated in cancer cells (class HypoCancer),
sites that are always hypermethylated in both normal and cancer cells and sites that
are always hypomethylated in both normal and cancer cells. Interestingly, the results
show that by far the largest number of CpG sites are unmethylated in both the
cancerous and normal cell states and that most of the CpG sites that are
differentially methylated in cancer cells become methylated, suggesting that the
transition to the tumourigenic phenotype involves the methylation of particular CpG
sites, which may be the cause of the aberrant gene expression found in cancer cells.
We suggest further from these results that the sites in the former two classes may
be useful biomarkers for cancer cells when undertaking methylation analysis. The data
used in this study were all that were available at the time, and we acknowledge that
as more data become available, most notably in the TCGA database, further work to
validate these results will be required, using new software customised to analyse
the data in these files, which are in a different format to those in GEO. These
results, however, represent a statistical analysis of a large, unbiased sampling of
the publicly available data and therefore suggest that our results will hold true
for the whole population of data.
The sites in the four classes were analysed for their distinguishing characteristics
and properties. Firstly their position in relation to CpG islands was deduced. Sites
that are never methylated are predominantly within CpG islands and sites that are
aberrantly hyper or hypomethylated in cancer cells, are not, perhaps suggesting that
islands afford protection against global methylation changes in cancer cells.
The proximity to microRNA coding regions showed that a greater percentage of the
HyperCancer class CpGs are associated with one or more miRNA coding sequences
than any other class, with the HypoCancer class having none. Further,
the number of times a particular miRNA coding region is associated with a class of
CpG shows that never methylated CpGs had a greater number of microRNA sites
associated with them per site, with some particular microRNAs identified repeatedly
(up to 15 times). Several studies have provided evidence that dysregulated miRNA
expression contributes to the initiation and progression of human cancers [53, 54,
55, 56, 40]. Hypermethylation of microRNAs has been shown to be present in many
cancer types and could be the cause of this dysregulation. Thus the presence of
miRNA coding sequences near CpGs could contribute to the hypermethylation of these
CpGs in cancer cells.
The genes within 1 kb of the HypoCancer or HyperCancer classes of CpG sites that
show a distinction in methylation status were identified and functionally
characterised. This analysis showed that these sites are associated with genes that
are involved in initiating or maintaining the cancerous state of cells, with those
associated with class HyperCancer enriched for their involvement in apoptosis.
Further, the transcription factors predicted to bind the genes associated with class
HyperCancer are enriched for those linked to apoptosis and tumourigenesis (including
E2F) indicating a possible mechanism by which the aberrant methylation may exert
an effect. This strongly suggests that the differential methylation seen in these sites
influences functionally pathogenic processes seen in cancerous cells probably
instigated through aberrant gene expression.
LINE and Alu repeats associated with the differentially methylated classes were
identified and the results showed that proportionally more LINE elements are
associated with the HypoCancer class of CpGs and more Alu repeats are associated
with the HyperCancer class of CpGs. Could LINE elements therefore protect against
de novo methylation and Alu repeats render CpGs more susceptible? Interestingly,
hypomethylation of LINE-1 and Alu has been suggested to be the cause of global
hypomethylation and genomic instability in many malignancies and autoimmune
diseases [57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72]; however,
not all Alu sequences are hypomethylated in human cancers [73]. Alu sequences
located upstream of the CDKN2A promoter were found to be hypermethylated in
cancer cell lines [74], and an Alu sequence located in intron 6 of TP53 showed
extensive methylation in normal and cancer cells [74, 75].
To see whether the DNA sequence surrounding a CpG site has any influence on its
methylation state, MEME software was used to identify distinguishing motifs for each
of the CpG classes, and their similarity to known binding motifs for DNA binding
proteins was determined. Of the motifs identified, only one had similarity of note:
the motif labelled M3A showed similarity to the OCT1 motif, and OCT1 has an
involvement in cell division via STAT3 [76].
We were able to use the motifs that were identified to distinguish the classes
based on DNA sequence alone and thus to identify DNA sequences that could render
CpG sites more prone to aberrant methylation in cancer cells. We were able to
distinguish the 4 classes with an accuracy of 74% and attained an accuracy of 98%
in distinguishing between sites that are hypo- and hyper-methylated in cancer cells.
Thus we have shown that the sequence surrounding the CpG site has an influence on
whether a site is aberrantly methylated in the oncogenic state and whether that
aberrant methylation is hypo- or hyper-methylation. The motif that best distinguished
the HyperCancer class from the HypoCancer class was the m13C motif, which contains
the binding motif for the EBF1 and RME transcription factors, which have been shown
to act as tumour suppressors in multiple tumour types, notably leukaemias and colon
cancer [77, 78]. The binding motif of NR2F1, another transcription factor with
oestrogen response element binding, which is down-regulated in many tumour types,
is also present in m13C [79]. NR4A2, a nuclear orphan receptor involved in neoplasms
and a potential therapy target, also binds to this sequence [80]. This suggests that
this motif is highly susceptible to hypomethylation in cancer cells, as it is seen
predominantly in the HypoCancer class, and that the demethylation of this motif may
be linked to a tumour suppression response in these cells.
Thus in summary, this study has shown that CpG sites in the human genome can be
divided into four classes depending on their methylation status in diverse normal and
cancer cell types. The two classes which show differential methylation in the normal
and cancerous state show associations with genes and DNA features that are
commensurate with the cancerous state. We show that only a distinct subset of CpG
sites may need to be analysed for their methylation status to determine the
cancerous state. In common with other, more limited studies, we have identified that
there are DNA motifs surrounding a CpG that render it susceptible to methylation
in the cancerous state. Further, we show that CpG sites can be classified using the
DNA sequence surrounding them into one of the four classes, showing that the
methylation state of any given CpG can be predicted with a high degree of accuracy.
Methods
Datasets Selection
As stated, most previous studies focused on CpG islands, but with the advance
of microarray technology there are now data available for single CpG sites not
necessarily associated with islands. A widely used platform is
HumanMethylation450K because of its maximal coverage (in terms of the number of
CpGs analysed in the genome per chip) and the data available from many samples and
tissue types. Here, we selected 16 data series to study cancer cells and respective
normal controls using raw data (signal intensities from GEO in tabular format)
available for CpG sites contained in the HumanMethylation450K microarray (Table 1).
We included one set of data that investigated only normal samples to bring the
number of normal data points nearer to the number of cancer data points. The data
series represent all the publicly available peer reviewed data sets obtainable at
the time of undertaking the study.
The platform SOFT files were downloaded from GEO and converted to table format
with a custom filtering program (which merely separates the data from the other
content of the file and reformats them as a table). The result consisted of 16 data
series of 535 tissue samples, of which 301 were cancer samples and 234 were
normal samples. These comprised 259,783,695 data points, each representing the
methylation status of a particular CpG in a particular sample within a data set.
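The sample and data point totals above are internally consistent, as the following short Python check shows. The per-sample figure is derived here by division and is not stated in the text; the variable names are illustrative.

```python
cancer_samples = 301      # from the text
normal_samples = 234      # from the text
data_points = 259_783_695  # from the text

samples = cancer_samples + normal_samples

# If every sample contributes the same number of values, the total should
# divide evenly by the number of samples (an assumption tested here).
values_per_sample, remainder = divmod(data_points, samples)

print(f"{samples} samples, {values_per_sample} values per sample, "
      f"remainder {remainder}")
```

The division is exact, which is what would be expected if every sample carries one value for each probe on the same array.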
We selected these 16 data series so as to examine methylation across as many cancer
types as possible and compare them with a wide variety of normal tissues. The data
are from multiple individuals, which allowed us to find common patterns between
individuals as well as between cancer types. The individual samples selected for our
analysis from each data set were either untreated tumour or untreated normal
samples, i.e. not all the samples from any one data set were included, only those
appropriate to this study. We wished to identify which CpGs are methylated as part
of the pathology common to all types of cancers in a variety of tissues in many
individuals. No data are included from cell lines or treated cells, and all are from
either normal tissue not adjacent to the tumour or cancerous cells from the patient;
thus the differences in methylation that we see are not due to cell culture
conditioning or neighbouring cell contamination. Additional matched control tissues
from other studies were also included so as to make the number of control data sets
the same as the number of cancerous ones, which is important for our numerically
based analyses. The data series were tissue matched, cancer with the same tissue
control, as far as possible, with no one tissue type representing more than 40% of
the samples; thus a 60% common methylation state at any one CpG was chosen as the
threshold for the analyses to mitigate as far as possible any bias in tissue or cell
type (the limiting factor being the public availability, due to ethical
considerations, of data from diverse normal tissue types). All the data sets were
from experiments carried out using the same platform, i.e. HumanMethylation450K, so
that differences are due to the sample and not to the platform used.
CpG sites identification
Samples from the datasets were stored in two files, which were read by a Java
program to identify CpG sites meeting specific defined criteria. Each of the files
was read line by line to produce vectors of beta values. Any vectors which satisfied
the following criteria were selected for further analysis: CpG sites for which all
the samples' beta values were more than 0.8 were defined as hypermethylated CpG
sites, and sites which had beta values of less than 0.2 were defined as
hypomethylated, as described in [81] for variably methylated sites. To identify the
four classes of CpG sites, the following definitions were used:
1. Class HyperCancer were sites which are hypermethylated in 60% of cancer
samples and hypomethylated in 60% of normal samples
2. Class HypoCancer were sites which are hypomethylated in 60% of cancer
samples and hypermethylated in 60% of normal samples
3. Class AM were sites that are always hypermethylated (where 99% of
the samples have beta-values more than 0.8) in both normal and cancer cells
4. Class NM were sites that are never methylated (where 99% of samples have
a beta-value less than 0.1) in both normal and cancer cells.
CpG sites in each class with more than 50% overlap were removed.
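The class definitions above can be sketched as a small function over the normal and cancer beta-value vectors of one CpG site. This is a Python illustration, not the authors' Java program. Several details are assumptions made for the sketch: the percentages are treated as inclusive minimums, the 99% criterion is applied to the normal and cancer groups separately, the AM and NM tests are tried before the two variable classes, and the overlap-removal step is omitted.

```python
HYPER, HYPO, NEVER = 0.8, 0.2, 0.1  # beta-value thresholds from the text


def fraction(values, predicate):
    """Fraction of beta values in `values` that satisfy `predicate`."""
    return sum(1 for v in values if predicate(v)) / len(values)


def is_hyper(beta):
    return beta > HYPER


def is_hypo(beta):
    return beta < HYPO


def is_never(beta):
    return beta < NEVER


def classify_site(normal, cancer):
    """Return the class of one CpG site, or None if it fits no class."""
    if fraction(normal, is_hyper) >= 0.99 and fraction(cancer, is_hyper) >= 0.99:
        return "AM"
    if fraction(normal, is_never) >= 0.99 and fraction(cancer, is_never) >= 0.99:
        return "NM"
    if fraction(cancer, is_hyper) >= 0.60 and fraction(normal, is_hypo) >= 0.60:
        return "HyperCancer"
    if fraction(cancer, is_hypo) >= 0.60 and fraction(normal, is_hyper) >= 0.60:
        return "HypoCancer"
    return None


# Hypothetical beta values for a single site (not data from the study).
print(classify_site([0.05, 0.10, 0.15, 0.05], [0.90, 0.95, 0.85, 0.50]))
```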
Motif Discovery
The MEME (Multiple EM for Motif Elicitation) software suite (http://meme.nbcr.net)
was used for motif discovery. We used default MEME settings with the ZOOPS (zero or
one motif per sequence) parameter to discover motifs for each class of identified
CpG sites. Sixty bp of flanking sequence around each CpG site was used as input
for the MEME analysis for each class, and the five best motifs according to their
E-value as calculated in the MEME probability matrix were selected for further
analysis with a custom designed Java program. The 20 motifs (5 for each CpG class)
were used as input to the MAST tool to align these motifs against the 653 CpG DNA
sequences in the four classes. The MAST program removed 2 motifs which had more
than 60% overlap with others, and so finally 18 motifs were selected by MAST for
further analysis.
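The extraction of the flanking sequence used as MEME input can be sketched as follows. The text does not say how the sixty bp are distributed around the CpG, so this Python example assumes 30 bp on each side with the CpG dinucleotide itself included in the window; the function name and the toy sequence are hypothetical.

```python
def flanking_window(sequence, c_index, flank=30):
    """Return the CpG at `c_index` (0-based position of the C) together
    with `flank` bases on each side.

    Assumption: the sixty bp of flanking sequence is split evenly, 30 bp
    upstream and 30 bp downstream of the CpG dinucleotide.
    """
    if sequence[c_index:c_index + 2].upper() != "CG":
        raise ValueError("no CpG at the given index")
    start = c_index - flank
    end = c_index + 2 + flank
    if start < 0 or end > len(sequence):
        raise ValueError("window runs off the end of the sequence")
    return sequence[start:end]


# Toy sequence: 30 A, the CpG, then 30 T (not a real genomic sequence).
toy = "A" * 30 + "CG" + "T" * 30
window = flanking_window(toy, 30)
print(len(window), window[30:32])
```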
Using motifs for Classification
A Java program was developed to convert the MAST hit results to a feature matrix,
which was loaded into the Weka package (http://www.cs.waikato.ac.nz/~ml/weka/)
to evaluate the potential of these motifs for classifying the four classes of
CpG sites. CpG sites were classified according to their motifs using three different
machine learning methods and 10-fold cross-validation. The rows of the input matrix were
the CpG sites with their corresponding class, and the features were the motifs that
appear in the flanking DNA. Similar methods have been used in previous studies [82,
83]. J48, logistic regression and support vector machines were used as classification
tools for this purpose.
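The conversion from motif hits to a classifier input can be sketched as below. This Python example is illustrative only; the study used a Java program whose details are not given. It assumes a binary encoding (1 if a motif occurs in the flanking sequence of a site, 0 otherwise) and writes the matrix in Weka's ARFF text format. All function names and the relation name are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Row = Tuple[str, List[int], str]  # (site identifier, motif features, class label)


def build_feature_matrix(site_classes: Dict[str, str],
                         hits: Iterable[Tuple[str, str]],
                         motif_ids: Sequence[str]) -> List[Row]:
    """Build one binary feature row per CpG site from (site, motif) hit pairs."""
    present = {site: set() for site in site_classes}
    for site, motif in hits:
        if site in present:          # hits for unknown sites are ignored
            present[site].add(motif)
    rows = []
    for site in sorted(site_classes):
        features = [1 if m in present[site] else 0 for m in motif_ids]
        rows.append((site, features, site_classes[site]))
    return rows


def to_arff(rows: Sequence[Row], motif_ids: Sequence[str],
            class_labels: Sequence[str], relation: str = "cpg_motifs") -> str:
    """Serialise the feature matrix as ARFF text with a nominal class attribute."""
    lines = [f"@relation {relation}", ""]
    for motif in motif_ids:
        lines.append(f"@attribute {motif} {{0,1}}")
    lines.append("@attribute class {" + ",".join(class_labels) + "}")
    lines += ["", "@data"]
    for _, features, label in rows:
        lines.append(",".join(str(v) for v in features) + "," + label)
    return "\n".join(lines) + "\n"
```

The resulting file can be opened in Weka, where the classifiers and 10-fold cross-validation are selected. Site identifiers are left out of the ARFF data so that they are not used as a feature.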
Acknowledgements
We acknowledge the support in kind of Brunel University and staff in the
Departments of Computer Science and Biosciences.
Figure Legends
Figures 1 and 2 Graphs showing the number of CpG sites in each class and their
positional relationship to CpG islands. Figure 1 shows the number as a proportion
of the total in each position relative to the CpG island, subdivided into classes. Figure 2
shows the number as a proportion of the total in each class, subdivided into
positions relative to the CpG island.
Figure 3 Graph showing the percentage of each class of CpG site that is associated
with a microRNA site.
Figure 4 Graph to show the number of times a particular miRNA species coding
sequence occurs in the DNA sequence in the different classes of CpGs identified in
this study.
Figure 5 Weka result for the most discriminative motif using the J48 algorithm to
classify the CpGs into the 2 classes.
Figure 1 [Bar chart. x-axis: position relative to CpG island (island, N_Shelf, N_Shore, S_Shelf, S_Shore); y-axis: 0 to 0.0045; series: Hypomethylated in normal/hypermethylated in cancer, Hypermethylated in normal/hypomethylated in cancer, Never methylated, Always methylated.]
Figure 2 [Bar chart. x-axis: class (Hypomethylated in normal/hypermethylated in cancer, Hypermethylated in normal/hypomethylated in cancer, Never methylated, Always methylated); y-axis: 0 to 1.2; series: island, N_Shelf, N_Shore, S_Shelf, S_Shore, none.]
Figure 3
Figure 4
Figure 5
Tables
Table 1 Data series and the samples contained in them used in this study. All were obtained
from GEO http://www.ncbi.nlm.nih.gov/gds/
Series | Title of study | Tissue and samples used
GSE20945 | Transient low doses of DNA-demethylating agents exert durable antitumor effects on hematological and epithelial tumor cells | Primary leukaemia untreated samples
GSE29290 | Evaluation of the Infinium Methylation 450K technology | Normal breast and breast cancer tumour samples
GSE30338 | IDH1 mutation is sufficient to establish the glioma hypermethylator phenotype | Glioma tumour samples (only the tumour samples were included)
GSE36278 | Hotspot mutations in H3F3A and IDH1 define distinct epigenetic and biological subgroups of glioblastoma | Primary glioblastoma and non-neoplastic brain samples
GSE37965 | DNA methylation profiling in breast cancer discordant identical twins identifies DOK7 as novel epigenetic biomarker | Whole blood (from cancer patients and normal individuals, not the tumours)
GSE38240 | DNA methylation alterations exhibit intraindividual stability and interindividual heterogeneity in prostate cancer metastases | Lymph node, kidney, soft tissue, liver, subdural, bone, adrenal, prostate, spleen, bladder, lung and blood metastasis samples
GSE38266, GSE38268 | Identification and functional validation of HPV-mediated hypermethylation in head and neck squamous cell carcinoma | HPV- HNSCC tumour samples
GSE30870 | Distinct DNA methylomes of newborns and centenarians | Whole blood and cord blood samples used as normal controls
GSE31848 | Recurrent variations in DNA methylation in human pluripotent stem cells and their differentiated derivatives | Somatic tissue samples of various tissue types
GSE32148 | Genome-wide peripheral blood leukocyte DNA methylation microarrays identified a single association with inflammatory bowel diseases | Peripheral blood samples from normal individuals
GSE33233 | Distinct DNA methylomes of newborns and centenarians | Whole blood samples used as controls
GSE34486 | DNA methylation regulates lineage-specifying genes in primary lymphatic and blood endothelial cells | Dermal blood endothelial cells from normal buttock samples
GSE36064 | Age-associated DNA methylation in pediatric populations | White blood cells from healthy individuals
GSE39141 | Genome-wide DNA methylation profiling predicts relapse in childhood B-cell acute lymphoblastic leukaemia | Bone marrow mononuclear cells from healthy persons
GSE42118 | DNA methylation changes are a late event in acute promyelocytic leukemia and coincide with loss of transcription factor binding | Bone marrow samples from healthy donors
Table 2 Distribution of unique microRNA sequences in three classes
number RNAcode
NM
number RNAcode
AM
number RNAcode HyperCancer number RNAcode HypoCancer
miR205/205a
5
1b
2
miR-193/193b/193a1 3p
4
2 miR-141/200a
3
1 miR-153
miR-33a-3p/365/3652 3p
2
miR130ac/301ab/301b/301
b3 3p/454/721/4295/3666
1
2
2
2
2
4 miR-216b/216b-5p
5 miR-23abc/23b-3p
6 miR-96/507/1271
7 miR-490-3p
1
1
1
1
3 miR-183
4 miR-18ab/4735-3p
5 miR-223
6 miR-191
7 miR-150/5127
miR93/93a/105/106a/291
a3p/294/295/302abcde
/372/373/428/519a/5
20be/520acd8 3p/1378/1420ac
9 miR-203
miR-140/140-5p/87610 3p/1244
11 miR-26ab/1297/4465
12 miR-455-5p
13 miR-551a
14 miR-145
15 miR-204/204b/211
16 miR-208ab/208ab-3p
17 miR-499-5p
3
2 miR-451
miR146ac/14
3 6b-5p
miR122/122a
4 /1352
1
1
1
2
2
1
1
1
1
1
1
1
1
Table 3 The genes neighbouring the CpGs found in Class HyperCancer, listed
along with their function as identified by Coremine software
UCSC_RefGene_Name
PPFIA1
EXD3
PTPRCAP
Function
Cell motility, apoptosis. invasion
suppressor gene, cell division and
chromosome partitioning, cell motility,
gene silencing activity
Protein tyrosine phosphatase receptor,
apoptosis
LOC100129637
TMC6
Unknown
DNA repair
BIN2
endocytosis
C17orf101
MAP1D
oxidoreductase activity
aminopeptidase activity,
phosphorylation
cytoskeletal protein, cell migration,
apoptosis
endocytosis, phagocytosis, apoptosis
cell migration
exonuclease activity, cell division, signal
transduction, DNA replication
regulation of leukocyte activation, cell
SORBS2
ELMO1
ERI3
LAG3
Cancer Involvement
Amplified breast and head
and neck cancers (cell trying
to avoid invasion)
None known
Hypermethylated in many
cancers. Implicated in
tumorigenesis
Unknown
Variants seen in Cervical
Cancer
Abrogated in
Myeloproliferative neoplasms
None known
Over expressed in colon
cancer
Downregulated in pancreatic,
thyroid and cervical cancer
Promote cell invasion in
ovary, colon and brain cancer
Increased in breast cancer
Involved in many different
proliferation, apoptosis
PLCB2
phospholipase C activity, calcium ion
binding, signal transduction apoptosis
SPN
regulation of inflammatory response to
antigenic stimulus, induction of
apoptosis by extracellular signals
NAD+ ADP-ribosyltransferase activity,
cell proliferation, apoptosis
PARP10
MYO1G
myosin complex, cell division, DNA
hypermethylation
CD6
PC
Cell Adhesion Molecule (CAM),
apoptosis cell proliferation
intracellular signaling cascade, small
GTPase mediated signal transduction,
cell proliferation, apoptosis
Regulation of actin cytoskeleton, cell
proliferation apoptosis
MAPK signaling pathway, Apoptosis,
RIG-I-like receptor signaling pathway,
Adipocytokine signaling pathway,
regulation of protein amino acid
phosphorylation
carbonate dehydratase activity, zinc ion
binding, cell proliferation, apoptosis
regulation of protein amino acid
phosphorylation, cell migration
DNA binding, lipid transporter activity
chloride channel activity, embryo
development
protein tyrosine phosphatase activity,
cell proliferation, apoptosis
regulation of Ras protein signal
transduction, gene expression
protein amino acid phosphorylation, cell
growth, apoptosis
zinc ion binding, RING type, apoptosis,
DNA methylation
negative regulation of adaptive immune
response, positive regulation of cell
death, apoptosis
transcription repressor activity
MIR365-1
RADIL
Not known
cell adhesion, forkhead and RAS
RAPGEF1
NCKAP1L
TRAF5
C3orf21
CA6
CCDC88C
TNRC18
ANO8
PTPN7
TBC1D16
STK16
RFFL
SPN
cancers assisting in detection
avoidance and resistance to
apoptosis
Highly expressed in Breast
cancer promoting mitosis and
migration of tumour cells
Significantly expressed in
lymphomas
Inhibits transformation of
cells, in KEGG small cell lung
cancer
Involved in survival
leukaemia and breast cancer
cell
Aberrantly expressed in
leukemia
Upregulation in breast, lung,
gastrointestinal and
gynaecological cancers
Down regulated in many
cancers
Expressed in lymphomas and
small cell lung cancer
None known
Expressed in ovarian and
breast cancers
Involved in tumour invasion
None known
Over expressed in many
cancers
Implicated in blood cancers
Involved in melanoma
progression
Over expressed in tumour
cells
Involved in myeloma
Supressed in many tumours
Upregulated in many
tumours (renal, small cell
lung, sarcoma)
Not known
None Known
associated
proteolysis, macromolecule catabolic
process, cell proliferation, cell cycle
lamin filament, cytoskeleton, cell cycle,
methylation, apoptosis
FBXL16
LMNB2
JAK3
Down regulated in many
cancers
Down regulated in prostate,
gastric, skin and leukaemia
cancers
Upregulated in many cancers
positive regulation of leukocyte
activation, apoptosis, signal
transduction, phosphorylation
ATP-activated inward rectifier
Upregulated in
potassium channel activity,
nasopharyngeal carcinoma
vasodilation, apoptosis, gene expression
KCNJ8
Table 4 The genes neighbouring the CpGs found in Class HypoCancer along with
their function as described by Coremine software http://www.coremine.com/medical/.
UCSC RefGene Name | Function | Cancer Involvement
RPTOR | Androgen receptor activity, kinase activity, telomerase activity, kinase activity, cell growth, cell cycle, insulin signalling | Up regulated in multiple cancers
C22orf9 | Not known | None known
NOS1AP | Signal transduction, gene expression, cell migration, cell proliferation | Associated with breast cancer progression
RGS12 | Signal transduction, cell cycle, RNA interference, apoptosis, SNAP receptor activity | Mutated in colorectal tumours
Table 5 The number and proportion of CpGs associated with LINE elements and Alu
repeats.
Class | No. CpGs in class | LINE elements: total in class (% of CpGs having one or more) | Alu repeats: total in class (% of CpGs having one or more)
HyperCancer | 52 | 26 (33%) | 49 (44%)
HypoCancer | 7 | 12 (71%) | 4 (29%)
Table 6 The top 5 motifs, based on p-value, which were found near the four classes of CpG
site and their length and sequences, as determined by MEME.
Class key: HyperCancer = normal_hypomethylated_cancer_hypermethylated; HypoCancer = normal_hypermethyl_cancer_hypomethyl; AM = always methylated; NM = never methylated.
id | class | Width | proportion in the class | motif
m1A | HyperCancer | 11 | 0.652173913 | AAGACAGGAAG
m2A | HyperCancer | 19 | 0.190751445 | GGGGAGGGGGGGGCGGAGG
m3A | HyperCancer | 29 | 1 | ATTATTGAGTATCACTTTGTATATCTTTT
m4A | HyperCancer | 11 | 0.578947368 | CACACCGTCCT
m5A | HyperCancer | 15 | 0.333333333 | AGCAGGAGAAGCAGG
m6AM | AM | 50 | 0.6875 | TCCGCCCGCCTCGGCCTCCCAAAGTGCTGGGATTACAGGCGTGAGCCACC
m7AM | AM | 29 | 0.875 | GCTTTTTAGAGACGGAGTCTCGCTCTGTT
m8AM | AM | 48 | 0.333333333 | TGAGAGGCGCTTGCGGGCCAGCCGGAGTTCCCGGTGGGCATGGGCTTG
m9AM | AM | 41 | 0.405405405 | GGTGACGAGGCGCGACAGGGTGACGAGGCGCGATTGGGTGA
m10AM | AM | 29 | 0.459459459 | TGGGTGAGGAGGCGCGACTCGGTGATGAG
m11C | HypoCancer | 13 | 0.75 | TTTAAATTCATTT
m12C | HypoCancer | 14 | 0.194444444 | CTTCCAGGCTTGGT
m13C | HypoCancer | 13 | 0.666666667 | TCCAAGGGACAGC
m14C | HypoCancer | 8 | 0.272727273 | TGAGGAAT
m16NM | NM | 15 | 0.736842105 | TTTCCTTTTTCTTGT
m17NM | NM | 15 | 0.95 | AGTGCGCATGCGCAG
m19NM | NM | 10 | 0.952380952 | CACTTCCGGT
m20NM | NM | 27 | 0.807692308 | CGCGCGGCATGCCGGGACTTGTAGTTC
References
1. Bird, A.P. and Wolffe,A.P. (1999) Methylation-induced repression— belts,
braces, and chromatin. Cell, 99: 451-454.
2. Baylin S.B. (2005) DNA methylation and gene silencing in cancer Nature
Clinical Practice Oncology
http://lists.bilkent.edu.tr/~science/MBG523/Lectures/Epigenetics%20articles/DNA%20Methy.%20and%20Gene%20silenc.%20in%20cancer.pdf
(accessed 21/02/2014).
3. Heyn, H., Carmona, F.J., Gomez, A., Ferreira, H.J., Bell, J.T., Sayols, S.,
Ward, K., Stefansson, O.A., Moran, S., Sandoval, J., Eyfjord, J.E., Spector,
T.D. And Esteller, M. (2013) DNA methylation profiling in breast cancer
discordant identical twins identifies DOK7 as novel epigenetic
biomarker. Carcinogenesis, 34(1): 102-108.
4. Fukushige S, Horii A. (2013) DNA methylation in cancer: a gene silencing
mechanism and the clinical potential of its biomarkers. Tohoku J Exp
Med., 229(3):173-85.
5. Klose R.J. and Bird A.P. (2006) Genomic DNA methylation: The mark and
its mediators. Trends Biochem. Sci. 31: 89-97.
6. Das PM and Singal R. (2004) DNA methylation and cancer. Journal of
Clinical Oncology, 22: 4632-4642
7. Taberlay PC, Jones PA. (2011) DNA methylation and cancer. Epigenetics and
Disease, Springer.
http://www.springer.com/cda/content/document/cda_downloaddocument/9783764389888-c1.pdf?SGWID=0-0-45-1004851-p174022756
(accessed 17/02/14)
8. Shames, D. S., Girard, L., Gao, B., Sato, M., Lewis, C. M., et al. (2006) A
genome-wide screen for promoter methylation in lung cancer identifies
novel methylation markers for multiple malignancies. PLoS Medicine, 3:
e486
9. Michaelson-Cohen R, Keshet I, Straussman R, Hecht M, Cedar H, Beller U.
(2011) Genome-wide de novo methylation in epithelial ovarian cancer. Int
J Gynecol Cancer. 21(2): 269-79.
10. Gama-Sosa,M.A., Slagel,V.A., Trewyn,R.W., Oxenhandler,R., Kuo,K.C.,
Gehrke,C.W. and Ehrlich,M. (1983) The 5-methylcytosine content of DNA
from human tumors. Nucleic Acids Res., 11: 6883–6894.
11. Feinberg,A.P. and Vogelstein,B. (1983) Hypomethylation distinguishes
genes of some human cancers from their normal counterparts. Nature,
301: 89–92.
12. Feinberg, A.P., Gehrke,C.W., Kuo,K.C. and Ehrlich,M. (1988)
Reduced genomic 5-methylcytosine content in human colonic neoplasia.
Cancer Res., 48: 1159–1161.
13. Cho,D.H., Thienes,C.P., Mahoney,S.E., Analau,E., Filippova,G.N. and
Tapscott,S.J. (2005) Antisense transcription and heterochromatin at the
DM1 CTG repeats are constrained by CTCF. Mol. Cell, 20: 483-489.
14. McKinnell,I.W., Ishibashi,J., Le Grand,F., Punch,V.G., Addicks,G.C.,
Greenblatt,J.F., Dilworth,F.J. and Rudnicki,M.A. (2008) Pax7 activates
myogenic genes by recruitment of a histone methyltransferase complex.
Nat. Cell Biol., 10: 77-84.
15. De Biase,I., Chutake,Y.K., Rindler,P.M. and Bidichandani,S.I. (2009)
Epigenetic silencing in Friedreich ataxia is associated with depletion of
CTCF (CCCTC-binding factor) and antisense transcription. PLoS One, 4:
e7914.
16. Gebhard C, Benner C, Ehrich M, Schwarzfischer L (2010) General
transcription factor binding at CpG islands in normal cells correlates
with resistance to de novo DNA methylation in cancer cells. Cancer Res;
70(4): 1398–407.
17. The ENCODE Project Consortium. (2012) An integrated encyclopedia of DNA
elements in the human genome. Nature doi: 10.1038/nature11247
18. Bock,C. and Lengauer,T. (2008) Computational epigenetics.
Bioinformatics, 24: 1-10.
19. Yamada,Y. and Satou,K. (2008) Prediction of genomic methylation status
on CpG islands using DNA sequence features. WSEAS Transactions on
Biology and Biomedicine, 5: 153-162.
20. Ali,I. and Seker,H. (2010) A comparative study for characterisation and
prediction of tissue-specific DNA methylation of CpG islands in
chromosomes 6, 20 and 22. Conf. Proc. IEEE Eng. Med. Biol. Soc., 1832-1835.
21. Previti,C., Harari,O., Zwir,I. and del Val,C. (2009) Profile analysis and
prediction of tissue-specific CpG island methylation classes. BMC
Bioinformatics, 10: 116.
22. Glass JL, Fazzari MJ, Ferguson-Smith AC, Greally JM. (2009) CG
dinucleotide periodicities recognized by the Dnmt3a-Dnmt3L complex
are distinctive at retroelements and imprinted domains. Mamm Genome.
20(9-10): 633-43.
23. Grewal, S.I.S. and Jia,S. (2007) Heterochromatin revisited. Nat. Rev.
Genet., 8: 35-46.
24. Filippova,G.N., Thienes,C.P., Penn,B.H., Cho,D.H., Hu,Y.J., Moore,J.M.,
Klesert,T.R., Lobanenkov,V.V. and Tapscott,S.J. (2001) CTCF-binding sites
flank CTG/CAG repeats and form a methylation-sensitive insulator at the
DM1 locus. Nat. Genet., 28: 335-343.
25. Lienert F, Wirbelauer C, Som I, Dean A, Mohn F, Schübeler D. (2011)
Identification of genetic elements that autonomously determine DNA
methylation states. Nat Genet. 43(11):1091-7
26. Okitsu CY, Hsieh CL. (2007) DNA methylation dictates histone H3K4
methylation. Mol Cell Biol. 27(7):2746-57
27. Ooi SK, Qiu C, Bernstein E, Li K, Jia D, Yang Z, Erdjument-Bromage H,
Tempst P, Lin SP, Allis CD, Cheng X, Bestor TH.: (2007) DNMT3L connects
unmethylated lysine 4 of histone H3 to de novo methylation of DNA.
Nature. 448(7154):714-7.
28. Stadler MB, Murr R, Burger L, Ivanek R, Lienert F, Schöler A, van Nimwegen
E, Wirbelauer C, Oakeley EJ, Gaidatzis D, Tiwari VK, Schübeler D. (2011)
DNA-binding factors shape the mouse methylome at distal regulatory
regions. Nature. 480(7378):490-5.
29. Weber M, Hellmann I, Stadler MB, Ramos L, Pääbo S, Rebhan M, Schübeler
D. (2007) Distribution, silencing potential and evolutionary impact of
promoter DNA methylation in the human genome. Nat Genet. 39(4):457-66.
30. Yuan,G. (2011) Prediction of epigenetic target sites by using genomic DNA
sequence. In Anonymous Handbook of Research on Computational and
Systems Biology: Interdisciplinary Applications. IGI Global, pp. 187-201.
31. Feltus, F.A., Lee, E.K., Costello, J.F., Plass, C. And Vertino, P.M. (2006) DNA
motifs associated with aberrant CpG island methylation. Genomics,
87(5): 572-579.
32. Lu,L., Lin,K., Qian,Z., Li,H., Cai,Y. and Li,Y. (2010) Predicting DNA
methylation status using word composition. Journal of Biomedical
Science and Engineering, 3: 672-676.
33. Ghorbani M, Taylor SJ, Pook MA, Payne A. (2013) Comparative
(computational) analysis of the DNA methylation status of trinucleotide
repeat expansion diseases. J Nucleic Acids.; 689798.
34. McCabe,M.T., Lee,E.K. and Vertino,P.M. (2009) A multifactorial signature
of DNA sequence and polycomb binding predicts aberrant CpG island
methylation. Cancer Res., 69: 282-291.
35. Hervouet,E., Vallette,F.M. and Cartron,P.F. (2009) Dnmt3/transcription
factor interactions as crucial players in targeted DNA methylation.
Epigenetics, 4: 487-499.
36. Leung DC, Dong KB, Maksakova IA, Goyal P, Appanah R, Lee S, Tachibana
M, Shinkai Y, Lehnertz B, Mager DL, Rossi F, Lorincz MC. (2011) Lysine
methyltransferase G9a is required for de novo DNA methylation and the
establishment, but not the maintenance, of proviral silencing. Proc Natl
Acad Sci U S A. 108(14):5718-23.
37. Ooi SK, Wolf D, Hartung O, Agarwal S, Daley GQ, Goff SP, Bestor TH.
(2010) Dynamic instability of genomic methylation patterns in
pluripotent stem cells. Epigenetics Chromatin. 3(1):17.
38. Rowe HM, Friedli M, Offner S, Verp S, Mesnard D, Marquis J, Aktas T, Trono
D. (2013) De novo DNA methylation of endogenous retroviruses is
shaped by KRAB-ZFPs/KAP1 and ESET. Development. 140(3):519-29.
39. Iorio,M.V., Piovan,C. and Croce,C.M. (2010) Interplay between microRNAs
and the epigenetic machinery: An intricate network. Biochimica Et
Biophysica Acta (BBA) - Gene Regulatory Mechanisms, 1799, 694-701.
40. Vrba L, Muñoz-Rodríguez JL, Stampfer MR, Futscher BW. (2013) miRNA
Gene Promoters Are Frequent Targets of Aberrant DNA Methylation in
Human Breast Cancer. PLoS ONE 8(1): e54398.
41. Huang DW, Sherman BT, Lempicki RA. (2009a) Systematic and integrative
analysis of large gene lists using DAVID Bioinformatics Resources.
Nature Protoc.; 4(1):44-57.
42. Huang DW, Sherman BT, Lempicki RA. (2009b) Bioinformatics enrichment
tools: paths toward the comprehensive functional analysis of large gene
lists. Nucleic Acids Res.; 37(1):1-13.
43. Ho-Sui SJ, Mortimer J, Arenillas DJ, Brumm J, Walsh CJ, Kennedy BP and
Wasserman WW. (2005) oPOSSUM: Identification of over-represented
transcription factor binding sites in co-expressed genes. Nucleic Acids
Res. 33(10):3154-64.
44. Inoue M, Takahashi K, Niide O, Shibata M, Fukuzawa M, Ra C. (2005)
LDOC1, a novel MZF-1-interacting protein, induces apoptosis. FEBS Lett.
579(3):604-8.
45. Hsieh YH, Wu TT, Huang CY, Hsieh YS, Liu JY. (2007) Suppression of
tumorigenicity of human hepatocellular carcinoma cells by antisense
oligonucleotide MZF-1. Chin J Physiol. 50(1):9-15
46. Rowland B D., Bernards R and Peeper D S. (2005) The KLF4 tumour
suppressor is a transcriptional repressor of p53 that acts as a
context-dependent oncogene. Nature Cell Biology 7: 1074 - 1082
47. Weisenberger D J., Campan M, Long T I., Kim M, Woods C. (2005) Analysis
of repetitive element DNA methylation by MethyLight. Nucleic Acids Res.
33(21): 6823–6836
48. Walters RJ, Williamson EJ, English DR, Young JP, Rosty C, Clendenning M,
Walsh MD, Parry S, Ahnen DJ, Baron JA, Win AK, Giles GG, Hopper JL,
Jenkins MA, Buchanan DD. (2013) Association between hypermethylation
of DNA repetitive elements in white blood cell DNA and early-onset
colorectal cancer. Epigenetics. 8(7):748-55.
49. Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler
D. (2002) UCSC Genome Browser: The human genome browser at
UCSC. Genome Res. 6:996-1006.
50. Bailey T. L. (2006) MEME: discovering and analysing DNA and protein
sequence motifs. Nucleic Acids Res. 34
51. Lin R, Li X, Li J, Zhang L, Xu F, Chu Y, Li J. (2013) Long-term cisplatin
exposure promotes methylation of the OCT1 gene in human esophageal
cancer cells. Dig Dis Sci. 58(3):694-8.
52. Wang Z, Zhu S, Shen M, Liu J, Wang M, Li C, Wang Y, Deng A, Mei Q.
(2013) STAT3 is involved in esophageal carcinogenesis through
regulation of Oct-1. Carcinogenesis. 34(3):678-88.
53. Croce C.M. (2009) Causes and consequences of microRNA dysregulation
in cancer Nat. Rev. Genet., 10: 704–714
54. Esquela-Kerscher A, Slack FJ. (2006) Oncomirs - microRNAs with a role in
cancer. Nat. Rev. Cancer, 6: 259–269
55. Esteller M. (2011) Non-coding RNAs in human disease. Nat. Rev.
Genet., 12: 861–874
56. Suzuki H, Maruyama R, Yamamoto E, Kai M. (2012) DNA methylation and
microRNA dysregulation in cancer Molecular Oncology 6: 567–578
57. Chalitchagorn K, Shuangshoti S, Hourpai N, Kongruttanachok N,
Tangkijvanich P, Thong-ngam D, et al. (2004) Distinctive pattern of LINE-1
methylation level in normal tissues and the association with
carcinogenesis. Oncogene.; 23:8841-6.
58. Schulz WA. (2006) L1 retrotransposons in human cancers. J Biomed
Biotechnol.:83672.
59. Estecio MR, Gharibyan V, Shen L, Ibrahim AE, Doshi K, He R, et al. (2007)
LINE-1 hypomethylation in cancer is highly variable and inversely
correlated with microsatellite instability. PLoS One.;2:e399.
60. Hsiung DT, Marsit CJ, Houseman EA, Eddy K, Furniss CS, McClean MD, et
al. (2007) Global DNA methylation level in whole blood as a biomarker in
head and neck squamous cell carcinoma. Cancer Epidemiol Biomarkers
Prev.; 16:108-14.
61. Cho NY, Kim BH, Choi M, Yoo EJ, Moon KC, Cho YM, et al. (2007)
Hypermethylation of CpG island loci and hypomethylation of LINE-1 and
Alu repeats in prostate adenocarcinoma and their relationship to
clinicopathological features. J Pathol. 211:269-77.
62. Matsuzaki K, Deng G, Tanaka H, Kakar S, Miura S, Kim YS. (2005) The
relationship between global methylation level, loss of heterozygosity,
and microsatellite instability in sporadic colorectal cancer. Clin Cancer
Res.; 11:8564-9.
63. Perrin D, Ballestar E, Fraga MF, Frappart L, Esteller M, Guerin JF, Dante R.
(2007) Specific hypermethylation of LINE-1 elements during abnormal
overgrowth and differentiation of human placenta. Oncogene.
26(17):2518-24.
64. Pattamadilok J, Huapai N, Rattanatanyong P, Vasurattana A, Perrin D,
Ballestar E, Fraga MF, Frappart L, Esteller M, Guerin JF, et al.: (2007)
Specific hypermethylation of LINE-1 elements during abnormal
overgrowth and differentiation of human placenta. Oncogene. 26:2518-24.
65. Pattamadilok J, Huapai N, Rattanatanyong P, Vasurattana A, Triratanachat S,
Tresukosol D, et al. (2008) LINE-1 hypomethylation level as a potential
prognostic factor for epithelial ovarian cancer. Int J Gynecol Cancer
18:711–7.
66. Moore LE, Pfeiffer RM, Poscablo C, Real FX, Kogevinas M, Silverman D, et
al. (2008) Genomic DNA hypomethylation as a biomarker for bladder
cancer susceptibility in the Spanish Bladder Cancer Study: a
case-control study. Lancet Oncol. 9:359-66.
67. Smith IM, Mydlarz WK, Mithani SK, Califano JA. (2007) DNA global
hypomethylation in squamous cell head and neck cancer associated
with smoking, alcohol consumption and stage. Int J Cancer. 121:1724-8.
68. Subbalekha K, Pimkhaokham A, Pavasant P, Chindavijak S, Phokaew C,
Shuangshoti S, et al. (2009) Detection of LINE-1s hypomethylation in oral
rinses of oral squamous cell carcinoma patients. Oral Oncol. 45:184-91.
69. Karouzakis E, Gay RE, Michel BA, Gay S, Neidhart M. (2009) DNA
hypomethylation in rheumatoid arthritis synovial fibroblasts. Arthritis
Rheum. 60:3613-22.
70. Choi IS, Estecio MR, Nagano Y, Kim do H, White JA, Yao JC, et al. (2007)
Hypomethylation of LINE-1 and Alu in well-differentiated neuroendocrine
tumors (pancreatic endocrine tumors and carcinoid tumors). Mod Pathol.
20:802-10.
71. Roman-Gomez J, Jimenez-Velasco A, Agirre X, Castillejo JA, Navarro G, San
Jose-Eneriz E, et al. (2008) Repetitive DNA hypomethylation in the
advanced phase of chronic myeloid leukemia. Leuk Res. 32:487-90.
72. Lee HS, Kim BH, Cho NY, Yoo EJ, Choi M, Shin SH, et al. (2009) Prognostic
implications of and relationship between CpG island hypermethylation
and repetitive DNA hypomethylation in hepatocellular carcinoma. Clin
Cancer Res. 15:812-20
73. Fiala E, Ehrlich M and. Laird P W: (2005) Association between
hypermethylation of DNA repetitive elements in white blood cell DNA
and early-onset colorectal cancer. Nucleic Acids Research, 33, 21: 6823–
6836
74. Weisenberger, D.J., Velicescu, M., Cheng, J.C., Gonzales, F.A., Liang, G.,
Jones, P.A. (2004) Role of the DNA methyltransferase variant DNMT3b3
in DNA methylation Mol. Cancer Res. 262–72
75. Magewu, A.N. and Jones, P.A. (1994) Ubiquitous and tenacious
methylation of the CpG site in codon 248 of the p53 gene may explain its
frequent appearance as a mutational hot spot in human cancer Mol. Cell.
Biol. 14: 4225–4232
76. Zhipeng Wang,Shaojun Zhu,Min Shen,Juanjuan Liu, Meng Wang, Chen Li,
Yukun Wang, Anmei Deng and Qibing Mei (2013) STAT3 is involved in
esophageal carcinogenesis through regulation of Oct-1 Carcinogenesis
34 (3): 678-688.
77. Liao D (2009) Emerging roles of the EBF family of transcription factors
in tumor suppression. Mol Cancer Res. 7(12):1893-901
78. Chen F, Song J, Di J, Zhang Q, Tian H, Zheng J. (2012) IRF1 suppresses
Ki-67 promoter activity through interfering with Sp1 activation. Tumour
Biol. 33(6):2217-25
79. Thompson VC, Day TK, Bianco-Miotto T, Selth LA, Han G, Thomas M,
Buchanan G, Scher HI, Nelson CC; Australian Prostate Cancer BioResource,
Greenberg NM, Butler LM, Tilley WD. (2012) A gene signature identified
using a mouse model of androgen receptor-dependent prostate cancer
predicts biochemical relapse in human disease. Int J Cancer. 131(3):662-72
80. Deutsch AJ., Angerer H, Fuchs TE, Neumeister P. (2012) The Nuclear
Orphan Receptors NR4A as Therapeutic Target in Cancer Therapy.
Anticancer Agents Med Chem. 12(9):1001-14
81. Du P, Zhang X, Huang C-C, Jafari N, Kibbe W A, Hou L, and Lin S M (2010)
Comparison of Beta-value and M-value methods for quantifying
methylation levels by microarray analysis. BMC Bioinformatics. 11: 587.
82. Bock, C., Walter, J., Paulsen, M. and Lengauer, T. (2007) CpG Island
Mapping by Epigenome Prediction, PLoS Comput Biol vol. 3, no. 6, pp.
E110
83. Wrzodek, C., Büchel, F., Hinselmann, G., Eichner, J., Mittag, F. and Zell, A.
(2012) Linking the Epigenome to the Genome: Correlation of Different
Features to DNA Methylation of CpG Islands, PLoS ONE , vol. 7, no. 4,
pp. e35327