Genome wide classification and characterisation of CpG sites in cancer and normal cells

Mohammadmersad Ghorbani1, Michael Themis2 and Annette Payne1*

1 Department of Computer Science, 2 Department of Biosciences, Brunel University, Uxbridge, Middlesex, UB8 3PH, UK

* To whom correspondence should be addressed.

Key words: motif, pattern identification, methylation in cancer, computational analysis, pattern searching algorithm, CpG, DNA sequence

Abstract

This study identifies common methylation patterns across different cancer types in an effort to identify common molecular events in diverse types of cancer cells, and provides evidence that the sequence surrounding a CpG influences its susceptibility to aberrant methylation. Using all the freely available microarray data, CpG sites throughout the genome were divided into four classes: sites that become either hypo- or hypermethylated in a variety of cancers (HypoCancer and HyperCancer classes) and those found in a constant hypomethylated (Never methylated class) or hypermethylated (Always methylated class) state in both normal and cancer cells. Our data show that most CpG sites included in the HumanMethylation450K microarray remain unmethylated in normal and cancerous cells; however, certain sites in all the cancers investigated become specifically modified. More detailed analysis of the sites revealed that the majority of those in the Never methylated class were in CpG islands, whereas those in the HyperCancer class were mostly associated with miRNA coding regions. The sites in the HyperCancer class are associated with genes involved in initiating or maintaining the cancerous state, being enriched for processes involved in apoptosis, and with transcription factors predicted to bind to these genes that are linked to apoptosis and tumourigenesis (notably including E2F). Further, we show that more LINE elements are associated with the HypoCancer class and more Alu repeats are associated with the HyperCancer class.
Motifs that distinguish the classes on the basis of the surrounding DNA sequence alone were identified, together with DNA sequences that could render sites more prone to aberrant methylation in cancer cells. This provides evidence that the sequence surrounding a CpG site influences whether the site is hypo- or hypermethylated.

Author Summary

This study identifies common methylation patterns across different cancer types in an effort to identify common molecular events in diverse types of cancer cells. In this paper we describe our meta-analyses, using computational and bioinformatics methods, of all the CpG sites throughout the genome from all the available studies of both normal and cancer cells that used the HumanMethylation450K microarray. We believe that this work provides evidence that certain CpGs are more likely to be aberrantly methylated than others. We have also characterised the properties of the CpGs that are and are not aberrantly methylated in order to suggest reasons why this should be so. Our data show that most of the CpG sites studied remain unmethylated in normal and cancerous cells; however, certain sites in all the cancers investigated become specifically modified. Motifs and features that distinguish the classes on the basis of the surrounding DNA sequence alone were identified, together with DNA sequences and features that could render sites more prone to aberrant methylation in cancer cells. This showed that the sequence surrounding a CpG site could influence whether the site is aberrantly methylated in the oncogenic state and whether that aberrant methylation is hypo- or hypermethylation.

Introduction

DNA methylation, the addition of methyl groups to CpG sequences, is one of the mechanisms used by cells to control gene expression, gene silencing being a major biological consequence of DNA methylation.
This phenomenon, known as epigenetic control, has been reported to be important to mammalian development, X inactivation and genomic imprinting [1]. Epigenetic changes have been shown to occur both in healthy cells, where they assist in regulating gene expression during development, and in diseased cells, where they are associated with aberrant gene expression, most notably in oncogenesis [2]. Many studies have also shown that differentially methylated CpG sites can act as biomarkers for identifying disease and that specific CpG site methylation can be a signature for specific types of tumours [3], [4], [5], [6], [7]. In tumour development global DNA hypomethylation is often followed by hypermethylation at specific CpGs [8], [9], [10], [11], [12]. Closer inspection of these studies, together with the fact that the cancer phenotype is associated with aberrant expression of a significant number of the same genes (e.g. TP53 and RB1) in different cancer types, suggests that there are common pathways and molecular mechanisms that can be identified across the different types of cancer. Further, the differentially methylated CpGs could be informative in discovering mechanisms leading to malignancy. Factors that influence CpG methylation include chromatin accessibility, which has been shown to modulate methylation, DNASE1 footprinting, transcription factor levels and CTCF binding, where higher levels and the act of binding protect DNA from methylation [13], [14], [15], [16], [17]. Particular DNA motifs have been identified in previous studies that may be used to predict the methylation status of DNA sequences in normal cells. Notably, methylation is more prevalent in regions of low CpG density, with regions of intermediate density being most variably methylated [18]. Yamada and Satou [19] employed machine learning methods, specifically support vector machine and random forest methods, using previously reported methylation data, to analyse DNA sequence features to predict methylation status.
They revealed that the frequencies of sequences containing CG, CT or CA differ between unmethylated and methylated CpG islands. Ali and Seker [20] used an adapted K-nearest neighbour classifier to predict the methylation state on chromosomes 6, 20 and 22 in various tissues. They identified four feature sub-sets which showed that methylated CpG islands can be distinguished from unmethylated CpG islands on the basis of DNA sequence. Lastly, Previti et al. [21] used data mining in the absence of supervised clustering to predict the methylation status of CpG islands in different tissues. These studies showed that there are significant differences in the sequences of CpG islands (CGIs) that predispose them to methylation. Other studies have identified that the density and spacing of CpGs, the histone code (methylation of histone 3 at lysine 4 (H3K4)), CTCF protein binding and REST protein binding can influence DNA methylation [22], [23], [24], [25], [26], [27], [28], [29]. In their review “Computational Epigenetics”, Bock and Lengauer [18] highlighted the fact that, although much work has clearly been done to document the epigenetic state of the genome (much of it reported in the ENCODE project [17]), work to date in the area of de novo DNA methylation prediction is limited. One study, however, has shown that aberrant methylation is associated with mutations: methylation in the MGMT promoter has been demonstrated to be closely associated with G:C to A:T mutations [30].
Thus, whilst studies have identified motifs associated with normal methylation patterns, few studies have attempted to search for motifs associated with aberrant methylation using computational techniques. One study, by Feltus et al. [31], used Restriction Landmark Genome Scanning software to identify methylation resistant and methylation prone motifs based on DNA sequence, and another, by Lu et al. [32], was carried out using word composition computation. Ghorbani et al. [33] have suggested, using a pattern searching algorithm, that the sequence surrounding a CpG can be used to predict aberrant methylation in trinucleotide repeat diseases. In another study, by McCabe et al. [34], patterns were identified using machine learning techniques and used for pattern matching, where DNA signatures and a co-occurrence with polycomb binding were found to predict aberrant CpG methylation in cancer cells. The reason for recruitment of the de novo DNA methyltransferases to specific genomic targets, however, remains largely unknown. Dnmt3 and certain transcription factors have been shown to interact with each other to target methylation (Hervouet et al., 2009 [35]), and recently it has been reported that DNMT3L and the lysine methyltransferase G9a are required for the initiation of proviral de novo DNA methylation [36], [37]. Lastly, Rowe et al. [38] have shown that ERV sequences are sufficient to direct rapid de novo methylation of a flanked promoter in embryonic stem (ES) cells.

In this study we have used a pattern searching algorithm to identify motifs in the DNA surrounding aberrantly methylated CpGs in cancer cells from multiple cancer types and tissues, so as to investigate whether common patterns of methylation across these different cancers can be identified. Previous studies have concentrated on one cancer or tissue type.
Further, most former studies that analysed surrounding DNA sequences were based on the sequences surrounding CpG islands, or on two classes of islands, methylation prone and methylation resistant. CpGs not associated with islands were not included [31]. With more data becoming publicly available about the methylation status of single CpG sites not associated with islands, it is now possible to investigate increasing numbers of sites and additional classes of DNA methylation. In this study, we examined the DNA sequences surrounding CpG sites. We divided sites into four classes of DNA methylation: sites that become either hypo- or hypermethylated in a variety of cancers (HypoCancer and HyperCancer classes) and those found in a constant hypomethylated (Never methylated class) or hypermethylated (Always methylated class) state in both normal and cancer cells. The four classes are:

1. Never methylated in either cancer or normal cells (class NM)
2. Always methylated in cancer and normal cells (class AM)
3. Hypomethylated in normal and hypermethylated in cancer (class HyperCancer)
4. Hypermethylated in normal and hypomethylated in cancer (class HypoCancer)

We then investigated the DNA sequence flanking these sites to find out whether common sequences or motifs could be found in each class. We have carried out this work in an attempt to better understand a possible influence of DNA sequence on aberrant methylation.

Objectives of this work

1. Identify four classes of CpG sites based on data from diverse cancer types and normal tissue.
2. Identify methylation sites that could act as biomarkers.
3. Analyse the genes and DNA features associated with differentially methylated CpG sites to identify any links with carcinogenesis.
4. Identify DNA motifs in the DNA sequence surrounding a CpG that could render a CpG prone to aberrant methylation in cancer.
5. Using these motifs, suggest prediction criteria that could be used to identify CpG sites that are differentially methylated in normal and cancer cells in silico.

Results

CpG sites and their classes

Using the method described, 653 CpG sites were identified that could be divided into the four classes according to their methylation status: 447 CpG sites in the Never methylated class (class NM), 148 sites in the Always methylated class (class AM), 51 sites hypomethylated in normal and hypermethylated in cancer (class HyperCancer) and 7 sites hypermethylated in normal and hypomethylated in cancer (class HypoCancer). We mapped the positional relationship of the CpG sites to CpG islands in the UCSC browser. 81 CpG sites were not in any positional relationship with a CpG island. Never methylated sites are predominantly within islands. Most of the CpGs in the two classes of variably methylated sites have no relationship to any CpG island. Always methylated CpGs are spread among the different positional relationships to UCSC CpG islands. These results are shown in Figures 1 and 2.

MicroRNA results

The UCSC table browser was used to find out whether methylation of these CpG sites could interfere with the expression of microRNA coding regions, since miRNAs are suggested to interact with epigenetic machinery [39] and are important regulators of gene expression that are aberrantly regulated in cancer through changes in methylation [40]. The track “miR Sites High” in table “miRcode Predicted MicroRNA Target Sites microRNA” was investigated. A total of 241 of the CpG sites were shown to overlap with microRNA coding sites. The results are depicted in Figure 3. 148 NM class sites overlap (33%), 68 AM class sites (46%), 25 Class HyperCancer (49%) and 0 Class HypoCancer.
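The overlap percentages quoted above follow directly from the class sizes reported under "CpG sites and their classes". A minimal sketch of that arithmetic in Python is given here as a consistency check; the variable and function names are illustrative and are not taken from the original analysis.

```python
# Class sizes and counts of miRcode-overlapping sites, as reported in the text.
class_sizes = {"NM": 447, "AM": 148, "HyperCancer": 51, "HypoCancer": 7}
mirna_overlaps = {"NM": 148, "AM": 68, "HyperCancer": 25, "HypoCancer": 0}


def overlap_percentages(sizes, overlaps):
    """Return, for each class, the percentage of sites overlapping a miRNA site."""
    return {name: 100.0 * overlaps[name] / sizes[name] for name in sizes}


percentages = overlap_percentages(class_sizes, mirna_overlaps)
total_sites = sum(class_sizes.values())
total_overlaps = sum(mirna_overlaps.values())

for name, pct in percentages.items():
    print(f"{name}: {mirna_overlaps[name]}/{class_sizes[name]} = {pct:.0f}%")
print(f"total: {total_overlaps} of {total_sites} sites overlap")
```

The per-class counts sum to the 241 overlapping sites and the 653 classified sites stated in the text.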
Sixty four of these hits are to microRNAs unique to a class (provided as supplementary file Table 2). Seventeen microRNAs are unique to NM class sites, 4 are unique to HyperCancer class sites and 7 are unique to AM class sites. Table 2 shows the unique microRNA sites and Figure 4 shows the distribution of microRNA species between the 3 classes they were identified in.

Genes neighbouring the CpGs in Class HyperCancer

The genes neighbouring the CpGs found in Class HyperCancer are listed in Table 3 along with their function as identified by Coremine software (http://www.coremine.com/medical/). This shows that the vast majority have some link with cancer or tumourigenesis. In functional clustering analysis carried out manually and with DAVID and IPA (http://www.ingenuity.com/products/ipa) software [41, 42], the most enriched gene cluster was the one with the functional key word “Apoptosis”, indicating that a large proportion of these genes are involved, or predicted to be involved, in apoptosis.

DNA binding protein sites near these genes

The genes listed in Table 3 were analysed for their predicted DNA binding protein sites, including 5 kbp upstream and downstream of their coding regions, using oPOSSUM transcription factor binding analysis software (http://opossum.cisreg.ca/oPOSSUM3/), which looks for and reports DNA protein binding motifs in gene sequences using their consensus binding sites. The most enriched predicted binding site according to oPOSSUM [43] was for MZF1_1-4, a zinc finger transcription factor (TF) which is suspected to be one regulator of transcriptional events during hemopoietic development and has been implicated in upregulating apoptosis by interacting with LDOC1 and enhancing the activity of LDOC1 in inducing apoptosis [44]. Thus, if methylation in cancer prevents its binding, this could affect the cell's ability to enter apoptosis. MZF-1 has also been shown to suppress tumourigenicity [45].
The second most enriched, KLF4, contributes to the down-regulation of p53/TP53 transcription [46], which is important in tumourigenesis. These genes are also enriched for the E2F family of transcription factors as assessed by oPOSSUM software: 19 of the genes (55.88%) are predicted to bind, compared with 32.77% of all genes in the human genome.

Genes neighbouring the CpGs in Class HypoCancer

The genes neighbouring the CpGs found in Class HypoCancer are listed in Table 4 along with their function as described by Coremine software (http://www.coremine.com/medical/). When analysed for transcription factor binding, NOS1AP was the gene with the most TF motifs associated with it; these include Sox2, RREB1, Evi1 and NR3C1, with the highest z-scores as determined by oPOSSUM, and notably E2F1. None of the other genes in this list were predicted to bind E2F type transcription factors.

LINE and Alu repeats

Since methylation changes in cancers have been shown to be associated with repetitive elements, particularly LINE elements and Alu repeats [47, 48], we analysed 1000 bp of the DNA surrounding the CpGs in classes HyperCancer and HypoCancer for the presence of LINE and Alu repeats using the UCSC Genome Browser [49]. Using the “custom annotation tracks” feature and its reporting we were able to identify and count the number and position of these repeats. The results show that proportionally more LINE elements are associated with class HypoCancer, the hypomethylated group of CpGs, and more Alu repeats are associated with class HyperCancer, the hypermethylated group of CpGs (see Table 5).

Discovered motifs

We used MEME software [50], as described in the Methods section, to identify motifs that distinguish the 4 classes of CpG sites. Table 6 shows the top 5 motifs, based on p-value, which were found near the four classes of CpG site, together with their length and sequences, as determined by MEME.
These motifs were then compared to known DNA binding protein motifs. The only one of significance was the M3A motif, which binds OCT1, which is methylated in cisplatin-resistant cells [51]. Interestingly, STAT3, which is involved in cell division in cancer cells, is moderated by OCT1 [52].

Classification results

WEKA analysis: Using 10-fold cross validation we used 3 algorithms to classify the CpG sites according to their class, based on their motifs. The percentages of CpG sites correctly assigned to one of the 4 classes (NM, AM, HypoCancer and HyperCancer) were:

1) support vector machine algorithm: 69.5253%
2) logistic algorithm: 73.9663%
3) J48 algorithm: 71.2098%

Since the CpGs that distinguish between normal and cancer cells are of particular interest, we performed a similar classification analysis using only the HypoCancer and HyperCancer classes. Using 10-fold cross validation we used the same 3 algorithms to classify the distinguishing CpG sites according to their class, based on their motifs. The percentages of CpG sites correctly assigned to one of these 2 classes were:

1) support vector machine algorithm: 98.2759%
2) logistic algorithm: 96.5517%
3) J48 algorithm: 94.8276%

Figure 5 illustrates that the m13C (TCCAAGGGACACC) motif does not occur in the flanking DNA sequences of 50 out of 51 of the CpGs identified in class HyperCancer and occurs in all 7 of the sequences surrounding CpGs identified in class HypoCancer. This motif is therefore the most discriminative motif when the J48 algorithm is used to classify the CpGs into the 2 classes.
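The reported accuracies correspond to whole numbers of correctly classified sites out of 653 (four classes) and 51 + 7 = 58 (two classes). The sketch below, in Python, recovers those counts and also evaluates a hypothetical single-feature rule suggested by Figure 5 (call a site HypoCancer if m13C is present in its flanking sequence, otherwise HyperCancer). The rule and all names are illustrative assumptions; its accuracy is computed on the full set of 58 sites without cross validation, so it is not a reproduction of the J48 result.

```python
def correct_count(percent, n_sites):
    """Convert a reported accuracy (in percent) to the nearest whole number of sites."""
    return round(percent / 100.0 * n_sites)


# Reported 10-fold cross-validation accuracies (percent), taken from the text.
four_class = {"SVM": 69.5253, "logistic": 73.9663, "J48": 71.2098}  # 653 sites
two_class = {"SVM": 98.2759, "logistic": 96.5517, "J48": 94.8276}   # 58 sites

four_class_counts = {name: correct_count(p, 653) for name, p in four_class.items()}
two_class_counts = {name: correct_count(p, 58) for name, p in two_class.items()}

# Hypothetical rule from Figure 5: m13C present -> HypoCancer, absent -> HyperCancer.
hyper_without_motif = 50  # of 51 HyperCancer sites, correctly called HyperCancer
hypo_with_motif = 7       # of 7 HypoCancer sites, correctly called HypoCancer
rule_accuracy = 100.0 * (hyper_without_motif + hypo_with_motif) / (51 + 7)

print(four_class_counts)
print(two_class_counts)
print(f"single-motif rule: {rule_accuracy:.4f}%")
```

Under these assumptions the single-motif rule misassigns only the one HyperCancer site that contains m13C, which illustrates why the two-class problem is much easier than the four-class one.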
Discussion

In this study we have shown that it is possible to divide the CpG sites in the human genome into 4 classes based on their methylation status in normal and cancer cells across many forms of cancer, using multiple data sets: sites that are hypomethylated in normal and hypermethylated in cancer cells (class HyperCancer), sites that are hypermethylated in normal and hypomethylated in cancer cells (class HypoCancer), sites that are always hypermethylated in both normal and cancer cells, and sites that are always hypomethylated in both normal and cancer cells. Interestingly, the results show that by far the largest number of CpG sites are unmethylated in both the cancerous and normal cell states, and that most of the CpG sites that are differentially methylated in cancer cells are hypermethylated (51 HyperCancer sites compared with 7 HypoCancer sites). This suggests that the transition to the tumourigenic phenotype involves the methylation of particular CpG sites, which may be the cause of the aberrant gene expression found in cancer cells. We further suggest from these results that the sites in the HyperCancer and HypoCancer classes may be useful biomarkers for cancer cells when undertaking methylation analysis. The data used in this study were all that was available at the time, and we acknowledge that as more data become available, most notably in the TCGA database, further work to validate these results will be required, using new software customised to analyse the data in these files, which are in a different format to those in GEO. These results nevertheless represent a statistical analysis of a large, unbiased sampling of the publicly available data, and we therefore suggest that they will hold true for the whole population of data.

The sites in the four classes were analysed for their distinguishing characteristics and properties. Firstly, their position in relation to CpG islands was deduced.
Sites that are never methylated are predominantly within CpG islands, whereas sites that are aberrantly hyper- or hypomethylated in cancer cells are not, perhaps suggesting that islands afford protection against global methylation changes in cancer cells. The analysis of proximity to microRNA coding regions showed that a greater percentage of the HyperCancer class of CpGs are associated with one or more miRNA coding sequences than any other class, with the HypoCancer class having none. Further, the number of times a particular miRNA coding region is associated with a class of CpG shows that never methylated CpGs had a greater number of microRNA sites associated with them per site, with some particular microRNAs identified repeatedly (up to 15 times). Several studies have provided evidence that dysregulated miRNA expression contributes to the initiation and progression of human cancers [53, 54, 55, 56, 40]. Hypermethylation of microRNAs has been shown to be present in many cancer types and could be the cause of this dysregulation. It is therefore possible that the presence of miRNAs near CpGs contributes to the hypermethylation of these CpGs in cancer cells.

The genes within 1 kb of the HypoCancer or HyperCancer classes of CpG sites, which show a distinction in methylation status, were identified and functionally characterised. This analysis showed that these sites are associated with genes that are involved in initiating or maintaining the cancerous state of cells, with those associated with class HyperCancer enriched for involvement in apoptosis. Further, the transcription factors predicted to bind the genes associated with class HyperCancer are enriched for those linked to apoptosis and tumourigenesis (including E2F), indicating a possible mechanism by which the aberrant methylation may exert an effect.
This strongly suggests that the differential methylation seen at these sites influences functionally pathogenic processes seen in cancerous cells, probably instigated through aberrant gene expression.

LINE and Alu repeats associated with the differentially methylated classes were identified, and the results showed that proportionally more LINE elements are associated with the HypoCancer class of CpGs and more Alu repeats are associated with the HyperCancer class of CpGs. Could LINE elements therefore protect against de novo methylation and Alu repeats render CpGs more susceptible? Interestingly, hypomethylation of LINE-1 and Alu has been suggested to be the cause of global hypomethylation and genomic instability in many malignancies and autoimmune diseases [57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72]; however, not all Alu sequences are hypomethylated in human cancers [73]. Alu sequences located upstream of the CDKN2A promoter were found to be hypermethylated in cancer cell lines [74], and an Alu sequence located in intron 6 of TP53 showed extensive methylation in normal and cancer cells [74], [75].

In order to see whether the DNA sequence surrounding a CpG site has any influence on its methylation state, MEME software was used to identify distinguishing motifs for each of the CpG classes, and their similarity to known binding motifs for DNA binding proteins was determined. Of the motifs identified only one had similarity of note: the motif labelled M3A showed similarity to the OCT1 motif, which has an involvement in cell division via STAT3 [76]. We were able to use the motifs that were identified to distinguish the classes on the basis of DNA sequence alone and thus to identify DNA sequences that could render CpG sites more prone to aberrant methylation in cancer cells.
We were able to distinguish the 4 classes with an accuracy of 74% and attained an accuracy of 98% in distinguishing between sites that are hypo- and hypermethylated in cancer cells. Thus we have shown that the sequence surrounding the CpG site has an influence on whether a site is aberrantly methylated in the oncogenic state and whether that aberrant methylation is hypo- or hypermethylation. The motif that best distinguished the HyperCancer class from the HypoCancer class was the m13C motif, which contains the binding motif for the EBF1 and RME transcription factors, which have been shown to act as tumour suppressors in multiple tumour types, notably leukaemias and colon cancer [77, 78]. The binding motif of NR2F1, another transcription factor, with oestrogen response element binding, which is down regulated in many tumour types, is also present in m13C [79]. In addition NR4A2, a nuclear orphan receptor involved in neoplasms and a potential therapy target, binds to this sequence [80]. This suggests that this motif is highly susceptible to hypomethylation in cancer cells, as it is seen predominantly in the HypoCancer class, and the demethylation of this motif may be linked to a tumour suppression response in these cells.

In summary, this study has shown that CpG sites in the human genome can be divided into four classes depending on their methylation status in diverse normal and cancer cell types. The two classes which show differential methylation between the normal and cancerous states show associations with genes and DNA features that are commensurate with the cancerous state. We show that only a distinct subset of CpG sites may need to be analysed for their methylation status to determine the cancerous state. In common with other, more limited studies, we have identified DNA motifs surrounding a CpG that render it susceptible to methylation in the cancerous state.
Further, we show that CpG sites can be classified into one of the four classes using the DNA sequence surrounding them, showing that the methylation state of any given CpG can be predicted with a high degree of accuracy.

Methods

Datasets Selection

As stated, most previous studies focused on CpG islands, but with the advance of microarray technology there are now data available for single CpG sites not necessarily associated with islands. A widely used platform is HumanMethylation450K, because of its maximal coverage (in terms of the number of CpGs analysed in the genome per chip) and the data from samples and tissue types available. Here, we selected 16 data series to study cancer cells and respective normal controls, using raw data (signal intensities from GEO in tabular format) available for CpG sites contained in the HumanMethylation450K microarray (Table 1). We included one set of data that investigated only normal samples so that the numbers of data points for normal and cancerous cells were more nearly equal. The data series represent all the publicly available peer reviewed data sets obtainable at the time of undertaking the study. The platform SOFT files were downloaded from GEO and converted to table format with a custom filtering program (which merely separates the data in the file from its other content and reformats them as a table). The resulting data consisted of 16 data series comprising 535 tissue samples, of which 301 were cancer samples and 234 were normal samples. These comprised 259,783,695 data points, each representing the methylation status of a particular CpG in a particular sample within a data set. We selected these 16 data series so as to examine methylation across all possible cancer types and compare them with a wide variety of normal tissues. The data are from multiple individuals, which allowed us to find patterns common to different individuals as well as to different cancer types.
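As a quick consistency check on the figures above, the number of data points can be related to the number of samples. The sketch below, in Python, assumes that every sample contributes one value for each of the 485,577 probes on the HumanMethylation450 array; that probe count is an assumption introduced here and is not stated in the text.

```python
# Assumption (not stated in the text): one value per sample for each of the
# 485,577 probes on the HumanMethylation450 array.
PROBES_PER_ARRAY = 485_577

cancer_samples = 301  # from the text
normal_samples = 234  # from the text

total_samples = cancer_samples + normal_samples
data_points = total_samples * PROBES_PER_ARRAY

print(total_samples)  # 535
print(data_points)    # 259783695
```

Under that assumption the product matches the 259,783,695 data points quoted in the text.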
The individual samples selected for our analysis from each data set were either untreated tumour or untreated normal samples, i.e. not all the samples from any one data set were included, only those appropriate to this study. We wished to identify which CpGs are methylated as part of the pathology common to all types of cancer in a variety of tissues in many individuals. No data are included from cell lines or treated cells, and all are from either normal tissue not adjacent to the tumour or cancerous cells from the patient; thus the differences in methylation that we see are not due to cell culture conditioning or neighbouring cell contamination. Additional matched control tissues from other studies were also included so as to make the number of control data sets the same as the number of cancerous ones, which is important for our numerically based analyses. The data series were tissue matched, cancer with the same tissue control, as far as possible, with no one tissue type representing more than 40% of the samples. A 60% common methylation state at any one CpG was therefore chosen as the threshold for the analyses, to mitigate as far as possible any bias in tissue or cell type (the limiting factor being the public availability of data from diverse normal tissue types, owing to ethical considerations). All the data sets were from experiments carried out using the same platform, i.e. HumanMethylation450K, so that differences are due to the sample and not to the platform used.

CpG sites identification

Samples from the datasets were stored in two files, which were read by a Java program to identify CpG sites meeting specific defined criteria. Each of the files was read line by line to produce vectors of beta values.
Any vectors which satisfied the following criteria were selected for further analysis. CpG sites for which all the samples’ beta values were more than 0.8 were defined as Hypermethylated CpG sites, and sites which had beta values of less than 0.2 were defined as Hypomethylated, as described in [81] for variably methylated sites. In order to identify four classes of CpG sites, the classes were defined as follows:

1. Class HyperCancer: sites which are hypermethylated in 60% of cancer samples and hypomethylated in 60% of normal samples
2. Class HypoCancer: sites which are hypomethylated in 60% of cancer samples and hypermethylated in 60% of normal samples
3. Class AM: sites that are always hypermethylated (where 99% of the samples have beta values more than 0.8) in both normal and cancer cells
4. Class NM: sites that are never methylated (where 99% of samples have a beta value less than 0.1) in both normal and cancer cells

CpG sites in each class with more than 50% overlap were removed.

Motif Discovery

The MEME (Multiple EM for Motif Elicitation) software suite (http://meme.nbcr.net) was used for motif discovery. We used default MEME settings with the ZOOPS (zero or one motif per sequence) parameter to discover motifs for each class of identified CpG sites. Sixty bp of flanking sequence around each CpG site was used as input for the MEME analysis for each class, and the five best motifs according to their E-value, as calculated in the MEME probability matrix, were selected for further analysis with a custom designed Java program. The 20 motifs (5 for each CpG class) were used as input to the MAST tool to align these motifs against the 653 CpG DNA sequences in the four classes. The MAST program removed 2 motifs which had more than 60% overlap with others, so finally 18 motifs selected by MAST were used for further analysis.
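The class criteria given under "CpG sites identification" can be sketched as follows. This is a minimal Python illustration rather than the Java program used in the study, which is not shown. The function names, the order in which the classes are tested, the use of "at least" for the 60% and 99% thresholds, and the choice to apply the 99% criteria separately to the cancer and normal samples are all assumptions of the sketch.

```python
def fraction(values, predicate):
    """Fraction of beta values for which predicate is true (0.0 for an empty list)."""
    if not values:
        return 0.0
    return sum(1 for v in values if predicate(v)) / len(values)


def is_hyper(beta):
    return beta > 0.8  # hypermethylated threshold from the text


def is_hypo(beta):
    return beta < 0.2  # hypomethylated threshold from the text


def is_unmethylated(beta):
    return beta < 0.1  # stricter cutoff used for the NM class in the text


def classify_site(cancer_betas, normal_betas):
    """Assign one CpG site to 'AM', 'NM', 'HyperCancer', 'HypoCancer' or None."""
    groups = (cancer_betas, normal_betas)
    if all(fraction(g, is_hyper) >= 0.99 for g in groups):
        return "AM"
    if all(fraction(g, is_unmethylated) >= 0.99 for g in groups):
        return "NM"
    if fraction(cancer_betas, is_hyper) >= 0.60 and fraction(normal_betas, is_hypo) >= 0.60:
        return "HyperCancer"
    if fraction(cancer_betas, is_hypo) >= 0.60 and fraction(normal_betas, is_hyper) >= 0.60:
        return "HypoCancer"
    return None


# Toy beta values, invented purely for illustration.
example = classify_site([0.9, 0.85, 0.95, 0.1, 0.9], [0.1, 0.05, 0.15, 0.1, 0.9])
print(example)  # HyperCancer
```

Sites that match none of the four definitions return None and would simply be excluded from further analysis in this sketch.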
Using motifs for Classification

A Java program was developed to convert the MAST hit results to a feature matrix, and the results were used in the Weka package (http://www.cs.waikato.ac.nz/~ml/weka/) to evaluate the potential of using these motifs for classification of the four classes of CpG sites. Using three different machine learning methods and 10-fold cross validation, CpG sites were classified according to their motifs. The input matrix consisted of the CpG sites with their corresponding class, and the features were the motifs which appear in the flanking DNA. Similar methods have been used in previous studies [82, 83]. J48, logistic and support vector machine algorithms were used as classification tools for this purpose.

Acknowledgements

We acknowledge the support in kind of Brunel University and staff in the Departments of Computer Science and Biosciences.

Figure Legends

Figures 1 and 2 Graphs to show the number of CpG sites in each class and the positional relationships to CpG Islands. Figure 1 shows the number as a proportion of the total in each position relative to the CpG, subdivided into classes. Figure 2 shows the number as a proportion of the total in each class, subdivided into positions relative to the CpG.

Figure 3 Graph to show the percentage of each class of CpG that are associated with a microRNA site.

Figure 4 Graph to show the number of times a particular miRNA species coding sequence occurs in the DNA sequence in the different classes of CpGs identified in this study.

Figure 5 Weka result for the most discriminative motif using the J48 algorithm to classify the CpGs into the 2 classes.
[Figure 1: graph. Positions: island, N_Shelf, N_Shore, S_Shelf, S_Shore. Series: hypomethylated in normal and hypermethylated in cancer; hypermethylated in normal and hypomethylated in cancer; never methylated; always methylated.]
[Figure 2: graph. Classes: hypomethylated in normal and hypermethylated in cancer; hypermethylated in normal and hypomethylated in cancer; never methylated; always methylated. Series: island, N_Shelf, N_Shore, S_Shelf, S_Shore, none.]
[Figures 3, 4 and 5: images not reproduced.]

Tables

Table 1. Data series used in this study and the samples they contain. All were obtained from GEO (http://www.ncbi.nlm.nih.gov/gds/).
Series | Title of study | Tissue and samples used
GSE20945 | Transient low doses of DNA-demethylating agents exert durable antitumor effects on hematological and epithelial tumor cells | Primary leukaemia untreated samples
GSE29290 | Evaluation of the Infinium Methylation 450K technology | Normal breast and breast cancer tumour samples
GSE30338 | IDH1 mutation is sufficient to establish the glioma hypermethylator phenotype | Glioma tumour samples (only the tumour samples were included)
GSE36278 | Hotspot mutations in H3F3A and IDH1 define distinct epigenetic and biological subgroups of glioblastoma | Primary glioblastoma and non-neoplastic brain samples
GSE37965 | DNA methylation profiling in breast cancer discordant identical twins identifies DOK7 as novel epigenetic biomarker | Whole blood (from cancer patients and normal individuals, not the tumours)
GSE38240 | DNA methylation alterations exhibit intraindividual stability and interindividual heterogeneity in prostate cancer metastases | Lymph node, kidney, soft tissue, liver, subdural, bone, adrenal, prostate, spleen, bladder, lung and blood metastasis samples
GSE38266, GSE38268 | Identification and functional validation of HPV-mediated hypermethylation in head and neck squamous cell carcinoma | HPV- HNSCC tumour samples
GSE30870 | Distinct DNA methylomes of newborns and centenarians | Whole blood and cord blood samples used as normal controls
GSE31848 | Recurrent variations in DNA methylation in human pluripotent stem cells and their differentiated derivatives | Somatic tissue, various tissue type samples
GSE32148 | Genome-wide peripheral blood leukocyte DNA methylation microarrays identified a single association with inflammatory bowel diseases | Peripheral blood of normal individual samples
GSE33233 | Distinct DNA methylomes of newborns and centenarians | Whole blood samples used as controls
GSE34486 | DNA methylation regulates lineage-specifying genes in primary lymphatic and blood endothelial cells | Dermal blood endothelial cells from normal buttock samples
GSE36064 | Age-associated DNA methylation in pediatric populations | White blood cells from healthy individuals
GSE39141 | Genome-wide DNA methylation profiling predicts relapse in childhood B-cell acute lymphoblastic leukaemia | Bone marrow mononuclear cells from healthy person samples
GSE42118 | DNA methylation changes are a late event in acute promyelocytic leukemia and coincide with loss of transcription factor binding | Bone marrow from healthy donors samples

Table 2 Distribution of unique microRNA sequences in three classes number RNAcode NM number RNAcode AM number RNAcode HyperCancer number RNAcode HypoCancer miR205/205a 5 1b 2 miR-193/193b/193a1 3p 4 2 miR-141/200a 3 1 miR-153 miR-33a-3p/365/3652 3p 2
miR130ac/301ab/301b/301 b3 3p/454/721/4295/3666 1 2 2 2 2 4 miR-216b/216b-5p 5 miR-23abc/23b-3p 6 miR-96/507/1271 7 miR-490-3p 1 1 1 1 3 miR-183 4 miR-18ab/4735-3p 5 miR-223 6 miR-191 7 miR-150/5127 miR93/93a/105/106a/291 a3p/294/295/302abcde /372/373/428/519a/5 20be/520acd8 3p/1378/1420ac 9 miR-203 miR-140/140-5p/87610 3p/1244 11 miR-26ab/1297/4465 12 miR-455-5p 13 miR-551a 14 miR-145 15 miR-204/204b/211 16 miR-208ab/208ab-3p 17 miR-499-5p 3 2 miR-451 miR146ac/14 3 6b-5p miR122/122a 4 /1352 1 1 1 2 2 1 1 1 1 1 1 1 1 Table 3 The genes neighbouring the CpGs found in Class HyperCancer are listed along with their function as identified by Cormine software UCSC_RefGene_Name PPFIA1 EXD3 PTPRCAP Function Cell motility, apoptosis. invasion suppressor gene, cell division and chromosome partitioning, cell motility, gene silencing activity Protein tyrosine phosphatase receptor, apoptosis LOC100129637 TMC6 Unknown DNA repair BIN2 endocytosis C17orf101 MAP1D oxidoreductase activity aminopeptidase activity, phosphorylation cytoskeletal protein, cell migration, apoptosis endocytosis, phagocytosis, apoptosis cell migration exonuclease activity, cell division, signal transduction, DNA replication regulation of leukocyte activation, cell SORBS2 ELMO1 ERI3 LAG3 Cancer Involvement Amplified breast and head and neck cancers (cell trying to avoid invasion) None known Hypermethylated in many cancers. 
Implicated in tumorigenesis Unknown Variants seen in Cervical Cancer Abrogated in Myeloproliferative neoplasms None known Over expressed in colon cancer Downregulated in pancreatic, thyroid and cervical cancer Promote cell invasion in ovary, colon and brain cancer Increased in breast cancer Involved in many different proliferation, apoptosis PLCB2 phospholipase C activity, calcium ion binding, signal transduction apoptosis SPN regulation of inflammatory response to antigenic stimulus, induction of apoptosis by extracellular signals NAD+ ADP-ribosyltransferase activity, cell proliferation, apoptosis PARP10 MYO1G myosin complex, cell division, DNA hypermethylation CD6 PC Cell Adhesion Molecule (CAM), apoptosis cell proliferation intracellular signaling cascade, small GTPase mediated signal transduction, cell proliferation, apoptosis Regulation of actin cytoskeleton, cell proliferation apoptosis MAPK signaling pathway, Apoptosis, RIG-I-like receptor signaling pathway, Adipocytokine signaling pathway, regulation of protein amino acid phosphorylation carbonate dehydratase activity, zinc ion binding, cell proliferation, apoptosis regulation of protein amino acid phosphorylation, cell migration DNA binding, lipid transporter activity chloride channel activity, embryo development protein tyrosine phosphatase activity, cell proliferation, apoptosis regulation of Ras protein signal transduction, gene expression protein amino acid phosphorylation, cell growth, apoptosis zinc ion binding, RING type, apoptosis, DNA methylation negative regulation of adaptive immune response, positive regulation of cell death, apoptosis transcription repressor activity MIR365-1 RADIL Not known cell adhesion, forkhead and RAS RAPGEF1 NCKAP1L TRAF5 C3orf21 CA6 CCDC88C TNRC18 ANO8 PTPN7 TBC1D16 STK16 RFFL SPN cancers assisting in detection avoidance and resistance to apoptosis Highly expressed in Breast cancer promoting mitosis and migration of tumour cells Significantly expressed in lymphomas 
Inhibits transformation of cells, in KEGG small cell lung cancer Involved in survival leukaemia and breast cancer cell Aberrantly expressed in leukemia Upregulation in breast, lung, gastrointestinal and gynaecological cancers Down regulated in many cancers Expressed in lymphomas and small cell lung cancer None known Expressed in ovarian and breast cancers Involved in tumour invasion None known Over expressed in many cancers Implicated in blood cancers Involved in melanoma progression Over expressed in tumour cells Involved in myeloma Suppressed in many tumours Upregulated in many tumours (renal, small cell lung, sarcoma) Not known None Known associated proteolysis, macromolecule catabolic process, cell proliferation, cell cycle lamin filament, cytoskeleton, cell cycle, methylation, apoptosis FBXL16 LMNB2 JAK3 Down regulated in many cancers Down regulated in prostate, gastric, skin and leukaemia cancers Upregulated in many cancers positive regulation of leukocyte activation, apoptosis, signal transduction, phosphorylation ATP-activated inward rectifier Upregulated in potassium channel activity, nasopharyngeal carcinoma vasodilation, apoptosis, gene expression KCNJ8

Table 4. The genes neighbouring the CpGs found in Class HypoCancer, along with their function as described by Coremine software (http://www.coremine.com/medical/).
UCSC RefGene Name | Function | Cancer Involvement
RPTOR | Androgen receptor activity, kinase activity, telomerase activity, cell growth, cell cycle, insulin signalling | Up regulated in multiple cancers
C22orf9 | Not known | None known
NOS1AP | Signal transduction, gene expression, cell migration, cell proliferation | Associated with breast cancer progression
RGS12 | Signal transduction, cell cycle, RNA interference, apoptosis, SNAP receptor activity | Mutated in colorectal tumours

Table 5. The number and proportion of CpGs associated with LINE elements and Alu repeats.
Class | No. CpGs in class | LINE elements: total in class (% of CpGs having one or more) | Alu repeats: total in class (% of CpGs having one or more)
HyperCancer | 52 | 26 (33%) | 49 (44%)
HypoCancer | 7 | 12 (71%) | 4 (29%)

Table 6. The top 5 motifs, based on p-value, found near the four classes of CpG site, with their length and sequences as determined by MEME. Class key: HyperCancer, hypomethylated in normal and hypermethylated in cancer; HypoCancer, hypermethylated in normal and hypomethylated in cancer; AM, always methylated; NM, never methylated.
id | class | width | proportion in the class | motif
m1A | HyperCancer | 11 | 0.652173913 | AAGACAGGAAG
m2A | HyperCancer | 19 | 0.190751445 | GGGGAGGGGGGGGCGGAGG
m3A | HyperCancer | 29 | 1 | ATTATTGAGTATCACTTTGTATATCTTTT
m4A | HyperCancer | 11 | 0.578947368 | CACACCGTCCT
m5A | HyperCancer | 15 | 0.333333333 | AGCAGGAGAAGCAGG
m6AM | AM | 50 | 0.6875 | TCCGCCCGCCTCGGCCTCCCAAAGTGCTGGGATTACAGGCGTGAGCCACC
m7AM | AM | 29 | 0.875 | GCTTTTTAGAGACGGAGTCTCGCTCTGTT
m8AM | AM | 48 | 0.333333333 | TGAGAGGCGCTTGCGGGCCAGCCGGAGTTCCCGGTGGGCATGGGCTTG
m9AM | AM | 41 | 0.405405405 | GGTGACGAGGCGCGACAGGGTGACGAGGCGCGATTGGGTGA
m10AM | AM | 29 | 0.459459459 | TGGGTGAGGAGGCGCGACTCGGTGATGAG
m11C | HypoCancer | 13 | 0.75 | TTTAAATTCATTT
m12C | HypoCancer | 14 | 0.194444444 | CTTCCAGGCTTGGT
m13C | HypoCancer | 13 | 0.666666667 | TCCAAGGGACAGC
m14C | HypoCancer | 8 | 0.272727273 | TGAGGAAT
m16NM | NM | 15 | 0.736842105 | TTTCCTTTTTCTTGT
m17NM | NM | 15 | 0.95 | AGTGCGCATGCGCAG
m19NM | NM | 10 | 0.952380952 | CACTTCCGGT
m20NM | NM | 27 | 0.807692308 | CGCGCGGCATGCCGGGACTTGTAGTTC

References
1. Bird, A.P. and Wolffe, A.P. (1999) Methylation-induced repression - belts, braces, and chromatin. Cell, 99: 451-454.
2. Baylin S.B. (2005) DNA methylation and gene silencing in cancer. Nature Clinical Practice Oncology. http://lists.bilkent.edu.tr/~science/MBG523/Lectures/Epigenetics%20articles/DNA%20Methy.%20and%20Gene%20silenc.%20in%20cancer.pdf (accessed 21/02/2014).
3. Heyn, H., Carmona, F.J., Gomez, A., Ferreira, H.J., Bell, J.T., Sayols, S., Ward, K., Stefansson, O.A., Moran, S., Sandoval, J., Eyfjord, J.E., Spector, T.D. and Esteller, M.
(2013) DNA methylation profiling in breast cancer discordant identical twins identifies DOK7 as novel epigenetic biomarker. Carcinogenesis, 34(1): 102-108. 4. Fukushige S, Horii A. (2013) DNA methylation in cancer: a gene silencing mechanism and the clinical potential of its biomarkers. Tohoku J Exp Med., 229(3):173-85. 5. Klose R.J. and Bird A.P. (2006) Genomic DNA methylation: The mark and its mediators. Trends Biochem. Sci. 31: 89-97. 6. Das PM and Singal R. (2004) DNA methylation and cancer. Journal of Clinical Oncology, 22: 4632-4642 7. Taberlay PC, PA Jones. (2011) DNA methylation and cancer Epigenetics and Disease, - Springer. http://www.springer.com/cda/content/document/cda_downloaddocument/9783 764389888-c1.pdf?SGWID=0-0-45-1004851-p174022756 (accessed 17/02/14) 8. Shames, D. S., Girard, L., Gao, B., Sato, M., Lewis, C. M., et al. (2006) A genome-wide screen for promoter methylation in lung cancer identifies novel methylation markers for multiple malignancies. PLoS Medicine, 3: e486 9. Michaelson-Cohen R, Keshet I, Straussman R, Hecht M, Cedar H, Beller U. (2011) Genome-wide de novo methylation in epithelial ovarian cancer. Int J Gynecol Cancer. 21(2): 269-79. 10. Gama-Sosa,M.A., Slagel,V.A., Trewyn,R.W., Oxenhandler,R.,Kuo,K.C., Gehrke,C.W. and Ehrlich,M. (1983) The 5-methylcytosinecontent of DNA from human tumors. Nucleic Acids Res., 11: 6883–6894. 11. Feinberg,A.P. and Vogelstein,B. (1983) Hypomethylation distinguishes genes of some human cancers from their normal counterparts. Nature, 301: 89–92. 12. Feinberg, A.P., Gehrke,C.W., Kuo,K.C. and Ehrlich,M. (1988) Reducedgenomic 5-methylcytosine content in human colonic neoplasia. Cancer Res., 48: 1159–1161. 13. Cho,D.H., Thienes,C.P., Mahoney,S.E., Analau,E., Filippova,G.N. and Tapscott,S.J. (2005) Antisense transcription and heterochromatin at the DM1 CTG repeats are constrained by CTCF. Mol. Cell, 20: 483-489. 14. 
McKinnell,I.W., Ishibashi,J., Le Grand,F., Punch,V.G., Addicks,G.C., Greenblatt,J.F., Dilworth,F.J. and Rudnicki,M.A. (2008) Pax7 activates myogenic genes by recruitment of a histone methyltransferase complex. Nat. Cell Biol., 10: 77-84. 15. De Biase,I., Chutake,Y.K., Rindler,P.M. and Bidichandani,S.I. (2009) Epigenetic silencing in friedreich ataxia is associated with depletion of CTCF (CCCTC-binding factor) and antisense transcription. PLoS One, 4: e7914. 16. Gebhard C, Benner C, Ehrich M, Schwarzfische L (2010) General transcription factor binding at CpG islands in normal cells correlates with resistance to de novo DNA methylation in cancer cells. Cancer Res; 70(4): 1398–407. 17. An Integrated Encyclopedia of DNA Elements in the Human Genome The ENCODE Project Consortium. (2012) Nature doi: 10.1038/nature11247 18. Bock,C. and Lengauer,T. (2008) Computational epigenetics. Bioinformatics, 24: 1-10. 19. Yamada,Y. and Satou,K. (2008) Prediction of genomic methylation status on CpG islands using DNA sequence features. WSEAS Transactions on Biology and Biomedicine, 5: 153-162. 20. Ali,I. and Seker,H. (2010) A comparative study for characterisation and prediction of tissue-specific DNA methylation of CpG islands in chromosomes 6, 20 and 22. Conf. Proc. IEEE Eng. Med. Biol. Soc., 18321835. 21. Previti,C., Harari,O., Zwir,I. and del Val,C. (2009) Profile analysis and prediction of tissue-specific CpG island methylation classes. BMC Bioinformatics, 10: 116. 22. Glass JL1, Fazzari MJ, Ferguson-Smith AC, Greally JM. (2009) CG dinucleotide periodicities recognized by the Dnmt3a-Dnmt3L complex are distinctive at retroelements and imprinted domains. Mamm Genome. 20(9-10): 633-43. 23. Grewal, S.I.S. and Jia,S. (2007) Heterochromatin revisited. Nat. Rev. Genet., 8: 35-46. 24. Filippova,G.N., Thienes,C.P., Penn,B.H., Cho,D.H., Hu,Y.J., Moore,J.M., Klesert,T.R., Lobanenkov,V.V. and Tapscott,S.J. 
(2001) CTCF-binding sites flank CTG/CAG repeats and form a methylation-sensitive insulator at the DM1 locus. Nat. Genet., 28: 335-343. 25. Lienert F, Wirbelauer C, Som I, Dean A, Mohn F, Schübeler D. (2011) Identification of genetic elements that autonomously determine DNA methylation states. Nat Genet. 43(11):1091-7 26. Okitsu CY, Hsieh CL. (2007) DNA methylation dictates histone H3K4 methylation. Mol Cell Biol. 27(7):2746-57 27. Ooi SK, Qiu C, Bernstein E, Li K, Jia D, Yang Z, Erdjument-Bromage H, Tempst P, Lin SP, Allis CD, Cheng X, Bestor TH.: (2007) DNMT3L connects unmethylated lysine 4 of histone H3 to de novo methylation of DNA. Nature. 448(7154):714-7. 28. Stadler MB, Murr R, Burger L, Ivanek R, Lienert F, Schöler A, van Nimwegen E, Wirbelauer C, Oakeley EJ, Gaidatzis D, Tiwari VK, Schübeler D. (2011) DNA-binding factors shape the mouse methylome at distal regulatory regions. Nature. 480(7378):490-5. 29. Weber M, Hellmann I, Stadler MB, Ramos L, Pääbo S, Rebhan M, Schübeler D. (2007) Distribution, silencing potential and evolutionary impact of promoter DNA methylation in the human genome. Nat Genet. 39(4):45766. 30. Yuan,G. (2011) Prediction of epigenetic target sites by using genomic DNA sequence. In Anonymous Handbook of Research on Computational and Systems Biology: Interdisciplinary Applications. IGI Global, pp. 187-201. 31. Feltus, F.A., Lee, E.K., Costello, J.F., Plass, C. And Vertino, P.M. (2006) DNA motifs associated with aberrant CpG island methylation. Genomics, 87(5): 572-579. 32. Lu,L., Lin,K., Qian,Z., Li,H., Cai,Y. and Li,Y. (2010) Predicting DNA methylation status using word composition. Journal of Biomedical Science and Engineering, 3: 672-676. 33. Ghorbani M, Taylor SJ, Pook MA, Payne A. (2013) Comparative (computational) analysis of the DNA methylation status of trinucleotide repeat expansion diseases. J Nucleic Acids.; 689798. 34. McCabe,M.T., Lee,E.K. and Vertino,P.M. 
(2009) A multifactorial signature of DNA sequence and polycomb binding predicts aberrant CpG island methylation. Cancer Res., 69: 282-291. 35. Hervouet,E., Vallette,F.M. and Cartron,P.F. (2009) Dnmt3/transcription factor interactions as crucial players in targeted DNA methylation. Epigenetics, 4: 487-499. 36. Leung DC, Dong KB, Maksakova IA, Goyal P, Appanah R, Lee S, Tachibana M, Shinkai Y, Lehnertz B, Mager DL, Rossi F, Lorincz MC. (2011) Lysine methyltransferase G9a is required for de novo DNA methylation and the establishment, but not the maintenance, of proviral silencing. Proc Natl Acad Sci U S A. 108(14):5718-23. 37. Ooi SK1, Wolf D, Hartung O, Agarwal S, Daley GQ, Goff SP, Bestor TH. (2010) Dynamic instability of genomic methylation patterns in pluripotent stem cells. Epigenetics Chromatin. 3(1):17. 38. Rowe HM, Friedli M, Offner S, Verp S, Mesnard D, Marquis J, Aktas T, Trono D. (2013) De novo DNA methylation of endogenous retroviruses is shaped by KRAB-ZFPs/KAP1 and ESET. Development. 140(3):519-29. 39. Iorio,M.V., Piovan,C. and Croce,C.M. (2010) Interplay between microRNAs and the epigenetic machinery: An intricate network. Biochimica Et Biophysica Acta (BBA) - Gene Regulatory Mechanisms, 1799, 694-701. 40. Vrba L, Muñoz-Rodríguez JL, Stampfer MR, Futscher BW. (2013) miRNA Gene Promoters Are Frequent Targets of Aberrant DNA Methylation in Human Breast Cancer. PLoS ONE 8(1): e54398. 41. Huang DW, Sherman BT, Lempicki RA. (2009a) Systematic and integrative analysis of large gene lists using DAVID Bioinformatics Resources. Nature Protoc.; 4(1):44-57. 42. Huang DW, Sherman BT, Lempicki RA. (2009b) Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res.; 37(1):1-13. 43. Ho-Sui SJ, Mortimer J, Arenillas DJ, Brumm J, Walsh CJ, Kennedy BP and Wasserman WW. (2005) oPOSSUM: Identification of over-represented transcription factor binding sites in co-expressed genes. Nucleic Acids Res. 
33(10):3154-64. 44. Inoue M1, Takahashi K, Niide O, Shibata M, Fukuzawa M, Ra C. (2005) LDOC1, a novel MZF-1-interacting protein, induces apoptosis. FEBS Lett. 579(3):604-8. 45. Hsieh YH, Wu TT, Huang CY, Hsieh YS, Liu JY. (2007) Suppression of tumorigenicity of human hepatocellular carcinoma cells by antisense oligonucleotide MZF-1. Chin J Physiol. 50(1):9-15 46. Rowland B D., Bernards R and Peeper D S. (2005) The KLF4 tumour suppressor is a transcriptional repressor of p53 that acts as a contextdependent oncogene Nature Cell Biology 7: 1074 - 1082 47. Weisenberger D J., Campan M, Long T I., Kim M, Woods C. (2005) Analysis of repetitive element DNA methylation by MethyLight. Nucleic Acids Res. 33(21): 6823–6836 48. Walters RJ, Williamson EJ, English DR, Young JP, Rosty C, Clendenning M, Walsh MD, Parry S, Ahnen DJ, Baron JA, Win AK, Giles GG, Hopper JL, Jenkins MA, Buchanan DD. (2013) Association between hypermethylation of DNA repetitive elements in white blood cell DNA and early-onset colorectal cancer. Epigenetics. 8(7):748-55. 49. Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D. (2002) UCSC Genome Browser: The human genome browser at UCSC. Genome Res. 6:996-1006. 50. Bailey T. L. (2006) MEME:discovering and analysing DNA and protein sequence motifs. Nucleic Acids Res. 34 51. Lin R, Li X, Li J, Zhang L, Xu F, Chu Y, Li J. (2013) Long-term cisplatin exposure promotes methylation of the OCT1 gene in human esophageal cancer cells. Dig Dis Sci. 58(3):694-8. 52. Wang Z, Zhu S, Shen M, Liu J, Wang M, Li C, Wang Y, Deng A, Mei Q. (2013) STAT3 is involved in esophageal carcinogenesis through regulation of Oct-1. Carcinogenesis. 34(3):678-88. 53. Croce C.M. (2009) Causes and consequences of microRNA dysregulation in cancer Nat. Rev. Genet., 10: 704–714 54. Esquela-Kerscher, F.J. Slack. (2006) Oncomirs - microRNAs with a role in cancer. Nat. Rev. Cancer, 6: 259–269 55. Esteller M. (2011) Non-coding RNAs in human disease Nat. Rev. 
Genet.,.12: 861–874 56. Suzuki H, Maruyama R, Yamamoto E, Kai M. (2012) DNA methylation and microRNA dysregulation in cancer Molecular Oncology 6: 567–578 57. Chalitchagorn K, Shuangshoti S, Hourpai N, Kongruttanachok N, Tangkijvanich P, Thong-ngam D, et al. (2004) Distinctive pattern of LINE-1 methylation level in normal tissues and the association with carcinogenesis. Oncogene.; 23:8841-6. 58. Schulz WA. (2006) L1 retrotransposons in human cancers. J Biomed Biotechnol.:83672. 59. Estecio MR, Gharibyan V, Shen L, Ibrahim AE, Doshi K, He R, et al. (2007) LINE-1 hypomethylation in cancer is highly variable and inversely correlated with microsatellite instability. PLoS One.;2:e399. 60. Hsiung DT, Marsit CJ, Houseman EA, Eddy K, Furniss CS, McClean MD, et al. (2007) Global DNA methylation level in whole blood as a biomarker in head and neck squamous cell carcinoma. Cancer Epidemiol Biomarkers Prev.; 16:108-14. 61. Cho NY, Kim BH, Choi M, Yoo EJ, Moon KC, Cho YM, et al. (2007) Hypermethylation of CpG island loci and hypomethylation of LINE-1 and Alu repeats in prostate adenocarcinoma and their relationship to clinicopathological features. J Pathol. 211:269-77. 62. Matsuzaki K, Deng G, Tanaka H, Kakar S, Miura S, Kim YS. (2005) The relationship between global methylation level, loss of heterozygosity, and microsatellite instability in sporadic colorectal cancer. Clin Cancer Res.; 11:8564-9. 63. Perrin D1, Ballestar E, Fraga MF, Frappart L, Esteller M, Guerin JF, Dante R. (2007) Specific hypermethylation of LINE-1 elements during abnormal overgrowth and differentiation of human placenta. Oncogene. 26(17):2518-24. 64. Pattamadilok J, Huapai N, Rattanatanyong P, Vasurattana A, Perrin D, Ballestar E, Fraga MF, Frappart L, Esteller M, Guerin JF, et al.: (2007) Specific hypermethylation of LINE-1 elements during abnormal overgrowth and differentiation of human placenta. Oncogene. 26:251824. 65. 
Pattamadilok J, Huapai N, Rattanatanyong P, Vasurattana A, Triratanachat S, Tresukosol D, et al. (2008) LINE-1 hypomethylation level as a potential prognostic factor for epithelial ovarian cancer. Int J Gynecol Cancer 18:711–7. 66. Moore LE, Pfeiffer RM, Poscablo C, Real FX, Kogevinas M, Silverman D, et al. (2008) Genomic DNA hypomethylation as a biomarker for bladder cancer susceptibility in the Spanish Bladder Cancer Study: a casecontrol study. Lancet Oncol. 9:359-66. 67. Smith IM, Mydlarz WK, Mithani SK, Califano JA. (2007) DNA global hypomethylation in squamous cell head and neck cancer associated with smoking, alcohol consumption and stage. Int J Cancer. 121:1724-8. 68. Subbalekha K, Pimkhaokham A, Pavasant P, Chindavijak S, Phokaew C, Shuangshoti S, et al. (2009) Detection of LINE-1s hypomethylation in oral rinses of oral squamous cell carcinoma patients. Oral Oncol. 45:184-91. 69. Karouzakis E, Gay RE, Michel BA, Gay S, Neidhart M. (2009) DNA hypomethylation in rheumatoid arthritis synovial fibroblasts. Arthritis Rheum. 60:3613-22. 70. Choi IS, Estecio MR, Nagano Y, Kim do H, White JA, Yao JC, et al. (2007) Hypomethylation of LINE-1 and Alu in well-differentiated neuroendocrine tumors (pancreatic endocrine tumors and carcinoid tumors). Mod Pathol. 20:802-10. 71. Roman-Gomez J, Jimenez-Velasco A, Agirre X, Castillejo JA, Navarro G, San Jose-Eneriz E, et al. (2008) Repetitive DNA hypomethylation in the advanced phase of chronic myeloid leukemia. Leuk Res. 32:487-90. 72. Lee HS, Kim BH, Cho NY, Yoo EJ, Choi M, Shin SH, et al. (2009) Prognostic implications of and relationship between CpG island hypermethylation and repetitive DNA hypomethylation in hepatocellular carcinoma. Clin Cancer Res. 15:812-20 73. Fiala E, Ehrlich M and. Laird P W: (2005) Association between hypermethylation of DNA repetitive elements in white blood cell DNA and early-onset colorectal cancer. Nucleic Acids Research, 33, 21: 6823– 6836 74. 
Weisenberger, D.J., Velicescu, M., Cheng, J.C., Gonzales, F.A., Liang, G., Jones, P.A. (2004) Role of the DNA methyltransferase variant DNMT3b3 in DNA methylation Mol. Cancer Res. 262–72 75. Magewu, A.N. and Jones, P.A. (1994) Ubiquitous and tenacious methylation of the CpG site in codon 248 of the p53 gene may explain its frequent appearance as a mutational hot spot in human cancer Mol. Cell. Biol. 14: 4225–4232 76. Zhipeng Wang,Shaojun Zhu,Min Shen,Juanjuan Liu, Meng Wang, Chen Li, Yukun Wang, Anmei Deng and Qibing Mei (2013) STAT3 is involved in esophageal carcinogenesis through regulation of Oct-1 Carcinogenesis 34 (3): 678-688. 77. Liao D (2009) Emerging roles of the EBF family of transcription factors in tumor suppression. Mol Cancer Res. 7(12):1893-901 78. Chen F, Song J, Di J, Zhang Q, Tian H, Zheng J. (2012) IRF1 suppresses Ki-67 promoter activity through interfering with Sp1 activation. Tumour Biol. 33(6):2217-25 79. Thompson VC. Day TK, Bianco-Miotto T, Selth LA, Han G, Thomas M, Buchanan G, Scher HI, Nelson CC; Australian Prostate Cancer BioResource, Greenberg NM, Butler LM, Tilley WD. (2012) A gene signature identified using a mouse model of androgen receptor-dependent prostate cancer predicts biochemical relapse in human disease. Int J Cancer. 131(3):66272 80. Deutsch AJ., Angerer H, Fuchs TE, Neumeister P. (2012) The Nuclear Orphan Receptors NR4A as Therapeutic Target in Cancer Therapy. Anticancer Agents Med Chem. 12(9):1001-14 81. Du P, Zhang X, Huang C-C, Jafari N, Kibbe W A, Hou L, and Lin S M (2010) Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis. BMC Bioinformatics. 11: 587. 82. Bock, C., Walter, J., Paulsen, M. and Lengauer, T. (2007) CpG Island Mapping by Epigenome Prediction, PLoS Comput Biol vol. 3, no. 6, pp. E110 83. Wrzodek, C., Büchel, F., Hinselmann, G., Eichner, J., Mittag, F. and Zell, A. 
(2012) Linking the Epigenome to the Genome: Correlation of Different Features to DNA Methylation of CpG Islands, PLoS ONE , vol. 7, no. 4, pp. e35327