Supplemental Materials

Positional effects revealed in Illumina Methylation Array and the impact on analysis

Chuan Jiao1, Chunling Zhang2, Rujia Dai1, Yan Xia1, Kangli Wang1, Gina Giase3, Chao Chen1,* and Chunyu Liu1,3,*

1 The State Key Laboratory of Medical Genetics, Central South University, Changsha, Hunan, 410012, China
2 Center for Research Informatics, University of Chicago, Chicago, IL, 60607, USA
3 Department of Psychiatry, University of Illinois at Chicago, Chicago, IL, 60607, USA

* To whom correspondence should be addressed. Tel: (+1)3124132599; Email: [email protected]. Correspondence may also be addressed to Chao Chen. Tel: (+86)18874114280; Fax: (+86)0731-84478152; Email: [email protected].

First, we processed all datasets with the same pipeline and corrected the batch and positional effects using ComBat and the lm function in R. Second, we used four evaluation methods to assess the methods for correcting positional effects, and we demonstrate that a positional effect exists in the Illumina HumanMethylation BeadChip. Last, to provide guidance for analyzing Illumina Infinium HumanMethylation datasets, we propose a method to control this artifact.

Figure S1. Sample maps of Methyl450 (left) and Methyl27 (right). The numbers from one to twelve are the position identifiers used in the analysis. Figures obtained from the Illumina website (http://www.illumina.com/).

Results for other datasets

In some datasets (GSE58885, GSE26133, BrainCloud and GSE38873) we treated the arrays as the batches; for these datasets we do not show the boxplots by batch.

Figure S2. Distribution of methylation levels in the eight differently processed GSE74193 datasets. (a) Line chart of average methylation levels at different positions. (b) Boxplot of methylation levels in different batches.

Figure S3. Average methylation levels at different positions. (a) GSE58885 dataset. (b) GSE38873 dataset. (c) GSE26133 dataset. (d) BrainCloud dataset.

Figure S4. PVCA results in 450k datasets. (a) GSE58885 dataset.
PVCA estimated the contribution of each factor to the overall variation. We considered four possible sources of variation: days post-conception (DPC), Sex, batch (Array) and position (Position), as well as the residual effect (resid) that the known factors could not explain. (b) GSE74193 dataset. We considered six possible sources of variation: Age, disease status (Dx), Race, Sex, Batch and Position, as well as the residual effect (resid) that the known factors could not explain.

Figure S5. PVCA results in 27k datasets. (a) GSE38873 dataset. PVCA estimated the contribution of each factor to the overall variation. We considered nine possible sources of variation: Age, Sex, disease status (Dx), lifetime use of antipsychotics (LTantiPSY), Smoking, BrainPH, postmortem interval (PMI), Batch and Position, as well as the residual effect (resid) that the known factors could not explain. (b) GSE26133 dataset. We considered four possible sources of variation: Gender, Array, Batch and Position, as well as the residual effect (resid) that the known factors could not explain. (c) BrainCloud dataset. We considered five possible sources of variation: Age, Sex, Race, Batch and Position, as well as the residual effect (resid) that the known factors could not explain.

Figure S6. Comparison between differently processed datasets in GSE74193. Fig. S6a, S6b, S6c and S6d compare the methods for correcting positional effects, Fig. S6e compares the order in which positional effects and batch effects are corrected, and Fig. S6f evaluates the efficiency of the positional-effect correction.
(a) Pos(ComBat)_data versus Pos(lm)_data, (b) BatchPos(ComBat)_data versus Batch(ComBat)Pos(lm)_data, (c) PosBatch(ComBat)_data versus Pos(lm)Batch(ComBat)_data, (d) PosBatch(ComBat)_data versus FN_data, (e) PosBatch(ComBat)_data versus BatchPos(ComBat)_data, (f) PosBatch(ComBat)_data versus Batch_data, (g) PosBatch(ComBat)_data versus Raw_data. The red lines indicate y = x. The values in the top left corner give the result of the one-tailed Wilcoxon signed-rank test; W is the test statistic, the sum of the signed ranks, which can be compared with a critical value from a reference table to obtain a p-value.
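The W statistic described in this caption can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the authors' R code. The function names signed_rank_statistic and one_tailed_p_normal and the example values are hypothetical. As a further assumption, the p-value here comes from a normal approximation without tie or continuity correction, rather than from the reference table mentioned above.

```python
import math

def signed_rank_statistic(x, y):
    """Return (W, n): the sum of signed ranks and the number of non-zero pairs."""
    # Paired differences; zero differences carry no sign and are dropped.
    diffs = [a - b for a, b in zip(x, y)]
    diffs = [d for d in diffs if d != 0]
    n = len(diffs)
    # Order the pairs by absolute difference so ranks can be assigned.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied absolute differences; each gets the average rank.
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    # W is the sum of the ranks, each carrying the sign of its difference.
    w = sum(math.copysign(r, d) for r, d in zip(ranks, diffs))
    return w, n

def one_tailed_p_normal(w, n):
    """Approximate P(W >= w) under the null hypothesis (assumption: normal
    approximation with mean 0 and variance n(n+1)(2n+1)/6, no tie correction)."""
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 6)
    return 0.5 * math.erfc(w / (sigma * math.sqrt(2)))

# Hypothetical paired values, e.g. the same probes in two processed datasets.
x = [5, 7, 3, 9, 6]
y = [4, 4, 5, 5, 6]
w, n = signed_rank_statistic(x, y)  # W = 6.0, n = 4
p = one_tailed_p_normal(w, n)
print(w, n, p)
```

A large positive W means the first dataset tends to have the higher values. With very few pairs the normal approximation is rough, and an exact table or an established statistics package is preferable.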