Supplementary Information

A Network Biology Approach to Prostate Cancer

Ayla Ergün, Carolyn A. Lawrence, Michael A. Kohanski, Timothy A. Brennan & James J. Collins

1 The MNI Algorithm

The MNI algorithm operates in two phases to determine significant genetic mediators of a condition of interest, e.g., a disease. In phase one (network identification phase), a network is derived from an $N \times M$ training set of microarray expression data, consisting of measurements of steady-state expression ratios of $N$ transcripts in $M$ experiments. In phase two (mediator determination phase), the trained regulatory network is used as a filter to determine the genes affected by a test condition. For this study, the training set consisted of 1144 expression profiles from solid human tumor samples and cancerous cell lines. As test profiles, we used 14 non-recurrent primary, 9 recurrent primary and 9 distant metastatic prostate cancer samples from LaTulippe et al (2002). In Sections 1.1-1.4, we present a summary of the MNI algorithm, which is described in greater detail by di Bernardo et al (2005).

1.1 Model Structure

To predict the genetic mediators of a disease, the MNI algorithm first infers a model of regulatory influences in a cell. The model relates changes in gene transcript concentrations to each other, and it is not constrained by any a priori knowledge. The synthesis rate of transcript $i$ is modeled as a function of the influence of all other transcript concentrations and of the external influences on transcript $i$, as follows:

$$\dot{y}_i = u_i \prod_{j=1}^{N} y_j^{n_{ij}} - d_i y_i, \tag{1}$$

where $y_i$ represents the concentration of transcript $i$, $n_{ij}$ represents the influence of transcript $j$ on transcript $i$, $d_i$ represents the degradation rate of transcript $i$, and $u_i$ represents the net external influences on the transcription rate of transcript $i$. Because mRNA concentrations, and not transcription rates, are known, we make the simplifying assumption that transcript concentrations are measured under steady-state conditions.
With this assumption, the model becomes:

$$\dot{y}_i = 0 = u_i \prod_{j} y_j^{n_{ij}} - d_i y_i. \tag{2}$$

We compute the measurements relative to a baseline, and with the assumption that the degradation rate is the same for the baseline-level transcripts, we make the following transformation:

$$\frac{y_i}{y_{ib}} = \frac{u_i}{u_{ib}} \prod_{j} \left( \frac{y_j}{y_{jb}} \right)^{n_{ij}}. \tag{3}$$

By taking the logarithm of both sides of Equation 3 and by substituting variables and parameters, the model is reduced to a log-linear system of equations:

$$\sum_{j} a_{ij} x_j = -p_i, \tag{4}$$

where

$$a_{ij} = \begin{cases} n_{ij}, & j \neq i \\ n_{ij} - 1, & j = i \end{cases}, \qquad x_j = \log_2 \frac{y_j}{y_{jb}}, \qquad p_i = \log_2 \frac{u_i}{u_{ib}}.$$

The model coefficients $a_{ij}$, which constitute the connectivity matrix $A$, represent the influence of the concentration of transcript $j$ on the synthesis rate of transcript $i$. The variables $x_j$ are the log-transformed expression-change ratios of each transcript $j$, and constitute the columns of the training matrix, $X$. The variables $p_i$ are the net external influences on transcript $i$ and constitute the rows of the matrix $P$. This external perturbations matrix $P$ is used to determine the transcripts that are most inconsistent with the network, and are therefore the most likely genetic mediators of a disease.

1.2 Phase I: Network Identification

The first task of the MNI algorithm is to learn the network model coefficients $a_{ij}$. For the algorithm, the training matrix $X$ is known while the connectivity matrix $A$ and the external perturbations matrix $P$ are unknown. To estimate the network model $A$, with no prior knowledge of $P$, the MNI algorithm uses a recursive strategy. The algorithm starts by using a naïve model of the regulatory structure. Initially, $A$ is specified to account only for self-degradation: $a_{ij} = -1$ for $i = j$ and $a_{ij} = 0$ otherwise. This begins the first iteration, where $P$ is estimated directly from $A$ and $X$. An external influence is considered significant if it satisfies:

$$|\hat{p}_{il}| > \theta \max_{1 \le l \le M} |\hat{p}_{il}|, \tag{5}$$

where $\theta$ is the significance threshold.
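
The first iteration described above (naïve self-degradation model, direct estimate of the perturbations, then the significance test of Equation 5) can be sketched in Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names `estimate_perturbations` and `significant_mask` are hypothetical, transcripts are assumed to be stored in rows and experiments in columns, the sign convention "sum over j of a_ij x_j equals minus p_i" is assumed, and absolute values are taken on both sides of the significance test.

```python
import numpy as np

def estimate_perturbations(A, X):
    """Estimate P from the log-linear model, assuming sum_j a_ij x_j = -p_i."""
    return -A @ X

def significant_mask(P_hat, theta):
    """Flag entries whose magnitude exceeds theta times the largest
    magnitude observed for the same transcript across all experiments."""
    abs_p = np.abs(P_hat)
    row_max = abs_p.max(axis=1, keepdims=True)  # max over experiments, per transcript
    return abs_p > theta * row_max

# Toy training matrix: 2 transcripts (rows) x 3 experiments (columns).
X = np.array([[2.0, -0.1, 0.4],
              [0.0, 1.0, -0.2]])

# Naive starting model: self-degradation only (a_ii = -1, a_ij = 0 otherwise).
A0 = -np.eye(X.shape[0])

P0 = estimate_perturbations(A0, X)     # with the naive model, P0 equals X
mask = significant_mask(P0, theta=0.25)
```

In a full iteration, the experiments flagged as insignificant for a transcript would be dropped before the regression step that re-estimates that transcript's row of the connectivity matrix.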
We chose a threshold of 0.25 for our study, i.e., an estimate of the external influence on transcript $i$ is considered significant if it is greater than 25% of the maximum absolute value of $\hat{p}_{il}$ over all experiments $l = 1, \ldots, M$. To finish the iteration, experiments with insignificant perturbations are removed from the training set, and a new connectivity matrix is determined, using linear regression to solve the following equation for $a_{ij}$:

$$\sum_{j} a_{ij} x_j = -p_i. \tag{6}$$

This begins the second iteration, where $p_i$ is re-estimated using the newly calculated $a_{ij}$ coefficients. This iterative estimation is repeated five times, or until convergence of the variables.

1.3 Phase II: Determination of Disease Mediators

Once a network has been estimated, transcripts that are significantly perturbed in the test profile can be identified. A test expression profile $x_{jc}$, i.e., the expression of transcript $j$ in response to a test condition $c$ for $j = 1, \ldots, N$, is tested against the network to determine its external influences using the following equation:

$$\sum_{j} \hat{a}_{ij} x_{jc} = -\hat{p}_{ic}, \tag{7}$$

where $\hat{p}_{ic}$ represents the estimated external influence on transcript $i$. The significance of each perturbation is determined by calculating two z-scores based on the perturbation size and estimation error:

$$z_{ic} = \frac{\hat{p}_{ic}}{\sigma_{ic}} \quad \text{(normal)} \tag{8.1}$$

$$z^{m}_{ic} = z_{ic} \, x_{ic} \quad \text{(modified)} \tag{8.2}$$

The modified z-score is designed to boost the likelihood of including genes with significant changes in the test expression profile. In Equation 8.1, $\sigma_{ic}$ represents the standard deviation of the perturbation value, which is calculated by applying propagation of error to Equation 7, yielding:

$$\sigma_{ic}^2 = \sum_{j} \sum_{k} \sigma_{ijk} \, x_{jc} x_{kc} + \sum_{j} \hat{a}_{ij}^2 \operatorname{var}(x_{jc}), \tag{9}$$

where $\sigma_{ijk}$ are the elements of the covariance matrix calculated from $\hat{a}_{ij}$. Transcripts are ranked according to the normal and modified z-scores, and those with the highest absolute z-scores are taken to be the most likely disease mediators in each case.
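
The Phase II scoring for a single transcript can be sketched in Python as follows. This is a minimal illustration rather than the authors' code. The function name `perturbation_zscores` and all toy numbers are hypothetical. The sketch assumes the sign convention "sum over j of a_ij x_jc equals minus p_ic", takes the variance of the perturbation as the sum of a term from the uncertainty in the coefficients and a term from the measurement variance of the test profile, and assumes that the modified z-score is the normal z-score multiplied by the transcript's own test expression value.

```python
import numpy as np

def perturbation_zscores(i, a_hat_i, cov_a_i, x_c, var_x_c):
    """Score transcript i against a test profile.

    a_hat_i : (N,) estimated row i of the connectivity matrix
    cov_a_i : (N, N) covariance matrix of the estimates in a_hat_i
    x_c     : (N,) test expression profile (log2 ratios)
    var_x_c : (N,) measurement variance of each entry of x_c
    """
    a_hat_i = np.asarray(a_hat_i, dtype=float)
    cov_a_i = np.asarray(cov_a_i, dtype=float)
    x_c = np.asarray(x_c, dtype=float)
    var_x_c = np.asarray(var_x_c, dtype=float)

    # Estimated external influence (assumed sign convention).
    p_hat = -float(a_hat_i @ x_c)

    # Propagation of error: coefficient uncertainty + measurement uncertainty.
    var_p = float(x_c @ cov_a_i @ x_c + np.sum(a_hat_i**2 * var_x_c))

    z = p_hat / np.sqrt(var_p)   # normal z-score
    z_mod = z * x_c[i]           # modified z-score (assumed product form)
    return p_hat, z, z_mod

# Toy example with N = 2 transcripts, scoring transcript 0.
p_hat, z, z_mod = perturbation_zscores(
    i=0,
    a_hat_i=[-1.0, 0.5],
    cov_a_i=[[0.01, 0.0], [0.0, 0.04]],
    x_c=[2.0, 1.0],
    var_x_c=[0.04, 0.09],
)
```

Ranking then amounts to sorting transcripts by the absolute value of the chosen score, for example with `np.argsort(-np.abs(scores))`.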
The final list is whichever of the two lists, normal or modified, has the higher mean z-score.

1.4 Singular Value Decomposition

For the training data set, we used a training matrix of size $12600 \times 1144$. Therefore, calculating $a_{ij}$ in Equation 6 is an under-determined problem. In order to find a unique solution, we used a dimensional reduction strategy based on singular value decomposition (SVD) to reduce the size of the training set. Using SVD, the training matrix, $X$, is decomposed as:

$$X = U S V^T, \tag{10}$$

where $U$ is an $N \times M$ matrix, $S$ is a diagonal matrix of dimension $M \times M$ containing the singular values of $X$, and $V$ is an $M \times M$ matrix containing the principal components of the transcript expression profiles in columns. The $Q$ principal components with the largest singular values are chosen ($Q \le M$). The $Q$ profiles serve as the characteristic expression profiles for the $N$ transcripts and describe most of the expression variation in the $N$ transcripts. Using the approach specified by Everitt and Dunn (2001), each singular value was considered significant if its relative variance was greater than or equal to the threshold of $0.7/n$, where $n = 1144$ is the number of experiments. The relative variance of a singular value is calculated as the square of that singular value divided by the sum of the squares of all of the singular values. Based on this approach, $Q$ was set to 62 for this study. Thus, for network determination in this study, $X$ was redefined as a $62 \times 1144$ matrix of characteristic ("metagene") expression profiles. Using the characteristic profiles, $X$ is approximated as follows:

$$X \approx U_Q S_Q V^T, \tag{11}$$

where $U_Q$ is an $N \times Q$ matrix that contains only the first $Q$ columns of $U$, and $S_Q$ is a $Q \times M$ diagonal matrix of the largest $Q$ singular values. These matrices are used to transform $X$ into the metagene space and back into the $N$-dimensional transcript space to perform the network identification and genetic mediator determination.
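
The component-selection rule described above can be sketched with NumPy. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function name `select_components` and the toy matrix are hypothetical, NumPy's economy-size SVD returns the singular values as a vector rather than as a rectangular diagonal matrix, and the metagene profiles are assumed to be obtained by projecting $X$ onto the retained left singular vectors.

```python
import numpy as np

def select_components(X, c=0.7):
    """Keep the singular values whose relative variance is >= c / M,
    where M is the number of experiments (columns of X)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    rel_var = s**2 / np.sum(s**2)          # relative variance of each singular value
    n_experiments = X.shape[1]
    # Singular values are returned in descending order, so the
    # significant ones form a leading block of length Q.
    Q = int(np.count_nonzero(rel_var >= c / n_experiments))
    return U[:, :Q], s[:Q], Vt[:Q, :], Q

# Toy rank-1 training matrix: 4 transcripts x 3 experiments.
X = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 0.0, -1.0])

U_Q, s_Q, Vt_Q, Q = select_components(X)

# Metagene-space representation (Q x M) and the low-rank approximation of X.
metagenes = U_Q.T @ X
X_approx = U_Q @ np.diag(s_Q) @ Vt_Q
```

For the rank-1 toy matrix only one singular value passes the threshold, so the approximation reproduces `X` up to floating-point error; on real data the approximation discards the variation carried by the rejected components.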

To improve the specificity of the algorithm, a "tournament" approach was adopted, in which the MNI algorithm is run iteratively on increasingly smaller subsets of the transcripts. For this study, three tournaments were conducted, and after each tournament, we kept the top third of the transcripts, i.e., those with the highest z-scores. A final tournament was conducted by applying MNI to the 200 transcripts with the highest z-scores. This procedure was repeated for 14 non-recurrent primary, 9 recurrent primary and 9 distant metastatic prostate cancer samples from LaTulippe et al (2002). We set to zero the z-score for transcripts that were not within the list of the top 200 transcripts identified as mediators for a given sample. To identify a characteristic list of genes within each group (i.e., non-recurrent primary, recurrent primary and metastatic prostate cancer), the normal or modified z-scores (chosen by the algorithm) were averaged and ranked across samples and across transcripts for the corresponding genes.

2 Supplementary Results

Several of the genes identified in the metastatic prostate cancer group, in addition to those associated with the AR signaling pathway, have been identified as key genetic mediators of prostate cancer (Supplementary Table 1). For example, AMACR, α-methylacyl-CoA racemase, is an enzyme that is consistently upregulated in metastatic prostate cancer (Luo et al, 2002; Rubin et al, 2002; Rubin et al, 2005). CDH17 is a member of the cadherin superfamily, whose members mediate cell-cell adhesion and have been implicated in prostate cancer metastasis and invasion (Tomita, 2000). VDR, the vitamin D receptor, is a member of the steroid receptor family, and there are data indicating that the hormonal form of vitamin D promotes the differentiation and inhibits the proliferation, invasiveness, and metastasis of human prostatic cancer cells (Feldman et al, 2000).
The biological actions of vitamin D are primarily mediated by the vitamin D receptor, and gene polymorphisms in VDR have been linked to a heightened risk of advanced prostate cancer (Stewart and Weigel, 2004; John et al, 2005).

Several significant prostate cancer genes (15-LOX-2, PLA2G2A, NPY, CRISP3 and GDF15) were identified by MNI as potential genetic mediators for the non-recurrent primary prostate cancer group alone (Supplementary Table 1). Expression of 15-LOX-2 (also known as ALOX15B), arachidonate 15-lipoxygenase type 2, is higher in healthy prostates than in prostate tumors, and appears to suppress cancer development (Bhatia et al, 2003). The expression of PLA2G2A, phospholipase A2, is elevated in neoplastic prostatic tissue, and dysregulation of PLA2G2A may play a role in prostatic carcinogenesis (Jiang et al, 2002). NPY, neuropeptide Y precursor, regulates the growth of prostate cancer cells (Magni and Motta, 2001). CRISP3, cysteine-rich secretory glycoprotein 3, a member of the cysteine-rich secretory protein family, has been identified as a potential prostate biomarker (Kosari et al, 2002), and GDF15, growth differentiation factor 15 precursor, is associated with early prostate carcinogenesis (Cheung et al, 2004).

References

Bhatia B, Maldonado CJ, Tang S, Chandra D, Klein RD, Chopra D, Shappell SB, Yang P, Newman RA & Tang DG (2003) Subcellular localization and tumor-suppressive functions of 15-lipoxygenase 2 (15-LOX2) and its splice variants. J. Biol. Chem. 278: 25091-25100

Cheung PK, Woolcock B, Adomat H, Sutcliffe M, Bainbridge TC, Jones EC, Webber D, Kinahan T, Sadar M, Gleave ME & Vielkind J (2004) Protein profiling of microdissected prostate tissue links growth differentiation factor 15 to prostate carcinogenesis. Cancer Res. 64: 5929-5933

di Bernardo D, Thompson MJ, Gardner TS, Chobot SE, Eastwood EL, Wojtovich AP, Elliott SJ, Schaus SE & Collins JJ (2005) Chemogenomic profiling on a genome-wide scale using reverse-engineered gene networks. Nat. Biotechnol. 23: 377-383

Everitt BS & Dunn G (2001) Applied Multivariate Data Analysis. Arnold, London

Feldman D, Zhao XY & Krishnan AV (2000) Vitamin D and prostate cancer. Endocrinology 141: 5-9

Jiang J, Neubauer BL, Graff JR, Chedid M, Thomas JE, Roehm NW, Zhang S, Eckert GJ, Koch MO, Eble JN & Cheng L (2002) Expression of group IIA secretory phospholipase A2 is elevated in prostatic intraepithelial neoplasia and adenocarcinoma. Am. J. Pathol. 160: 667-671

John EM, Schwartz GG, Koo J, Van Den Berg D & Ingles SA (2005) Sun exposure, vitamin D receptor gene polymorphisms, and risk of advanced prostate cancer. Cancer Res. 65: 5470-5479

Kosari F, Asmann YW, Cheville JC & Vasmatzis G (2002) Cysteine-rich secretory protein-3: a potential biomarker for prostate cancer. Cancer Epidemiol. Biomarkers Prev. 11: 1419-1426

LaTulippe E, Satagopan J, Smith A, Scher H, Scardino P, Reuter V & Gerald WL (2002) Comprehensive gene expression analysis of prostate cancer reveals distinct transcriptional programs associated with metastatic disease. Cancer Res. 62: 4499-4506

Luo J, Zha S, Gage WR, Dunn TA, Hicks JL, Bennett CJ, Ewing CM, Platz EA, Ferdinandusse S, Wanders RJ, Trent JM, Isaacs WB & De Marzo AM (2002) Alpha-methylacyl-CoA racemase: a new molecular marker for prostate cancer. Cancer Res. 62: 2220-2226

Magni P & Motta M (2001) Expression of neuropeptide Y receptors in human prostate cancer cells. Ann. Oncol. 12: 27-29

Rubin MA, Bismar TA, Andren O, Mucci L, Kim R, Shen R, Ghosh D, Wei JT, Chinnaiyan AM, Adami HO, Kantoff PW & Johansson JE (2005) Decreased α-methylacyl CoA racemase expression in localized prostate cancer is associated with an increased rate of biochemical recurrence and cancer-specific death. Cancer Epidemiol. Biomarkers Prev. 14: 1424-1432

Rubin MA, Zhou M, Dhanasekaran SM, Varambally S, Barrette TR, Sanda MG, Pienta KJ, Ghosh D & Chinnaiyan AM (2002) Alpha-methylacyl coenzyme A racemase as a tissue biomarker for prostate cancer. JAMA 287: 1662-1670

Stewart LV & Weigel NL (2004) Vitamin D and prostate cancer. Exp. Biol. Med. 229: 277-284

Tomita K (2000) Cadherin switching in human prostate cancer progression. Cancer Res. 60: 3650-3654