TOXICOLOGICAL SCIENCES, 151(2), 2016, 447–461
doi: 10.1093/toxsci/kfw058
Advance Access Publication Date: March 29, 2016
Research Article

Grouping 34 Chemicals Based on Mode of Action Using Connectivity Mapping

K. Nadira De Abrew,*,1 Raghunandan M. Kainkaryam,* Yuqing K. Shan,* Gary J. Overmann,* Raja S. Settivari,‡ Xiaohong Wang,* Jun Xu,* Rachel L. Adams,* Jay P. Tiesman,* Edward W. Carney,‡,† Jorge M. Naciff,* and George P. Daston*

*Mason Business Center, The Procter & Gamble Company, Cincinnati, Ohio 45040 and ‡Toxicology & Environmental Research and Consulting, The Dow Chemical Company, Midland, Michigan 48674
† Deceased.
1 To whom correspondence should be addressed. Fax: (513) 277-2311. E-mail: [email protected].

ABSTRACT
Connectivity mapping is a method used in the pharmaceutical industry to find connections between small molecules, disease states, and genes. The concept can be applied to a predictive toxicology paradigm to find connections between chemicals, adverse events, and genes. To assess the applicability of the technique for predictive toxicology purposes, we performed gene array experiments on 34 different chemicals: bisphenol A, genistein, ethinyl-estradiol, tamoxifen, clofibrate, dehydroepiandrosterone, troglitazone, diethylhexyl phthalate, flutamide, trenbolone, phenobarbital, retinoic acid, thyroxine, 1α,25-dihydroxyvitamin D3, clobetasol, farnesol, chenodeoxycholic acid, progesterone, RU486, ketoconazole, valproic acid, desferrioxamine, amoxicillin, 6-aminonicotinamide, metformin, phenformin, methotrexate, vinblastine, ANIT (1-naphthyl isothiocyanate), griseofulvin, nicotine, imidacloprid, vorinostat, and 2,3,7,8-tetrachloro-dibenzo-p-dioxin (TCDD). Each chemical was tested at the 6-, 24-, and 48-hour time points at 3 different concentrations in 4 cell lines: MCF7, Ishikawa, HepaRG, and HepG2 (GEO super series accession no.: GSE69851). The 34 chemicals were grouped into predefined mode of action (MOA)–based chemical classes based on current literature.
Connectivity mapping was used to find linkages between each chemical and between chemical classes. Cell line–specific linkages were compared with each other, and to test whether the method was platform and user independent, a similar analysis was performed against publicly available data. The study showed that the method can group chemicals based on MOAs, and the inter–chemical class comparison pointed to connections between MOAs that were not predefined. Comparison to the publicly available data showed that the method is user and platform independent. The results provide an example of an alternate data analysis process for high-content data, beneficial for predictive toxicology, especially when grouping chemicals for read-across purposes.

Key words: connectivity mapping; toxicogenomics; 21st century tox.

© The Author 2016. Published by Oxford University Press on behalf of the Society of Toxicology. All rights reserved. For Permissions, please e-mail: [email protected]

Perturbation of a biological system via an external insult results in a characteristic gene expression profile. This expression profile is unique to the biological system and insult under consideration. Querying a representative signature of this profile against other such profiles using pattern-matching methods provides an opportunity to identify both known and unknown connections between perturbed biological systems. This concept, named “connectivity mapping” (CMap), was first described by Lamb et al. (2006) as a means to find connections between small molecules, diseases, and drugs. Following this landmark publication, the CMap concept has repeatedly been shown to be an effective tool to make such connections (Dudley et al., 2011; Hieronymus et al., 2006; Jahchan et al., 2013; Li et al., 2015; Vidovic et al., 2014; Zimmer et al., 2010). It is proposed that the field of toxicology could take this idea, used successfully in the pharmaceutical industry to make connections between small molecules, disease, and genes, and reapply it to a predictive toxicology paradigm where connections between chemicals, adverse events, and genes are sought.

The 2007 U.S. National Research Council report “Toxicity Testing in the 21st Century” (TT21C) called for a fundamental change in how we conduct toxicity testing (NRC, 2007). The report called for a paradigm shift from traditional toxicity testing methods based on high-dose animal studies to one based on in vitro methods, typically using human cells in a high-throughput fashion (Stephens et al., 2012). The intent was to shift toxicology testing from one based on apical outcomes in animals to one based on mechanistic understanding in humans (NRC, 2007). Since the publication of TT21C, various terminologies have emerged that build on the original concept of mode of action (MOA) (Boobis et al., 2006, 2008; Dellarco and Wiltse, 1998). These include the TT21C concept of toxicity pathway (NRC, 2007), the OECD-driven concept of adverse outcome pathway (AOP) (Ankley et al., 2010; OECD, 2012; Vinken, 2013), and the ALTEX-driven concept of pathway of toxicity (Hartung, 2010). A common denominator among all of these definitions is the molecular initiating event (MIE), reflecting the desire to identify an event that can be detected in vitro and used to map a response to an AOP or MOA with the ultimate goal of predicting toxicity. In the case of an AOP, an MIE is defined as “the first anchor of an AOP and refers to the interaction of a chemical with a biological system at the molecular level, such as ligand–receptor interactions or binding to proteins and nucleic acids” (Vinken, 2013). It is well known that such interactions lead to gene expression changes (Nuwaysir et al., 1999); quantifying these gene expression changes immediately following the MIE could therefore be diagnostic of the MIE at play.
Connectivity mapping provides a well-defined systematic process to attempt this task. When the relationship between the CMap signature and an MIE is known, a gene signature–based MIE could be searched against other gene expression–based MIEs stored in a database. Because MIEs are inherently associated with an MOA/AOP, this provides a means to group chemicals based on MOA/AOP. The concept provides a practical option for a high-throughput, low-cost, non-animal method in predictive toxicology, underlined by mechanistic understanding of human-relevant toxicology and in line with TT21C.

The objective of the present study was to evaluate the possibility of using CMap in predictive toxicology to identify connections between the biological signatures of 34 chemicals tested on 4 different cell lines (34 × 4) and in the process provide specifics on how CMap may be used as an alternate method to assess high-content data. All chemicals in the predefined MOA-based chemical classes containing at least 2 chemicals grouped together in at least 1 cell line using this method. Interclass connections were also observed among some chemical classes. By using 4 cell lines, we were able to show that certain MOAs are unique to certain tissue types. Comparison of representative data from our study to the publicly available CMap database (Lamb et al., 2006) (http://www.broadinstitute.org/cmap/) showed that the current method is user and platform independent. Overall, we believe that CMap provides a practical, actionable, non-animal method for finding similarities between chemicals and potential MOAs. The concept has broad applicability, especially in supporting grouping and read-across and in informing the MOA of a new chemical (ECHA, 2015; Wu et al., 2010).

MATERIALS AND METHODS

Chemicals and Reagents
Bisphenol A (99.0%; catalog no. 239658), genistein (99.0%; catalog no. G6649), ethinyl-estradiol (catalog no. 46263), tamoxifen (catalog no. T5648), clofibrate (catalog no. C6643), dehydroepiandrosterone (95%; catalog no. 709549), troglitazone (catalog no. T2573), diethylhexyl phthalate (catalog no. 36735), flutamide (catalog no. F9397), trenbolone (catalog no. T3925), phenobarbital (catalog no. P1636), retinoic acid (catalog no. R2625), thyroxine (catalog no. T2376), 1α,25-dihydroxyvitamin D3 (catalog no. D1530), clobetasol (catalog no. C8037), farnesol (catalog no. F203), chenodeoxycholic acid (catalog no. C9377), progesterone (catalog no. P0130), RU486 (catalog no. M8046), ketoconazole (catalog no. K1003), valproic acid (catalog no. P4543), desferrioxamine (catalog no. D9533), amoxicillin (catalog no. A8523), 6-aminonicotinamide (catalog no. A68203), metformin (catalog no. D150959), phenformin (catalog no. P7045), methotrexate (catalog no. A6770), vinblastine (catalog no. V1377), ANIT (1-naphthyl isothiocyanate) (catalog no. N4525), griseofulvin (catalog no. G4753), nicotine (catalog no. N3876), and imidacloprid (catalog no. 37894) were all purchased from Sigma-Aldrich (St Louis, Missouri); vorinostat (catalog no. 10009929) was purchased from Cayman Chemicals (Ann Arbor, Michigan); and 2,3,7,8-tetrachloro-dibenzo-p-dioxin was custom ordered from Accustandard (New Haven, Connecticut).

Concentration and Time Point Selection
Concentration selection for each chemical treatment was based on either values used in the study by Lamb et al. (2006) (http://www.broadinstitute.org/cmap/) or other existing literature (Table 1). Although RNA was isolated at 6-, 24-, and 48-hour time points, the 6-hour time point was chosen for gene array experiments in order to obtain a signature most likely representing the direct mechanism (MIE). Other studies have shown the importance of using early response genes in predictive toxicology (Zhang et al., 2014). This time point was also used by Lamb et al. (2006) in their original study.
Cell Culture

MCF7 cells
MCF7 (human breast adenocarcinoma) cells were purchased from American Type Culture Collection (ATCC, Manassas, Virginia) and grown in phenol red–free DMEM (Invitrogen, Carlsbad, California) supplemented with 10% fetal bovine serum (FBS) (Invitrogen), 100-U/ml penicillin, and 100-µg/ml streptomycin and maintained at 37°C in an atmosphere of 5% CO2.

Ishikawa cells
Ishikawa cells (human endometrial adenocarcinoma) were grown per the method described by Naciff et al. (2010). In brief, cells were routinely maintained in DMEM/F12 medium (Invitrogen) supplemented with 10% fetal calf serum (Hyclone, Logan, Utah), 100-U/ml penicillin-G, 100-mg/ml streptomycin, and 0.25-mg/ml amphotericin B (Invitrogen) and maintained at 37°C in an atmosphere of 5% CO2.

HepaRG cells
HepaRG (human hepatoma) cells were purchased from Biopredic International (Rennes, France). Differentiated HepaRG cells were grown following the supplier’s protocol (Biopredic) in Williams E medium (Invitrogen), supplemented with 2 mM Glutamax (Invitrogen) and 10% ADD670 supplement (Biopredic International), and maintained at 37°C in an atmosphere of 5% CO2.
HepG2 cells
HepG2 (human hepatocellular carcinoma) cells were purchased from ATCC and grown in Eagle’s Minimum Essential Medium containing phenol red (ATCC) supplemented with 10% FBS (ATCC), 100-U/ml penicillin, and 100-µg/ml streptomycin (ATCC) and maintained at 37°C in an atmosphere of 5% CO2.

TABLE 1. Chemicals, chemical classes, vehicle and doses used in study

| No. | Chemical | Chemical Class | Vehicle | Dose (D3, D2, and D1) |
|-----|----------|----------------|---------|-----------------------|
| 1 | Bisphenol A | Estrogens, environmental estrogens | DMSO | 1, 10, 100 µM |
| 2 | Genistein | Estrogens, environmental estrogens | DMSO | 1, 10, 100 µM |
| 3 | Ethinyl-estradiol | Estrogens, environmental estrogens | DMSO | 10 nM, 100 nM, 1 µM |
| 4 | Trenbolone | Androgen | DMSO | 100 nM, 1 µM, 10 µM |
| 5 | Dehydroepiandrosterone | Androgen | DMSO | 1, 10, 100 nM |
| 6 | ANIT (1-naphthyl isothiocyanate) | Liver cholestasis inducers | DMSO | 1, 10, 100 µM |
| 7 | Griseofulvin | Liver cholestasis inducers | DMSO | 100 nM, 1 µM, 10 µM |
| 8 | Nicotine | Nicotinic acetylcholine receptor agonist | DMSO | 1, 10, 100 µM |
| 9 | Imidacloprid | Nicotinic acetylcholine receptor agonist | DMSO | 10 µM, 100 µM, 1 mM |
| 10 | Metformin | Oxidative phosphorylation/mitochondrial inhibitors | H2O | 100 nM, 1 µM, 10 µM |
| 11 | Phenformin | Oxidative phosphorylation/mitochondrial inhibitors | DMSO | 1, 10, 100 µM |
| 12 | Clofibrate | PPAR agonist | DMSO | 10 µM, 100 µM, 1 mM |
| 13 | Diethylhexyl phthalate (DHP) | PPAR agonist | DMSO | 1, 10, 100 µM |
| 14 | Phenobarbital | CAR/PXR agonist | DMSO | 1, 10, 100 µM |
| 15 | Troglitazone | CAR/PXR agonist | DMSO | 1, 10, 100 µM |
| 16 | Farnesol | FXR receptor agonist | DMSO | 1, 10, 100 µM |
| 17 | Chenodeoxycholic acid | FXR receptor agonist | DMSO | 1, 10, 100 µM |
| 18 | Valproic acid | HDAC inhibitors | H2O | 10 µM, 100 µM, 1 mM |
| 19 | Vorinostat | HDAC inhibitors | DMSO | 1, 10, 100 µM |
| 20 | Retinoic acid | RAR agonist | DMSO | 10 nM, 100 nM, 1 µM |
| 21 | Thyroxine | TR agonist | DMSO | 10 nM, 100 nM, 1 µM |
| 22 | 2,3,7,8-Tetrachloro-dibenzo-p-dioxin | AhR agonist | DMSO | 1, 10, 100 nM |
| 23 | 1α,25-Dihydroxyvitamin D3 | Vitamin D agonist | DMSO | 1, 10, 100 nM |
| 24 | Clobetasol | Glucocorticoid receptor agonist | DMSO | 1, 10, 100 µM |
| 25 | Progesterone | Progesterone receptor agonist | DMSO | 1, 10, 100 µM |
| 26 | RU486 | Progesterone receptor antagonist | DMSO | 1, 10, 100 nM |
| 27 | Ketoconazole | Steroid synthesis inhibitor | Methanol | 1, 10, 100 µM |
| 28 | Desferrioxamine | Iron chelator | DMSO | 1, 10, 100 µM |
| 29 | Amoxicillin | Idiosyncratic liver injury | NaOH | 10 µM, 100 µM, 1 mM |
| 30 | 6-Aminonicotinamide | Glycolytic inhibitor | DMSO | 10 µM, 100 µM, 1 mM |
| 31 | Methotrexate | Folate/1-carbon metabolism inhibitor | DMSO | 1, 10, 100 µM |
| 32 | Vinblastine | Microtubule inhibitor | DMSO | 10 nM, 100 nM, 1 µM |
| 33 | Tamoxifen | Antiestrogen | DMSO | 100 nM, 1 µM, 10 µM |
| 34 | Flutamide | Antiandrogen | DMSO | 1, 10, 100 µM |

References cited in Table 1 for dose selection: http://www.broadinstitute.org/cmap/; Skandrani et al. (2006); Kang and Lee (2005); Olsen et al. (2004); Wang et al. (2007); Tang et al. (2004); Hall et al. (2010); Kido et al. (2003); Chalbos et al. (1991); An et al. (1998); Li et al. (2007); Muniyappa et al. (2009); Vinggaard et al. (1999); Recchia et al. (2006); Vivacqua et al. (2003); Rao et al. (2011); Maggiolini et al. (1999); Blankvoort et al. (2001); Thome-Kromer et al. (2003); Rathinasamy et al. (2010); unpublished data (a).

a Range finding studies evaluating gene expression.

Abbreviations: AhR, Aryl-hydrocarbon Receptor; CAR, Constitutive Androstane Receptor; DMSO, Dimethyl sulfoxide; FXR, Farnesoid X Receptor; HDAC, Histone deacetylase; PPAR, Peroxisome Proliferator-Activated Receptor; PXR, Pregnane X Receptor; RAR, Retinoic Acid Receptor; TR, Thyroid Hormone Receptor.
Gene Expression Microarray Measurements
MCF7, Ishikawa, and HepG2 cells (5 × 10^5 cells/ml, 1 ml/well of a 12-well cell culture plate [Corning, Corning, New York]) and HepaRG cells (per manufacturer’s protocol [Biopredic International] in collagen-coated 24-well plates [BD, Franklin Lakes, New Jersey]) were treated with 3 concentrations of chemical (Table 1) or vehicle (Table 1) for 6, 24, or 48 hours. Total RNA was isolated using buffer RLT (Qiagen, Valencia, California) and Agencourt RNAdvance Tissue XP beads (Beckman Coulter Inc, Danvers, Massachusetts) according to the manufacturer’s protocol (Beckman Coulter Inc), and RNA integrity was validated using a NanoDrop 8000 Spectrophotometer (Wilmington, Delaware). Labeled cRNA was synthesized from 500 ng of total RNA using the Affymetrix (Santa Clara, California) IVT-Express labeling kit according to the manufacturer’s instructions. Seven and a half micrograms of labeled cRNA was fragmented manually and hybridized to Affymetrix Human Genome U219-96 arrays for 16 hours, washed, stained, and scanned using an Affymetrix GeneTitan (Santa Clara, California). The gene expression studies were performed in triplicate (n = 3), with the cells for each replicate treated and harvested on separate days.

Gene Expression Microarray Statistical Analysis
The Affymetrix Human Genome U219-96 arrays used in this study have more than 49,000 probesets analyzing over 36,000 transcripts and variants, which represent more than 20,000 well-substantiated human genes. The complete gene expression data have been deposited in the National Center for Biotechnology Information Gene Expression Omnibus (GEO) (super series accession no.: GSE69851). The Affymetrix U219 GeneTitan measurements were preprocessed using the standard Robust Multiarray Averaging (RMA) method (Gautier et al., 2004; Irizarry et al., 2003).
Of the 1416 samples (including controls) profiled, 44 samples were excluded for quality-control issues using a standard unsupervised clustering of samples based on their full expression profile (Krijthe, 2015; Rtsne: T-Distributed Stochastic Neighbor Embedding using Barnes-Hut Implementation, R package version 0.10, http://CRAN.R-project.org/package=Rtsne).

CMap Analysis
The primary goal of the current CMap analysis was to validate and discover connections between the 34 chemicals under study using pattern matching of their expression profiles. After RMA preprocessing, the log2 gene expression data were used to calculate the fold-change of each perturbation sample (or instance, in CMap parlance) with respect to the average of the control instances in the corresponding batch. The fold-change matrix of all perturbation instances was used to produce the CMap rank matrix using the standard method described in the original CMap paper (Lamb et al., 2006). This rank matrix represented the database against which all signatures were scored.

To generate signatures for each chemical (independently for each concentration or time point), a 2-sample t test, paired for instances tested in the same batch, was run using the limma software (Smyth, 2005; Wettenhall et al., 2008). A 5% false-discovery rate (FDR) cut-off was used to generate up (positive fold-change) and down (negative fold-change) signatures. Because the cut-off used was quite stringent, the following allowance was made for chemicals that did not produce usable signatures: for chemicals that did not have as many as 250 probesets significant at this FDR level for either direction, a corresponding standard signature of the top 250 probesets by t statistic was used instead. Each signature was scored against the rank matrix generated earlier (Supplementary Table 2). Heatmaps were created using CMap scores, with positive connections in red and negative connections in blue.
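The ranking, signature selection, and scoring steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the study: it assumes the t statistics and FDR-adjusted q values have already been computed (the study used limma), it applies the 250-probeset fallback separately to each direction (the text leaves this open), and it uses the Kolmogorov–Smirnov-style connectivity score of Lamb et al. (2006) without the final normalization across instances, so that scores fall between −2 and +2. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def rank_probesets(fold_changes: Sequence[float]) -> List[int]:
    """Rank probesets of one instance: 1 = most up-regulated, n = most down-regulated."""
    order = sorted(range(len(fold_changes)), key=lambda i: fold_changes[i], reverse=True)
    ranks = [0] * len(fold_changes)
    for rank, probe in enumerate(order, start=1):
        ranks[probe] = rank
    return ranks


def pick_signature(t_stats: Sequence[float], q_values: Sequence[float],
                   fdr: float = 0.05, min_size: int = 250) -> Tuple[List[int], List[int]]:
    """Return (up, down) probeset indices passing the FDR cut-off.

    Assumption: if a direction has fewer than `min_size` significant probesets,
    it is replaced by the top `min_size` probesets by t statistic in that direction.
    """
    by_t = sorted(range(len(t_stats)), key=lambda i: t_stats[i])  # most negative first
    up = [i for i in reversed(by_t) if q_values[i] <= fdr and t_stats[i] > 0]
    down = [i for i in by_t if q_values[i] <= fdr and t_stats[i] < 0]
    if len(up) < min_size:
        up = list(reversed(by_t))[:min_size]
    if len(down) < min_size:
        down = by_t[:min_size]
    return up, down


def ks_score(tag_ranks: Sequence[int], n: int) -> float:
    """Signed Kolmogorov-Smirnov-style enrichment of a tag list within n ranked probesets."""
    t = len(tag_ranks)
    if t == 0:
        return 0.0
    v = sorted(tag_ranks)
    a = max((j + 1) / t - v[j] / n for j in range(t))
    b = max(v[j] / n - j / t for j in range(t))
    return a if a > b else -b


def connectivity_score(up: Sequence[int], down: Sequence[int], ranks: Sequence[int]) -> float:
    """Unnormalized connectivity score of an up/down signature against one ranked instance."""
    n = len(ranks)
    ks_up = ks_score([ranks[i] for i in up], n)
    ks_down = ks_score([ranks[i] for i in down], n)
    if ks_up * ks_down > 0:  # same sign: no coherent connection
        return 0.0
    return ks_up - ks_down
```

A signature whose up probesets sit near the top of an instance's ranking and whose down probesets sit near the bottom scores close to +2; the reversed arrangement scores close to −2.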
Chemicals were grouped by chemical class, and the order of chemicals/chemical classes was positioned manually so as to preserve the same order in both horizontal and vertical directions (i.e., the order of chemicals on the x-axis from left to right and the order of chemicals on the y-axis from top to bottom were set to be the same). Within a chemical, the 3 concentrations (D1, D2, and D3) were also positioned in the same order for both the x- and y-axes (for the x-axis, D1–D3 was set from left to right; for the y-axis, D1–D3 was set from top to bottom for each chemical). This resulted in a final plot where the order of the chemical/concentration combinations on the x-axis from left to right and on the y-axis from top to bottom was exactly the same. Within the plot, an area within 2 consecutive black lines represented the CMap scores for a given chemical. Within each 2 consecutive black lines, the 3 columns (x-axis) or 3 rows (y-axis) represented the 3 concentrations D1, D2, and D3 from left to right or top to bottom. Chemicals are denoted by letters on the x-axis and numbers on the y-axis for easy reference (Supplementary Table 2; Figs. 1–4). Each black box within each of the heatmaps represents a grouping of 9 CMap scores (9 cells: magnified in Figure 1) for the corresponding concentrations of the given chemicals (e.g., A1 includes the CMap scores for AD1–1D1, AD2–1D1, AD3–1D1, AD1–1D2, AD2–1D2, AD3–1D2, AD1–1D3, AD2–1D3, and AD3–1D3) (Figs. 1–4). Each of the cells within a black box represents the average CMap score of a chemical-/concentration-specific signature read against each of the other chemical/concentrations (read along the x-axis and compared with each chemical/concentration on the y-axis).
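The labeling scheme for the heatmap cells can be made concrete with a short Python sketch. It assumes spreadsheet-style letters for the x-axis (A to Z, then AA onward, consistent with labels such as AA27 used in the Results) and writes cell labels with an ASCII hyphen; the helper names are hypothetical.

```python
from typing import List

CONCENTRATIONS = ("D1", "D2", "D3")  # D1 = highest, D3 = lowest concentration


def column_letter(chemical_number: int) -> str:
    """Letter label for a 1-based chemical number: 1 -> A, 26 -> Z, 27 -> AA."""
    letters = ""
    n = chemical_number
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def axis_index(chemical_number: int, concentration: str) -> int:
    """0-based position of a chemical/concentration on either axis (same order on both)."""
    return (chemical_number - 1) * len(CONCENTRATIONS) + CONCENTRATIONS.index(concentration)


def block_cells(x_chemical: int, y_chemical: int) -> List[str]:
    """Labels of the 9 cells in the black box for one x-axis and one y-axis chemical."""
    x = column_letter(x_chemical)
    return [f"{x}{cx}-{y_chemical}{cy}" for cy in CONCENTRATIONS for cx in CONCENTRATIONS]
```

With 34 chemicals at 3 concentrations each, both axes carry 102 positions, and `block_cells(1, 1)` enumerates the cells of block A1 in the order listed in the text.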
The color of the cell lies between red (similar) and blue (dissimilar); the intensity of the color represents how close the value is to +2 (maximum possible [similar] CMap score) or −2 (minimum possible [dissimilar] CMap score) and is also represented as a color key in each figure (Figs. 1–4). The diagonal of more intensely colored red cells represents “self-connectivity,” or how a signature relates to its own average profile. The values of these cells should ideally be close to 2. They fall below 2 (a less intense red) because the signature (y-axis) is picked using a t test, whereas the scores of the replicates (x-axis) are averaged.

FIG. 1. Heatmap of average connectivity mapping (CMap) scores for all concentrations in MCF7 cells for 34 chemicals. Signatures were created for each chemical concentration combination of the 34 chemicals using a 2-sample t test and a 5% false discovery rate. Signatures were compared with the CMap rank matrix, and average CMap scores were obtained as explained in the Materials and Methods section. Each CMap value was color coded based on a color key (top left corner: red: positive connection, blue: negative connection) and represented as a cell in the heatmap. Color-coded cells were grouped based on chemical class (chemical class key on bottom right) and ordered by concentration. Finally, chemical classes were graphed against each other so as to result in the same order of chemical classes for both x- and y-axes. Diagonal intensely pigmented line of cells represents self-connectivity. Black boxes represent CMap scores for all 3 concentrations against all 3 concentrations of another chemical (each black box contains 9 cells corresponding to the 9 different chemical/concentration combinations). The A1 block is magnified to show detail of the 9 cells.

Comparison of CMap Scores From Selected Chemicals to MIT’s Public CMap Database
Next, we wanted to further validate the connections between the 34 chemicals used in this study and to understand the robustness of the method when using publicly available data sets. Signatures of chemicals common to our study and the study of Lamb et al. (2006) were used to query the original CMap database, and the resulting scores were analyzed for consistency.

Only 17 of the 34 chemicals used in this study were available in the original CMap database (http://www.broadinstitute.org/cmap/). Signatures generated for the overlapping chemicals (amoxicillin, chenodeoxycholic acid, clobetasol, clofibrate, flutamide, genistein, griseofulvin, ketoconazole, metformin, methotrexate, phenformin, progesterone, tamoxifen, troglitazone, valproic acid, vinblastine, and vorinostat) were used to query the MIT CMap database (restricted to 3095 chemical instances tested on the MCF7 cell line). Because we used a different Affymetrix platform than the original paper of Lamb et al. (2006) (U219 vs U133A2), the probesets from the U219 platform used in the current study were filtered to match only those present on the older U133A2 platform used by the MIT database, so as to make the 2 platforms comparable. Therefore, the resulting up/down signatures differed slightly from those used in the original analysis described above. The resulting scores and ranking of chemicals in the MIT database were analyzed for consistency by looking for chemicals in the MIT database that matched the query chemical. Barview plots were created for each of the chemicals, where the “barview” was constructed from 3095 horizontal lines, each representing a chemical instance tested on the MCF7 cell line, ordered by connectivity score. Instances corresponding to treatment with the chemical of interest were denoted by black lines. All other instances were depicted based on their connectivity score: green, positive; gray, null; and red, negative (Figure 6 and Supplementary Figure 1).
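The cross-platform filtering and the barview ordering can be sketched as follows. This is an illustrative Python sketch only: it assumes the probesets of both platforms have already been mapped to a shared identifier (how the matching was done is not stated in the text), treats a score of exactly zero as a null connection, and uses hypothetical function and variable names.

```python
from typing import Dict, Iterable, List, Set, Tuple


def restrict_to_shared(signature: Iterable[str], older_platform_ids: Set[str]) -> List[str]:
    """Keep only signature probesets that are also present on the older platform."""
    return [probe for probe in signature if probe in older_platform_ids]


def barview(scores: Dict[str, float], chemical_of: Dict[str, str],
            query_chemical: str) -> List[Tuple[str, str]]:
    """Order database instances by connectivity score (highest first) and color each line.

    Instances of the query chemical are black; all others are green (positive),
    gray (null), or red (negative).
    """
    rows = []
    for instance in sorted(scores, key=lambda name: scores[name], reverse=True):
        if chemical_of[instance] == query_chemical:
            color = "black"
        elif scores[instance] > 0:
            color = "green"
        elif scores[instance] < 0:
            color = "red"
        else:
            color = "gray"
        rows.append((instance, color))
    return rows
```

If the method is consistent across users and platforms, the black lines for the query chemical should cluster near the top of the ordered list.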
RESULTS

Chemical Selection for CMap Analysis
We selected 34 chemicals to represent a broad range of known toxicological MOAs. The 34 chemicals fell into 24 different chemical classes. A chemical class was defined as a group of chemicals that shared an MOA (as defined based on the literature). Of the 24 chemical classes, 9 contained at least 2 chemicals, whereas 15 chemical classes were represented by a single chemical (Table 1).

FIG. 2. Heatmap of average connectivity mapping (CMap) scores for all concentrations in Ishikawa cells for 34 chemicals. Signatures were created for each chemical concentration combination of the 34 chemicals using a 2-sample t test and a 5% false discovery rate. Signatures were compared with the CMap rank matrix, and average CMap scores were obtained as explained in the Materials and Methods section. Each CMap value was color coded based on a color key (top left corner: red: positive connection, blue: negative connection) and represented as a cell in the heatmap. Color-coded cells were grouped based on chemical class (chemical class key on bottom right) and ordered by concentration. Finally, chemical classes were graphed against each other so as to result in the same order of chemical classes for both x- and y-axes. Diagonal intensely pigmented line of cells represents self-connectivity. Black boxes represent CMap scores for all 3 concentrations against all 3 concentrations of another chemical (each black box contains 9 cells corresponding to the 9 different chemical/concentration combinations).

Identifying Intra–Chemical Class Positive Linkages for 34 Chemicals in 4 Different Cell Lines
Of the 34 chemicals used in the study, 19 fell into chemical classes that contained at least 2 chemicals. The chemicals and the chemical classes they belong to are defined based on what is known about the MOAs of the chemicals in each class (Table 1).
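The class counts quoted above follow directly from the class assignments in Table 1, as the short Python check below shows. The chemical-to-class mapping is transcribed from Table 1; the pairing of chemicals 12 to 15 with the PPAR and CAR/PXR classes is read from the row order of the table and should be treated as an assumption. The helper names are hypothetical.

```python
from collections import Counter
from typing import Dict, Tuple

EST = "Estrogens, environmental estrogens"
OXPHOS = "Oxidative phosphorylation/mitochondrial inhibitors"
NACHR = "Nicotinic acetylcholine receptor agonist"

# Chemical -> MOA-based class, transcribed from Table 1.
CLASS_OF: Dict[str, str] = {
    "Bisphenol A": EST, "Genistein": EST, "Ethinyl-estradiol": EST,
    "Trenbolone": "Androgen", "Dehydroepiandrosterone": "Androgen",
    "ANIT": "Liver cholestasis inducers", "Griseofulvin": "Liver cholestasis inducers",
    "Nicotine": NACHR, "Imidacloprid": NACHR,
    "Metformin": OXPHOS, "Phenformin": OXPHOS,
    "Clofibrate": "PPAR agonist", "Diethylhexyl phthalate": "PPAR agonist",  # assumed pairing
    "Phenobarbital": "CAR/PXR agonist", "Troglitazone": "CAR/PXR agonist",  # assumed pairing
    "Farnesol": "FXR receptor agonist", "Chenodeoxycholic acid": "FXR receptor agonist",
    "Valproic acid": "HDAC inhibitors", "Vorinostat": "HDAC inhibitors",
    "Retinoic acid": "RAR agonist",
    "Thyroxine": "TR agonist",
    "TCDD": "AhR agonist",
    "1a,25-Dihydroxyvitamin D3": "Vitamin D agonist",
    "Clobetasol": "Glucocorticoid receptor agonist",
    "Progesterone": "Progesterone receptor agonist",
    "RU486": "Progesterone receptor antagonist",
    "Ketoconazole": "Steroid synthesis inhibitor",
    "Desferrioxamine": "Iron chelator",
    "Amoxicillin": "Idiosyncratic liver injury",
    "6-Aminonicotinamide": "Glycolytic inhibitor",
    "Methotrexate": "Folate/1-carbon metabolism inhibitor",
    "Vinblastine": "Microtubule inhibitor",
    "Tamoxifen": "Antiestrogen",
    "Flutamide": "Antiandrogen",
}


def summarize(class_of: Dict[str, str]) -> Tuple[int, int, int, int, int]:
    """Return (chemicals, classes, classes with >= 2 chemicals, single-chemical classes,
    chemicals belonging to classes with >= 2 chemicals)."""
    sizes = Counter(class_of.values())
    multi = [c for c, n in sizes.items() if n >= 2]
    single = [c for c, n in sizes.items() if n == 1]
    return len(class_of), len(sizes), len(multi), len(single), sum(sizes[c] for c in multi)


def overlap_cells(n_chemicals: int, n_concentrations: int = 3) -> int:
    """Number of heatmap cells in the overlap area of one class with itself."""
    return (n_chemicals * n_concentrations) ** 2
```

For the estrogen class (3 chemicals at 3 concentrations), `overlap_cells(3)` gives the 81 cells that are compared within that class in the heatmaps.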
Ideally, chemicals within the same class should behave in a similar manner (similar biological signature), and the CMap score should reflect this. To compare the CMap scores of a given class, all cells corresponding to the overlap of the x- and y-axes for a given chemical class were compared (e.g., for estrogen, the 81 cells AD1–1D1, AD2–1D1, AD3–1D1, etc., within the 9 boxes A1, B1, C1, A2, B2, C2, A3, B3, and C3 were compared with each other). In general, chemicals within the same class showed positive linkages (cells within overlap areas were predominantly red): for example, the estrogen/environmental estrogen class of chemicals described by the 81 cells above was predominantly red in Figures 1–4. However, the strength of the positive linkage was different for each of the cells within the same overlap area for all 4 cell types (the color scale of cells within the overlap area indicated higher similarity scores and showed differences from nonoverlap areas; these differences were unique for each cell type). Furthermore, if the same cell (e.g., AD1–1D1) was compared among the 4 cell lines, the strength of the positive linkage (intensity of red color) was different. Both of these outcomes were expected.

FIG. 3. Heatmap of average connectivity mapping (CMap) scores for all concentrations in HepaRG cells for 34 chemicals. Signatures were created for each chemical concentration combination of the 34 chemicals using a 2-sample t test and a 5% false discovery rate. Signatures were compared with the CMap rank matrix, and average CMap scores were obtained as explained in the Materials and Methods section. Each CMap value was color coded based on a color key (top left corner: red: positive connection, blue: negative connection) and represented as a cell in the heatmap. Color-coded cells were grouped based on chemical class (chemical class key on bottom right) and ordered by concentration. Finally, chemical classes were graphed against each other so as to result in the same order of chemical classes for both x- and y-axes. Diagonal intensely pigmented line of cells represents self-connectivity. Black boxes represent CMap scores for all 3 concentrations against all 3 concentrations of another chemical (each black box contains 9 cells corresponding to the 9 different chemical/concentration combinations).

The differences in the intensity of each cell in the same overlap area can be attributed to the specificity and/or potency of ligands to the receptor/enzyme defined by the chemical class. This is further supported by the color scale being correlated with concentration for a given chemical in a specific cell line (e.g., the intensities of red color for AD1–1D1, AD2–1D1, and AD3–1D1 are different). The difference in the pattern of intensity of each cell within the overlap area among the different cell lines can be attributed to the “completeness” of the pathway of interest (the molecular pathway underlying the MOA). Certain cell types may lack, or have significantly fewer copies of, certain molecular components of the pathway of interest; consequently, the pathway will show a quantitatively lower response in such cell types when exposed to the same concentration of chemical. For example, it is well known that the Farnesoid X receptor (FXR) is predominantly expressed in the liver and kidney (Forman et al., 1995). In our study, the intensity of the self-connectivity for the FXR agonists farnesol and chenodeoxycholic acid (O16, P16, O17, and O18) is much more pronounced in the 2 hepatocyte cell lines HepG2 (Figure 4) and HepaRG (Figure 3) when compared with the other 2 non-hepatocyte cell lines (Figs. 1 and 2). This phenomenon can be further corroborated by observing the changes seen with increasing concentration.
For example, if components representing a certain pathway are low in abundance in a certain cell line, no response might be observed at the lower concentration (D3); however, at a higher concentration (D1), the expected response might be observed. At the lower concentration (D3), the number of ligand molecules may not be sufficient to interact with the limited number of receptor/enzyme copies to instigate a statistically significant downstream gene expression change or activate the pathway. However, at the higher concentration, the number of available ligand molecules may be sufficient to result in such an outcome (Figs. 1 and 5).

FIG. 4. Heatmap of average connectivity mapping (CMap) scores for all concentrations in HepG2 cells for 34 chemicals. Signatures were created for each chemical concentration combination of the 34 chemicals using a 2-sample t test and a 5% false discovery rate. Signatures were compared with the CMap rank matrix, and average CMap scores were obtained as explained in the Materials and Methods section. Each CMap value was color coded based on a color key (top left corner: red: positive connection, blue: negative connection) and represented as a cell in the heatmap. Color-coded cells were grouped based on chemical class (chemical class key on bottom right) and ordered by concentration. Finally, chemical classes were graphed against each other so as to result in the same order of chemical classes for both x- and y-axes. Diagonal intensely pigmented line of cells represents self-connectivity. Black boxes represent CMap scores for all 3 concentrations against all 3 concentrations of another chemical (each black box contains 9 cells corresponding to the 9 different chemical/concentration combinations).

Identifying Inter–Chemical Class Positive/Negative Linkages for 34 Chemicals in 4 Different Cell Lines
As described above, the known intra–chemical class positive linkages of CMap scores were expected.
Next we wanted to determine if novel inter-chemical class/positive and negative linkages could be identified via our analysis. In order to make this comparison, for each heatmap, the CMap score (cell-color) within the same chemical class was compared with the scores (cell colors) in the same row for the other 33 chemicals and 3 concentrations. Intensely colored red or blue cells on the same row were an indication of strong similarity or dissimilarity of the chemical identified by the row to the chemical identified by the column (read across rows and compared with each column). This is illustrated by cell type in Figures 1–4, discussed further below. As a first step in this analysis, CMap scores for agonists and antagonists for the same receptor were compared. The 34 chemicals included agonists and antagonists for 3 receptors: estrogen receptor (ER), androgen receptor, and progesterone. In general, the antagonists behaved in an opposing manner to agonists (blue/white cells vs red cell colors), and the negative linkage was most prominent at the highest concentration (D1). For example, the antiestrogen tamoxifen exhibited negative linkages for the ER agonists bisphenol A, genistein, and ethinylestradiol for the highest concentration D1 (Figure 5). However, the phenomena could not be considered a complete response (cells for antagonists were not completely blue when cells for agonist were red and vice versa) and was highly concentration and cell type specific (compare D1, D2, and D3 for the following within each figure [Figs. 1–4]: anti estrogen: tamoxifen vs estrogens: bisphenol A, genistein and ethinyl-estradiol, anti androgen: flutamide vs androgens: trenbolone, dehydroepiandrosterone, antiprogesterone: RU486 vs progesterone. Compare same agonist antagonist pairs between Figs. 1–4). DE ABREW ET AL. | 455 FIG. 5. Heatmap of average connectivity mapping (CMap) scores for D1 concentration in MCF7 cells for 34 chemicals. 
Signatures were created for the D1 concentration of MCF7 cells and a CMap score generated by comparing this signature to a CMap rank matrix of MCF7 D1 as explained in the Materials and Methods section. Each CMap value was color coded based on a color key (top left corner: red: positive connection, gray: null connection, blue: negative connection) and represented as a cell in the heatmap. Color-coded cells were grouped based on chemical class (chemical class key on bottom right). Finally, chemical classes were graphed against each other so as to result in the same order of chemical classes for both x- and y-axes. Diagonal intensely pigmented line of cells represents self-connectivity. Black boxes represent CMap scores for a given chemical class.

These observations could be explained by the nature of the antagonists. None of the antagonists picked for the study were complete antagonists of the receptor; as such, their responses were not expected to be completely opposite to the agonist response. The inconsistent agonist/antagonist behavior among the 4 different cell types was expected and can be attributed to how well the respective pathways were represented in each of the cell types (completeness of pathway). The HepG2 profile (Figure 4) exhibited 5 distinct patches of red (area 1: P16, Q16, P17, and Q17; area 2: W16, X16, Y16, Z16, W17, X17, Y17, Z17, AA16, and AA17; area 3: R18, S18, R19, and S19; area 4: P23, Q23, P24, Q24, P25, Q25, P26, Q26, P27, and Q27; area 5: W23, X23, Y23, Z23, AA23, W24, X24, Y24, Z24, AA24, W25, X25, Y25, Z25, AA25, W26, X26, Y26, Z26, AA26, W27, X27, Y27, Z27, and AA27). When analyzed, these patches of red pointed to inter–chemical class positive linkages in the HepG2 cell line. The chemical class FXR receptor agonist (area 1) showed positive linkages with vitamin D agonist, glucocorticoid receptor agonist, progesterone receptor agonist, progesterone receptor antagonist, and steroid synthesis inhibitors (area 2).
Similarly, vitamin D agonist, glucocorticoid receptor agonist, progesterone receptor agonist, progesterone receptor antagonist, and steroid synthesis inhibitors (area 4) showed positive linkages with each other (area 5). The same patterns (i.e., positive linkages), but at a lesser intensity, were observed for the HepaRG cell line (Figure 3) but not for the MCF7 and Ishikawa cell lines (Figs. 1 and 2). These comparisons provided another example of cell type (hepatocyte cell line)–specific inter–chemical class positive linkages.

Enriching CMap Technology by Assessing CMap Profiles in Multiple Cell Lines

Per TT21C, a toxicity pathway is defined as a normal cellular response pathway that is expected to result in an adverse health effect when sufficiently perturbed (NRC, 2007). In order to ensure that all possible adverse outcomes of a chemical have been accounted for, one would need to assess the effects of a chemical on every possible cellular pathway. Although it is not known how many such cellular pathways actually exist (Lamb et al., 2006), an approach that has been implemented by many, including Lamb et al. (2006), is to look at gene signatures in multiple cell types. The intent is to achieve maximum coverage of the genome by the overlap of the differential gene expression profiles of different cell lines. In an attempt to follow a similar process, we picked 4 different cell lines for our study and retained the same experimental and data analysis procedures across all 4 of them. The overall heatmaps for each of the cell lines were unique (Figs. 1–4). The heatmap for MCF7 (Figure 1) exhibited predominantly positive linkages (red color), illustrating that this cell line responded to most of the 34 chemicals (ligands) quite well. Although the MCF7 cell line showed the most sensitivity, this resulted in reduced specificity.
However, if the concentration was teased out and the highest concentration (D1) was plotted for the MCF7 cell line (Figure 5), a much clearer picture of the linkages could be observed. This showed the significance of dose in making the right connections when using this method. In contrast, the other 3 cell lines (Figs. 2–4) showed high specificity and lower sensitivity (visualized as patches of red and blue vs an overall red or blue heatmap). Within each of the heatmaps, distinct rows that showed no linkages (white/light red or white/light blue colored rows) could be identified. For each cell line, exposure to completely different chemicals resulted in a “no-linkage” response and was mostly observed at the highest exposure concentration tested (D1) (areas of white/light red or white/light blue colored rows were different for Figs. 1–4. This difference was more pronounced when the D1 row was compared between Figs. 1–4). The uniqueness of the no-linkage chemicals for each cell line was an indication that the cellular pathways were not acting in a similar manner among the 4 cell types. One possible interpretation is that the prominence of this phenomenon at the highest concentration (D1) reflects the activation of nonspecific pathways (e.g., cytotoxicity pathways) at the higher concentration. In either case, the 4 cell lines behaved dissimilarly, demonstrating the need to do these studies in different cell lines in parallel to capture the full depth of the biology/toxicology that is taking place.

Comparison of CMap Scores of Representative Chemicals From MCF7 Line Against Broad Institute CMap Database

In their original paper, Lamb et al. (2006) showed that connections between drugs, diseases, and genes can be established independent of gene array platform, cell type, and concentration. We were interested in exploring if this concept held true for our study.
The 17 of the 34 chemicals that were available in the CMap database were queried against the CMap data and presented as barview plots (Figure 6 and Supplementary Figure 1). Although some variability was observed in terms of the connections, in general, more positive connections (as opposed to null and negative) were observed between our signature and the CMap database for all 17 chemicals. In general, the number of positive connections seemed to increase with increasing concentration (black lines move into green area with increasing concentration). Of the 4 cell lines used in our study, only MCF7 was used by the CMap group; hence, the comparison only used this cell line.

DISCUSSION

Since the publication of the seminal report, TT21C by the NRC (2007), the focus of toxicology research has shifted from one where chemicals are defined based on the diseases and health effects they trigger to one where chemicals are defined based on the biology that is driving the underlying toxicity (Kavlock et al., 2012). This is realized by defining MOAs (toxicity pathways) that lead to understanding the chemical biological interactions that are taking place upstream of the apical events that are observed (Kavlock et al., 2012). Through programs such as ToxCast and Tox21, government agencies such as the U.S. EPA and the U.S. FDA have already embraced this idea (Collins et al., 2008; Dix et al., 2007; Kavlock et al., 2012; Tice et al., 2013). These initiatives are defining methods to identify MOAs (or responses that may eventually be used to inform MOAs) of chemicals using high-throughput screening (HTS) approaches.
ToxCast and Tox21 incorporate a myriad of in vitro assays that range from cell-free biochemical assays to complex cell culture systems to small model organisms (Kavlock et al., 2012; Knudsen et al., 2013) to measure a range of endpoints from protein–protein binding to high-content cell imaging to multiplexed transcription factor reporter assays, resulting in excess of 650 different assay readouts for a given chemical (Judson et al., 2010; Knudsen et al., 2011) (http://www.epa.gov/ncct/Tox21/). Following the modeling of concentration-response data for each of these assay readouts, AC50 values (concentration causing half maximal response) are calculated from curve fits to Hill equations (Tice et al., 2013). Due to the direct and indirect measurements associated with the different assays used in ToxCast/Tox21, the AC50 values of any given tested compound could have a range spanning 4 orders of magnitude (Wetmore et al., 2012), ultimately affecting interpretation of biological specificity (point of departure [POD]). Methods to distinguish between AC50 values that are directly related to a biological pathway that contributes to an ultimate adverse response versus ones that are not are unavailable to date. In our current study, we attempt to assess the MOA of chemicals using a high-content (gene expression) approach. Although both high-throughput (in vitro assays) and high-content (gene expression) methods attempt to address the same question of identifying an MOA for a given chemical biological interaction, we believe the CMap approach outlined in this paper can help address a fundamental issue related to the large range associated with the Tox21/ToxCast AC50 data sets: namely, ruling in/out of the appropriate “assay set” for a particular chemical. Connectivity mapping could be used either to tailor the relevant in vitro assays for a particular chemical or as a tool to weed out outlier AC50 data sets during data analysis.
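To make the AC50 discussion above concrete, the following minimal sketch evaluates a Hill equation at its AC50 and expresses the spread of a set of AC50 values in log10 units. It is an illustration only, not the ToxCast/Tox21 curve-fitting pipeline: the function names, the parameter values, and the four AC50 values are hypothetical and are not taken from any data set.

```python
import math


def hill_response(conc, top, ac50, hill_slope):
    """Hill-equation response at concentration `conc` (same units as `ac50`)."""
    return top * conc**hill_slope / (ac50**hill_slope + conc**hill_slope)


def log10_span(ac50_values):
    """Spread of a set of AC50 values, expressed in log10 units."""
    return math.log10(max(ac50_values)) - math.log10(min(ac50_values))


# Hypothetical AC50 values (micromolar) for one chemical across four assays.
example_ac50s = [0.01, 0.5, 12.0, 100.0]

# By definition, the response at the AC50 is half of the maximal response.
half_max = hill_response(conc=0.5, top=100.0, ac50=0.5, hill_slope=1.5)
span = log10_span(example_ac50s)

print(f"response at AC50: {half_max:.1f} (top = 100.0)")
print(f"AC50 spread across assays: {span:.1f} log10 units")
```

With these illustrative numbers the AC50 values span 4 log10 units, which is the size of range that makes it hard to decide which assay readout should anchor a point of departure.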
We predefined MOA-related chemical classes for the 34 chemicals used in this study. For some of the classes, there is a close connection with a known MIE (e.g., receptor binding), whereas others are defined at a higher level and a specific MIE is less clear due to a number of reasons, including a lack of detailed knowledge on the pathways and the nonspecificity of the pathways involved (e.g., idiosyncratic liver injury or oxidative phosphorylation/mitochondrial inhibitors). We believe that chemical-mediated pathway perturbations (toxicological MOAs) can be grouped into 2 main areas based on the selectivity of the biological action triggered by the chemical: (1) pathway perturbations mediated via nonselective binding to intracellular molecules (mediated via strong interactions [covalent binding] between chemicals [ligands] and macromolecules [DNA, protein, lipids, etc.]); and (2) pathway perturbations mediated via selective agonism/antagonism (mediated through interactions of chemicals [ligands] and biological molecules [receptors, enzymes, etc.]) (Daston et al., 2015). Although many robust in vitro methods to assess the toxicity of chemicals within the “nonselective binding” group exist to date (Ames et al., 1973; Gerberick et al., 2004; OECD, 2010, 2014), in vitro methods for assessing the toxicity of the “selective agonism/antagonism” group have not been fully realized. Weak interactions cover a large range of specific targets, such that it has not been possible to assemble a set of in vitro assays to quantify these interactions reliably. The 34 chemicals used in this study were chosen with a bias toward the latter group, due to both the imminent need to address this category of chemicals and our belief that CMap provides an ideal tool to attempt this task.

FIG. 6. Barview of connectivity mapping (CMap) scores for selected chemicals when compared with data from CMap database. Signatures were created for all 3 concentrations of genistein, tamoxifen, metformin, phenformin, valproic acid, vorinostat, methotrexate, and troglitazone and a CMap score generated by comparing this signature to a CMap rank matrix created using the externally available CMap data (http://www.broadinstitute.org/cmap/) as explained in the Materials and Methods section. The barview is constructed from 3095 horizontal lines representing chemical instances tested on the MCF7 cell line, each ordered by their connectivity score. Instances corresponding to treatment with the same chemical as the signature are denoted by black lines. The number of instances is listed in parentheses next to the title. The colors applied to the remaining instances reflect the sign of the score: green, gray, and red represent positive, null, and negative connections, respectively.

For the purposes of this manuscript, cell line–specific CMap scores for all concentrations are depicted as a single heatmap (Figs. 1–4). This was done to provide the best visual representation for a manuscript format. However, this visual can be somewhat confusing: the high density of the data may lead to a lack of clarity when trying to pick the most relevant concentration to find connections between chemicals. Although noted as a drawback when formatting for a manuscript, we do not expect this aspect to be a significant setback in day-to-day practice of the method. In practice, all CMap scores would be tabulated, sorted in ascending order, and the most relevant concentration corresponding to the highest positive/negative CMap score picked, avoiding the need to rely on a visual representation such as a heatmap. In traditional animal-based toxicity testing, a chemical usually produces adverse effects in multiple hazard domains (developmental, neurological, reproductive, etc.).
Among these adverse events, the effect that occurs at the lowest dose and is biologically and/or statistically significant is considered the critical effect. Based on this critical effect, a POD (e.g., NOAEL, LOAEL, BMD, etc.) is identified or modeled and represents the point at which a low-dose extrapolation to a health reference value begins (Faustman and Omenn, 2001). Various case-specific uncertainty factors are used to calculate a reference dose (RfD) from this POD. This RfD is an estimate of daily exposure to the chemical that is assumed to be without adverse health impact on the human population (Faustman and Omenn, 2001). If the TT21C paradigm were applied to the above quantitative risk assessment (QRA) process, each of the adverse events in the different hazard domains could be viewed as manifestations of perturbations in one or more biological pathways (MOAs). The critical effect could potentially be defined as the MOA that is perturbed at the lowest concentration. The key here is to clearly understand whether a perturbation is linked to an AOP, given that cellular perturbations may reflect adaptive responses that would not manifest as toxicity. In order to replace the historical animal model-based QRA methods with MOA-based methods, it is imperative that all pathways are covered so as to not miss out on the pathway that might be primarily responsible for the critical effect. Although no one knows how many such MOAs/pathways exist in humans (Lamb et al., 2006), one way to assure coverage is by using multiple cell lines, each potentially containing the relevant machinery for the activation of a finite number of pathways, and the overlap providing near-complete coverage. Although theoretical, this was the fundamental thinking behind the selection of multiple cell lines for our study.
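The conventional calculation summarized above, in which the critical effect is the one with the lowest POD and the RfD is that POD divided by the product of the applicable uncertainty factors, can be sketched in a few lines. The hazard domains, POD values, and the two 10-fold uncertainty factors below are hypothetical placeholders chosen for illustration; they do not describe any chemical in this study.

```python
from math import prod


def critical_effect(pods_by_domain):
    """Return the (hazard domain, POD) pair with the lowest point of departure."""
    return min(pods_by_domain.items(), key=lambda item: item[1])


def reference_dose(pod, uncertainty_factors):
    """RfD = POD divided by the product of the uncertainty factors."""
    return pod / prod(uncertainty_factors)


# Hypothetical PODs (mg/kg/day) for one chemical in three hazard domains.
pods_mg_per_kg_day = {
    "developmental": 50.0,
    "neurological": 10.0,
    "reproductive": 25.0,
}

domain, pod = critical_effect(pods_mg_per_kg_day)

# Assumed uncertainty factors: 10 (interspecies) and 10 (intraspecies).
rfd = reference_dose(pod, [10, 10])

print(f"critical effect: {domain} (POD = {pod} mg/kg/day)")
print(f"RfD = {rfd} mg/kg/day")
```

Under an MOA-based approach the same selection logic would apply, with the dictionary keyed by perturbed pathway and valued by the lowest in vitro concentration at which that pathway responds.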
Two aspects factored into the decision of which cell lines to use for this study: (1) the relevance of the tissue from which the cell line originated as a potential target of toxicity following systemic exposure to a chemical; and (2) the presence of target receptors, enzymes, and ion channels for the 34 chemicals in the cell line. It is clear that each cell line has its unique sensitivity and specificity for the 34 chemicals (Figs. 1–4). However, it is also understood that these 4 cell lines do not provide complete coverage for all MOAs of interest. Further analysis looking at known and unknown chemicals in different chemical spaces and testing these against a broad battery of cell lines is needed to provide a more comprehensive answer as to how many cell lines are needed to obtain a realistic characterization of in vivo toxicity. A comparison of the CMap signatures among the 3 concentrations for each of the 34 chemicals shows that concentration has a significant impact on the gene expression profile and thereby the CMap signature of the specific chemicals in each of the cell lines (Figs. 1–4 and Supplementary Table 2). In our study, we observed many instances where 1 concentration exhibited the opposite linkage to the remaining concentrations (e.g., 2 concentrations showed positive linkage, whereas 1 concentration showed negative linkage or vice versa). The negative linkage of tamoxifen, an ER antagonist, with the ER agonists bisphenol A, genistein, and ethinyl-estradiol at the highest dose (D1), which was not as prominent at the mid (D2) and low (D3) concentrations (Figure 1 column AGD1 compared with the rest of AG1, AG2, and AG3), is a good example of this phenomenon. A separate heatmap for the high concentration (D1) in MCF7 cells further emphasizes this concentration phenomenon (Figure 5). Modes of action/AOPs are traditionally defined as linear processes.
Considering the complexity of toxicological processes, this simplistic definition has been criticized and has been acknowledged as a deficiency (Vinken, 2013). Concentration-dependent activation of parallel pathways (activation of new pathways) and crossing over of pathways (recruitment of new pathways) can lead to different adverse outcomes (Vinken, 2013). Hence, it is important to characterize the dose–response curve and to understand at which doses nonspecific responses driven by generalized cytotoxicity emerge. For risk assessment, the goal would be to compare the estimated systemic exposure to the in vitro response occurring at the lowest dose that is linked to an adverse biological response, but for hazard characterization, chemicals would first need to be grouped considering all relevant responses below those associated with high-dose nonspecific toxicity. A second step prior to risk assessment would be to understand which low-dose responses are relevant to subsequent adverse outcomes. An in vitro-to-in vivo extrapolation method such as the one described by Wetmore et al. (2012, 2013, 2014; Thomas et al., 2013) could be an option for determining the relevant in vitro concentration based on the known in vivo human exposure scenario. For a study such as this, where a majority of the MOAs are receptor mediated, the relevant in vitro dose would reflect the steady-state concentration (Css); other in vitro scenarios may require a dose reflecting the maximum concentration (Cmax) or the area under the curve (AUC). After the relevant in vitro concentration is calculated, it is imperative that multiple concentrations flanking this calculated concentration be tested in order to capture the MOA-based phenomena that may be responsible for the critical endpoint. This would allow for conducting the in vitro experiment within a human relevant dose range (i.e., facilitate testing in the correct dose range, such as a 10–100-fold range vs a 10 000–100 000-fold range)
while still generating enough data points to account for a statistically significant result after accounting for other uncertainties such as in vitro biokinetics (Blaauboer, 2010). The distinct patches of positive linkages (areas of red) in the HepG2 cell line (Figure 4) provided a clear example of the strength of the current method. In this cell line, the chemical class FXR receptor agonist showed positive linkages with vitamin D agonist, glucocorticoid receptor agonist, progesterone receptor agonist, progesterone receptor antagonist, and steroid synthesis inhibitors. An examination of the current literature shows that these receptor classes (and hence their ligands) are distinctly related. The FXR, vitamin D (VDR), glucocorticoid (GR), and progesterone (PR) receptors are all part of the large group of receptors known as the nuclear receptor (NR) family. Within this class, a number of subclasses exist that are named on the basis of a phylogenetic tree (defined based on the evolution of 2 well-conserved domains of NR family members) (Bridgham et al., 2010; Nuclear Receptors Nomenclature Committee, 1999). Among the 4 receptor classes of interest, FXR and VDR fall into the NR subfamily 1I, and GR and PR fall into the subfamily 3C (Bridgham et al., 2010; Nuclear Receptors Nomenclature Committee, 1999). The other class of compounds, steroid synthesis inhibitors, is also related to these receptors. Because steroids are a main class of ligands for the NR family, one could assume that the inhibition of steroid synthesis could indirectly affect the functioning of the above-listed receptor families. The other key insight from the result is the lack of positive linkages between FXR, VDR, GR, PR agonist/antagonist, and the ER agonist/antagonist in the HepG2 cell line (Figure 4). The ER falls into the NR subfamily 3A and is very closely related to the group 3C members; however, no linkage is observed in the HepG2 result (Figure 4).
Liver cells do not express ER in abundance (Grandien et al., 1997); hence, pathways related to the ER may not be as active in this cell line, resulting in no linkage. The other interesting phenomenon observed was the preservation of the positive correlation described above between the 2 liver cell lines, HepG2 (Figure 4) and HepaRG (Figure 3), albeit at a much lower intensity in the HepaRG cell line. The HepaRG cell line is known to have a metabolic competency parallel to primary hepatocytes (Gerets et al., 2012), and these differences in metabolic capacity (Aninat et al., 2006; Gerets et al., 2012; Guillouzo et al., 2007; Jennen et al., 2010) may have a role in the subtle differences seen in the results of these 2 cell lines. Overall, these observations provide further evidence for the need to perform these studies in multiple cell lines, where the machinery for certain pathways is uniquely expressed, paying particular attention to in vitro conditions that better mimic in vivo toxicokinetics. We observed mostly positive correlation between CMap scores generated from our data when compared with a rank matrix of the externally available data for the MCF7 cell line from the Broad Institute when the number of instances (n) for a particular chemical was adequate (Figure 6). Although the highest concentration (D1) provided the most positive correlation for most chemicals (Figure 6 and Supplementary Figure 1), there were a few instances where the medium concentration (D2) resulted in better positive correlation than D1, for example, tamoxifen, clobetasol, progesterone, and ketoconazole. Although it is difficult to provide a direct explanation due to the low number of instances of overlap for each chemical (“n” for each chemical in Figure 6), the key point here is the need to use the relevant in vitro concentration that can be extrapolated to the in vivo situation.
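Because the most informative concentration differs by chemical (D1 for most, D2 for some), the tabulate-and-sort practice described earlier in this Discussion can be sketched as follows. This is one possible reading of that practice, assuming the "most relevant" concentration is the one with the largest absolute CMap score for a given pair of chemicals. The scores and the function name are hypothetical and are not taken from Supplementary Table 2.

```python
# Hypothetical rows: (query chemical, reference chemical, concentration, CMap score).
scores = [
    ("tamoxifen", "genistein", "D1", -0.62),
    ("tamoxifen", "genistein", "D2", -0.15),
    ("tamoxifen", "genistein", "D3", 0.05),
    ("genistein", "bisphenol A", "D1", 0.71),
    ("genistein", "bisphenol A", "D2", 0.80),
    ("genistein", "bisphenol A", "D3", 0.22),
]


def strongest_concentration(rows):
    """For each (query, reference) pair, keep the concentration whose score
    has the largest absolute value (strongest positive or negative linkage)."""
    best = {}
    # Sort in ascending order of score, mirroring the tabulated view.
    for query, reference, conc, score in sorted(rows, key=lambda r: r[3]):
        key = (query, reference)
        if key not in best or abs(score) > abs(best[key][1]):
            best[key] = (conc, score)
    return best


best = strongest_concentration(scores)
for (query, reference), (conc, score) in best.items():
    linkage = "positive" if score > 0 else "negative"
    print(f"{query} vs {reference}: {conc} ({linkage}, score {score:+.2f})")
```

Selecting by absolute value keeps strong negative linkages, such as an antagonist against its agonists, from being discarded in favor of weak positive ones.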
We understand that the overall survey and statements made regarding the data set presented in this manuscript are general in nature. The density and richness of the data set for 34 chemicals using 3 concentrations in 4 different cell lines provides an opportunity to perform in-depth CMap analysis for each chemical used in this study. This was not attempted and was not the intent of the current publication; rather, the intent was to evaluate the possibility of using CMap in predictive toxicology to identify connections between the biological signatures of 34 chemicals and provide specifics on how CMap may be used as an alternate method to assess high-content data. The complete gene expression data for this study have been deposited in GEO (super series accession no.: GSE69851), and all CMap scores for all chemicals and concentrations are provided as Supplementary Table 2. Readers are encouraged to perform an in-depth analysis on specific chemicals to understand the nuances of using CMap as a tool to understand MOA-based clustering of chemicals. We believe that this level of detail could be a second tier analysis that can be performed using the same data set following the “screening” level analysis described in this manuscript. Furthermore, we understand that the 34 chemicals used in this study have well-defined MOAs, and that the method was successful in MOA-based classification of these known chemicals. We recognize that the method may need augmenting when using chemicals with unknown MOAs.

In conclusion, we show that CMap can be used as a robust filtering tool to assess high-content data. The study provides evidence that CMap can be applied for predictive toxicological purposes. Current limitations and future needs to improve the tool are also highlighted. These include the need to expand the number of cell lines used for a given chemical and the need to demonstrate the validity of the method via case studies using chemicals from different chemical spaces.
In addition, comparison of data from selected chemicals from our study to the publicly available Broad Institute database (http://www.broadinstitute.org/cmap/) showed good correlation (Figure 6, P values) when the number of instances (n) for a particular chemical was adequate, confirming that the method is user and platform independent. Development of a large shared database such as the one maintained by the Broad Institute (http://www.broadinstitute.org/cmap) for toxicologically relevant chemicals will provide further impetus for broad range adoption of methods such as this in the future. With the increased need to comprehend big data in toxicology, this type of approach may provide further clarity to HTS efforts such as ToxCast and Tox21.

SUPPLEMENTARY DATA

Supplementary data are available online at http://toxsci.oxfordjournals.org/.

ACKNOWLEDGMENTS

The authors wish to thank Karen Blackburn and Catherine Mahony for their review of the manuscript.

REFERENCES

Ames, B. N., Lee, F. D., and Durston, W. E. (1973). An improved bacterial test system for the detection and classification of mutagens and carcinogens. Proc. Natl. Acad. Sci. U.S.A. 70, 782–786. An, W. G., Kanekal, M., Simon, M. C., Maltepe, E., Blagosklonny, M. V., and Neckers, L. M. (1998). Stabilization of wild-type p53 by hypoxia-inducible factor 1alpha. Nature 392, 405–408. Aninat, C., Piton, A., Glaise, D., Le Charpentier, T., Langouet, S., Morel, F., Guguen-Guillouzo, C., and Guillouzo, A. (2006). Expression of cytochromes P450, conjugating enzymes and nuclear receptors in human hepatoma HepaRG cells. Drug Metab. Dispos. 34, 75–83. Ankley, G. T., Bennett, R. S., Erickson, R. J., Hoff, D. J., Hornung, M. W., Johnson, R. D., Mount, D. R., Nichols, J. W., Russom, C. L., Schmieder, P. K., et al. (2010). Adverse outcome pathways: A conceptual framework to support ecotoxicology research and risk assessment. Envir. Toxicol. Chem. 29, 730–741. Blaauboer, B. J. (2010).
Biokinetic modeling and in vitro-in vivo extrapolations. J. Toxicol. Environ. Health B Crit. Rev. 13, 242–252. Blankvoort, B. M., de Groene, E. M., van Meeteren-Kreikamp, A. P., Witkamp, R. F., Rodenburg, R. J., and Aarts, J. M. (2001). Development of an androgen reporter gene assay (AR-LUX) utilizing a human cell line with an endogenously regulated androgen receptor. Anal. Biochem. 298, 93–102. Boobis, A. R., Cohen, S. M., Dellarco, V., McGregor, D., Meek, M. E., Vickers, C., Willcocks, D., and Farland, W. (2006). IPCS framework for analyzing the relevance of a cancer mode of action for humans. Crit. Rev. Toxicol. 36, 781–792. Boobis, A. R., Doe, J. E., Heinrich-Hirsch, B., Meek, M. E., Munn, S., Ruchirawat, M., Schlatter, J., Seed, J., and Vickers, C. (2008). IPCS framework for analyzing the relevance of a noncancer mode of action for humans. Crit. Rev. Toxicol. 38, 87–96. Bridgham, J. T., Eick, G. N., Larroux, C., Deshpande, K., Harms, M. J., Gauthier, M. E., Ortlund, E. A., Degnan, B. M., and Thornton, J. W. (2010). Protein evolution by molecular tinkering: Diversification of the nuclear receptor superfamily from a ligand-dependent ancestor. PLoS Biol. 8, Chalbos, D., Galtier, F., Emiliani, S., and Rochefort, H. (1991). The anti-progestin RU486 stabilizes the progestin-induced fatty acid synthetase mRNA but does not stimulate its transcription. J. Biol. Chem. 266, 8220–8224. Collins, F. S., Gray, G. M., and Bucher, J. R. (2008). Toxicology. Transforming environmental health protection. Science 319, 906–907. Daston, G., Knight, D. J., Schwarz, M., Gocht, T., Thomas, R. S., Mahony, C., and Whelan, M. (2015). SEURAT: Safety Evaluation Ultimately Replacing Animal Testing–recommendations for future research in the field of predictive toxicology. Archiv. Toxicol. 89, 15–23. Dellarco, V. L., and Wiltse, J. A. (1998).
US Environmental Protection Agency’s revised guidelines for Carcinogen Risk Assessment: Incorporating mode of action data. Mut. Res. 405, 273–277. Dix, D. J., Houck, K. A., Martin, M. T., Richard, A. M., Setzer, R. W., and Kavlock, R. J. (2007). The ToxCast program for prioritizing toxicity testing of environmental chemicals. Toxicol. Sci. 95, 5–12. Dudley, J. T., Sirota, M., Shenoy, M., Pai, R. K., Roedder, S., Chiang, A. P., Morgan, A. A., Sarwal, M. M., Pasricha, P. J., and Butte, A. J. (2011). Computational repositioning of the anticonvulsant topiramate for inflammatory bowel disease. Sci. Transl. Med. 3, 96ra76. ECHA. (2015). Grouping of substances and read-across. Available at: http://echa.europa.eu/support/grouping-of-substancesand-read-across. Accessed December 10, 2015. Faustman, E. M., and Omenn, G. S. (2001). Risk assessment. In Casarett and Doull’s Toxicology: The Basic Science of Poisons (L. J. Casarett, C. D. Klaassen, and J. Doull, Eds.), 6 ed., pp. 91–92. McGraw-Hill Medical Pub. Division, New York, NY. Forman, B. M., Goode, E., Chen, J., Oro, A. E., Bradley, D. J., Perlmann, T., Noonan, D. J., Burka, L. T., McMorris, T., Lamph, W. W., et al. (1995). Identification of a nuclear receptor that is activated by farnesol metabolites. Cell 81, 687–693. Gautier, L., Cope, L., Bolstad, B. M., and Irizarry, R. A. (2004). affy– analysis of Affymetrix GeneChip data at the probe level. Bioinformatics 20, 307–315. Gerberick, G. F., Vassallo, J. D., Bailey, R. E., Chaney, J. G., Morrall, S. W., and Lepoittevin, J. P. (2004). Development of a peptide reactivity assay for screening contact allergens. Toxicol. Sci. 81, 332–343. Gerets, H. H., Tilmant, K., Gerin, B., Chanteux, H., Depelchin, B. O., Dhalluin, S., and Atienzar, F. A. (2012). Characterization of primary human hepatocytes, HepG2 cells, and HepaRG cells at the mRNA level and CYP activity in response to inducers and their predictivity for the detection of human hepatotoxins. Cell Biol. Toxicol. 28, 69–87. 
Grandien, K., Berkenstam, A., and Gustafsson, J. A. (1997). The estrogen receptor gene: Promoter organization and expression. Int. J. Biochem. Cell Biol. 29, 1343–1369. Guillouzo, A., Corlu, A., Aninat, C., Glaise, D., Morel, F., and Guguen-Guillouzo, C. (2007). The human hepatoma HepaRG cells: A highly differentiated model for studies of liver metabolism and toxicity of xenobiotics. Chem. Biol. Interact. 168, 66–73. Hall, J. M., Barhoover, M. A., Kazmin, D., McDonnell, D. P., Greenlee, W. F., and Thomas, R. S. (2010). Activation of the aryl-hydrocarbon receptor inhibits invasive and metastatic features of human breast cancer cells and promotes breast cancer cell differentiation. Mol. Endocrinol. 24, 359–369. Hartung, T. (2010). Evidence-based toxicology - the toolbox of validation for the 21st century? Altex 27, 253–263. Hieronymus, H., Lamb, J., Ross, K. N., Peng, X. P., Clement, C., Rodina, A., Nieto, M., Du, J., Stegmaier, K., Raj, S. M., et al. (2006). Gene expression signature-based chemical genomic prediction identifies a novel class of HSP90 pathway modulators. Cancer Cell 10, 321–330. Irizarry, R. A., Bolstad, B. M., Collin, F., Cope, L. M., Hobbs, B., and Speed, T. P. (2003). Summaries of Affymetrix GeneChip probe level data. Nucl. Acids Res. 31, e15. Jahchan, N. S., Dudley, J. T., Mazur, P. K., Flores, N., Yang, D., Palmerton, A., Zmoos, A. F., Vaka, D., Tran, K. Q., Zhou, M., et al. (2013). A drug repositioning approach identifies tricyclic antidepressants as inhibitors of small cell lung cancer and other neuroendocrine tumors. Cancer Discov. 3, 1364–1377. Jennen, D. G., Magkoufopoulou, C., Ketelslegers, H. B., van Herwijnen, M. H., Kleinjans, J. C., and van Delft, J. H. (2010). Comparison of HepG2 and HepaRG by whole-genome gene expression analysis for the purpose of chemical hazard identification. Toxicol. Sci. 115, 66–79. Judson, R. S., Houck, K. A., Kavlock, R. J., Knudsen, T. B., Martin, M. T., Mortensen, H. M., Reif, D. M., Rotroff, D. 
M., Shah, I., Richard, A. M., et al. (2010). In vitro screening of environmental chemicals for targeted testing prioritization: The ToxCast project. Environ. Health Perspect. 118, 485–492. Kang, S. C., and Lee, B. M. (2005). DNA methylation of estrogen receptor alpha gene by phthalates. J. Toxicol. Environ. Health A 68, 1995–2003. Kavlock, R., Chandler, K., Houck, K., Hunter, S., Judson, R., Kleinstreuer, N., Knudsen, T., Martin, M., Padilla, S., Reif, D., et al. (2012). Update on EPA’s ToxCast program: Providing high throughput decision support tools for chemical risk management. Chem. Res. Toxicol. 25, 1287–1302. Kido, S., Inoue, D., Hiura, K., Javier, W., Ito, Y., and Matsumoto, T. (2003). Expression of RANK is dependent upon differentiation into the macrophage/osteoclast lineage: Induction by 1alpha,25-dihydroxyvitamin D3 and TPA in a human myelomonocytic cell line, HL60. Bone 32, 621–629. Knudsen, T., Martin, M., Chandler, K., Kleinstreuer, N., Judson, R., and Sipes, N. (2013). Predictive models and computational toxicology. Methods Mol. Biol. 947, 343–374. Knudsen, T. B., Houck, K. A., Sipes, N. S., Singh, A. V., Judson, R. S., Martin, M. T., Weissman, A., Kleinstreuer, N. C., Mortensen, H. M., Reif, D. M., et al. (2011). Activity profiles of 309 ToxCast chemicals evaluated across 292 biochemical targets. Toxicology 282, 1–15. Lamb, J., Crawford, E. D., Peck, D., Modell, J. W., Blat, I. C., Wrobel, M. J., Lerner, J., Brunet, J. P., Subramanian, A., Ross, K. N., et al. (2006). The Connectivity Map: Using gene-expression signatures to connect small molecules, genes, and disease. Science 313, 1929–1935. Li, J., Zheng, S., Chen, B., Butte, A. J., Swamidass, S. J., and Lu, Z. (2015). A survey of current trends in computational drug repositioning. Brief. Bioinformatics. Li, P. Y., Chang, Y. C., Tzang, B. S., Chen, C. C., and Liu, Y. C. (2007). Antibiotic amoxicillin induces DNA lesions in mammalian cells possibly via the reactive oxygen species. Mut. Res.
629, 133–139. Maggiolini, M., Donze, O., Jeannin, E., Ando, S., and Picard, D. (1999). Adrenal androgens stimulate the proliferation of breast cancer cells as direct activators of estrogen receptor alpha. Cancer Res. 59, 4864–4869. Maggiolini, M. (2004). Xenoestrogens and the induction of proliferative effects in breast cancer cells via direct activation of oestrogen receptor alpha. Food Addit. Contam. 21, 134–144. Muniyappa, H., Song, S., Mathews, C. K., and Das, K. C. (2009). Reactive oxygen species-independent oxidation of thioredoxin in hypoxia: Inactivation of ribonucleotide reductase and redox-mediated checkpoint control. J. Biol. Chem. 284, 17069–17081. Naciff, J. M., Khambatta, Z. S., Reichling, T. D., Carr, G. J., Tiesman, J. P., Singleton, D. W., Khan, S. A., and Daston, G. P. (2010). The genomic response of Ishikawa cells to bisphenol A exposure is dose- and time-dependent. Toxicology 270, 137–149. NRC. (2007). Toxicity Testing in the 21st Century: A Vision and a Strategy. National Academies Press, Washington, DC. Nuclear Receptors Nomenclature Committee. (1999). A unified nomenclature system for the nuclear receptor superfamily. Cell 97, 161–163. Nuwaysir, E. F., Bittner, M., Trent, J., Barrett, J. C., and Afshari, C. A. (1999). Microarrays and toxicology: The advent of toxicogenomics. Mol. Carcinog. 24, 153–159. OECD. (2010). Test No. 487: In Vitro Mammalian Cell Micronucleus Test. OECD Publishing, Paris. OECD. (2012). Draft Template, and Guidance on Developing and Assessing the Completeness of Adverse Outcome Pathways (AOPs). OECD Publishing, Paris. OECD. (2014). Test No. 473: In Vitro Mammalian Chromosomal Aberration Test. OECD Publishing, Paris. Olsen, C. M., Meussen-Elholm, E. T., Roste, L. S., and Tauboll, E. (2004). Antiepileptic drugs inhibit cell growth in the human breast cancer cell line MCF7. Mol. Cell. Endocrinol. 213, 173–179. Rao, X., Di Leva, G., Li, M., Fang, F., Devlin, C., Hartman-Frey, C., Burow, M.
E., Ivan, M., Croce, C. M., and Nephew, K. P. (2011). MicroRNA-221/222 confers breast cancer fulvestrant resistance by regulating multiple signaling pathways. Oncogene 30, 1082–1097. Rathinasamy, K., Jindal, B., Asthana, J., Singh, P., Balaji, P. V., and Panda, D. (2010). Griseofulvin stabilizes microtubule dynamics, activates p53 and inhibits the proliferation of MCF-7 cells synergistically with vinblastine. BMC Cancer 10, 213. Recchia, A. G., Vivacqua, A., Gabriele, S., Carpino, A., Fasanella, G., Rago, V., Bonofiglio, D., Skandrani, D., Gaubin, Y., Beau, B., et al. (2006). Effect of selected insecticides on growth rate and stress protein expression in cultured human A549 and SHSY5Y cells. Toxicol. In Vitro 20, 1378–1386. Ritchie, M. E., Phipson, B., Wu, D., Hu, Y., Law, C. W., Shi, W., and Smyth, G. K. (2008). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucl. Acids Res. 43, e47. Smyth, G. (2005). Limma: Linear models for microarray data. In Gentleman, Robert, Carey, Vincent J., Huber, Wolfgang, Irizarry, Rafael A., and Dudoit, Sandrine. Bioinformatics and Computational Biology Solutions Using R and Bioconductor, pp. 397–420. Springer, New York, NY. Stephens, M. L., Barrow, C., Andersen, M. E., Boekelheide, K., Carmichael, P. L., Holsapple, M. P., and Lafranconi, M. (2012). Accelerating the development of 21st-century toxicology: Outcome of a Human Toxicology Project Consortium workshop. Toxicol. Sci. 125, 327–334. Tang, H. Y., Lin, H. Y., Zhang, S., Davis, F. B., and Davis, P. J. (2004). Thyroid hormone causes mitogen-activated protein kinase-dependent phosphorylation of the nuclear estrogen receptor. Endocrinology 145, 3265–3272. Thomas, R. S., Philbert, M. A., Auerbach, S. S., Wetmore, B. A., Devito, M. J., Cote, I., Rowlands, J. C., Whelan, M. P., Hays, S. M., Andersen, M. E., et al. (2013).
Incorporating new technologies into toxicity testing and risk assessment: Moving from 21st century vision to a data-driven framework. Toxicol. Sci. 136, 4–18. Thome-Kromer, B., Bonk, I., Klatt, M., Nebrich, G., Taufmann, M., Bryant, S., Wacker, U., and Kopke, A. (2003). Toward the identification of liver toxicity markers: A proteome study in human cell culture and rats. Proteomics 3, 1835–1862. Tice, R. R., Austin, C. P., Kavlock, R. J., and Bucher, J. R. (2013). Improving the human hazard characterization of chemicals: A Tox21 update. Environ. Health Perspect. 121, 756–765. Vidovic, D., Koleti, A., and Schurer, S. C. (2014). Large-scale integration of small molecule-induced genome-wide transcriptional responses, Kinome-wide binding affinities and cell-growth inhibition profiles reveal global trends characterizing systems-level drug action. Front. Genet. 5, 342. Vinggaard, A. M., Joergensen, E. C., and Larsen, J. C. (1999). Rapid and sensitive reporter gene assays for detection of antiandrogenic and estrogenic effects of environmental chemicals. Toxicol. Appl. Pharmacol. 155, 150–160. Vinken, M. (2013). The adverse outcome pathway concept: A pragmatic tool in toxicology. Toxicology 312, 158–165. Vivacqua, A., Recchia, A. G., Fasanella, G., Gabriele, S., Carpino, A., Rago, V., Di Gioia, M. L., Leggio, A., Bonofiglio, D., Liguori, A., et al. (2003). The food contaminants bisphenol A and 4-nonylphenol act as agonists for estrogen receptor alpha in MCF7 breast cancer cells. Endocrine 22, 275–284. Wang, X. J., Hayes, J. D., Henderson, C. J., and Wolf, C. R. (2007). Identification of retinoic acid as an inhibitor of transcription factor Nrf2 through activation of retinoic acid receptor alpha. Proc. Natl. Acad. Sci. U.S.A. 104, 19589–19594. Wetmore, B. A., Allen, B., Clewell, H. J., III, Parker, T., Wambaugh, J. F., Almond, L. M., Sochaski, M. A., and Thomas, R. S. (2014).
Incorporating population variability and susceptible subpopulations into dosimetry for high-throughput toxicity testing. Toxicol. Sci. 142, 210–224. Wetmore, B. A., Wambaugh, J. F., Ferguson, S. S., Li, L., Clewell, H. J., III, Judson, R. S., Freeman, K., Bao, W., Sochaski, M. A., et al. (2013). Relative impact of incorporating pharmacokinetics on predicting in vivo hazard and mode of action from high-throughput in vitro toxicity assays. Toxicol. Sci. 132, 327–346. Wetmore, B. A., Wambaugh, J. F., Ferguson, S. S., Sochaski, M. A., Rotroff, D. M., Freeman, K., Clewell, H. J., III, Dix, D. J., Andersen, M. E., et al. (2012). Integration of dosimetry, exposure, and high-throughput screening data in chemical toxicity assessment. Toxicol. Sci. 125, 157–174. Wu, S., Blackburn, K., Amburgey, J., Jaworska, J., and Federle, T. (2010). A framework for using structural, reactivity, metabolic and physicochemical similarity to evaluate the suitability of analogs for SAR-based toxicological assessments. Regul. Toxicol. Pharmacol. 56, 67–81. Zhang, J. D., Berntenis, N., Roth, A., and Ebeling, M. (2014). Data mining reveals a network of early-response genes as a consensus signature of drug-induced in vitro and in vivo toxicity. Pharmacogenomics J. 14, 208–216. Zimmer, M., Lamb, J., Ebert, B. L., Lynch, M., Neil, C., Schmidt, E., Golub, T. R., and Iliopoulos, O. (2010). The connectivity map links iron regulatory protein-1-mediated inhibition of hypoxia-inducible factor-2a translation to the anti-inflammatory 15-deoxy-delta12,14-prostaglandin J2. Cancer Res. 70, 3071–3079.