Computational Aspects of High-Throughput Screening planning and analysis

Introduction
Welcome to the “Living Document” for the EMBL Advanced Course - Computational Aspects of High-Throughput Screening planning and analysis. This document is intended to be usable by all participants and presenters on the course. Please add any information you wish to it so that it can be shared with everyone else in attendance.

Introduction
Networking
Questions
  o Questions from ChEMBL intro
  o Questions about ZINC and from John Irwin’s session
  o Questions about Chemical Space from Peter Ertl’s session
  o Questions on Chemical Representation from George Papadatos
  o Questions on Designing Focussed Libraries from Steffen Renner’s session
  o Questions on virtual screening from Anna Linusson’s session
  o Questions on assay validation from David Murray’s session
  o Questions on Shaping a Screening file from Jeremy Everett’s session
  o Questions on Screening Workflows from Joe Lewis’s session
  o Questions on EU-Openscreen from Per-Anders’ session
Resources

Networking
If you are happy for others on the course to contact you, then please leave your details here (email, webpage, LinkedIn, etc.).

Tom Hancocks - Course Organiser
http://www.ebi.ac.uk/about/people/tom-hancocks
http://uk.linkedin.com/pub/tom-hancocks/2a/956/bb2
[email protected]
Twitter: @tehancocks

Chris Kuffer - Participant
PhD student / Max-Planck-Institute of Biochemistry
[email protected]

Aurianne Lescure - Participant
Curie Institute HCS platform - Robotic engineer, data analysis
[email protected]

Silvia Lorente-Cebrián - Participant
Post-doc / Karolinska Institute
[email protected]

Tugrul DORUK - Participant
Post-doc / Umea University
[email protected]

David Andersson - Participant
Researcher / Umea University
[email protected]

Jeremy Everett - Speaker
Professor at University of Greenwich
http://www2.gre.ac.uk/about/schools/science/about/departments/pces/staff/professor-jeremy-everett
[email protected]

Caroline BARETTE - Participant
PhD, CMBA’s HTS facility operational manager, CEA-INSERM-Grenoble University, France
[email protected]

Anna-Lena Gustavsson - Participant
Computational Chemist at the Lab. for Chemical Biology Karolinska Institutet, part of the Chemical Biology Consortium Sweden
www.cbcs.se
[email protected]

Arafath Najumudeen - Participant
Turku Centre for Biotechnology, Finland
[email protected]

Matthias Kolberg - Participant
Researcher at Oslo University Hospital, Norway
[email protected]

Ana Kitanovic - Participant
Screening and Automation Scientist at Laboratory Automation Technology (LAT) Screening Facility, German Center for Neurodegenerative Diseases (DZNE), Bonn
[email protected]

Pamela Gatto
HTS Facility, Center for Integrative Biology, University of Trento
http://www.unitn.it/en/cibio
[email protected]

Guochao Wu - Participant
Research engineer, working on yeast screening with a collection of yeast deletion mutants
Department of Chemistry, Umea University, Sweden
[email protected]

Bernd Boidol - Participant
PhD Student (PLACEBO Lab), CeMM Research Center for Molecular Medicine, Vienna
[email protected]

Martijn Fiers - Participant
WUR/Plant Research International, Wageningen, The Netherlands
[email protected]

Erik Vrij
Tissue Regeneration - MIRA, University of Twente, the Netherlands
[email protected]

Evgeny Kulesskiy - Participant
High Throughput Biomedicine Unit, FIMM, Helsinki, Finland
http://www.fimm.fi/en/technologycentre/htb/
[email protected]

Steffen Renner - Speaker
Novartis
[email protected]

Anne Hersey - Speaker
EMBL-EBI
[email protected]

George Papadatos - Speaker
EMBL-EBI
[email protected]
http://uk.linkedin.com/in/georgepapadatos

Questions
If you have a question (but are too afraid to ask!) then please type it up here. If you are able to answer someone else’s question, or have additional things to ask, then please feel free to add them to the document.

Questions from ChEMBL intro
How does ChEMBL create the chemical structure for each compound?
  o Drawn by curators for each entry!
  o There are computational methods, but they are not always accurate
  o Structures need to be mapped to their biological values, which is difficult
  o It would be nice if journals supplied chemical structure information when publishing
  o Errors are occasionally found in journal formulae; these are reported back to the authors
      So there is some checking of structures, but it is not foolproof
Can you enter a chemical structure and get all of its targets out?
  o Yes! Search by name, or draw your own structure and search with that.
What IDs do you accept for protein searching?
  o UniProt IDs, short names
  o ChEMBL assigns a ChEMBL ID to each UniProt entry
Does ChEMBL only return targets that are proteins?
  o No, there is lots of data on cell-based assays as well
  o You can search for compounds tested against cell lines
  o Nucleic acids are also valid targets
Do you have a plan to cover patents in the database as well?
  o For example, it could be really interesting to cover Chinese patents.
  o There are links to patent databases via UniChem in the compound report card, e.g. https://www.ebi.ac.uk/chembl/compound/inspect/CHEMBL111 (scroll to the bottom)
  o ZINC also includes some patents, limited to openly available sources
  o SureChem (https://surechem.com/) is one of the best places to search for patents
How can you submit multiple searches to ChEMBL?
  o It is best to use the web services via the command line, or to create a local installation of ChEMBL
  o There are examples of this in KNIME
How would you search ChEMBL for “which target has the most compounds in common with IRAK4 at 10 µM or better”?
What other targets do IRAK4 compounds hit? (What polypharmacology should I be looking out for?)
Are any of the compounds for IRAK4 also approved drugs? Are any of them natural products? Are any of them metabolites? If not, what is the closest metabolite to a known IRAK4 ligand?
How do you take a query you have done in the ChEMBL interface and turn it into a script?
Which compounds in ChEMBL are the most promiscuous?
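The IRAK4 and promiscuity questions above come down to simple set operations once the activity records are available locally, for example from the ChEMBL web services or a local installation as suggested above. The sketch below assumes hypothetical records of the form (compound, target, pChEMBL value) and uses made-up toy data; “10 µM or better” is taken as pChEMBL ≥ 5, because -log10(1e-5 M) = 5. The function names are illustrative only and are not part of any ChEMBL API.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (compound_id, target_name, pchembl_value).
# Real records would come from ChEMBL (web services or a local copy).
ACTIVITIES = [
    ("CPD1", "IRAK4", 7.2),
    ("CPD1", "IRAK1", 6.1),
    ("CPD2", "IRAK4", 5.5),
    ("CPD2", "IRAK1", 5.0),
    ("CPD2", "FLT3", 6.3),
    ("CPD3", "IRAK4", 4.2),  # weaker than 10 uM, so ignored at pChEMBL >= 5
    ("CPD3", "FLT3", 6.8),
    ("CPD4", "FLT3", 7.0),
    ("CPD4", "JAK2", 6.0),
]


def actives_by_target(activities, min_pchembl):
    """Map each target to the set of compounds with pChEMBL >= min_pchembl."""
    by_target = defaultdict(set)
    for compound, target, pchembl in activities:
        if pchembl >= min_pchembl:
            by_target[target].add(compound)
    return by_target


def targets_sharing_compounds(activities, query_target, min_pchembl=5.0):
    """Rank other targets by how many of the query target's actives they share."""
    by_target = actives_by_target(activities, min_pchembl)
    query_actives = by_target.get(query_target, set())
    shared = [
        (target, len(compounds & query_actives))
        for target, compounds in by_target.items()
        if target != query_target
    ]
    # Most shared compounds first; ties broken alphabetically; zero overlaps dropped.
    return sorted((pair for pair in shared if pair[1] > 0), key=lambda p: (-p[1], p[0]))


def promiscuity(activities, min_pchembl=5.0):
    """Count the distinct targets each compound hits at pChEMBL >= min_pchembl."""
    targets_per_compound = defaultdict(set)
    for compound, target, pchembl in activities:
        if pchembl >= min_pchembl:
            targets_per_compound[compound].add(target)
    return Counter({c: len(t) for c, t in targets_per_compound.items()})


print(targets_sharing_compounds(ACTIVITIES, "IRAK4"))  # [('IRAK1', 2), ('FLT3', 1)]
print(promiscuity(ACTIVITIES).most_common(1))          # [('CPD2', 3)]
```

With this toy data IRAK1 shares two IRAK4 actives and FLT3 shares one, and CPD2 is the most promiscuous compound (three targets). Passing min_pchembl=6.0 to actives_by_target gives a “1 µM or better” cut-off instead.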
For which targets is there at least one compound that binds at 1 µM or better?
In the drawing tool, can you enter SMILES and then edit the molecule graphically before submitting the search?
If you are doing a systematic survey of a list of compounds and their targets, should you use ADMET property thresholds that apply to all the compounds, or set individual thresholds for each compound?

Questions about ZINC and from John Irwin’s session
Things to consider when using ZINC to order compounds and libraries
  o Remember not to trust your data! There are errors in every database!
  o Also be careful to check the quality of compounds
      Good vendors can provide spectra to show the quality (this might cost extra)
  o Some vendors do not hold all compounds in stock and may have to prepare them when ordered. This can take a lot of time!
  o You get what you pay for: cheap libraries might not be of great quality!
Can ZINC provide a list of suppliers for each compound?
  o Yes
  o They are listed alphabetically, so there is no weighting of which suppliers are better
Does SEA store any search information?
  o Yes, just like Google does
  o For privacy, you can buy the software and run it locally
      SEA is limited by IP address to 40 queries a day, and is free to use for individual cases
  o For HTS, it can be purchased
If you have 30-50 compounds to test at once, can it tell you which pathways you are hitting?
  o No, but it will tell you the targets you are hitting
  o Compounds (or close analogues) need to be in ChEMBL for SEA to actually return something
  o Natural compounds will return null hits, as they are not well defined in chemical space

Questions about Chemical Space from Peter Ertl’s session
2000 viable drug targets, many possible drugs
  o ‘Bioavailability area’ - a small area of molecular property space
  o Drug-likeness and NP-likeness
  o ChemGPS - from AstraZeneca (AZ) - uses molecules with “extreme properties” as references
      http://chemgps.bmc.uu.se/batchelor/queue.php?show=submit
  o Reference from current drugs on the market: these have already been cleared for clinical use, so find something similar?
  o Natural product-likeness - becoming increasingly important
      Molecules produced by living organisms: primary metabolites (fatty acids, sugars…) and secondary metabolites (unique to an organism: antibiotics, marine toxins…)
      A long natural selection process has optimized their bio-interactions
  o ‘Grey area’ between synthetic, natural and bioactive molecules - a key area for drug discovery
What is the root ring of Gleevec in Ertl’s hierarchy?
What about bioisosteres in Ertl’s classification system?
  o Analysis of bioactivity databases (ChEMBL) can provide many ideas for new bioisosteres
Isn’t the decision to use secondary metabolites and not primary metabolites arbitrary?
In a real drug discovery project, how much is found in the target class, and how much is found elsewhere?
  o This process is used early on in the project to find basic information
  o Later on, 3D structures and modelling are incorporated
  o But is classification on bioactivity important?
Do natural products have to be considered with regard to membrane transport?
  o Some structural features (like sugars) are known to matter for transport and need to be thought about
  o It isn’t easy, and needs to be considered afresh for each case
Will we ever have an overlap between synthetic chemistry and natural products?
  o Non-natural natural products - making simpler compounds
Do you characterise cytotoxic compounds when creating potential libraries, and exclude them?
  o Yes, a large set of substructures is excluded from screens because of these properties

Questions on Chemical Representation from George Papadatos
Word of warning!
Chemical names - there are many names, and different people use different names
  o Trivial, IUPAC, synonyms, trade names…
  o Language and spelling can confound the naming
  o Large molecules have complex IUPAC names, so it is easy to make a mistake
  o CAS numbers are not derived from structures; they are proprietary and human-assigned
      Only 7900 are freely available
Molecules as a graph - atoms as nodes, bonds as edges
  o SMILES
      Simplified Molecular Input Line Entry System
      Linear format, concise and convenient
  o MDL mol files
      30+ lines of text; can become very clumsy to use
  o InChI
      International Chemical Identifier
      InChIKey - hashed representation
  o SMARTS
      Extension of SMILES for substructure searching
      Find or filter out certain structures
Does CIR (the NCI Chemical Identifier Resolver) deal with large-scale queries?
Molecular descriptors - there are lots of them, which is both good and bad!
  o Connectivity
  o Pharmacophores
  o Shapes

Questions on Designing Focussed Libraries from Steffen Renner’s session
Consciously decide what you want to screen before beginning
  o Don’t screen everything!
How many compounds to screen? 200,000 is a focussed screen
  o Balance between getting hits and testing redundant space
  o Conduct pilot screens for larger screens
  o Screen iteratively and learn from the results
  o <450 MW; lead-likeness, water soluble, membrane permeable
  o Hit lists
  o Stereocentres
  o Non-exclusive compounds - you don’t want someone to beat you to it!
Drug-likeness - the QED (quantitative estimate of drug-likeness) value is a parameter provided in the ChEMBL database
Combine results from all previous screens
  o Gain data on the number of assays and the number of times a compound has been active
  o Can be done for each different technology
  o PAINS (pan-assay interference compounds) filter
Screening at different concentrations and in different conditions
  o Screen with low concentrations of detergent, which removes aggregate interactions
  o There are still many potential factors
Is there a way to calculate redox screening potential?
  o Not aware of one
  o It can be confusing information, but could be useful
Large and lipophilic molecules might not be great candidates
  o The reasons why are not always clear
  o There is lots of debate on whether this is true or not!

Questions on virtual screening from Anna Linusson’s session
Binding affinities are based on the free energy of the ligand/target system: the energy of the complex vs the separate energies of both components
  o Enthalpy - imagine it as two lovers
  o Binding entropy - what keeps the components together?
Interactions occur in an aqueous environment
Screening needs development and validation before it is conducted
  o Base it on existing knowledge
Virtual screening has existed in the literature since the mid-1990s
Methodology
  o Ligand-based
      Needs known active molecules
  o Structure-based
      Needs a 3D model of the target
Why is NMR not a better way to gain knowledge of a protein’s structure?
  o NMR has low resolution
How useful is VS for finding targets you don’t want to bind to?
  o The techniques in the presentation are not great for this
  o Other methods might be useful
Could you quantify knowledge to prepare for a screen?
  o Look in more detail and refine the question you are trying to ask
  o Iteratively build up more information after you get the first results
  o Be careful though!

Questions on assay validation from David Murray’s session
A 0.1% hit rate in a library of 1 million compounds gives 1000 active compounds
  o No assay is 100% accurate
  o At 99% accuracy: 10 false negatives and 9990 false positives!
  o Technology artefacts make things difficult
Don’t always believe how good an assay claims to be!
HTS isn’t rubbish though!
  o Treat everything with caution
  o Every HTS experiment is a big investment, so make sure you get what you can out of it
Controls
  o Neutral controls
      Steady-state control
  o Scale-reference controls
      Supra-maximal concentration of a compound
The assay must be good enough to call activity
  o Validate; don’t be pressured into running the HTS
Standard QC
  o Signal-to-background - remember error rates!
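The arithmetic and QC metrics from this session can be checked with a few lines of code. This is a sketch under simple, stated assumptions: the same accuracy applies to actives and inactives, signal-to-background is taken here as a ratio of medians, and Z’ uses the standard definition Z’ = 1 - 3(SD_pos + SD_neg) / |mean_pos - mean_neg| over positive and negative control wells. An optional robust variant swaps in the median and 1.4826 × MAD, following the session’s advice to prefer medians. The function names and control readings are hypothetical.

```python
import statistics


def expected_error_counts(library_size, hit_rate, accuracy):
    """Expected (false negatives, false positives) for a screen.

    Assumes the same per-compound accuracy for actives and inactives.
    """
    true_actives = library_size * hit_rate
    true_inactives = library_size - true_actives
    false_negatives = true_actives * (1 - accuracy)
    false_positives = true_inactives * (1 - accuracy)
    return round(false_negatives), round(false_positives)


def signal_to_background(positive, negative):
    """Median positive-control signal divided by median negative-control signal."""
    return statistics.median(positive) / statistics.median(negative)


def mad_sd(values):
    """Robust SD estimate: 1.4826 x median absolute deviation."""
    centre = statistics.median(values)
    return 1.4826 * statistics.median([abs(v - centre) for v in values])


def z_prime(positive, negative, robust=False):
    """Z' = 1 - 3 * (SD_pos + SD_neg) / |centre_pos - centre_neg|.

    robust=False uses mean and sample SD; robust=True uses median and MAD-based SD.
    """
    centre = statistics.median if robust else statistics.mean
    spread = mad_sd if robust else statistics.stdev
    window = abs(centre(positive) - centre(negative))
    return 1 - 3 * (spread(positive) + spread(negative)) / window


# Session example: 1,000,000 compounds, 0.1% hit rate, 99% accuracy.
print(expected_error_counts(1_000_000, 0.001, 0.99))  # (10, 9990)

# Hypothetical control-well readings, with one outlier among the positive controls.
pos = [100, 102, 98, 101, 99, 60]
neg = [10, 12, 8, 11, 9, 10]
print(round(z_prime(pos, neg), 2), round(z_prime(pos, neg, robust=True), 2))  # 0.36 0.88
```

The single outlying positive control drags the mean-based Z’ down to about 0.36, while the median/MAD version stays near 0.88, which illustrates why medians are recommended for this kind of analysis.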
The Z’/Z-factor article on Wikipedia is [currently] incorrect!
  o A Z’ of 1 is the best score
      A Z’ of 0.5 is a marginal assay score
      Some assays will be consistently above 0.5
      Cell-based assays are less accurate; 0.3 is often seen
      Replicate assays can reduce error a lot
      Once = 29%
      Four times = 50%
  o Don’t go chasing Z values, trying to improve results just to reach a threshold
Robust statistics
  o DON’T USE MEANS in your statistical analysis - use the median instead.

Questions on Shaping a Screening file from Jeremy Everett’s session
Plate-based diversity subset
  o Ordering of a subset on plates in order of physical properties
  o Rule of 40
      All criteria are multiples of 40
      200+ compounds on a plate
      Binned compounds on each plate
  o BCUTs - cell-based description of chemical space
Plate-based properties vs individual compound basis
  o Order plates in each subset to have maximal coverage of chemical space
      Aim for double coverage
      This is a computational problem! Many permutations!
      Take the top plates; the best plates move to the top of the pile
      17 iterations gave the optimal order
Did you return to missed sample plates and find out why they were missed?
  o No, not done. It is a very large screening file. An interesting question, but not one that needed to be solved.
How many compounds do you need to screen?
  o Having a validated structure, and knowing it, is important
  o Knowing how much compound is in a well is essential to calculate IC50 values
Introduce molecular redundancy
  o Relationship between molecular fingerprint similarity to a known active and the chance of finding biological activity
  o Belief theory - work out the number of molecules needed to find one active
      Defined ‘sure’ as 95% certainty of activity
      Probability of getting an active
Have you done the PBDS (plate-based diversity subset) in practice?
  o Yes, it has become a major part of Pfizer’s work and economic considerations

Questions on Screening Workflows from Joe Lewis’s session

Questions on EU-Openscreen from Per-Anders’ session
EU-Openscreen is a Chemical Biology Research Infrastructure for Europe
  o EU-Openscreen - http://www.eu-openscreen.de/
  o Facilitates and supports basic research
  o Currently in the last month of its preparatory phase
  o Current funding is from the EU
      Operational phase - support from host nation research councils
      User pays for screens
  o Allows access to equipment, experience and training for scientists across Europe
      Support at each stage of the pre-/post-screening process
  o Development of standards, automation and reproducible practices
  o Training course in Stockholm - http://www.ucmr.umu.se/researchschool/course-catalogue/1557-assay-development-in-high-throughputscreening-2-ects.html
  o Open access to tools and data
  o Covers all of the life sciences - food, ecology, animals, pest control, medicine
More information on ESFRI: http://ec.europa.eu/research/infrastructures/index_en.cfm?pg=esfri
BioMedBridges co-ordinates interactions between ESFRI infrastructures; more information can be found here: http://www.biomedbridges.eu/

Resources
If you have any useful links, resources or information to share with the rest of the course, then please add them here.
1. http://zinc.docking.org - Free tools for ligand discovery
   a. JI (John Irwin) - interaction of medicinal chemical space with ZINC. Can also include analogs of these compounds.
2. https://www.ebi.ac.uk/chembl/ - Open access and free medicinal chemistry SAR data. See also http://www.ebi.ac.uk/training/online/course/chembl-quick-tour - the ChEMBL ‘Quick Tour’ e-learning tutorial from EBI Train Online
   a. pChEMBL value - this is -log10 of a molar IC50, Ki or EC50 (or similar) value, and attempts to put a number of roughly comparable concentration-dependent endpoints on a standard scale
      i. It is only available for a subset of possible measured bioactivities
3. https://www.ebi.ac.uk/unichem/ - UniChem resource
4. http://www.ebi.ac.uk/pdbe/ - PDBe (Protein Data Bank in Europe) is a database of 3D protein structures curated at the EBI. There are links between PDBe and ChEMBL.
5. http://advisor.docking.org - Small molecule aggregation historical record database.
6. http://sea.docking.org - Predict the biological targets of small molecules (e.g. from phenotypic hits)
7. Characterization of Chemical Space - http://eu.wiley.com/WileyCDA/WileyTitle/productCd-3527318526.html
8. SMARTSviewer - http://smartsview.zbh.uni-hamburg.de/
9. http://cactus.nci.nih.gov/chemical/structure - Chemical representation converters
10. Someone asked if we run similar courses to this at EBI. We run this one every year, but it is closed for applications this year. https://registration.hinxton.wellcome.ac.uk/display_info.asp?id=386
11. http://www.ebi.ac.uk/training/online/
12. www.knime.org
13. Rajarshi Guha’s SlideShare account. Check out some of his resources if you are interested in seeing what he would have covered: http://www.slideshare.net/rguha/presentations
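As a worked example of the pChEMBL value described under the ChEMBL resource, converting a concentration to -log10(molar) is a one-line calculation. The helper below is a hypothetical illustration, not part of any ChEMBL API, and it ignores the curation rules ChEMBL applies when deciding which activities receive a pChEMBL value. Working with powers of ten rather than multiplying by 1e-6 or 1e-9 keeps round-number inputs exact.

```python
import math

# Powers of ten needed to convert each unit to molar (hypothetical helper table).
UNIT_EXPONENT = {"M": 0, "mM": 3, "uM": 6, "nM": 9, "pM": 12}


def pchembl_like(value, unit="nM"):
    """Return -log10 of a concentration given in `unit`, i.e. -log10(molar value).

    For example, 10 uM = 1e-5 M -> 5.0 and 100 nM = 1e-7 M -> 7.0.
    """
    if value <= 0:
        raise ValueError("concentration must be positive")
    return UNIT_EXPONENT[unit] - math.log10(value)


def molar_from_pchembl(p):
    """Invert the transform: pChEMBL-style value -> molar concentration."""
    return 10 ** (-p)


for value, unit in [(10, "uM"), (100, "nM"), (1, "uM")]:
    print(value, unit, "->", pchembl_like(value, unit))
```

On this scale, “10 µM or better” corresponds to a value of at least 5 and “1 µM or better” to at least 6; higher numbers mean more potent compounds.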