Target Identification – Hands On Part 1
Aroon Hingorani / Anna Gaulton

Explore the GWAS catalogue
• The GWAS catalogue collates SNP-trait association data from published genome-wide association studies. At the time of writing, it contained information from more than 2000 publications. For each association it also gives the significance (p-value), the location of the variant (e.g., protein-coding, intergenic, intronic) and the gene or genes mapped to that location.
• https://www.ebi.ac.uk/gwas/

[Screenshots of the GWAS catalogue search, with annotations:]
• Search for a disease/trait of interest; results are grouped into studies, associations and catalogue traits matching the search term
• Results can be downloaded for further filtering/sorting
• Each association shows the gene assigned by the author and the gene assigned by the GWAS catalog
• The p-value threshold can be altered and the traits refined
• Type/location of variant: e.g., missense = changes the protein sequence; intergenic = between genes

Exercise 1
• Search the GWAS catalogue for a disease of interest, e.g., 'coronary heart disease' or 'type 2 diabetes'
• Filter the results to include only those associations with a p-value less than 10⁻⁸
• Identify SNPs that lie within genes and are likely to affect protein sequence (e.g., missense or frameshift variants)
• Hint: it may help to download the results to Excel and sort them there
• Identify the genes that these SNPs lie within
• If you need more information about the location of a SNP, you can also search Ensembl with its rs number: http://www.ensembl.org/index.html

Protein Sequence/Structure Information
• UniProt contains sequence and functional information for proteins (e.g., Gene Ontology annotation, protein family information, subcellular location, expression, interactions) and can be searched by keyword/accession or by sequence: www.uniprot.org
• PDBe contains protein structure information: www.ebi.ac.uk/pdbe

Domain/Family Annotation
• InterPro (protein family database containing signatures for protein families and domains drawn from a number of different resources): www.ebi.ac.uk/interpro
• Pfam (protein domain database): http://pfam.xfam.org
• PPDMS, a list of Pfam domains believed to bind small molecules: https://www.ebi.ac.uk/chembl/research/ppdms/pfam_maps/evidence/

Exercise 2
• Find out more about the proteins encoded by these genes
• What are their functions? Is there a known link to the disease of interest?
• What are their subcellular locations?
• What protein families do they belong to? What domains do they contain?
• Are these domains known to bind small molecules?
• Do they have structures available in the Protein Data Bank?

Answers
• KCNJ11 (also called Kir6.2): inward rectifier K+ channel; forms ATP-sensitive K+ (KATP) channels with sulfonylurea receptor subunits, ABCC9 (SUR2) and, in pancreatic beta cells, ABCC8 (SUR1). Known link to several diabetic conditions. Membrane protein, found at the plasma membrane. Contains the IRK (PF01007) domain, which is not on the PPDMS list of known small-molecule-binding domains. No crystal structure, but structures are available for other family members (e.g., 3sya).
• PPARG (Peroxisome proliferator-activated receptor gamma/NR1C3): nuclear hormone receptor; transcription factor and key regulator of adipocyte differentiation and glucose homeostasis. Known link to diabetes: it is the target of the thiazolidinedione (glitazone) antidiabetic drugs. Found in both the cytoplasm and the nucleus. Contains the Hormone_recep (PF00104), zf-C4 (PF00105) and PPARgamma_N (PF12577) domains; Hormone_recep is a known ligand-binding domain. Many crystal structures of the ligand-binding domain and some of the near-complete protein (e.g., 3dzy).
• SLC30A8 (Zinc transporter 8): transporter (solute carrier). Possible link to diabetes: as a zinc-efflux transporter, it may be a major route for supplying zinc to insulin maturation and/or storage in insulin-secreting pancreatic beta cells. Found at the plasma membrane and secretory vesicle membranes. Contains the Cation_efflux (PF01545) domain, which is not on the PPDMS list of known small-molecule-binding domains. No crystal structure, but structures are available for a distantly related E. coli zinc transporter.
• SLC16A11 (Monocarboxylate transporter 11): transporter (solute carrier). Genetic variants are linked to diabetes risk. Located in the endoplasmic reticulum membrane. Contains the MFS_1 (PF07690) domain, which is a known small-molecule-binding domain. No crystal structure; structures exist only for a distantly related E. coli multidrug transporter.
• THADA (Thyroid adenoma-associated protein): novel/uncharacterised family. No information linking it to diabetes apart from its expression in the pancreas. Subcellular location not known. Contains the DUF2428 (PF10350) domain, which is not a known small-molecule-binding domain. No crystal structure and no related structures.
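The filtering in Exercise 1 can also be scripted instead of done by hand in Excel. The following is a minimal sketch, assuming the associations were downloaded from the GWAS catalogue as a tab-separated file with columns named `SNPS`, `P-VALUE`, `CONTEXT` and `MAPPED_GENE`, and that protein-altering variants are labelled `missense_variant` / `frameshift_variant` in `CONTEXT`. These column names and labels are assumptions; check them against the header of your own download.

```python
import csv
from typing import Iterable, TextIO

# Assumed column names in the GWAS catalogue associations download (TSV).
SNP_COL = "SNPS"
PVALUE_COL = "P-VALUE"
CONTEXT_COL = "CONTEXT"
GENE_COL = "MAPPED_GENE"

# Assumed consequence labels for variants likely to change protein sequence.
PROTEIN_ALTERING = {"missense_variant", "frameshift_variant"}


def protein_altering_hits(
    tsv: TextIO,
    p_threshold: float = 1e-8,
    contexts: Iterable[str] = PROTEIN_ALTERING,
) -> list[tuple[str, str, float]]:
    """Return (rs number, mapped gene, p-value) for associations with
    p < p_threshold whose variant context is protein-altering,
    sorted from most to least significant."""
    wanted = set(contexts)
    hits = []
    for row in csv.DictReader(tsv, delimiter="\t"):
        try:
            p = float(row[PVALUE_COL])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with a missing or non-numeric p-value
        # A context cell may hold several consequences separated by ";".
        labels = {c.strip() for c in (row.get(CONTEXT_COL) or "").split(";")}
        if p < p_threshold and labels & wanted:
            hits.append((row[SNP_COL], row.get(GENE_COL) or "", p))
    return sorted(hits, key=lambda hit: hit[2])
```

To use it, open the downloaded file (the name is whatever you saved it as) with `open(path, newline="", encoding="utf-8")` and pass the handle to `protein_altering_hits`. The genes in the returned tuples are the candidates to carry forward to Exercise 2.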
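The Ensembl lookup suggested in Exercise 1 can also be done programmatically for a list of rs numbers. This is a minimal sketch, assuming the public Ensembl REST endpoint `GET /variation/:species/:id` at https://rest.ensembl.org and its JSON fields `name`, `most_severe_consequence` and `mappings` (each mapping carrying `location` and `allele_string`). The helper function names are hypothetical, and only the standard library is used.

```python
import json
import urllib.parse
import urllib.request

ENSEMBL_REST = "https://rest.ensembl.org"


def variation_url(rsid: str, species: str = "human") -> str:
    """Build the Ensembl REST URL for one variant, requesting JSON output."""
    path = f"/variation/{urllib.parse.quote(species)}/{urllib.parse.quote(rsid)}"
    return f"{ENSEMBL_REST}{path}?content-type=application/json"


def summarize_variation(record: dict) -> dict:
    """Keep the fields relevant to Exercise 1: the most severe predicted
    consequence and the genomic location(s) with their alleles."""
    return {
        "id": record.get("name"),
        "most_severe_consequence": record.get("most_severe_consequence"),
        "locations": [
            (m.get("location"), m.get("allele_string"))
            for m in record.get("mappings", [])
        ],
    }


def fetch_variation(rsid: str, species: str = "human") -> dict:
    """Fetch one variant record from Ensembl (requires network access)."""
    with urllib.request.urlopen(variation_url(rsid, species), timeout=30) as resp:
        return json.load(resp)
```

For an rs number taken from the filtered GWAS results, `summarize_variation(fetch_variation(rs))` reports whether Ensembl classifies it as, for example, a missense variant. The network call is kept separate from the parsing so that the parsing can be checked offline.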
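Several Exercise 2 questions (function, subcellular location, Pfam domains, PDB structures) can be answered with one query per gene to UniProt. The following is a minimal sketch, assuming the UniProt REST search endpoint `https://rest.uniprot.org/uniprotkb/search` with `query`, `fields` and `format=tsv` parameters, the query terms `gene:`, `organism_id:` and `reviewed:`, and the return-field names listed in `FIELDS`. All of these are assumptions to check against UniProt's API documentation. The helper names are hypothetical.

```python
import csv
import io
import urllib.parse
import urllib.request

UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"

# Assumed UniProt return fields covering the Exercise 2 questions.
FIELDS = [
    "accession", "gene_primary", "cc_function",
    "cc_subcellular_location", "xref_pfam", "xref_pdb",
]


def uniprot_query_url(gene: str, taxon_id: int = 9606) -> str:
    """URL for reviewed entries of one gene in one organism (9606 = human)."""
    query = f"gene:{gene} AND organism_id:{taxon_id} AND reviewed:true"
    params = {"query": query, "fields": ",".join(FIELDS), "format": "tsv"}
    return f"{UNIPROT_SEARCH}?{urllib.parse.urlencode(params)}"


def parse_uniprot_tsv(text: str) -> list[dict]:
    """Parse tab-separated output into one dict per entry, keyed by column header."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def fetch_gene(gene: str) -> list[dict]:
    """Query UniProt for one gene (requires network access)."""
    with urllib.request.urlopen(uniprot_query_url(gene), timeout=30) as resp:
        return parse_uniprot_tsv(resp.read().decode("utf-8"))
```

Restricting the query to reviewed entries keeps the results to the curated record for each human gene, which is usually the one to read for function and subcellular location.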
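The question "are these domains known to bind small molecules?" amounts to a set intersection between each protein's Pfam domains and a list of small-molecule-binding domains. The following minimal sketch uses the Pfam accessions given in the answers. The `KNOWN_BINDING` set contains only the two domains that the answers identify as small-molecule binding (PF00104, PF07690). It is not the full PPDMS list. In particular, zf-C4 and PPARgamma_N are left out only because the answers do not say they are on it, so substitute the full list from the PPDMS page for a real analysis.

```python
# Pfam domains per gene, as given in the answers.
GENE_DOMAINS = {
    "KCNJ11": {"PF01007"},
    "PPARG": {"PF00104", "PF00105", "PF12577"},
    "SLC30A8": {"PF01545"},
    "SLC16A11": {"PF07690"},
    "THADA": {"PF10350"},
}

# Only the domains the answers name as known small-molecule binders;
# replace with the full PPDMS list for real use.
KNOWN_BINDING = {"PF00104", "PF07690"}


def ligandable_domains(
    gene_domains: dict[str, set[str]], known: set[str]
) -> dict[str, set[str]]:
    """Map each gene to those of its domains that appear in the known-binding list."""
    return {gene: domains & known for gene, domains in gene_domains.items()}


for gene, hits in ligandable_domains(GENE_DOMAINS, KNOWN_BINDING).items():
    print(gene, sorted(hits) if hits else "no known small-molecule-binding domain")
```

With these inputs, only PPARG (PF00104) and SLC16A11 (PF07690) are flagged, which matches the answers.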