Target Identification – Hands On
Part 1
Aroon Hingorani/Anna Gaulton
Explore the GWAS catalogue
• The GWAS catalogue collates SNP–trait association
data from published genome-wide association studies. It
currently contains information from more than 2000
publications. For each association it also reports the
significance (p-value), the location of the variant (e.g.,
protein-coding, intergenic, intronic) and the gene or genes
mapped to that location.
• https://www.ebi.ac.uk/gwas/
Searching the GWAS catalogue (screenshot annotations)
• Search for a disease/trait of interest; results are listed as studies,
associations and catalog traits matching the search term
• Associations table: download for further filtering/sorting, alter the
p-value threshold, refine traits
• Gene columns: gene assigned by author, gene assigned by GWAS catalog
• Type/location of variant: e.g., missense changes the protein sequence,
intergenic = between genes
Exercise 1
• Search the GWAS catalogue for a disease of interest:
• E.g., ‘coronary heart disease’ or ‘type 2 diabetes’
• Filter the results to include only those associations with a p-value less than 10^-8
• Identify SNPs that are within genes and likely to affect protein
sequence (e.g., missense or frameshift mutations)
• Hint: it may help to download the results to Excel and sort the data there
• Identify the genes that these SNPs lie within
• You can also search with the rs number in Ensembl if you need
more information about the location of the SNP:
http://www.ensembl.org/index.html
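The filtering steps in Exercise 1 can also be sketched in code rather than in Excel. The example below is a minimal sketch, assuming the download is a tab-separated file whose header row includes columns named P-VALUE, CONTEXT, MAPPED_GENE and SNPS, and that protein-altering variants are labelled missense_variant or frameshift_variant. These names are assumptions; check them against the header of your own download. The function name protein_altering_hits and the rows in EXAMPLE are hypothetical placeholders, not real associations.

```python
import csv
import io

# Column names are assumptions about the layout of the downloaded file;
# adjust them to match the header row of your own export.
P_COL, CONTEXT_COL, GENE_COL, SNP_COL = "P-VALUE", "CONTEXT", "MAPPED_GENE", "SNPS"

# Variant types treated as protein-altering for this exercise (assumed labels).
PROTEIN_ALTERING = {"missense_variant", "frameshift_variant"}


def protein_altering_hits(tsv_text, p_threshold=1e-8):
    """Return (snp, gene, p_value) tuples for significant protein-altering SNPs."""
    hits = []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        try:
            p_value = float(row[P_COL])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with a missing or unparseable p-value
        context = (row.get(CONTEXT_COL) or "").strip()
        if p_value < p_threshold and context in PROTEIN_ALTERING:
            hits.append((row[SNP_COL], row[GENE_COL], p_value))
    # Most significant associations first.
    return sorted(hits, key=lambda hit: hit[2])


# Placeholder rows for illustration only; these are not real associations.
EXAMPLE = (
    "SNPS\tMAPPED_GENE\tCONTEXT\tP-VALUE\n"
    "rsEXAMPLE1\tGENE_A\tmissense_variant\t3E-12\n"
    "rsEXAMPLE2\tGENE_B\tintron_variant\t1E-20\n"
    "rsEXAMPLE3\tGENE_C\tmissense_variant\t5E-6\n"
    "rsEXAMPLE4\tGENE_D\tframeshift_variant\t2E-9\n"
)

for snp, gene, p_value in protein_altering_hits(EXAMPLE):
    print(snp, gene, p_value)
```

To use it on a real download, read the file into a string (for example with open(path, encoding="utf-8").read()) and pass that in place of EXAMPLE. The intronic row and the row above the p-value threshold are dropped, which mirrors the two filters in the exercise.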
Protein Sequence/Structure Information
• UniProt contains sequence and functional information for proteins
(e.g., Gene Ontology annotation, protein family information,
subcellular location, expression, interactions)
www.uniprot.org
• PDBe contains protein structure information: www.ebi.ac.uk/pdbe
Search options (screenshot annotations): keyword/accession search, sequence search
Domain/Family Annotation
• InterPro (protein family database which contains signatures for
different protein families and domains from a number of different
resources) www.ebi.ac.uk/interpro
• Pfam (protein domain database) http://pfam.xfam.org
• A list of Pfam domains believed to bind small molecules:
https://www.ebi.ac.uk/chembl/research/ppdms/pfam_maps/evidence/
Exercise 2
• Find out more about the proteins encoded by these
genes
• What are their functions? Is there a known link to the
disease of interest?
• What are their subcellular locations?
• What protein families do they belong to? What domains do
they contain?
• Are these domains known to bind small molecules?
• Do they have structures available in the Protein Data Bank?
Answers
• KCNJ11 (also called Kir6.2)
Inward rectifier K+ channel; forms an ATP-sensitive K+ channel with the ABCC9
subunit (sulfonylurea receptor 2). Known link to several diabetic
conditions. Membrane protein, found at the plasma membrane. Contains the IRK
(PF01007) domain, which is not on the PPDMS list of known
small-molecule-binding domains. No crystal structure, but structures are
available for other family members (e.g., 3sya).
• PPARG (Peroxisome proliferator-activated receptor gamma/NR1C3)
Nuclear hormone receptor, transcription factor and key regulator of
adipocyte differentiation and glucose homeostasis. Known link to
diabetes. Found in both the cytoplasm and nucleus. Contains
Hormone_recep (PF00104), zf-C4 (PF00105) and PPARgamma_N
(PF12577) domains. Hormone_recep is a known ligand-binding domain.
Many crystal structures for the ligand-binding domain and some for the
near-complete protein (e.g., 3dzy).
• SLC30A8 (Zinc transporter 8)
Transporter (solute carrier). Possible link to diabetes: zinc-efflux
transporter, may be a major component for providing zinc to insulin
maturation and/or storage processes in insulin-secreting pancreatic
beta-cells. Found at plasma membrane and secretory vesicle
membranes. Contains Cation_efflux (PF01545) domain – not in the PPDMS list
of known small-molecule-binding domains. No crystal structure, but
structures are available for a distantly related E. coli zinc transporter.
• SLC16A11 (Monocarboxylate transporter 11)
Transporter (solute carrier). Genetic variants linked to diabetes risk.
Located in endoplasmic reticulum membrane. Contains MFS_1
(PF07690) domain – this is a known small-molecule-binding domain. No
crystal structure – only a distantly related E. coli multidrug transporter.
• THADA (Thyroid adenoma-associated protein)
Novel/uncharacterised family. No information to link to diabetes apart
from expression in pancreas. Subcellular location not known. Contains
DUF2428 (PF10350) domain – not a known small-molecule-binding domain. No
crystal structure and no related structures.
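The answers above can be collected into a small table so the candidate targets are easy to compare. The sketch below is illustrative only: the class name TargetSummary, the function tractability_flags and the field names are hypothetical. The domain accessions and structure notes are copied from the answers, and only the two domains that the answers describe as ligand-binding or small-molecule-binding (PF00104 and PF07690) are treated as such.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetSummary:
    gene: str
    pfam_domains: tuple   # Pfam accessions listed in the answers
    has_structure: bool   # True if a crystal structure of the protein itself is noted


# Domains the answers describe as ligand- or small-molecule-binding.
SMALL_MOLECULE_BINDING = {"PF00104", "PF07690"}

TARGETS = [
    TargetSummary("KCNJ11", ("PF01007",), False),
    TargetSummary("PPARG", ("PF00104", "PF00105", "PF12577"), True),
    TargetSummary("SLC30A8", ("PF01545",), False),
    TargetSummary("SLC16A11", ("PF07690",), False),
    TargetSummary("THADA", ("PF10350",), False),
]


def tractability_flags(target):
    """Summarise whether a target has a small-molecule-binding domain and a structure."""
    binds = any(domain in SMALL_MOLECULE_BINDING for domain in target.pfam_domains)
    return {
        "gene": target.gene,
        "small_molecule_domain": binds,
        "structure": target.has_structure,
    }


for target in TARGETS:
    print(tractability_flags(target))
```

On these inputs only PPARG is flagged for both a small-molecule-binding domain and an available structure, while SLC16A11 has the domain but no structure of its own. The flags are a convenience for comparing the answers, not a tractability score.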