Lecture 6.3: From DNA to Protein
Dr. Joanne Fox
Day 6: Saturday February 21st, 2004, 13:45 – 15:15

Objectives
• Review protein sequence features and databases
• Review the structural diversity of amino acids and protein sequences
• Highlight several physicochemical and structural features which can be calculated from protein sequences
• Show how proteomics utilizes methods and techniques for measuring, comparing and assessing protein features

Outline
• Protein sequence features
• Databases of protein sequences
• Basics of protein structure
  – 1° structure, prediction of Mw and pI
  – 2° structure, prediction methods
  – 3° structure, methods for predicting folds
• Proteomics
  – Current methods
  – Cutting edge technology

Amino Acids
[Figure: general formula of an amino acid, labelled with the amino group (H3N+), alpha carbon, carboxyl group and side chain group (R)]
• The general formula for an amino acid
• R is commonly one of 20 different side chains
• At pH 7 both the amino and carboxyl groups are ionized

Peptide Bonds
• Amino acids are joined together by an amide linkage called a peptide bond.
• The two bonds on either side of the rigid planar peptide unit exhibit a high degree of rotation
[Figure: tetrapeptide backbone with side chains R1 to R4, marking the peptide bonds and the bonds where rotation occurs]

Families of Amino Acids
• The common amino acids are grouped according to whether their side chains are:
  – acidic: D, E
  – basic: K, R, H
  – uncharged polar: N, Q, S, T, Y
  – nonpolar: G, A, V, L, I, P, F, M, W, C
• Hydrophilic (uncharged polar) amino acids are usually on the outside of a protein, whereas nonpolar residues cluster on the inside
• Basic and acidic amino acids are very polar and are generally found on the outside of protein molecules

Protein Sequence Features
• Proteins exhibit far more sequence and chemical complexity than DNA or RNA
• Properties and structure are defined by the sequence and side chains of their constituent amino acids
• The “engines” of life
• >95% of all drugs target proteins
• Favorite topic of the post-genomic era

Protein Sequence Databases
• Where does protein sequence information reside?
  – Entrez Cross Database Search
    • http://www.ncbi.nlm.nih.gov/gquery/gquery.fcgi
  – Swiss-Prot & TrEMBL
    • http://ca.expasy.org/sprot/
  – PIR
    • http://pir.georgetown.edu/
• As of December 2003, all of this information is integrated into a unified protein database called UniProt.
  – UniProt
    • http://www.pir.uniprot.org/

Entrez Cross Database Search
• Protein: the sequence database gives access to translated protein sequences from GenBank/EMBL/DDBJ
• Complete set of deduced protein sequences
• Redundancy problem

Swiss-Prot & TrEMBL
• Swiss-Prot is an expert curated database
  – Function, domain structure, post-translational modifications, variants, reactions, similarities
• TrEMBL (translated EMBL)
  – Computer annotated supplement to Swiss-Prot

PIR – Protein Information Resource
• Annotated database which includes protein family classification information

The UniProt Knowledgebase
• Contains all of the information in Swiss-Prot, TrEMBL, and PIR. This new unified database was launched in December 2003.

Basics of Protein Structure
• Primary
• Secondary
• Tertiary
[Figure: primary structure, ACDEFGHIKLMNPQRSTVWY]

Molecular Weight
• Quick formula: Mw ≈ 110 × number of residues (in Da)
• Accurate determination of mass by mass spectrometry
• Tools exist for accurately calculating the mass of peptides based on amino acid composition

Molecular Weight & Proteomics
[Figures: 2-D gel; QTOF mass spectrometry]

Isoelectric Point
• The pH at which a protein has a net charge of 0

pKa Values for Ionizable Amino Acids
Residue  pKa      Residue  pKa
C        10.28    H        6
D        3.65     K        10.53
E        4.25     R        12.43

Basics of Protein Structure
• Primary
• Secondary
• Tertiary
[Figure: primary structure, ACDEFGHIKLMNPQRSTVWY]

Common Secondary Structure Elements
• The Alpha Helix

Common Secondary Structure Elements
• The Beta Sheet

Secondary Structure: Phi & Psi Angles Defined
• Rotational constraints emerge from interactions with bulky groups (i.e. side chains).
• Phi & Psi angles define the secondary structure adopted by a protein.
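
Looking back at the Molecular Weight and Isoelectric Point slides, both quantities can be estimated directly from a sequence. The sketch below is only an illustration: it uses the quick 110-per-residue formula and the pKa table from the slides, and it assumes terminal pKa values (8.6 for the N-terminus, 3.6 for the C-terminus) that are not given in the lecture. The function names are hypothetical, and tyrosine is ignored because the slide table does not list it.

```python
# pKa values copied from the lecture's table of ionizable residues.
PKA_BASIC = {"H": 6.0, "K": 10.53, "R": 12.43}    # protonated form carries +1
PKA_ACIDIC = {"C": 10.28, "D": 3.65, "E": 4.25}   # deprotonated form carries -1

# ASSUMPTION: terminal pKa values are not in the lecture; these are placeholders.
N_TERM_PKA = 8.6
C_TERM_PKA = 3.6


def quick_mw(seq: str) -> int:
    """Quick estimate from the lecture: 110 Da per residue."""
    return 110 * len(seq)


def net_charge(seq: str, ph: float) -> float:
    """Net charge at a given pH using Henderson-Hasselbalch fractions."""
    positive = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))
    negative = 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))
    for aa in seq.upper():
        if aa in PKA_BASIC:
            positive += 1.0 / (1.0 + 10 ** (ph - PKA_BASIC[aa]))
        elif aa in PKA_ACIDIC:
            negative += 1.0 / (1.0 + 10 ** (PKA_ACIDIC[aa] - ph))
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisect on pH in [0, 14]; net charge only decreases as pH rises."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(seq, mid) > 0:
            low = mid    # still positive: the pI lies at higher pH
        else:
            high = mid   # zero or negative: the pI lies at lower pH
    return (low + high) / 2.0
```

For the 20-residue example sequence ACDEFGHIKLMNPQRSTVWY, quick_mw returns 20 × 110 = 2200. The pI value depends on the assumed terminal pKa values, so treat it as a rough estimate rather than the output of a curated tool.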

Ramachandran Plot

Supersecondary Structure

Secondary Structure & Protein Folding
• Understanding the forces of hydrophobicity
[Figure: an unfolded or partially folded polypeptide, carrying polar and nonpolar side chains, next to its folded conformation. The hydrophobic core contains the nonpolar side chains; hydrogen bonds can form with the polar side chains on the outside of the protein.]

Hydrophobicity is a property which can be calculated for protein sequences
• Hydrophobicity scales:
  – Used to calculate hydrophobicity
  – Based on experimental evidence indicating the hydrophobic/hydrophilic properties of each amino acid
• Solubility, stability, location and/or globularity of protein sequences can be predicted

Kyte/Doolittle Hydrophobicity Scale
Residue  Hphob    Residue  Hphob
A         1.8     M         1.9
C         2.5     N        -3.5
D        -3.5     P        -1.6
E        -3.5     Q        -3.5
F         2.8     R        -4.5
G        -0.4     S        -0.8
H        -3.2     T        -0.7
I         4.5     V         4.2
K        -3.9     W        -0.9
L         3.8     Y        -1.3

Hydrophobicity Profile
• Moving segment approach
• Correlation of this technique with 3D structure
[Figure: hydrophobicity score (about -4 to +3; hydrophobic positive, hydrophilic negative) plotted against position in the protein sequence (1 to 301, NH2 to COOH), with interior and exterior residues marked]

The α-helix is a common secondary structure element
• A helical wheel is a representation of the 3D structure of the α-helix.
• Projection of amino acid side chains onto a plane perpendicular to the axis of the helix
• Hydrophobic arcs stabilize helical interactions
• Amphipathic helices are common
[Figure: helical wheel with acidic and nonpolar regions labelled]

Secondary Structure Prediction
• The presence of secondary structure elements can be predicted.
• Current algorithms rely on:
  – statistics (Chou-Fasman, GOR)
  – homology or nearest neighbor comparisons (Levin)
  – physico-chemical properties (Lim, Eisenberg)
  – pattern matching (Cohen, Rooman)
  – neural networks (Qian & Sejnowski, Karplus)
  – evolutionary methods (Barton, Niemann)
  – and combined approaches (Rost, Levin, Argos)

Chou-Fasman Algorithm
1. Assign each residue a Pa, Pb, Pc value
2. Take a window of 7 residues and calculate a window-averaged value for each of Pa, Pb, Pc
3. Assign the average value for each of the secondary structures to the middle residue
4. Move down one residue and repeat steps 2 and 3 until finished
5. Scan the sequence and assign each residue the secondary structure (SS) with the highest averaged P

Chou-Fasman Statistics
Table 8. Chou & Fasman Secondary Structure Propensity of the Amino Acids
Residue  Pa    Pb    Pc       Residue  Pa    Pb    Pc
A        1.42  0.83  0.75     M        1.45  1.05  0.50
C        0.70  1.19  1.11     N        0.67  0.89  1.44
D        1.01  0.54  1.45     P        0.57  0.55  1.88
E        1.51  0.37  1.12     Q        1.11  1.10  0.79
F        1.13  1.38  0.49     R        0.98  0.93  1.09
G        0.57  0.75  1.68     S        0.77  0.75  1.48
H        1.00  0.87  1.13     T        0.83  1.19  0.98
I        1.08  1.60  0.32     V        1.06  1.70  0.24
K        1.16  0.74  1.10     W        1.08  1.37  0.45
L        1.21  1.30  0.49     Y        0.69  1.47  0.84

The PHD Approach
[Figure: PRFILE...]

The PHD Algorithm
• Search the SWISS-PROT database and select high scoring homologues
• Create a sequence “profile” from the resulting multiple alignment
• Include global sequence info in the profile
• Input the profile into a trained two-layer neural network to predict the structure and to “clean up” the prediction

Predicting via Neural Nets & PSSM
• PHDhtm
  – http://www.embl-heidelberg.de/predictprotein/
• TMAP
  – http://www.mbb.ki.se/tmap/index.html
• TMPred
  – http://www.ch.embnet.org/software/TMPRED_form.html
[Figure: ACDEGF...]
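
The moving segment approach of the hydrophobicity profile and the window steps of the Chou-Fasman slide share the same core operation: average a per-residue value over a sliding window and attach it to the middle residue. The sketch below illustrates that idea with the Kyte/Doolittle and Chou & Fasman tables from the slides. It is a simplified teaching version, not the full published Chou-Fasman method. Several choices are assumptions made for illustration: reading Pa, Pb and Pc as helix (H), sheet (E) and coil (C), truncating windows at the sequence ends, reusing a window of 7 for the hydrophobicity profile, and all function names.

```python
# Kyte/Doolittle hydrophobicity values, as tabulated in the lecture.
KD = {"A": 1.8, "C": 2.5, "D": -3.5, "E": -3.5, "F": 2.8, "G": -0.4,
      "H": -3.2, "I": 4.5, "K": -3.9, "L": 3.8, "M": 1.9, "N": -3.5,
      "P": -1.6, "Q": -3.5, "R": -4.5, "S": -0.8, "T": -0.7, "V": 4.2,
      "W": -0.9, "Y": -1.3}

# Chou & Fasman propensities as (Pa, Pb, Pc), as tabulated in the lecture.
CF = {"A": (1.42, 0.83, 0.75), "C": (0.70, 1.19, 1.11),
      "D": (1.01, 0.54, 1.45), "E": (1.51, 0.37, 1.12),
      "F": (1.13, 1.38, 0.49), "G": (0.57, 0.75, 1.68),
      "H": (1.00, 0.87, 1.13), "I": (1.08, 1.60, 0.32),
      "K": (1.16, 0.74, 1.10), "L": (1.21, 1.30, 0.49),
      "M": (1.45, 1.05, 0.50), "N": (0.67, 0.89, 1.44),
      "P": (0.57, 0.55, 1.88), "Q": (1.11, 1.10, 0.79),
      "R": (0.98, 0.93, 1.09), "S": (0.77, 0.75, 1.48),
      "T": (0.83, 1.19, 0.98), "V": (1.06, 1.70, 0.24),
      "W": (1.08, 1.37, 0.45), "Y": (0.69, 1.47, 0.84)}

STATES = "HEC"  # ASSUMPTION: Pa = helix (H), Pb = sheet (E), Pc = coil (C)


def window_average(values, width=7):
    """Average over a window centred on each position.

    ASSUMPTION: windows are truncated at the sequence ends, so the first
    and last residues are averaged over fewer than `width` values.
    """
    half = width // 2
    averaged = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        averaged.append(sum(chunk) / len(chunk))
    return averaged


def hydrophobicity_profile(seq, width=7):
    """Moving-segment Kyte/Doolittle score for each residue."""
    return window_average([KD[aa] for aa in seq], width)


def chou_fasman(seq, width=7):
    """Assign H, E or C to each residue from window-averaged propensities."""
    # One averaged track per state, then pick the highest at each residue.
    tracks = [window_average([CF[aa][k] for aa in seq], width)
              for k in range(3)]
    return "".join(STATES[max(range(3), key=lambda k: tracks[k][i])]
                   for i in range(len(seq)))
```

A stretch of alanines comes out as all H and a stretch of valines as all E, which matches their dominant propensities in the table. Real sequences give noisier assignments, which is one reason the later methods in the lecture add homology and neural networks.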

Prediction Performance
[Figure: bar chart of prediction scores (%), axis from 45 to 75, for PHD, ZHANG, GOR III, JASEP7, PTIT, LEVIN, LIM, GOR I and CF]

Best of the Best
• PredictProtein-PHD (72%)
  – http://cubic.bioc.columbia.edu/predictprotein
• Jpred (73-75%)
  – http://www.compbio.dundee.ac.uk/~www-jpred/
• PREDATOR (75%)
  – http://www.hgmp.mrc.ac.uk/Registered/Option/predator.html
• PSIpred (77%)
  – http://bioinf.cs.ucl.ac.uk/psipred/

Basics of Protein Structure
• Primary
• Secondary
• Tertiary
[Figure: primary structure, ACDEFGHIKLMNPQRSTVWY]

Tertiary Structure
[Figures: Lactate Dehydrogenase (mixed α/β); Immunoglobulin Fold (β); Hemoglobin B Chain (α)]

Protein Structure Databases
• Where does protein structural information reside?
  – PDB:
    • http://www.rcsb.org/pdb/
  – MMDB:
    • http://www.ncbi.nlm.nih.gov/Structure/
  – FSSP:
    • http://www.ebi.ac.uk/dali/fssp/
  – SCOP:
    • http://scop.mrc-lmb.cam.ac.uk/scop/
  – CATH:
    • http://www.biochem.ucl.ac.uk/bsm/cath_new/

Structural Proteomics
• Aim to delineate the total repertoire of protein folds
• Provide 3D portraits for all proteins in an organism
• Goal: use structure to infer function.
  – Compare the structure of an unknown protein to a known set of structures
  – More sensitive than primary sequence comparisons

The Protein Fold Universe: How Big Is It?
500? 2000? 10000? ∞?

Structures in PDB
• PDB = 19860 structures (Jan 03)
• PDB = 23997 structures (Jan 04)
• “structural genomics” search = 156 structures (Jan 03)
• “structural genomics” search = 478 structures (Jan 04)

[Figure: number of sequences and number of structures by year, 1980 to 2000, on two vertical axes (0 to 500000 and 0 to 100000)]

Structural Proteomics
[Figure: unique folds in PDB]

Prediction Methods for 3D structure
• Intermediate steps
  – Predict secondary structure
  – Calculate solvent accessibility
• Methods for 3D structure prediction based on:
  – Threading, homology modeling or fold recognition
    • Similarity in amino acid sequence implies similar structure/function
  – Ab initio techniques
    • Numerical methods designed to simulate the structure and dynamics of macromolecules

Proteomics
• The study of the expression, location, interaction, function and structure of all the proteins in a given cell or organism
• Expressional Proteomics
• Functional Proteomics
• Structural Proteomics

Proteomics
• Expressional Proteomics
  – 2D or capillary electrophoresis, protein chips
  – Mass spectrometry, laser induced fluorescence
• Functional Proteomics
  – Mass spectrometry, micro-assays, protein chips
  – Yeast or bacterial 2-hybrid systems
• Structural Proteomics
  – High throughput X-ray crystallography
  – High throughput NMR spectroscopy

2D Gel Principles
[Figure: SDS PAGE]

Mass Spec Principles
[Figure: sample, ionizer (+/−), mass filter, detector]

Ionization Methods
[Figure: MALDI (370 nm UV laser, cyano-hydroxy cinnamic acid) and ESI (fluid with no salt, gold tip needle, +/−)]

Protein ID Protocol

Computational Tools for Protein Identification
• PeptIdent
  – http://us.expasy.org/tools/peptident.html
• Mascot
  – http://www.matrixscience.com/search_form_select.html
• ProteinProspector
  – http://prospector.ucsf.edu/
• MOWSE
  – http://srs.hgmp.mrc.ac.uk/cgi-bin/mowse
• PeptideSearch
  – http://www.mann.embl-heidelberg.de/GroupPages/PageLink/peptidesearchpage.html
• AACompSim/AACompIdent
  – http://www.expasy.ch/tools
Covered in Lab 6.4

Proteomics
• Human proteome estimated to contain 500,000+ proteins
• The next “big wave” in bioinformatics
  – How to deal with so much data?
  – How to link structure to function to sequence?
  – How to show or store temporal and spatial data?
  – How to use it in drug discovery & development?
Proteomics Workshop, July 19 – 24th, 2004, Calgary, Alberta

The Cutting Edge of Proteomics
• Evolution of proteomes
• Structural genomics
• Quantitative mass spectrometry and protein chip technology
• Chemical proteomics
• Proteome-scale analysis of networks, e.g. signal transduction, Y2H experiments

Global Proteome Interaction Mapping in C. elegans
Science 23 January 2004 303: 540
see also Science 7 January 2000 287: 116

Yeast Two Hybrid (Y2H) on the genomic scale
• Global interaction map of C. elegans
• Use the proteome as bait in a Y2H experiment
• Detect all pairwise interactions
• Create a global protein:protein interaction network

Protein:Protein Interaction Networks

DNA vs Protein Chip Technology
• DNA microarray technology
  – Can successfully read 1000’s of side-by-side measurements of RNA levels
  – BUT RNA ≠ protein = function
• Protein microarray technology
  – Goal: develop a protein chip with proteins in an active state.
    • Proteins are more challenging to prepare than DNA/RNA
    • Protein functionality depends on state, modifications, binding partners, localization etc.

Protein Chip - Methods
• Attachment methods:
  – Diffusion
  – Absorption
    • nitrocellulose
  – Covalent crosslinking
    • Reactive surfaces
  – Affinity attachment
    • Affinity tags

Protein Chip - Applications
• Antibody chip
  – Detect Ag-Ab interactions
• Protein chip
  – Protein:protein
  – Protein:drug
  – Enzyme:substrate
• Ligand chip
• And more…
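
Returning briefly to the genome-scale Y2H slides: once pairwise interactions have been detected, building the global protein:protein interaction network amounts to storing them as a graph. The sketch below shows one minimal way to do that with an adjacency map. The protein identifiers are hypothetical placeholders rather than results from any real screen. Treating a bait-prey hit as a symmetric interaction is an assumption, and the function names are invented for illustration.

```python
from collections import defaultdict


def build_network(pairs):
    """Build an undirected interaction network as an adjacency map."""
    network = defaultdict(set)
    for bait, prey in pairs:
        network[bait].add(prey)
        network[prey].add(bait)  # ASSUMPTION: bait-prey hits are symmetric
    return dict(network)


def hubs(network, min_degree=2):
    """Proteins with at least `min_degree` distinct interaction partners."""
    return sorted(protein for protein, partners in network.items()
                  if len(partners) >= min_degree)


# Hypothetical (bait, prey) hits; a repeated hit does not add a second edge.
pairs = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P4", "P1"), ("P1", "P2")]
network = build_network(pairs)
```

Here hubs(network, 3) returns only P1, the one protein with three distinct partners. Using sets as adjacency lists means duplicate hits collapse automatically, which is convenient when the same pair is recovered more than once.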

Protein Chips

Summary
• Protein sequences, and consequently protein sequence databases, are much more complex than DNA sequences
• Prediction of protein structure is a complex problem at both the secondary and tertiary (3D) levels
• Proteomics initiatives based on different technologies are making inroads into the study of protein structure and function on a global level