ID: NAME:

Exam for Applied Bioinformatics
International Masters in Molecular BioEngineering, TU Dresden
Prof. Michael Schroeder
19.2.2009

Questions

Please remove from your desk everything except your pen, this document, and the provided scratch paper. Please place your student ID card clearly visible on your desk.

You have 90 minutes to answer the questions. Please mark each page of this document with your name and student ID. All answers must be written in this document, but you can use additional scratch paper for preparing your answers.

Please read all questions before you start answering. The exam consists of three independent parts. Within each part, the first questions are typically easier than the last questions.

Part 1: Dot plots and sequence alignment (each question is 20%)

a) Fill in the dot plot below, and indicate a region for which the sequences align well.

b) Briefly explain how to compute sequence alignments with dynamic programming. How do you set the scoring scheme to compute the longest common subsequence?

c) Fill in the dynamic programming matrix below according to your scoring scheme in 1b, write down the alignment(s) obtained, and state the length of the longest common subsequence:

Q R R Q Y 0 Q 0 R 0 Q 0 0 0 0 0 0

d) Below are the sequences of two SH3-domain-carrying proteins, Vinexin (human) and Sorbin (mouse), and a dot plot of these two sequences. The dot plot is computed using a window size of 6 and allowing at most 3 mismatches.

Write down an alignment of the first short region (~10 amino acids) of high sequence similarity as seen on the dot plot. How many mismatches are there?

How long is the SH3 domain (roughly), and how many regions of high sequence similarity of more than 12 amino acids are there?

e) RNA folds into loops and stems. Stems arise because regions of the RNA match their reverse complement, which does not happen for the loops.
Below you find three (predicted) RNA structures S1, S2, S3 and three dot plots D1, D2, D3 of the sequences against their reverse complement. An arrow indicates the start of the sequence on the structures. The dot plots are computed using a window size of 3 and allowing at most 1 mismatch.

Briefly explain why some blocks in the dot plots are uniformly black, others have chequerboard motifs, and others have a diagonal line motif.

Which structures belong to which dot plots? (Write three pairs: Sx → Dy.) Briefly justify your answer.

To which stem (stem 1 and stem 2 in S1, stem 3 in S2, or stem 4 in S3) does the block b1 in dot plot D2 correspond? Indicate on the dot plot (draw a circle around it) which other block corresponds to this stem.

Part 2: Phylogeny (each question is 20%)

a) Below is the multiple sequence alignment of four sequences:

seq1 IVFLGE
seq2 LVLLGEAV
seq3 LVLLGDSVG
seq4 VIILGDSTV

        seq1    seq2    seq3    seq4
seq1
seq2
seq3
seq4

Suppose that a gap scores -1, a mismatch -1, and a match +2. Fill in the matrix with the pairwise scores.

b) Explain hierarchical clustering:

c) Cluster the sequences using single linkage, draw the corresponding phylogenetic tree, and put the scores on the tree at each junction:

d) Briefly explain how bootstrapping works:

e) As part of bootstrapping, the following 4 trees were generated:

Copy your tree from c) and add to it the bootstrap values according to the four trees:

Which non-trivial cluster is the most reliable one, and why?

Part 3: Structure (each question is 20%)

a) What is RMSD and what is it used for?

b) Proteins are not rigid, but flexible. How could the RMSD definition be modified to cope with flexibility? Give advantages and disadvantages of the basic RMSD definition and of your modification.

c) Briefly explain what a Ramachandran plot is.
How can it be used to classify structural domains into the classes all alpha, all beta, alpha+beta (mainly antiparallel beta sheets, segregated alpha and beta regions), and alpha/beta (mainly parallel beta sheets, beta-alpha-beta units)?

d) Below you find three Ramachandran plots R1, R2, R3, and three structures S1, S2, S3. The order is mixed up. Note: All Ramachandran plots have the same background (overall distribution across all structures). What distinguishes them are the dots that represent all (Phi, Psi) pairs for a given structure.

Which plots and structures belong together? (Provide three Rx → Sy pairs.)

e) We want to build a hierarchical classification for protein structures. Briefly explain how to use structure alignment and hierarchical clustering for this purpose.

END OF EXAM

EXTRA SPACE FOR ANSWERS: (please refer to exam part and question)