HW 5: Structural phylogenomic analysis: Voltage-gated Potassium channel membrane topology prediction
Answer key and motivation for steps in the homework
Bioe 190 Fall 2016

Notes on this answer key: You’ll see text in red sprinkled throughout this answer key, along with a few figures. The figures come from a solution provided by Katelyn Greene. Katelyn went far above what was required, but I’m grateful to her for this work. The text in black is from the original homework assignment (which has been edited down a bit for compactness).

1. Go to UniProt and retrieve and examine the record for KCNA1_HUMAN.

a. Draw the membrane topology for KCNA1_HUMAN, given the SwissProt annotation. (For the purposes of this lab, you can assume the SwissProt topology annotation, including the location of transmembrane domains and intramembrane segments, is roughly correct.)
Purpose: To gain experience in diagramming membrane proteins. The standard approach, as presented in lecture, for depicting membrane proteins is a “snake diagram” (so-called because the protein snakes in and out of the membrane). Katelyn’s solution (shown below) goes above and beyond what I expected: it’s publication quality. For the purposes of this homework, I was happy to accept pictures (taken with a cell phone) of simple hand drawings.

b. Submit the sequence to Pfam. Include this information on your membrane topology drawing …
i. Confirm that the Ion_trans domain spans the entire region from TM segment S1 to S6.
Purpose: To understand the correspondence between this important Pfam domain and the membrane-spanning region.

c. Submit the sequence to TMHMM and compare the results with the TM domains and topology listed in the SwissProt record. Note which TM domains (or intramembrane segments) are identified by both SwissProt and TMHMM; note any disagreements.
Purpose: To understand the common errors produced by transmembrane prediction tools.
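The comparison asked for in 1c can also be done programmatically. The following is a minimal sketch, assuming each annotation source has been reduced by hand to a list of 1-based inclusive residue ranges. The function names, the `min_overlap` threshold and the coordinates are all hypothetical and chosen for illustration; they are not the actual KCNA1_HUMAN annotation or TMHMM output.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # 1-based, inclusive residue range


def overlap(a: Segment, b: Segment) -> int:
    """Number of residues shared by two inclusive ranges."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def compare_segments(reference: List[Segment],
                     predicted: List[Segment],
                     min_overlap: int = 5):
    """Sort segments into matched pairs, missed reference segments,
    and extra predictions with no reference counterpart.

    min_overlap is an arbitrary illustrative threshold (an assumption).
    """
    matched, missed = [], []
    used = set()
    for ref in reference:
        hits = [i for i, p in enumerate(predicted)
                if overlap(ref, p) >= min_overlap]
        if hits:
            used.update(hits)
            matched.append((ref, predicted[hits[0]]))
        else:
            missed.append(ref)
    extra = [p for i, p in enumerate(predicted) if i not in used]
    return matched, missed, extra


# Made-up coordinates: the middle reference segment has no predicted
# counterpart, and the last prediction has no reference counterpart.
reference = [(10, 30), (50, 70), (90, 110)]
predicted = [(12, 32), (95, 112), (150, 170)]

matched, missed, extra = compare_segments(reference, predicted)
print("matched:", matched)
print("missed by predictor:", missed)
print("extra predictions:", extra)
```

A segment that lands in `missed` corresponds to the situation described in the answer to this question, where the prediction tool skips a helix that the curated record contains.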
Answer: Most of the proteins were problematic for TMHMM, which missed one or more TM helices. In most cases, the positively charged S4 segment (the voltage sensor) stumped TMHMM.

2. Run BLAST to find homologs in SwissProt using the UniProt BLAST server. Aim for 3 or 4 members per subfamily (with species names that you recognize and whose relative taxonomic relationships you understand).
Purpose: To gain experience in interpreting MSAs and phylogenetic trees.

3. Confirm that your selection includes the following UniProt IDs (if not, add them):
1. KCNA3_HUMAN
2. KCA10_HUMAN
3. KCNA7_HUMAN
4. KCNC1_HUMAN
5. KCNC4_HUMAN
6. KCNA1_ONCMY
7. KCNSK_CAEEL
8. KCNAB_DROME
9. KCNAG_CAEEL
Purpose: I chose these proteins because the SwissProt membrane segment annotations (the intramembrane and/or transmembrane segments) of these sequences disagree with the consensus.

4. Then click the Align button just above the sequence selection box.

5. Highlight the transmembrane and intramembrane regions using the control box at the left side of the screen. Create a series of figures, with a screenshot of each panel displaying either a TM domain or an intramembrane segment. You will immediately observe that some members (including all of the sequences listed above) disagree with an obvious consensus. Surprisingly, the sequence similarity in these regions (between members having the consensus topology and those disagreeing) can be really high, so you would expect them to have transmembrane and intramembrane segments in the same positions. As shown in Figure 3, the (annotated) membrane segment edges do not always align. Recall the Positive Inside Rule, and draw a rectangle around the consensus region for each TM and intramembrane segment. Your figure caption should include a title that indicates which transmembrane/intramembrane segments are displayed. Label each TM segment by the segment label in SwissProt (e.g., S1, S2, etc.).
Purpose: To gain experience in using a consensus annotation approach. Note that there are many different ways to derive a consensus. The most common is a majority-rule approach (as shown in the Pevsner text for secondary structure prediction). If you used a majority-rule consensus, you’d label a column as in the membrane if a majority of the sequences were labeled as in the membrane for that position. A strict consensus would require all the sequences to be labeled as in the membrane. As you can imagine, which rule you use depends on whether you want to optimize precision (use the strict consensus) or a balance between precision and recall (use the majority rule). But since you have a separate source of information, the Positive Inside Rule, you can use this to set the membrane segment border more precisely: if you see positively charged residues (K, R and H) at the cytoplasmic side of a membrane segment, you can position the end or beginning of the membrane segment so that those residues are just outside the membrane.

6. Now, examine the outliers. You should be able to see that each of the 9 sequences listed in point 3, above, diverges in fundamental ways from its homologs. For each protein in the list, explain in what way(s) it departs from the consensus.
Purpose: To understand the limitations of transmembrane prediction when the membrane segment does not agree with the model used by the prediction tool (i.e., that the TM segment should be hydrophobic). The S4 segment is the voltage sensor and includes positively charged residues; this throws the prediction tools off. The other problem is that the TM prediction tools don’t (generally) recognize intramembrane segments, and can mistake intramembrane segments for TM segments. Since the SwissProt curators make use of TM prediction webservers/tools, their annotations will include these errors.
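The majority-rule and strict consensus rules described under step 5, and the Positive Inside Rule border adjustment, can be sketched as follows. This is an illustrative sketch only: it assumes each sequence's annotation has been encoded as a string with one character per alignment column (`M` for a membrane column, `.` otherwise). The encoding, the function names and the toy data are assumptions, not something produced by the UniProt alignment tool.

```python
from typing import Dict, Tuple

POSITIVE = set("KRH")  # positively charged residues named in the text


def consensus(labels: Dict[str, str], rule: str = "majority") -> str:
    """Column-wise consensus of per-sequence membrane labels.

    labels maps a sequence ID to a string with one character per
    alignment column: 'M' = in the membrane, '.' = not in the membrane.
    """
    rows = list(labels.values())
    n, width = len(rows), len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("all label strings must have the same length")
    out = []
    for col in range(width):
        count = sum(r[col] == "M" for r in rows)
        if rule == "strict":
            in_membrane = count == n      # every sequence must agree
        elif rule == "majority":
            in_membrane = count > n / 2   # more than half must agree
        else:
            raise ValueError("rule must be 'strict' or 'majority'")
        out.append("M" if in_membrane else ".")
    return "".join(out)


def trim_cytoplasmic_end(seq: str, start: int, end: int,
                         cytoplasmic_end: str) -> Tuple[int, int]:
    """Shrink the half-open range [start, end) so that a run of K/R/H at
    the cytoplasmic boundary falls just outside the segment.

    A simplification: only the boundary run is moved, so charged
    residues inside the segment (as in S4) are left alone.
    """
    if cytoplasmic_end == "start":
        while start < end and seq[start] in POSITIVE:
            start += 1
    elif cytoplasmic_end == "end":
        while start < end and seq[end - 1] in POSITIVE:
            end -= 1
    else:
        raise ValueError("cytoplasmic_end must be 'start' or 'end'")
    return start, end


# Toy annotation for three aligned sequences (made-up data).
toy = {
    "seqA": "..MMMMMM..",
    "seqB": ".MMMMMM...",
    "seqC": "...MMMMM..",
}
print(consensus(toy, "majority"))  # wider segment
print(consensus(toy, "strict"))    # narrower segment
print(trim_cytoplasmic_end("AKRLLVVLLAG", 1, 9, "start"))
```

The strict rule never yields a wider segment than the majority rule, which is the precision versus recall trade-off mentioned above.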
(SwissProt also transfers annotations of membrane segments by homology: this is the most likely explanation for both the accurate annotations and the errors.)

Here’s what you should have found.
1. KCNA3_HUMAN (missing intramembrane segment)
2. KCA10_HUMAN (missing intramembrane segment)
3. KCNA7_HUMAN (missing intramembrane segment)
4. KCNC1_HUMAN (missing intramembrane segment)
5. KCNC4_HUMAN (missing intramembrane segment)
6. KCNA1_ONCMY (missing S4 segment; has a predicted TM domain where the consensus is an intramembrane segment)
7. KCNSK_CAEEL (missing S4 segment; missing intramembrane segment)
8. KCNAB_DROME (missing intramembrane segment)
9. KCNAG_CAEEL (missing S4 segment)
Note that SwissProt identified a candidate S4 segment for all of these proteins, but in several cases the S4 segment was not in the consensus position.

7. Phylogenomic function prediction and interpreting phylogenetic trees for protein superfamilies. Examine the tree (below the UniProt MSA display).
Purpose: Knowing how to examine phylogenetic trees representing gene families (or superfamilies, such as these Potassium channel proteins) is a useful skill. This problem is designed to give you some experience.

a. Examine the placement of KCNA1_ONCMY.
i. If KCNA1_ONCMY had not been assigned to the KCNA1 subfamily and had instead been labeled “Unknown” or “Hypothetical” and you had to predict a function (gene subfamily assignment) based on its phylogenetic placement, what subfamily would you assign it to?
Purpose: To gain experience in inferring function based on a phylogenetic placement. In this case, the correct inference is that the sequence cannot be classified functionally based on the tree. If you examine the subtree containing KCNA1_ONCMY and other sequences (see figure below), you’ll see that all the other sequences are equally distant from KCNA1_ONCMY (based on a visual inspection of the tree, recalling how tree distances are measured).
Also: the most recent common ancestor (MRCA) of KCNA1_ONCMY and all of the other subtypes is at the node just above KCNA1_ONCMY.

Figure from Katelyn’s solution. You’ll note that the tree successfully clusters the KCNA2 and KCNA3 subfamilies, but that the KCNA1 subfamily appears to be broken up, with KCNA1_ONCMY isolated as an outgroup sequence apart from the other KCNA1 sequences.

ii. If the KCNA1 subfamily assignment were correct, where would you expect this sequence to be placed in the tree? Explain your logic.
Answer: You’d expect KCNA1_ONCMY to be placed within the KCNA1 subtree.

iii. Look up the terms monophyletic/monophyly, paraphyletic/paraphyly and polyphyletic/polyphyly, and decide which term describes the KCNA1 subfamily (based on the SwissProt assignment of sequences to this subfamily, for sequences included in the tree). See https://www.mun.ca/biology/scarr/Taxon_types.htm. See Figure 1, at the end of this document.
Answer: The KCNA1 subfamily is polyphyletic.

iv. Go back to the SwissProt page and examine the evidence supporting the functional assignment to the KCNA1 subfamily. What evidence is provided? How strong is that evidence?
Answer: The annotation appears to be based on similarity (to some unnamed protein). In fact, if you click on the Publications link in the left column, you’ll find a linked paper that describes this protein as not belonging to any particular subfamily (see below). The abstract (see below) shows that the KCNA1_ONCMY sequence (tsha2) was described as equally similar to the KCNA1, KCNA2 and KCNA3 subtypes: “tsha2 did not show a preferential sequence homology with a particular subtype of shaker, but exhibited uniform similarity with mammalian Kv1.1, Kv1.2, and Kv1.3, respectively.” (This appears to have gone unnoticed, or perhaps was ignored, by the SwissProt curators.)

v. Submit KCNA1_ONCMY to BLAST (either at UniProt or NCBI) to see what matches come to the top.
If you were using an annotation transfer protocol to predict the function of KCNA1_ONCMY, what subfamily would you assign it to?
Purpose: This part of the homework is designed to demonstrate the problems with the standard annotation transfer protocol.
Answer: The answers to this problem depend on what sequence database you selected.
If you ran BLAST against SwissProt, the top hit is: P22739 (KCNA2_XENLA), annotated as “Potassium voltage-gated channel subfamily A member 2” (i.e., KCNA2 subfamily).
If you ran BLAST against UniProt (including TrEMBL), the top hit is: G3G7Y7 (G3G7Y7_LATJA), annotated as “Potassium voltage-gated channel Kv1.3”. (A separate search in SwissProt will show you that Kv1.3 indicates the KCNA3 subfamily; see the annotation for P15384.)
If you ran BLAST against NR, the top hit is: XP_013981333.1 (from Salmo salar) annotated as “shaker-related potassium channel tsha2”. (It’s not clear where this functional description originated; perhaps from KCNA1_ONCMY.)

b. Examine the subtrees containing three or more sequences with the same gene name (e.g., KCNA1, KCNA2, KCNA3). Find the maximal subtree that is restricted to sequences from the same subfamily. (See Figure 4 for an example of a solution to this part of the lab.) For each subtree satisfying these criteria, do the following:

i. Insert a screenshot of the subtree (from the subtree root to the leaf labels).
Note that some students submitted screenshots that were not restricted to the subtree for that subfamily. Unless your figure somehow indicated the subtree under consideration (e.g., by putting a box around the subfamily tree, as Katelyn does in her solution shown below, or placing a disk at the subtree root node) it would not be an effective figure. This is why the instructions specifically asked you to restrict the subtree screenshot from the subtree root to the leaf labels.
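Finding the maximal subtree restricted to one subfamily, and checking whether a gene name is monophyletic in the tree, can be sketched in a few lines. This is a hedged illustration: the tree is hand-encoded as nested tuples, the gene name is taken to be the part of the UniProt ID before the underscore, and the topology below is a toy example loosely modeled on the description in 7a. It is not the tree produced by the UniProt server, and all function names are hypothetical.

```python
def leaves(tree):
    """Return the leaf labels of a tree encoded as nested tuples."""
    if isinstance(tree, str):
        return [tree]
    out = []
    for child in tree:
        out.extend(leaves(child))
    return out


def gene_name(leaf):
    """Gene name = the part of the ID before the underscore (assumption)."""
    return leaf.split("_")[0]


def maximal_pure_subtrees(tree, gene):
    """Return the maximal subtrees whose leaves all carry `gene`."""
    names = {gene_name(leaf) for leaf in leaves(tree)}
    if names == {gene}:
        return [tree]            # whole subtree is pure: stop here
    if isinstance(tree, str):
        return []                # a leaf from another subfamily
    found = []
    for child in tree:
        found.extend(maximal_pure_subtrees(child, gene))
    return found


def is_monophyletic(tree, gene):
    """True if all leaves named `gene` form one clade with nothing else.

    Assumes a binary (fully resolved) tree.
    """
    return len(maximal_pure_subtrees(tree, gene)) == 1


# Toy topology (an assumption for illustration only).
toy_tree = (
    "KCNA1_ONCMY",
    (
        ("KCNA1_HUMAN", "KCNA1_MOUSE"),
        (("KCNA2_HUMAN", "KCNA2_MOUSE"), ("KCNA3_HUMAN", "KCNA3_MOUSE")),
    ),
)

for gene in ("KCNA1", "KCNA2", "KCNA3"):
    print(gene, maximal_pure_subtrees(toy_tree, gene),
          is_monophyletic(toy_tree, gene))
```

In the toy tree, KCNA1 splits into two separate pure subtrees, so it fails the monophyly test. Deciding between paraphyletic and polyphyletic still requires looking at where the common ancestor sits, which this sketch does not attempt.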
(Note that Katelyn’s subtree images would have been more effective if she’d restricted the screenshot to each subtree; this would have enabled her to expand the image and make the leaf labels (sequence identifiers) larger.)

ii. Create a figure caption with title “Subtree for <gene name> subfamily”.

iii. In the figure caption text, explain the assumed evolutionary relationship of these sequences (ortholog, paralog, super-ortholog) based on having been assigned by SwissProt to the same subfamily/subtype, and the subtree topology. Use the most precise term possible, and explain your logic.
Purpose: To gain experience with the orthology subtype definitions and inferring these relationships by analysis of a phylogenetic tree.
Answer: All of the subtrees should satisfy the definition of super-orthology.

iv. Explain whether the subfamily is monophyletic, paraphyletic or polyphyletic, based on the gene name assigned by SwissProt and by the presence or absence of other proteins with the same gene name in the tree.
Purpose: To gain experience with these phylogenetic terms.
Answer: With the exception of the KCNA1 subfamily tree (which would not include KCNA1_ONCMY), all of the subtrees should satisfy the definition of monophyletic.

v. Examine the branching order between the species in the subtree. Is the branching order congruent with the trusted species phylogenies? (Recall that in step 2 you selected sequences from species whose taxonomic relationships you understood.)
Purpose: To gain experience with interpreting phylogenetic trees and gene tree/species tree reconciliation.
Answer: Many of the subtrees were not congruent with the reference species phylogeny. Shown below are three examples (from Katelyn Greene’s solution), two that are not congruent with the trusted species tree and one that is.

8. Now, perform a detailed MDA and topology analysis for each of the 4 proteins shown in boldface in point 3 using the same techniques as in part 1 (for KCNA1_HUMAN).
a.
Based on your analyses, answer the same questions (1a-c).

b. Examine the SwissProt record for the hit to confirm that the transmembrane and intramembrane alignment coloring in the UniProt MSA reflects the actual assignments in the SwissProt record.

c. Derive and evaluate the pairwise alignment of KCNA1_HUMAN and each hit being evaluated using the NCBI BLAST server (select the “Align two or more sequences” checkbox).
Purpose: To gain experience with using the BLAST “align two or more sequences” tool, and in examining a pairwise alignment to evaluate whether the two share a common MDA.

i. Insert a screenshot of the pairwise alignment, including the alignment statistics (%ID and E-value, etc.). Highlight (or draw a box around) each region where the SwissProt labeling of transmembrane or intramembrane segments disagrees with the consensus.

ii. Evaluate the degree to which you can infer that the two proteins share the same MDA based exclusively on the pairwise alignment. Is the alignment global to each, or local to one or both?
Note: This was a slightly hard call. Many heuristic approaches for evaluating whether two proteins are globally alignable examine only the E-value and fractional (bi-directional) overlap, using a cutoff of 70% or 80% overlap. If you use this approach, you have a lot of company. But in fact, it’s potentially problematic, as these analyses will show you. If you want to predict whether two proteins share a common MDA, based on the pairwise alignment, you should be on the lookout for any extended regions that are not included in the alignment. For all of the proteins you are asked to evaluate, the pairwise alignments appear convincing: the alignments have strong E-values, good to high pairwise percent identity and low (to moderate) gaps, and most of the residues are included in the pairwise alignment.
But if you look more closely, you’ll see that several are in an ambiguous zone with long stretches at the N-terminus (before the BTB/POZ domain) and/or at the C-terminus (after the Ion_trans domain) that are not included. If you return to the MSA produced by the UniProt server you’ll see that these regions appear variable across the set of homologs I asked you to select. But then, I deliberately asked you to include sequences that aligned to these two domains but allowed glocal matches.

Two sequences have extended regions not included in the pairwise BLAST alignment (and small to moderate indels in the BTB/POZ and Ion_trans domains):
KCNA1_ONCMY: very minor indels in the BTB/POZ and Ion_trans domains, but the BLAST alignment leaves the last 110 aa of KCNA1_HUMAN out.
KCNSK_CAEEL: the first 134 amino acids (before the BTB/POZ domain) are not included, and there is a moderate-size gap (12 aa) in the BTB/POZ domain.

iii. To further confirm agreement in MDA, you can compare the Pfam domains and domain order. Now compare the Pfam domains found in the hit against the Pfam domains found in KCNA1_HUMAN:
1. Are the same Pfam domains found in the same order? (Compare this finding to your answer to the previous question.)
Answer: Yes to all.
2. Examine the ranges of each Pfam domain in each sequence in the pairwise alignment: is the entire Pfam BTB_2 and Ion_trans domain included in the pairwise alignment? Do the Pfam domain ranges overlap perfectly or only somewhat? If there are insertions or deletions, are they primarily (or exclusively) outside of the two Pfam domains?
Answer: The Pfam domain boundaries overlap.

iv. Examine the membrane-spanning and intramembrane segments of KCNA1_HUMAN (based on the SwissProt annotation). Are there any insertions or deletions relative to KCNA1_HUMAN in these membrane segments? If there are indels in the alignment, are they primarily outside of these (KCNA1_HUMAN) membrane segments?
Purpose: This was designed to help you develop an intuition about the types of structural/functional pressures on proteins: indels are generally well tolerated in regions between evolutionary/structural/functional domains and you will seldom see indels in membrane segments.
Answer: With almost no exceptions, the indels are outside of the membrane segments identified (by SwissProt) for KCNA1_HUMAN. However, if you take a look at the MSA produced by Clustal Omega (which I didn’t ask you to do), you’ll see that the indel characters have been shifted out of the membrane segments, and the alignments are generally better.

v. For each region where the membrane segment labeling disagrees with the consensus, attempt to characterize the similarity between the hit and KCNA1_HUMAN (very similar, moderately similar, divergent).
Answer: The sequence identity appears high (very similar) in the membrane segments, even though the annotations in SwissProt do not agree.

vi. Try to explain the disagreement between the membrane-segment labeling of KCNA1_HUMAN and the hit. If the sequence similarity is high and there are few gaps, you should expect that the structural similarity should be high. So if the membrane segment annotation disagrees dramatically, you might reasonably expect that biological curator error is to blame. In your opinion, is this (1) biological curator error (perhaps based on TMHMM or other transmembrane prediction tool error), (2) sequence divergence (indicative of actual structural divergence), or (3) some other reason?
Answer: I assume this is biological curator error.
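Two of the checks from step 8c can be sketched in code: the bi-directional coverage heuristic together with the unaligned termini it can hide (8c.ii), and the count of indel columns inside versus outside the reference membrane segments (8c.iv). This is a minimal sketch under stated assumptions: alignment coordinates are 1-based and inclusive, gaps are written as `-`, and every number and sequence below is made up for illustration rather than taken from the actual BLAST output.

```python
def bidirectional_coverage(q_start, q_end, q_len, s_start, s_end, s_len):
    """Fraction of the query and of the subject inside the aligned range."""
    return (q_end - q_start + 1) / q_len, (s_end - s_start + 1) / s_len


def unaligned_termini(start, end, length):
    """Residues left out before and after the aligned range."""
    return start - 1, length - end


def gaps_in_segments(aligned_ref, aligned_other, segments):
    """Count gap columns inside vs. outside the reference segments.

    aligned_ref / aligned_other: the two rows of a pairwise alignment.
    segments: 1-based inclusive ranges on the ungapped reference.
    """
    inside = outside = 0
    pos = 0  # number of the most recent reference residue seen
    for r, o in zip(aligned_ref, aligned_other):
        if r != "-":
            pos += 1
        if r == "-":
            # Insertion relative to the reference: it is inside a segment
            # only if it sits between two residues of that segment.
            hit = any(s <= pos < e for s, e in segments)
        elif o == "-":
            # Deletion of a reference residue.
            hit = any(s <= pos <= e for s, e in segments)
        else:
            continue
        if hit:
            inside += 1
        else:
            outside += 1
    return inside, outside


# Made-up numbers: 78% of the query is aligned, which would pass a 70%
# cutoff, yet 110 residues at the C-terminus are left out.
q_cov, s_cov = bidirectional_coverage(1, 390, 500, 5, 400, 420)
print(round(q_cov, 2), round(s_cov, 2), unaligned_termini(1, 390, 500))

# Made-up alignment with all three gap columns outside segment (6, 10).
print(gaps_in_segments("MKLV--AGLLIVW", "MK-VTTAGLLIVW", [(6, 10)]))
```

The first example is the situation the note in 8c.ii warns about: a coverage cutoff alone can be satisfied while a long terminal region remains unaligned, so it is worth reporting the unaligned termini alongside the coverage.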