Assignment 3

A genetic sequence is a string of DNA bases written in a four-letter alphabet: Adenine (A), Thymine (T), Guanine (G), and Cytosine (C). Interpreting these sequences helps answer questions such as what function the protein encoded by a gene has. So when a new sequence is found, we need to compare it with a sequence that has already been sequenced and whose function is well understood (the reference sequence). This process is called sequence alignment. Alignment lets us measure the similarity of two genetic sequences using a penalty: the higher the penalty, the less similar the two sequences are. Note that sequence alignment is not base pairing; it is a comparison of two sequences.

For example, aligning the reference sequence AACAGTTACC with the test sequence TAAGGTCA-- gives:

Reference sequence (referenceseq.txt):  A  A  C  A  G  T  T  A  C  C
Test sequence (testseq1.txt):           T  A  A  G  G  T  C  A  -  -
Penalty:                                1  0  1  1  0  0  1  0  2  2

(Sequence length = 10; the last two positions of the test sequence are gaps.)

Total penalty = 1+0+1+1+0+0+1+0+2+2 = 8

Penalty cost:
Per gap (-)      2
Per mismatch     1
Per match        0

The alignment in this example gives a total penalty of 8.

1. Write a program to compute the optimal sequence alignment of two DNA sequences. Use the control structures loop and selection, and follow these steps:

Step 1: Read the sequence to be aligned (the test sequence) using the ‘char’ type from testseq1.txt (the example above) or testseq2.txt. Ask the user to choose by pressing ‘1’ for testseq1.txt or ‘2’ for testseq2.txt, as in the figure below:

Step 2: Read the DNA sequence from the reference file (referenceseq.txt).

Step 3: Display the reference sequence with the test sequence and calculate the penalty for each pair of bases in the two sequences, as in the example above. Your program should show output as in the figure below:

Step 4: Calculate the total penalty for the test sequence and the length of the sequence (including gaps (-)).
Step 5: Calculate the GC-content percentage (the number of G and C bases over the whole sequence length) and the AT/GC ratio using the following formulas:

GC-content percentage = ((No. of G + No. of C) / sequence length) x 100
Example using the test sequence above (testseq1.txt): ((2 + 1) / 10) x 100 = 30%

AT/GC ratio = (No. of A + No. of T) / (No. of G + No. of C)
Example using the test sequence above (testseq1.txt): (3 + 2) / (2 + 1) = 1.66667

Step 6: The program terminates if the user enters ‘n’ or ‘N’ when it displays “Do you want to use this program again? Press ‘Y’ if YES or Press ‘N’ if NO”. At this step, your program should look like the figure below: