Reading 3 RNA bases at a time. From RNA to Protein.

19.11.2014
Code Breaking: Reading the Genetic Code with Raspberry Pi, Visit 6
PLAYING WITH PYTHON: DE-CODING DNA
A quick recap of the range function.
The range you know:
>>> print range(5)
[0, 1, 2, 3, 4]
>>>
Range with start/stop:
>>> print range(1,5)
[1, 2, 3, 4]
>>>
Range with variable steps:
>>> print range(1,6,2)
[1, 3, 5]
>>>
Practice exercises. In the shell window, enter the code which will give the output listed in each individual box.
Exercise 1:
>>>
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>>
Exercise 2:
>>>
[6, 7, 8, 9]
>>>
Exercise 3:
>>>
[10, 20, 30, 40, 50, 60, 70, 80, 90]
>>>
Now, let's go back to the biology and continue de-coding our DNA sequences.
Use the code window to type the programs below. Remember:
Enter the program
Save it
Run it
Output appears in the shell window
London Research Institute, Cancer Research UK
Read the alphabet by looping one step at a time and printing three letters. Do you
remember the program below?
Your program in the Code Window
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
length_of_alphabet = len(alphabet)
list_of_positions = range(length_of_alphabet)
print "Three letters at a time"
for letter in list_of_positions:
    print alphabet[letter:letter+3]

This is what is happening: the program skips forward one letter at a time and reads three letters each time.
ABCDEFGHIJKLMNOPQRSTUVWXYZ

The output in the Shell Window
>>>
Three letters at a time
ABC
BCD
CDE
DEF
…
XYZ
YZ
Z
>>>

PROGRAM 1.
Now write a program to loop three steps at a time and print three letters. Use the
range function you just learned.
Your program in the Code Window
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
length_of_alphabet = len(alphabet)
list_of_positions = range(0,length_of_alphabet,3)
print "Three letters at a time"
for letter in list_of_positions:
    print alphabet[letter:letter+3]

The output in the Shell Window
>>>
Three letters at a time
ABC
DEF
GHI
JKL
…
VWX
YZ
>>>

The program reads and prints three letters at a time
ABCDEFGHIJKLMNOPQRSTUVWXYZ

Do you notice the difference in output from the two programs above?
Do you remember why it is important to read three DNA bases at a time? Discuss.
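
One way to explore this question is to start the three-step loop at different positions. The sketch below is only an illustration and not one of the worksheet's programs: the function name triplets and its start parameter are made up here, and print(...) is written with brackets so that it should run in both the Python 2 used in these sessions and Python 3.

```python
# Illustrative sketch: "triplets" and "start" are hypothetical names,
# not part of the worksheet's programs.
def triplets(text, start=0):
    """Return the pieces of text read three letters at a time from start."""
    pieces = []
    for position in range(start, len(text), 3):
        pieces.append(text[position:position+3])
    return pieces

alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
print(triplets(alphabet))      # starts at A: ABC, DEF, GHI, ...
print(triplets(alphabet, 1))   # starts at B: BCD, EFG, HIJ, ...
```

Starting one letter later changes every group of three. The same holds for a string of bases: begin reading in the wrong place and every codon comes out different.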
PROGRAM 2.
Let's rewrite the previous program to read an RNA sequence three letters
at a time.
Here is an example of an RNA sequence. Use one of the RNA sequences in the “DNA and RNA
sequences” folder on your Raspberry Pi to test the program below.
RNA: GCAUAUGUUCAUAGACGAGUUAAUCAGCUGA

my_rna = raw_input("Enter a string of RNA bases, A,G,C or U: ")
number_of_bases = len(my_rna)
print "Number of bases: "
print number_of_bases
start_position=0
print "Now start reading 3 bases at a time..."
for t in range(start_position,number_of_bases,3):
    codon=my_rna[t:t+3]
    print codon

Reading the right codons (the right sequence of three bases at a time).
Remember, last time we looked for START and STOP codons!
RNA: GCAUAUGUUCAUAGACGAGUUAAUCAGCUGA (figure: the START and STOP codons are marked on the sequence)
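
As a reminder of how such a search can work, the sketch below moves along the example sequence one base at a time and records every position where a given codon appears. It is an illustration only: the function name find_codon is made up here, and print(...) is written with brackets so that it should run in both Python 2 and Python 3.

```python
# Illustrative sketch: the function name "find_codon" is hypothetical.
def find_codon(rna, codon):
    """Return every position where codon appears, moving one base at a time."""
    positions = []
    for b in range(len(rna) - 2):
        if rna[b:b+3] == codon:
            positions.append(b)
    return positions

rna = "GCAUAUGUUCAUAGACGAGUUAAUCAGCUGA"
print(find_codon(rna, "AUG"))          # START codon: [4]
for stop in ["UAA", "UAG", "UGA"]:
    print(stop)
    print(find_codon(rna, stop))       # STOP codons, in any reading frame
```

Positions are counted from 0. Not every STOP position found this way lines up with the START codon in steps of three, which is why the programs that follow read from AUG three bases at a time.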
PROGRAM 3.
Write a program that would:
1. Look for the Start codon in an RNA sequence.
2. Then read 3 bases at a time.
1. Read one letter at a time until you reach AUG (the START codon).
RNA: GCAUAUGUUCAUAGACGAGUUAAUCAGCUGA
2. Once you find AUG, read three bases at a time…
RNA: GCAUAUGUUCAUAGACGAGUUAAUCAGCUGA
Here, we have combined the two different uses of the range function: one step at a time to find AUG, then three steps at a time to read the codons.
my_rna = raw_input("Enter a string of RNA bases, A,G,C or U: ")
number_of_bases = len(my_rna)
print "Number of bases: "
print number_of_bases
start_position=0
for b in range(number_of_bases):
    if(my_rna[b:b+3] == "AUG"):
        start_position=b
        print "Found the start codon"
        break
print "Now start reading 3 bases at a time..."
for t in range(start_position,number_of_bases,3):
    codon=my_rna[t:t+3]
    print codon

RNA: GCAU AUG UUC AUA GAC GAG UUA AUC AGC UGA (START marks AUG)
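
The same two steps can also be wrapped in a function so they are easy to try on several sequences. This is a sketch under stated assumptions, not the worksheet's program: the name codons_from_start is made up, the string method find is swapped in for the first loop, and a sequence with no AUG gives an empty list here, whereas the program above would read from position 0.

```python
# Illustrative sketch: "codons_from_start" is a hypothetical name.
def codons_from_start(rna):
    """Find the first AUG, then return the codons read from there in threes."""
    start_position = rna.find("AUG")   # find gives -1 when AUG is absent
    if start_position == -1:
        return []                      # assumption: no START means nothing to read
    codons = []
    for t in range(start_position, len(rna), 3):
        codons.append(rna[t:t+3])
    return codons

print(codons_from_start("GCAUAUGUUCAUAGACGAGUUAAUCAGCUGA"))
```

Returning a list instead of printing inside the loop makes it possible to reuse the codons later, for example when translating them.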
PROGRAM 4.
Can you modify the program to identify any potential Stop codons (UAA,
UAG, UGA) and know when to stop reading?
my_rna = raw_input("Enter a string of RNA bases, A,G,C or U: ")
number_of_bases = len(my_rna)
print "Number of bases: "
print number_of_bases
start_position=0
for b in range(number_of_bases):
    if(my_rna[b:b+3] == "AUG"):
        start_position=b
        print "Found the start codon"
        break
print "Now start reading 3 bases at a time..."
for t in range(start_position,number_of_bases,3):
    codon=my_rna[t:t+3]
    print codon
    if(codon == "UAA"):
        print "Found a stop codon!"
    elif(codon == "UAG"):
        print "Found a stop codon!"
    elif(codon == "UGA"):
        print "Found a stop codon!"

RNA: GCAU AUG UUC AUA GAC GAG UUA AUC AGC UGA (START marks AUG, STOP marks UGA)
Now we are in the correct reading frame.
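
To make the reading actually stop at the first STOP codon in the frame, a break can be used in the second loop, just as in the search for AUG. The sketch below shows one possible way to do this. The names STOP_CODONS and read_until_stop are made up here, and leaving out an incomplete final codon is an assumption of this sketch.

```python
# Illustrative sketch: "STOP_CODONS" and "read_until_stop" are hypothetical names.
STOP_CODONS = ["UAA", "UAG", "UGA"]

def read_until_stop(rna):
    """Return the codons from AUG up to the first in-frame STOP, and that STOP."""
    codons = []
    stop_found = ""                    # stays empty if no STOP codon is reached
    start_position = rna.find("AUG")
    if start_position == -1:
        return codons, stop_found
    for t in range(start_position, len(rna), 3):
        codon = rna[t:t+3]
        if codon in STOP_CODONS:
            stop_found = codon
            break                      # stop reading here
        if len(codon) == 3:            # assumption: ignore an incomplete last codon
            codons.append(codon)
    return codons, stop_found

print(read_until_stop("GCAUAUGUUCUGAAAACCC"))   # bases after UGA are not read
```

Checking "codon in STOP_CODONS" does the same job as the three if/elif tests, in one line.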
PROGRAM 5.
The final step in de-coding the DNA: translating the RNA sequence into a
protein sequence.
The codon table: our new dictionary. You can find this in a folder on your Raspberry
Pi. You can copy and insert the dictionary in your new program.
# from RNA to protein
genetic_code = {"GCA":"A", "GCC":"A", "GCG":"A", "GCU":"A",
                "UGC":"C", "UGU":"C",
                "GAC":"D", "GAU":"D",
                "GAA":"E", "GAG":"E",
                "UUC":"F", "UUU":"F",
                "GGA":"G", "GGC":"G", "GGG":"G", "GGU":"G",
                "CAC":"H", "CAU":"H",
                "AUA":"I", "AUC":"I", "AUU":"I",
                "AAA":"K", "AAG":"K",
                "UUA":"L", "UUG":"L", "CUA":"L", "CUC":"L", "CUG":"L", "CUU":"L",
                "AUG":"M",
                "AAC":"N", "AAU":"N",
                "CCA":"P", "CCC":"P", "CCG":"P", "CCU":"P",
                "CAA":"Q", "CAG":"Q",
                "AGA":"R", "AGG":"R", "CGA":"R", "CGC":"R", "CGU":"R", "CGG":"R",
                "AGC":"S", "AGU":"S", "UCA":"S", "UCC":"S", "UCG":"S", "UCU":"S",
                "ACA":"T", "ACC":"T", "ACG":"T", "ACU":"T",
                "GUA":"V", "GUC":"V", "GUG":"V", "GUU":"V",
                "UGG":"W",
                "UAC":"Y", "UAU":"Y",
                "UAG":"!", "UAA":"!", "UGA":"!"}

rna_sequence=raw_input("RNA sequence: ")
number_of_bases=len(rna_sequence)
print "Number of bases: ",number_of_bases
protein_start=0
for b in range(number_of_bases):
    if(rna_sequence[b:b+3]=="AUG"):
        protein_start=b
        break
protein_sequence=""
for b in range(protein_start,number_of_bases,3):
    AA=genetic_code[rna_sequence[b:b+3]]
    if(AA=="!"):
        break
    else:
        protein_sequence=protein_sequence+AA
print protein_sequence

RNA:     GCAU AUG   UUC AUA GAC GAG UUA AUC AGC UGA
Protein:      START F   I   D   E   L   I   S   STOP
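
For experimenting outside the session, the sketch below packs the translation step into a function. Several things in it are assumptions and not taken from the worksheet: the names BASES, AMINO_ACIDS and translate are made up, the codon table is built from a compact string (the standard genetic code listed in U, C, A, G order, with "!" for STOP as in the dictionary above) instead of being copied from the folder, and translation simply ends if it meets an incomplete or unknown codon.

```python
# Illustrative sketch: BASES, AMINO_ACIDS and translate are hypothetical names.
BASES = "UCAG"
# Standard genetic code in U, C, A, G order; "!" marks a STOP codon.
AMINO_ACIDS = ("FFLLSSSSYY!!CC!W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")

# Build the 64-entry codon dictionary with three nested loops.
genetic_code = {}
position = 0
for first in BASES:
    for second in BASES:
        for third in BASES:
            genetic_code[first + second + third] = AMINO_ACIDS[position]
            position = position + 1

def translate(rna):
    """Translate from the first AUG until a STOP codon or the end."""
    start = rna.find("AUG")
    if start == -1:
        return ""                      # assumption: no START, no protein
    protein = ""
    for b in range(start, len(rna), 3):
        # get returns "" for an incomplete or unknown codon instead of an error
        amino_acid = genetic_code.get(rna[b:b+3], "")
        if amino_acid == "!" or amino_acid == "":
            break
        protein = protein + amino_acid
    return protein

print(translate("GCAUAUGUUCAUAGACGAGUUAAUCAGCUGA"))   # prints MFIDELIS
```

Like PROGRAM 5, this sketch starts translating at AUG itself, so the result begins with M, the amino acid coded by the START codon.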
Practice 1.
Now check the results you get after you translate the other RNA sequences from your
“DNA and RNA sequences” folder. How do they compare to the paper exercise you did
during the first session?
We can now move on to the diagnostic test and identifying the gene involved in cancer.