Supplementary Material for

Genes adapt non-optimal codon usage to create cell-cycle-dependent oscillations in protein levels

Milana Frenkel-Morgenstern a,b,1,2, Tamar Danon a, Thomas Christian c, Takao Igarashi c, Lydia Cohen a, Ya-Ming Hou c, Lars Juhl Jensen d

a Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, 76100, Israel.
b Department of Structural Biology, Weizmann Institute of Science, Rehovot, 76100, Israel.
c Department of Biochemistry and Molecular Pharmacology, Thomas Jefferson University, Philadelphia, PA 19107, USA.
d Novo Nordisk Foundation Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, Copenhagen, DK-2200, Denmark.

1 To whom correspondence should be addressed: [email protected] or [email protected]
2 Current address: Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, 28029, Spain.

This document includes:
Supplementary Tables
Supplementary Figures
Supplementary Results
Supplementary Methods
References

Table 1: Codon preferences for sets of cell-cycle-regulated genes of Schizosaccharomyces pombe: 40 cell-cycle-regulated genes in the B1 set, 188 genes in the B2 set, and the 500 most significantly oscillating genes in the top-500 set were analyzed (Jensen et al, 2006b). The amino acids Ala, Arg, Ile, Leu, Pro, Thr, Ser and Val, which use the wobble inosine tRNA modification, show significant preferences for non-optimal codons in at least two sets of cell-cycle-regulated genes. P-values were calculated by bootstrap sampling with the same CAI distribution as that of the cell-cycle-regulated genes.
Aa   Codon  Preferences                P-values
     5'->3' B1       B2       Top-500  B1       B2       Top-500
Ala  GCA    -0.02    -0.01    -0.02    0.14     0.13     0.0001
Ala  GCC    0.01     0.02     0.01     0.26     0.0001   0.02
Ala  GCG    0        0        -0.01    0.53     0.54     0.0001
Ala  GCT    0.01     -0.01    0.02     0.29     0.14     0.0001
Arg  AGA    -0.04    -0.01    -0.02    0.02     0.14     0.0001
Arg  CGT    0.02     0.03     0.03     0.24     0.01     0.0001
Arg  AGG    0        -0.02    -0.01    0.51     0.0001   0.0001
Arg  CGC    0.01     0.01     0.02     0.22     0.06     0.06
Arg  CGG    0        0.01     -0.01    0.51     0.01     0.0001
Arg  CGA    0.01     -0.02    -0.01    0.27     0.0001   0.02
Asn  AAC    0.01     0.02     0.04     0.3      0.04     0.0001
Asn  AAT    -0.01    -0.02    -0.04    0.3      0.04     0.0001
Asp  GAC    0        0        0.01     0.51     0.53     0.03
Asp  GAT    0        0        -0.01    0.51     0.52     0.03
Cys  TGT    0.05     0        0        0.05     0.51     0.52
Cys  TGC    -0.05    0        0        0.05     0.52     0.52
Gln  CAG    0.01     0        -0.01    0.29     0.52     0.05
Gln  CAA    -0.01    0        0.01     0.29     0.53     0.05
Glu  GAG    -0.01    0.03     0.02     0.5      0.0001   0.0001
Glu  GAA    0.01     -0.03    -0.02    0.52     0.0001   0.0001
Gly  GGA    -0.02    -0.03    -0.03    0.19     0.0001   0.0001
Gly  GGG    0        -0.01    -0.02    0.52     0.05     0.0001
Gly  GGT    0.01     0.03     0.04     0.34     0.01     0.0001
Gly  GGC    0.01     0.01     0.01     0.24     0.07     0.01
His  CAC    0.01     0.04     0        0.34     0.0001   0.52
His  CAT    -0.01    -0.04    0        0.34     0.0001   0.53
Ile  ATC    0.01     0.03     0.01     0.28     0.0001   0.02
Ile  ATT    0.02     -0.01    0.02     0.13     0.14     0.0001
Ile  ATA    -0.03    -0.02    -0.03    0.08     0.02     0.0001
Leu  CTT    -0.01    0.01     0.01     0.22     0.05     0.0001
Leu  TTA    0        -0.01    -0.02    0.52     0.07     0.0001
Leu  CTA    -0.02    -0.02    -0.01    0.01     0.01     0.0001
Leu  CTC    0.01     0.01     0.01     0.09     0.0001   0.0001
Leu  TTG    0.02     0.01     0.01     0.06     0.05     0.01
Leu  CTG    0        0        0        0.52     0.53     0.56
Lys  AAG    0.02     0.02     0.01     0.19     0.04     0.08
Lys  AAA    -0.02    -0.02    -0.01    0.19     0.04     0.08
Met  ATG    0        0        0        1        1        1
Phe  TTC    0.01     0.02     0.04     0.32     0.06     0.0001
Phe  TTT    -0.01    -0.02    -0.04    0.32     0.06     0.0001
Pro  CCA    0.01     -0.01    0.01     0.27     0.19     0.02
Pro  CCC    0.01     0.02     0.01     0.27     0.01     0.02
Pro  CCG    -0.01    -0.02    -0.01    0.2      0.05     0.0001
Pro  CCT    -0.01    0.01     -0.01    0.53     0.18     0.55
Ser  TCG    0        0        0        0.53     0.57     0.59
Ser  TCC    -0.01    0.02     0.01     0.17     0.0001   0.0001
Ser  TCA    -0.01    -0.02    -0.02    0.21     0.05     0.0001
Ser  AGT    0        -0.01    0.01     0.45     0.05     0.1
Ser  AGC    0.01     0        0        0.14     0.55     0.58
Ser  TCT    0.01     0.01     0        0.25     0.01     0.58
Thr  ACT    0.02     -0.01    0.01     0.01     0.19     0.01
Thr  ACC    -0.01    0.03     0.01     0.26     0.0001   0.04
Thr  ACA    0        -0.01    -0.01    0.51     0.18     0.09
Thr  ACG    -0.01    -0.01    -0.01    0.19     0.08     0.03
Trp  TGG    0        0        0        1        1        1
Tyr  TAT    0.03     -0.01    -0.02    0.07     0.17     0.0001
Tyr  TAC    -0.03    0.01     0.02     0.07     0.17     0.0001
Val  GTC    -0.01    0.02     0.01     0.26     0.0001   0.02
Val  GTA    -0.02    -0.01    -0.01    0.1      0.01     0.02
Val  GTT    0.04     -0.01    0        0.02     0.14     0.53
Val  GTG    -0.01    0        0        0.23     0.52     0.54

Table 2: Codon preferences for sets of cell-cycle-regulated genes of Saccharomyces cerevisiae: 113 cell-cycle-regulated genes in the B1 set, 352 genes in the B2 set, and the 600 most significantly oscillating genes in the top-600 set were studied (Jensen et al, 2006b). The amino acids Ala, Arg, Ile, Pro, Thr, Ser and Val, which use the wobble inosine tRNA modification, show significant preferences for non-optimal codons in at least two sets of cell-cycle-regulated genes. P-values were calculated by bootstrap sampling with the same CAI distribution as that of the cell-cycle-regulated genes.
Aa   Codon  Preferences                P-values
     5'->3' B1       B2       Top-600  B1       B2       Top-600
Ala  GCA    -0.02    -0.01    -0.03    0.06     0.09     0.0001
Ala  GCC    0        0.01     0.01     0.52     0.04     0.02
Ala  GCG    -0.02    0        0        0.12     0.53     0.55
Ala  GCT    0.04     0        0.02     0.0001   0.52     0.0001
Arg  AGA    0.02     -0.01    -0.01    0.09     0.12     0.07
Arg  CGT    0.01     -0.01    0        0.001    0.05     0.54
Arg  AGG    -0.01    0.01     -0.01    0.52     0.06     0.54
Arg  CGC    0.01     0.01     0.01     0.52     0.55     0.57
Arg  CGG    -0.02    0        0.01     0.001    0.56     0.0001
Arg  CGA    -0.01    0        0        0.09     0.54     0.55
Asn  AAC    -0.01    0.02     0.03     0.21     0.0001   0.0001
Asn  AAT    0.01     -0.02    -0.03    0.21     0.0001   0.0001
Asp  GAC    -0.03    0.02     0.02     0.001    0.0001   0.0001
Asp  GAT    0.03     -0.02    -0.02    0.001    0.0001   0.0001
Cys  TGT    0.04     -0.03    0        0.04     0.01     0.52
Cys  TGC    -0.04    0.03     0        0.04     0.01     0.52
Gln  CAG    -0.02    0.01     0.02     0.07     0.11     0.0001
Gln  CAA    0.02     -0.01    -0.02    0.07     0.11     0.0001
Glu  GAG    -0.03    0.01     0        0.001    0.06     0.54
Glu  GAA    0.03     -0.01    0        0.001    0.06     0.55
Gly  GGA    -0.03    -0.01    -0.03    0.01     0.1      0.0001
Gly  GGG    -0.01    -0.01    0        0.12     0.02     0.55
Gly  GGT    0.06     0.01     0.03     0.01     0.05     0.02
Gly  GGC    -0.02    0.01     0        0.02     0.05     0.54
His  CAC    -0.04    -0.01    -0.01    0.01     0.52     0.09
His  CAT    0.04     0.01     0.01     0.01     0.52     0.09
Ile  ATC    -0.01    0.01     0.01     0.18     0.06     0.03
Ile  ATT    0.02     -0.01    0.02     0.05     0.07     0.0001
Ile  ATA    -0.01    0        -0.03    0.25     0.52     0.0001
Leu  CTT    -0.01    0        -0.01    0.06     0.55     0.0001
Leu  TTA    0.02     -0.01    -0.02    0.01     0.03     0.0001
Leu  CTA    0.02     0        0        0.04     0.55     0.57
Leu  CTC    -0.01    0        0        0.01     0.58     0.59
Leu  TTG    -0.01    0        0.02     0.18     0.54     0.0001
Leu  CTG    -0.01    0.01     0.01     0.05     0.0001   0.0001
Lys  AAG    -0.01    0        0        0.22     0.53     0.53
Lys  AAA    0.01     0        0        0.22     0.52     0.53
Met  ATG    0        0        0        1        1        1
Phe  TTC    0.02     0.01     0.03     0.08     0.12     0.0001
Phe  TTT    -0.02    -0.01    -0.03    0.08     0.12     0.0001
Pro  CCA    0.04     0.01     0.02     0.01     0.51     0.0001
Pro  CCC    0.01     -0.01    0.01     0.52     0.0001   0.01
Pro  CCG    -0.01    0.01     -0.01    0.13     0.02     0.55
Pro  CCT    -0.04    -0.02    -0.02    0.0001   0.06     0.0001
Ser  TCG    -0.01    0        0        0.03     0.57     0.58
Ser  TCC    0        0.01     0.01     0.53     0.01     0.0001
Ser  TCA    0        -0.01    -0.01    0.52     0.02     0.01
Ser  AGT    -0.01    -0.01    -0.01    0.08     0.01     0.0001
Ser  AGC    -0.02    0.01     0.01     0.001    0.01     0.0001
Ser  TCT    0.03     0        0.01     0.001    0.53     0.01
Thr  ACT    0.03     0        0.03     0.0001   0.53     0.0001
Thr  ACC    -0.02    0.02     0.02     0.02     0.05     0.0001
Thr  ACA    0.01     -0.02    -0.04    0.21     0.0001   0.0001
Thr  ACG    -0.02    0        -0.01    0.01     0.54     0.01
Trp  TGG    0        0        0        1        1        1
Tyr  TAT    0        -0.02    -0.04    0.51     0.01     0.0001
Tyr  TAC    0        0.02     0.04     0.52     0.01     0.0001
Val  GTC    0        0.01     0.02     0.51     0.05     0.0001
Val  GTA    0        -0.01    -0.02    0.52     0.07     0.0001
Val  GTT    0.01     -0.01    -0.01    0.22     0.09     0.05
Val  GTG    -0.01    0.01     0.01     0.16     0.05     0.02

Table 3: Codon preferences for sets of cell-cycle-regulated genes of Arabidopsis thaliana: 61 cell-cycle-regulated genes in the B1 set and 176 genes in the B2 set (Jensen et al, 2006b). The top-400 set was excluded from the analysis because of the poor quality of the DNA microarray expression data, which was inconsistent with the other sets. Codons characterized by a consistent positive bias in the two sets are presented in bold. These codons correspond to non-optimal codons of amino acids encoded by two codons (His, Cys, Phe, Asn, Asp, Tyr) and of those using wobble inosine tRNA modifications (namely Ala, Arg, Ile, Pro, Ser, Thr and Val). P-values were calculated by bootstrap sampling with the same CAI distribution as that of the cell-cycle-regulated genes.
Aa   Codon  Preferences       P-values
     5'->3' B1       B2       B1       B2
Ala  GCA    0.01     0.03     0.25     0.0001
Ala  GCC    -0.03    -0.02    0.01     0.0001
Ala  GCG    -0.01    -0.03    0.22     0.0001
Ala  GCT    0.03     0.02     0.03     0.15
Arg  AGA    0.05     0.01     0.01     0.52
Arg  CGT    -0.01    -0.01    0.22     0.09
Arg  AGG    -0.01    0.01     0.24     0.11
Arg  CGC    -0.01    0        0.11     0.54
Arg  CGG    -0.01    -0.01    0.14     0.03
Arg  CGA    -0.01    0        0.16     0.53
Asn  AAC    -0.01    -0.06    0.01     0.0001
Asn  AAT    0.01     0.06     0.01     0.0001
Asp  GAC    -0.02    -0.03    0.02     0.0001
Asp  GAT    0.02     0.03     0.02     0.0001
Cys  TGT    0.06     0        0.01     0.51
Cys  TGC    -0.06    0        0.01     0.51
Gln  CAG    0.01     0.04     0.33     0.0001
Gln  CAA    -0.01    -0.04    0.33     0.0001
Glu  GAG    0.01     -0.03    0.28     0.0001
Glu  GAA    -0.01    0.03     0.28     0.0001
Gly  GGA    0.05     -0.01    0.0001   0.13
Gly  GGG    -0.01    0.01     0.2      0.08
Gly  GGT    0        -0.01    0.5      0.13
Gly  GGC    -0.04    0.01     0.0001   0.53
His  CAC    -0.02    -0.03    0.01     0.02
His  CAT    0.02     0.03     0.01     0.0001
Ile  ATC    -0.02    -0.05    0.14     0.0001
Ile  ATT    0.04     0.03     0.01     0.0001
Ile  ATA    -0.02    0.02     0.1      0.13
Leu  CTT    0.01     0        0.18     0.53
Leu  TTA    0.01     0        0.13     0.54
Leu  CTA    0        0        0.52     0.54
Leu  CTC    -0.01    -0.03    0.21     0.0001
Leu  TTG    -0.02    0.01     0.03     0.06
Leu  CTG    0.01     0.02     0.13     0.1
Lys  AAG    0.03     0.01     0.04     0.16
Lys  AAA    -0.03    -0.01    0.04     0.16
Met  ATG    0        0        1        1
Phe  TTC    -0.02    -0.05    0.01     0.0001
Phe  TTT    0.02     0.05     0.01     0.0001
Pro  CCA    0.05     0.01     0.02     0.01
Pro  CCC    -0.02    0.01     0.03     0.01
Pro  CCG    -0.02    -0.03    0.12     0.0001
Pro  CCT    -0.01    0.01     0.29     0.19
Ser  TCG    0.01     -0.02    0.14     0.0001
Ser  TCC    -0.03    -0.01    0.0001   0.03
Ser  TCA    -0.01    0.02     0.18     0.0001
Ser  AGT    -0.01    0.01     0.17     0.05
Ser  AGC    0        -0.01    0.52     0.03
Ser  TCT    0.04     0.01     0.0001   0.01
Thr  ACT    0.01     0.04     0.27     0.0001
Thr  ACC    -0.04    -0.03    0.25     0.52
Thr  ACA    0.04     0.02     0.01     0.02
Thr  ACG    -0.01    -0.03    0.25     0.0001
Trp  TGG    0        0        1        1
Tyr  TAT    -0.06    0.04     0.02     0.0001
Tyr  TAC    0.06     -0.04    0.02     0.0001
Val  GTC    -0.04    -0.01    0.1      0.09
Val  GTA    0        0.01     0.51     0.06
Val  GTT    0.04     0.01     0.0001   0.01
Val  GTG    0        -0.01    0.51     0.1

Table 4: Codon preferences for sets of cell-cycle-regulated genes in Homo sapiens: 63 cell-cycle-regulated genes in the B1 set, 438 genes in the B2 set, and the 600 most significantly oscillating genes in the top-600 set were analyzed (Jensen et al, 2006b). The GC content distribution was preserved in the bootstrap sampling used to calculate the P-values.

Aa   Codon  Preferences                P-values
     5'->3' B1       B2       Top-600  B1       B2       Top-600
Ala  GCA    0.04     0.04     0.03     0.0527   0.1704   0.075
Ala  GCC    -0.1     -0.07    -0.04    0.0001   0.0007   0.1164
Ala  GCG    -0.01    -0.03    -0.02    0.809    0.0104   0.0194
Ala  GCT    0.07     0.05     0.03     0.0004   0.0022   0.0471
Arg  AGA    0.07     0.06     0.05     0.015    0.4612   0.0666
Arg  CGT    -0.02    -0.02    -0.01    0.0264   0.0001   0.0106
Arg  AGG    0        0.03     0.02     0.727    0.0001   0.0001
Arg  CGC    -0.01    -0.04    -0.03    0.8357   0.4426   0.1833
Arg  CGG    -0.06    -0.04    -0.03    0.0011   0.2556   0.1305
Arg  CGA    0.02     0.02     0.01     0.016    0.0001   0.0101
Asn  AAC    -0.13    -0.11    -0.08    0.0001   0.0001   0.0001
Asn  AAT    0.13     0.11     0.08     0.0001   0.0001   0.0001
Asp  GAC    -0.1     -0.1     -0.07    0.0016   0.0002   0.0023
Asp  GAT    0.1      0.1      0.07     0.0006   0.0002   0.0022
Cys  TGT    -0.15    -0.12    -0.04    0.0001   0.0001   0.5953
Cys  TGC    0.15     0.12     0.04     0.0001   0.0001   0.6024
Gln  CAG    0.1      0.06     0.05     0.0002   0.3992   0.082
Gln  CAA    -0.1     -0.06    -0.05    0.0001   0.4106   0.0835
Glu  GAG    0.13     0.1      0.08     0.0011   0.0257   0.0064
Glu  GAA    -0.13    -0.1     -0.08    0.0015   0.0244   0.0048
Gly  GGA    0.03     0.05     0.03     0.3096   0.1902   0.4594
Gly  GGG    -0.04    -0.06    -0.04    0.1505   0.0048   0.0267
Gly  GGT    -0.05    -0.03    -0.03    0.0043   0.097    0.0039
Gly  GGC    0.05     0.04     0.03     0.0096   0.0013   0.001
His  CAC    -0.14    -0.13    -0.07    0.0001   0.0001   0.0104
His  CAT    0.14     0.13     0.07     0.0001   0.0001   0.0096
Ile  ATC    0.05     0.05     0.04     0.0045   0.0048   0.0012
Ile  ATT    -0.12    -0.12    -0.08    0.0002   0.0001   0.0023
Ile  ATA    0.06     0.08     0.04     0.0092   0.0001   0.0812
Leu  CTT    0.02     0.02     0.01     0.0092   0.0007   0.2095
Leu  TTA    -0.05    -0.04    -0.03    0.0001   0.0001   0.0001
Leu  CTA    -0.1     -0.08    -0.06    0.0001   0.0092   0.0071
Leu  CTC    0.03     0.04     0.03     0.0338   0.0003   0.0011
Leu  TTG    0.06     0.04     0.03     0.0001   0.0063   0.0039
Leu  CTG    0.04     0.03     0.02     0.0001   0.0001   0.0057
Lys  AAG    0.04     0.09     0.06     0.2291   0.0013   0.0291
Lys  AAA    -0.04    -0.09    -0.06    0.2455   0.0011   0.0275
Met  ATG    0        0        0        1        1        1
Phe  TTC    -0.13    -0.1     -0.07    0.0001   0.0001   0.0008
Phe  TTT    0.13     0.1      0.07     0.0001   0.0001   0.0007
Pro  CCA    0.07     0.04     0.04     0.0048   0.1377   0.0051
Pro  CCC    -0.1     -0.06    -0.06    0.0003   0.0098   0.0001
Pro  CCG    -0.02    -0.03    -0.02    0.4117   0.0336   0.0498
Pro  CCT    0.05     0.05     0.04     0.0132   0.0007   0.0002
Ser  TCG    -0.05    -0.05    -0.03    0.002    0.0003   0.0489
Ser  TCC    0.03     0.04     0.03     0.0218   0.0001   0.0001
Ser  TCA    0.03     0.03     0.02     0.0356   0.0615   0.1758
Ser  AGT    -0.05    -0.04    -0.03    0.0004   0.0005   0.0009
Ser  AGC    -0.02    -0.01    -0.01    0.0052   0.6396   0.0956
Ser  TCT    0.07     0.04     0.03     0.0001   0.0001   0.0005
Thr  ACT    0.14     0.03     0.04     0.0117   0.7439   0.3215
Thr  ACC    -0.04    -0.04    0.0001   0.2354   0.0832   0.3323
Thr  ACA    -0.1     0        -0.04    0.0613   0.0908   0.2014
Thr  ACG    0.02     0.04     0.02     0.3576   0.1263   0.7048
Trp  TGG    -0.05    -0.07    -0.05    0.0391   0.0004   0.0048
Tyr  TAT    -0.05    -0.03    -0.02    0.001    0.0284   0.0856
Tyr  TAC    0.09     0.07     0.04     0.0001   0.0001   0.004
Val  GTC    0        0        0        1        1        1
Val  GTA    -0.08    -0.1     -0.06    0.0102   0.0001   0.0221
Val  GTT    0.08     0.1      0.06     0.0101   0.0001   0.0244
Val  GTG    0.07     0.05     0.04     0.0001   0.0001   0.0001

Table 5: Codon preferences for sets of cell-cycle-regulated genes in Schizosaccharomyces pombe: 40 cell-cycle-regulated genes in the B1 set, 188 genes in the B2 set, and the 500 most significantly oscillating genes in the top-500 set were analyzed (Jensen et al, 2006b). The GC content distribution was preserved in the bootstrap sampling used to calculate the P-values.
Aa   Codon  Preferences                P-values
     5'->3' B1       B2       Top-500  B1       B2       Top-500
Ala  GCA    -0.02    -0.01    -0.02    0.6316   0.7231   0.372
Ala  GCC    0.01     0.02     0.01     0.7494   0.1097   0.7791
Ala  GCG    0        0        -0.01    0.179    0.1672   0.0904
Ala  GCT    0.01     -0.01    0.02     0.6073   0.0217   0.0642
Arg  AGA    -0.04    -0.01    -0.02    0.2313   0.5691   0.4486
Arg  CGT    0        -0.02    -0.01    0.0632   0.0076   0.7824
Arg  AGG    0.01     -0.02    -0.01    0.0242   0.0659   0.76
Arg  CGC    0.01     0.01     0.02     0.6517   0.2516   0.0072
Arg  CGG    0        0.01     -0.01    0.2436   0.0006   0.0224
Arg  CGA    0.02     0.03     0.03     0.9277   0.3318   0.8349
Asn  AAC    0.01     0.02     0.04     0.7878   0.2377   0.0007
Asn  AAT    -0.01    -0.02    -0.04    0.7833   0.2475   0.0005
Asp  GAC    0        0        0.01     0.7679   0.8491   0.3678
Asp  GAT    0        0        -0.01    0.2583   0.1858   0.3651
Cys  TGT    -0.05    0        0        0.0125   0.7252   0.7853
Cys  TGC    0.05     0        0        0.0096   0.2946   0.2442
Gln  CAG    -0.01    0        0.01     0.2501   0.694    0.0847
Gln  CAA    0.01     0        -0.01    0.2493   0.3507   0.0879
Glu  GAG    0.01     -0.03    -0.02    0.0147   0.0131   0.2173
Glu  GAA    -0.01    0.03     0.02     0.0144   0.012    0.2075
Gly  GGA    -0.02    -0.03    -0.03    0.6574   0.0794   0.065
Gly  GGG    0.01     0.01     0.01     0.3953   0.1447   0.1933
Gly  GGT    0        -0.01    -0.02    0.0733   0.3866   0.0074
Gly  GGC    0.01     0.03     0.04     0.8599   0.2426   0.041
His  CAC    0.01     0.04     0        0.5895   0.0134   0.9573
His  CAT    -0.01    -0.04    0        0.5897   0.0178   0.051
Ile  ATC    -0.03    -0.02    -0.03    0.757    0.234    0.2906
Ile  ATT    0.01     0.03     0.01     0.8638   0.0054   0.9313
Ile  ATA    0.02     -0.01    0.02     0.4069   0.0803   0.0518
Leu  CTT    -0.02    -0.02    -0.01    0.0811   0.0001   0.0748
Leu  TTA    0.01     0.01     0.01     0.3533   0.0196   0.0146
Leu  CTA    0        0        0        0.382    0.3658   0.2546
Leu  CTC    -0.01    0.01     0.01     0.021    0.1753   0.3646
Leu  TTG    0        -0.01    -0.02    0.0134   0.4817   0.1286
Leu  CTG    0.02     0.01     0.01     0.421    0.2968   0.5439
Lys  AAG    -0.02    -0.02    -0.01    0.9296   0.6081   0.9999
Lys  AAA    0.02     0.02     0.01     0.9272   0.6158   0.9997
Met  ATG    0        0        0        1        1        1
Phe  TTC    0.01     0.02     0.04     0.8409   0.2681   0.0023
Phe  TTT    -0.01    -0.02    -0.04    0.8339   0.2579   0.0017
Pro  CCA    0.01     -0.01    0.01     0.022    0.6735   0.0006
Pro  CCC    0.01     0.02     0.01     0.7195   0.0796   0.7913
Pro  CCG    -0.01    -0.02    -0.01    0.4835   0.0038   0.2516
Pro  CCT    -0.01    0.01     -0.01    0.0868   0.437    0.0025
Ser  TCG    0.01     0        0        0.2397   0.6051   0.8208
Ser  TCC    0        -0.01    0.01     0.3179   0.132    0.0479
Ser  TCA    -0.01    -0.02    -0.02    0.7458   0.0092   0.006
Ser  AGT    -0.01    0.02     0.01     0.0133   0.0019   0.198
Ser  AGC    0        0        0        0.3912   0.4407   0.3446
Ser  TCT    0.01     0.01     0        0.4856   0.2375   0.9103
Thr  ACT    0        -0.01    -0.01    0.0503   0.6753   0.9256
Thr  ACC    -0.01    0.03     0.01     0.0249   0.0125   0.8532
Thr  ACA    -0.01    -0.01    -0.01    0.3604   0.1964   0.1278
Thr  ACG    0.02     -0.01    0.01     0.3611   0.0778   0.4456
Trp  TGG    0        0        0        1        1        1
Tyr  TAT    -0.03    0.01     0.02     0.0085   0.5592   0.0968
Tyr  TAC    0.03     -0.01    -0.02    0.0063   0.5625   0.1008
Val  GTC    -0.02    -0.01    -0.01    0.455    0.5166   0.6921
Val  GTA    -0.01    0.02     0.01     0.0182   0.1312   0.7683
Val  GTT    -0.01    0        0        0.4457   0.2265   0.0985
Val  GTG    0.04     -0.01    0        0.0335   0.0686   0.8081

Table 6: Codon preferences for sets of cell-cycle-regulated genes in Saccharomyces cerevisiae: 113 cell-cycle-regulated genes in the B1 set, 352 genes in the B2 set, and the 600 most significantly oscillating genes in the top-600 set were studied (Jensen et al, 2006b). The GC content distribution was preserved in the bootstrap sampling used to calculate the P-values.
Aa   Codon  Preferences                P-values
     5'->3' B1       B2       Top-600  B1       B2       Top-600
Ala  GCA    -0.02    -0.01    -0.03    0.0537   0.6911   0.0001
Ala  GCC    0        0.01     0.01     0.4895   0.4651   0.2998
Ala  GCG    -0.02    0        0        0.0039   0.394    0.5848
Ala  GCT    0.04     0        0.02     0.0038   0.7583   0.0062
Arg  AGA    0.02     -0.01    -0.01    0.0663   0.1495   0.1889
Arg  CGT    -0.01    0.01     -0.01    0.1901   0.0203   0.1001
Arg  AGG    -0.01    0        0        0.0681   0.2864   0.3591
Arg  CGC    0.01     0.01     0.01     0.0255   0.022    0.0095
Arg  CGG    -0.02    0        0.01     0        0.6645   0.0004
Arg  CGA    0.01     -0.01    0        0.1914   0.0152   0.7963
Asn  AAC    -0.01    0.02     0.03     0.1307   0.4143   0.0045
Asn  AAT    0.01     -0.02    -0.03    0.1326   0.4046   0.0034
Asp  GAC    -0.03    0.02     0.02     0.0008   0.265    0.1109
Asp  GAT    0.03     -0.02    -0.02    0.0007   0.2658   0.1081
Cys  TGT    -0.04    0.03     0        0.0317   0.0167   0.6374
Cys  TGC    0.04     -0.03    0        0.0306   0.0176   0.3994
Gln  CAG    0.02     -0.01    -0.02    0.0511   0.3049   0.0309
Gln  CAA    -0.02    0.01     0.02     0.0483   0.3042   0.0306
Glu  GAG    0.03     -0.01    0        0.001    0.2508   0.1333
Glu  GAA    -0.03    0.01     0        0.0005   0.2561   0.9044
Gly  GGA    -0.03    -0.01    -0.03    0.0058   0.6334   0.0009
Gly  GGG    -0.02    0.01     0        0.016    0.1417   0.8372
Gly  GGT    -0.01    -0.01    0        0.1177   0.0346   0.5272
Gly  GGC    0.06     0.01     0.03     0.0022   0.5185   0.0093
His  CAC    -0.04    -0.01    -0.01    0.0034   0.005    0.001
His  CAT    0.04     0.01     0.01     0.0031   0.0039   0.0011
Ile  ATC    -0.01    0        -0.03    0.3001   0.0478   0.0016
Ile  ATT    -0.01    0.01     0.01     0.0801   0.8303   0.803
Ile  ATA    0.02     -0.01    0.02     0.04     0.1334   0
Leu  CTT    0.02     0        0        0.0001   0.4506   0.4183
Leu  TTA    -0.01    0        0        0.0059   0.7636   0.8485
Leu  CTA    -0.01    0.01     0.01     0.0304   0.0862   0.0575
Leu  CTC    -0.01    0        -0.01    0.0421   0.3341   0.0026
Leu  TTG    0.02     -0.01    -0.02    0.0032   0.6879   0.0167
Leu  CTG    -0.01    0        0.02     0.1332   0.9336   0.0078
Lys  AAG    0.01     0        0        0.1268   0.0038   0.0022
Lys  AAA    -0.01    0        0        0.1208   0.998    0.9993
Met  ATG    0        0        0        1        1        1
Phe  TTC    0.02     0.01     0.03     0.0628   0.7429   0.0045
Phe  TTT    -0.02    -0.01    -0.03    0.0584   0.7441   0.0037
Pro  CCA    0.04     0.01     0.02     0.0052   0.2361   0.005
Pro  CCC    0.01     -0.01    0.01     0.1238   0.0136   0.0624
Pro  CCG    -0.01    0.01     -0.01    0.1133   0.0186   0.0056
Pro  CCT    -0.04    -0.02    -0.02    0.0002   0.0084   0.0007
Ser  TCG    -0.02    0.01     0.01     0.0002   0.0207   0.0064
Ser  TCC    -0.01    -0.01    -0.01    0.0912   0.103    0.0356
Ser  TCA    0        -0.01    -0.01    0.4554   0.4314   0.2991
Ser  AGT    0        0.01     0.01     0.6053   0.2443   0.1216
Ser  AGC    -0.01    0        0        0.0251   0.71     0.7872
Ser  TCT    0.03     0        0.01     0.0021   0.7545   0.0458
Thr  ACT    0.01     -0.02    -0.04    0.1557   0.1129   0
Thr  ACC    -0.02    0.02     0.02     0.006    0.0667   0.0152
Thr  ACA    -0.02    0        -0.01    0.0117   0.5276   0.0128
Thr  ACG    0.03     0        0.03     0.0062   0.5613   0
Trp  TGG    0        0        0        1        1        1
Tyr  TAT    0        0.02     0.04     0.5929   0.5611   0.0006
Tyr  TAC    0        -0.02    -0.04    0.4449   0.5524   0.0007
Val  GTC    0        -0.01    -0.02    0.4714   0.538    0.0082
Val  GTA    0        0.01     0.02     0.5678   0.5708   0.0047
Val  GTT    -0.01    0.01     0.01     0.1653   0.1643   0.1353
Val  GTG    0.01     -0.01    -0.01    0.224    0.2226   0.2169

Table 7: Codon preferences for sets of cell-cycle-regulated genes in Arabidopsis thaliana: 61 cell-cycle-regulated genes in the B1 set and 176 genes in the B2 set (Jensen et al, 2006b). The GC content distribution was preserved in the bootstrap sampling used to calculate the P-values.

Aa   Codon  Preferences       P-values
     5'->3' B1       B2       B1       B2
Ala  GCA    0.01     0.03     0.5811   0.0038
Ala  GCC    -0.03    -0.02    0.027    0.0266
Ala  GCG    -0.01    -0.03    0.4532   0.0001
Ala  GCT    0.03     0.02     0.0497   0.0411
Arg  AGA    0.05     0.01     0.0095   0.4378
Arg  CGT    -0.01    0.01     0.1914   0.126
Arg  AGG    -0.01    0        0.1339   0.5163
Arg  CGC    -0.01    0        0.2345   0.2848
Arg  CGG    -0.01    -0.01    0.252    0.0495
Arg  CGA    -0.01    -0.01    0.4042   0.2432
Asn  AAC    -0.01    -0.06    0.6493   0.0001
Asn  AAT    0.01     0.06     0.6421   0.0001
Asp  GAC    -0.02    -0.03    0.3741   0.0081
Asp  GAT    0.02     0.03     0.3775   0.0076
Cys  TGT    -0.06    0        0.012    0.3774
Cys  TGC    0.06     0        0.012    0.6434
Gln  CAG    -0.01    -0.04    0.1606   0.0002
Gln  CAA    0.01     0.04     0.162    0.0001
Glu  GAG    -0.01    0.03     0.0631   0.0023
Glu  GAA    0.01     -0.03    0.0619   0.0025
Gly  GGA    0.05     -0.01    0.0003   0.0916
Gly  GGG    -0.04    0.01     0.0001   0.0076
Gly  GGT    -0.01    0.01     0.1053   0.1375
Gly  GGC    0        -0.01    0.5212   0.1134
His  CAC    -0.02    -0.03    0.4925   0.0843
His  CAT    0.02     0.03     0.5024   0.0833
Ile  ATC    -0.02    0.02     0.0117   0.0312
Ile  ATT    -0.02    -0.05    0.3698   0.0001
Ile  ATA    0.04     0.03     0.009    0.0024
Leu  CTT    0        0        0.613    0.6046
Leu  TTA    -0.01    -0.03    0.5832   0.0001
Leu  CTA    0.01     0.02     0.0906   0.0003
Leu  CTC    0.01     0        0.1899   0.6876
Leu  TTG    0.01     0        0.2466   0.6687
Leu  CTG    -0.02    0.01     0.0041   0.0903
Lys  AAG    -0.03    -0.01    0.0073   0.0875
Lys  AAA    0.03     0.01     0.0067   0.0892
Met  ATG    0        0        1        1
Phe  TTC    -0.02    -0.05    0.4079   0.0003
Phe  TTT    0.02     0.05     0.4177   0.0002
Pro  CCA    0.05     0.01     0.0088   0.2659
Pro  CCC    -0.02    0.01     0.0313   0.0247
Pro  CCG    -0.02    -0.03    0.3042   0.0094
Pro  CCT    -0.01    0.01     0.1197   0.4257
Ser  TCG    0        -0.01    0.4054   0.028
Ser  TCC    -0.01    0.01     0.0299   0.1365
Ser  TCA    -0.01    0.02     0.0609   0.0034
Ser  AGT    -0.03    -0.01    0.0014   0.1903
Ser  AGC    0.01     -0.02    0.0366   0.0003
Ser  TCT    0.04     0.01     0.0003   0.1132
Thr  ACT    0.04     0.02     0.0287   0.1158
Thr  ACC    -0.04    -0.03    0.0039   0.0004
Thr  ACA    -0.01    -0.03    0.5099   0.0003
Thr  ACG    0.01     0.04     0.4218   0.0001
Trp  TGG    0        0        1        1
Tyr  TAT    0.06     -0.04    0.0005   0.0162
Tyr  TAC    -0.06    0.04     0.0002   0.0161
Val  GTC    0        0.01     0.7828   0.1722
Val  GTA    -0.04    -0.01    0.0011   0.3
Val  GTT    0        -0.01    0.3464   0.1812
Val  GTG    0.04     0.01     0.0055   0.2979

Table 8: Codon preferences for the cell-cycle-regulated genes in Homo sapiens and Saccharomyces cerevisiae that are expressed in the G1 phase of the cell cycle. The top-600 genes were used for the analysis (Jensen et al, 2006b). The first 40 codons as well as the full sequences of these genes adopt mostly optimal codons.
Aa   Codon  Human G1 phase            Yeast G1 phase
     5'->3' 40 first codons  Genes    40 first codons  Genes
Ala  GCA    -0.002           -0.088   -0.015           -0.028
Ala  GCC    -0.004           0.014    0.005            0.022
Ala  GCG    -0.012           0.146    -0.001           0.021
Ala  GCT    0.017            -0.073   0.012            -0.014
Arg  AGA    0                -0.082   -0.002           0.023
Arg  AGG    0.002            -0.022   -0.008           -0.046
Arg  CGA    0.01             -0.013   -0.02            0.007
Arg  CGC    0                0.12     0.01             0.037
Arg  CGG    -0.014           0.017    -0.009           -0.012
Arg  CGT    0.002            -0.021   0.031            -0.007
Asn  AAC    -0.009           0.123    0.013            0.045
Asn  AAT    0.009            -0.123   -0.013           -0.045
Asp  GAC    -0.004           0.078    0.003            0.003
Asp  GAT    0.004            -0.078   -0.003           -0.003
Cys  TGC    0.001            0.205    -0.023           0.009
Cys  TGT    -0.001           -0.205   0.023            -0.009
Gln  CAA    -0.021           -0.054   0.005            -0.071
Gln  CAG    0.021            0.054    -0.005           0.071
Glu  GAA    0.019            -0.131   0.003            -0.034
Glu  GAG    -0.019           0.131    -0.003           0.034
Gly  GGA    -0.014           -0.082   -0.048           -0.051
Gly  GGC    0.011            0.128    0.005            0.04
Gly  GGG    -0.019           -0.006   -0.007           0.008
Gly  GGT    0.021            -0.041   0.05             0.003
His  CAC    0.009            0.094    -0.006           0.022
His  CAT    -0.009           -0.094   0.006            -0.022
Ile  ATA    0.019            -0.002   -0.049           -0.032
Ile  ATC    -0.023           0.057    0.023            0.014
Ile  ATT    0.004            -0.055   0.026            0.017
Leu  CTA    0.002            -0.001   0.009            0.032
Leu  CTC    -0.011           0.05     -0.003           0.06
Leu  CTG    -0.006           -0.037   -0.005           -0.016
Leu  CTT    -0.001           -0.019   -0.006           0.022
Leu  TTA    0.008            -0.019   -0.015           -0.038
Leu  TTG    0.007            0.026    0.021            -0.059
Lys  AAA    0.002            -0.102   -0.014           -0.03
Lys  AAG    -0.002           0.102    0.014            0.03
Met  ATG    0                0        0                0
Phe  TTC    -0.036           0.166    0.054            0.082
Phe  TTT    0.036            -0.166   -0.054           -0.082
Pro  CCA    -0.001           -0.095   0.024            0.029
Pro  CCC    -0.014           0.003    0.009            0.044
Pro  CCG    0.003            0.205    -0.008           0.006
Pro  CCT    0.011            -0.113   -0.026           -0.079
Ser  AGC    -0.009           0.016    -0.006           0.016
Ser  AGT    0.001            -0.051   -0.011           -0.02
Ser  TCA    -0.009           -0.078   -0.007           0.016
Ser  TCC    0.001            0.044    0.014            0.014
Ser  TCG    0.007            0.098    -0.009           0.006
Ser  TCT    0.008            -0.029   0.019            -0.032
Thr  ACA    -0.005           -0.128   -0.024           -0.083
Thr  ACC    0.008            0.082    0.004            0.016
Thr  ACG    -0.006           0.08     -0.011           0.029
Thr  ACT    0.004            -0.034   0.03             0.038
Trp  TGG    0                0        0                0
Tyr  TAC    -0.011           0.09     0.024            0.028
Tyr  TAT    0.011            -0.09    -0.024           -0.028
Val  GTA    0.018            -0.017   -0.03            0.001
Val  GTC    -0.015           0.051    0.014            0.013
Val  GTG    -0.007           0.037    0.003            0.014
Val  GTT    0.002            -0.072   0.013            -0.027

Table 9: Codon-bias scores for two sets of proteins, cell-cycle-regulated and non-cell-cycle-regulated, classified according to their protein dynamics. 17 genes are from a previous study (Sigal et al, 2006). The genes depicted in blue were produced for the current study.

Gene Name   Classification*   CCCS score**
DDX5        CCR               0.022
USP7        CCR               0.011
TOP1        CCR               0.002
ANP32B      CCR               0.011
H2AFV       CCR               0.008
GTF2F2      CCR               0.02
RBBP7       CCR               0.014
SFRS10      CCR               0.012
GARS        CCR               0.007
TARS        CCR               0.012
EPRS        CCR               0.021
SAE1        NCCR              -0.002
SET         NCCR              0.02
HMGA2       NCCR              -0.012
YPEL1       NCCR              -0.024
DDX46       NCCR              0.018
LMNA        NCCR              -0.03
HMGA1       NCCR              -0.018
ZNF433      NCCR              -0.013
KIAA1937    NCCR              -0.012
WARS        NCCR              -0.015
GAPDH       NCCR              -0.026

* CCR denotes proteins with cell-cycle-regulated protein dynamics; NCCR denotes non-cell-cycle-regulated proteins.
** CCCS is the Cell-Cycle Codon Score.

Table 10: Comparison of the non-optimal codon abundance in human cell-cycle-regulated genes from the B1 set with that in their non-cell-cycle-regulated paralogs. The criterion for selecting appropriate cases was to use only well-known cell cycle genes with clearly cycling mRNA for which obvious paralogs with clearly non-cycling mRNA levels exist.

Cell Cycle       Percentage of         Non-Cell Cycle        Percentage of
Regulated Gene   Non-Optimal Codons    Regulated Paralog     Non-Optimal Codons
STAG1            60%                   STAG3                 42%
ANP32E           60%                   ANP32B                46%
EZH2             57%                   KIAA0388              33%
E2F5             53%                   E2F4                  31%

Supplementary Figures

Figure 1: Cell cycle dynamics of six proteins during two generations. A. Glutamyl-prolyl-tRNA synthetase (EPRS). B. Threonyl-tRNA synthetase (TARS). C. Glycyl-tRNA synthetase (GARS). D. Arginine and glutamate-rich protein 1 (ARGLU1). E. Tryptophanyl-tRNA synthetase (WARS). F. Glyceraldehyde-3-phosphate dehydrogenase (GAPDH). A-D. Proteins with cell-cycle-dependent protein dynamics. E-F. Non-cell-cycle-dependent proteins.
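As an illustration of the kind of quantity reported in Table 10, the percentage of non-optimal codons in a coding sequence can be computed along the following lines. This is a minimal sketch and not the procedure used to produce the table: the function name percent_non_optimal, the toy sequence and the optimal-codon map are hypothetical, and a real analysis would need a species-specific list of optimal codons, which is not given in this document. Codons for amino acids without a designated optimal codon (here, for example, Met) and stop codons are skipped, which is also an assumption.

```python
from itertools import product
from typing import Dict

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def percent_non_optimal(cds: str, optimal: Dict[str, str]) -> float:
    """Percentage of codons that differ from the designated optimal codon.

    `optimal` maps a one-letter amino acid to its assumed optimal codon.
    Codons whose amino acid is not in `optimal` are not counted.
    """
    cds = cds.upper()
    counted = 0
    non_optimal = 0
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        aa = CODON_TABLE.get(codon)
        if aa is None or aa not in optimal:
            continue  # ambiguous bases, stop codons, or amino acids left out
        counted += 1
        if codon != optimal[aa]:
            non_optimal += 1
    return 100.0 * non_optimal / counted if counted else 0.0


# Hypothetical optimal codons for two amino acids, for illustration only.
toy_optimal = {"K": "AAG", "F": "TTC"}
toy_cds = "ATGAAGAAATTTTTCAAATAA"  # Met Lys Lys Phe Phe Lys stop
print(percent_non_optimal(toy_cds, toy_optimal))  # prints 60.0
```

In the toy sequence five codons are counted (three Lys, two Phe) and three of them differ from the assumed optimal codon, giving 60%.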
Figure 2: The dynamics of ARGLU1 mRNA expression are very similar to its protein dynamics, as found using time-lapse microscopy of the YFP-tagged clone. A. ARGLU1 mRNA expression as seen in two different experiments (Whitfield et al, 2002); expression rises during the G2 phase. B. ARGLU1 protein dynamics likewise peak in G2.

Supplementary Results

To test whether the observed codon preferences were associated with a particular cell cycle phase, we partitioned the top-600 cycling genes according to their time of peak expression and searched for codon preferences that distinguished cycling genes with different peak times. The strongest codon preference was observed when comparing genes expressed during the S, G2 and M phases with genes expressed during the G1 phase (the latter being less than 13-20% of the genes in the top-600 group). It should be noted that the late G1 phase corresponds to 20% to 40% of the cell cycle in yeast (123 genes), whereas in humans the G1 phase ranges from 5% to 40% of the cell cycle (78 genes). Notably, this comparison yielded almost exactly the same codon preferences as the comparison of cell-cycle-regulated with non-cell-cycle-regulated genes.

To ensure that the observed bias is linked to cell cycle regulation and not to cell cycle function in general, we analyzed the codon preference of genes that are non-cycling at the mRNA level but have a cell cycle phenotype (Mukherji et al, 2006) and that of non-cycling genes with cell-cycle-regulated orthologs (Jensen et al, 2006a). Neither of these gene sets showed the preference for non-optimal codon usage observed for cell-cycle-regulated genes. This strongly suggests that the observed codon preference is specifically related to cyclic expression and not to cell growth or cell proliferation in general.

The charged tRNA levels presumably peak in G2/M or M phase. Thus, codon preference maximally affects a phase of the cell cycle when a very large number of proteins are produced.
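The phase comparison described above can be sketched as follows. This is a hedged illustration rather than the authors' implementation. It assumes that the preference of a codon is the difference between its relative usage among synonymous codons in one gene group and in another, an assumption that would be consistent with the tabulated preferences summing to approximately zero within each amino acid. The function names, the toy sequences and the half-open treatment of the G1 window are hypothetical; only the G1 windows themselves (5% to 40% of the cycle in humans, 20% to 40% for late G1 in yeast) come from the text.

```python
from collections import Counter
from itertools import product
from typing import Dict, Iterable, List, Tuple

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def relative_usage(seqs: Iterable[str]) -> Dict[str, float]:
    """Usage of each codon as a fraction of all codons for the same amino acid."""
    counts: Counter = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if CODON_TABLE.get(codon, "*") != "*":  # skip stops and ambiguous codons
                counts[codon] += 1
    per_aa: Counter = Counter()
    for codon, n in counts.items():
        per_aa[CODON_TABLE[codon]] += n
    return {codon: n / per_aa[CODON_TABLE[codon]] for codon, n in counts.items()}


def codon_preference(group_a: Iterable[str], group_b: Iterable[str]) -> Dict[str, float]:
    """Assumed definition: relative usage in group A minus that in group B."""
    usage_a = relative_usage(group_a)
    usage_b = relative_usage(group_b)
    return {codon: usage_a.get(codon, 0.0) - usage_b.get(codon, 0.0)
            for codon, aa in CODON_TABLE.items() if aa != "*"}


def split_by_peak(genes: Iterable[Tuple[float, str]],
                  g1_start: float, g1_end: float) -> Tuple[List[str], List[str]]:
    """Split (peak time in % of cycle, coding sequence) pairs into non-G1 and G1."""
    g1: List[str] = []
    other: List[str] = []
    for peak, cds in genes:
        (g1 if g1_start <= peak < g1_end else other).append(cds)
    return other, g1


# Toy data: peak times in percent of the cell cycle, human G1 window 5% to 40%.
toy_genes = [(10.0, "AAGAAG"), (30.0, "AAAAAG"), (60.0, "AAAAAA"), (80.0, "AAAAAG")]
other, g1 = split_by_peak(toy_genes, 5.0, 40.0)
preference = codon_preference(other, g1)
print(round(preference["AAA"], 2), round(preference["AAG"], 2))  # prints 0.5 -0.5
```

In the toy data the non-G1 genes use AAA for three of four Lys codons and the G1 genes for one of four, so the preference for AAA in the non-G1 group is 0.75 - 0.25 = 0.5, mirrored by -0.5 for AAG.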
Codon preference may not serve to ensure that genes are expressed at a specific time point; rather, it may primarily be a mechanism to ensure that they are not expressed inappropriately, in other words, that genes needed for cell cycle progression are not accidentally expressed in starving cells resting in G1 (or G0) phase.

Supplementary Methods

1. Construction of the tagged protein library

A library of fluorescently tagged proteins was constructed in a non-small cell lung carcinoma cell line (H1299) by a two-stage process. In both stages, a fluorescent reporter was integrated into the genome via Central Dogma tagging (CD-tagging; Jarvik et al, 1996; Sigal et al, 2007). In the first stage, a parental clone was produced in which the nucleus is colored brighter than the cytoplasm and the cytoplasm is colored brighter than the background. For this purpose, a red fluorescent protein (mCherry) was introduced in two rounds of CD-tagging. In the first round, the H7a clone, expressing tagged XRCC5 protein localized to the nucleus, was selected. In the second round (carried out on the previously selected clone H7a), clone H7, expressing tagged DAP1 localized to the whole intracellular domain, was selected. Following these two steps, a parental clone was obtained that expresses two endogenously mCherry-tagged proteins (XRCC5 and DAP1), one staining the cytoplasm and the other more intensely staining the nucleus. We found that the cell-cell variability of red fluorescence in this clone was much lower than that among clones generated by transfection of mCherry, which was crucial for reliable image analysis of the cell videos.

The second stage in the generation of the library was to use CD-tagging to tag different proteins in the parental H7 clone (H1299-cherry) with eYFP or Venus. The CD-tagging protocol used is described in detail by Sigal et al (2007).
Briefly, a fluorescent protein (FP), flanked by splice acceptor and donor sequences, was integrated into the genome as an artificial exon via retroviral vectors (U5000, U5001, U5002), each containing the FP in one of the three reading frames. Cells positive for the relevant FP fluorescence were sorted into 384-well plates by flow cytometry and expanded into cell clones. Tagged protein identities were determined by 3' RACE, using a nested PCR reaction that amplified the section between the FP and the polyA tail of the host gene's mRNA. The PCR product was sequenced directly and aligned to the genome. Our library of CD-tagged proteins is detailed at www.dynamicproteomics.net.

2. Long period time-lapse microscopy

Time-lapse movies were obtained (at 20x magnification) as described elsewhere (Sigal et al, 2007) with an automated, humidity- and CO2-controlled Leica DMIRE2 inverted fluorescence microscope and an ORCA ER cooled CCD camera (Hamamatsu Photonics). The system was controlled by ImagePro5 Plus (Media Cybernetics) software, which integrated time-lapse acquisition, stage movement and software-based auto-focus. During the experiment, the cells were grown and visualized in 12-well coverslip-bottom plates (MatTek) coated with 10 μM fibronectin (Sigma). In each well (15 or more cells), time-lapse movies were obtained from four fields. Each movie was taken at a time resolution of 20 minutes and filmed for at least three days (over 200 time points). Each time point included three images: phase contrast, red fluorescence and yellow fluorescence.

References

Jarvik JW, Adler SA, Telmer CA, Subramaniam V, Lopez AJ (1996) CD-tagging: a new approach to gene and protein discovery and analysis. Biotechniques 20: 896-904

Jensen L, Jensen T, de Lichtenberg U, Brunak S, Bork P (2006a) Co-evolution of transcriptional and post-translational cell-cycle regulation. Nature 443: 594-597

Jensen LJ, Jensen TS, de Lichtenberg U, Brunak S, Bork P (2006b) Co-evolution of transcriptional and post-translational cell-cycle regulation. Nature 443: 594-597

Mukherji M, Bell R, Supekova L, Wang Y, Orth A, Batalov S, Miraglia L, Huesken D, Lange J, Martin C, Sahasrabudhe S, Reinhardt M, Natt F, Hall J, Mickanin C, Labow M, Chanda S, Cho C, Schultz P (2006) Genome-wide functional analysis of human cell-cycle regulators. Proc Natl Acad Sci USA 103: 14819-14824

Sigal A, Danon T, Cohen A, Milo R, Geva-Zatorsky N, Lustig G, Liron Y, Alon U, Perzov N (2007) Generation of a fluorescently labeled endogenous protein library in living human cells. Nat Protoc 2: 1515-1527

Sigal A, Milo R, Cohen A, Geva-Zatorsky N, Klein Y, Alaluf I, Swerdlin N, Perzov N, Danon T, Liron Y, Raveh T, Carpenter AE, Lahav G, Alon U (2006) Dynamic proteomics in individual human cells uncovers widespread cell-cycle dependence of nuclear proteins. Nat Methods 3: 525-531

Whitfield M, Sherlock G, Saldanha A, Murray J, Ball C, Alexander K, Matese J, Perou C, Hurt M, Brown P, Botstein D (2002) Identification of genes periodically expressed in the human cell cycle and their expression in tumors. Mol Biol Cell 13: 1977-2000