Motif discovery for predicting regulatory elements

Prediction of Regulatory
Elements Controlling
Gene Expression
Martin Tompa
Computer Science & Engineering
Genome Sciences
University of Washington
Seattle, Washington, U.S.A.
1
Outline
• Regulation of genes
• Motif discovery by overrepresentation
– MEME
– Gibbs sampling
• Motif discovery by phylogenetic
footprinting
– FootPrinter
– MicroFootPrinter
2
DNA, Genes, and Proteins
DNA
Gene
Protein
DNA: program for cell processes
Proteins: execute cell processes
4
Regulation of Genes
• What turns genes on (producing a protein)
and off?
• When is a gene turned on or off?
• Where (in which cells) is a gene turned on?
• At what rate is the gene product produced?
5
Regulation of Genes
[Figure in three frames. Labels: Transcription Factor (Protein), RNA polymerase (Protein), DNA, Regulatory Element, Gene; the last frame adds New protein]
8
Goal
Identify regulatory elements in DNA
sequences. These are:
• Binding sites for proteins
• Short sequences (5-25 nucleotides)
• Up to 1000 nucleotides (or farther) from
gene
• Inexactly repeating patterns (“motifs”)
9
2 Types of Motif Discovery
1. Motif discovery by overrepresentation
• One species
• Multiple (co-regulated) genes
2. Motif discovery by phylogenetic
footprinting
• Multiple species
• One gene
11
Overrepresentation:
Daf-19 Binding Sites in C. elegans
(sites shown on an axis from -150 to -1 relative to each gene)
Gene       Binding site
che-2      GTTGTCATGGTGAC
daf-19     GTTTCCATGGAAAC
osm-1      GCTACCATGGCAAC
osm-6      GTTACCATAGTAAC
F02D8.3    GTTTCCATGGTAAC
12
Phylogenetic Footprinting:
Regulatory Element of Growth Hormone Gene
Species: Chicken, Rat, Human, Dog, Sheep (axis from -200 to -1)
Instances of the element, one per species:
AGGGGATA
AGGGTATA
AGGGTATA
AGGGTATA
AGGGTATA
13
MEME
• (Multiple EM for Motif Elicitation)
Bailey & Elkan, 1995
• Very general iterative method based on
Expectation Maximization
• Available at
meme.sdsc.edu/meme/website/intro.html
15
Overrepresented Motifs
• Given sequences X = {X1, X2, …, Xn},
find statistically overrepresented
motifs of length k
• For simplicity, assume
– Exactly one motif instance per sequence
– Sequences over DNA alphabet
16
Hidden Information
• Z = {Zij}, where
Zij = 1, if motif instance starts at position j of Xi
Zij = 0, otherwise
• Iterate over probabilistic models that
could generate X and Z, trying to
converge on this solution
17
Model Parameters
• Motif profile: 4×k matrix θ = (θrp), where
 r ∈ {A,C,G,T}
 1 ≤ p ≤ k
 θrp = Pr(residue r in position p of motif)
• Background distribution:
 θr0 = Pr(residue r in random nonmotif position)
18
Profile Example
GTTGTC
GTTTCC
GCTACC
GTTACC
GTTTCC
profile θ (rows: residue r; columns: motif position p)
     1    2    3    4    5    6
A    0    0    0    .4   0    0
C    0    .2   0    0    .8   1
G    1    0    0    .2   0    0
T    0    .8   1    .4   .2   0
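As a minimal sketch of how such a profile can be computed, the Python below counts residues column by column in the five aligned instances from this slide. The function name `build_profile` and the dictionary layout are illustrative choices, not part of MEME.

```python
from collections import Counter

ALPHABET = "ACGT"

def build_profile(instances):
    """Return {residue: [frequency at motif position 1..k]} for equal-length instances."""
    k = len(instances[0])
    n = len(instances)
    profile = {r: [0.0] * k for r in ALPHABET}
    for p in range(k):
        # Count how often each residue occurs in column p of the instances.
        counts = Counter(seq[p] for seq in instances)
        for r in ALPHABET:
            profile[r][p] = counts[r] / n
    return profile

# The five motif instances shown on the slide.
instances = ["GTTGTC", "GTTTCC", "GCTACC", "GTTACC", "GTTTCC"]
profile = build_profile(instances)
for r in ALPHABET:
    print(r, profile[r])
```

Each printed row corresponds to one row of the 4×k matrix θ.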
19
Overview:
Expectation Maximization
• Goal: Find profile θ and motif positions Z
that have maximum likelihood
• At each iteration:
– E-step: From θ predict likely motif
positions Z
– M-step: From sequences at positions Z
compute new profile θ
20
Expectation Maximization
• Goal: Find θ, Z that maximize Pr (X, Z | θ)
• At iteration t:
– E-step: Z(t) = E (Z | X, θ(t))
– M-step: Find θ(t+1) that maximizes
Pr (X, Z(t) | θ(t+1))
21
E-step Details
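Zij(t) = Pr(Xi | Zij=1, θ(t)) / Σj' Pr(Xi | Zij'=1, θ(t))
To evaluate Pr(Xi | Zij=1, θ(t)):
– Use θ1(t), θ2(t), …, θk(t) for the k positions of Xi starting at j
– Use θ0(t) for all other positions of Xi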
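The E-step for a single sequence can be sketched as follows, assuming exactly one motif instance per sequence and every start position equally likely a priori, as on the slides. The names `sequence_likelihood` and `e_step`, the 0-based positions, and the toy profile are illustrative assumptions rather than MEME's actual code.

```python
def sequence_likelihood(x, j, theta, theta0):
    """Pr(x | motif starts at 0-based position j): profile inside the motif, background elsewhere."""
    k = len(theta["A"])
    prob = 1.0
    for pos, r in enumerate(x):
        if j <= pos < j + k:
            prob *= theta[r][pos - j]   # motif column pos - j
        else:
            prob *= theta0[r]           # background distribution
    return prob

def e_step(x, theta, theta0):
    """Return the expected values Z_ij for every possible start j of sequence x."""
    k = len(theta["A"])
    likes = [sequence_likelihood(x, j, theta, theta0) for j in range(len(x) - k + 1)]
    total = sum(likes)
    return [like / total for like in likes]

# Toy profile of length k = 2 that favors the motif "AT" (made-up numbers).
theta = {"A": [0.7, 0.1], "C": [0.1, 0.1], "G": [0.1, 0.1], "T": [0.1, 0.7]}
theta0 = {r: 0.25 for r in "ACGT"}
print(e_step("GGATC", theta, theta0))  # the largest weight falls on start 2 ("AT")
```

The profile entries here are all nonzero; with zeros in θ the denominator could vanish, which is one reason implementations commonly add pseudocounts.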
22
M-step Details
• If Zij(t) ∈ {0,1} it would be straightforward:
Calculate profile θ1, θ2, …, θk from motif
instances and θr0 from frequency of r
outside of motif instances.
• But Zij(t) ∈ [0,1], so weight these
frequencies by the appropriate values of
Zij(t).
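A sketch of the weighted counting described above: every residue of Xi contributes weight Zij to a motif column when it lies inside the window starting at j, and to the background otherwise. The function name `m_step`, the `pseudocount` parameter, and the 0-based positions are assumptions made for illustration.

```python
ALPHABET = "ACGT"

def m_step(seqs, Z, k, pseudocount=0.0):
    """Re-estimate (theta, theta0) from sequences and fractional start weights Z[i][j]."""
    motif = [{r: pseudocount for r in ALPHABET} for _ in range(k)]
    background = {r: pseudocount for r in ALPHABET}
    for x, z in zip(seqs, Z):
        for j, w in enumerate(z):            # w = Z_ij, weight of start position j
            for pos, r in enumerate(x):
                if j <= pos < j + k:
                    motif[pos - j][r] += w   # weighted count for motif column pos - j
                else:
                    background[r] += w       # weighted count outside the motif
    theta = {r: [motif[p][r] / sum(motif[p].values()) for p in range(k)]
             for r in ALPHABET}
    total_bg = sum(background.values())
    theta0 = {r: background[r] / total_bg for r in ALPHABET}
    return theta, theta0

# With 0/1 weights this reduces to the straightforward case: instances "GT" and "GT".
theta, theta0 = m_step(["AAGT", "GTCC"], [[0, 0, 1], [1, 0, 0]], k=2)
print(theta["G"][0], theta["T"][1], theta0["A"])
```

When the weights are fractional, the same loop spreads each residue's count across all windows that cover it.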
23
Gibbs Sampler
• Lawrence et al., 1993
• Very general iterative method, related
to Markov Chain Monte Carlo (MCMC)
• Available at
bayesweb.wadsworth.org/gibbs/gibbs.html
25
One Iteration of Gibbs Sampler
• n motif instances each of length k
• Remove one at random (call its sequence g)
• Form profile of remaining n-1
• Let pi be the probability with which g[i .. i+k-1] fits profile
• Choose to start replacement at i with probability proportional to pi
GGGTCACGGGGTGGGAGCTGAGAAGGGGTGGAG
CACGGGGGAGCCTGGAGGGGATCCGGAGGGGTG
GGCCGTGGGGAACCTGGGGGGAGCTGGGCTCAG
GGAGCGTGGAGGTGGGGTGGGAGCTGAGGGTGG
GGCTGGGGTGGCGGTGGGAGCCCAGGACGTTG
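One iteration can be sketched in Python as below. This is a simplified illustration, not the published sampler: the pseudocount of 1 (to avoid zero probabilities), the absence of a background correction, the motif length k = 8, and the names `profile_from` and `gibbs_iteration` are all assumptions.

```python
import random

ALPHABET = "ACGT"

def profile_from(instances, pseudocount=1.0):
    """Column-wise residue frequencies of the instances, smoothed by a pseudocount."""
    k = len(instances[0])
    prof = []
    for p in range(k):
        counts = {r: pseudocount for r in ALPHABET}
        for inst in instances:
            counts[inst[p]] += 1
        total = sum(counts.values())
        prof.append({r: c / total for r, c in counts.items()})
    return prof

def gibbs_iteration(seqs, starts, k, rng=random):
    """Resample one motif start in place; return the index of the sequence updated."""
    g_idx = rng.randrange(len(seqs))                  # remove one instance at random
    others = [s[st:st + k]
              for i, (s, st) in enumerate(zip(seqs, starts)) if i != g_idx]
    prof = profile_from(others)                       # profile of the remaining n-1
    g = seqs[g_idx]
    weights = []
    for i in range(len(g) - k + 1):
        p_i = 1.0
        for p, r in enumerate(g[i:i + k]):
            p_i *= prof[p][r]                         # how well g[i .. i+k-1] fits
        weights.append(p_i)
    # Choose the replacement start with probability proportional to p_i.
    starts[g_idx] = rng.choices(range(len(weights)), weights=weights)[0]
    return g_idx

seqs = [
    "GGGTCACGGGGTGGGAGCTGAGAAGGGGTGGAG",
    "CACGGGGGAGCCTGGAGGGGATCCGGAGGGGTG",
    "GGCCGTGGGGAACCTGGGGGGAGCTGGGCTCAG",
    "GGAGCGTGGAGGTGGGGTGGGAGCTGAGGGTGG",
    "GGCTGGGGTGGCGGTGGGAGCCCAGGACGTTG",
]
rng = random.Random(0)
starts = [0] * len(seqs)
for _ in range(200):
    gibbs_iteration(seqs, starts, 8, rng)
print([s[st:st + 8] for s, st in zip(seqs, starts)])
```

Because the choice is random rather than greedy, repeated runs with different seeds can settle on different motif instances.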
28
FootPrinter
• Blanchette & Tompa, 2002
• First algorithm explicitly designed for
phylogenetic footprinting
• Available at
bio.cs.washington.edu/software.html
30
Phylogenetic Footprinting
(Tagle et al. 1988)
Functional regions of DNA evolve slower
than nonfunctional ones.
• Consider a set of orthologous (i.e.,
corresponding) sequences from different
species
• Identify unusually well conserved substrings
(i.e., ones that have not changed much over
the course of evolution)
32
CLUSTALW multiple sequence alignment (rbcS gene)
Cotton
Pea
Tobacco
Ice-plant
Turnip
Wheat
Duckweed
Larch
ACGGTT-TCCATTGGATGA---AATGAGATAAGAT---CACTGTGC---TTCTTCCACGTG--GCAGGTTGCCAAAGATA-------AGGCTTTACCATT
GTTTTT-TCAGTTAGCTTA---GTGGGCATCTTA----CACGTGGC---ATTATTATCCTA--TT-GGTGGCTAATGATA-------AGG--TTAGCACA
TAGGAT-GAGATAAGATTA---CTGAGGTGCTTTA---CACGTGGC---ACCTCCATTGTG--GT-GACTTAAATGAAGA-------ATGGCTTAGCACC
TCCCAT-ACATTGACATAT---ATGGCCCGCCTGCGGCAACAAAAA---AACTAAAGGATA--GCTAGTTGCTACTACAATTC--CCATAACTCACCACC
ATTCAT-ATAAATAGAAGG---TCCGCGAACATTG--AAATGTAGATCATGCGTCAGAATT--GTCCTCTCTTAATAGGA-------A-------GGAGC
TATGAT-AAAATGAAATAT---TTTGCCCAGCCA-----ACTCAGTCGCATCCTCGGACAA--TTTGTTATCAAGGAACTCAC--CCAAAAACAAGCAAA
TCGGAT-GGGGGGGCATGAACACTTGCAATCATT-----TCATGACTCATTTCTGAACATGT-GCCCTTGGCAACGTGTAGACTGCCAACATTAATTAAA
TAACAT-ATGATATAACAC---CGGGCACACATTCCTAAACAAAGAGTGATTTCAAATATATCGTTAATTACGACTAACAAAA--TGAAAGTACAAGACC
Cotton
Pea
Tobacco
Ice-plant
Turnip
Wheat
Duckweed
Larch
CAAGAAAAGTTTCCACCCTC------TTTGTGGTCATAATG-GTT-GTAATGTC-ATCTGATTT----AGGATCCAACGTCACCCTTTCTCCCA-----A
C---AAAACTTTTCAATCT-------TGTGTGGTTAATATG-ACT-GCAAAGTTTATCATTTTC----ACAATCCAACAA-ACTGGTTCT---------A
AAAAATAATTTTCCAACCTTT---CATGTGTGGATATTAAG-ATTTGTATAATGTATCAAGAACC-ACATAATCCAATGGTTAGCTTTATTCCAAGATGA
ATCACACATTCTTCCATTTCATCCCCTTTTTCTTGGATGAG-ATAAGATATGGGTTCCTGCCAC----GTGGCACCATACCATGGTTTGTTA-ACGATAA
CAAAAGCATTGGCTCAAGTTG-----AGACGAGTAACCATACACATTCATACGTTTTCTTACAAG-ATAAGATAAGATAATGTTATTTCT---------A
GCTAGAAAAAGGTTGTGTGGCAGCCACCTAATGACATGAAGGACT-GAAATTTCCAGCACACACA-A-TGTATCCGACGGCAATGCTTCTTC-------A
TATAATATTAGAAAAAAATC-----TCCCATAGTATTTAGTATTTACCAAAAGTCACACGACCA-CTAGACTCCAATTTACCCAAATCACTAACCAATT
TTCTCGTATAAGGCCACCA-------TTGGTAGACACGTAGTATGCTAAATATGCACCACACACA-CTATCAGATATGGTAGTGGGATCTG--ACGGTCA
Cotton
Pea
Tobacco
Ice-plant
Turnip
Wheat
Duckweed
Larch
ACCAATCTCT---AAATGTT----GTGAGCT---TAG-GCCAAATTT-TATGACTATA--TAT----AGGGGATTGCACC----AAGGCAGTG-ACACTA
GGCAGTGGCC---AACTAC--------------------CACAATTT-TAAGACCATAA-TAT----TGGAAATAGAA------AAATCAAT--ACATTA
GGGGGTTGTT---GATTTTT----GTCCGTTAGATAT-GCGAAATATGTAAAACCTTAT-CAT----TATATATAGAG------TGGTGGGCA-ACGATG
GGCTCTTAATCAAAAGTTTTAGGTGTGAATTTAGTTT-GATGAGTTTTAAGGTCCTTAT-TATA---TATAGGAAGGGGG----TGCTATGGA-GCAAGG
CACCTTTCTTTAATCCTGTGGCAGTTAACGACGATATCATGAAATCTTGATCCTTCGAT-CATTAGGGCTTCATACCTCT----TGCGCTTCTCACTATA
CACTGATCCGGAGAAGATAAGGAAACGAGGCAACCAGCGAACGTGAGCCATCCCAACCA-CATCTGTACCAAAGAAACGG----GGCTATATATACCGTG
TTAGGTTGAATGGAAAATAG---AACGCAATAATGTCCGACATATTTCCTATATTTCCG-TTTTTCGAGAGAAGGCCTGTGTACCGATAAGGATGTAATC
CGCTTCTCCTCTGGAGTTATCCGATTGTAATCCTTGCAGTCCAATTTCTCTGGTCTGGC-CCA----ACCTTAGAGATTG----GGGCTTATA-TCTATA
Cotton
Pea
Tobacco
Ice-plant
Larch
Turnip
Wheat
Duckweed
T-TAAGGGATCAGTGAGAC-TCTTTTGTATAACTGTAGCAT--ATAGTAC
TATAAAGCAAGTTTTAGTA-CAAGCTTTGCAATTCAACCAC--A-AGAAC
CATAGACCATCTTGGAAGT-TTAAAGGGAAAAAAGGAAAAG--GGAGAAA
TCCTCATCAAAAGGGAAGTGTTTTTTCTCTAACTATATTACTAAGAGTAC
TCTTCTTCACAC---AATCCATTTGTGTAGAGCCGCTGGAAGGTAAATCA
TATAGATAACCA---AAGCAATAGACAGACAAGTAAGTTAAG-AGAAAAG
GTGACCCGGCAATGGGGTCCTCAACTGTAGCCGGCATCCTCCTCTCCTCC
CATGGGGCGACG---CAGTGTGTGGAGGAGCAGGCTCAGTCTCCTTCTCG
33
FootPrinter
• Inputs:
– evolutionary tree T
– corresponding regulatory regions at leaves
• Output: motifs well conserved w.r.t. T.
34
Finding Short Motifs
AGTCGTACGTGAC... (Human)
AGTAGACGTGCCG... (Chimp)
ACGTGAGATACGT... (Rabbit)
GAACGGAGTACGT... (Mouse)
TCGTGACGGTGAT... (Rat)
Size of motif sought: k = 4
35
Most Parsimonious Solution
[Tree figure: internal nodes labeled ACGT, ACGT, ACGT, and ACGG; sequences at the leaves:]
AGTCGTACGTGAC...
AGTAGACGTGCCG...
ACGTGAGATACGT...
GAACGGAGTACGT...
TCGTGACGGTGAT...
“Parsimony score”: 1 mutation
36
Substring Parsimony Problem
Given:
• phylogenetic tree T,
• set of orthologous sequences at leaves of T,
• length k of motif
• threshold d
Problem:
• Find each set S of k-mers, one k-mer from
each leaf, such that the parsimony score of S
in T is at most d.
This problem is NP-hard.
37
FootPrinter’s Exact Algorithm
(with Mathieu Blanchette, generalizing Sankoff and Rousseau
1975)
Wu[s] = best parsimony score for subtree rooted at node u,
if u is labeled with string s.
[Tree figure: every node u carries a table Wu with 4^k entries, one per k-mer s; sample entries are shown for ACGG and ACGT. Sequences at the leaves: AGTCGTACGTG, ACGGGACGTGC, ACGTGAGATAC, GAACGGAGTAC, TCGTGACGGTG]
38
Running Time
Wu[s] = Σ_{v: child of u} min_t ( Wv[t] + d(s, t) )
Total time O(n k (4^k + l)), where
n = number of species
l = average sequence length
k = motif length
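The recurrence above can be sketched directly in Python. This is the plain version that tries every pair (s, t), not FootPrinter's optimized implementation. Taking d(s, t) to be Hamming distance, representing the tree as nested tuples, and the particular tree shape are assumptions, and the leaf sequences are only the truncated prefixes shown on the earlier slide.

```python
from itertools import product
import math

def hamming(s, t):
    """Number of positions at which two equal-length strings differ."""
    return sum(a != b for a, b in zip(s, t))

def table(node, k, kmers):
    """W_u[s] for every k-mer s; a leaf is a str, an internal node a tuple of children."""
    if isinstance(node, str):
        present = {node[i:i + k] for i in range(len(node) - k + 1)}
        return {s: (0 if s in present else math.inf) for s in kmers}
    W = {s: 0 for s in kmers}
    for child in node:
        Wv = table(child, k, kmers)
        finite = [(t, c) for t, c in Wv.items() if c < math.inf]
        for s in kmers:
            # Cheapest label t for the child, plus the mutations on the edge (u, v).
            W[s] += min(c + hamming(s, t) for t, c in finite)
    return W

def best_parsimony(tree, k):
    """Return (best score at the root, root labels achieving it)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    W = table(tree, k, kmers)
    score = min(W.values())
    return score, sorted(s for s, c in W.items() if c == score)

human, chimp = "AGTCGTACGTGAC", "AGTAGACGTGCCG"
rabbit, mouse, rat = "ACGTGAGATACGT", "GAACGGAGTACGT", "TCGTGACGGTGAT"
tree = (((human, chimp), rabbit), (mouse, rat))  # assumed topology
score, labels = best_parsimony(tree, 4)
print(score, labels)
```

The sketch reports only the best score at the root; listing every set of k-mers with score at most d, as the Substring Parsimony Problem asks, would also require tracing back through the tables.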
39
Improvements
• Better algorithm reduces time from
O(n k (4^{2k} + l)) to O(n k (4^k + l))
• By restricting to motifs with parsimony
score at most d, greatly reduce the
number of table entries computed
(exponential in d, polynomial in k)
• Amenable to many useful extensions
(e.g., allow insertions and deletions)
40
Application to -actin Gene
Gilthead sea bream (678 bp)
Medaka fish (1016 bp)
Common carp (696 bp)
Grass carp (917 bp)
Chicken (871 bp)
Human (646 bp)
Rabbit (636 bp)
Rat (966 bp)
Mouse (684 bp)
Hamster (1107 bp)
41
Common carp
ACGGACTGTTACCACTTCACGCCGACTCAACTGCGCAGAGAAAAACTTCAAACGACAACA
TTGGCATGGCTTTTGTTATTTTTGGCGCTTGACTCAGG
ATCTAAAAACTGGAACGGCGAAGGTGACGGCAATGTTTTGGCAAATAAGCATCCCCGAAGTTCTACAATGCATCTGAGGACTCAATGTTTTTTTTTTTTTTT
TTTCTTTAGTCATTCCAAATGTTTGTTAAATGCATTGTTCCGAAACTTATTTGCCTCTATGAAGGCTGCCCAGTAATTGGGAGCATACTTAACATTGTAGTATTGTA T
GTAAATTATGTAACAAAACAATGACTGGGTTTTTGTACTTTCAGCCTTAATCTTGGGTTTTTTTTTTTTTTTGGTTCCAAAAAACTAAGCTTTACCATTCAAGATGTAAA
CCTGTACACTGAC
GGTTTCATTCCCCCTGGCATATTGAAAAAGCTGTGTGGAACGTGGCGGTGCAGACATTTGGTGGGGCCA A
TAATTCAAATAAAAGT
GCACATGTAAGACATCCTACTCTGTGTGATTTTTCTGTTTGTGCTGAGTGAACTTGCTATGAAGTCTTTTAGTGCACTCTTTAATAAAAGTAGTCTTCCCTTAAAGTGTCC
CTTCCCTTATGGCCTTCACATTTCTCAACTAGCGCTTCAACTAGAAAGCACTTTAGGGACTGGGATGC
Chicken
TTGGCATGGCTTTATTTGTTTTTTCTTTTGGC
ACCGGACTGTTACCAACACCCACACCCCTGTGATGAAACAAAACCCATAAATGCGCATAAAACAAGACGAGA
GC
TTGACTCAGGATTAAAAAACTGGAATGGTGAAGGTGTCAGCAGCAGTCTTAAAATGAAACATGTTGGAGCGAACGCCCCCAAAGTTCTACAATG
CATCTGAGGACTTTGATTGTACATTTGTTTCTTTTTTAAT AGTCATTCCAAATATTGTTATAATGCATTGTTACAGGAAGTTACTCGCCTCTGTGAAGGCAACAGCCCA
GCTGGGAGGAGCCGGTACCAATTACTGGTGTTAGATGATAATTGCTTGTCTGTAAATTATGTAACCCAACAAGTGTCTTTTTGTATCTTCCGCCTTAAAAACAAAACAC
ACTTGATCCTTTTTGGTTTGTCAAGCAAGCGGGCTGTGTTCCCCAGTGATAGATGTGAATGAAGGCTTTACAGTCCCCCACAGTCTAGGAGTAAAGTGCCAGTATGTGGG
CCTGTACACTGAC
GGAGGGAGGGGCTA
TTAAGACCAGTTCAAATAAAAGTGCACACAATAGAGGCTTGACTGGTGTTGGTTTTTATTTCTGTGCTGCGC
TGCTTGGCCGTTGGTAGCTGTTCTCATCTAGCCTTGCCAGCCTGTGTGGGTCAGCTATCTGCATGGGCTGCGTGCTGGTGCTGTCTGGTGCAGAGGTTGGATAAACCGT
GATGATATTTCAGCAAGTGGGAGTTGGCTCTGATTCCATCCTGAGCTGCCATCAGTGTGTTCTGAAGGAAGCTGTTGGATGAGGGTGGGCTGAGTGCTGGGGGACAGCT
GGGCTCAGTGGGACTGCAGCTGTGCT
Human
TTGGCATGGCTTTATTTGTTTTTTTTGTTTTGTT
GCGGACTATGACTTAGTTGCGTTACACCCTTTCTTGACAAAACCTAACTTGCGCAGAAAACAAGATGAGA
TTGGTTTTTTTTTTTTTTTTGGC
TTGACTCAGGATTTAAAAACTGGAACGGTGAAGGTGACAGCAGTCGGTTGGAGCGAGCATCCCCCAAAGTTCA
CAATGTGGCCGAGGACTTTGATTGCATTGTTGTTTTTTTAATAGTCATTCCAAATATGAGATGCATTGTTACAGGAAGTCCCTTGCCATCCTAAAAGCCACCCCACTTC
TCTCTAAGGAGAATGGCCCAGTCCTCTCCCAAGTCCACACAGGGGAGGTGATAGCATTGCTTTCGTGTAAATTATGTAATGCAAAATTTTTTTAATCTTCGCCTTAATA
CTTTTTTATTTTGTTTTATTTTGAATGATGAGCCTTCGTGCCCCCCCTTCCCCCTTTTTGTCCCCCAACTTGAGATGTATGAAGGCTTTTGGTCTCCCTGGGAGTGGGTGG
CCTGTACACTGACTTGAGACCAGTTGAATAAAAGTGCACACCTTAAAAATGAGGCCAAGTGTGACTTTGTGGTGTGGCTGGGT
AGGCAGCCAGGGCTTA
TGGGGGCAGCAGAGGGTG
Parsimony score over 10 vertebrates: 0, 1, 2
42
Motifs Absent from Some Species
• Find motifs
– with small parsimony score
– that span a large part of the tree
• Example: in tree of 10 species spanning 760 Myrs, find
all motifs with
– score 0 spanning at least 250 Myrs
– score 1 spanning at least 350 Myrs
– score 2 spanning at least 450 Myrs
– score 3 spanning at least 550 Myrs
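The example thresholds amount to a small lookup from parsimony score to minimum span. A sketch, in which the function name `passes` and the candidate motifs are made up for illustration:

```python
# Minimum span (in Myrs) required for each parsimony score, as in the example above.
MIN_SPAN_MYRS = {0: 250, 1: 350, 2: 450, 3: 550}

def passes(score, span_myrs, thresholds=MIN_SPAN_MYRS):
    """True if a motif with this parsimony score spans enough of the tree to be reported."""
    return score in thresholds and span_myrs >= thresholds[score]

# Hypothetical candidates as (parsimony score, span in Myrs); not real results.
candidates = [(0, 300), (1, 300), (2, 760), (4, 760)]
print([c for c in candidates if passes(*c)])  # → [(0, 300), (2, 760)]
```

A motif with more mutations must therefore be conserved across a larger part of the tree before it is reported.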
43
Application to c-fos Gene
[Figure: phylogenetic tree of Puffer fish, Chicken, Pig, Mouse, Hamster, and Human, with branch lengths 10, 7, 2, 2, 1, 2, 2, 1, 0, 1]
Asked for motifs of length 10, with
0 mutations over tree of size 6
1 mutation over tree of size 11
2 mutations over tree of size 16
3 mutations over tree of size 21
4 mutations over tree of size 26
Found:
0 mutations over tree of size 8
1 mutation over tree of size 16
3 mutations over tree of size 21
4 mutations over tree of size 28
44
Application to c-fos Gene
Motif                 Score   Conserved in          Known?
CAGGTGCGAATGTTC       0       4 mammals
TTCCCGCCTCCCCTCCCC    0       4 mammals
GAGTTGGCTGcagcc       3       puffer + 4 mammals
GTTCCCGTCAATCcct      1       chicken + 4 mammals   yes
CACAGGATGTcc          4       all 6                 yes
AGGACATCTG            1       chicken + 4 mammals   yes
GTCAGCAGGTTTCCACG     0       4 mammals             yes
TACTCCAACCGC          0       4 mammals             yes
metK in B. subtilis
45
MicroFootPrinter
• Neph & Tompa, 2006
• Designed specifically for phylogenetic
footprinting in prokaryotic genomes
• Front end to FootPrinter
• Available at
bio.cs.washington.edu/software.html
47
Microbial Footprinting
• 1454 prokaryotes with genomes completely
sequenced (as of 2/17/2011)
– For any prokaryotic gene of interest, plenty of close genes
in other species available
– Relatively simple genomes
• MicroFootPrinter
– undergraduate Computational Biology Capstone project
– Goal: simple interface for microbiologists
– User specifies species and gene of interest
– Automates collection of orthologous genes, cis-regulatory
sequences, gene tree, parameters
48
Demo
• MicroFootPrinter home
• Examples: Agrobacterium tumefaciens
genes regulated by ChvI (with Eugene Nester)
– chvI (two component response regulator)
– ropB (outer membrane protein)
49
Sample chvI motif
Parsimony score: 2
Span: 41.10
Significance score: 4.22
Species             Position   Motif instance
B. henselae         -151       GCTACAATTT
R. etli             -90        GCCACAATTT
R. leguminosarum    -106       GCCACAATTT
S. meliloti         -119       GCCACAATTT
S. medicae          -118       GCCACAATTT
A. tumefaciens      -105       GCCACAATTT
M. loti             -80        GCCACATTTT
M. sp.              -87        GCCACATTTT
O. anthropi         -158       GCCACATTTT
B. suis             -38        GCCACATTTT
B. melitensis       -156       GCCACATTTT
B. abortus          -156       GCCACATTTT
B. ovis             -156       GCCACATTTT
B. canis            -38        GCCACATTTT
50
Sample ropB motif
Parsimony score: 1
Span: 20.70
Significance score: 1.34
Species             Position   Motif instance
Jannaschia sp.      -151       CACATTTTGG
R. etli             -134       CACAATTTGG
R. leguminosarum    -135       CACAATTTGG
A. tumefaciens      -131       CACATTTTGG
S. meliloti         -128       CACATTTTGG
S. medicae          -128       CACATTTTGG
51
Combined ChvI Motif
ropB:          CACATTTTGG
chvI:        GCCACAATTT
Atu1221:   TTGTCACAAT
ultimate:    GYCACAWTTTGG
Y={C,T}
W={A,T}
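To see how the three sites fit the degenerate motif, the sketch below slides each site along the combined motif and reports the offsets at which every overlapping column is compatible with the codes Y and W. The function name `compatible_offsets` and the minimum overlap of 8 columns are illustrative assumptions.

```python
# Residues allowed by each symbol of the combined motif (Y and W as defined above).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "CT", "W": "AT"}

def compatible_offsets(consensus, site, min_overlap=8):
    """Offsets of site[0] relative to consensus[0] at which all overlapping columns agree."""
    offsets = []
    for o in range(min_overlap - len(site), len(consensus) - min_overlap + 1):
        lo, hi = max(0, o), min(len(consensus), o + len(site))
        if hi - lo < min_overlap:
            continue
        # Column c of the consensus faces residue site[c - o].
        if all(site[c - o] in IUPAC[consensus[c]] for c in range(lo, hi)):
            offsets.append(o)
    return offsets

consensus = "GYCACAWTTTGG"
for name, site in [("ropB", "CACATTTTGG"), ("chvI", "GCCACAATTT"), ("Atu1221", "TTGTCACAAT")]:
    print(name, compatible_offsets(consensus, site))
# ropB [2], chvI [0], Atu1221 [-2]
```

A negative offset means the site begins before the first column of the combined motif, as for Atu1221.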
52