Topological Pressure of Finite Sequences
David Koslicki
Penn State University
July 2011
Joint work with Dan Thompson
Topological Entropy of Finite Sequences
  Topological Entropy
  Topological Pressure
Application to DNA Sequences
  Introns and Exons
  Gene Distribution Detection
  Equilibrium Measures
Symbolic Dynamics

Hadamard (19th century): studying geodesics.
Coding smooth transformations $T : X \to X$, e.g. Smale's Horseshoe Map.
Partition the space $X$ into regions labeled $A$, $B$, $C$, $D$ and record the itinerary of a point under $T$: if a point lying in $C$ is mapped by $T$ into $D$, then "CD" is an acceptable word.
Symbolic Dynamics

Alphabet: $\mathcal{A} = \{a_1, \ldots, a_d\}$
One-sided infinite sequences: $\mathcal{A}^{\mathbb{N}}$
Shift map: $\sigma : \mathcal{A}^{\mathbb{N}} \to \mathcal{A}^{\mathbb{N}}$, $\sigma((a_i)_{i \in \mathbb{N}}) = (a_{i+1})_{i \in \mathbb{N}}$
Shift-invariant space: $X \subseteq \mathcal{A}^{\mathbb{N}}$ with $\sigma(X) \subseteq X$
Symbolic dynamical system: $(X, \sigma)$
Complexity function: for $w \in X \subseteq \mathcal{A}^{\mathbb{N}}$,
    $p_w(n) = \#\{u : |u| = n \text{ and } u \text{ appears as a subword of } w\}$
(the number of $n$-length subwords of $w$)
Topological entropy of a sequence:
    $H_{\mathrm{top}}(w) = \lim_{n \to \infty} \frac{\log p_w(n)}{n}$
(the exponential growth rate of the number of $n$-length subwords as $n$ tends to infinity)
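The complexity function is directly computable for concrete strings. A minimal Python sketch (the function name is my own):

```python
def subword_complexity(w, n):
    """p_w(n): the number of distinct length-n subwords of the string w."""
    return len({w[i:i + n] for i in range(len(w) - n + 1)})

# A prefix of the square-billiard cutting sequence used below:
w = "abababaabababaababababaabababaababababaa"
print(subword_complexity(w, 1))  # 2 -- the subwords a and b
print(subword_complexity(w, 2))  # 3 -- ab, ba, aa (consistent with p_w(n) = n + 1)
```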
Traditional Topological Entropy

Example 1: Cutting Sequence of the Square Billiard
    $w = abababaabababaababababaabababaababababaa\ldots$
$w$ encodes the trajectory.
    $p_w(n) = n + 1$ (a Sturmian sequence)
    $H_{\mathrm{top}}(w) = 0$

Example 2: Full Shift
    $v = baaaaabbbaabaababbabbbaababaababbaaaaaaa\ldots$
    $p_v(n) = 2^n$
    $H_{\mathrm{top}}(v) = 1$
Salient Properties:

1. $0 \le H_{\mathrm{top}}(w) \le 1$
2. $H_{\mathrm{top}}(w) \approx 0$ iff $w$ contains few subwords (simple)
3. $H_{\mathrm{top}}(w) \approx 1$ iff $w$ contains many subwords (complex)
4. For different $v$ and $w$, $H_{\mathrm{top}}(v)$ and $H_{\mathrm{top}}(w)$ are comparable

Issues with finite sequences:

For a finite sequence, $p_w(n)$ is eventually zero, so
    $\lim_{n \to \infty} \frac{\log p_w(n)}{n} = 0$
So we need to evaluate at a single $n$...
Adaptation to Finite Sequences

Shape of the complexity function:
For a finite sequence $w$ of length $N$, there are integers $m$ and $M$ such that the complexity function $p_w(n)$ is strictly increasing on the interval $[0, m]$, non-decreasing on $[m, M]$, and strictly decreasing on $[M, N]$ (in fact, $p_w(n+1) - p_w(n) = -1$ there).

Definition of Topological Entropy for Finite Sequences:
For $w$ a finite sequence over the alphabet $\mathcal{A}$ (with $|\mathcal{A}| = d$), let $n$ be the unique integer such that
    $d^n + n - 1 \le |w| < d^{n+1} + (n+1) - 1.$
Then we define:
    $H_{\mathrm{top}}(w) := \frac{1}{n} \log_d\left(p_{w_1^{d^n + n - 1}}(n)\right)$
where $w_1^{d^n + n - 1}$ denotes the prefix of $w$ of length $d^n + n - 1$.
Adaptation to Finite Sequences

Lemmas:
A sequence $w$ of length $d^n + n - 1$ over an alphabet with $d$ letters can contain at most $d^n$ subwords of length $n$. Conversely, for a sequence $w$ to contain $d^n$ subwords of length $n$, it must have length at least $d^n + n - 1$.

Properties:
1. $0 \le H_{\mathrm{top}}(w) \le 1$
2. $H_{\mathrm{top}}(w) \approx 0$ iff $w$ contains few subwords (simple)
3. $H_{\mathrm{top}}(w) \approx 1$ iff $w$ contains many subwords (complex)
4. For different $v$ and $w$, $H_{\mathrm{top}}(v)$ and $H_{\mathrm{top}}(w)$ are comparable
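The lemmas' bounds are attained by de Bruijn sequences, which realize all $d^n$ subwords of length $n$ at exactly the minimal length $d^n + n - 1$. A quick Python check (the helper name is my own):

```python
def subword_complexity(w, n):
    """Number of distinct length-n subwords of w."""
    return len({w[i:i + n] for i in range(len(w) - n + 1)})

# A binary de Bruijn sequence of order 3, linearized by appending its
# first n - 1 = 2 symbols: it has length 2^3 + 3 - 1 = 10 and contains
# every one of the 2^3 = 8 binary words of length 3.
w = "0001011100"
assert len(w) == 2 ** 3 + 3 - 1
assert subword_complexity(w, 3) == 2 ** 3
```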
Examples

$w = CTTCCTCAAGTCTCAACCGGTT$
$|w| = 22$
$17 = 4^2 + 2 - 1 \le |w| < 4^3 + 3 - 1 = 66$, so $n = 2$
$w_1^{4^2 + 2 - 1} = w_1^{17} = CTTCCTCAAGTCTCAAC$
Length-2 subwords: $\{CT, TT, TC, CC, CA, AA, AG, GT, AC\}$, so $p_{w_1^{17}}(2) = 9$
$H_{\mathrm{top}}(w) = \frac{1}{2} \log_4\left(p_{w_1^{17}}(2)\right) = \frac{\log_4(9)}{2} \approx 0.792481$
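The worked example above can be reproduced mechanically. A Python sketch of the definition (function names are my own):

```python
import math

def subword_complexity(w, n):
    """p_w(n): the number of distinct length-n subwords of w."""
    return len({w[i:i + n] for i in range(len(w) - n + 1)})

def finite_topological_entropy(w, d):
    """Topological entropy of a finite sequence w over a d-letter alphabet.

    Finds the unique n with d^n + n - 1 <= |w| < d^(n+1) + (n+1) - 1
    (assumes |w| >= d), truncates w to its first d^n + n - 1 symbols,
    and returns (1/n) log_d of the prefix's number of n-subwords.
    """
    n = 1
    while len(w) >= d ** (n + 1) + (n + 1) - 1:
        n += 1
    prefix = w[: d ** n + n - 1]
    return math.log(subword_complexity(prefix, n), d) / n

print(round(finite_topological_entropy("CTTCCTCAAGTCTCAACCGGTT", 4), 6))  # 0.792481
```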
Topological Pressure

Intuition: Topological pressure captures the diversity of subwords.
Topological pressure weights certain subwords as more/less important.
Characterize "importance" through a function called the potential.

Function that depends on m symbols:
We say that a function $\psi$ depends on $m$ symbols if:
For all $v \in \mathcal{A}^m$, the restriction of $\psi$ to the cylinder $[v]$ is a constant function.
There exists $w \in \mathcal{A}^{m-1}$ such that the restriction of $\psi$ to $[w]$ is not a constant function.

Example:
    $\psi(w) = \mathbb{1}\{w_1^3 = a\}$ for a fixed word $a$ of length 3
    $\psi(w) = \frac{1}{n} \sum_i \mathbb{1}\{w_1^n = a_i\}$ for fixed words $a_i$ of length $n$

Subwords:
    $SW_n(w) = \{u : |u| = n \text{ and } u \subseteq w\}$
Topological Pressure

Definition of Topological Pressure for finite sequences:
For a word $w$ such that $|w| = |\mathcal{A}|^n + n - 1$ and a function $\psi$ which depends on $m$ symbols, with $n \ge m$, the topological pressure is:
    $P(w, \psi) = \frac{1}{n} \log_{|\mathcal{A}|} \sum_{u \in SW_n(w)} \exp\left(\sum_{i=0}^{n-m} \psi(\sigma^i u)\right)$
For a $w$ with $|\mathcal{A}|^n + n - 1 \le |w| < |\mathcal{A}|^{n+1} + (n+1) - 1$, let:
    $P(w, \psi) = P\left(w_1^{|\mathcal{A}|^n + n - 1}, \psi\right)$
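For potentials of the form $\psi = \log \varphi$ the pressure is directly computable, with $\varphi$ given as a table of positive weights on $m$-letter words. A Python sketch (function names and the dict representation of $\varphi$ are my own; the inner exp-of-sums is rewritten as a product of weights):

```python
import math

def subwords(w, n):
    """SW_n(w): the set of distinct length-n subwords of w."""
    return {w[i:i + n] for i in range(len(w) - n + 1)}

def pressure(w, phi, m, d):
    """Topological pressure P(w, log phi) of a finite word w over a
    d-letter alphabet, where phi maps each m-letter word to a positive
    weight (so psi = log phi depends on m symbols, and n >= m).

    Finds n with d^n + n - 1 <= |w| < d^(n+1) + (n+1) - 1, truncates w
    to its first d^n + n - 1 symbols, and evaluates
        (1/n) log_d sum_{u in SW_n} prod_{i=0}^{n-m} phi(u[i:i+m]).
    """
    n = 1
    while len(w) >= d ** (n + 1) + (n + 1) - 1:
        n += 1
    prefix = w[: d ** n + n - 1]
    total = sum(
        math.prod(phi[u[i:i + m]] for i in range(n - m + 1))
        for u in subwords(prefix, n)
    )
    return math.log(total, d) / n
```

With the constant potential $\varphi \equiv 1$ every product is 1 and the pressure reduces to the topological entropy of the finite sequence.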
Topological Pressure

Variational Principle:
    $\lim_{n \to \infty} \max_{w : |w| = n} P(w, \psi) = \sup_\mu \left\{ h_\mu + \int \psi \, d\mu \right\}$
where $\mu$ is a shift-invariant probability measure on $\mathcal{A}^{\mathbb{N}}$ and $h_\mu$ is the metric entropy of $\mu$.

Miscellanea:
For $\psi = \log \varphi$ with $\varphi > 0$ a function that depends on $m$ symbols, we have
    $P(w, \psi) = \frac{1}{n} \log_{|\mathcal{A}|} \sum_{u \in SW_n(w)} \prod_{i=0}^{n-m} \varphi(\sigma^i u)$

Scaling of a potential:
    $P(w, \log(t\varphi)) = \frac{n - m + 1}{n} \log_{|\mathcal{A}|}(t) + P(w, \log \varphi)$
Application to DNA Sequences

DNA:
Alphabet: $\{A, C, T, G\}$
Human genome: 3,080,000,000 bp
Only 2% codes for genes (gene lengths range from about 3000 bp to 2.4 million bp)
Application to DNA Sequences

Introns, Exons, and Genes:
Exons: translated to a protein (gene)
Introns: junk?
Application to DNA Sequences

Application to Introns and Exons (D. Koslicki, Bioinformatics, 2011):
How well does topological entropy distinguish between introns and exons?
Is the entropy of introns higher or lower than that of exons?

[Figure: LC values (roughly 0.97 to 1.00) for introns vs. exons, plotted per human chromosome (Chr 1-22, X, Y)]
[Figure: $H_{\mathrm{top}}$ values (roughly 0.87 to 0.94) for introns vs. exons, plotted per human chromosome (Chr 1-22, X, Y)]
Gene Distribution Detection

Gene Detection Simplified:
1. Use all 6 open reading frames; look for "start" and "stop" codons.
2. Use known genes with pattern matching and neural networks.
Must have previously known genes.
Not very good at finding novel genes.

Gene Distribution:
Given a novel genome, estimate the distribution of genes (or coding sequences).

[Figure: Ensembl genome browser]
Gene Distribution Detection
Potentials:
Restrict attention to ψ = log φ which depend on 3 symbols.
Codons, Proteins…
Problem:
Which potential to use?
When is a sequence “important”?
Procedure:
Obtain a potential by maximizing the correlation of P(ω, log φ)
with a given set of biological data.
Analyze biological relevance
Compare to other possible potentials
Intuition:
View a DNA sequence as concatenations of symbolic dynamical
systems, and then use pressure to differentiate between the systems.
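The sliding-window quantity P(ω, log φ) can be sketched in code. This is one plausible reading only, under stated assumptions: each distinct n-word observed in the window contributes the exponential of the Birkhoff sum of log φ over its 3-symbol subwords, and the log of the total is normalized by n. The talk's exact definition and normalization may differ.

```python
import math
from itertools import product

def pressure(w, phi, n):
    """Assumed finite-sequence pressure: log of the phi-weighted
    count of distinct n-words observed in w, divided by n."""
    observed = {w[i:i + n] for i in range(len(w) - n + 1)}
    total = 0.0
    for u in observed:
        # Birkhoff sum of log(phi) over the 3-symbol windows of u
        birkhoff = sum(math.log(phi[u[i:i + 3]]) for i in range(n - 2))
        total += math.exp(birkhoff)
    return math.log(total) / n

# Sanity check: a uniform potential reduces pressure to the
# (log of the) distinct n-word count, i.e. an entropy-like quantity.
phi = {"".join(t): 1.0 for t in product("ACGT", repeat=3)}
print(pressure("ACGTACGTAACCGGTT", phi, 4))
```

With a non-uniform φ, windows rich in high-φ triples (e.g. codon-like patterns) score higher, which is what lets pressure separate the concatenated systems.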
Gene Distribution Detection
Sliding Window Output:
Chromosome 5 (180,915,260 bp), window size 65,543
[Plot: topological pressure (≈60–90) vs. window location (500–2500)]
Gene Distribution Detection
Smoothing:
Convolution with a Gaussian kernel
Moving average, Gaussian filter, etc. all give similar results
Radius selection based on Gaussian kernel density estimation with
bandwidth selected according to Silverman’s rule
[Plot: smoothed topological pressure (≈76.0–77.5) vs. window location (500–2500), window size 65,543 on chr5]
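The smoothing step can be sketched as below. Assumptions not fixed by the talk: how Silverman's rule-of-thumb bandwidth translates into a kernel radius in window units is unspecified here, so the demo passes an explicitly assumed bandwidth; the 3σ truncation is also a choice made for the example.

```python
import numpy as np

def silverman_bandwidth(x):
    """Silverman's rule of thumb: 0.9 * min(sd, IQR/1.34) * n^(-1/5)."""
    iqr = np.subtract(*np.percentile(x, [75, 25]))
    return 0.9 * min(np.std(x, ddof=1), iqr / 1.34) * len(x) ** (-0.2)

def gaussian_smooth(x, bandwidth):
    """Convolve x with a truncated, normalized Gaussian kernel."""
    radius = max(1, int(3 * bandwidth))        # cut the kernel at 3 sigma
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (t / bandwidth) ** 2)
    kernel /= kernel.sum()
    return np.convolve(x, kernel, mode="same")

# Synthetic stand-in for a noisy pressure profile
rng = np.random.default_rng(0)
raw = np.sin(np.linspace(0, 6, 300)) + 0.3 * rng.standard_normal(300)
smoothed = gaussian_smooth(raw, bandwidth=5.0)  # assumed value, in windows
```

As the slide notes, a moving average or any similar low-pass filter gives comparable results; only the kernel shape changes.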
Gene Distribution Detection
Correlation between pressure and CDS distribution:
Hg18, March 2006 assembly; window size 65,543
Correlation: 0.979481
[Plot: standardized topological pressure and known coding sequence density vs. window location (500–2500)]
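The comparison in the plot above amounts to standardizing both series and computing their Pearson correlation; a minimal sketch, with synthetic stand-ins for the pressure and CDS-density profiles:

```python
import numpy as np

def standardize(x):
    """Zero-mean, unit-variance version of x (as plotted on the slide)."""
    return (x - x.mean()) / x.std()

def pearson(x, y):
    """Pearson correlation as the mean product of z-scores."""
    return float(np.mean(standardize(x) * standardize(y)))

windows = np.linspace(0, 2 * np.pi, 500)
pressure = np.sin(windows)                              # stand-in profile
cds_density = np.sin(windows) + 0.1 * np.cos(3 * windows)  # correlated stand-in
print(round(pearson(pressure, cds_density), 3))
```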
Gene Distribution Detection
Comparative Analysis:
24 potentials with 64 variables each
Similar parameter values (pairwise Euclidean distance):
Max: 0.252 (chrY, chr7)
Mean: 0.1815
Expected: 1.41421 (unit square)
[Heat map: 24×24 pairwise Euclidean distance matrix]
Median Parameter Values:
[Table: median parameter values arranged on the standard codon table, 1st base × 2nd base]
             2nd base U    2nd base C    2nd base A    2nd base G
1st base U:  UUU (Phe)     UCU (Ser)     UAU (Tyr)     UGU (Cys)
             UUC (Phe)     UCC (Ser)     UAC (Tyr)     UGC (Cys)
             UUA (Leu)     UCA (Ser)     UAA Stop      UGA Stop
             UUG (Leu)     UCG (Ser)     UAG Stop      UGG (Trp)
1st base C:  CUU (Leu)     CCU (Pro)     CAU (His)     CGU (Arg)
             CUC (Leu)     CCC (Pro)     CAC (His)     CGC (Arg)
             CUA (Leu)     CCA (Pro)     CAA (Gln)     CGA (Arg)
             CUG (Leu)     CCG (Pro)     CAG (Gln)     CGG (Arg)
1st base A:  AUU (Ile)     ACU (Thr)     AAU (Asn)     AGU (Ser)
             AUC (Ile)     ACC (Thr)     AAC (Asn)     AGC (Ser)
             AUA (Ile)     ACA (Thr)     AAA (Lys)     AGA (Arg)
             AUG (Met)     ACG (Thr)     AAG (Lys)     AGG (Arg)
1st base G:  GUU (Val)     GCU (Ala)     GAU (Asp)     GGU (Gly)
             GUC (Val)     GCC (Ala)     GAC (Asp)     GGC (Gly)
             GUA (Val)     GCA (Ala)     GAA (Glu)     GGA (Gly)
             GUG (Val)     GCG (Ala)     GAG (Glu)     GGG (Gly)
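The comparative analysis above can be sketched as follows: one 64-parameter potential per chromosome, summarized by pairwise Euclidean distances. The parameter vectors here are random stand-ins, not the fitted potentials from the talk.

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in for 24 fitted potentials, each with 64 codon parameters
potentials = rng.uniform(0.4, 0.6, size=(24, 64))

# Broadcast to a 24 x 24 matrix of pairwise Euclidean distances
diffs = potentials[:, None, :] - potentials[None, :, :]
dist = np.sqrt((diffs ** 2).sum(axis=-1))

# Summaries analogous to the slide's max / mean statistics
upper = dist[np.triu_indices(24, k=1)]
print(round(float(upper.max()), 3), round(float(upper.mean()), 3))
```

Small max and mean distances relative to the expected distance between unrelated vectors are what justify the "similar parameter values" claim.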
Gene Distribution Detection
Compared to Mouse Genome:
Use the median potential φ obtained before
Compare the associated sliding window P(ω, log φ)
to the mouse genome (chromosome 1)
Correlation: 0.827
[Plot: standardized topological pressure and known mm coding sequence density vs. window location (500–2500)]
Topological Entropy of Finite Sequences
Topological Entropy
Topological Pressure
Application to DNA Sequences
Introns and Exons
Gene Distribution Detection
Equilibrium Measures
Measure of Coding Potential
Measure from parameter values:
A potential that depends on the first m letters gives rise
to a Markov measure μ of memory m−1.
μ is the equilibrium measure for log φ:
lim_{n→∞} max_{ω : |ω| = 4^n + n−1} P(ω, log φ) = sup_ν { h_ν + ∫ log φ dν } = h_μ + ∫ log φ dμ
Conceptually: μ favors sequences similarly to log φ while still respecting entropy constraints.
Rigorously: μ is a Gibbs measure (in the sense of Bowen):
∃ M > 0 such that M⁻¹ ≤ μ([ω]) / exp{ −n P(log φ) + Σ_{i=0}^{n−3} log φ(σⁱω) } ≤ M
So,
μ([ω]) ≈ ∏_{i=0}^{n−3} φ(σⁱω)
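The Gibbs approximation μ([ω]) ≈ ∏ φ(σⁱω) is easy to sketch: score a word by the product of potential values over its 3-letter windows. The potential below is purely hypothetical (it rewards G-rich triples, just for illustration); a real φ would come from the correlation-maximization procedure described earlier.

```python
import math
from itertools import product

def gibbs_score(w, phi):
    """Approximate mu([w]) as the product of phi over 3-letter windows."""
    return math.prod(phi[w[i:i + 3]] for i in range(len(w) - 2))

# Hypothetical potential: base weight 0.5 plus 0.25 per G in the triple
phi = {"".join(t): 0.5 + 0.25 * "".join(t).count("G")
       for t in product("ACGT", repeat=3)}

print(gibbs_score("GGG", phi) > gibbs_score("AAA", phi))  # True
```

Under this μ, words built from high-φ triples get exponentially more mass, which is what makes the measure usable as a per-sequence coding-potential score.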
Utilizing the Measure μ
Single sequence measure of coding potential:
Given a single (short) sequence, measure the likelihood that said sequence is
(part of) a coding sequence.
Comparative measures have enjoyed success (RNAcode; Washietl et al. 2011)
Lin, Jungreis, Kellis (Bioinformatics, 2011):
“Measures of coding potential based on primary
sequence composition are still lacking”
Current Work:
Evaluating the usefulness of the measure μ in determining the coding potential of short sequences
Summary
Topological Pressure:
Leads to new biological insight
Relevant thermodynamic properties hold
Accurately approximates the coding sequence density distribution
A clear and comprehensible application of thermodynamic concepts
Symbolic dynamics is fruitful when applied to biology
Thank you