
A T-Entropy analysis of integrated volcanic data
Nunnari G. (1), Cannavò F. (1), Spata A. (1)
1) Dipartimento di Ingegneria Elettrica Elettronica e dei Sistemi – Univ. di Catania
2) Istituto Nazionale di Geofisica e Vulcanologia – Sezione di Catania
Introduction
[Figure: time series of the clinometric radial component, the clinometric tangential component, and the magnetic data]
Time Series → Symbolic Coding
Tilt: 0100110100101…
Magnetic: 2233233232223…
Tremor: 445544544545…
Integrated complex information:
0100-2233-4455-1101-2332-4454-...
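As a sketch of this coding step (the bin counts, symbol digits, and 4-symbol block interleaving are illustrative assumptions, not details given in the slides), each series can be quantized into equal-width amplitude bins and the resulting symbol streams merged block by block:

```python
def symbolize(series, n_bins):
    """Quantize a numeric series into single-digit symbols 0..n_bins-1
    using equal-width amplitude bins (illustrative choice)."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / n_bins or 1.0          # avoid zero width for a flat series
    return "".join(str(min(int((v - lo) / width), n_bins - 1)) for v in series)

def interleave(streams, block=4):
    """Merge symbol streams into one 'complex information' string by
    alternating fixed-length blocks, as in 0100-2233-4455-..."""
    n = min(len(s) for s in streams) // block
    parts = [s[i * block:(i + 1) * block] for i in range(n) for s in streams]
    return "-".join(parts)
```

With three streams, `interleave([tilt, magnetic, tremor])` produces a string shaped like the 0100-2233-4455-1101-2332-4454-... example above.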
T-Complexity
T-Information
T-Entropy
T-Code Self-Synchronisation
One of the distinguishing properties of T-Codes is their strong tendency
towards self-synchronisation.
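A small illustration of this property (the code set and the bit error are invented for the demonstration): decoding a corrupted stream over the T-code set S^(1)_(0) = {1, 01, 00} realigns with the uncorrupted decoding after a few codewords.

```python
def decode(stream, codeset):
    """Greedy left-to-right parse of a binary string into prefix-free codewords."""
    out, i = [], 0
    while i < len(stream):
        w = next((w for w in codeset if stream.startswith(w, i)), None)
        if w is None:                       # undecodable remainder
            break
        out.append(w)
        i += len(w)
    return out

# S^(1)_(0) = {1, 01, 00}: the T-augmentation of {0, 1} with T-prefix 0 and k = 1
S = ["1", "01", "00"]

clean   = "010010110001"                    # decodes as 01.00.1.01.1.00.01
corrupt = "1" + clean[1:]                   # first bit flipped in transmission
```

Here `decode(corrupt, S)` yields 1.1.00.1.01.1.00.01: after two spurious codewords the parse coincides with the clean decoding again, which is the self-synchronisation effect.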
Time Series → Symbolic Coding
Tilt: 0100100101…
Magnetic: 0101010101…
Tremor: 1001001011…
T-codes: Synch?
Notation
Let A = {a_1, a_2, a_3, ..., a_(#A-1), a_(#A)} be a finite alphabet, where a_i is called a symbol or character.
We use A* to denote the set of all finite strings that can be generated by concatenation of characters from A.
Let λ denote the empty string, and let A+ = A* \ {λ}.
For x, y ∈ A*, we denote the concatenation of x and y as xy.
We use x^k to denote the concatenation of k copies of x, such that x^0 = λ.
The length of x is denoted |x|.
T-Codes
• A finite code set S is a T-code set if:
1) S is an alphabet, or
2) S can be derived from a T-code set via a process known as T-augmentation.
The T-augmentation S^(k)_(p) of a code set S is defined as follows:

S^(k)_(p) = { x | x = p^k' y, where 0 ≤ k' ≤ k and y ∈ S \ {p} } ∪ { p^(k+1) }

where p ∈ S and k ∈ ℕ.
We say C' can be derived from C if ∃ p ∈ C and k ∈ ℕ such that C' = C^(k)_(p).
We call p the T-prefix and k the T-expansion parameter of that T-augmentation.
A series of m successive T-augmentations of an alphabet S is denoted S^(k1, k2, ..., km)_(p1, p2, ..., pm).
T-Codes are constructed with no regard to symbol probabilities. Their construction
focuses instead on a recursive tree structure.
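The T-augmentation step above can be transcribed almost literally into code (a sketch, representing code sets as Python sets of strings):

```python
def t_augment(S, p, k):
    """T-augmentation S^(k)_(p): { p^k' y : y in S, y != p, 0 <= k' <= k } plus { p^(k+1) }."""
    assert p in S and k >= 1                # p must be a codeword, k a positive integer
    return {p * i + y for i in range(k + 1) for y in S if y != p} | {p * (k + 1)}
```

For example, `t_augment({"0", "1"}, "0", 1)` yields `{"1", "01", "00"}`; each step grows the set to (k + 1)(|S| − 1) + 1 codewords, which is the recursive tree structure the construction relies on.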
T-Augmentation
The Significance of the Longest Codewords
The number of longest codewords equals the cardinality of the alphabet.
Given an arbitrary finite string over an arbitrary finite alphabet, it is always possible
to find a T-Code set for which this string is one of its longest codewords.
This set is unique, i.e., there is no other T-Code set for which the same string is
also one of the longest codewords.
This duality between strings and T-Code sets permits us to think of the T-Code set
construction algorithm not only as a code construction algorithm, but also as a
string construction (production) algorithm. The T-augmentations are the steps in
this algorithm.
How can an existing string be parsed to yield the associated T-Code set?
T-Decomposition
Suppose that, for a given string x and a letter a from the alphabet S, we want
to find the T-Code set for which xa is one of the longest codewords.
1. Set m = 0.
2. Decode xa as a string of codewords from S^(k1, ..., km)_(p1, ..., pm).
3. If xa decodes into a single codeword from S^(k1, ..., km)_(p1, ..., pm), set n = m and finish.
4. Otherwise, set the T-prefix p_(m+1) to be the second-to-last codeword in the decoding over S^(k1, ..., km)_(p1, ..., pm).
5. Count the number of adjacent copies of p_(m+1) that immediately precede the second-to-last codeword. Add 1 to this number, and define it to be the T-expansion parameter k_(m+1).
6. T-augment with p_(m+1) and k_(m+1).
7. Increment m by 1 and go to step 2 above.
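Steps 1–7 translate directly into code; a sketch assuming a binary starting alphabet S = {0, 1} and that xa parses cleanly at every stage (true when xa is a longest codeword of some T-Code set):

```python
def parse(s, S):
    """Greedy left-to-right parse into codewords of the prefix-free set S."""
    out, i = [], 0
    while i < len(s):
        w = next(w for w in S if s.startswith(w, i))    # unique match: S is prefix-free
        out.append(w)
        i += len(w)
    return out

def t_decompose(xa):
    """Return the T-prefixes p_i and T-expansion parameters k_i of xa."""
    S = {"0", "1"}                          # step 1 (m = 0): start from the alphabet
    ps, ks = [], []
    while True:
        words = parse(xa, S)                # step 2: decode over the current set
        if len(words) == 1:                 # step 3: single codeword -> finished
            return ps, ks
        p = words[-2]                       # step 4: second-to-last codeword
        k, j = 1, len(words) - 3            # step 5: adjacent copies of p before it, plus 1
        while j >= 0 and words[j] == p:
            k += 1
            j -= 1
        ps.append(p)
        ks.append(k)
        # step 6: T-augment with (p, k); step 7: loop back to step 2
        S = {p * i + y for i in range(k + 1) for y in S if y != p} | {p * (k + 1)}
```

For the string 0110001010100 this returns the prefixes ["0", "01", "00", "011"] with parameters [1, 3, 1, 1].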
Example:
Let x = 011000101010 and a = 0.
Let xa = 0110001010100 be the longest codeword in some T-Code set.
Decoded over S = {0, 1}, we obtain
xa = 0.1.1.0.0.0.1.0.1.0.1.0.0.
from which we identify p1 = 0 and k1 = 1. Decoded over S^(1)_(0), we obtain
xa = 01.1.00.01.01.01.00.
i.e., p2 = 01 and k2 = 3.
Hence, decoded over S^(1,3)_(0,01), we get
xa = 011.00.01010100.
such that p3 = 00 and k3 = 1. A final decoding over S^(1,3,1)_(0,01,00) gives
xa = 011.0001010100.
so p4 = 011 with k4 = 1, after which xa decodes as a single codeword and the
decomposition terminates with n = 4.
T-Complexity
When Lempel and Ziv proposed their production complexity, they recognised
that the number of parsing steps would give a meaningful measure of string
complexity.
Titchener pursued a similar thought and proposed a "T-complexity" measure as follows:

C_T(xa) = Σ_{i=1..n} log2(k_i + 1)

where the k_i are the T-expansion parameters found in the decomposition of xa.
The units of C_T(xa) are effective T-augmentation steps, or taugs.
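In code, the measure is a one-liner over the expansion parameters recovered by T-decomposition:

```python
from math import log2

def t_complexity(ks):
    """C_T(xa) = sum over i of log2(k_i + 1), in taugs."""
    return sum(log2(k + 1) for k in ks)
```

For the decomposition example with k = (1, 3, 1, 1), this gives C_T = 1 + 2 + 1 + 1 = 5 taugs.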
Lower Bound
C_T(xa) ≳ ln n
Upper Bound
C_T(xa) ≲ ln 2 · li(n ln(#S))
where li(z) = ∫_0^z du / ln u is the logarithmic integral function.
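The logarithmic integral can be evaluated without external libraries through the identity li(z) = Ei(ln z) and the power series for the exponential integral Ei (a sketch, valid for z > 1; the 80-term cutoff is an illustrative choice):

```python
from math import log

EULER_GAMMA = 0.5772156649015329

def li(z, terms=80):
    """li(z) = Ei(ln z) = gamma + ln(ln z) + sum_{k>=1} (ln z)^k / (k * k!), for z > 1."""
    x = log(z)
    s = EULER_GAMMA + log(x)
    term = 1.0
    for k in range(1, terms + 1):
        term *= x / k                       # term is now x^k / k!
        s += term / k
    return s
```

For example, li(2) ≈ 1.0452.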
T-Information and T-Entropy
The T-information I T (xa ) of the string xa is defined as the inverse logarithmic
integral of the T-complexity divided by a scaling constant ln 2:
I_T(xa) = li^(-1)( C_T(xa) / ln 2 )

The Napierian (natural) logarithm implicitly gives the T-information the units of nats.
The average T-information rate per symbol, referred to here as the average
T-entropy of xa and denoted by h_T(xa), is defined simply:

h_T(xa) = I_T(xa) / n
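Putting the definitions together (a sketch: li is computed from the Ei power series, and li^(-1) is obtained by bisection, which is valid because li is increasing for z > 1; the search bracket up to 1e12 is an illustrative assumption):

```python
from math import log

def li(z, terms=80):
    """li(z) = Ei(ln z), via the power series for Ei (valid for z > 1)."""
    x, s, term = log(z), 0.5772156649015329 + log(log(z)), 1.0
    for k in range(1, terms + 1):
        term *= x / k                       # term is now x^k / k!
        s += term / k
    return s

def t_information(ct):
    """I_T(xa) = li^(-1)(C_T(xa) / ln 2), in nats, found by bisection."""
    target, lo, hi = ct / log(2), 1.0 + 1e-9, 1e12
    for _ in range(200):
        mid = (lo + hi) / 2
        if li(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def t_entropy(ct, n):
    """Average T-entropy h_T(xa) = I_T(xa) / n, in nats per symbol."""
    return t_information(ct) / n
```

For the worked example string (C_T = 5 taugs, length n = 13) this gives roughly I_T ≈ 12.5 nats and h_T ≈ 0.96 nats per symbol.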