A T-Entropy analysis of integrated volcanic data Nunnari G. (1), Cannavò F. (1), Spata A. (1) 1) Dipartimento di Ingegneria Elettrica Elettronica e dei Sistemi – Univ. di Catania 2) Istituto Nazionale di Geofisica e Vulcanologia – Sezione di Catania Introduction Clinometric Radial component Magnetic data Clinometric Tangenzial component Serie Temporali Codifica Simbolica Tilt 0100110100101… Magnetici 2233233232223… Tremore 445544544545… Informazione Complessa 0100-2233-4455-1101-2332-4454-... T-Complexity T-Information T-Entropy T-Code Self-Synchronisation One of the distinguishing properties of T-Codes is their strong tendency towards self-synchronisation. Serie Temporali Codifica Simbolica Tilt 0100100101… Magnetici 0101010101… Tremore 1001001011… T-codes Synch ? Notation Let A = {a1, a2, a3,…, a#A-1, a#a} a finite alphabet, where ai is called a symbol or character. We use A* to denote the set of all finite strings that can be generated by concatenations of characters from A We have λ denote the empty string and let A+ = A* \ { λ } For x, y A*, we denote the concatenation of x and y as xy We use xk to denote the concatenation of k copies of x, such that x0 = λ The length of x is denoted as |x| T-Codes •A finite code set S is a T-code set if: 1) S is an alphabet, or 2) S can be derived from a T-code set via a process know as T-augmentation. (k ) T-augmentation S ( p ) of a code set S is defined as follows: S x | x p y, where 0 k k and y S \ y p k' (k ) ' k 1 ( p) where p C and k N We say C can be derived from C if p C and k N : C C( p ) We call p the T-prefix and k the T-expansion parameter of that T-augmentation. ' ' (k ) ( k1 , k 2 ,..., k m ) A series of m successive T-augmentations of an alphabet S is denoted S ( p , p 1 2 ,..., p m ) T-Codes are constructed with no regard to symbol probabilities. Their construction focuses instead on a recursive tree structure. T-Augmentation The Significance of the Longest Codewords The number of longest codewords equals the cardinality of the alphabet Given an arbitrary finite string over an arbitrary finite alphabet it is always possible to find a T-Code set for which this string is one of its longest codewords. This set is unique, i.e., there is no other T-Code set for which the same string is also one of the longest codewords. This duality between strings and T-Code sets permits us to think of the T-Code set construction algorithm not only as a code construction algorithm, but also as a string construction (production) algorithm. The T-augmentations are the steps in this algorithm. How an existing string can be parsed to yield the associated T-Code set ? T-Decomposition Suppose that, for a given string x and a letter a from the alphabet S, we want to find the T-Code set for which xa is one of the longest codewords. 1. Set m = 0. ( k1 , k 2 ,..., k m ) 2. Decode xa as a string of codewords from S ( p , p 2 ,..., p m ) 1 2 ,..., k m ) 1 2 ,..., p m ) 1 3. If xa decoded into a single codeword from S ((pk ,,pk set n = m and finish 4. Otherwise, set the T-prefix pm+1 to be the second-to-last codeword in the ( k1 , k 2 ,..., k m ) decoding over S ( p , p 1 2 ,..., p m ) 5. Count the number of adjacent copies of pm+1 that immediately precede the second-to-last codeword. Add 1 to this number, and define it to be the T-expansion parameter km+1. 6. T-augment with pm+1 and km+1. 7. Increment m by 1 and goto step 2 above. Example: Let x = 011000101010 and a = 0. Let xa = 0110001010100 be the longest codeword in some T-Code set Decoded over S = {0, 1}, we obtain xa = 0.1.1.0.0.0.1.0.1.0.1.0.0. from which we identify p1 = 0 and k1 = 1. Decoded over xa = 01.1.00.01.01.01.00. i.e., p2 = 01 and k2 = 3. Hence, decoded over S (1, 3 ) ( 0 , 01) , we get xa = 011.00.01010100. such that p3 = 00, k3 = 1, and p4 = 011 with k4 = 1. S (1) (0 ) we obtain T-Complexity When Lempel and Ziv proposed their production complexity, they recognised that the number of parsing steps would give a meaningful measure of string complexity. Titchener pursued a similar thought and proposed a “T-complexity” measure as follows : n C ( xa) log (k 1) T 2 i 1 i where the ki are the T-expansion parameters found in the decomposition of xa. The units of CT (xa) are effective T-augmentation steps, or taugs. Lower Bound C ( xa) ln n T Upper Bound C ( xa) li(ln 2 ln(# S )) n T 0 li( z ) z du ln u logarithmic integral function T-Information and T-Entropy The T-information I T (xa ) of the string xa is defined as the inverse logarithmic integral of the T-complexity divided by a scaling constant ln 2: C ( xa) I ( xa) li ln 2 1 T T The neperian logarithm implicitly gives to the T-information the units of nats. The average T-information rate per symbol, referred to here as the average T-entropy of xa and denote by hT (x(n)) is defined simply: I ( xa) h ( xa) n T T
© Copyright 2024 Paperzz