Information-Theoretic Tools for Social Media
Greg Ver Steeg and Aram Galstyan
ICWSM Tutorial, July 11, 2013

You could be non-parametrically estimating entropies before the tutorial starts…
Wifi: Cambridge MS0711
Or visit http://www.isi.edu/~gregv/npeet.html to download code.
If you don't have "scipy" (scientific python) installed, I recommend the "Scipy Superpack": http://fonnesbeck.github.com/ScipySuperpack/

Information theory: reliable communication over a noisy channel
A source bit (e.g., 0) is encoded (say, as 000), passes through a noisy channel that may corrupt it (to 001), and a decoder recovers the original bit (0).
"How much information can we send?" is an ill-posed question. Better: what is the maximum rate of error-free communication over all possible codes?
Surprises:
– Error-free communication is possible!
– There is a simple formula for this rate! (Mutual information)

Examples of noisy channels

Shannon (1956): "Information theory has, in the last few years, become something of a scientific bandwagon… It will be all too easy for our somewhat artificial prosperity to collapse overnight when it is realized that the use of a few exciting words like information, entropy, redundancy do not solve all of our problems."

E.g., mutual information: I(X:Y) = E[log p(Y|X)/p(Y)]
We will emphasize two things:
– Estimation
– Useful, meaningful measures

Tutorial outline
• Information Theory Basics
  – Entropy, MI, discrete IT estimators
  – Entropy estimation demo
  – Example: predicting verdicts from text
• Social network dynamics
  – Entropic measures for time series
  – Transfer entropy & Granger causality
  – Examples
Coffee break (4:00–4:30)
• Content on social networks
  – Representing content
  – Continuous IT estimators

This part: Information Theory Basics
• Plain old entropy
  – Why "log"?, building intuition
  – Continuous variable caveats
• Mutual information
  – Definition/interpretation/forms
  – Continuous variables
  – Dependence/multivariate measures
• Estimation, Part 1: Discrete variables
• Demonstration
  – My first name has 4 letters, therefore…
• Information in human communication (using discrete measures)

Why "log"?
• How should we quantify the uncertainty H(X) of a random variable X? Take a fair die: p(X = x) = p(x) = 1/6 for x = 1, …, 6.
• Two independent dice have 6·6 = 36 states.
• log(6·6) = log(6) + log(6) = 2 log(6), so taking a log makes uncertainty additive over independent systems.

Axiomatic approach
• Which functions quantify uncertainty?
  – Continuous: a small change in p(x) should lead to a small change in our uncertainty.
  – Increasing: if there are n equally likely outcomes, uncertainty goes up with n.
  – Composition: the uncertainty of two independent coins should equal the sum of the uncertainties of each coin.
These requirements single out the entropy:
H(X) = E[log 1/p(x)] = -Σ_x p(x) log p(x)

Alternate interpretation: compression
Guess-my-square game:
• I pick a square uniformly at random.
• You can ask yes/no questions to determine the square.
• How many questions are required? To distinguish between N equally likely squares, we need log2 N questions.
• In Round 2, I prefer the bottom two rows, and half the time I pick one of those squares.
• With this skewed distribution p(x) you can find the correct square with fewer questions on average; the entropy of p(x) measures that average.
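To make the counting concrete, here is a minimal sketch (not part of the tutorial materials; the function name entropy_bits and the skewed example distribution are illustrative assumptions, not the slides' exact Round 2 distribution):

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy H(X) = -sum_x p(x) log2 p(x), in bits (0 log 0 treated as 0)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Round 1 of guess-my-square: 16 equally likely squares -> log2(16) = 4 bits,
# i.e., 4 yes/no questions.
uniform = np.full(16, 1.0 / 16)
print(entropy_bits(uniform))          # 4.0

# A skewed distribution over the 16 squares (illustrative only): half the
# probability mass sits on 2 favored squares.
skewed = np.array([0.25, 0.25] + [0.5 / 14] * 14)
print(entropy_bits(skewed))           # ~3.4 bits: fewer questions on average
```

The uniform case recovers exactly log2 16 = 4 questions; any skew toward favored squares lowers the entropy and hence the average number of yes/no questions an optimal questioner needs.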
Entropy of a continuous random variable
[Figure: a uniform density p(x) = 1/a on the interval [0, a].]
• What is the probability of observing x = 3.1415926…? For a continuous variable, the probability of any exact value is zero.
• Instead, p(x)dx tells us the probability of observing a number in [x, x+dx).

(Differential) entropy
• Discretize into bins of width dx. For the uniform density on [0, a], each of the a/dx bins has probability dx/a, so the discrete entropy is
  H(X) = -Σ_{i=1}^{a/dx} (dx/a) log(dx/a) = log a - log dx,
  which diverges (→ +∞) as dx → 0.
• Dropping the divergent -log dx term leaves the differential entropy
  H_diff(X) = -∫ dx p(x) log p(x) = E[log 1/p(x)],
  which equals log a for the uniform example.

(Outline) Next: Mutual information – definition/interpretation/forms; continuous variables; dependence/multivariate measures.

Mutual information
X → Noisy Channel → Y. The channel capacity is C = max_{p(X)} I(X:Y): mutual information!

Mutual information
I(X:Y) = H(X) + H(Y) - H(X,Y)
Here H(X) + H(Y) is the uncertainty if X and Y were independent, and H(X,Y) is the uncertainty of the pair considered as one system.
Some things to notice:
• Symmetric
• A difference of entropies
• Non-negative

Mutual information
Conditional entropy: H(Y|X) = Σ_x p(x) H(Y|X = x).
Read off the other ways of describing mutual information:
I(X:Y) = H(X) + H(Y) - H(X,Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)

Independence
I(X:Y) = H(X) + H(Y) - H(X,Y). Using H(X) = E[log 1/p(x)],
I(X:Y) = E[log 1/p(x) + log 1/p(y) - log 1/p(x,y)] = E[log p(x,y) / (p(x)p(y))],
so I(X:Y) = 0 exactly when p(x,y) = p(x)p(y).

Extends to conditional independence
• Bayesian networks, e.g., can be read as encoding a set of "conditional independence" relationships:
  X ⊥ Y | Z  ⟺  p(X,Y|Z) = p(X|Z)p(Y|Z) for all Z  ⟺  I(X:Y|Z) = 0,
  where I(X:Y|Z) = H(X|Z) - H(X|Z,Y).

First useful(?) property for M.L.
I(X:Y) = 0  ⟺  p(x,y) = p(x)p(y)
• You don't get this for other "correlation" measures (Pearson, Kendall, Spearman…).
• MI captures nonlinear relationships, and the size of MI has many nice interpretations.
• It extends to multivariate relationships / conditional MI.
• But is it "useful"? We have to estimate p(x,y) first anyway…

(Outline) Next: Estimation, Part 1: Discrete variables.

Estimation, Part 1: Discrete variables
• Given samples x^(i) ~ p(X), i = 1, …, N, an "asymptotically unbiased" estimator satisfies lim_{N→∞} E[Ĥ_N(X)] = H(X).
• For discrete entropy, the 'plug-in' estimator is
  Ĥ(X) = -Σ_x p̂(x) log p̂(x), with p̂(x) = (number of times x is observed)/N.

How well do we do?
[Figure: the true distribution is uniform over 16 states, p(X = i) = 1/16, with entropy 4 bits; the empirical distribution p̂(X = i) from N = 32 samples is uneven, and the plug-in estimate is about 3.5 bits.]
[Figure: over repeated draws of 32 samples, the distribution of plug-in entropy estimates lies below the true H(X) = 4 bits; the plug-in estimator is biased downward.]
(# states = 16, # samples = 32)

Naïve estimator for MI?
Again, the standard formula, using observed frequency counts:
Î(X:Y) = E[log p̂(x,y) / (p̂(x)p̂(y))]
One way to think of it is as Î(X:Y) = Ĥ(X) - Ĥ(X|Y).
• The (under-estimation) bias of the entropy terms is worse here, since there are fewer samples for each value of Y. So MI is over-estimated…

Bias for MI
E.g., for x = 1, …, 16 and y = 1, …, 16 with p(x,y) = 1/(16·16), we have I(X:Y) = 0. Again, let # samples = 2 · # states.
[Figure: histogram of estimated MI (bits) over repeated experiments; the estimates lie well above the true value I(X:Y) = 0.]
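To see this bias numerically, here is a minimal sketch (not the tutorial's code; plugin_entropy and plugin_mi are illustrative helper names) of the plug-in MI estimator applied to exactly this setting: X and Y independent and uniform over 16 values, with # samples = 2 · (# joint states) = 512.

```python
import numpy as np
from collections import Counter

def plugin_entropy(samples):
    """Plug-in entropy estimate in bits: -sum_x p_hat(x) log2 p_hat(x)."""
    n = len(samples)
    counts = Counter(samples)
    p = np.array([c / n for c in counts.values()])
    return -np.sum(p * np.log2(p))

def plugin_mi(xs, ys):
    """Plug-in MI estimate I(X:Y) = H(X) + H(Y) - H(X,Y), all from observed counts."""
    return plugin_entropy(xs) + plugin_entropy(ys) - plugin_entropy(list(zip(xs, ys)))

rng = np.random.default_rng(0)
n_states = 16
n_samples = 2 * n_states * n_states   # 2 x (# joint states) = 512, as on the slide

# X and Y are independent and uniform on 1..16, so the true MI is exactly 0.
estimates = []
for _ in range(200):
    xs = rng.integers(1, n_states + 1, size=n_samples)
    ys = rng.integers(1, n_states + 1, size=n_samples)
    estimates.append(plugin_mi(xs, ys))

# The estimates sit a few tenths of a bit above 0: the plug-in MI is biased upward.
print(np.mean(estimates), np.min(estimates), np.max(estimates))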
Three possible solutions
• Analytic estimate of the bias (Panzeri-Treves)
• Bootstrap
• Shuffle test

Bias for MI
[Figure: the same histogram of estimated MI vs. the true I(X:Y); the size of the bias is governed by the ratio # states / # samples.]

Bias for MI
– Bootstrap: generate new samples based on p̂(x,y).
– Estimate the bias on those resampled data sets and use it as a correction.

Permutation test
• For a given set of samples (x^(i), y^(i)), i = 1, …, N,
• generate many "shuffled" versions (x^(π(i)), y^(i)), i = 1, …, N.
• For these, I(X_shuffle : Y) = 0, so the spread of the shuffled estimates gives an empirical confidence interval for how large a "correlation" can arise purely by chance (a minimal sketch follows below).
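Continuing the earlier sketch (again illustrative, not the tutorial's code: shuffle_test is an assumed helper name, and plugin_mi is the plug-in estimator defined above), the shuffle test might look like this:

```python
import numpy as np
# plugin_mi is the plug-in estimator from the earlier sketch.

def shuffle_test(xs, ys, mi_estimator, n_shuffles=1000, rng=None):
    """Permutation (shuffle) test: break the pairing between X and Y to build an
    empirical null distribution of MI estimates under independence."""
    rng = np.random.default_rng() if rng is None else rng
    xs, ys = np.asarray(xs), np.asarray(ys)
    observed = mi_estimator(xs, ys)
    null = np.array([mi_estimator(rng.permutation(xs), ys) for _ in range(n_shuffles)])
    # Empirical p-value: fraction of shuffles with an MI estimate at least as large
    # as the observed one; the 95th percentile of the null is a "chance level".
    return observed, float(np.percentile(null, 95)), float(np.mean(null >= observed))

rng = np.random.default_rng(1)
xs = rng.integers(1, 17, size=512)
ys = rng.integers(1, 17, size=512)    # independent of xs, so the true MI is 0
obs, chance_95, p_value = shuffle_test(xs, ys, plugin_mi, n_shuffles=200, rng=rng)
print(obs, chance_95, p_value)
# obs is inflated by the bias, but so is every shuffled estimate, so obs falls
# inside the null range and p_value is not small: no evidence of dependence.
```

Because the plug-in estimate for truly independent data lands inside its own shuffled null distribution, the permutation test correctly reports that the observed MI is consistent with chance, even though the raw estimate is well above zero.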
(Outline) Next: Demonstration – my first name has 4 letters, therefore…; then information in human communication (using discrete measures).

Example: Information in human speech