Statistics of Persistence Diagrams Katharine Turner University of Chicago [email protected] Katharine Turner (U of Chicago) Statistics of Persistence Diagrams 1/1 What is a statistic? A statistic (singular) is a quantity that describes some attribute of a sample of data. It summarizes some information about the data. Katharine Turner (U of Chicago) Statistics of Persistence Diagrams 2/1 What is a statistic? A statistic (singular) is a quantity that describes some attribute of a sample of data. It summarizes some information about the data. A statistic is found using some statistical algorithm with the set of data as input. Katharine Turner (U of Chicago) Statistics of Persistence Diagrams 2/1 What is a statistic? A statistic (singular) is a quantity that describes some attribute of a sample of data. It summarizes some information about the data. A statistic is found using some statistical algorithm with the set of data as input. Basic descriptive statistics are often measures of central tendency and their corresponding measures of variability or dispersion. Katharine Turner (U of Chicago) Statistics of Persistence Diagrams 2/1 What is a statistic? A statistic (singular) is a quantity that describes some attribute of a sample of data. It summarizes some information about the data. A statistic is found using some statistical algorithm with the set of data as input. Basic descriptive statistics are often measures of central tendency and their corresponding measures of variability or dispersion. Measures of central tendency include the mean, median and mode. Katharine Turner (U of Chicago) Statistics of Persistence Diagrams 2/1 What is a statistic? A statistic (singular) is a quantity that describes some attribute of a sample of data. It summarizes some information about the data. A statistic is found using some statistical algorithm with the set of data as input. Basic descriptive statistics are often measures of central tendency and their corresponding measures of variability or dispersion. Measures of central tendency include the mean, median and mode. Measures of variability include the standard deviation (or variance), the absolute deviation, and the range of the values (distance between the minimum and maximum values of the variables). Katharine Turner (U of Chicago) Statistics of Persistence Diagrams 2/1 Central tendencies as function minimisers Central tendencies (and their corresponding measures of variability) are solutions for optimizing different cost functions. These cost functions are based on Lp norms on function spaces. We mainly care about when p = 1, 2 and ∞. Katharine Turner (U of Chicago) Statistics of Persistence Diagrams 3/1 Central tendencies as function minimisers Central tendencies (and their corresponding measures of variability) are solutions for optimizing different cost functions. These cost functions are based on Lp norms on function spaces. We mainly care about when p = 1, 2 and ∞. Given a sample a1 , a2 , . . . , an , some familiar statistical quantities are minimisers of functions of the form !1/p N X Fp (x) = |ai − x|p F∞ (x) = sup |ai − x| i i=1 Katharine Turner (U of Chicago) Statistics of Persistence Diagrams 3/1 Central tendencies as function minimisers Central tendencies (and their corresponding measures of variability) are solutions for optimizing different cost functions. These cost functions are based on Lp norms on function spaces. We mainly care about when p = 1, 2 and ∞. Given a sample a1 , a2 , . . . , an , some familiar statistical quantities are minimisers of functions of the form !1/p N X Fp (x) = |ai − x|p F∞ (x) = sup |ai − x| i i=1 p = 2: the mean minimizes the mean squared error Katharine Turner (U of Chicago) Statistics of Persistence Diagrams 3/1 Central tendencies as function minimisers Central tendencies (and their corresponding measures of variability) are solutions for optimizing different cost functions. These cost functions are based on Lp norms on function spaces. We mainly care about when p = 1, 2 and ∞. Given a sample a1 , a2 , . . . , an , some familiar statistical quantities are minimisers of functions of the form !1/p N X Fp (x) = |ai − x|p F∞ (x) = sup |ai − x| i i=1 p = 2: the mean minimizes the mean squared error p = 1 : the median minimizes absolute deviation Katharine Turner (U of Chicago) Statistics of Persistence Diagrams 3/1 Central tendencies as function minimisers Central tendencies (and their corresponding measures of variability) are solutions for optimizing different cost functions. These cost functions are based on Lp norms on function spaces. We mainly care about when p = 1, 2 and ∞. Given a sample a1 , a2 , . . . , an , some familiar statistical quantities are minimisers of functions of the form !1/p N X Fp (x) = |ai − x|p F∞ (x) = sup |ai − x| i i=1 p = 2: the mean minimizes the mean squared error p = 1 : the median minimizes absolute deviation p = ∞ : the mid-range point minimizes the maximum absolute deviation Katharine Turner (U of Chicago) Statistics of Persistence Diagrams 3/1 Mean and variance for data on the real line The mean of a1 , a2 , . . . aN is the number µ which minimizes the of the mean squared error F2 (x) = N X !1/2 |ai − x|2 i=1 The mean is thus N 1 X µ= ai N i=1 The standard deviation is the value F2 (µ). Katharine Turner (U of Chicago) Statistics of Persistence Diagrams 4/1 Median and absolute deviation for data on the real line The median of a1 , a2 , . . . , aN , written in non-decreasing order, is the number m which minimizes the absolute deviation F1 (x) = N X |ai − x|. i=1 For N odd is unique and is a N+1 . For N even can be any number in the 2 interval [a N , a N+2 ]. The total cost of moving everything from the sample 2 2 data to the median is F1 (m) Katharine Turner (U of Chicago) Statistics of Persistence Diagrams 5/1 Midrange point and range for data on the real line The range of a1 , a2 , . . . , aN , written in non-decreasing order, is aN − a1 N and its midpoint is a1 +a 2 . Consider the function F∞ (x) = max{|ai − x|}. The minimizer of F∞ is the midpoint and its value is half the range. This represents the maximal cost of moving any point to the midpoint. Katharine Turner (U of Chicago) Statistics of Persistence Diagrams 6/1 Persistence diagrams Persistence diagrams describe how the topology changes with the progressive inclusion of sublevel sets. Katharine Turner (U of Chicago) Statistics of Persistence Diagrams 7/1 Persistence diagrams Persistence diagrams describe how the topology changes with the progressive inclusion of sublevel sets. Definition 2 A persistence diagram is a countable multiset of points in R along with the diagonal ∆ = {(x, y ) ∈ R2 | x = y }, where each point on the diagonal has infinite multiplicity. We require some niceness assumptions. Katharine Turner (U of Chicago) Statistics of Persistence Diagrams 7/1 Given two diagrams X , Y we can considers bijections φ between the points in X and the points in Y . Bijections always exist because there are countably many points at every location on the diagonal. We only need to consider bijection where off-diagonal points are either paired with off-diagonal points or with the point on the diagonal that is closest to it. Example: φ : X (red) → Y (blue) Katharine Turner (U of Chicago) Statistics of Persistence Diagrams 8/1 Distance between diagrams There are many choices of metrics in the space of persistence diagrams just like there are different choices of metric on spaces of functions. We will consider three choices which are analogous to L1 , L2 and L∞ on the space of real valued functions. One family of distances are !1/p dp (X , Y ) = inf φ:X →Y X kx − φ(x)kpp x∈X for p ≥ 1 and taking limits for p = ∞ getting d∞ (X , Y ) = Katharine Turner (U of Chicago) inf φ:X →Y max{kx − φ(x)k∞ } Statistics of Persistence Diagrams 9/1 The optimal pairing optimal pairing We will call a bijection between points optimal for dp if it achieves the infimum in the definition of dp . Returning to our example. The optimal choice is the first one for all values of p. Katharine Turner (U of Chicago) Statistics of Persistence Diagrams 10 / 1 Non-uniqueness away from the diagonal This example works for every p ∈ [1, ∞]. Matching the points vertically or horizontally involves the same cost. Katharine Turner (U of Chicago) Statistics of Persistence Diagrams 11 / 1 Non-uniqueness involving the diagonal, p = 1 Given diagram (red), there region which distinguishes whether it costs less to pair both points to the diagonal (purple) than pairing them to each other (blue). It costs the same on the boundary (green). Katharine Turner (U of Chicago) Statistics of Persistence Diagrams 12 / 1 Non-uniqueness involving the diagonal, p = 2 Given diagram (red), there is a parabola which bounds the region which distinguishes whether it costs less to pair both points to the diagonal than pairing them to each other. Katharine Turner (U of Chicago) Statistics of Persistence Diagrams 13 / 1 Statistical qualities can thus be defined on the space of persistence diagrams by analogy using these different distances. Given diagrams X1 , X2 . . . XN let Fp (Y ) = N X !1/p dp (Xi , Y )p F∞ (Y ) = sup d∞ (Y , Xi ). i i=1 The mean µ is the diagram which minimizes F2 and F2 (µ) is the standard deviation. The median m is the diagram which minimizes F1 and F1 (m) is the absolute deviation. The “midpoint” m̂ is the diagram which minimizes F∞ and F∞ (m̂) is the maximal absolute deviation. Katharine Turner (U of Chicago) Statistics of Persistence Diagrams 14 / 1 “Mean” of points in R2 and copies of the diagonal Lemma Let (a1 , b1 ), (a2 , b2 ), . . . , (ak , bk ) be points in the plane. Let x̂ be the mean of a1 , a2 . . . ak and ŷ be the mean of b1 , b2 , . . . , bk . Then ! k ŷ + (n − k) x̂+ŷ k x̂ + (n − k) x̂+ŷ 2 2 , (x, y ) := n n is the point in R2 which minimizes k X k(x, y ) − (ai , bi )k22 + i=1 Katharine Turner (U of Chicago) n X k(x, y ) − ∆k22 i=k+1 Statistics of Persistence Diagrams 15 / 1 “Mean” of points in R2 and copies of the diagonal Lemma Let (a1 , b1 ), (a2 , b2 ), . . . , (ak , bk ) be points in the plane. Let x̂ be the mean of a1 , a2 . . . ak and ŷ be the mean of b1 , b2 , . . . , bk . Then ! k ŷ + (n − k) x̂+ŷ k x̂ + (n − k) x̂+ŷ 2 2 , (x, y ) := n n is the point in R2 which minimizes k X k(x, y ) − (ai , bi )k22 + i=1 n X k(x, y ) − ∆k22 i=k+1 We call this (x, y ) the “mean” of (a1 , b1 ), (a2 , b2 ), . . . , (ak , bk ) and n − k copies of the diagonal. Katharine Turner (U of Chicago) Statistics of Persistence Diagrams 15 / 1 “Mean” of 3 points in R2 and 2 copies of the diagonal The red, blue and purple points are the (ai , bi ). The black point is their arithmetic mean - (x̂, ŷ ). Katharine Turner (U of Chicago) Statistics of Persistence Diagrams 16 / 1 “Mean” of 3 points in R2 and 2 copies of the diagonal The orange is the point on the diagonal closest to the black. Katharine Turner (U of Chicago) Statistics of Persistence Diagrams 17 / 1 “Mean” of 3 points in R2 and 2 copies of the diagonal The green point is the mean of the red, blue, purple and 2 copies of the orange. It is the weighted average of the black and orange. Katharine Turner (U of Chicago) Statistics of Persistence Diagrams 18 / 1 “Median” of points in R2 and copies of the diagonal Lemma Suppose k > n/2. Let (a1 , b1 ), (a2 , b2 ), . . . , (ak , bk ) be points in the plane. Let (x, y ) be the point in R2 where x is the median of a1 , a2 . . . ak with n − k copies of −∞ and y is the median of b1 , b2 , . . . , bk with n − k copies of ∞. Then (x, y ) is the point in R2 which minimizes k X k(x, y ) − (ai , bi )k1 + i=1 Katharine Turner (U of Chicago) n X k(x, y ) − ∆k1 i=k+1 Statistics of Persistence Diagrams 19 / 1 “Median” of points in R2 and copies of the diagonal Lemma Suppose k > n/2. Let (a1 , b1 ), (a2 , b2 ), . . . , (ak , bk ) be points in the plane. Let (x, y ) be the point in R2 where x is the median of a1 , a2 . . . ak with n − k copies of −∞ and y is the median of b1 , b2 , . . . , bk with n − k copies of ∞. Then (x, y ) is the point in R2 which minimizes k X k(x, y ) − (ai , bi )k1 + i=1 n X k(x, y ) − ∆k1 i=k+1 We call this (x, y ) the “median” of (a1 , b1 ), (a2 , b2 ), . . . , (ak , bk ) and n P−k k copies of the diagonalPifn Pk i=1 k(x, y ) − (ai , bi )k1 + i=k+1 k(x, y ) − ∆k1 < i=1 k∆ − (ai , bi )k1 . Otherwise we say the “median” is the diagonal. Katharine Turner (U of Chicago) Statistics of Persistence Diagrams 19 / 1 “Median” of points in R2 and copies of the diagonal Lemma If k < n/2 then k X k(x, y ) − (ai , bi )k1 + i=1 n X k(x, y ) − ∆k1 > i=k+1 k X k∆ − (ai , bi )k1 i=1 for every point (x, y ) ∈ R2 When k < n/2 we say that the “median” of (x1 , y1 ), (x2 , y2 ), . . . , (xk , yk ) and n − k copies of the diagonal is the diagonal. Katharine Turner (U of Chicago) Statistics of Persistence Diagrams 20 / 1 “Median” of 3 points in R2 and 2 copies of the diagonal The red, blue and purple points are the (ai , bi ). Katharine Turner (U of Chicago) Statistics of Persistence Diagrams 21 / 1 “Median” of 3 points in R2 and 2 copies of the diagonal To find the x coordinate we take the median of {a1 , a2 , a3 , ∞, ∞}. Katharine Turner (U of Chicago) Statistics of Persistence Diagrams 22 / 1 “Median” of 3 points in R2 and 2 copies of the diagonal To find the y coordinate we take the median of {b1 , b2 , b3 , −∞, −∞}. Katharine Turner (U of Chicago) Statistics of Persistence Diagrams 23 / 1 “Median” of 3 points in R2 and 2 copies of the diagonal Katharine Turner (U of Chicago) Statistics of Persistence Diagrams 24 / 1 Selections and Matchings Given a set of diagrams X1 , · · · , XN , a selection is a choice of one point from each diagram, where that point could be ∆. Katharine Turner (U of Chicago) Statistics of Persistence Diagrams 25 / 1 Selections and Matchings Given a set of diagrams X1 , · · · , XN , a selection is a choice of one point from each diagram, where that point could be ∆. A matching is a set of selections so that every off-diagonal point of every diagram is part of exactly one selection. Katharine Turner (U of Chicago) Statistics of Persistence Diagrams 25 / 1 Selections and Matchings Given a set of diagrams X1 , · · · , XN , a selection is a choice of one point from each diagram, where that point could be ∆. A matching is a set of selections so that every off-diagonal point of every diagram is part of exactly one selection. Each matching Φ gives a candidate µΦ for the mean by taking the diagram whose points are the means of each of the selections. The mean is one of these µΦ so we only need to compare the F2 (µΦ ) over all matchings Φ. Katharine Turner (U of Chicago) Statistics of Persistence Diagrams 25 / 1 Selections and Matchings Given a set of diagrams X1 , · · · , XN , a selection is a choice of one point from each diagram, where that point could be ∆. A matching is a set of selections so that every off-diagonal point of every diagram is part of exactly one selection. Each matching Φ gives a candidate µΦ for the mean by taking the diagram whose points are the means of each of the selections. The mean is one of these µΦ so we only need to compare the F2 (µΦ ) over all matchings Φ. Each matching Φ gives a candidate mΦ for the median by taking the diagram whose points are the medians of each of the selections. The median is one of these mΦ so we only need to compare the F1 (mΦ ) over all matchings Φ. Katharine Turner (U of Chicago) Statistics of Persistence Diagrams 25 / 1 Description of local minimums of F2 Theorem Let X1 , . . . , Xm be persistence diagrams with only finitely many off-diagonal points. W = {wj } is a local minimum of P 2 1/2 if and only if there is a unique optimal F2 (Y ) = m1 m i=1 d2 (Xi , Y ) pairing from W to each of the Xi , which we denote as φi , and each wj is the arithmetic mean of the points {φi (wj )}i=1,2,...,m . Katharine Turner (U of Chicago) Statistics of Persistence Diagrams 26 / 1 Description of local minimums of F1 Theorem Let X1 , . . . , Xm be persistence diagrams with only finitely many off-diagonal P points. W = {wj } is a local minimum of F1 (Y ) = m1 m i=1 d1 (Xi , Y ) if and only if there whenever φi : W → Xi for every optimal pairings each wj is the “median” of the points {φi (wj )}i=1,2,...,m . Katharine Turner (U of Chicago) Statistics of Persistence Diagrams 27 / 1 Mean (green) of three diagrams Katharine Turner (U of Chicago) Statistics of Persistence Diagrams 28 / 1 Median (green) of three diagrams Katharine Turner (U of Chicago) Statistics of Persistence Diagrams 29 / 1 The End Katharine Turner (U of Chicago) Statistics of Persistence Diagrams 30 / 1
© Copyright 2026 Paperzz