Finding Similar Movements in Positional Data Streams Jens Haase and Ulf Brefeld Knowledge Mining & Assessment [email protected] Prague, 27.9.2013 Monday, September 16, 13 Ulf Brefeld Monday, September 16, 13 Knowledge Mining & Assessment Group 2 B. Charlton v F. Beckenbauer David Marsh 1966 World Cup Final, England - W. Germany Ulf Brefeld Monday, September 16, 13 Knowledge Mining & Assessment Group 3 Player Briefing (coach, before game) Ulf Brefeld Monday, September 16, 13 Knowledge Mining & Assessment Group 4 Analyses (newspapers, next day) Ulf Brefeld Monday, September 16, 13 Knowledge Mining & Assessment Group 5 Youth Soccer: Tactics and Paths Ulf Brefeld Monday, September 16, 13 Knowledge Mining & Assessment Group 6 Tactics and Trajectories ๏ Understanding player movements precondition for analyzing tactics ๏ Requires efficient computation of similar movements ๏ This talk: Efficient retrieval of near-duplicate trajectories given a query movement Ulf Brefeld Monday, September 16, 13 Knowledge Mining & Assessment Group 7 and efficiently compute similarities between trajectories in D. That is, given query trajectory q, we aim at finding the N most similar trajectories in D. 3.2 Representation Representation We aim to exploiting the symmetries of the pitch and use Angle/Arc-Length (AAL) [11] transformations to guarantee translation, rotation, and scale invari ๏ Position of = trajectories. player coordinates on the pitchis to represent a move ant representations The main idea of AAL ment p = hx1 , . . . , xm i in terms of distances and angles ๏ ๏ A game of soccer = positional data stream p 7! p̄ = h(↵1 , kv 1 k), . . . , (↵m , kv m k)i, Player trajectory = sequence of consecutive positions (1 where v i = xi xi 1 . The difference v i is called the movement vector at tim ๏ Positions represented by angles wrt reference vector i and the corresponding angle with respect to a reference vector v ref = (1, 0)> is defined vasref (translation, rotation, scale invariant) ✓ ◆ > v i v ref 1 ↵i = sign(v i , v ref ) cos , kv i k kv ref k where the sign function computes the direction (clockwise ore counterclock Vlachos et al. (KDD, 2004) wise) of the movement with respect to the reference. In the remainder, we dis card the norms in EquationKnowledge (1) and represent trajectories by their sequences o Ulf Brefeld Mining & Assessment Group 8 angles, p 7! p̃ = h↵1 , . . . , ↵m i. Monday, September 16, 13 card the norms in Equation (1) and represent trajectories by their sequences of Dynamic Time Warping angles, p 7! p̃ = h↵1 , . . . , ↵m i. this section, we propose a distance measure for trajectories. The propose 3.3 of Dynamic Time Warping resentation the previous section fulfils the required invariance in term In this section,and we propose distance measure some for trajectories. The proposed translation, rotation scalinga [11]. However, movements may star representation the previous required in terms w and end fast, whileof others start section fast andfulfils thenthe slow downinvariance at the end. Thus of ๏translation, rotation and scaling [11]. However, some movements may start should befor independent speed A remed additionallyMovements need to compensate phase shiftsofofplayer trajectories. slow and end fast, while others start fast and then slow down at the end. Thus, mes fromwethe area of speech recognitionforand is called dynamic time warpin need to compensate phase shifts of trajectories. A remedy ๏additionally Dynamic time warping compensates phase shifts TW) [9].comes Given two s= hs1 , . . . and , smisi called and qdynamic = hq1 ,time . . . ,warping qm i an from thesequences area of speech recognition ๏ Distance (DTW) [9]. Given two sequences =⇥ hsR . . , sRm(e.g., i and qEuclidean = hq1 , . . . ,distance) qm i and measure element-wise distance function dist s: R 1 , .! an element-wise distance function distas : Rfollows ⇥ R ! R (e.g., Euclidean distance), define the DTW function g recursively DTW sequences s and q defined we๏define thefor DTW function g recursively as followsrecursively g(;, ;) =g(;, 0 ;) = 0 g(s, ;) =g(s, dist(;, =1 ;) = q) dist(;, q) = 1 8 8 9 9 hq2 ,hq. 2.,..,. q. ,mqm i)i) < g(s, = < g(s, = g(hs , . . . , s i, q) q) =1 ,dist(s q1 ) + min 2 m g(hs , . . . , s i, q) g(s, q) =g(s, dist(s q1 ) +1 , min 2 m : ; : g(hs2 , . . . , sm i, hq2 , . . . , qm i) ; g(hs2 , . . . , sm i, hq2 , . . . , qm i) Dynamic Time Warping Ulf Brefeld Monday, September 16, 13 Knowledge Mining & Assessment Group 9 card the norms in Equation (1) and represent trajectories by their sequences of Dynamic Time Warping angles, p 7! p̃ = h↵1 , . . . , ↵m i. this section, we propose a distance measure for trajectories. The propose 3.3 of Dynamic Time Warping resentation the previous section fulfils the required invariance in term Rabiner & Juang (1993) In this section,and we propose a [11]. distance measure some for trajectories. The proposed translation, rotation scaling However, movements may star representation the previous required in terms w and end fast, whileof others start section fast andfulfils thenthe slow downinvariance at the end. Thus of ๏translation, rotation and scaling [11]. However, some movements may start should befor independent speed A remed additionallyMovements need to compensate phase shiftsofofplayer trajectories. slow and end fast, while others start fast and then slow down at the end. Thus, mes fromwethe area of speech recognitionforand is called dynamic time warpin need to compensate phase shifts of trajectories. A remedy ๏additionally Dynamic time warping compensates phase shifts TW) [9].comes Given two s= hs1 , . . . and , smisi called and qdynamic = hq1 ,time . . . ,warping qm i an from thesequences area of speech recognition ๏ Distance (DTW) [9]. Given two sequences =⇥ hsR . . , sRm(e.g., i and qEuclidean = hq1 , . . . ,distance) qm i and measure element-wise distance function dist s: R 1 , .! an element-wise distance function distas : Rfollows ⇥ R ! R (e.g., Euclidean distance), define the DTW function g recursively DTW sequences s and q defined we๏define thefor DTW function g recursively as followsrecursively g(;, ;) =g(;, 0 ;) = 0 O(|s||q|) g(s, ;) =g(s, dist(;, =1 ;) = q) dist(;, q) = 1 8 8 9 9 hq2 ,hq. 2.,..,. q. ,mqm i)i) < g(s, = < g(s, = g(hs , . . . , s i, q) q) =1 ,dist(s q1 ) + min 2 m g(hs , . . . , s i, q) g(s, q) =g(s, dist(s q1 ) +1 , min 2 m : ; : g(hs2 , . . . , sm i, hq2 , . . . , qm i) ; g(hs2 , . . . , sm i, hq2 , . . . , qm i) Dynamic Time Warping Ulf Brefeld Monday, September 16, 13 Knowledge Mining & Assessment Group 10 Approximate DTW The time complexity of DTW is O(|s||q|) which is clearly intractable for c puting similarities of thousands of trajectories. However, recall that we ai finding the N best matches for a given query. This allows for pruning s ๏ Approximate DTW computationsDTW using lower bounds f , i.e., f (s, q) g(s, q), with an by lower bounds propriate function f that can be more efficiently computed than g [10]. We ๏ Focus on characteristic values two different lower bound functions, fkim [6] and fkeogh [5], that are defi as follows: fkim focuses on the first, last, greatest, and smallest values of ๏ Kim et al. (ICDE, 2001) sequences [6] ๏ ๏ first, last, greatest, smallest value fkim (s, q) = max {|s1 q1 |, |sm Keogh (VLDB, 2002) qm |, | max(s) max(q)|, | min(s) min(q)|} and can be computed in O(m). However, the greatest (or smallest) entr minimum/maximum values the ๏transformed paths is always closeof or subsequences identical to ⇡ (or ⇡) and can be ignored. Consequentially, the time complexity reduces to O(1) [10]. ๏ Complexity in O(|s|) second lower bound fkeogh [5] uses minimum `i and an maximum values u subsequences of the query q given by Ulf Brefeld Monday, September 16, 13 Knowledge Mining & Assessment Group `i = min(qi r , . . . , qi+r ) and ui = max(qi 11 r , qi+r ), ing them with Algorithm 1. The idea same bucket, so that all objects of a b near-neighbours. An interesting equiv based hashes (DBH) [1] that can be a two randomly drawn members s , s 2 D we define the function h as follows: Athitsos et al. (2008), 1 2 Gionis et al., (1999) metric) distance measures. 2 the function h a two randomly drawn 2 dist(s, D we define dist(s, s1 )2members + dist(s1s,1s,To )22 define s ) 2s 2 a hash .family for our pu hs1 ,s2 (s) = 2 dist(s , s ) 1 2 ๏ randomly two drawn members s , s 2 D we define the function h as2 follows: 1 2 h : D ! R that maps trajectory Distance-based hash function 2 dist(s, s1 ) + dist(s1 , s2 ) a dist(s, s2 )s2 2 hsuse (s)2identity = . 1 ,s2the In the remainder, we will dist(s , s ) = f (s , s ) for sim2 2 1 2 kims1) 2 dist(s dist(s, s1 ) + dist(s1 , s21) 2 dist(s, 2 , s2 ) Locality Sensitive Hashing . plicity. Tohscompute a discrete hash value for s we verify whether h(s) lies in a 1 ,s2 (s) = 2 dist(s1 , s2 ) certain interval [t1 ,remainder, t2 ], In the we will use the identity dist(s1 , s2 ) = fkim (s1 , s In the we will use the identity dist(s (swe plicity. To compute a⇢discrete hash whether h( 1 , svalue 2 ) = ffor 1 , sverify 2 ) for simkims s1 remainder, and s2 randomly Kim2et[tal. 1 : hs1use (s) ,(ICDE, t2 ] 2001) [t1 ,t2 ] ,s 1 2 plicity. Tofrom compute a interval discrete hash for s as wedistance verify function whether h(s) lies in a hs1 ,s2 (s) certain [t1= , tvalue drawn database 2 ], 0: otherwise certain interval [t1 , t2 ], ⇢ 1 : hsso 2 [t [t1and ,t2 ] t2 are chosen ⇢ 1 , t2 ] 1 ,sthat 2 (s)the Optimally, thedetermined interval boundaries t probability ๏ Bucket 1 by 1h:s1h,ss12,s(s) = 1 , t2 ] [t1 ,t2 ] 2 (s) 2 [t : otherwise that a randomly drawn s 22(s) X lies and with 50% chance hs1 ,s = with 50% chance 0within 0: otherwise outside of the interval. The set T defines the set of admissible intervals, ๏ Set of admissible intervals Optimally, the interval boundaries t1 and t2 are chosen so that the n o Optimally, the interval boundaries t[t11 ,tand t2 are chosen so that probability [t1 ,t2the ] 2] X= lies T (s1 , s2 )that = a[trandomly , t2 ] : P rdrawn (hs1 ,ss2 2 (s)) 0)with = P r50% (hschance (s))within = 1) and . with 5 ,s 1 D D 1 2 that a randomly drawn s 2 X lies with 50% chance within and with 50% chance thesetinterval. The the set of admissible interva outside of the outside interval.of The T defines theset setTofdefines admissible intervals, Given h and T we can now ndefine the DBH hash family that can be directly n o [t1 ,t2 ] [t ,t ] 1 2 [t1,,tt2 ]] : P r (hs ,s (s)) [t1= ,t2 ]0) = P r (hs ,s (s)) integrated in standard LSH algorithms: , s ) = [t 1 2 1 2 D D 12 1 2 T (s1 , s2 ) = T[t(s , t ] : P r (h (s)) = 0) = P r (h . 1 2 Ulf Brefeld Knowledge Mining & Assessment Group 1 2 D s1 ,s2 D s1 ,s2 (s)) = 1) n o Monday, September 16, 13 [t1 ,t2 ] Empirical Evaluation ๏ DEBS Grand Challenge http://www.orgs.ttu.edu/debs2013/index.php?goto=cfchallengedetails ๏ 8 vs. 8 soccer game recorded by Fraunhofer IIS ๏ In total 33 sensors ๏ Ulf Brefeld Monday, September 16, 13 ๏ 1 sensor per shoe (200Hz) ๏ 1 sensor in the ball (2000Hz) 15,000 positions per second (3 dimensional) Knowledge Mining & Assessment Group 13 Coordinates on the Pitch ๏ Coordinate system, origin (0,0) is at kick-off ๏ Discarding additional data, players are represented by triplet: (sensor/player id, timestamp, player coordinates) Ulf Brefeld Monday, September 16, 13 Knowledge Mining & Assessment Group 14 Representation ๏ ๏ Further preprocessing: ๏ Discarding positions outside of the pitch ๏ Removing half-time effect of changing sides ๏ Averaging player positions over 100ms Trajectory windows of size 10 t+1 Ulf Brefeld Monday, September 16, 13 t+2 Knowledge Mining & Assessment Group 15 Evaluation ๏ Given: a query trajectory ๏ Task: Find near-duplicates ๏ ๏ Focus on 15k consecutive positions of one player ๏ Ulf Brefeld Monday, September 16, 13 (i.e., N=1000 most similar trajectories) (for baseline comparisons) Knowledge Mining & Assessment Group 16 Run-time ๏ Exact computation infeasible ๏ Dynamic time warping very effective ๏ DBH adds only little Ulf Brefeld Monday, September 16, 13 Knowledge Mining & Assessment Group 17 nof. trajetories Pruned Trajectories Kim Keough DBH total 1000 0.00% 0% 11.42% 11.42% 5000 0.28% 34.00% 16.33% 50.61% 10000 9.79% 41.51% 17.80% 60.10% 15000 17.5% 46.25% 11.82% 75.57% ๏ Effectiveness of DBH depends only on data ๏ Kim and Keogh effective for constant N Ulf Brefeld Monday, September 16, 13 Knowledge Mining & Assessment Group 18 DBH Accuracy ๏ On average DBH performs very accurate ๏ However, worst cases clearly inappropriate Ulf Brefeld Monday, September 16, 13 Knowledge Mining & Assessment Group 19 Example Query: Ulf Brefeld Monday, September 16, 13 Retrieved: Knowledge Mining & Assessment Group 20 Conclusion ๏ Efficient computation of near duplicate movements in positional data streams ๏ Dynamic time warping (DTW) ๏ Distance-based hashing (DBH) ๏ (Super-)linear complexity ๏ Accurate results Ulf Brefeld Monday, September 16, 13 Knowledge Mining & Assessment Group 21
© Copyright 2026 Paperzz