Travel Time Estimation of a Path using Sparse Trajectories
Dr. Yu Zheng
Lead Researcher, Microsoft Research
Chair Professor at Shanghai Jiao Tong University

[Figure: road segments r1–r7 traversed by trajectories Tr1–Tr4]

Goal
• Estimate the travel time of any given path (from S to D)
  – on a road network, instantly
  – using historical and current trajectories generated by a sample of vehicles

Challenges
• Data sparsity
• Trajectory concatenation
  – Multiple ways to combine sub-trajectories
  – Length of a sub-trajectory vs. its support
• Scalability and efficiency
  – A citywide estimation
  – E.g., Beijing has over 100,000 road segments
[Figure: query path S → r1 → r2 → r3 → r4 → D covered by trajectories Tr1–Tr4; candidate concatenations include $r_1 + r_2 + r_3$, $(r_1 \to r_2) + r_3$ and $r_1 + (r_2 \to r_3)$]

Methodology
• Framework
  – Context-Aware Tensor Decomposition (CATD)
  – Optimal Concatenation (OC)
[Figure: trajectories are map-matched onto road networks into a trajectory database; tensor construction ($\mathcal{A}_r$) and context feature extraction feed tensor decomposition, which outputs $\mathcal{A}_{rec}$; frequent trajectory pattern mining outputs patterns; optimal concatenation combines $\mathcal{A}_{rec}$ and the patterns into the path cost]

Methodology
• Supplementing missing values
  – Travel times form a tensor $\mathcal{A} = \mathcal{A}_r \| \mathcal{A}_h$ over road segments ($r_1 \dots r_N$), drivers ($u_1 \dots u_M$) and time slots, where $\mathcal{A}_r$ covers recent slots ($t_j, t_k$) and $\mathcal{A}_h$ historical slots ($t'_j, t'_k$); most entries are missing
  – Context: a matrix $X$ (parts $X_r$, $X_h$) whose rows are time slots, and a matrix $Y$ of road segments $r_1 \dots r_N$ × geographic features $f_1 \dots f_r$
  – $\mathcal{A}$ is decomposed into a core tensor $S$ and factor matrices $R$ (road segments), $U$ (drivers) and $T$ (time slots), while $X$ and $Y$ share $T$ and $R$:
$$\mathcal{L}(S,R,U,T,F,G) = \frac{1}{2}\left\lVert \mathcal{A} - S \times_R R \times_U U \times_T T \right\rVert^2 + \frac{\lambda_1}{2}\lVert X - TG \rVert^2 + \frac{\lambda_2}{2}\lVert Y - RF \rVert^2 + \frac{\lambda_3}{2}\left(\lVert S\rVert^2 + \lVert R\rVert^2 + \lVert U\rVert^2 + \lVert T\rVert^2 + \lVert F\rVert^2 + \lVert G\rVert^2\right)$$
[Figure: city partitioned into grids g1–g16 (grid-level matrix $M_G$), the sparse tensor $\mathcal{A}_r$, the context matrices $X$ and $Y$, and trajectories Tr1–Tr3 map-matched onto road segments]
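To make the CATD objective concrete, here is a minimal Python sketch that evaluates $\mathcal{L}(S,R,U,T,F,G)$ for tiny dense factors stored as nested lists. It is not the authors' implementation: the helper names (`tucker`, `catd_loss`, etc.) are hypothetical, the optimization step that actually fills in missing values is not shown, and it assumes (the slide does not say so) that the first term is summed only over the observed entries of $\mathcal{A}$, as is usual for tensor completion.

```python
from itertools import product


def flatten(x):
    """Yield every scalar in a nested list (tensor, matrix or vector)."""
    if isinstance(x, (int, float)):
        yield float(x)
    else:
        for v in x:
            yield from flatten(v)


def sq_norm(x):
    """Squared Frobenius norm ||x||^2."""
    return sum(v * v for v in flatten(x))


def sq_diff(a, b):
    """||a - b||^2 for two nested lists of the same shape."""
    return sum((u - v) ** 2 for u, v in zip(flatten(a), flatten(b)))


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def tucker(S, R, U, T):
    """S x_R R x_U U x_T T: entry (i, j, k) = sum_{a,b,c} S[a][b][c] R[i][a] U[j][b] T[k][c]."""
    dR, dU, dT = len(S), len(S[0]), len(S[0][0])
    return [[[sum(S[a][b][c] * R[i][a] * U[j][b] * T[k][c]
                  for a, b, c in product(range(dR), range(dU), range(dT)))
              for k in range(len(T))]
             for j in range(len(U))]
            for i in range(len(R))]


def catd_loss(A_obs, S, R, U, T, F, G, X, Y, lam1, lam2, lam3):
    """Evaluate L(S, R, U, T, F, G).

    A_obs maps observed (road, driver, slot) indices to travel times;
    missing entries are skipped (an assumption of this sketch).
    """
    A_hat = tucker(S, R, U, T)
    fit = sum((v - A_hat[i][j][k]) ** 2 for (i, j, k), v in A_obs.items())
    time_ctx = sq_diff(X, matmul(T, G))  # X ~ T G (rows: time slots)
    road_ctx = sq_diff(Y, matmul(R, F))  # Y ~ R F (rows: road segments)
    reg = sum(sq_norm(Z) for Z in (S, R, U, T, F, G))
    return 0.5 * (fit + lam1 * time_ctx + lam2 * road_ctx + lam3 * reg)


# Toy example: 2 road segments, 1 driver, 1 time slot, rank 1 everywhere.
S, R, U, T = [[[1.0]]], [[1.0], [2.0]], [[1.0]], [[1.0]]
print(catd_loss({(0, 0, 0): 3.0}, S, R, U, T,
                F=[[0.0]], G=[[1.0]], X=[[1.0]], Y=[[0.0], [0.0]],
                lam1=1.0, lam2=1.0, lam3=0.1))
```

Once the factors minimize $\mathcal{L}$, `tucker(S, R, U, T)` gives a value for every (road segment, driver, time slot) cell, which is the role $\mathcal{A}_{rec}$ plays in the framework; the coupling terms let time slots with similar context ($X$) and road segments with similar features ($Y$) share factor rows.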
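Methodology
• Optimal Concatenation
  Suppose $\mathbf{P}$ is decomposed as $P_1 \| P_2 \| \cdots \| P_k$.
  – True travel time: $\mu_{\mathbf{P}}$
  – Estimated travel time: $t_{P_1} + t_{P_2} + \cdots + t_{P_k}$
$$LSE_{\mathbf{P},P_1,P_2,\dots,P_k} \triangleq E\left[\left(\mu_{\mathbf{P}} - t_{P_1} - t_{P_2} - \cdots - t_{P_k}\right)^2\right]$$
$$\arg\min_{P_1,P_2,\dots,P_k} LSE_{\mathbf{P},P_1,P_2,\dots,P_k} \quad \text{subject to } P_1 \| P_2 \| \cdots \| P_k = \mathbf{P}$$
[Figure: path $\mathbf{P}$ from S to D split into sub-paths $P_1, P_2, \dots, P_k$ with estimated times $t_{P_1}, t_{P_2}, \dots, t_{P_k}$]

Methodology
• Optimal Concatenation
  Since $\mu_{\mathbf{P}} = \mu_{P_1} + \mu_{P_2} + \cdots + \mu_{P_k}$:
$$LSE_{\mathbf{P},P_1,\dots,P_k} = E\left[\left(\sum_{i=1}^{k}(\mu_{P_i} - t_{P_i})\right)^2\right] = \sum_{i=1}^{k} E(\mu_{P_i} - t_{P_i})^2 + 2\sum_{1 \le i < j \le k} E\left[(\mu_{P_i} - t_{P_i})(\mu_{P_j} - t_{P_j})\right]$$
  Assuming $t_{P_i}$ and $t_{P_j}$ are independent, and each $t_{P_i}$ is an unbiased estimate of $\mu_{P_i}$, every cross term vanishes:
$$E\left[(\mu_{P_i} - t_{P_i})(\mu_{P_j} - t_{P_j})\right] = E(\mu_{P_i} - t_{P_i})\,E(\mu_{P_j} - t_{P_j}) = 0$$
  Here $t_{P_i} = \frac{1}{n_{P_i}}\sum_{j=1}^{n_{P_i}} t_{P_i,j}$ is the mean travel time of the $n_{P_i}$ trajectories that traverse $P_i$ (taken as independent samples with mean $\mu_{P_i}$), so
$$LSE_{\mathbf{P},P_1,\dots,P_k} = \sum_{i=1}^{k} E(\mu_{P_i} - t_{P_i})^2, \qquad E(\mu_{P_i} - t_{P_i})^2 = E\left(\mu_{P_i} - \frac{1}{n_{P_i}}\sum_{j=1}^{n_{P_i}} t_{P_i,j}\right)^2 = \frac{1}{n_{P_i}^2} E\left(\sum_{j=1}^{n_{P_i}}(\mu_{P_i} - t_{P_i,j})\right)^2 = \frac{1}{n_{P_i}}\mathrm{Var}(t_{P_i,j})$$

Methodology
• Support $n_{P_i}$ vs. variance $\mathrm{Var}(t_{P_i,j})$
  – The bigger the support, the smaller the error
  – The bigger the variance, the bigger the error
$$\arg\min_{P_1,P_2,\dots,P_k} \sum_{i=1}^{k} \frac{1}{n_{P_i}}\mathrm{Var}(t_{P_i,j}) \quad \text{subject to } P_1 \| P_2 \| \cdots \| P_k = \mathbf{P}$$
• Solved by dynamic programming
  – Denote $g(P_i) = \frac{1}{n_{P_i}}\mathrm{Var}(t_{P_i,j})$; for a prefix $P' = r_1 \to r_2 \to \cdots \to r_n$ of $\mathbf{P}$, solve
$$\arg\min_{P_1,P_2,\dots,P_l} \sum_{j=1}^{l} g(P_j) \quad \text{subject to } P_1 \| P_2 \| \cdots \| P_l = P'.$$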
  – Recurrence, where $opt_n$ is the minimal total $g$ over all concatenations of $r_1 \to \cdots \to r_n$ and $opt_0 = 0$ lets the whole prefix be a single sub-path:
$$opt_n = \min_{0 \le i < n}\left(opt_i + g(r_{i+1} \| r_{i+2} \| \cdots \| r_n)\right)$$

Methodology
• Make optimal concatenation more efficient
• Frequent trajectory pattern mining
  – Not necessary to check every combination
  – Using a suffix-tree-based method
[Figure: A) An example of a suffix tree built from trajectories Tr1–Tr4 over road segments r1–r7, each node labeled with its support, for the query path P: r1 → r2 → …]

Methodology
• Combining the suffix tree with tensors
  – Searching for frequent trajectory patterns in the suffix tree
  – Finding the travel time of a particular user from the tensor
[Figure: A) the suffix tree for P: r1 → r2 → r3 → r4 yields the patterns r2 → r3 and r3 → r4; B) Filling in the missing time for a pattern: r3 → r4 is traversed by Tr2, Tr3, Tr4 (drivers u2, u3, u4), and per-segment times in slot $t_k$ come from $\mathcal{A}_{rec}$, e.g., $t_{r_3 r_4,u_2,k} = t_{r_3,u_2,k} + t_{r_4,u_2,k}$]

Methodology
• Deal with efficiency and scalability
  – Data-driven space partition
  – An element-wise optimization algorithm
  – Use trajectory patterns as concatenation candidates
  – Index recent trajectories for fast online retrieval
[Figure: index of recent trajectories Tr1, Tr2 over grids g1–g4 and road segments r1, r2, r3, r6]

Experiments
• Datasets
  – Taxi trajectories:
    • Generated by 32,670 taxicabs in Beijing
    • From Sept. 1 to Oct. 31, 2013
    • 673+ million GPS points
    • Total length: over 26 million km
    • Sampling rate: 96 seconds per point
  – Road networks:
    • 148,110 nodes and 196,307 edges
    • Covers a 40 × 50 km spatial range
    • Total length of road segments: 21,985 km
  – POIs:
    • 273,165 POIs in Beijing
    • 195 tier-two categories
    • Chose the top 10 categories that occur around road segments
Download data here

Experiments
• Performance of CATD
  – $\mathcal{A}_r$ with a 5 × 5 partition: 4,736 × 12,674 × 4, density 0.09%

Method               MAE (min)   RMSE
TD                   0.747       1.646
TD + H               0.732       1.629
CATD (TD + H + C)    0.714       1.613

[Figures: MAE (min) and time cost (min) vs. the number of time slices L (2–5); MAE (min) and time cost (min) vs. the number of partitions (h × h, h = 1–5)]

• Metrics
$$MAE = \frac{\sum_i |y_i - \hat{y}_i|}{n}, \qquad MRE = \frac{\sum_i |y_i - \hat{y}_i|}{\sum_i y_i}, \qquad RMSE = \sqrt{\frac{\sum_i (y_i - \hat{y}_i)^2}{n}}$$

Experiments
• Query paths
  – From taxi trajectories:
    • 12,384 queries, 50 paths per hour/day
    • Traveled by at least two drivers
    • Total length: 76,412.6 km
    • Effective time span: 2,734 hours
  – In the field study:
    • 114 queries from Sept. 1 to Oct. 30, 2013
    • Total length: 999 km
    • Effective time span: 62 hours
[Figure: A) Geographical distribution; B) Distribution of the length; C) Distribution of time length]

Experiments
• Baselines
  – Speed-Constraint-based (SC) method
  – Trajectory-based Simple Concatenation (TSC) method
  – Optimal Concatenation with Historical Travel Time (OC+H) method
  – Optimal Concatenation with Nonnegative Matrix Factorization (OC+MF) method

Query paths from taxi data
Method   MAE (min)   MRE     MAE/L (min/km)
SC       8.808       0.665   1.428
TSC      5.244       0.396   0.850
OC+H     3.245       0.245   0.526
OC+MF    3.061       0.231   0.496
PTTE     2.545       0.192   0.412

In-the-field study
Method   MAE (min)   MRE     MAE/L (min/km)
SC       18.193      0.561   2.075
TSC      11.300      0.349   1.289
OC+H     4.990       0.154   0.569
OC+MF    4.052       0.125   0.462
PTTE     3.771       0.116   0.430

Experiments
• Efficiency
Component                           Time       Memory (MB)
Tensor construction
  Building matrices X, Y            34 ms      9
  $\mathcal{A}_r$                   44 ms      4.4
  $\mathcal{A}_h$                   233 ms     14.6
Deal with missing values (CATD)
  Decomposition (5 × 5)             6.31 min   116
  Total                             6.4 min    144
Optimal Concatenation (OC)
  Best OC                           2.3 ms     995
  w/o trajectory patterns           8.6 ms     877
  w/o index                         12.2 s     714
[Figures: time per query (ms) and MAE vs. support (0–1000); size of index (MB) vs. support; suffix-tree example and pattern filling as in the Methodology slides]

Conclusion
• A very fundamental but challenging task
  – Data sparsity
  – Trade-off between the length of a path and the support of trajectories
  – Efficiency and scalability
• Our method
  – Context-Aware Tensor Decomposition
  – Optimal Concatenation
• Results
  – Effectiveness
    • 12,384 query paths and 114 in the field study
    • Relative error ratio: 19% and 11.6%
  – Efficiency
    • Partition a city into grids
    • Suffix-tree-based pattern mining and index
    • Infer the travel time of a path in 2.3 ms
Download data and codes: search for "Urban Computing"
Thanks!
Yu Zheng

Experiments (backup)
• Datasets (statistics as listed above)
[Figures: A) maximum number of trajectories (support) vs. length of a path (0–50); B) number of road segments vs. time of day, grouped by support 1–2, 3–4, 5–6, 7–8, >8]
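The Optimal Concatenation error model from the Methodology slides, $LSE = \sum_i \mathrm{Var}(t_{P_i,j})/n_{P_i}$, can be evaluated for one given concatenation as in this minimal Python sketch. The helper names are hypothetical. The unknown $\mathrm{Var}(t_{P_i,j})$ is replaced by the sample variance of the observed travel times, and sub-paths with fewer than two trajectories score as infinite because no variance can be estimated from one sample. That cut-off is an assumption, not something the slides specify.

```python
import math
from statistics import mean, variance


def g(samples):
    """Var(t_{P_i,j}) / n_{P_i} for one sub-path, from its observed travel times.

    Uses the sample variance; support below 2 gets an infinite error (assumption).
    """
    n = len(samples)
    if n < 2:
        return math.inf
    return variance(samples) / n


def lse(decomposition):
    """Estimated LSE of P_1 || ... || P_k, given one sample list per sub-path."""
    return sum(g(s) for s in decomposition)


def estimate(decomposition):
    """Estimated travel time t_{P_1} + ... + t_{P_k}, each t_{P_i} a sample mean."""
    return sum(mean(s) for s in decomposition)


# Made-up travel times (minutes) for a path split into two sub-paths.
split = [[10.0, 12.0], [5.0, 5.0, 5.0]]
print(estimate(split), lse(split))  # prints 16.0 1.0
```

The second sub-path contributes zero error because its three trajectories agree exactly; with real data, a long sub-path with low support usually has a larger $g$ than several short, well-supported pieces, which is the trade-off the slides describe.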
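The dynamic program on the Optimal Concatenation slides can be sketched as an $O(n^2)$ pass over the prefixes of the query path, storing back-pointers to recover the chosen split. This is a minimal sketch, not the authors' code. `samples_of` is a hypothetical lookup returning the travel times of trajectories that traverse a sub-path; in the talk, the suffix tree and $\mathcal{A}_{rec}$ play that role. The data below are made up, and the support-below-2 rule is again an assumption.

```python
import math
from statistics import variance


def optimal_concatenation(path, samples_of):
    """Split `path` into contiguous sub-paths minimising sum_i Var/n.

    opt[e] is the best cost for the prefix path[:e]; opt[0] = 0 and
    opt[e] = min over 0 <= i < e of opt[i] + g(path[i:e]).
    samples_of(sub_path_tuple) -> list of travel times (hypothetical lookup).
    Returns (cost, list of sub-path tuples), or (inf, None) if no split exists.
    """
    n = len(path)
    opt = [0.0] + [math.inf] * n
    back = [0] * (n + 1)
    for end in range(1, n + 1):
        for i in range(end):
            s = samples_of(tuple(path[i:end]))
            if len(s) < 2:
                continue  # support too small to estimate a variance
            cost = opt[i] + variance(s) / len(s)
            if cost < opt[end]:
                opt[end], back[end] = cost, i
    if math.isinf(opt[n]):
        return math.inf, None
    parts, end = [], n
    while end > 0:  # follow back-pointers from the full path to the start
        parts.append(tuple(path[back[end]:end]))
        end = back[end]
    return opt[n], parts[::-1]


# Made-up travel times (minutes) of trajectories that traverse each sub-path.
data = {
    ("r1",): [4.0, 6.0],
    ("r2",): [3.0, 5.0],
    ("r3",): [2.0, 2.0, 2.0, 2.0],
    ("r1", "r2"): [9.0, 10.0, 11.0, 10.0],
    ("r2", "r3"): [6.0, 8.0],
}
cost, parts = optimal_concatenation(["r1", "r2", "r3"], lambda p: data.get(p, []))
print(parts, round(cost, 4))
```

Here the well-supported pattern r1 → r2 beats using r1 and r2 separately, and r3 is appended on its own; a sub-path missing from `data` is simply never chosen.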
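The frequent-pattern step can be illustrated with a minimal uncompressed suffix trie over road-segment ids. This is a simplification of a suffix tree and not the mining algorithm used in the talk. Each node records which trajectories contain its pattern contiguously, so a pattern's support is the size of that set. The names (`Node`, `build_suffix_trie`, `support`, `frequent_subpaths`) are hypothetical, and the trajectories are made up so that r3 → r4 has support 3, as in the figure.

```python
class Node:
    """Node of an uncompressed suffix trie over road-segment ids."""

    def __init__(self):
        self.children = {}
        self.trajectories = set()  # ids of trajectories containing this pattern


def build_suffix_trie(trajectories):
    """Insert every suffix of every trajectory (dict: id -> list of segments)."""
    root = Node()
    for tid, segs in trajectories.items():
        for start in range(len(segs)):
            node = root
            for seg in segs[start:]:
                child = node.children.get(seg)
                if child is None:
                    child = node.children[seg] = Node()
                child.trajectories.add(tid)
                node = child
    return root


def support(root, pattern):
    """Number of distinct trajectories that traverse `pattern` contiguously."""
    node = root
    for seg in pattern:
        node = node.children.get(seg)
        if node is None:
            return 0
    return len(node.trajectories)


def frequent_subpaths(root, path, min_support):
    """All contiguous sub-paths of the query path with support >= min_support.

    Extending a pattern can only lower its support, so each scan stops early.
    """
    found = []
    for i in range(len(path)):
        node = root
        for j in range(i, len(path)):
            node = node.children.get(path[j])
            if node is None or len(node.trajectories) < min_support:
                break
            found.append((tuple(path[i:j + 1]), len(node.trajectories)))
    return found


trajs = {
    "Tr1": ["r1", "r2", "r3"],
    "Tr2": ["r2", "r3", "r4"],
    "Tr3": ["r3", "r4"],
    "Tr4": ["r6", "r3", "r4"],
}
root = build_suffix_trie(trajs)
print(support(root, ["r3", "r4"]))  # prints 3
print(frequent_subpaths(root, ["r1", "r2", "r3", "r4"], 2))
```

The frequent sub-paths returned here are exactly the concatenation candidates the dynamic program needs, which is why the slides report a speed-up with trajectory patterns. An uncompressed trie uses space quadratic in trajectory length; a compressed suffix tree avoids that.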
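The evaluation metrics on the Experiments slides translate directly into code. A minimal sketch with made-up values follows; note that MRE as defined there is a ratio of sums (total absolute error over total true travel time), not an average of per-query ratios.

```python
import math


def mae(y, y_hat):
    """Mean absolute error, in the unit of y (minutes here)."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def mre(y, y_hat):
    """Total absolute error divided by total true travel time."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / sum(y)


def rmse(y, y_hat):
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


truth = [10.0, 20.0, 30.0]  # made-up true travel times (min)
pred = [12.0, 18.0, 33.0]   # made-up estimates (min)
print(mae(truth, pred), mre(truth, pred), rmse(truth, pred))
```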