201309_SIGCAM_DataAnalysis_monthly report_SD Lin

NTU/Intel M2M Project: Wireless Sensor Networks
Content Analysis and Management Special Interest Group
Data Analysis Team
Monthly Report
1. Team Organization
Principal Investigator: Shou-De Lin
Co-Principal Investigator: Mi-Yen Yeh
Team Members: Chih-Hung Hsieh (postdoc), Yi-Chen Lo (PhD student), Perng-Hwa Kung
(graduate student), Ruei-Bin Wang (graduate student), Yu-Chen Lu (undergraduate student),
Kuan-Ting Chou (undergraduate student), Chin-en Wang (graduate student)
2. Discussion with Champions
a. Number of meetings with champion in current month: 1 (8/27, by phone)
b. Major comments/conclusion from the discussion: PRD for phase II
3. Progress between last month and this month
a. Topic1: Clustering streams using MSWave.
1) Potential Problem: So far, all of our discussion has assumed that the top
coefficient can represent the whole vector. This assumption may fail when the vector
is not a time series. In the last experiment, the bounds converged very poorly. We
therefore think this problem must be addressed first; otherwise the results may be
poor.
2) Exchange Dimension:
- Define d(dim_1, dim_2) = d( (a_11, a_21), (a_12, a_22) )
- Find the best sequence (i_1, i_2, i_3, i_4) that minimizes ∑_j d(dim_{i_j}, dim_{i_{j+1}})
- This is a travelling salesman problem (NP-hard)
- There are some approximate solutions.
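As one illustration of such an approximate solution, the sketch below orders dimensions with a greedy nearest-neighbour heuristic. It is not the team's method: the Euclidean distance between dimensions, the function names, and the toy data are all assumptions made for the example, and a_ik is read as the k-th dimension of vector i.

```python
import math

def column_distance(data, p, q):
    """Euclidean distance between dimensions (columns) p and q (assumed metric)."""
    return math.sqrt(sum((row[p] - row[q]) ** 2 for row in data))

def greedy_dimension_order(data, start=0):
    """Nearest-neighbour heuristic: repeatedly append the closest unused dimension."""
    n_dims = len(data[0])
    order = [start]
    remaining = set(range(n_dims)) - {start}
    while remaining:
        last = order[-1]
        # Ties are broken by the smaller dimension index.
        nxt = min(remaining, key=lambda q: (column_distance(data, last, q), q))
        order.append(nxt)
        remaining.remove(nxt)
    return order

def path_cost(data, order):
    """Sum of distances between consecutive dimensions in the given order."""
    return sum(column_distance(data, a, b) for a, b in zip(order, order[1:]))

# Hypothetical toy data: two vectors (rows), four dimensions (columns).
data = [
    [0.0, 10.0, 1.0, 9.0],
    [0.0, 10.0, 1.0, 9.0],
]
order = greedy_dimension_order(data)
```

The heuristic gives no optimality guarantee; it only shows how a reordering can lower the summed distance between adjacent dimensions compared with the original order.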
3) Two possible directions to work on:
3.1) Can the result obtained with MsWave be better than that of the original
method?
E.g., using fewer frames to summarize.
Since MsWave only estimates the real distance, we think its result would
not be better, and it would save only a little bandwidth.
3.2) Is there any application that needs the features without sending the
frames to the root?
We think this has to be discussed with O, since he has more experience in
video analysis.
b. Topic2: Exploiting Correlation among Sensors
1) Observations on oversampling:
1.1) Reason 1: Within each cluster, certain sensors send more.
- Change the “picked sensor” that sends more?
  - Choose the sensor with the max/min total similarity (centroid) within
    the cluster
    1. No significant difference
  - Randomly choose the one to send more
- Parameters to control the sample rate
  1. Cycle period (5 mod 3)
  2. Time rate (1 → 2)
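The centroid-style choice of the picked sensor can be sketched as follows. This is a minimal illustration under assumptions: the similarity matrix, the cluster assignment, and the function name are hypothetical, and only the "max total similarity" variant is shown.

```python
def pick_centroid_sensors(similarity, clusters):
    """For each cluster, return the sensor with the largest total similarity
    to the other members of the same cluster."""
    picked = {}
    for cluster_id, members in clusters.items():
        def total_similarity(sensor):
            return sum(similarity[sensor][other] for other in members if other != sensor)
        picked[cluster_id] = max(members, key=total_similarity)
    return picked

# Hypothetical pairwise similarities between four sensors.
similarity = [
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.7, 0.2],
    [0.8, 0.7, 1.0, 0.3],
    [0.1, 0.2, 0.3, 1.0],
]
clusters = {"A": [0, 1, 2], "B": [3]}
picked = pick_centroid_sensors(similarity, clusters)
```

Replacing `max` with `min` gives the opposite variant mentioned in the list above.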
1.2) Reason 2: Unbalanced clustering.
Values in parentheses are the actual sampling rates; the expected rate is 2.5%.
Cycle period = 10, Time rate = 4
Berkeley, cluster size (actual rate):
10 (2.59%), 20 (2.53%), 13 (2.79%), 11 (2.71%)
NO2, cluster size (actual rate):
5 (5.07%), 7 (3.63%), 10 (2.53%), 38 (2.57%)
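To make the deviation from the expected rate easier to compare across clusters, the figures above can be expressed as a ratio of actual to expected rate. The ratio is an illustrative measure chosen for this sketch, not one defined in the experiments; the numbers are copied from the lists above.

```python
EXPECTED_RATE = 2.5  # percent, the expected sampling rate reported above

# (cluster size, actual sampling rate in percent), copied from the report.
observed = {
    "Berkeley": [(10, 2.59), (20, 2.53), (13, 2.79), (11, 2.71)],
    "NO2": [(5, 5.07), (7, 3.63), (10, 2.53), (38, 2.57)],
}

def oversampling_ratios(clusters, expected=EXPECTED_RATE):
    """Return (cluster size, actual rate / expected rate) for each cluster."""
    return [(size, rate / expected) for size, rate in clusters]

ratios = {name: oversampling_ratios(c) for name, c in observed.items()}

# The cluster that deviates most from the expected rate.
worst = max(
    ((name, size, ratio) for name, pairs in ratios.items() for size, ratio in pairs),
    key=lambda item: item[2],
)
```

A ratio of 1.0 would mean the cluster samples at exactly the expected rate.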
1.3) Summary:
- We have some results, but they are still hard to explain.
- We are collecting more simple trials to report.
c. Topic 3: Distributed Nearest Neighbor Search of Time Series Using Dynamic Time
Warping
1) Progress:
1.1) Rewriting testing code of both frameworks
1.2) Investigation on previous experiments
1.3) Some fixing of segmentation and framework
1.4) Review of segmentation and bounding technique
2) Experiment Conclusion:
- Intuition: the smaller S / M, the better the pruning power
  1. When T is large, the experiments support this intuition.
  2. As S and T get larger, the variance of the DTW distances among all sensors
     grows, i.e., it becomes easier for Frameworks 1 and 2 to prune candidate
     sensors.
  3. When S and T are small, the experimental outcome is likely to violate the
     intuition, since it is hard for Frameworks 1 and 2 to prune candidate
     sensors.
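For reference, the quantity being compared across sensors is the DTW distance. The sketch below is the textbook dynamic-programming DTW with an absolute-difference cost, followed by a brute-force nearest-neighbour search. It does not model the segmentation, the bounds, or Frameworks 1 and 2; the function names and the toy series are assumptions for illustration.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with |x - y| as the local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    prev = [inf] * (m + 1)
    prev[0] = 0.0  # aligning two empty prefixes costs nothing
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, match.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]

def nearest_sensor(query, sensors):
    """Brute-force search: the sensor whose series has the smallest DTW distance."""
    return min(sensors, key=lambda name: dtw_distance(query, sensors[name]))

# Hypothetical series held by two sensors.
sensors = {"s1": [1, 2, 2, 3], "s2": [2, 3, 4]}
query = [1, 2, 3]
best = nearest_sensor(query, sensors)
```

When the DTW distances of the sensors are spread far apart, a cheap bound is more likely to rule out most candidates before this full computation is needed, which is consistent with the pruning behaviour described above.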
3) Future Work:
- Analyze the number of sites for initialization in Framework 2
- Analyze the parameters for segmentation
  1. Any background theory?
  2. Different segmentation accuracy for different frameworks?
- Analyze where to use Framework 1 and Framework 2
  1. K large or small?
  2. Shape of time series?
- Others
d. Topic 4: Intelligent Transportation System (ITS) Machine Learning: Predict whether
a driver will stop at an intersection, without using video data.
1) New region is adopted to generate features:
2) Feature Extraction:
- We extract the data estimated in the region [15 m, 45 m] ahead of the
  intersection.
- Over this range of 30 meters, we generate 30 feature values per signal, one for
  each meter of distance.
  1. Ex: GPS speed[1], GPS speed[2], …, GPS speed[30] for distances of 15
     meters, 16 meters, …, 44 meters.
  2. Because the GPS sampling rate is once per second, some feature values
     have to be generated by interpolation.
- 240 features used (8 groups of 30):
  1. GPS speed[1], GPS speed[2], …, GPS speed[30] for distances of 15 meters,
     16 meters, …, 44 meters.
  2. Acceleration_X[1], …, Acceleration_X[30].
  3. Acceleration_Y[1], …, Acceleration_Y[30].
  4. Acceleration_Z[1], …, Acceleration_Z[30].
  5. Orientation_W[1], …, Orientation_W[30].
  6. Orientation_X[1], …, Orientation_X[30].
  7. Orientation_Y[1], …, Orientation_Y[30].
  8. Orientation_Z[1], …, Orientation_Z[30].
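The per-meter resampling step can be sketched as follows. This is a minimal example assuming linear interpolation over (distance to intersection, value) samples; the interpolation method, the function names, and the sample readings are assumptions, since the source does not specify them.

```python
from bisect import bisect_left

def interpolate_at(distances, values, x):
    """Linearly interpolate at position x; `distances` must be sorted ascending.
    Positions outside the sampled range are clamped to the nearest sample."""
    if x <= distances[0]:
        return values[0]
    if x >= distances[-1]:
        return values[-1]
    hi = bisect_left(distances, x)
    lo = hi - 1
    weight = (x - distances[lo]) / (distances[hi] - distances[lo])
    return values[lo] + weight * (values[hi] - values[lo])

def features_per_meter(samples, start_m=15, count=30):
    """Resample (distance_m, value) pairs at start_m, start_m + 1, ..., one per meter."""
    ordered = sorted(samples)  # the vehicle approaches, so raw distances decrease
    distances = [d for d, _ in ordered]
    values = [v for _, v in ordered]
    return [interpolate_at(distances, values, start_m + k) for k in range(count)]

# Hypothetical 1 Hz GPS readings: (distance to intersection in m, speed).
samples = [(50.0, 12.0), (38.0, 10.0), (26.0, 6.0), (14.0, 2.0)]
gps_speed = features_per_meter(samples)  # gps_speed[0] is at 15 m, gps_speed[29] at 44 m
```

The same function could be applied to each of the eight signals to obtain the 8 × 30 = 240 feature values.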
3) Experimental Results
- Max validation accuracy: 75.641%
- Selected 27 features: [13, 9, 3, 14, 1, 10, 12, 2, 11, 4, 18, 7, 23, 6, 8, 20, 17, 28,
  30, 25, 19, 15, 24, 21, 22, 5, 29]
- All selected features are from the GPS speed group!
- Best (c, g) = [16.0, 0.001953125], cv-acc = 75.641%
- Testing accuracy = 74.359% (the best so far!)
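The observation that every selected feature belongs to the GPS speed group can be checked by mapping each feature index back to its group and distance. The sketch assumes the 240 features are numbered consecutively in the group order listed under Feature Extraction (1 to 30 for GPS speed, 31 to 60 for Acceleration_X, and so on); that numbering and the helper name are assumptions.

```python
GROUPS = [
    "GPS speed", "Acceleration_X", "Acceleration_Y", "Acceleration_Z",
    "Orientation_W", "Orientation_X", "Orientation_Y", "Orientation_Z",
]
VALUES_PER_GROUP = 30
START_M = 15  # first sampled distance ahead of the intersection, in meters

def describe_feature(index):
    """Map a 1-based feature index to (group name, distance in meters).
    Assumes consecutive numbering in the group order above."""
    if not 1 <= index <= len(GROUPS) * VALUES_PER_GROUP:
        raise ValueError(f"feature index out of range: {index}")
    group, offset = divmod(index - 1, VALUES_PER_GROUP)
    return GROUPS[group], START_M + offset

# Selected feature indices, copied from the results above.
selected = [13, 9, 3, 14, 1, 10, 12, 2, 11, 4, 18, 7, 23, 6, 8, 20, 17, 28,
            30, 25, 19, 15, 24, 21, 22, 5, 29]
groups_used = {describe_feature(i)[0] for i in selected}
```

Under this numbering, `groups_used` contains only the GPS speed group, and the first selected feature (13) corresponds to the speed 27 meters ahead of the intersection.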
4. Brief plan for the next month
a. We will continue the paper survey and refine our proposed approaches.
b. We will implement our proposed approaches and evaluate their performance.
5. Research Byproducts
a. Paper: N/A
b. Served on the Editorial Board of International Journals: N/A
c. Invited Lectures: N/A
d. Significant Honors / Awards: N/A