Data Analytics to Support the Acquisition and Visualisation of Real-Time Marine Data
SmartBay Ireland User Workshop
November 10th 2015
Siobhan Moran
Dr. Enda Howley
Dr. Jim Duggan
Outline of Talk
MSc Overview
Problem Definition
• Study objective
• Challenges
• Deliverables
Model
• Construction
• Solution
  o Data collection
  o Exploratory data analysis
  o Time Series Analysis
Results
• Preliminary
  o Data Analysis
  o Visualisation
Future Work
Siobhan Moran, Discipline of Information Technology
MSc Overview
• IRC Enterprise Partnership (Irish Research Council)
• 2 years research Masters, 2014 – 2016
• SmartBay Ireland Ltd. and NUIG
• Wave Height Prediction Project
Problem Definition – Study Objective 1/3
• Global warming and climate change have a huge impact on coastal communities.
• More recent research [1] indicates that the risk of extreme storms along Ireland's west coast has increased by 25% as a direct result of human-induced global warming.
Figure: Mean monthly sea level observed at Newlyn, UK (1916-2011) (blue dots); this is thought to be indicative of the situation in the south of Ireland. [2]
[1] Allen, Myles “Loading the weather dice – the role of climate change in storms and floods” EPA, climate change lecture series, The Mansion House,
Dawson Street, Dublin 2, 11th March 2015. Keynote lecture.
[2] Excerpt from climateireland.ie
Problem Definition – Study Objective 2/3
Problem Definition – Study Objective 3/3
Project Aim
Develop an alert system based on predicting wave height for a given time period.
• Extreme scenarios:
  - prior to extreme weather events, it will provide alerts to members of the public and relevant coastal management bodies, so that appropriate precautionary measures can be put in place.
• Operational day-to-day activities for marine traffic:
  - on a daily practical level, predictions of wave height will be very useful to marine traffic in planning journeys, routes and berthing windows in port.
Problem Definition – Challenges 1/1
Research Challenges
– explore data analytics approaches to reliably interpret weather
and marine data, sourced from a range of data providers.
– identify the most suitable approach (machine learning / time series)
– identify the data sources for this particular problem.
Problem Definition – Deliverables 1/2
1. Analyse data using R and develop a model for predicting
wave height
2. Develop an intuitive visualisation and early warning alert
system.
– represent this data in the form of a web interface that
will offer a range of key functionality.
Problem Definition – Deliverables 2/2
The IRC / SmartBay / NUIG research schedule involves 4 specific phases:
Year 1 (2014/2015)
1. Initial specification with key stakeholders and experts.
2. Analysis of data sources.
Year 2 (2015/2016)
3. Design of a suitable visualisation system.
4. Deployment and user testing.
• These phases are reviewed at quarterly meetings with the industry partner.
Model – Construction 1/1
Methodology
• initial data analysis
• AI techniques
• model development and validation
• visualisation
Tools and Techniques
• R
  o R is a language and environment for statistical computing and graphics.
  o data analysis
  o many useful packages – caret, ggplot2, forecast
  o time series analysis
  o modelling
• other tools
  o Java EE, Postman, RESTful APIs
  o Node.js, MEAN stack
Model – Solution – Data Collection 1/3
SmartBay Ireland Ltd. deployed two Mobilis DB800 buoys off the west and south coasts of Ireland:
• the buoys record wave and wind data on a continuous basis.
• this data, in conjunction with tidal and weather data from other sources, will be interpreted to identify the key indicators for influencing and predicting wave height.
Model – Solution – Data Collection 2/3
Two sensors targeted:
• TRIAXYS™ Directional Wave Sensor
  - wave data every 26 min approx.
• Airmar Weather Station
  - wind, temperature and pressure data every 3 min approx.
The Galway buoy was deployed in April 2014, in the SmartBay National Test and Demonstration Site.
The Cork buoy was deployed off Roches Point Lighthouse in June 2013.
Model – Solution – Data Collection 3/3
[Figure: sample Airmar weather data and TRIAXYS wave data]
Model – Solution – Exploratory Data Analysis 1/3
Data sets had to be merged to plot meaningful graphs of various weather factors against wave data.
Initial merging
- because each sensor reports on a different time series, an algorithm was developed in R which converted the timestamp variable in each set to an absolute time object
- this allowed each timestamp to be compared with a reference time
- corresponding indices were then included to create a new vector, which ultimately allowed a merging of the data sets
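The R algorithm itself is not shown in these slides. As a rough illustration of the idea only, the following Python sketch converts timestamps to absolute seconds and, for each wave reading, looks up the index of the nearest weather reading. The function names, the timestamp format and the sample values are all assumptions made for this example, not the project's code or data.

```python
from bisect import bisect_left
from datetime import datetime, timezone

def to_epoch(stamp, fmt="%Y-%m-%d %H:%M:%S"):
    """Convert a timestamp string to absolute seconds since the Unix epoch (UTC)."""
    return datetime.strptime(stamp, fmt).replace(tzinfo=timezone.utc).timestamp()

def nearest_indices(reference_times, other_times):
    """For each reference time, return the index of the closest entry in other_times.

    other_times must be non-empty and sorted in ascending order.
    """
    indices = []
    for t in reference_times:
        pos = bisect_left(other_times, t)
        # Candidates are the neighbours on either side of the insertion point.
        candidates = [i for i in (pos - 1, pos) if 0 <= i < len(other_times)]
        indices.append(min(candidates, key=lambda i: abs(other_times[i] - t)))
    return indices

# Hypothetical sample data: wave height roughly every 26 min, pressure every 3 min.
wave = [("2015-01-01 00:00:00", 1.8), ("2015-01-01 00:26:00", 2.1)]
weather = [("2015-01-01 00:01:00", 1002.0), ("2015-01-01 00:04:00", 1001.5),
           ("2015-01-01 00:25:00", 1000.9), ("2015-01-01 00:28:00", 1000.7)]

wave_t = [to_epoch(s) for s, _ in wave]
weather_t = [to_epoch(s) for s, _ in weather]
idx = nearest_indices(wave_t, weather_t)

# Each merged row pairs a wave height with the nearest-in-time pressure reading.
merged = [(wave[i][0], wave[i][1], weather[j][1]) for i, j in enumerate(idx)]
```

Matching on the nearest timestamp keeps every wave reading but discards most of the weather readings, since the weather station reports far more often than the wave sensor.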
Model – Solution – Exploratory Data Analysis 2/3
Current merging
- subsets of each data set are created, containing only the relevant variables
- a new variable ‘byhour’ is included
- each subset is then aggregated by this hour variable to get the hourly mean
- the datasets are then merged
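The steps above can be sketched as follows. The project does this in R; the Python below is only an illustrative stand-in, and the helper names, the timestamp format and the sample readings are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime

def hourly_mean(records, fmt="%Y-%m-%d %H:%M:%S"):
    """Aggregate (timestamp, value) records to the mean value per clock hour."""
    buckets = defaultdict(list)
    for stamp, value in records:
        # 'byhour' key: the timestamp truncated to the start of its hour.
        byhour = datetime.strptime(stamp, fmt).replace(minute=0, second=0)
        buckets[byhour].append(value)
    return {hour: sum(vals) / len(vals) for hour, vals in buckets.items()}

def merge_on_hour(left, right):
    """Inner join of two hourly dictionaries on the hours present in both."""
    return {h: (left[h], right[h]) for h in sorted(left.keys() & right.keys())}

# Hypothetical sample readings (not SmartBay data).
wave = [("2015-01-01 00:05:00", 2.0), ("2015-01-01 00:31:00", 3.0),
        ("2015-01-01 01:02:00", 2.5)]
wind = [("2015-01-01 00:03:00", 10.0), ("2015-01-01 00:06:00", 14.0),
        ("2015-01-01 02:00:00", 9.0)]

# Only the 00:00 hour appears in both sets, so only that hour survives the merge.
merged = merge_on_hour(hourly_mean(wave), hourly_mean(wind))
```

Aggregating both sensors to a common hourly grid avoids the index matching of the earlier approach, at the cost of smoothing out changes that happen within the hour.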
Model – Solution – Exploratory Data Analysis 3/3
Some initial plots:
• results showed good correlation between factors
Figure: Graphs showing barometric pressure (top) and significant wave height (bottom) for Cork sensor data: Oct – Dec 2014
Model – Solution – Time Series Analysis 1/3
TIME SERIES
A time series arises when a variable is measured sequentially in time, over or at a fixed interval (the sampling interval), i.e. it is a sequence of observations ordered in time.
• there are various methods for forecasting time series data which predict future observations based on past information
• predictions are usually uncertain
• forecasts are more accurate for shorter time periods
Model – Solution – Time Series Analysis 2/3
TIME SERIES COMPONENTS
• Trend
  o A trend is an increase or decrease in data over a long period of time.
  o It can be upward, downward, linear or non-linear.
• Seasonal
  o Seasonal components are short-term, regular wave-like patterns, observed within a year, usually monthly or quarterly.
• Irregular
  o Irregular components are the residuals after trend and seasonality have been accounted for.
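To make the three components concrete, here is a minimal sketch of a classical additive decomposition, assuming the series can be written as trend + seasonal + irregular and that the seasonal period is known. The slides do not say which decomposition method the project uses; the function names and the synthetic series are assumptions made for this illustration.

```python
def centred_moving_average(x, period):
    """Estimate the trend with a centred moving average spanning one full period."""
    half = period // 2
    trend = [None] * len(x)          # undefined at the edges of the series
    for t in range(half, len(x) - half):
        window = x[t - half:t + half + 1]
        if period % 2 == 1:
            trend[t] = sum(window) / period
        else:
            # Even period: the two end points each get half weight.
            trend[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period
    return trend

def decompose_additive(x, period):
    """Split x into trend, seasonal and irregular parts (additive model)."""
    trend = centred_moving_average(x, period)
    # Average the detrended values at each position within the period.
    sums, counts = [0.0] * period, [0] * period
    for t, (value, tr) in enumerate(zip(x, trend)):
        if tr is not None:
            sums[t % period] += value - tr
            counts[t % period] += 1
    means = [s / c for s, c in zip(sums, counts)]
    offset = sum(means) / period     # centre the seasonal effects on zero
    seasonal = [means[t % period] - offset for t in range(len(x))]
    irregular = [None if tr is None else value - tr - s
                 for value, tr, s in zip(x, trend, seasonal)]
    return trend, seasonal, irregular

# Synthetic series: a linear trend plus a repeating pattern of length 4.
pattern = [1.0, -1.0, 2.0, -2.0]
series = [0.5 * t + pattern[t % 4] for t in range(24)]
trend, seasonal, irregular = decompose_additive(series, period=4)
```

Because the synthetic series contains no noise, the irregular component comes out as essentially zero; with real sensor data it would hold whatever the trend and seasonal estimates fail to explain.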
Model – Solution – Time Series Analysis 3/3
ARIMA
• a popular forecasting approach: AutoRegressive Integrated Moving Average
• usually superior to smoothing techniques when the data set is large and the correlation between past observations is stable
• the method predicts a value in a time series as a linear combination of its past values and past errors
• the first step with ARIMA is to check that your series is stationary
• stationary means that the data has no trend or seasonality
  o a stationary process has a mean and variance that do not change over time, and the process does not have trends.
  o a white noise series is stationary
    - it does not matter when you observe it; it should look much the same at any period of time
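When a series is not stationary, differencing is the usual remedy; it is what the "Integrated" part of ARIMA refers to. The sketch below is an illustration only: it differences a synthetic trending series and compares the mean of the first and second halves before and after. The half-means comparison is a crude, informal check chosen for this example, not a formal stationarity test, and all names and values are assumptions.

```python
def difference(series, lag=1):
    """Return the lag-differenced series: y[t] = x[t] - x[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

def half_means(series):
    """Mean of the first and second halves, a crude check for a drifting mean."""
    mid = len(series) // 2
    first, second = series[:mid], series[mid:]
    return sum(first) / len(first), sum(second) / len(second)

# Synthetic series: an upward linear trend plus a small alternating wiggle.
series = [2.0 * t + (1.0 if t % 2 == 0 else -1.0) for t in range(20)]

before = half_means(series)              # the two halves have very different means
after = half_means(difference(series))   # after differencing the means are close
```

If one round of differencing does not remove the drift, the series can be differenced again, and a seasonal pattern can be treated the same way by setting lag to the seasonal period.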
Overview: Results to Date: Aug – Oct 2015
• Data Analysis
  o Subsetted dataframes
  o Aggregated hourly
  o Merged Jan-Aug 2015
• Visualisation
  o RESTful services
  o Java & NetBeans
  o NodeJS
  o MEAN stack
Results – Preliminary - Data Analysis 1/3
Figure: Graphs showing barometric pressure (bottom), wind speed (middle) and significant wave height (top) for Cork sensor data: Sept 2014 – Aug 2015
Results – Preliminary - Plots Q1 2015 2/3
Plots showing Significant Wave Height, Barometric Pressure and Wind Speed for Jan – Mar 2015: Cork Data
Results – Preliminary - Feature Plot 3/3
Figure: Feature plot showing paired correlations between air temperature, wind speed, barometric pressure and significant wave height for Cork sensor data Jan – Aug 2015
Numbers indicate correlation between pairs. Values shown on the plot: -0.48, -0.11, -0.40, 0.60, -0.04, -0.07
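The slide does not state how the pairwise values were computed. Assuming they are Pearson correlation coefficients, which is the usual default, each value can be reproduced from two columns of the merged hourly data as in the sketch below. The sample numbers are invented for illustration and are NOT the Cork sensor data.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical hourly means, invented for this example.
wind_speed = [5.0, 8.0, 12.0, 15.0, 20.0]
wave_height = [0.8, 1.1, 1.9, 2.2, 3.1]
pressure = [1015.0, 1010.0, 1004.0, 1001.0, 995.0]

r_wind_wave = pearson(wind_speed, wave_height)      # strongly positive here
r_pressure_wave = pearson(pressure, wave_height)    # strongly negative here
```

A coefficient near +1 or -1 indicates a strong linear relationship, while values near 0, such as the -0.04 and -0.07 on the plot, indicate little linear association between that pair.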
Preliminary- Visualisation– NodeJS
The main idea of Node.js:
use non-blocking, event-driven I/O to remain lightweight and efficient in the face of data-intensive real-time applications that run across distributed devices.
• web development in a dynamic language (JavaScript) on a VM that is incredibly fast (V8).
• ability to handle thousands of concurrent connections with minimal overhead on a single process.
• JavaScript is well suited to event loops, with first-class function objects and closures. People already know how to use it this way, having used it in the browser to respond to user-initiated events.
• a lot of people already know JavaScript - it is arguably the most popular programming language.
• using JavaScript on a web server as well as in the browser reduces the impedance mismatch between the two programming environments, which can communicate data structures via JSON that work the same on both sides of the equation.
Preliminary- Visualisation– MEAN Stack
“MEAN is a full stack JavaScript platform for modern web applications”
• builds full, powerful applications using just JavaScript on both the back-end and front-end.
• MEAN brings together four of the most used and appreciated technologies for JavaScript development, laying down the foundation for easily building complex web applications.
Preliminary- Visualisation– MEAN Stack
• MongoDB is the database where the data is stored
• NodeJS is used in the backend of the app – basically it will be the server on which the app runs
• Express is a framework for Node that will make writing the code for the server a lot easier
• Angular is the front-end framework of the app – the interface that users see
Future Work- Data Analysis/Visualisation-1/2
Proposed timeline to Jan 2016
• V 1.0, Nov ‘15: UNIVARIATE with one input
• Dec ‘15: UNIVARIATE with multiple inputs
• V 2.0, Jan ‘16
[Diagram: two Server / Web Client pairs]
Future Work- Data Analysis/Visualisation-2/2
• continue with univariate time series analysis in choosing and fitting models
- Box-Jenkins ARIMA models
• progress to analysis of univariate models with multiple inputs
• evaluate forecasting models
- once the model is selected and parameters estimated, it is used to
make forecasts
- assess the accuracy of the forecasts using a number of developed
methods
• build an initial visualisation system for assessment by SmartBay
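The slides do not name the accuracy measures to be used. Two common choices for assessing forecasts are the mean absolute error (MAE) and the root mean squared error (RMSE); the sketch below shows both, with invented wave heights as sample data. The choice of measures and all values here are assumptions for illustration.

```python
def mae(actual, forecast):
    """Mean absolute error: the average size of the forecast errors."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    """Root mean squared error: penalises large errors more heavily than MAE."""
    return (sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)) ** 0.5

# Hypothetical observed and forecast significant wave heights in metres.
actual = [2.0, 2.5, 3.0, 2.0]
forecast = [2.5, 2.5, 2.0, 2.5]

error_mae = mae(actual, forecast)
error_rmse = rmse(actual, forecast)
```

Both measures are in the same units as the data. For an alert system that targets extreme events, RMSE may be the more informative of the two because it weights large misses more heavily.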
Thank You For Listening
Any Questions?