Data Analytics to Support the Acquisition and Visualisation of Real-Time Marine Data
SmartBay Ireland User Workshop, November 10th 2015
Siobhan Moran, Dr. Enda Howley, Dr. Jim Duggan

Outline of Talk
MSc Overview
Problem Definition
• Study objective
• Challenges
• Deliverables
Model
• Construction
• Solution
  o Data collection
  o Exploratory data analysis
  o Time Series Analysis
Results
• Preliminary
  o Data Analysis
  o Visualisation
Future Work

Siobhan Moran, Discipline of Information Technology

MSc Overview
• IRC (Irish Research Council) Enterprise Partnership
• 2-year research Masters, 2014 – 2016
• SmartBay Ireland Ltd. and NUIG
• Wave Height Prediction Project

Problem Definition – Study Objective 1/3
• Global warming and climate change have a huge impact on coastal communities.
• More recent research [1] indicates that the risk of extreme storms along Ireland's west coast has increased by 25% as a direct result of human-induced global warming.
Figure: mean monthly sea level observed at Newlyn, UK (1916-2011) (blue dots); this is thought to be indicative of the situation in the south of Ireland. [2]
[1] Allen, Myles. "Loading the weather dice – the role of climate change in storms and floods". EPA climate change lecture series, The Mansion House, Dawson Street, Dublin 2, 11th March 2015. Keynote lecture.
[2] Excerpt from climateireland.ie

Problem Definition – Study Objective 2/3

Problem Definition – Study Objective 3/3
Project Aim: develop an alert system based on predicting wave height for a given time period.
• extreme scenarios: prior to extreme weather events, it will provide alerts to members of the public and relevant coastal management bodies, so that appropriate precautionary measures can be put in place.
• operational day-to-day activities for marine traffic: on a daily practical level, predictions of wave height will be very useful to marine traffic in planning journeys, routes and berthing windows in port.

Problem Definition – Challenges 1/1
Research Challenges
– explore data analytics approaches to reliably interpret weather and marine data, sourced from a range of data providers.
– identify the most suitable approach (machine learning / time series).
– identify the data sources for this particular problem.

Problem Definition – Deliverables 1/2
1. Analyse data using R and develop a model for predicting wave height.
2. Develop an intuitive visualisation and early warning alert system.
   – represent this data in the form of a web interface that will offer a range of key functionality.

Problem Definition – Deliverables 2/2
The IRC / SmartBay / NUIG research schedule involves 4 specific phases:
Year 1 (2014/2015)
1. Initial specification with key stakeholders and experts.
2. Analysis of data sources.
Year 2 (2015/2016)
3. Design of a suitable visualisation system.
4. Deployment and user testing.
• these phases are reviewed at quarterly meetings with the industry partner

Model – Construction 1/1
Methodology
• initial data analysis
• AI techniques
• model development and validation
• visualisation
Tools and Techniques
• R
  o R is a language and environment for statistical computing and graphics.
  o data analysis
  o many useful packages – caret, ggplot2, forecast
  o time series analysis
  o modelling
• other tools
  o Java EE, Postman, RESTful APIs
  o NodeJS, MEAN stack

Model – Solution – Data Collection 1/3
SmartBay Ireland Ltd. deployed two Mobilis DB800 buoys off the west and south coasts of Ireland:
• the buoys record wave and wind data on a continuous basis.
• this data, in conjunction with tidal and weather data from other sources, will be interpreted to identify the key indicators for influencing and predicting wave height.

Model – Solution – Data Collection 2/3
Two sensors targeted:
• TRIAXYS™ Directional Wave Sensor - wave data every 26 min approx.
• Airmar Weather Station - wind, temperature and pressure data every 3 min approx.
The Galway buoy was deployed in April 2014, in the SmartBay National Test and Demonstration Site.
The Cork buoy was deployed off Roche's Point Lighthouse in June 2013.

Model – Solution – Data Collection 3/3
Figures: Airmar weather data; TRIAXYS wave data.

Model – Solution – Exploratory Data Analysis 1/3
Data sets had to be merged to plot meaningful graphs of various weather factors against wave data.
Initial merging
- because the two sensors sample at different times, an algorithm was developed in R which converted the timestamp variable in each set to an absolute time object.
- this allowed each timestamp to be compared with a reference time.
- the indices of corresponding records were then collected in a new vector, which ultimately allowed the data sets to be merged.
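
The initial merging step can be illustrated with a short sketch. The talk's algorithm was written in R and its details are not given, so everything here is an assumption: the sketch is in Python, the timestamp format, the `max_gap_s` tolerance, the function names and the sample values are all hypothetical, and nearest-timestamp matching is only one plausible reading of "corresponding indices".

```python
from bisect import bisect_left
from datetime import datetime, timezone

FMT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format, not taken from the talk


def to_epoch(ts):
    """Convert a timestamp string to absolute seconds since the epoch (UTC assumed)."""
    return datetime.strptime(ts, FMT).replace(tzinfo=timezone.utc).timestamp()


def merge_nearest(wave, weather, max_gap_s=180):
    """Pair each wave record with the weather record closest in time.

    wave, weather: lists of (timestamp_str, value) tuples.
    Pairs further apart than max_gap_s seconds (hypothetical tolerance) are dropped.
    """
    weather = sorted(weather, key=lambda r: to_epoch(r[0]))
    times = [to_epoch(ts) for ts, _ in weather]
    merged = []
    for ts, wave_value in wave:
        t = to_epoch(ts)
        i = bisect_left(times, t)
        # The nearest record is either just before or just after position i.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(times)]
        if not candidates:
            continue
        j = min(candidates, key=lambda k: abs(times[k] - t))
        if abs(times[j] - t) <= max_gap_s:
            merged.append((ts, wave_value, weather[j][1]))
    return merged


# Made-up sample values: wave height (m) and barometric pressure (hPa).
wave = [("2015-01-01 00:00:00", 1.2), ("2015-01-01 00:26:00", 1.5)]
weather = [
    ("2015-01-01 00:01:00", 1012.0),
    ("2015-01-01 00:04:00", 1011.8),
    ("2015-01-01 00:25:00", 1009.5),
    ("2015-01-01 00:40:00", 1008.0),
]
merged = merge_nearest(wave, weather)
```

Each wave reading is kept only if a weather reading exists within the tolerance, which mirrors the roughly 26-minute and 3-minute sampling intervals of the two sensors.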
Model – Solution – Exploratory Data Analysis 2/3
Current merging
- subsets of each data set are created, containing only the relevant variables
- a new variable 'byhour' is included
- each subset is then aggregated by this hour variable to get the hourly mean
- the datasets are then merged

Model – Solution – Exploratory Data Analysis 3/3
Some initial plots: results showed good correlation between factors.
Figure: barometric pressure (top) and significant wave height (bottom) for Cork sensor data, Oct – Dec 2014.

Model – Solution – Time Series Analysis 1/3
TIME SERIES: a variable measured sequentially in time, over or at a fixed interval (the sampling interval), i.e. a sequence of observations ordered in time.
• there are various methods for forecasting time series data, which predict future observations based on past information
• predictions are usually uncertain
• forecasts are more accurate over shorter time periods

Model – Solution – Time Series Analysis 2/3
TIME SERIES COMPONENTS
• A trend is an increase or decrease in the data over a long period of time. It can be upward or downward, linear or non-linear.
• Seasonal components are short-term, regular, wave-like patterns observed within a year, usually monthly or quarterly.
• Irregular components are the residuals after trend and seasonality have been accounted for.

Model – Solution – Time Series Analysis 3/3
ARIMA
• a popular forecasting approach: AutoRegressive Integrated Moving Average
• usually superior to smoothing techniques when the data set is large and the correlation between past observations is stable
• the method predicts a value in a time series as a linear combination of its past values and errors
• the first step with ARIMA is to check that your series is stationary
• stationary means that the data has no trend or seasonality
  o a stationary process has a mean and variance that do not change over time.
  o a white noise series is stationary - it does not matter when you observe it; it should look much the same at any period of time.

Overview: Results to Date: Aug – Oct 2015
• Data Analysis
  o Subsetted dataframes
  o Aggregated hourly
  o Merged Jan-Aug 2015
• Visualisation
  o RESTful services
  o Java & NetBeans
  o NodeJS
  o MEAN stack

Results – Preliminary – Data Analysis 1/3
Figure: barometric pressure (bottom), wind speed (middle) and significant wave height (top) for Cork sensor data, Sept 2014 – Aug 2015.

Results – Preliminary – Plots Q1 2015 2/3
Figure: significant wave height, barometric pressure and wind speed for Jan – Mar 2015, Cork data.
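
The hourly aggregation and merge behind these plots (subset the relevant variable, add a 'byhour' variable, take the hourly mean, then merge) can be sketched as follows. The original work was done in R; this is a minimal Python sketch under assumptions, with a hypothetical timestamp format, hypothetical function names and made-up sample values.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

FMT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format, not taken from the talk


def hourly_means(records):
    """Aggregate (timestamp_str, value) records to a {byhour: mean value} mapping."""
    buckets = defaultdict(list)
    for ts, value in records:
        # 'byhour' truncates the timestamp to the hour, e.g. "2015-01-01 00".
        byhour = datetime.strptime(ts, FMT).strftime("%Y-%m-%d %H")
        buckets[byhour].append(value)
    return {hour: mean(values) for hour, values in buckets.items()}


def merge_by_hour(wave, weather):
    """Merge two hourly-aggregated series, keeping only hours present in both."""
    wave_h = hourly_means(wave)
    weather_h = hourly_means(weather)
    common = sorted(wave_h.keys() & weather_h.keys())
    return [(hour, wave_h[hour], weather_h[hour]) for hour in common]


# Made-up sample values: significant wave height (m) and barometric pressure (hPa).
wave = [
    ("2015-01-01 00:05:00", 1.0),
    ("2015-01-01 00:31:00", 2.0),
    ("2015-01-01 01:02:00", 3.0),
]
pressure = [
    ("2015-01-01 00:03:00", 1010.0),
    ("2015-01-01 00:06:00", 1012.0),
    ("2015-01-01 01:00:00", 1008.0),
    ("2015-01-01 02:00:00", 1007.0),
]
hourly = merge_by_hour(wave, pressure)
```

Aggregating both series to the same hourly grid removes the need to match individual timestamps, at the cost of smoothing out changes within the hour.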
Results – Preliminary – Feature Plot 3/3
Figure: feature plot showing paired correlations between air temperature, wind speed, barometric pressure and significant wave height for Cork sensor data, Jan – Aug 2015. Numbers indicate the correlation between pairs (values shown in the plot: -0.48, -0.11, -0.40, 0.60, -0.04, -0.07).

Preliminary – Visualisation – NodeJS
The main idea of Node.js: use non-blocking, event-driven I/O to remain lightweight and efficient in the face of data-intensive real-time applications that run across distributed devices.
• web development in a dynamic language (JavaScript) on a VM that is incredibly fast (V8).
• ability to handle thousands of concurrent connections with minimal overhead on a single process.
• JavaScript is well suited to event loops, with first-class function objects and closures. People already know how to use it this way, having used it in the browser to respond to user-initiated events.
• a lot of people already know JavaScript - it is arguably the most popular programming language.
• using JavaScript on the web server as well as in the browser reduces the impedance mismatch between the two programming environments, which can exchange data structures via JSON that work the same on both sides.
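
The non-blocking, event-driven idea can be illustrated with a small sketch. This is not Node.js code: it uses Python's `asyncio` purely to show the same single-process event-loop pattern, and the `handle_request` and `serve` names, the client count and the simulated wait are all hypothetical.

```python
import asyncio


async def handle_request(client_id, log):
    """Handle one simulated client request without blocking the event loop."""
    log.append(("start", client_id))
    # Stands in for a non-blocking I/O wait (e.g. a database read);
    # while this request waits, the loop is free to start other requests.
    await asyncio.sleep(0.01)
    log.append(("done", client_id))
    return client_id


async def serve(n_clients):
    """Run all requests concurrently on a single thread and collect an event log."""
    log = []
    results = await asyncio.gather(
        *(handle_request(i, log) for i in range(n_clients))
    )
    return results, log


results, log = asyncio.run(serve(100))
```

In the resulting log every request is started before any of them finishes, which is the behaviour that lets a single process keep many connections open with little overhead.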
Preliminary – Visualisation – MEAN Stack
"MEAN is a full stack JavaScript platform for modern web applications"
• builds full, powerful applications using just JavaScript on both the back-end and front-end.
MEAN brings together four of the most used and appreciated technologies for JavaScript development, laying down the foundation for easily building complex web applications.

Preliminary – Visualisation – MEAN Stack
• MongoDB is the database where the data is stored.
• Express is a framework for Node that makes writing the server code a lot easier.
• Angular is the front-end framework of the app: the interface that users see.
• NodeJS is used in the back-end of the app: it is the server on which the app runs.

Future Work – Data Analysis/Visualisation 1/2
Figure: proposed timeline to Jan 2016.
• V 1.0, Nov '15: univariate with one input
• Dec '15: univariate with multiple inputs
• V 2.0, Jan '16
Each version is shown with a server and a web client.

Future Work – Data Analysis/Visualisation 2/2
• continue with univariate time series analysis in choosing and fitting models - Box-Jenkins ARIMA models
• progress to analysis of univariate models with multiple inputs
• evaluate forecasting models
  - once the model is selected and its parameters estimated, it is used to make forecasts
  - assess the accuracy of the forecasts using a number of established methods
• build an initial visualisation system for assessment by SmartBay

Thank You For Listening
Any Questions?