Improving Fuel Price Predictions With Open Data
Raphael Volz
WebST 2016

Agenda
• Approach: Linking Open Data
• Case Study: Fuel Price Predictions
• Conclusion

Oct 2015 Prof. Dr. Raphael Volz

Linking Data requires Data Integration
• ID in Wikidata
• Name does not work
• ID in other data sets
Source: https://tools.wmflabs.org/reasonator/?&q=1339

Example: Linking with Wikidata
• Which movies feature Johann Sebastian Bach music?
• Q1339 (Wikidata) = nm0001925 (IMDb)

Equivalence at core of linked data
• Example: Johann Sebastian Bach, Q1339 (Wikidata) = nm0001925 (IMDb)
• Symmetry: nm0001925 (IMDb) = Q1339 (Wikidata)
• Can derive equivalence if predicates are (inverse) functional
Source: Raphael Volz, Web Ontology Reasoning with Logic Databases, Dissertation, University of Karlsruhe, 2004, p. 24 and p. 110

Linking Geographical Data
• Matching via names
• Matching coordinates: easier but not trivial
  – Translation of coordinate formats
    WGS84: 48° 50′ 14″ N, 10° 5′ 37″ E = 48.837222, 10.093611
    UTM: 32U 580249 5409937
  – Polygon containment (Wikidata coordinate in OpenStreetMap polygon)
  – Precision to use for matching?
Source: Raphael Volz, Joachim Kleb, and Wolfgang Mueller. "Towards Ontology-based Disambiguation of Geographical Identifiers." I3 Workshop at WWW2007, 2007.
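The coordinate translation and the precision question on the geographic slide can be illustrated with a small sketch. It converts the slide's WGS84 example from degrees, minutes and seconds to decimal degrees and estimates how many metres one step at a given number of decimal places covers. The function names and the rounding-based match are illustrative assumptions, not the method used in the talk; the figure of 111.32 km per degree applies N/S or E/W at the equator only.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South and West are negative by convention.
    return -value if hemisphere in ("S", "W") else value


def metres_per_step(decimal_places, metres_per_degree=111_320.0):
    """Approximate extent of one step at the given number of decimal places.

    Assumes 111.32 km per degree (equator); E/W distances shrink with latitude.
    """
    return metres_per_degree / 10 ** decimal_places


def same_cell(a, b, decimal_places):
    """Naive match: two (lat, lon) pairs agree after rounding to the precision.

    Points close to a rounding boundary can fall into different cells, so this
    is only a first approximation of a distance-based match.
    """
    return all(round(x, decimal_places) == round(y, decimal_places)
               for x, y in zip(a, b))


# The slide's example: 48° 50′ 14″ N, 10° 5′ 37″ E
lat = dms_to_decimal(48, 50, 14, "N")
lon = dms_to_decimal(10, 5, 37, "E")
print(round(lat, 6), round(lon, 6))   # 48.837222 10.093611
print(metres_per_step(3))             # roughly 111 m at 3 decimal places
print(same_cell((lat, lon), (48.837222, 10.093611), 4))
```

Rounding to a fixed number of decimal places keeps the match cheap, but a distance threshold would be the safer choice when two sources place the same object on opposite sides of a cell boundary.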
Decimal places   Decimal degrees   Qualitative scale                          N/S or E/W at equator
0                1.0               country or large region                    111.32 km
1                0.1               large city or district                     11.132 km
2                0.01              town or village                            1.1132 km
3                0.001             neighborhood, street                       111.32 m
4                0.0001            individual street, land parcel             11.132 m
5                0.00001           individual trees                           1.1132 m
6                0.000001          individual humans                          11.132 cm
7                0.0000001         practical limit of commercial surveying    1.1132 cm

Linking Data as new step in analytics pipeline

Analytics pipeline and required competencies
1 Data identification
2 Data curation
3 Statistical analysis
4 Model creation
5 Model assessment
6 Model selection
7 Model use

Linking Data as a new core step – Data Science Rocket
1 Data identification (once per data set)
2 Data curation (once per data set)
3 Integration
4 Statistical analysis
5 Model creation
6 Model assessment
7 Model selection
8 Model use
Core competencies: subject matter expertise, computer science, mathematics and statistics
Source: Raphael Volz, Collaborative Business / Business Intelligence, Slides of Lecture 1, HS Pforzheim, summer term 15

Agenda
• Open Data
• Linked Data
• Case Study: Fuel Price Predictions
• Conclusion
Open fuel price data in Germany

Fuel prices in Germany
• Since Sep 2013, companies operating a public fuel station must report prices to the German antitrust agency (Bundeskartellamt) in real time
• Objective:
  – Increase price transparency
  – “Improve the Bundeskartellamt's possibilities to intervene in the case of illegal predatory strategies and other forms of market power abuse” (1)

Data Set Characteristics (2)
• 3 fuel types (E5, E10, Diesel)
• 14,957 fuel stations
• 30,231,752 price changes in one year (Jul 14 - Jun 15)
  – 82,827 price changes per day
  – 5.6 price changes per station and day
• Open Data published at MDM portal
• Data basis for fuel-price apps
[Figure: two map panels, 30.6.15 9am and 30.6.15 5pm]
Source: (1) http://www.bundeskartellamt.de/EN/Economicsectors/MineralOil/MTU-Fuels/mtufuels_node.html (2) Figure and statistics: own analysis of MTS-K data

A similar price pattern repeats every day
[Figure: price series over day of year 2015]
Data: Diesel sales price at OMV station Bad Herrenalb via MTS-K, Rotterdam market price of Brent North Sea crude oil in Euro, Interbank USD/EUR day closing price
Own analysis of MTS-K data for OMV Bad Herrenalb (Jul 14 - Jun 15)

Despite the regularity of the price pattern, we need (open) market data to make robust predictions for a station

Linear regression model of the gross sales price of 1 l Diesel
$\hat{y} = \mathrm{hour}_h + \mathrm{day}_w + \mathrm{oil} \cdot o + c$
Inputs: hour (h), day (w), and oil (o), taken from the raw oil price (Brent) and the EUR/USD exchange rate

Coefficient   Estimate
c              0.96
hour 1 am     -0.00
…
hour 6 am     -0.01
…
hour noon     -0.09
…
hour 6 pm     -0.12
…
hour 9 pm     -0.00
…
day Mo         0.00
…
day We         0.00
day Th         0.01
day Fr         0.00
day Sa         0.00
oil            1.01

Note: Factorial coefficients can be read as € savings

> R2_train 0.8185748
> R2_test 0.8178038
> RMSE 0.03838322

Data: Diesel sales price at OMV station Bad Herrenalb via MTS-K; Crude Oil (petroleum), Dated Brent, light blend 38 API, fob U.K., in Euro at Interbank EUR/USD closing price, Jul 2014 – Jun 2015
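The note that factorial coefficients can be read as € savings can be made concrete with a short sketch. In an additive model, the difference between two hour-of-day coefficients is the predicted price difference per litre when the day and the oil term are held fixed. The hour estimates below are copied from the slide's coefficient table; the function name and the 50-litre fill are hypothetical and only serve the illustration.

```python
# Hour-of-day effects in EUR per litre, as listed on the slide.
# Hours that the slide elides ("…") are deliberately left out.
HOUR_EFFECT = {
    "1 am": -0.00,
    "6 am": -0.01,
    "noon": -0.09,
    "6 pm": -0.12,
    "9 pm": -0.00,
}


def saving_per_fill(hour_a, hour_b, litres):
    """EUR saved by filling up at hour_b instead of hour_a.

    Day and oil terms cancel because they are the same in both predictions.
    """
    return (HOUR_EFFECT[hour_a] - HOUR_EFFECT[hour_b]) * litres


# Hypothetical 50-litre fill (not a figure from the slides).
saving = saving_per_fill("6 am", "6 pm", 50)
print(f"Filling up at 6 pm instead of 6 am saves about {saving:.2f} EUR")
```

Under these estimates the cheapest listed hour is 6 pm, which matches the daily pattern the case study describes.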
Leveraging open data we can better understand competitive dynamics and improve predictions
• OpenStreetMap: nearby competitors, nearby cities
• WikiData: population of cities
• Mobilitätsdatenmarktplatz: operator, brand

Brand   Value
Real    -2.1%
Jet     -1.9%
Shell    1.3%
Aral     1.4%

Agenda
• Open Data
• Linked Data
• Case Study: Fuel Price Predictions
• Conclusion

Linked Open Data can improve prediction models and provides interesting data sets for teaching and research

Conclusion
• At minimum, we have learned today when and where to get fuel for the lowest price
• We can obtain novel insights from open data
• Linking data sets has allowed us to improve prediction quality and thereby strengthens automated decision making and analytics

Outlook
• Can showcase “interesting” non-confidential case studies to students and potential research partners
• Many interesting new research questions arise from leveraging linked open data for analytics, predictive systems and building intelligent systems around data

Average Quality of Linear Regression Models per Fuel Station

Quality of Single Model for all stations linked by open data

Model Type          R2 Train   R2 Test   RMSE
Deep Learning       0.8081     0.3156    0.052
Random Forest       0.8938     0.3305    0.052
Linear Regression   0.8125     0.2818    0.054

Scaled variable importance in the Random Forest model for all stations
[Figure: scaled variable importance]
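The model comparison table for the single model across all stations can be read more easily by computing the difference between training and test R2 per model type. The sketch below does only that arithmetic on the values from the slide; the name generalization_gap is an assumption for illustration, and the slides themselves do not report this quantity.

```python
# (R2 train, R2 test, RMSE) per model type, copied from the slide's table.
RESULTS = {
    "Deep Learning": (0.8081, 0.3156, 0.052),
    "Random Forest": (0.8938, 0.3305, 0.052),
    "Linear Regression": (0.8125, 0.2818, 0.054),
}


def generalization_gap(r2_train, r2_test):
    """Difference between training and test R2 (larger means a bigger drop)."""
    return r2_train - r2_test


gaps = {
    name: round(generalization_gap(r2_train, r2_test), 4)
    for name, (r2_train, r2_test, _rmse) in RESULTS.items()
}
best_on_test = max(RESULTS, key=lambda name: RESULTS[name][1])

for name, gap in gaps.items():
    print(f"{name}: train/test gap {gap:.4f}")
print("Highest test R2:", best_on_test)
```

A gap of roughly 0.5 for every model type is commonly read as a sign that the single model generalizes less well than the per-station regression, whose training and test R2 are almost identical.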