Acadia Institute for Data Analytics

Cloud Analytics Platforms
Christian Frey
About AIDA
• Our mission is to advance
knowledge in data analytics
through research, education
and outreach
• Our goal is to foster
collaboration and the
sharing of data analytics
methods, technologies, and
ethical practices among its
stakeholders
About me – Christian Frey
• Graduated in May with Business and Computer Science
• Work at AIDA full time now, managing the Institute
• You can find me at the top floor of Patterson, in the Rural Innovation
Centre
• [email protected]
• 1-902-585-1777
Agenda
• Cloud-based analytics space
• Pros and Cons of:
• Google Cloud Predictions API
• Amazon ML
• Microsoft Azure
• IBM SPSS Modeler Gold on Cloud
• Tutorial using BigML
• Loading in data and creating a dataset
• Customizing the model creation
Data Analytics Roadmap
Tonight, we are here
Source: https://www.informs.org/ORMS-Today/Public-Articles/October-Volume-37-Number-5/Back-in-Business
Google Cloud Predictions API - cloud.google.com/prediction/
• API Explorer on the website,
great for prototyping
• Gives you $300 for 60 days of
experimentation
• Integrates into other Google
services, most notably Google
Sheets
• Uses online learning to allow
for addition of new data
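The slides do not say which algorithm the service uses, so the following is only a minimal sketch of the general idea behind online learning: a model that folds in one new labeled example at a time instead of retraining from scratch. The perceptron, the class name `OnlinePerceptron`, and the toy measurements in the demo are illustrative assumptions, not part of the Google API.

```python
# Minimal online-learning sketch (assumption: a simple perceptron stands in
# for whatever model the hosted service actually trains).

class OnlinePerceptron:
    """Binary classifier that is updated one labeled example at a time."""

    def __init__(self, n_features, learning_rate=1.0):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, features):
        """Return 1 if the weighted sum is positive, otherwise 0."""
        score = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return 1 if score > 0 else 0

    def update(self, features, label):
        """Fold one new example into the model; return the prediction error."""
        error = label - self.predict(features)  # -1, 0, or +1
        if error != 0:
            step = self.learning_rate * error
            self.weights = [w + step * x for w, x in zip(self.weights, features)]
            self.bias += step
        return error


if __name__ == "__main__":
    # Made-up (feature_1, feature_2) pairs for two classes, for illustration only.
    stream = [((1.4, 0.2), 0), ((4.7, 1.4), 1), ((1.3, 0.2), 0), ((4.5, 1.5), 1)]
    model = OnlinePerceptron(n_features=2)
    for _ in range(10):  # a few passes over the initial data
        for features, label in stream:
            model.update(features, label)
    # Later, new data arrives and is added without retraining from scratch.
    model.update((5.0, 1.7), 1)
    print(model.predict((1.5, 0.3)), model.predict((5.0, 1.7)))
```

The point of the design is that `update` touches only the current weights, so the cost of adding a new example does not grow with the amount of data already seen.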
Google Results on Iris Dataset
Pros and Cons of Google Cloud Predictions API
Pros
• Integrates with Google Sheets, so you can run predictions on your spreadsheets
• Very good accuracy on the training set
• Fast training and prediction times: usually under 1 minute to train smaller datasets, half a second to predict
Cons
• No choice in the model it uses
• No method of exporting the model that was created
Amazon Machine Learning –
aws.amazon.com/machine-learning
• 3 Ways to access your model:
• Through a web interface
• Through an API in a variety of languages
• Through the AWS command line
interface
• No free trial available
• Picky about the data it accepts: no more than 10k errors in your data, or 10% of the records
• No choice in model that is used
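Before uploading, it can help to estimate whether a file would stay within that error budget. The sketch below is a rough local pre-check, not Amazon's validator: it assumes an "error" means a row with the wrong number of columns, and it assumes the limit is read as "rejected when invalid rows exceed 10,000 or 10% of all rows". The function names `count_invalid_rows` and `exceeds_error_budget` are hypothetical.

```python
import csv
import io


def count_invalid_rows(csv_text, expected_columns):
    """Count rows whose column count differs from expected_columns.

    Assumption: a wrong column count is the only kind of error checked here.
    Returns (invalid_rows, total_rows); blank lines are ignored.
    """
    total = 0
    invalid = 0
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        total += 1
        if len(row) != expected_columns:
            invalid += 1
    return invalid, total


def exceeds_error_budget(invalid, total, max_count=10_000, max_fraction=0.10):
    """Return True if either assumed limit (absolute or relative) is exceeded."""
    if total == 0:
        return False
    return invalid > max_count or invalid / total > max_fraction


if __name__ == "__main__":
    # Two well-formed Iris-style rows and one deliberately truncated row.
    sample = (
        "5.1,3.5,1.4,0.2,setosa\n"
        "4.9,3.0,1.4\n"
        "7.0,3.2,4.7,1.4,versicolor\n"
    )
    invalid, total = count_invalid_rows(sample, expected_columns=5)
    print(invalid, total, exceeds_error_budget(invalid, total))
```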
Pros and Cons of Amazon Machine Learning
Pros
• Easy to load data into Amazon S3, then create the model
• Model can easily be integrated with other Amazon services
Cons
• Data must be located in Amazon S3 or Redshift storage, so you are locked into Amazon for everything
• No choice of model; it uses variants of regression for everything (linear, logistic, and multiclass)
• Accuracy is slightly lower than other products on the Iris data set
Microsoft Azure ML - studio.azureml.net
• Drag and drop modules onto an
infinite background, then
connect
• Many models to choose from,
requires some understanding of
the data
• Offers a web service to access
your model
• Good for those who know about
Machine Learning, but don’t
want to code
Results on the Iris Dataset - Azure
Pros and Cons of Azure ML
Pros
• Lots of machine learning algorithms available, including Neural Networks, Naïve Bayes, Clustering, and Decision Trees
• Easy to use free trial, no sign up required!
• Allows you to run arbitrary Python or R code as a module to process or analyze your data
Cons
• It is difficult to find options or the correct module to use
• Free trial only saves data for 8 hours
IBM SPSS Modeler Gold on Cloud
• IBM SPSS Modeler: create decision trees and regressions from an arbitrary dataset
• Pay as you go – only pay for what
you need to use
• Drag and drop with easy to find
modules
• Auto classifier runs all models
and allows you to compare them
SPSS Modeler – Results on IRIS Dataset
Pros and Cons of IBM SPSS Modeler Gold on Cloud
Pros
• Easy to figure out drag and drop interface
• Tied for first place in accuracy
Cons
• Cannot run arbitrary Python or R code in cloud version of SPSS Modeler
• Only supports hosted DB2 database connections, no connection to other databases
• Bulk loading of data into SPSS Modeler from DB2 requires a support ticket
BigML – BigML.com
• Easy to sign up, with an unlimited number of
small models (up to 16MB of data)
• 1-click dataset, 1-click models, and 1-click
model evaluation to go from dataset to
evaluation in 3 clicks
• Also offers customization options, with
suggestions driven by the data
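The dataset, model, evaluation sequence above can be sketched locally to show what each click stands for. This is a generic illustration and does not use BigML's API: the nearest-centroid classifier and the function names `split_dataset`, `train_centroids`, `predict`, and `evaluate` are assumptions chosen to keep the example self-contained.

```python
import random
from collections import defaultdict


def split_dataset(rows, test_fraction=0.2, seed=0):
    """Step 1 (dataset): shuffle rows and hold out a fraction for evaluation."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def train_centroids(rows):
    """Step 2 (model): compute the mean feature vector of each class."""
    sums = {}
    counts = defaultdict(int)
    for features, label in rows:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + x for s, x in zip(sums[label], features)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    def squared_distance(centroid):
        return sum((c - x) ** 2 for c, x in zip(centroid, features))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


def evaluate(centroids, rows):
    """Step 3 (evaluation): fraction of held-out rows predicted correctly."""
    correct = sum(1 for features, label in rows
                  if predict(centroids, features) == label)
    return correct / len(rows)
```

Each row is a `(features, label)` pair, so a typical run is `train, test = split_dataset(rows)`, then `model = train_centroids(train)`, then `evaluate(model, test)`. Evaluating on held-out rows rather than on the training rows is what makes accuracy figures comparable between platforms.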