Notes on Prediction.io

Created 04/22/14
Updated 06/28/14, Updated 09/21/14, Updated 11/21/14, Updated 02/10/15
Introduction
PredictionIO is an open source machine learning server for software developers to create predictive features, such
as personalization, recommendation and content discovery.
Their goal is to be the “MySQL” or “LAMP Stack” of Machine Learning and Analytics.
Examples of use:
• Le Tote, a clothing subscription/rental service, uses PredictionIO to predict customers’ fashion preferences.
• PerkHub uses PredictionIO to personalize product recommendations in its weekly ‘group buying’ emails.
Current version is 0.8.6. The download was 151MB.
Features
The server is written in Scala and runs on Spark. As a complete package, it also bundles many elements of Hadoop
and Mahout (however, the Prediction.io marketing pitch is slowly shifting from being a replacement for Hadoop
to being an easy way to adopt Spark).
Recommendation engine example
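import predictionio

# Assumes the PredictionIO Python SDK is installed and a server is reachable.
cli = predictionio.Client("<my key>")
cli.identify("John")                             # subsequent calls are made on behalf of user "John"
cli.record_action_on_item("view", "HackerNews")  # record that John viewed the item "HackerNews"
# predict the top 5 preferences near a specified location
r = cli.get_itemrec_topn("myEngine", 5, {"pio_latlng": [37.9, 91.2]})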
Algorithms Supported
Item recommendation
Item similarity
Item rank
The implementations come from MLlib in Spark, and include Naive Bayes and ALS.
Company Information
Founded in early 2013. Pivoted in late 2013 and raised its next funding round in mid-2014. Located in Palo Alto
and somewhere in the UK.
The company competes with closed “black box” MLaaS services or software, such as Google Prediction API,
Wise.io, BigML, and Skytree. However, since Prediction.io is open and extensible, with a developer community,
the company feels that it has an advantage.
The problem PredictionIO is setting out to solve is that building Machine Learning into products is expensive and
time-consuming — and in some instances is only really within the reach of major and heavily-funded tech
companies, such as Google or Amazon, who can afford a large team of PhDs/data scientists. By utilizing the
startup’s open source Machine Learning server, startups or larger enterprises no longer need to start from scratch,
while also retaining control over the source code and the way in which PredictionIO integrates with their existing
wares.
People
Simon Chan, CEO (was at UMich, then startups in China, then UCL)
Donald Szeto, CTO (Stanford, UC Berkeley)
Kenneth Chan, engineer (UC Berkeley)
Thomas Stone, VP Sales (Cornell, University College London)
Funding
Raised $2.5M in July 2014 from the following investors: StartX, XG Ventures (founded by ex-Googlers), Sood
Ventures, Ironfire Capital (activist investor firm), Quest Venture Partners (Menlo Park), Azure Capital Partners
(San Francisco and Menlo Park).
Business Model
There was no discussion of pricing for the server, or pricing for service/support.
Architecture of the PredictionIO server
PredictionIO is mainly built with Scala. Scala runs on the JVM, so Java and Scala stacks can be freely mixed for
totally seamless integration. PredictionIO Server consists of a few components:
• Admin Server
• IO Server
• Scheduler
• Data Store
• Data Processing Stack
The “DASE” Concept – their counterpart of “MVC”
PredictionIO's DASE architecture brings the separation-of-concerns design principle to predictive engine
development. DASE stands for the following components of an engine:
• Data - includes Data Source and Data Preparator
• Algorithm(s)
• Serving
• Evaluator
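To make the division of labor concrete, here is a minimal sketch of the four DASE roles as plain Python functions. It is not PredictionIO code: the function names, the sample events, and the toy popularity model are all assumptions made for illustration, and a real engine would implement these roles as Scala classes running on Spark.

```python
from collections import Counter

# Hypothetical stand-ins for the DASE roles; none of these names are PredictionIO APIs.

def read_events():
    """Data Source: return raw (user, item) view events."""
    return [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u3", "a"), ("u3", "c")]

def prepare(events):
    """Data Preparator: drop duplicate events before training."""
    return sorted(set(events))

def train(prepared):
    """Algorithm: a toy popularity model (view count per item), not ALS."""
    return Counter(item for _, item in prepared)

def serve(model, query):
    """Serving: turn the model's output into the response for one query."""
    ranked = sorted(model.items(), key=lambda kv: (-kv[1], kv[0]))
    top = ranked[: query["num"]]
    return {"itemScores": [{"item": i, "score": float(s)} for i, s in top]}

def evaluate(result, relevant):
    """Evaluator: fraction of returned items that are known to be relevant."""
    items = [s["item"] for s in result["itemScores"]]
    if not items:
        return 0.0
    return sum(1 for i in items if i in relevant) / len(items)

model = train(prepare(read_events()))
result = serve(model, {"user": "u2", "num": 2})
precision = evaluate(result, relevant={"a", "c"})
```

Because each role only sees the output of the previous one, the popularity model could be swapped for another algorithm without touching the data or serving code, which is the separation of concerns DASE is aiming for.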
As the Quick Start shows, MyRecommendation takes a JSON prediction query, e.g. { "user": "1", "num": 4 },
and returns a JSON predicted result. In MyRecommendation/src/main/scala/Engine.scala, the Query case class
defines the format of such a query:
case class Query(
  user: String,
  num: Int
) extends Serializable
The PredictedResult case class defines the format of the predicted result, such as
{"itemScores":[
  {"item":22,"score":4.07},
  {"item":62,"score":4.05},
  {"item":75,"score":4.04},
  {"item":68,"score":3.81}
]}
with:
case class PredictedResult(
  itemScores: Array[ItemScore]
) extends Serializable
case class ItemScore(
  item: String,
  score: Double
) extends Serializable
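For readers who think in Python, the same query and result shapes can be mirrored with dataclasses. This is only an illustrative sketch: the helper names parse_query and to_json are hypothetical, and the item IDs are written as strings to follow the ItemScore case class (the sample JSON above shows them as numbers).

```python
import json
from dataclasses import asdict, dataclass
from typing import List

@dataclass
class Query:
    user: str
    num: int

@dataclass
class ItemScore:
    item: str
    score: float

@dataclass
class PredictedResult:
    itemScores: List[ItemScore]

def parse_query(body: str) -> Query:
    """Decode a JSON prediction query into a typed Query (hypothetical helper)."""
    data = json.loads(body)
    return Query(user=str(data["user"]), num=int(data["num"]))

def to_json(result: PredictedResult) -> str:
    """Encode a PredictedResult as JSON; asdict() also converts the nested ItemScores."""
    return json.dumps(asdict(result))

query = parse_query('{ "user": "1", "num": 4 }')
response = to_json(PredictedResult([ItemScore("22", 4.07), ItemScore("62", 4.05)]))
```

Declaring the formats as types, as the Scala case classes do, means a malformed query fails at the boundary instead of deep inside the algorithm.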
Finally, RecommendationEngine is the Engine Factory that defines the components this engine will use:
Data Source, Data Preparator, Algorithm(s) and Serving components.
object RecommendationEngine extends IEngineFactory {
  def apply() = {
    new Engine(
      classOf[DataSource],
      classOf[Preparator],
      Map("als" -> classOf[ALSAlgorithm]),
      classOf[Serving])
  }
  ...
}
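Note that the factory hands over component classes, not instances, and registers each algorithm under a name ("als"). A rough Python analogue of that idea follows. The Engine dataclass, recommendation_engine and build_algorithm are hypothetical names for this sketch, the component classes are empty placeholders, and looking algorithms up by name is an assumption about how such a registry would typically be used.

```python
from dataclasses import dataclass
from typing import Dict

# Empty placeholder components; a real engine would give these behavior.
class DataSource: ...
class Preparator: ...
class ALSAlgorithm: ...
class Serving: ...

@dataclass(frozen=True)
class Engine:
    """Holds component classes so the framework decides when to instantiate them."""
    data_source_class: type
    preparator_class: type
    algorithm_classes: Dict[str, type]
    serving_class: type

def recommendation_engine() -> Engine:
    """Python analogue of the RecommendationEngine factory's apply()."""
    return Engine(DataSource, Preparator, {"als": ALSAlgorithm}, Serving)

def build_algorithm(engine: Engine, name: str):
    """Instantiate the algorithm registered under the given name."""
    try:
        return engine.algorithm_classes[name]()
    except KeyError:
        raise ValueError(f"unknown algorithm: {name!r}") from None

engine = recommendation_engine()
algorithm = build_algorithm(engine, "als")
```

Keying algorithms by name leaves room for an engine to register more than one algorithm and pick among them without code changes.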
Spark's MLlib ALS algorithm takes training data as an RDD, i.e. RDD[Rating], and trains a model, which is
a MatrixFactorizationModel object.
The PredictionIO Recommendation Engine Template, which MyRecommendation is based on, integrates this
algorithm under the DASE architecture.
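A matrix factorization model represents each user and each item as a short vector of latent factors, and the predicted rating is the dot product of the two. The sketch below shows only that scoring step in plain Python. The factor values are made up for illustration (real ones would be learned by ALS), and predict and recommend are hypothetical helpers, not MLlib calls.

```python
# Made-up rank-2 latent factors; in practice these come out of ALS training.
user_factors = {"1": [1.0, 0.5]}
item_factors = {
    "22": [3.0, 2.0],
    "62": [2.0, 1.0],
    "75": [0.5, 3.0],
}

def predict(user: str, item: str) -> float:
    """Predicted rating = dot product of the user's and the item's factor vectors."""
    return sum(u * v for u, v in zip(user_factors[user], item_factors[item]))

def recommend(user: str, num: int):
    """Score every item for the user and return the num highest-scoring ones."""
    scored = [(item, predict(user, item)) for item in item_factors]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:num]

top = recommend("1", 2)
```

This mirrors the query shape used earlier: the user and num fields of a Query map directly onto the two arguments of recommend.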
Data Processing Stack
Built on top of solid data frameworks and technology, such as Hadoop, Cascading, Scalding and Mahout,
PredictionIO can handle a huge amount of data efficiently. A variety of machine learning algorithms are available
for you to implement with just a few clicks.
Admin Server
PredictionIO's Admin Server component provides a web interface for developers to manage applications, engines
and algorithms. It is built on top of Play Framework.
IO Server
IO Server offers scalable REST API services to communicate with your web or mobile app. It is responsible for
handling data input and prediction output. It is built on top of Play Framework.
Scheduler
A scalable scheduler that can manage schedules for executing tens, hundreds, or even tens of thousands
of jobs. Quartz is the default scheduler.
Data Store
The Data Store manages the collected data, the predictive model and the cached prediction results. MongoDB is the
default data store.
Documentation
Android and Java SDK Endpoints
There are commands to send information, to request recalculation, and to request results.
PHP API
Delivery in the Cloud
There are EC2 instances that can be spun up preconfigured for Prediction.io:
https://aws.amazon.com/marketplace/pp/B00ECGJYGE
For usage information, see http://docs.prediction.io/current/installation/install-predictionio-on-aws.html
Developer Community
There is a forum at https://groups.google.com/forum/#!forum/predictionio-user
The developer community of PredictionIO supports a number of projects. To list a project on their site, please
contact them or do a pull request through PredictionIO Docs Project.
In early 2015, the CEO said they had over 300 developers in their ecosystem.
Questions and Open Issues
How does the server store and manage trained models?
What data sources can be integrated?
Chronology
Late Spring 2014: Learned about this tool
Summer 2014: Initial Evaluation
Fall 2014: Started another round of evaluation, since it was clearer that they were providing a server, and that the
server used Scala / Spark. They were developing templates which captured usage patterns. Also, they were
working to create a developer ecosystem.
Presentation at the Predictive APIs conference in November 2014:
http://www.slideshare.net/predictionio/predictionio-the-1st-international-conference-on-predictive-apis-and-apps
This was not very technical.
02/09/15: Went to a presentation by the CEO, hosted by the Scala Bay group. The presentation mostly summarized
what we already knew, but pointed to a number of key directions; for example, they have developed usage
templates that greatly improve the ease of learning. The presentation is available at
https://www.youtube.com/watch?v=EUDHFOyUumE&feature=youtu.be