Machine Learning in Python

scikit-learn

Simple and efficient tools for data mining and data analysis
Accessible to everybody, and reusable in various contexts
Built on NumPy, SciPy, and matplotlib
Open source, commercially usable - BSD license
A very broad package that covers a much wider range of tasks than our product will require.
Pros:
BSD licence permits commercial use
Open source
Tutorials on offer
Preprocessing section: transforms input data such as text for use with machine learning algorithms - useful
Classification section: identifies which category a new observation belongs to - very useful
Large number of graphical representations of results
Cons:
Seems highly advanced; would probably take a large investment of time to use fully
Average at best: unfocused and perhaps too difficult
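
To judge how much effort the preprocessing and classification sections actually involve, here is a minimal sketch of text classification with scikit-learn. The training sentences and the sport/finance labels are made up for illustration. TfidfVectorizer, MultinomialNB and make_pipeline are existing scikit-learn names, but choosing naive Bayes here is an assumption, not something the scikit-learn site recommends for our case.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data, invented purely for this sketch.
train_texts = [
    "the team won the football match",
    "a late goal decided the match",
    "the striker scored a goal for the team",
    "the bank raised interest rates",
    "shares fell as interest rates rose",
    "the bank reported strong profit",
]
train_labels = ["sport", "sport", "sport", "finance", "finance", "finance"]

# Preprocessing (text -> TF-IDF vectors) and classification chained in one object.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

new_texts = ["the team scored a late goal", "the bank cut interest rates"]
predictions = model.predict(new_texts)          # one label per text
probabilities = model.predict_proba(new_texts)  # one probability per class per text

for text, label, probs in zip(new_texts, predictions, probabilities):
    confidence = {c: round(float(p), 3) for c, p in zip(model.classes_, probs)}
    print(f"{text!r} -> {label} {confidence}")
```

The preprocessing and classification steps take only a few lines, so the time cost noted above is more about choosing and tuning algorithms than about writing the code.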
Java-ML
http://java-ml.sourceforge.net/
Java-ML in a nutshell:

A collection of machine learning algorithms

Common interface for each type of algorithm

Library aimed at software engineers and programmers, so no GUI, but clear interfaces

Reference implementations for algorithms described in the scientific literature.

Well documented source code.

Plenty of code samples and tutorials.
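
The "common interface" point is what would let us swap algorithms without rewriting the surrounding code. The sketch below shows the idea in Python rather than Java, so it is not Java-ML's API: Classifier, MajorityClassifier, NearestNeighbourClassifier and evaluate are hypothetical names, and the data is invented.

```python
from collections import Counter
from typing import Protocol, Sequence


class Classifier(Protocol):
    """Hypothetical shared interface: every classifier trains and classifies the same way."""

    def train(self, features: Sequence[Sequence[float]], labels: Sequence[str]) -> None: ...

    def classify(self, instance: Sequence[float]) -> str: ...


class MajorityClassifier:
    """Always predicts the most common training label."""

    def train(self, features, labels):
        self.label = Counter(labels).most_common(1)[0][0]

    def classify(self, instance):
        return self.label


class NearestNeighbourClassifier:
    """Predicts the label of the closest training instance (squared Euclidean distance)."""

    def train(self, features, labels):
        self.examples = list(zip(features, labels))

    def classify(self, instance):
        def distance(example):
            return sum((a - b) ** 2 for a, b in zip(example[0], instance))

        return min(self.examples, key=distance)[1]


def evaluate(classifier: Classifier, features, labels, test_features, test_labels) -> float:
    """Return accuracy; works with any object that satisfies the interface."""
    classifier.train(features, labels)
    hits = sum(classifier.classify(x) == y for x, y in zip(test_features, test_labels))
    return hits / len(test_labels)


# Invented two-dimensional data with two classes.
features = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [1.0, 0.9], [0.9, 1.1]]
labels = ["a", "a", "a", "b", "b"]
test_features = [[0.1, 0.1], [1.0, 1.0]]
test_labels = ["a", "b"]

for candidate in (MajorityClassifier(), NearestNeighbourClassifier()):
    accuracy = evaluate(candidate, features, labels, test_features, test_labels)
    print(type(candidate).__name__, accuracy)
```

The evaluation code never needs to know which algorithm it is given, which is the practical benefit of a shared interface.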
Pros:
Open source
Very extensive tutorials on offer
Cons:
No GUI
Seems neither very technical nor focused
LingPipe
http://alias-i.com/lingpipe/
LingPipe is a tool kit for processing text using computational linguistics. LingPipe is used to do
tasks like:
Find the names of people, organizations or locations in news
Automatically classify Twitter search results into categories
Suggest correct spellings of queries
To get a better idea of the range of possible LingPipe uses, see the tutorials and sandbox on the LingPipe site.
Architecture
LingPipe's architecture is designed to be efficient, scalable, reusable, and robust. Highlights
include:

Java API with source code and unit tests;

multi-lingual, multi-domain, multi-genre models;

training with new data for new tasks;

n-best output with statistical confidence estimates;

online training (learn-a-little, tag-a-little);

thread-safe models and decoders for concurrent-read exclusive-write (CREW)
synchronization; and

character encoding-sensitive I/O.
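
Two of the highlights above, n-best output with confidence estimates and online training, can be illustrated without LingPipe itself. The following is a plain Python sketch of the idea, not LingPipe's Java API: OnlineNaiveBayes, train and n_best are hypothetical names, the example texts are invented, and naive Bayes is an assumed stand-in for whatever models LingPipe actually uses.

```python
import math
from collections import Counter, defaultdict


class OnlineNaiveBayes:
    """Multinomial naive Bayes that can be updated one example at a time."""

    def __init__(self):
        self.class_counts = Counter()            # training texts seen per category
        self.word_counts = defaultdict(Counter)  # word frequencies per category
        self.vocabulary = set()

    def train(self, text, category):
        """Learn a little: fold one labelled example into the counts."""
        words = text.lower().split()
        self.class_counts[category] += 1
        self.word_counts[category].update(words)
        self.vocabulary.update(words)

    def n_best(self, text, n=3):
        """Tag a little: return up to n (category, probability) pairs, best first."""
        if not self.class_counts:
            return []
        words = text.lower().split()
        total_docs = sum(self.class_counts.values())
        log_scores = {}
        for category, doc_count in self.class_counts.items():
            total_words = sum(self.word_counts[category].values())
            score = math.log(doc_count / total_docs)
            for word in words:
                # Add-one smoothing so an unseen word does not zero out a category.
                count = self.word_counts[category][word] + 1
                score += math.log(count / (total_words + len(self.vocabulary)))
            log_scores[category] = score
        # Normalise the log scores into probabilities that sum to 1.
        top = max(log_scores.values())
        exp_scores = {c: math.exp(s - top) for c, s in log_scores.items()}
        norm = sum(exp_scores.values())
        ranked = sorted(
            ((c, v / norm) for c, v in exp_scores.items()),
            key=lambda pair: pair[1],
            reverse=True,
        )
        return ranked[:n]


model = OnlineNaiveBayes()
model.train("great phone love the screen", "positive")
model.train("terrible battery very disappointed", "negative")
print(model.n_best("love the screen"))

# Training can continue after the model has already been used for tagging.
model.train("battery died after a week", "negative")
print(model.n_best("battery died"))
```

Each result is a ranked list rather than a single answer, so the calling code can apply its own confidence threshold before acting on a classification.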
Pros:
GUI provided (also usable as shell or web)
Seemingly very detailed and comprehensive set of tutorials
Word sense disambiguation
LingPipe seems like it could be an all-in-one solution. I would strongly suggest that we use this one
and rethink how the machine learning is going to be used.
It offers confidence checks: http://alias-i.com/lingpipe/demos/tutorial/classify/read-me.html
Classification: http://alias-i.com/lingpipe/demos/tutorial/classify/read-me.html
Part-of-speech tagging (useful for categories): http://alias-i.com/lingpipe/demos/tutorial/posTags/readme.html
And more: