SAS - Predictive Modeling Techniques

Data Mining, Machine Learning, Data Analysis, etc.
scikit-learn
http://scikit-learn.org/stable/
scikit-learn
Machine Learning in Python
• Simple and efficient tools for data mining and data analysis
• Built on NumPy, SciPy, and matplotlib
• Open source, commercially usable - BSD license
• Language: Python
http://scikit-learn.org/stable/index.html
Techniques:
• Classification
• Identifying which category an object belongs to.
• Regression
• Clustering
• Dimensionality reduction
• Model selection
• Preprocessing
• Examples
• Face completion with multi-output estimators
• Multilabel classification
Multilabel classification
• Face completion with multi-output estimators
• Use of a multi-output estimator to complete images
• Goal: predict the lower half of a face given its upper half
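The face completion idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the scikit-learn gallery example itself: the bundled 8x8 digits dataset stands in for the face images so that nothing has to be downloaded, and a k-nearest-neighbors regressor is chosen as the multi-output estimator.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Assumption: 8x8 digit images stand in for faces (bundled, no download).
digits = load_digits()
images = digits.data  # shape (n_samples, 64), each image flattened row by row

# The first 32 pixels are the upper 4 rows, the last 32 the lower 4 rows.
upper, lower = images[:, :32], images[:, 32:]

X_train, X_test, y_train, y_test = train_test_split(
    upper, lower, test_size=0.2, random_state=0
)

# A single estimator predicts all 32 lower-half pixels at once (multi-output).
model = KNeighborsRegressor(n_neighbors=5)
model.fit(X_train, y_train)
predicted_lower = model.predict(X_test)

# Stitch the true upper half and the predicted lower half into full images.
completed = np.hstack([X_test, predicted_lower]).reshape(-1, 8, 8)
print(completed.shape)
```

Each row of completed is an 8x8 image whose lower half was predicted from its upper half.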
Classification
• Identifying which category an object belongs to.
• Applications: Spam detection, Image recognition.
• Algorithms: SVM, nearest neighbors, random forest, ...
• Example: Multilabel classification
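A minimal classification sketch, assuming the bundled iris dataset and an SVM with an RBF kernel. The dataset, the split ratio, and the kernel are illustrative choices, not taken from the slides.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Assumption: iris is used only because it ships with scikit-learn.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the samples to estimate accuracy on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit a support vector classifier and score it on the held-out samples.
clf = SVC(kernel="rbf", gamma="scale")
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```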
Classification
• Examples based on real-world datasets
• Visualizing the stock market structure
• Unsupervised learning techniques extract the stock market structure from variations in historical quotes.
Classification - Examples
http://scikit-learn.org/stable/auto_examples/index.html#dataset-examples
Regression
• Predicting a continuous-valued attribute associated with an object.
• Applications: Drug response, Stock prices.
• Algorithms: SVR, ridge regression, Lasso, ...
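A minimal regression sketch, assuming synthetic data generated from a known line (slope 3, intercept 2) plus noise, fitted with ridge regression. The data and the alpha value are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Assumption: synthetic data, y = 3x + 2 plus Gaussian noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(scale=0.5, size=200)

# Ridge regression is linear least squares with an L2 penalty on the weights.
model = Ridge(alpha=1.0)
model.fit(X, y)

slope = model.coef_[0]
intercept = model.intercept_
print(f"slope: {slope:.2f}, intercept: {intercept:.2f}")

# Predict a continuous value for a new observation.
prediction = model.predict(np.array([[4.0]]))[0]
```

The fitted slope and intercept should land close to the values used to generate the data.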
Regression - Examples
Clustering
• Automatic grouping of similar objects into sets.
• Applications: Customer segmentation, Grouping experiment outcomes
• Algorithms: k-Means, spectral clustering, mean-shift, ...
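A minimal clustering sketch, assuming synthetic two-dimensional points drawn around three centers and grouped with k-Means. The number of clusters is known here only because the data is generated; with real data it would have to be chosen.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Assumption: synthetic points around 3 centers stand in for real records.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)

# Group the points into 3 clusters without using any labels.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print(kmeans.cluster_centers_)  # one (x, y) center per cluster
```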
Clustering - Examples
Dimensionality reduction
• Reducing the number of random variables to consider.
• Applications: Visualization, Increased efficiency
• Algorithms: PCA, feature selection, non-negative matrix factorization.
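A minimal dimensionality reduction sketch, assuming the bundled iris dataset: PCA projects its four measurements onto two components, which is enough for a 2-D scatter plot. The dataset and the number of components are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Assumption: iris (4 features per sample) is used because it is bundled.
X, _ = load_iris(return_X_y=True)

# Keep the 2 directions of largest variance.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

# Fraction of the total variance retained by the 2 components.
explained = pca.explained_variance_ratio_.sum()
print(X_2d.shape, f"{explained:.2f}")
```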
Model selection
• Comparing, validating and choosing parameters and models.
• Goal: Improved accuracy via parameter tuning
• Modules: grid search, cross validation, metrics.
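A minimal model selection sketch, assuming a small hand-picked parameter grid for an SVM on the bundled iris dataset. Grid search tries every combination and scores each one with 5-fold cross validation; the grid values are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Assumption: an arbitrary small grid; real grids depend on the problem.
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}

# Each of the 9 combinations is evaluated with 5-fold cross validation.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print(search.best_params_, f"{search.best_score_:.2f}")
```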
Preprocessing
• Feature extraction and normalization.
• Application: Transforming input data such as text for use with machine learning algorithms.
• Modules: preprocessing, feature extraction.
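A minimal preprocessing sketch covering both modules: feature extraction turns text into word counts, and normalization rescales numeric columns to zero mean and unit variance. The sample sentences and measurements are made up for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import StandardScaler

# Hypothetical text samples.
docs = [
    "free offer, claim your prize",
    "meeting moved to monday",
    "claim your free prize now",
]

# Feature extraction: one column per distinct word, one row per document.
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)
vocabulary = sorted(vectorizer.vocabulary_)
print(vocabulary)

# Hypothetical numeric measurements on very different scales.
measurements = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])

# Normalization: each column is rescaled to mean 0 and standard deviation 1.
scaled = StandardScaler().fit_transform(measurements)
print(scaled)
```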
SAS® Enterprise Miner™
https://www.sas.com/en_us/software/enterprise-miner.html
• Descriptive and predictive modeling
• Descriptive Modeling:
• uncovers shared similarities or groupings in historical data
• Categorizing customers by product preferences or sentiment
• Techniques:
• Predictive modeling
• Classify events in the future or estimate unknown outcomes.
• Helps uncover insights for things like customer churn, campaign response or credit defaults.
• Example: using credit scoring to determine an individual's likelihood of repaying a loan
SAS - Descriptive Modeling
• Clustering: Grouping similar records together.
• Anomaly detection: Identifying multidimensional outliers.
• Association rule learning: Detecting relationships between records.
• Principal component analysis: Detecting relationships between variables.
• Affinity grouping: Grouping people with common interests or similar goals (e.g., people who buy X often buy Y and possibly Z).
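The "people who buy X often buy Y" idea can be sketched in plain Python. This is an illustration only and not how SAS Enterprise Miner implements it; the baskets and the confidence helper are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, one set of items per customer visit.
baskets = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "jam"},
    {"bread", "butter", "milk"},
]

# Count how often each item, and each pair of items, appears in a basket.
item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)


def confidence(x, y):
    """Share of the baskets containing x that also contain y."""
    pair = tuple(sorted((x, y)))
    return pair_counts[pair] / item_counts[x]


print(confidence("bread", "butter"))  # 3 of the 4 bread baskets -> 0.75
print(confidence("jam", "butter"))    # 2 of the 2 jam baskets -> 1.0
```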
SAS - Predictive Modeling
• Classify events in the future or estimate unknown outcomes.
• Helps uncover insights for things like customer churn, campaign response or credit defaults.
• Example: using credit scoring to determine an individual's likelihood of repaying a loan
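The credit scoring example can be sketched with scikit-learn rather than SAS. Everything here is an assumption for illustration: the two features (income and debt ratio), the synthetic data, and the choice of logistic regression as the scoring model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500

# Hypothetical features: income (in thousands) and debt-to-income ratio.
income = rng.normal(50, 15, n)
debt_ratio = rng.uniform(0, 1, n)

# Synthetic ground truth: higher income and lower debt make repayment likelier.
logit = 0.08 * (income - 50) - 4.0 * (debt_ratio - 0.5)
repaid = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

X = np.column_stack([income, debt_ratio])
model = LogisticRegression(max_iter=1000)
model.fit(X, repaid)

# Score two hypothetical applicants: column 1 is the probability of repaying.
applicants = np.array([[80.0, 0.1], [25.0, 0.9]])
repay_probability = model.predict_proba(applicants)[:, 1]
print(repay_probability)
```

The first applicant (high income, low debt) should receive the higher repayment probability.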
SAS - Predictive Modeling Techniques
• Regression: A measure of the strength of the relationship between one dependent variable and a series of independent variables.
• Neural networks: Computer programs that detect patterns, make predictions and learn.
• Decision trees: Tree-shaped diagrams in which each branch represents a probable occurrence.
• Support vector machines: Supervised learning models with associated learning algorithms.
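The tree-shaped structure of a decision tree can be made visible with a short sketch. It uses scikit-learn rather than SAS, and the bundled iris dataset and the depth limit of 2 are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A shallow tree keeps the printed diagram small enough to read.
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# Each indented line is a branch; each "class:" line is a leaf prediction.
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
```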