A Few Projects To Share

Javad Azimi
May 2015
Data Clustering
• Separating data into similar groups without any supervision
• My master's thesis project
• Clustering ensemble
  • Aggregates the results of different clustering algorithms into one single result
• Constrained clustering
  • Clustering based on must-link and cannot-link information
• Several publications in IJCAI 2009, IDEAL 2007, CSICC 2006, and …
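A common way to aggregate several clusterings is a co-association matrix: count how often each pair of points lands in the same cluster across the base clusterings, then cluster that similarity matrix. The sketch below is illustrative only (it is not the thesis code) and assumes scikit-learn is available; toy data and the choice of spectral clustering for the consensus step are my own.

```python
# Illustrative clustering-ensemble sketch via a co-association matrix
# (toy data, not the original thesis implementation).
import numpy as np
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=100, centers=3, random_state=0)

# Several base clusterings; different seeds stand in for different algorithms.
labelings = [KMeans(n_clusters=3, n_init=5, random_state=s).fit_predict(X)
             for s in range(5)]

# Co-association: fraction of base clusterings that put points i and j together.
n = len(X)
co = np.zeros((n, n))
for lab in labelings:
    co += (lab[:, None] == lab[None, :])
co /= len(labelings)

# Consensus clustering on the co-association similarity matrix.
consensus = SpectralClustering(n_clusters=3,
                               affinity="precomputed").fit_predict(co)
print(consensus[:10])
```

The co-association matrix is algorithm-agnostic, which is what makes it a convenient common currency for combining heterogeneous base clusterings.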
Unsupervised Anomaly Detection
• Summer intern project at Biotronik MSE
• The test system identified some devices (pacemakers) as safe while they were not (10 out of 20k)
• More than 2,000 tests per device
• How can we find those bad devices based on the test results?
• Key insight: some tests are significantly correlated with each other
• Implemented in Statistica and Visual Basic
• A US patent application has been submitted
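The key insight above, that correlated tests constrain each other, suggests one simple detector: regress one test on a correlated test and flag devices whose residual is abnormally large, even when each test passes in isolation. This sketch is illustrative only (synthetic data, not the Biotronik implementation); the two-test setup and the z-score threshold are my own assumptions.

```python
# Illustrative sketch (synthetic data, not the original system): a device
# whose pair of test results falls far from the regression line between two
# correlated tests is suspicious even if each test passes on its own.
import numpy as np

rng = np.random.default_rng(0)
n = 20000
test_a = rng.normal(10.0, 1.0, n)
test_b = 2.0 * test_a + rng.normal(0.0, 0.1, n)   # strongly correlated tests
bad = rng.choice(n, 10, replace=False)
test_b[bad] += 3.0                                 # bad devices break the correlation

# Fit test_b ~ test_a and standardize the residuals.
slope, intercept = np.polyfit(test_a, test_b, 1)
resid = test_b - (slope * test_a + intercept)
z = (resid - resid.mean()) / resid.std()

flagged = np.flatnonzero(np.abs(z) > 5.0)
print(sorted(flagged))   # recovers the injected bad devices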
Visual Appearance of Display Ads and Its Effect on
Click Through Rate
• Summer intern at Yahoo! Labs
• Is it possible to predict the CTR based on the creative design?
• We developed 43 visual features
[Figure: log-inverse-CTR histogram]
• Based on the generated features we are able to:
  • Predict CTR up to 3 times better than a weighted-sampling baseline
  • Provide a set of recommendations to ad designers to help optimize their designs
• Support Vector Regression (SVR)
• 3 papers (WWW 2012, KDD 2012, and CIKM 2012) and one US patent
• MATLAB and C++ implementation (also used LibSVM, CVX, and NCut)
A few more
• The sensitivity of CTR to the time a user enters the website
  • The earlier they come, the higher the CTR is likely to be
• Advertiser (EA and insurance) email targeting
  • To whom should we send email?
• Estimating income based on zip code, browser, OS, and other browsing features
  • Porsche or Hyundai? Which one should we place?
Keyword Transformation (1)
• Cold-start listings usually have uncommon keywords that are hard to find in searched queries.
• Keywords are scraped using the Bing search engine
• Algorithm:
  • N-gram extraction
  • N-gram frequency filtration
  • Entity detection
  • POS filtration
  • DSSM filtration
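The first two pipeline steps are straightforward to sketch. The snippet below is a minimal illustration assuming a plain word-level definition of n-grams; the entity, POS, and DSSM stages require NLP models and are omitted. All names and the toy corpus are my own.

```python
# Illustrative sketch of the first two steps: n-gram extraction and
# frequency filtration (word-level n-grams, hypothetical corpus).
from collections import Counter

def ngrams(text, n):
    """All word-level n-grams of a string."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def frequent_ngrams(corpus, n, min_count=2):
    """Keep only n-grams seen at least min_count times across the corpus."""
    counts = Counter(g for doc in corpus for g in ngrams(doc, n))
    return {g: c for g, c in counts.items() if c >= min_count}

corpus = [
    "mark nason boots size 70",
    "mark nason shoes on sale",
    "cheap mark nason shoes",
]
print(frequent_ngrams(corpus, 2))   # e.g. "mark nason" survives the filter
```

Frequency filtration drops one-off n-grams early, so the expensive downstream filters (entity detection, POS, DSSM) only see candidates with some support in the scraped data.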
Keyword Transformation (2)

Keyword → Generated NenaKey (alphabetically sorted)
35fcread20bk → card memory reader
mig50q7csa0x → intelligent power toshiba
canine Iris melanoma → cancer dogs eye
boots size 70 mark nason → mark nason shoes
break 2014 xmas bargain → 2014 cheap christmas vacations
casinoroulettegame → casino game roulette
buy www.seatgeek.com show ticket → buy seatgeek show ticket
3m gold privacy filters gpfmr13 - notebook privacy filter → 3m filter gold privacy
56 harbour breeze low profile ceiling fans → breeze ceiling fan harbor
a4 hammered and linen brilliant white paper suppliers → a4 linen paper white
apply for parents private loan for school for kids → loans parent student
Bayesian Optimization: Motivating Application
• This is how an MFC (microbial fuel cell) works
• The nano-structure of the anode significantly impacts electricity production
[Figure: MFC schematic. Bacteria at the anode oxidize fuel (organic matter) into oxidation products (CO2), releasing electrons; at the cathode, O2 and H+ combine into H2O. Inset: SEM image of bacteria sp. on Ni nanoparticle-enhanced carbon fibers.]
• We should optimize the anode nano-structure to maximize power by selecting a set of experiments.
Parameter Tuning
• Suppose you have n different learning algorithms which generate n different predictions (p1, p2, …, pn) for a given input query.
• The final prediction would be:
  pf = (a1*p1) + (a2*p2) + … + (an*pn), where a1, a2, …, an are constants.
• Challenge:
  • What should the set of weights (a1, a2, …, an) be?
  • Exhaustive search is not possible, since every evaluation takes some time.
  • What is the best way to set a1, a2, …, an?
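To make the challenge concrete, here is a minimal sketch (hypothetical data, not the original system) of the weighted combination pf and a budgeted random search over the weights, standing in for the expensive evaluations that rule out exhaustive search.

```python
# Illustrative sketch: tuning ensemble weights a1..an under a small
# evaluation budget (synthetic predictors, hypothetical setup).
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=200)                        # true targets
# Three base predictors of varying quality.
preds = np.stack([y + rng.normal(0, s, 200) for s in (0.2, 0.5, 1.0)])

def ensemble_error(a):
    """MSE of pf = sum_i a_i * p_i; one 'expensive' evaluation."""
    pf = a @ preds
    return np.mean((pf - y) ** 2)

# Budgeted random search over the weight simplex instead of exhaustive search.
best_a, best_err = None, np.inf
for _ in range(50):                             # only 50 evaluations allowed
    a = rng.dirichlet(np.ones(3))               # random convex weights
    err = ensemble_error(a)
    if err < best_err:
        best_a, best_err = a, err

print(best_a, best_err)
```

Random search is the naive baseline here; the point of the slides that follow is that Bayesian optimization spends the same small budget far more intelligently.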
Other Applications
• Financial Investment
• Reinforcement Learning
• Drug testing
• Mechanical Engineering
• And …
Bayesian Optimization: Steps
• We have a black-box function and we don't know anything about its distribution
• We are able to sample the function, but sampling is very expensive
• We are interested in finding the maximizer (or minimizer) of the function
• Assumption:
  • Lipschitz continuity
Bayesian Optimization: Big Picture
Current Experiments → Posterior Model → Select Experiment(s) → Run Experiment(s) → back to Current Experiments
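The loop above can be sketched in a few lines. This is a minimal illustration, not the author's implementation: the objective, the scikit-learn Gaussian process, and the upper-confidence selection rule are all my own placeholder choices.

```python
# A minimal sketch of the Bayesian-optimization loop (illustrative names):
# fit a posterior model to the experiments so far, select the next
# experiment, run it, and repeat.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def expensive_experiment(x):
    """Stand-in for a costly real experiment."""
    return -(x - 0.3) ** 2

candidates = np.linspace(0, 1, 101)
X, y = [0.0, 1.0], [expensive_experiment(0.0), expensive_experiment(1.0)]

for _ in range(10):                                    # small experiment budget
    gp = GaussianProcessRegressor(alpha=1e-6).fit(np.c_[X], y)  # posterior model
    mu, sigma = gp.predict(np.c_[candidates], return_std=True)
    x_next = candidates[np.argmax(mu + 1.96 * sigma)]  # upper-confidence rule
    X.append(x_next)                                   # run the experiment
    y.append(expensive_experiment(x_next))

print(max(y))   # best value found so far
```

Each pass through the loop is one turn of the diagram: the GP is the posterior model, the argmax over the acquisition score is "Select Experiment(s)", and the function call is "Run Experiment(s)".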
Bayesian Optimization: Main Steps
• Surrogate function (response surface, posterior model)
  • Builds a posterior over unobserved points based on the prior.
  • Its parameters might be based on the prior. Remember, it is a BAYESIAN approach.
• Acquisition criterion (selection function)
  • Decides which sample should be selected next.
Surrogate Function
• Simulates the unknown function's distribution based on the prior.
• Deterministic (classical linear regression, …)
  • There is a deterministic prediction for each point x in the input space.
• Stochastic (Bayesian regression, Gaussian process, …)
  • There is a distribution over the prediction for each point x in the input space (e.g., a normal distribution).
• Example:
  • Deterministic: f(x1) = y1, f(x2) = y2
  • Stochastic: f(x1) = N(y1, 0.1), f(x2) = N(y2, 5)
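The deterministic/stochastic contrast above is easy to see in code. This is a toy illustration (data and model choices are mine): linear regression returns a single number per query point, while a Gaussian process returns a mean and a standard deviation.

```python
# Contrasting the two surrogate styles on toy data (illustrative only).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.gaussian_process import GaussianProcessRegressor

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.1, 0.9, 2.1, 2.9])
x_query = np.array([[1.5]])

# Deterministic surrogate: one prediction per point.
lin = LinearRegression().fit(X, y)
print(lin.predict(x_query))             # a single value

# Stochastic surrogate: a distribution per point.
gp = GaussianProcessRegressor(alpha=1e-6).fit(X, y)
mu, sigma = gp.predict(x_query, return_std=True)
print(mu, sigma)                        # mean and uncertainty
```

Only the stochastic surrogate gives the uncertainty estimate that the acquisition criteria on the following slides need.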
Gaussian Process(GP)
• A Gaussian process is used to build the posterior model
• The prediction at any point is a normal random variable
• The predictive variance depends only on the input locations, not on the observed outputs y
[Figure: GP posterior, highlighting points with high output expectation and points with high output variance]
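The claim that the GP variance is independent of the observed y values can be checked directly. In this sketch (illustrative, with a fixed kernel so hyperparameters are not refit from y), two GPs trained on the same inputs but very different outputs produce identical predictive standard deviations.

```python
# Checking the variance claim: same inputs, different y, same uncertainty.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

X = np.array([[0.0], [1.0], [2.0]])
x_query = np.linspace(-1, 3, 9).reshape(-1, 1)

kernel = RBF(length_scale=1.0)
# optimizer=None keeps the kernel fixed, so y cannot leak in via
# hyperparameter fitting.
gp1 = GaussianProcessRegressor(kernel=kernel, optimizer=None).fit(X, [0.0, 1.0, 0.5])
gp2 = GaussianProcessRegressor(kernel=kernel, optimizer=None).fit(X, [5.0, -2.0, 9.0])

_, std1 = gp1.predict(x_query, return_std=True)
_, std2 = gp2.predict(x_query, return_std=True)
print(np.allclose(std1, std2))   # True: uncertainty depends only on inputs
```

This is why a GP surrogate can plan exploration purely from where it has sampled, a property the acquisition criteria below exploit.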
Selection Criteria
• Maximum Mean (MM)
  • Selects the point with the highest posterior mean
  • Purely exploitative
• Maximum Upper-bound Interval (MUI)
  • Selects the point with the highest 95% upper confidence bound
  • Purely explorative
• Maximum Probability of Improvement (MPI)
  • Computes the probability that the output exceeds (1+m) times the best current observation, m > 0
  • Both explorative and exploitative
• Maximum Expected Improvement (MEI)
  • Similar to MPI but parameter-free
  • Simply computes the expected amount of improvement after sampling at any point
[Figure: points selected by MM, MUI, MPI, and MEI on an example posterior]
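The four criteria above reduce to simple formulas over the posterior mean and standard deviation at each candidate point. This sketch follows the standard Bayesian-optimization definitions; the toy mu/sigma arrays and the one-sided 95% constant (1.645) are my own choices.

```python
# Illustrative implementations of MM, MUI, MPI, and MEI from posterior
# mean (mu) and standard deviation (sigma) arrays over candidate points.
import numpy as np
from scipy.stats import norm

def select(mu, sigma, best, rule, m=0.1):
    """Return the index of the next sample under the given criterion."""
    if rule == "MM":                     # maximum mean: pure exploitation
        score = mu
    elif rule == "MUI":                  # one-sided 95% upper confidence bound
        score = mu + 1.645 * sigma
    elif rule == "MPI":                  # P(f(x) > (1+m) * best), m > 0
        score = norm.sf(((1 + m) * best - mu) / sigma)
    elif rule == "MEI":                  # expected improvement, parameter-free
        z = (mu - best) / sigma
        score = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)
    return int(np.argmax(score))

mu = np.array([0.9, 0.5, 0.2])
sigma = np.array([0.05, 0.3, 1.0])
best = 0.8
for rule in ("MM", "MUI", "MPI", "MEI"):
    print(rule, select(mu, sigma, best, rule))
```

On this toy posterior the exploitative MM picks the high-mean point while the explorative MUI picks the high-variance one, matching the characterizations on the slide.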
Bayesian Optimization: Results
Questions
[email protected]