Machine Learning
University of Tartu, Spring 2008
Course Description: Machine learning is concerned with the development of
efficient learning algorithms that perform well on novel data. With the masses of data
available in today's world, the implementation of such algorithms, together with
rigorous statistical validation of those methods, is essential. This course covers the
development of such algorithms (basic optimization theory, support vector machines,
classification and regression, training and testing data, data preprocessing, probabilistic
methods), their validation (cross-validation, statistical evaluation methods, and
ways of comparing algorithms), and real-world applications of these methods (for
example, clustering methods in bioinformatics, web mining, and character and voice
recognition).
At the end of the course, the student will be familiar with the various subfields of machine
learning, will be able to choose objectively the methods suited to particular datasets,
and will independently complete a project related to machine learning. There will also be
literature assigned for the course, for which students will prepare reading
presentations. Finally, there will be homework and practical assignments designed to
help in understanding the material.
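As a small illustration of one of the validation methods named above, here is a minimal sketch of k-fold cross-validation in Python. The toy one-dimensional dataset and the nearest-centroid classifier are invented stand-ins for illustration only; they are not part of the course material.

```python
# A minimal sketch of k-fold cross-validation: every example is held
# out exactly once, and accuracy is averaged over the k held-out folds.
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def nearest_centroid_accuracy(train, test):
    """Fit a 1-D nearest-centroid classifier on train, score it on test."""
    groups = {}
    for x, y in train:
        groups.setdefault(y, []).append(x)
    centroids = {y: sum(xs) / len(xs) for y, xs in groups.items()}
    hits = sum(1 for x, y in test
               if min(centroids, key=lambda c: abs(x - centroids[c])) == y)
    return hits / len(test)

# Toy one-dimensional, two-class data: class 0 near 0.0, class 1 near 5.0.
rng = random.Random(1)
data = ([(rng.gauss(0.0, 1.0), 0) for _ in range(50)] +
        [(rng.gauss(5.0, 1.0), 1) for _ in range(50)])

k = 5
folds = k_fold_indices(len(data), k)
scores = []
for i in range(k):
    test = [data[j] for j in folds[i]]
    train = [data[j] for f in folds[:i] + folds[i + 1:] for j in f]
    scores.append(nearest_centroid_accuracy(train, test))

print("per-fold accuracies:", [round(s, 2) for s in scores])
print("cross-validated accuracy: %.2f" % (sum(scores) / k))
```

Because every example is used for testing exactly once, the averaged accuracy estimates performance on novel data, which is precisely the concern the course description raises.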
Schedule
Lectures + Reading Presentation: Tuesdays 12pm, Liivi 2-315
Lectures + Reading Presentation: Wednesdays 4pm, Liivi 2-402
Help Session: Thursdays, Liivi 2-402, on the following dates only: Feb 14, Feb 28,
Mar 13, Mar 27, Apr 24, May 8, May 22
Grades
The grade categories are as follows:
A: 91-100%
B: 81-90%
C: 71-80%
D: 61-70%
F: <60% (failed)
The grade for the course will be calculated as follows:
25% - Two Exams (mid-term and final)
20% - Reading presentations
25% - Homework
30% - Project
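For concreteness, here is a minimal sketch of the weighted grade computation, assuming each component is scored on a 0-100 scale; the example scores are invented.

```python
# A minimal sketch of the weighted grade computation described above,
# assuming each component is scored on a 0-100 scale.
WEIGHTS = {"exams": 0.25, "reading": 0.20, "homework": 0.25, "project": 0.30}

def final_grade(scores):
    """Return the weighted total and the corresponding letter grade."""
    total = round(sum(WEIGHTS[c] * scores[c] for c in WEIGHTS), 2)
    for cutoff, letter in [(91, "A"), (81, "B"), (71, "C"), (61, "D")]:
        if total >= cutoff:
            return total, letter
    return total, "F"

# Invented example scores, for illustration only.
print(final_grade({"exams": 85, "reading": 90, "homework": 78, "project": 92}))
# -> (86.35, 'B')
```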
Details on the course components
25% Two Exams
Mid-term exam (kontrolltöö): April 8.
Final exam: June 3.
Please note: The exams will only contain questions relevant to material covered in class and in the reading
presentations. The final exam will not be cumulative and will only concern material discussed from mid-April onwards.
20% Reading presentations
10% - Reading write-up
5%+5% - Reading presentation
You should read at least one of the relevant readings prior to each lecture. I would really like to see
active participation during class discussions. Most of the readings are available online; if you
have any difficulties accessing them, please let me know and I will help you.
You will be assigned one reading – I will try as much as possible to match the reading with
your interests and project plan, but this is not always possible. On the day of your presentation,
you must turn in a one-page summary describing the principal idea of the paper, the major
contributions of the paper, and your thoughts on possible limitations of the work and/or how the work
could be extended; this summary counts for the 10% write-up portion of your reading presentation grade.
For that same reading, you will have a 20-30 minute time slot during a class period to present it.
Note that I will not be the only one reviewing your reading presentation: your peers will
also be active participants in judging your presentation skills, so try to deliver the material
effectively. Length is not important; you should be concerned with the organization, your means
of delivery, your overall knowledge of the material, and your overall presentation. Above all, your
talk should provoke discussion and interest. Remember, I am available during the Wednesday
period for help and guidance with this.
Finally, I will also request a copy of your presentation slides so that we can post them on
the course website for easy reference.
25% Homework
These will be assigned as necessary and will contain questions about the reading material
and a few problems to solve.
30% Project
15% - Write-up
5% (2.5+2.5) – Two project-description deadlines
10% (5+5) - Presentation
You will be required to complete an original project related to machine learning.
The project is negotiable, but these are the two main possibilities:
- Practical work: The project may be related to your current research or other
work, where you implement some novel machine learning method or apply an
existing machine learning method to your work. A write-up of 3-5 pages (no more, no
less) will be due, in which you describe the problem, your implementation, your results, and
your conclusions. Note that your grade will be based on the quality of your research, not
on the results obtained.
- Research topic: You pick a reasonably broad topic in machine learning; it could be
one related to material we discussed in class, or something new. (For example, if you are
interested in bioinformatics, you might research clustering methods in bioinformatics, or
sequence algorithms in bioinformatics, or both if you wish to be broader.) You will
research that topic thoroughly, digging up all papers and book chapters that pertain to
it, and you will prepare a write-up of 8-10 pages (no more, no less). Your write-up
should include a description of the research on that topic, or its applications, and it should
also include your own thoughts and ideas on the shortcomings of current research in your
chosen topic, future goals, and other ideas you may have on extending or implementing
current tools.
Project write-ups will be due on 27th May 2008.
I will be available for guidance on the project, to provide ideas and to help you with any
implementations. To make sure you are on target, there are two project deadlines to keep
in mind, which together contribute 5% towards your project grade. These are:
March 18th 2008 – Project Proposal (one-page description)
April 29th 2008 – Project Progress (one-page description)
The last week of May (May 27-29) will be dedicated to your project
presentations. As with your reading presentation, your peers will also help me
grade your project presentation performance.
Major Calendar Deadlines
- Project Proposal – Mar 18 (2.5% of project grade)
- *** EXAM 1 – April 8 ***
- Project Progress report – Apr 29 (2.5% of project grade)
- Final Project Paper – May 27
- *** Project Presentations – May 27, 28 ***
- *** EXAM 2 – June 3 ***
Suggested Textbooks - These are available on campus.
Kernel Methods for Pattern Analysis, Nello Cristianini and John Shawe-Taylor
Pattern Recognition and Machine Learning, Christopher M. Bishop
Information Theory, Inference and Learning Algorithms, David J. C. MacKay
Learning with Kernels, Bernhard Schölkopf and Alexander J. Smola
Course outline – First half (Feb to mid-April 2008)
Below are the topics we will be working on, with the relevant literature indicated. You are
not expected to read it all! My intention is to expose you to multiple authors in the
field and to give you the opportunity to choose one reading over another. Readings marked
with * are highly recommended for the course. Readings marked with ** are to be
presented by you in class.
Introduction to Machine Learning: basic terms and some applications (Feb 12)
The Discipline of Machine Learning , Tom Mitchell
http://www.cs.cmu.edu/~tom/pubs/MachineLearning.pdf
Machine Learning, Nature Encyclopedia of Cognitive Science, Thomas G. Dietterich*
http://web.engr.oregonstate.edu/~tgd/projects/tutorials.html
Introduction to Machine Learning , Nils J. Nilsson
http://robotics.stanford.edu/people/nilsson/MLDraftBook/ch1-ml.pdf
Kernel Methods for Pattern Analysis
Chapter 1
Machine Learning and Pattern Recognition (slides)
Yann LeCun
http://cs.nyu.edu/~yann/2007f-G22-2565-001/diglib/lecture03-regularization.pdf
Optimization Theory (Feb 13, 19)
Learning with Kernels
Chapter 6
Practical Optimization: A Gentle Introduction – Chapter 1 *
John W. Chinneck, Systems and Computer Engineering, Carleton University
Ottawa, Ontario K1S 5B6, Canada
http://www.sce.carleton.ca/faculty/chinneck/po/Chapter1.pdf
Introduction to Optimization Methods: a Brief Survey of Methods **
João S. D. Garcia, Sérgio L. Ávila, and Walter P. Carpes
http://www.ewh.ieee.org/soc/e/sac/meem/vol01iss02/MEEM_opt.pdf
The Interplay of Optimization and Machine Learning Research
Kristin Bennett, Emilio Parrado-Hernandez
http://jmlr.csail.mit.edu/papers/volume7/MLOPT-intro06a/MLOPT-intro06a.pdf
Kernel Methods (Feb 20, 26)
An Introduction to Kernel-Based Learning Algorithms
Klaus-Robert Müller, Sebastian Mika, Gunnar Rätsch, Koji Tsuda, Bernhard Schölkopf
Kernel Methods for Pattern Analysis
Part I deals with the theory behind Kernels
Part III is a plethora of kernel types
Learning with Kernels
Chapter 2
Fast String Kernels using Inexact Matching for Protein Sequences **
Christina Leslie and Rui Kuang
http://jmlr.csail.mit.edu/papers/volume5/leslie04a/leslie04a.pdf
Support Vector Machines (Feb 27)
Support Vector Machines: Hype or Hallelujah
Kristin Bennett, Colin Campbell
http://www.sigkdd.org/explorations/issue2-2/bennett.pdf
Statistical learning and kernel methods
Bernhard Scholkopf
ftp://ftp.research.microsoft.com/pub/tr/tr-2000-23.pdf
Support Vector Machines – an Introduction
Ron Meir
http://www.ee.technion.ac.il/~rmeir/SVMReview.pdf
A Tutorial on Support Vector Machines for Pattern Recognition *
Christopher J. C. Burges
http://www.umiacs.umd.edu/~joseph/support-vector-machines4.pdf
*** No class on March 4th ***
Faster SVMs and SVM applications (Mar 5) [Konstantin]
SVM-light: Making Large-Scale SVM Learning Practical
Thorsten Joachims
http://www.cs.cornell.edu/People/tj/publications/joachims_99a.pdf
Training SVMs in linear time
Thorsten Joachims
http://www.cs.cornell.edu/People/tj/publications/joachims_06a.pdf
SVM-Fold: a tool for discriminative multi-class protein fold and superfamily
recognition**
Iain Melvin, Eugene Ie, Rui Kuang, Jason Weston, William Stafford Noble, and Christina
Leslie
http://www.biomedcentral.com/content/pdf/1471-2105-8-S4-S2.pdf
Support Vector Regression (Mar 11)
A Tutorial on Support Vector Regression *
Alex J. Smola, Bernhard Schölkopf
Duality, Geometry and Support Vector Regression
Jinbo Bi and Kristin Bennett
http://www.cs.rpi.edu/~bij2/rec.html
Statistical Analysis of Semi-Supervised Regression **
John Lafferty, Larry Wasserman
http://books.nips.cc/papers/files/nips20/NIPS2007_0293.pdf
Compressed Regression
Shuheng Zhou, John Lafferty, Larry Wasserman
http://books.nips.cc/papers/files/nips20/NIPS2007_0195.pdf
Ranking (Mar 12)
Optimizing Search Engines using Clickthrough Data **
Thorsten Joachims
http://www.cs.cornell.edu/people/tj/publications/joachims_02c.pdf
Pranking with Ranking
Koby Crammer, Yoram Singer (NIPS 2001)
Learning to Order Things
William W. Cohen, Robert E. Schapire, Yoram Singer
http://people.csail.mit.edu/jrennie/papers/other/cohen-order-98.pdf
The Netflix challenge, http://www.netflixprize.com/
New SVMs (Mar 18)
New Support Vector Algorithms *
B. Scholkopf, A. Smola, R. Williamson, P. Bartlett
http://www.stat.purdue.edu/~yuzhu/stat598m3/Papers/NewSVM.pdf
******** Mar 18: Project Proposal due ********
Evaluative methods (Mar 19)
On Comparing Classifiers: Pitfalls to Avoid and a Recommended Approach **
S. Salzberg
Crafting Papers in Machine Learning *
Pat Langley
http://www-csli.stanford.edu/icml2k/craft.html
Regression Error Characteristic Curves
Jinbo Bi and Kristin Bennett
http://www.cs.rpi.edu/~bij2/doc/RECcurve.pdf
Data Mining in Metric Space: An Empirical Analysis of Supervised Learning Performance
Criteria
Rich Caruana and Alexandru Niculescu-Mizil
http://www.cs.cornell.edu/~caruana/perfs.kdd04.revised.rev1.pdf
Boosting (Mar 25)
AdaBoost
Jan Sochman, Jiri Matas
http://cmp.felk.cvut.cz/~sochmj1/adaboost_talk.pdf
A short introduction to boosting *
Y. Freund, R. Schapire
An efficient boosting algorithm for combining preferences **
Y. Freund, R. Iyer, R. Schapire and Y. Singer
Principal Component Analysis and Data Visualization (Mar 26)
Kernel Methods for Pattern Analysis
Ch 6, Section 6.2
A tutorial on Principal Component Analysis *
Lindsay I. Smith
http://kybele.psych.cornell.edu/~edelman/Psych-465-Spring-2003/PCA-tutorial.pdf
The Effect of Principal Component Analysis on Machine Learning Accuracy with High
Dimensional Spectral Data **
Tom Howley, Michael G. Madden, Marie-Louise O’Connell and Alan G. Ryder
http://www.it.nuigalway.ie/m_madden/profile/pubs/kbs-2006b.pdf
Clustering (Apr 1)
Towards a statistical theory of clustering
Ulrike von Luxburg and Shai Ben-David
http://www.cs.uwaterloo.ca/~shai/LuxburgBendavid05.pdf
Support Vector Clustering
Asa Ben-Hur, David Horn, Hava T. Siegelmann, Vladimir Vapnik
http://jmlr.csail.mit.edu/papers/volume2/horn01a/rev1/horn01ar1.pdf
A sober look at clustering stability **
Shai Ben-David, Ulrike von Luxburg, and Dávid Pál
http://www.cs.uwaterloo.ca/~shai/sober.pdf
Spectral clustering and evaluating clusterings + review for exam (Apr 2)
Comparing Clusterings
Marina Meila.
Learning spectral clustering
Francis R. Bach, Michael I. Jordan.
http://cmm.ensmp.fr/~bach/nips03_cluster.pdf
A Comparison of Spectral Clustering Algorithms *
Deepak Varma and Marina Meila.
Functional Grouping of Genes Using Spectral Clustering and Gene Ontology **
Nora Speer, Holger Fröhlich, Christian Spieth and Andreas Zell
http://www.dkfz.de/mga2/gosim/GOFeatureMapsIJCNN05_submitted.pdf
******** Exam 1: April 8 ********
*** No class on April 9, 10 ***
Planned Calendar, Second half (April 15th to May 2008)
Manifold Learning (Apr 15)
Algorithms for manifold learning *
Lawrence Cayton
http://vis.lbl.gov/~romano/mlgroup/papers/manifold-learning.pdf
K nearest Neighbors and Novelty Detection (Apr 16)
K-Nearest-Neighbor Consistency in Data Clustering: Incorporating Local Information
into Global Optimization
Chris Ding and Xiaofeng He
http://delivery.acm.org/10.1145/970000/968021/p584ding.pdf?key1=968021&key2=1805861021&coll=GUIDE&dl=GUIDE&CFID=13294395&CFTOKEN=62602314
Large Margin Nearest Neighbor Classifiers
Carlotta Domeniconi, Dimitrios Gunopulos, and Jing Peng
http://ieeexplore.ieee.org/iel5/72/31443/01461432.pdf
A Linear Programming Approach to Novelty Detection **
Colin Campbell and Kristin Bennett
Active Learning (Apr 22)
Fast Kernel Classifiers with Online and Active Learning
Antoine Bordes, Seyda Ertekin, Jason Weston, Leon Bottou
http://jmlr.csail.mit.edu/papers/volume6/bordes05a/bordes05a.pdf
Summary of current work on Active Learning
Rong Jin
http://www.cse.msu.edu/~rongjin/semisupervised/sum-activelearning.pdf
Query Learning with Large Margin Classifiers
Colin Campbell, Nello Cristianini, Alex J. Smola
(URL still being investigated)
Active Learning in the Drug Discovery Process **
Manfred K. Warmuth, Gunnar Rätsch, Michael Mathieson, Jun Liao, Christian Lemmen
http://www.soe.ucsc.edu/~manfred/pubs/C60.pdf
Probabilistic Methods in Machine Learning (April 23, 29)
Information Theory, Inference and Learning Algorithms, David MacKay, Chapters 2 and 3
Learning with Kernels
Chapter 6
The Latent Process Decomposition of cDNA Microarray Data Sets **
Simon Rogers, Mark Girolami, Colin Campbell, and Rainer Breitling
http://delivery.acm.org/10.1145/1080000/1070680/n0143.pdf?key1=1070680&key2=9954842021&coll=GUIDE&dl=GUIDE&CFID=54016052&CFTOKEN=43348973
******** April 29: Project Progress report due ********
Kernel Fisher (May 6)
Learning with Kernels
Chapter 15
Asymptotic properties of the Fisher kernel
Koji Tsuda, Shotaro Akaho, Motoaki Kawanabe and Klaus-Robert Müller
http://www2.informatik.hu-berlin.de/Forschung_Lehre/wm/journalclub/pdf2268.pdf
Data fusion (May 7)
Kernel-based data fusion and its application to protein function prediction in yeast **
G.R.G. Lanckriet, M. Deng, N. Cristianini, M.I. Jordan, W.S. Noble
http://noble.gs.washington.edu/papers/lanckriet_kernel.pdf
Kernel-based data fusion for gene prioritization **
Tijl De Bie, Leon-Charles Tranchevent, Liesbeth M. M. van Oeffelen and Yves Moreau
http://bioinformatics.oxfordjournals.org/cgi/content/full/23/13/i125
Data Integration for Classification problems Employing Gaussian Process Priors **
Mark Girolami and Mingjun Zhong
http://books.nips.cc/papers/files/nips19/NIPS2006_0206.pdf
Multi-Task Learning (May 13)
Learning Multiple Tasks with Kernel Methods
Theodoros Evgeniou, Charles Micchelli, Massimiliano Pontil
http://www.cs.berkeley.edu/~russell/classes/cs294/f05/papers/evgeniou+al-2005.pdf
Regularized Multi-Task Learning
Theodoros Evgeniou and Massimiliano Pontil
http://www.cs.ucl.ac.uk/staff/M.Pontil/reading/mt-kdd.pdf
Biological applications (May 14)
Feature Selection Methods for Improving Protein Structure Prediction with Rosetta **
Ben Blum, Michael Jordan, David Kim, Rhiju Das, Philip Bradley, David Baker
http://books.nips.cc/papers/files/nips20/NIPS2007_1055.pdf
Typing Staphylococcus aureus Using the spa Gene and Novel Distance Measures
P. Agius, B. Kreiswirth, S. Naidich, K. Bennett
Protein network inference from multiple genomic data: a supervised approach **
Y. Yamanishi, JP Vert, M Kanehisa
http://bioinformatics.oxfordjournals.org/cgi/reprint/20/suppl_1/i363
Other applications (May 20)
Finding Language-Independent Semantic Representation of Text Using Kernel Canonical
Correlation Analysis **
Alexei Vinokourov, John Shawe-Taylor, Nello Cristianini
Video Deconstruction: Recovering Scene Structures in Movies **
Timothee Cour, Ben Taskar
(paper currently under review, link to be provided later)
Concluding remarks, brainstorming on the future of ML, review for exam (May 21)
*** Final Project Paper due – May 27 ***
*** Project Presentations - May 27, 28 ***
******** Exam 2: June 3 ********