Information Sharing and Retrieval using
Locally Inferred Probabilistic Models
Paul Blomstedt and Samuel Kaski
Helsinki Institute for Information Technology HIIT,
Department of Computer Science, Aalto University
[email protected]
Probabilistic (a.k.a. Bayesian) machine learning models
• Combine data and domain knowledge to extract information and added value from raw data.
• Particularly useful in problems which require data-efficient learning and/or estimates of uncertainty.
• Can be updated as additional knowledge and more data become available.
[Figure: Knowledge + Data = f]
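The updating property in the last bullet can be illustrated with a minimal sketch. It assumes a Beta prior on the success probability of binary observations, a conjugate model chosen here purely for illustration; the class name `Beta` and the example data are hypothetical and do not come from the poster. Updating batch by batch gives the same posterior as updating once on all data, and the posterior variance serves as the uncertainty estimate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Beta:
    """Beta(a, b) distribution over a success probability (illustrative model)."""
    a: float
    b: float

    def update(self, data):
        # Conjugate update: add observed successes to a and failures to b.
        successes = sum(data)
        return Beta(self.a + successes, self.b + len(data) - successes)

    @property
    def mean(self):
        return self.a / (self.a + self.b)

    @property
    def variance(self):
        n = self.a + self.b
        return self.a * self.b / (n * n * (n + 1))


# Prior encodes domain knowledge: success probability believed to be near 0.5.
prior = Beta(2.0, 2.0)

# Hypothetical binary observations arriving in two batches.
first_batch = [1, 0, 1, 1]
second_batch = [1, 1, 0, 1, 1, 1]

# Updating as data arrive ...
sequential = prior.update(first_batch).update(second_batch)
# ... matches a single update on all data at once.
batch = prior.update(first_batch + second_batch)

print(sequential == batch)
print(prior.mean, prior.variance)
print(sequential.mean, sequential.variance)
```

In this sketch the posterior of one step simply becomes the prior of the next, which is what makes incremental updating possible without revisiting earlier data.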
Sharing information through models
• Recent advances in distributed probabilistic modeling (e.g. [2]) enable information sharing through models, without the need to disclose private raw data.
• Additional security guarantees can be given using
techniques such as differential privacy [3].
• In large-scale problems, models are used to exchange
information between distributed computational entities
[2, 4].
[Figure: models f1, f2, f3, f4 and g]
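As a rough sketch of sharing information through models rather than raw data, the example below assumes a Gaussian mean with known noise variance and a Gaussian prior. Each site computes a local posterior and shares only its two natural parameters; a coordinator sums them and removes the extra copies of the prior. This is a simple exact combination rule for a conjugate model, not the expectation propagation framework of [2], and the function names `local_posterior` and `combine` as well as the simulated data are hypothetical.

```python
import numpy as np


def local_posterior(x, prior_mean, prior_var, noise_var):
    """Natural parameters (precision, precision * mean) of one site's posterior."""
    precision = 1.0 / prior_var + len(x) / noise_var
    shift = prior_mean / prior_var + np.sum(x) / noise_var
    return np.array([precision, shift])


def combine(local_params, prior_mean, prior_var):
    """Combine local posteriors; the prior was counted once per site, so remove k - 1 copies."""
    prior = np.array([1.0 / prior_var, prior_mean / prior_var])
    k = len(local_params)
    precision, shift = np.sum(local_params, axis=0) - (k - 1) * prior
    return shift / precision, 1.0 / precision  # posterior mean, posterior variance


prior_mean, prior_var, noise_var = 0.0, 10.0, 1.0

# Simulated private data sets held at three sites (assumed for illustration).
rng = np.random.default_rng(0)
sites = [rng.normal(1.5, 1.0, size=n) for n in (20, 35, 50)]

# Only two numbers per site leave the site, never the raw observations.
shared = [local_posterior(x, prior_mean, prior_var, noise_var) for x in sites]
combined_mean, combined_var = combine(shared, prior_mean, prior_var)

# Reference: posterior computed as if all raw data had been pooled.
pooled_precision, pooled_shift = local_posterior(
    np.concatenate(sites), prior_mean, prior_var, noise_var
)
pooled_mean, pooled_var = pooled_shift / pooled_precision, 1.0 / pooled_precision

print(combined_mean, combined_var)
print(pooled_mean, pooled_var)
```

For this conjugate model the combined result equals the pooled posterior exactly; for non-conjugate models an approximation scheme would be needed. Sharing model parameters alone gives no formal privacy guarantee, which is where techniques such as differential privacy [3] come in.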
Retrieval of models
• Content-based information retrieval uses measurement data instead of meta-data (e.g. key-words) to find relevant data sets in a database.
• Probabilistic modeling can be used to form informative
representations for retrieval.
• The retrieval task then consists in finding relevant models in the database [1].
[Figure: query Q compared against database entries A, B and C]
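One simple way to picture retrieval of models is sketched below. It assumes that each database entry is summarized by a univariate Gaussian fitted to its data and that entries are ranked by the average log-likelihood they assign to the query data. This scoring rule, the helper names, and the toy data are illustrative assumptions and not necessarily the method of [1].

```python
import math


def fit_gaussian(data):
    """Summarize a data set by a Gaussian model (mean, std) using maximum likelihood."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n
    return mean, math.sqrt(var)


def avg_log_likelihood(query, model):
    """Average log-density of the query points under one stored model."""
    mean, std = model
    total = sum(
        -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)
        for x in query
    )
    return total / len(query)


def retrieve(query, database):
    """Return database keys ordered from most to least relevant to the query."""
    scores = {name: avg_log_likelihood(query, model) for name, model in database.items()}
    return sorted(scores, key=scores.get, reverse=True)


# The database stores fitted models, not the raw data sets (toy values).
database = {
    "A": fit_gaussian([0.1, -0.2, 0.3, 0.0, -0.1]),
    "B": fit_gaussian([4.8, 5.1, 5.3, 4.9, 5.0]),
    "C": fit_gaussian([9.7, 10.2, 10.1, 9.9, 10.3]),
}

query = [5.2, 4.7, 5.0]
ranking = retrieve(query, database)
print(ranking)
```

In this toy setup the query lies closest to the data behind entry B, so B is ranked first; the order of the remaining entries depends on both the distance to the query and the fitted variances.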
References
[1] Blomstedt, P., Dutta, R., Seth, S., Brazma, A. and Kaski, S. Modelling-based experiment retrieval: A case study with gene expression clustering. Bioinformatics, 32(9), 1388–1394, 2016.
[2] Gelman, A., Vehtari, A., Jylänki, P., Sivula, T., Tran, D., Sahai, S., Blomstedt, P., Cunningham, J. P., Schiminovich, D. and Robert, C. Expectation propagation as a way of life: A framework for Bayesian inference on partitioned data. arXiv preprint, arXiv:1412.4869, 2017.
[3] Heikkilä, M., Okimoto, Y., Kaski, S., Shimizu, K. and Honkela, A. Differentially Private Bayesian Learning on Distributed Data. arXiv preprint, arXiv:1703.01106, 2017.
[4] Qin, X., Blomstedt, P., Leppäaho, E., Parviainen, P. and Kaski, S. Distributed Bayesian Matrix Factorization with Minimal Communication. arXiv preprint, arXiv:1703.00734, 2017.
Aalto Digi Matchmaking 2017