Information Sharing and Retrieval using Locally Inferred Probabilistic Models

Paul Blomstedt and Samuel Kaski
Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University

Probabilistic (a.k.a. Bayesian) machine learning models
• Combine data and domain knowledge to extract information and added value from raw data.
• Particularly useful in problems which require data-efficient learning and/or estimates of uncertainty.
• Can be updated as additional knowledge and more data become available.
[Figure: Knowledge + Data = f]

Sharing information through models
• Recent advances in distributed probabilistic modeling (e.g. [2]) enable information sharing through models, without the need to disclose private raw data.
• Additional security guarantees can be given using techniques such as differential privacy [3].
• In large-scale problems, models are used to exchange information between distributed computational entities [2, 4].
[Figure: models f1, f2, f3, f4 and g]

Retrieval of models
• Content-based information retrieval uses measurement data instead of metadata (e.g. keywords) to find relevant data sets in a database.
• Probabilistic modeling can be used to form informative representations for retrieval.
• The retrieval task then consists of finding relevant models in the database [1].
[Figure: query Q compared with database entries A, B, C]

References
[1] Blomstedt, P., Dutta, R., Seth, S., Brazma, A. and Kaski, S. Modelling-based experiment retrieval: A case study with gene expression clustering. Bioinformatics, 32(9), 1388–1394, 2016.
[2] Gelman, A., Vehtari, A., Jylänki, P., Sivula, T., Tran, D., Sahai, S., Blomstedt, P., Cunningham, J. P., Schiminovich, D. and Robert, C. Expectation propagation as a way of life: A framework for Bayesian inference on partitioned data. arXiv preprint, arXiv:1412.4869, 2017.
[3] Heikkilä, M., Okimoto, Y., Kaski, S., Shimizu, K. and Honkela, A. Differentially Private Bayesian Learning on Distributed Data. arXiv preprint, arXiv:1703.01106, 2017.
[4] Qin, X., Blomstedt, P., Leppäaho, E., Parviainen, P. and Kaski, S. Distributed Bayesian Matrix Factorization with Minimal Communication. arXiv preprint, arXiv:1703.00734, 2017.

Aalto Digi Matchmaking 2017
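
Illustrative sketch (Sharing information through models)
The idea of exchanging models instead of raw data can be illustrated with a deliberately simple case. The sketch below is not the expectation propagation framework of [2]. It assumes a conjugate Gaussian model for an unknown mean with known noise variance, where combining local posteriors happens to be exact. The names GaussianNat, local_posterior and combine, and all numeric values, are hypothetical and chosen only for this example. Each site shares two numbers (its posterior's natural parameters), never its data.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class GaussianNat:
    """Gaussian in natural parameters (hypothetical helper type)."""
    precision: float  # 1 / variance
    shift: float      # precision * mean

    @property
    def mean(self) -> float:
        return self.shift / self.precision

    @property
    def variance(self) -> float:
        return 1.0 / self.precision


def local_posterior(data: Sequence[float], prior: GaussianNat,
                    noise_var: float) -> GaussianNat:
    """Posterior over the mean at one site, computed from its private data."""
    return GaussianNat(
        precision=prior.precision + len(data) / noise_var,
        shift=prior.shift + sum(data) / noise_var,
    )


def combine(local_models: Sequence[GaussianNat],
            prior: GaussianNat) -> GaussianNat:
    """Multiply the local posteriors and divide out the surplus prior copies.

    Every local posterior already contains the prior once, so with k sites
    the prior must be removed k - 1 times.
    """
    k = len(local_models)
    return GaussianNat(
        precision=sum(m.precision for m in local_models) - (k - 1) * prior.precision,
        shift=sum(m.shift for m in local_models) - (k - 1) * prior.shift,
    )


# Assumed toy setup: three sites, each holding private measurements.
prior = GaussianNat(precision=1.0, shift=0.0)
noise_var = 1.0
site_data = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]

# Only these small model summaries leave each site.
shared = [local_posterior(d, prior, noise_var) for d in site_data]
global_model = combine(shared, prior)
```

In this special case the combined model coincides with the posterior that would be obtained by pooling all raw data in one place. For non-conjugate models the local summaries are approximations, which is the setting addressed by the methods cited in [2] and [4].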