A Java Toolbox for Analysis of MassIve Data
STreams using Probabilistic Graphical
Models
Andrés R.
1
Masegosa ,
Thomas D.
2
Nielsen ,
Antonio
2
1
Department of Computer and
Information Science, NTNU, Norway
Ana M.
2
Martı́nez ,
3
Salmerón ,
Darı́o
Rafael
3
Ramos-Lopez ,
2
Cabañas
Helge
1
Langseth ,
2,4
Madsen
& Anders L.
4
3
Hugin Expert A/S,
Aalborg, Denmark
Department of Mathematics,
University of Almerı́a, Spain
Department of Computer Science,
Aalborg University, Denmark
Presentation
Academic and Industrial partners
Dataminingframeworks
Sta4onary
datasets
Weka
RLibs
Matlab
PGMs
Elvira
Datastreams
AMIDST
Infer.net
Hugin
MOA
ApacheSAMOA
MLlib|Apache
Spark/Flink
VowpalWabbit
Description
• Distributed parallel algorithms: AMIDST provides parallel multi-core and distributed implementations of Bayesian parameter learning, using streaming variational
Bayes and variational message passing.
Maximum
Likelihood
• Analysis of big data streams: A complete collection of algorithms for inference and
learning of both static and dynamic Bayesian networks from streaming data. Existing
software systems for PGMs only focus on stationary datasets.
• NaïveBayes
• TAN
• (G)AODE/HODE
Bayesian
learning
STATIC
•
•
•
•
•
•
DYNAMIC
• DynamicNB
GaussianDiscriminantAnalysis
LatentClassifica?onModels(LCM)
GaussianMixtures
BayesianLinearRegression
FactorAnalysis
MixtureofFA
•
•
•
•
•
•
•
DynamicLCM
HiddenMarkovModel(HMM)
KalmanFilter(KF)
SwitchingKF
FactorialHMM
Auto-regressiveHMM
Input-OutputHMM
Main Features
Latent variable models
Integration
Big Data
Modularity
Open source
●
8
Java 8 based
6
C
●
G50
●
●
6000
Correlated with Unemployment Rate
5000
4000
Jun 2007
Sep 2007
Dec 2007
Mar 2008
Jun 2008
Sep 2008
Dec 2008
Mar 2009
Jun 2009
Sep 2009
Dec 2009
Mar 2010
Jun 2010
Sep 2010
Dec 2010
Mar 2011
Jun 2011
Sep 2011
Dec 2011
Mar 2012
Jun 2012
Sep 2012
Dec 2012
Mar 2013
Jun 2013
Sep 2013
Dec 2013
Mar 2014
2000
3000
Ht variable
//We create a SVB object
SVB parameterLearningAlgorithm = new SVB();
//We can activate the output
parameterLearningAlgorithm.setOutput(true);
30
Concept drift
//We can open the data stream using the static class DataStreamLoader
DataStream<DataInstance> data = DataStreamLoader.openFromFile("datasets/simulated/
WasteIncineratorSample.arff");
//We fix the size of the window
parameterLearningAlgorithm.setWindowsSize(100);
25
Use-case: Risk prediction in credit operations
Learn hidden naive Bayes model from data stream
//We fix the DAG structure
parameterLearningAlgorithm.setDAG(DAGGenerator.getHiddenNaiveBayesStructure(data.
getAttributes(),"GlobalHidden", 2));
15
20
Number of cores
6000
Code example
10
5000
5"
5
4000
0
3000
…"
2000
G1
Ht variable
M50
2
…"
●
0
M1
GBs/hour
4
LG
LM
●
10
15
20
25
Unemployement rate
//We set the data which is going to be used for leaning the parameters
parameterLearningAlgorithm.setDataStream(data);
//We perform the learning
parameterLearningAlgorithm.runLearning();
//And we get the model
BayesianNetwork bnModel = parameterLearningAlgorithm.getLearntBayesianNetwork();
And much more...
amidst.eu
amidst.github.io/toolbox/
//We print the model
System.out.println(bnModel.toString());
AMIDST project has received funding from the European Union’s Seventh Framework Programme
for research, technological development and demonstration under grant agreement no 619209.
30
35
© Copyright 2026 Paperzz