Supplementary Material for Online Publication Only
Appendix S1: Bayes' theorem in more detail
Bayes' theorem states that the posterior distribution of the mean score, µ, after observing the data, y, is given by p(µ | y). The posterior is proportional to the probability distribution of the data given the mean score, p(y | µ), times the prior distribution of the mean score itself, p(µ). The proportionality arises because the denominator of Bayes' theorem, p(y), does not contain any model parameters of interest and can therefore be ignored. Bayes' theorem is then given by p(µ | y) ∝ p(y | µ)p(µ). In words, our prior knowledge is moderated by the current data to yield updated knowledge in the form of the posterior distribution.
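Spelled out in full, with the normalizing constant that the proportional form ignores, Bayes' theorem for this example reads

p(µ | y) = p(y | µ) p(µ) / p(y), with p(y) = ∫ p(y | µ) p(µ) dµ,

so dividing by p(y) merely rescales the numerator so that the posterior integrates to one.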

To obtain estimates for the posterior distribution one can use MCMC methods. There are a number of MCMC algorithms available, including the Gibbs sampler, which is the default algorithm in most software. The Gibbs sampler makes use of an iterative process in which all parameters of the model (e.g., means, variances, regression parameters) are repeatedly estimated. These repeated estimations can be summarized by plotting the results obtained in each iteration. The resulting distribution can subsequently be used to compute the posterior mean or posterior probability intervals.
Consider that the goal is to obtain the joint posterior distribution of two model parameters. In our example, this might be the mean and variance of the reading score, or it could be two regression coefficients from a multiple regression model. For simplicity, let's refer to these parameters generically as θ1 and θ2. The Gibbs sampler begins by drawing a value from the conditional distribution of θ1 given θ2. The value of θ2 is set to some arbitrary starting value to get the algorithm started. With the starting value set, a draw from the distribution of θ1 given the starting value of θ2 is obtained. The obtained value of θ1 is then used to draw a new value of θ2 given θ1. The Gibbs sampler continues to iteratively draw samples using the previously obtained values until two long chains of values, one for each parameter, are formed. It is common for the first m of the total set of samples to be dropped; these are referred to as the burn-in samples. The remaining samples are then considered to be draws from the marginal posterior distributions of θ1 and θ2. Of course, the Gibbs sampler can be extended to virtually all common statistical models used in child development research, such as multilevel models, structural equation models, and growth models. Most available algorithms for MCMC in general, and Gibbs sampling in particular, allow multiple chains to be specified. In principle, this allows sampling from a greater range of locations within the posterior distribution. The theory behind MCMC sampling is that the results for multiple chains should, after a large number of iterations, converge to the same marginal distribution of the model parameters.
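To make the mechanics concrete, below is a minimal sketch of a Gibbs sampler for the mean and precision of a set of normally distributed scores, written in Python with NumPy. The data, prior values, starting values, and the use of conjugate normal and gamma conditional distributions are our own illustrative assumptions; they are not the analyses reported in the paper.

import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(100, 15, size=20)           # stand-in data; the real scores differ
n, ybar = len(y), y.mean()

# Hypothetical normal prior on the mean and gamma prior on the precision
m0, p0 = 80.0, 0.01                        # prior mean and prior precision for mu
a0, b0 = 0.01, 0.01                        # gamma shape and rate for tau

def gibbs(n_iter, mu_start, tau_start):
    mu, tau = mu_start, tau_start
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        # Draw mu from its conditional distribution given the current tau
        prec = p0 + n * tau
        mean = (p0 * m0 + n * tau * ybar) / prec
        mu = rng.normal(mean, 1 / np.sqrt(prec))
        # Draw tau from its conditional distribution given the new mu
        a = a0 + n / 2
        b = b0 + 0.5 * np.sum((y - mu) ** 2)
        tau = rng.gamma(a, 1 / b)          # NumPy's gamma uses a scale parameter
        draws[t] = mu, tau
    return draws

chain1 = gibbs(2000, mu_start=10.0, tau_start=1.0)    # arbitrary starting values
chain2 = gibbs(2000, mu_start=150.0, tau_start=0.1)

Each row of chain1 (and chain2) holds one iteration's draw of the mean and the precision; summarizing these rows is what the software does when it reports posterior results.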
In Figure S1, the result of the Gibbs sampler for the mean reading skills score is displayed in what is called a trace-plot. The parameter estimate is displayed on the y-axis and the iterations of the Gibbs sampler on the x-axis, in our case 200 iterations. Starting values have to be provided; in our example the starting value of the parameter was arbitrarily set at 10, which is why the chain in the figure starts at the value of 10. The closer the starting value is to the posterior mean, the faster the model converges. Starting values can be set manually or determined by the software.
In the second iteration the value increases, and after iteration 10 the values of the
mean reading skills score fluctuate around the value 102. Because the starting value is quite
low, it takes the Gibbs sampler a couple of iterations to get close to the stationary distribution
that represents the likely population mean.
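A trace-plot like Figure S1 can be drawn directly from the stored draws. The sketch below does this for the illustrative chain generated in the earlier Gibbs sampler sketch, using matplotlib (our choice here; Mplus and other packages produce such plots automatically).

import matplotlib.pyplot as plt

# 'chain1' is the (n_iter x 2) array of draws from the earlier illustrative sketch;
# column 0 holds the sampled values of the mean.
plt.plot(chain1[:, 0])
plt.xlabel("Iteration of the Gibbs sampler")
plt.ylabel("Sampled value of the mean")
plt.title("Trace-plot")
plt.show()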
To remove the influence of the starting values, which are chosen arbitrarily, we omit
the first part of the Gibbs sampler; this is called the burn-in phase. By default, Mplus, for
example, omits the first half of the iterations and uses only the values from the right hand side
of the vertical line to construct the posterior distribution. If we simply plot a histogram of the
obtained values of each iteration, i.e. each value on the right hand side of the vertical line in
Figure S1, we get Figure S2. In Figure S2 we used 50, 200, 2,000, and 20,000 iterations, respectively. As can be seen, the more iterations are used, the more accurate the histogram becomes. Based on this histogram the software can plot a smooth line, as in Figure S3, that approximates the posterior distribution. Again, the more iterations, the better the posterior distribution is approximated. Using this plot, we can see that the posterior mean of the reading skill scores is 102 and the 95% posterior probability interval (PPI) is 96-103.
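In the same illustrative setup, discarding the first half of the iterations as burn-in and summarizing the remaining draws might look as follows. The 50% burn-in fraction mirrors the Mplus default mentioned above; the numbers this toy example produces will not match the 102 and 96-103 reported for the real data.

import numpy as np

# 'chain1' is the array of draws from the earlier illustrative sketch.
burn_in = chain1.shape[0] // 2              # omit the first half of the iterations
mu_draws = chain1[burn_in:, 0]

posterior_mean = mu_draws.mean()
ppi_lower, ppi_upper = np.percentile(mu_draws, [2.5, 97.5])
print(f"Posterior mean: {posterior_mean:.1f}")
print(f"95% PPI: [{ppi_lower:.1f}, {ppi_upper:.1f}]")

# A histogram of mu_draws corresponds to Figure S2; a kernel density
# estimate of the same draws corresponds to the smooth curve in Figure S3.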
After running enough iterations, the Gibbs sampler should converge to the posterior distribution of interest (Sinharay, 2004). Typically, the more parameters that need to be estimated, the more iterations are required. This raises the question of how many iterations to use and, relatedly, how to determine convergence of the statistical model. The decision about whether a chain has converged can be based on statistical criteria, but should always be accompanied by a visual inspection of the trace-plot, as will become clear below. Although Sinharay (2004) and others (see, e.g., Brooks and Roberts, 1998) discuss several diagnostic tools to determine convergence, there is no consensus about which statistical criterion should be considered the 'best' one. It is beyond the scope of the current paper to discuss all possible criteria; here we simply use the trace-plot to visually determine convergence.
To inspect convergence we can rerun our model, but instead of requesting one chain of the Gibbs sampler we now request two chains. Two independent Gibbs samplers are then computed at the same time. The new trace-plot of the mean reading skill score is displayed in Figure S4. The two lines represent the two chains of the Gibbs sampler, which run in parallel but are independent. To determine whether a model has converged, one should check the stability of the generated parameter values. In Figure S1, with only one chain, we concluded that after iteration 10 the parameter values reach a stable pattern. If we look at Figure S4, with two chains, we can observe that both chains reach a stable pattern after just a couple of iterations. Also, both chains are relatively similar. This indicates that the model has reached convergence. In Figure S5A, a trace plot is displayed where convergence is an issue, because the chains are not very similar until iteration 460. The other trace plots in Figure S5 are also examples of non-converging models. In conclusion, the stability of the generated parameter values can be visually checked by running multiple chains and observing from which iteration onwards the generated parameter values display a stable pattern.
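The visual check described above can be complemented with the Gelman-Rubin criterion mentioned in Appendix S2, which compares between-chain and within-chain variability. Below is a rough sketch of the basic (non-split) version of this diagnostic, applied to the two illustrative chains from the earlier sketch; values close to 1 suggest convergence. Actual software uses refined versions of this formula.

import numpy as np

def gelman_rubin(chains):
    """Basic potential scale reduction factor for equal-length one-dimensional chains."""
    chains = np.asarray(chains)                    # shape: (number of chains, iterations)
    n = chains.shape[1]
    chain_means = chains.mean(axis=1)
    within = chains.var(axis=1, ddof=1).mean()     # W: mean within-chain variance
    between = n * chain_means.var(ddof=1)          # B: between-chain variance
    var_hat = (n - 1) / n * within + between / n   # pooled estimate of the posterior variance
    return np.sqrt(var_hat / within)

# Apply to the mean parameter (column 0) of the two chains, after burn-in.
half = chain1.shape[0] // 2
r_hat = gelman_rubin([chain1[half:, 0], chain2[half:, 0]])
print(f"R-hat for the mean: {r_hat:.3f}")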
After inspecting convergence, one can use the posterior distribution to draw conclusions. The PPI can be used to determine 'significance' (e.g., whether zero lies outside the interval) or to determine whether there is overlap with the intervals of other parameters.
Figure S1. The chain of the Gibbs sampler for the reading skills scores.
Figure S2. The information of each iteration of the Gibbs sampler for the IQ score summarized in a histogram. In Figure S2A we used only 50
iterations, whereas in Figures S2B, S2C, and S2D we used 200, 2,000, and 20,000 iterations, respectively.
Figure S3. Kernel density plot.
Figure S4. Two chains specified for the Gibbs sampler of the IQ score.
Figure S5. Examples of chains specified for the Gibbs sampler where convergence is an issue.
Appendix S2: Bayesian statistics in Mplus
For an introduction to the non-Bayesian Mplus syntax we refer to Geiser (2013).
Mplus syntax
Explanation
DATA: FILE IS dataCOG.dat;
Select the data file.
VARIABLE:
NAMES ARE COGscore group;
-Provide variable names to all columns of the data file.
USEVARIABLES IS COGscore;
-Select those variables that will be used in the model.
ANALYSIS:
ESTIMATOR IS Bayes;
-To specify the Bayesian estimator.
POINT IS mean/median/mode;
-To select the point estimate which is shown in the results. The default
setting is median.
BCONVERGENCE IS .05;
-To specify the value of the Gelman-Rubin convergence criterion. The
default is .05, but we recommend using .01.
BITERATIONS IS a (b);
-To specify the maximum (a) and minimum (b) number of iterations for each chain of the MCMC procedure (only available in combination with the Gelman-Rubin convergence criterion). The number between brackets refers to the minimum number of iterations. For example, when BITERATIONS = 50000 (20000) is specified, the MCMC procedure runs for a minimum of 20,000 and a maximum of 50,000 iterations; after the minimum number of iterations is reached, convergence is determined by the Gelman-Rubin criterion.
FBITERATIONS IS #;
-To specify the number of iterations manually (change # to the required number of iterations).
CHAINS IS #;
-Mplus offers the option to run independent chains of the MCMC procedure. The default setting is 2 chains. It is especially advantageous to combine CHAINS with the option PROCESSORS IS #; this saves computational time because each chain runs on a different processor of the computer.
BSEED IS #;
-Since MCMC procedures are based on random sampling (from the prior and posterior distribution), you might get slightly different results every time you run the analysis (or when running it on a different computer). If a value for the BSEED is given, the same sequence of random values is used and the results are always the same. This option can also be useful when models do not reach convergence; starting the MCMC process from a different seed might help.
STVALUES IS ml;
-Remember that the very first iteration of the MCMC process is based on (arbitrarily chosen) starting values. To solve issues with convergence or to speed up computation, these starting values can be chosen less arbitrarily. This can be done by hand using the * after each model statement. Alternatively, one can ask Mplus to first run the model using maximum likelihood (ML) estimation and to use these results as starting values for the Bayesian model.
THIN IS #;
-The THIN option is used to specify which iterations of the MCMC procedure are used for constructing the posterior distribution. When a chain is mixing poorly, with high autocorrelations, the estimation can be based on every k-th iteration (for example, every 10th iteration) rather than every iteration. This is referred to as thinning.
MODEL: [COGscore] (p1);
-Here the statistical model needs to be specified. In our simple example we
only requested the mean score by specifying [COGscore]. To specify a prior
distribution for the mean score we need a label for the parameter, in our
case the label (p1).
MODEL PRIORS:
p1 ~ N(#a, #b);
-To specify prior distributions for the parameters labeled in the MODEL statement, one can change the default prior distribution here. In our example a normal prior distribution for the mean of COGscore is specified: 'N' refers to a normal distribution, '#a' refers to the prior mean, and '#b' refers to the prior precision (see the note on precisions directly after this table).
OUTPUT:
STAND;
-To request standardized results.
CINTERVAL;
-To request posterior probability intervals (PPIs).
PLOT: TYPE IS PLOT3;
-To request the plots (trace plots, histograms, and kernel density plots).
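Note on precisions. As mentioned in the MODEL PRIORS row above, the spread of a normal prior may be expressed as a precision rather than a variance. The relation between the two is simply the reciprocal:

precision = 1 / variance,

so an illustrative prior standard deviation of 10 corresponds to a variance of 10² = 100 and a precision of 1/100 = .01 (these numbers are an example, not taken from the analyses reported in the paper).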
Appendix S3: Bayesian statistics in WinBugs
For a more detailed introduction to the WinBugs syntax we refer to Ntzoufras (2011).
WinBugs syntax
Explanation
y[]
This is how the data file is specified.
106.2451019
130.0098114
105.0036545
108.0337601
100.6291046
96.66761017
107.7711563
102.5716476
96.08522034
119.5092163
72.42178345
130.16362
92.96968079
95.4822998
105.3284225
93.37817383
87.68320465
122.9314728
78.78004456
88.33502197
END
model{
-This is the syntax to specify the statistical model.
for (i in 1:20) {
-There are 20 individuals.
y[i] ~ dnorm(beta[i], tau)
-The dependent variable is normally distributed with mean beta[i] and precision tau.
beta[i] <- mu}
-Beta consists of an intercept, mu, which is (without any predictors in the model) equal to the mean of our dependent variable.
mu ~ dnorm(80,.01)I(40,180)
-Mu (the mean of the dependent variable) has a normal prior distribution with a mean of 80 and a prior precision of .01, and the prior is restricted to values between 40 and 180.
tau ~ dgamma(0.01, 0.01)
-The precision of the dependent variable, tau, has a gamma prior distribution (equivalent to an inverse gamma prior on the variance).
}
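As a reading aid, the model specified by the WinBugs code above can be written out as follows (recall that the second argument of dnorm in WinBugs is a precision, the inverse of a variance, so a precision of .01 corresponds to a variance of 100):

y_i ~ Normal(µ, 1/τ) for i = 1, ..., 20
µ ~ Normal(80, 1/0.01) truncated to the interval [40, 180]
τ ~ Gamma(0.01, 0.01)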
Appendix S4: Bayesian statistics in AMOS
For a more detailed introduction to the Bayesian options in AMOS we refer to the user’s guide page 385
(see
ftp://ftp.software.ibm.com/software/analytics/spss/documentation/amos/20.0/en/Manuals/IBM_SPSS_Amos_User_Guide.pdf).
AMOS
Explanation
To obtain Bayesian estimation, press the Bayesian estimation button. A new screen will appear and the Gibbs sampler starts running immediately. If the smiley-face indicator looks happy, the model has reached convergence according to the Gelman-Rubin criterion. The posterior results are displayed in the table.