Supplementary Material for Online Publication Only

Appendix S1: Bayes' theorem in more detail

Bayes' theorem states that the posterior distribution of the mean score, µ, after observing the data, y, is given by p(µ | y). The posterior is proportional to the probability distribution of the data given the mean score, p(y | µ), times the prior distribution of the mean score itself, p(µ). The proportionality arises because the denominator of Bayes' theorem, p(y), does not contain any model parameters of interest and can therefore be ignored. Bayes' theorem is then given by p(µ | y) ∝ p(y | µ) p(µ). In words, our prior knowledge is moderated by the current data to yield updated knowledge in the form of the posterior distribution.

To obtain estimates of the posterior distribution one can use MCMC methods. A number of MCMC algorithms are available, including the Gibbs sampler, which is the default algorithm in most software. The Gibbs sampler uses an iterative process in which all parameters of the model (e.g., means, variances, regression parameters) are repeatedly estimated. These repeated estimates can be summarized by plotting the result obtained in each iteration, and the resulting distribution can subsequently be used to compute the posterior mean or posterior probability intervals.

Consider that the goal is to obtain the joint posterior distribution of two model parameters. In our example, this might be the mean and variance of the reading score, or it could be two regression coefficients from a simple multiple regression model. For simplicity, let's refer to these parameters generically as θ1 and θ2. The Gibbs sampler begins by drawing a value from the conditional distribution of θ1 given θ2, where θ2 is set to some arbitrary starting value to get the algorithm started. With the starting value set, a draw from the distribution of θ1 given the starting value of θ2 is obtained. The obtained value of θ1 is then used to obtain a new value of θ2 given θ1. The Gibbs sampler continues to draw samples iteratively, using previously obtained values, until two long chains of values for both parameters are formed. It is common for the first m of the total set of samples to be dropped; these are referred to as the burn-in samples. The remaining samples are then considered to be draws from the marginal posterior distributions of θ1 and θ2. Of course, the Gibbs sampler can be extended to virtually all common statistical models used in child development research, such as multilevel models, structural equation models, and growth models.

Most available algorithms for MCMC generally, and for Gibbs sampling specifically, allow multiple chains to be specified. In principle, this allows sampling from a greater range of locations within the posterior distribution. The theory behind MCMC sampling is that the results for multiple chains should, after a large number of iterations, converge to the same marginal distribution of the model parameters.
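To make these steps concrete, the sketch below implements a Gibbs sampler for exactly this two-parameter case, the mean and the variance of a set of scores, in Python. It is a minimal illustration only: the simulated data, the conjugate normal and inverse-gamma priors, the starting values, and the number of iterations are our own assumptions and are not taken from the example analysis.

```python
# Minimal Gibbs sampler for the mean (mu) and variance (sigma^2) of a set of
# scores, mirroring the two-parameter example in the text. Data, priors and
# starting values are illustrative assumptions, not results from the paper.
import numpy as np

rng = np.random.default_rng(seed=1)
y = rng.normal(loc=102, scale=15, size=20)          # simulated "reading scores"
n, ybar = len(y), y.mean()

# Priors: mu ~ N(mu0, tau0^2), sigma^2 ~ Inverse-Gamma(a0, b0)
mu0, tau0_sq = 100.0, 100.0
a0, b0 = 0.01, 0.01

n_iter, burn_in = 2000, 1000
mu, sigma_sq = 10.0, 1.0                            # arbitrary starting values
draws = np.empty((n_iter, 2))

for t in range(n_iter):
    # 1) Draw mu from its conditional distribution given the current sigma^2
    post_var = 1.0 / (1.0 / tau0_sq + n / sigma_sq)
    post_mean = post_var * (mu0 / tau0_sq + n * ybar / sigma_sq)
    mu = rng.normal(post_mean, np.sqrt(post_var))
    # 2) Draw sigma^2 from its conditional distribution given the new mu
    a_n = a0 + n / 2.0
    b_n = b0 + 0.5 * np.sum((y - mu) ** 2)
    sigma_sq = 1.0 / rng.gamma(a_n, 1.0 / b_n)      # Inverse-Gamma draw
    draws[t] = mu, sigma_sq

kept = draws[burn_in:]                              # discard burn-in samples
print("posterior mean of mu:", kept[:, 0].mean())
print("95% PPI for mu:", np.percentile(kept[:, 0], [2.5, 97.5]))
```

The two draws inside the loop are the "θ1 given θ2" and "θ2 given θ1" steps described above; dropping the first half of the stored draws corresponds to the burn-in phase discussed next.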
In Figure S1, the result of the Gibbs sampler for the mean reading skills score is displayed in what is called a trace-plot. The parameter estimate is displayed on the y-axis and the iterations of the Gibbs sampler on the x-axis, in our case 200 iterations. Starting values have to be provided. In our example the starting value of the parameter was arbitrarily set at 10, which is why, in the figure, the chain starts at the value of 10. The closer the starting value is to the posterior mean, the faster the model converges. Starting values can be set manually or determined by the software. In the second iteration the value increases, and after iteration 10 the values of the mean reading skills score fluctuate around the value 102. Because the starting value is quite low, it takes the Gibbs sampler a couple of iterations to get close to the stationary distribution that represents the likely population mean. To remove the influence of the starting values, which are chosen arbitrarily, we omit the first part of the Gibbs sampler; this is called the burn-in phase. By default, Mplus, for example, omits the first half of the iterations and uses only the values to the right of the vertical line to construct the posterior distribution.

If we simply plot a histogram of the values obtained in each iteration, i.e., each value to the right of the vertical line in Figure S1, we get Figure S2. In Figure S2 we used 50, 200, 2,000 and 20,000 iterations, respectively. As can be seen, the more iterations are used, the higher the accuracy of the histogram. Based on this histogram the software can plot a smooth line, as in Figure S3, that is an approximation of the posterior distribution. Again, the more iterations, the higher the accuracy of the histogram and the better the posterior distribution is approximated. Using this plot, we can see that the posterior mean of the reading skill scores is 102 and the 95% posterior probability interval (PPI) is 96-103.

After running enough iterations, the Gibbs sampler should converge to the posterior distribution of interest (Sinharay, 2004). Typically, the more parameters that need to be estimated, the more iterations are required. The question is how many iterations to use and, as such, how to determine convergence of our statistical model. The decision about whether a chain has converged can be based on statistical criteria, but should always be accompanied by a visual inspection of the trace-plot, as will become clear below. Although Sinharay (2004) and others (see, e.g., Brooks and Roberts, 1998) discuss several diagnostic tools to determine convergence, there is no consensus on which statistical criterion can be considered the 'best' one. The scope of the current paper is not to discuss all possible criteria, and we will simply use the trace-plot to visually determine convergence.

To inspect convergence we can rerun our model, but instead of requesting one chain of the Gibbs sampler we request two chains. Now, two independent Gibbs samplers are computed at the same time. The new trace-plot of the mean reading skill score is displayed in Figure S4. The two lines represent the two chains of the Gibbs sampler, which run in parallel but are independent. To determine whether a model has converged, one should check the stability of the generated parameter values. In Figure S1, with only one chain, we concluded that after iteration 10 the parameter values reach a stable pattern. If we look at Figure S4, with two chains, we can observe that both chains reach a stable pattern after just a couple of iterations. Also, both chains are relatively similar. This indicates that the model has reached convergence. In Figure S5A, a trace plot is displayed where convergence is an issue, because the chains are not very similar until iteration number 460. The other trace plots in Figure S5 are also examples of non-converging models.
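To illustrate how such a check can be carried out numerically, the sketch below applies a basic version of the Gelman-Rubin diagnostic to two chains and reports the posterior median and the 95% PPI. The two chains are simulated here purely for illustration, so the numbers do not correspond to the reading-skills example; with real output, chain1 and chain2 would hold the post-burn-in draws of the parameter from the two runs of the Gibbs sampler.

```python
# Simplified Gelman-Rubin check for two chains plus posterior summaries.
# chain1 and chain2 are simulated stand-ins for post-burn-in MCMC draws.
import numpy as np

rng = np.random.default_rng(seed=2)
chain1 = rng.normal(102, 3, size=1000)   # stand-in for chain 1 after burn-in
chain2 = rng.normal(102, 3, size=1000)   # stand-in for chain 2 after burn-in

chains = np.vstack([chain1, chain2])     # shape: (number of chains, iterations)
n = chains.shape[1]
chain_means = chains.mean(axis=1)
W = chains.var(axis=1, ddof=1).mean()    # mean within-chain variance
B = n * chain_means.var(ddof=1)          # between-chain variance
var_hat = (n - 1) / n * W + B / n
r_hat = np.sqrt(var_hat / W)             # values close to 1 suggest convergence

pooled = chains.ravel()
print("posterior median:", np.median(pooled))
print("95% PPI:", np.percentile(pooled, [2.5, 97.5]))
print("Gelman-Rubin R-hat:", round(float(r_hat), 3))
```

When the chains have mixed well, as in Figure S4, R-hat is close to 1; chains like those in Figure S5 produce clearly larger values.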
In conclusion, the stability of the generated parameter values can be visually checked by running multiple chains and observing from which iteration onwards the generated parameter values display a stable pattern. After inspecting convergence one can use the posterior distribution for drawing conclusions. The PPI can be used to determine significance or to determine whether there is overlap with other parameters.

Figure S1. The chain of the Gibbs sampler for the reading skills scores.

Figure S2. The information of each iteration of the Gibbs sampler for the IQ score summarized in a histogram. In Figure S2A we used only 50 iterations, whereas in Figures S2B, S2C and S2D we used 200, 2,000 and 20,000 iterations, respectively.

Figure S3. Kernel density plot.

Figure S4. Two chains specified for the Gibbs sampler of the IQ score.

Figure S5. Examples of chains specified for the Gibbs sampler where convergence is an issue.

Appendix S2: Bayesian statistics in Mplus

For an introduction to the non-Bayesian Mplus syntax we refer to Geiser (2013).

Mplus syntax - Explanation

DATA: FILE IS dataCOG.dat;
- Select the data file.

VARIABLE: NAMES ARE COGscore group;
- Provide variable names for all columns of the data file.

USEVARIABLES IS COGscore;
- Select the variables that will be used in the model.

ANALYSIS: ESTIMATOR IS Bayes;
- Specify the Bayesian estimator.

POINT IS mean/median/mode;
- Select the point estimate that is shown in the results. The default setting is the median.

BCONVERGENCE IS .05;
- Specify the value of the Gelman-Rubin convergence criterion. The default is .05, but we recommend using .01.

BITERATIONS IS a (b);
- Specify the maximum (a) and minimum (b) number of iterations for each chain of the MCMC procedure (only available in combination with the Gelman-Rubin convergence criterion). The number between brackets refers to the minimum number of iterations; for example, when BITERATIONS = 50000 (20000) is specified, the MCMC procedure runs for a minimum of 20,000 iterations and a maximum of 50,000. After the minimum number of iterations is reached, convergence is again determined by the Gelman-Rubin criterion.

FBITERATIONS IS #;
- Specify the number of iterations manually (change # to the required number of iterations).

CHAINS IS #;
- Mplus offers the option to run independent chains of the MCMC procedure. The default setting is 2 chains. This option is especially advantageous when combined with PROCESSORS IS #;, which saves computational time because each chain runs on a different processor of the computer.

BSEED IS #;
- Because MCMC procedures are based on random sampling (from the prior and posterior distributions), you might get slightly different results every time you run the analysis (or run it on a different pc). If a value for the BSEED is given, the same sequence of random values is obtained and the results are always the same. This option can also be useful if a model does not reach convergence; starting the MCMC process from a different seed might help.

STVALUES IS ml;
- Remember that the very first iteration of the MCMC process is based on (arbitrarily chosen) starting values. To solve issues with convergence or to speed up computation, these starting values can be chosen less arbitrarily. This can be done by hand, using the * after each model statement. Alternatively, one can ask Mplus to first run the model using maximum likelihood (ML) estimation and to use those results as the starting values for the Bayesian model.

THIN IS #;
- Specify which iterations are used for constructing the posterior distribution. When a chain is mixing poorly, with high autocorrelations, the estimation can be based on every k-th iteration (for example every 10th iteration) rather than on every iteration. This is referred to as thinning.

MODEL: [COGscore] (p1);
- Here the statistical model needs to be specified. In our simple example we only request the mean score by specifying [COGscore]. To specify a prior distribution for the mean score we need a label for the parameter, in our case the label (p1).

MODEL PRIORS: p1 ~ N(#a, #b);
- Specify a prior distribution for the parameters labelled in the MODEL statement; this changes the default prior distribution. In our example a normal prior distribution for the mean of COGscore is specified: 'N' refers to a normal distribution, '#a' refers to the prior mean and '#b' refers to the prior precision.

OUTPUT: STAND;
- Request standardized results.

CINTERVAL;
- Request PPIs.

PLOT: TYPE IS PLOT3;
- Request the plots (trace plots, histograms and kernel density plots).
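Both the MODEL PRIORS statement above and the WinBugs syntax in Appendix S3 describe normal priors in terms of a prior precision, i.e., the inverse of the prior variance. The short Python sketch below is a rough illustration, using the numbers from the WinBugs example, of what a prior precision of .01 and a truncation to the interval (40, 180) imply, and of the slicing idea behind the THIN option; it is not Mplus or WinBugs code.

```python
# Rough numerical illustration of a precision-parameterised normal prior:
# mean 80, precision .01, truncated to (40, 180), as in the WinBugs example.
import numpy as np

rng = np.random.default_rng(seed=3)

prior_mean, prior_precision = 80.0, 0.01
prior_sd = 1.0 / np.sqrt(prior_precision)
print("prior SD implied by a precision of .01:", prior_sd)       # 10.0

# Imitate the truncation I(40,180) by simple rejection sampling from the prior.
draws = rng.normal(prior_mean, prior_sd, size=100_000)
truncated = draws[(draws > 40) & (draws < 180)]
print("share of prior draws inside (40, 180):", len(truncated) / len(draws))

# The THIN option works on MCMC draws; the underlying slicing idea is simply:
thinned = truncated[::10]    # keep every 10th draw
print("number of draws kept after thinning by 10:", len(thinned))
```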
Appendix S3: Bayesian statistics in WinBugs

For a more detailed introduction to the WinBugs syntax we refer to Ntzoufras (2011).

WinBugs syntax - Explanation

y[]
106.2451019
130.0098114
105.0036545
108.0337601
100.6291046
96.66761017
107.7711563
102.5716476
96.08522034
119.5092163
72.42178345
130.16362
92.96968079
95.4822998
105.3284225
93.37817383
87.68320465
122.9314728
78.78004456
88.33502197
END
- This is how the data file is specified.

model{
- This is the syntax to run the statistical model.

for (i in 1:20) {
- There are 20 individuals.

y[i] ~ dnorm(beta[i], tau)
- The dependent variable has a mean (beta) and precision (tau).

beta[i] <- mu }
- Beta consists only of an intercept, mu, which (without any predictors in the model) is equal to the mean of our dependent variable.

mu ~ dnorm(80, .01)I(40, 180)
- Mu (the mean of the dependent variable) has a normal prior distribution with a mean of 80 and a prior precision of .01, and the prior is limited to scores between 40 and 180.

tau ~ dgamma(0.01, 0.01)
}
- The precision of the dependent variable, tau, is given a gamma prior distribution, which corresponds to an inverse gamma prior on the variance.

Appendix S4: Bayesian statistics in AMOS

For a more detailed introduction to the Bayesian options in AMOS we refer to the user's guide, page 385 (see ftp://ftp.software.ibm.com/software/analytics/spss/documentation/amos/20.0/en/Manuals/IBM_SPSS_Amos_User_Guide.pdf).

To obtain the Bayesian estimation, press the Bayesian estimation button. A new screen will appear and the Gibbs sampler is already running. If the smiley shown on this screen looks happy, the model has reached convergence according to the Gelman-Rubin criterion. The posterior results are displayed in the table.