STATISTICS IN MEDICINE Statist. Med. 2003; 22:3073–3088 (DOI: 10.1002/sim.1544)

Bayesian analysis of structural equation models with dichotomous variables

Sik-Yum Lee*† and Xin-Yuan Song
Department of Statistics, Chinese University of Hong Kong, Shatin, Hong Kong

SUMMARY

Structural equation modelling has been used extensively in the behavioural and social sciences for studying interrelationships among manifest and latent variables. Recently, its uses have been well recognized in medical research. This paper introduces a Bayesian approach to analysing general structural equation models with dichotomous variables. In the posterior analysis, the observed dichotomous data are augmented with the hypothetical missing values, which involve the latent variables in the model and the unobserved continuous measurements underlying the dichotomous data. An algorithm based on the Gibbs sampler is developed for drawing the parameter values and the hypothetical missing values from the joint posterior distributions. Useful statistics, such as the Bayesian estimates and their standard error estimates, and the highest posterior density intervals, can be obtained from the simulated observations. A posterior predictive p-value is used to test the goodness-of-fit of the posited model. The methodology is applied to a study of hypertensive patient non-adherence to medication. Copyright © 2003 John Wiley & Sons, Ltd.

KEY WORDS: latent variables; posterior analysis; Gibbs sampler; conditional distributions; non-adherence

1. INTRODUCTION

It has recently been pointed out that patient adherence to prescribed medication is crucial to the success of medical treatment [1], and that non-adherence leads to misjudgement of the effectiveness of medication [2]. In the promotion of adherence, it is desirable to establish a statistical model to study the relationships between non-adherence and its core factors, such as patient knowledge, attitudes and beliefs concerning medication [3], physician interaction [4] and communication [5]. As these core factors are usually latent variables that cannot be measured directly by a single measurement, common correlation or regression analysis cannot be applied.

* Correspondence to: Sik-Yum Lee, Department of Statistics, Chinese University of Hong Kong, Shatin, N.T., Hong Kong
† E-mail: [email protected]

Contract/grant sponsor: HKSAR; contract/grant number: CUHK 4243/02H

Received January 2002
Accepted February 2003

The factor analysis model [6] is the most basic statistical model for studying the relationships among latent and observed variables. To cope with the strong demand for models that deal with complex data sets in various fields, the factor analysis model has been generalized to more sophisticated models [7–10]. These multivariate models are commonly called structural equation models (SEMs), or latent variable models. They have been extensively applied in behavioural, psychological and social research for assessing correlations and causation among latent and observed variables [12–14]. Recently, SEMs have been used in medicine [15–18]. Batista-Foguet et al. [19] used the simple factor analysis model to analyse multitrait-multimethod data of repeated measurements of blood pressure. Nowadays, there are more than a dozen standard SEM programs, including LISREL [7] and EQS [8].
The statistical development of most of this software rests on the assumption that the data are continuous measurements with a multivariate normal distribution. However, in medical studies we frequently encounter dichotomous variables. Typically, respondents are asked to select answers from 'yes or no' about the existence of a symptom, 'agree or disagree' about a policy, 'feeling better or worse' about the effect of a drug, or 'satisfactory or unsatisfactory' about a medical treatment. The usual numerical values assigned to these dichotomous variables are the ad hoc numbers '1' and '0', or '1' and '2'. Hence, the basic assumption that the data come from a continuous normal distribution is clearly violated. In fact, even for polytomous variables, which are closer to continuous variables than dichotomous variables are, ignoring the discrete nature of the data leads to erroneous results [20–22]. Hence, the analysis of SEMs with dichotomous (or binary) variables has received much attention. Primary contributions using classical methods have come from Christofferson [23], Bartholomew [24] and Bock and Aitkin [25]. In these articles, the multi-dimensional integrals involved in the maximum likelihood (ML) analysis were approximated by numerical methods. For example, Bock and Aitkin [25] applied the EM algorithm [26] to an exploratory factor analysis model with the integrals in the E-step evaluated by fixed-point Gauss–Hermite quadrature. Meng and Schilling [27] proposed a Monte Carlo EM algorithm [28] and showed that it was better than Bock and Aitkin's algorithm [25]; they approximated the E-step with observations generated by the Metropolis–Hastings algorithm [29, 30] from the appropriate conditional distributions. The contributions mentioned above are based on the simple factor analysis model with dichotomous variables. So far, very limited results have been developed for the more general SEMs that are capable of assessing the causal effects among latent variables.

To enrich existing knowledge about patient non-adherence, the Departments of Medicine and Therapeutics, Community and Family Medicine, and Pharmacy at the Chinese University of Hong Kong conducted a survey of 853 ethnic Chinese patients diagnosed as suffering from hypertension [31]. The main objectives were to measure and examine relationships among latent variables such as physician advice and concern, patient knowledge and belief, social cognition and social influence, and the subsequently reported non-adherence, with reference to a structural equation model. Inspired by the fact that this study involved many dichotomous variables and that the simple factor analysis model is inadequate for assessing the causal relationships among the latent variables, we shall develop a general SEM with dichotomous variables, using a Bayesian approach. An advantage of this approach is its ability to incorporate useful prior information so that better estimates can be obtained. Moreover, the difficulties induced by the complexity of the model and the multi-dimensional integrals can be handled efficiently by means of powerful computing tools in statistics, such as the Gibbs sampler.

In Section 2 we describe the SEM with dichotomous variables. A Bayesian approach is developed in Section 3.
We discuss the computational method, the choice of priors and the conditional distributions that are required in the implementation of the algorithm. An illustrative example based on a portion of the data set in the CUHK compliance study is presented in Section 4. A discussion is given in Section 5, and some technical details are presented in the Appendices.

2. A STRUCTURAL EQUATION MODEL WITH DICHOTOMOUS VARIABLES

The SEM that we shall consider is a LISREL type model [7], which is the most common SEM and has wide applicability in substantive research. It is composed of a measurement equation and a structural equation. The measurement equation is defined by the following factor analysis model:

y_i = Λω_i + ε_i,  i = 1, ..., n   (1)

where y_i is a p by 1 random vector of manifest variables, ω_i is a q by 1 random vector of latent variables, ε_i is a p by 1 random vector of residuals, and Λ is a p by q unknown parameter matrix which is usually called the factor loading matrix. It is assumed that the y_i's are independent, ω_i is independently distributed as N[0, Σ_ω], ε_i is independently distributed as N[0, Ψ], where Ψ is a diagonal matrix, and ω_i and ε_i are uncorrelated.

To deal with more complex situations, the latent vector ω_i is partitioned into (η_i', ξ_i')', where η_i and ξ_i are q1 by 1 and q2 by 1 vectors of latent variables, respectively. Moreover, suppose that these latent vectors satisfy the following structural equation:

η_i = Bη_i + Γξ_i + δ_i   (2)

where B and Γ are q1 by q1 and q1 by q2 matrices of unknown parameters, ξ_i is independently distributed as N[0, Φ], and δ_i is independently distributed as N[0, Ψ_δ], where Ψ_δ is a diagonal matrix. It is assumed that B0 = I − B is non-singular and that its determinant is a constant independent of the elements of B. The elements of B represent direct effects of η-variables on other η-variables, whilst the elements of Γ represent direct effects of ξ-variables on η-variables. As η_i and ξ_i are unobservable latent vectors, the ordinary regression method cannot be applied to estimate B and Γ. For convenience, let Π = (B, Γ); then structural equation (2) can be written as

η_i = Πω_i + δ_i   (3)

Now suppose that the exact measurement of y_i = (y_i1, ..., y_ip)' is not available and that its information is given by an observed dichotomous vector z_i = (z_i1, ..., z_ip)' such that, for k = 1, ..., p,

z_ik = 1 if y_ik > 0, and z_ik = 0 otherwise   (4)

Hence, the available observed data set is {z_i; i = 1, ..., n}.

Consider the relationship between the factor analysis model defined by (1) and the dichotomous variables in z. Let Λ_k and ψ_kk be the kth row of Λ and the kth diagonal element of Ψ, respectively. It follows from (4) that

Pr(z_ik = 1 | ω_i, Λ_k, ψ_kk) = Pr(y_ik > 0 | ω_i, Λ_k, ψ_kk) = Φ*{(Λ_k/ψ_kk^{1/2})ω_i}   (5)

where Φ* denotes the standard normal distribution function. Note that Λ_k and ψ_kk are not both estimable, because CΛ_k/(C²ψ_kk)^{1/2} = Λ_k/ψ_kk^{1/2} for any positive constant C. There are many ways to solve this identification problem. Meng and Schilling [27] suggested fixing ψ_kk implicitly and estimating only Λ_k, instead of both Λ_k and ψ_kk. Inspired by this, we fix ψ_kk = 1.0. Note that the value 1.0 is chosen for convenience, and any other value would give an equivalent solution up to a change of scale. Moreover, the measurement and structural equations are not identified, because y_i = Λ*ω_i* + ε_i and η_i = Π+ω_i+ + δ_i, where Λ* = ΛT_1^{-1}, ω_i* = T_1ω_i, Π+ = ΠT_2^{-1} and ω_i+ = T_2ω_i, for any non-singular matrices T_1 and T_2. A common method in SEMs for solving this problem is to fix appropriate elements of Λ and Π at preassigned values. In most applications of SEMs, the fixed values can be decided on the basis of substantive theory. For example, see the analysis of the multitrait-multimethod model given by DiMatteo and Heidi [4]. In the following development of the Bayesian approach, we assume that the model is identified after fixing elements in Λ and Π.
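To make the data structure implied by (1)–(4) concrete, the following sketch (not part of the original paper) simulates dichotomous observations from a small model of this type; all dimensions and parameter values below are illustrative assumptions rather than quantities from the study.

```python
# Sketch: simulating dichotomous data from the SEM in equations (1)-(4).
# All dimensions and parameter values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 6                      # sample size, number of manifest variables
q1, q2 = 1, 1                      # dimensions of eta and xi; q = q1 + q2
Lam = np.array([[1.0, 0.0],        # p x q loading matrix Lambda; the 1s and 0s
                [0.8, 0.0],        # are fixed for identification, the rest free
                [0.7, 0.0],
                [0.0, 1.0],
                [0.0, 0.9],
                [0.0, 0.6]])
gamma = np.array([0.5])            # Gamma in eta = Gamma xi + delta (B = 0 here)
phi = np.array([[1.0]])            # covariance matrix Phi of xi
psi_delta = 0.3                    # variance of delta

xi = rng.multivariate_normal(np.zeros(q2), phi, size=n)                 # n x q2
eta = xi @ gamma.reshape(q2, q1) + rng.normal(0, np.sqrt(psi_delta), (n, q1))
omega = np.hstack([eta, xi])                                            # n x q
y = omega @ Lam.T + rng.normal(size=(n, p))        # psi_kk fixed at 1, eq. (1)
z = (y > 0).astype(int)                            # observed dichotomies, eq. (4)
```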
3. BAYESIAN ANALYSIS

Let Z = (z_1, ..., z_n) be the observed data set of dichotomous variables. Our objective is to develop a Bayesian procedure to estimate, on the basis of Z, the unknown parameter vector θ, which contains the unknown parameters in Λ, Π, Φ and Ψ_δ. In contrast to the classical frequentist approach, the parameters in θ are considered to be random, with prior distributions that reflect prior knowledge about their values. In a Bayesian approach, it is possible to incorporate this prior information via the prior density of θ in the posterior analysis. This is done by analysing the following posterior density of θ given Z:

p(θ|Z) = p(θ, Z)/p(Z) = p(Z|θ)p(θ)/p(Z)

and hence

log p(θ|Z) = log p(Z|θ) + log p(θ) + constant   (6)

where p(Z|θ) is the likelihood function and p(θ) is the prior density of θ. The Bayesian estimate of θ can be obtained as the mean or the mode of the posterior distribution. The posterior mode is usually computed by maximizing log p(θ|Z) with respect to θ. However, owing to the nature of the dichotomous data, the likelihood function p(Z|θ) involves complicated multi-dimensional integrals that require tremendous programming and computational effort to evaluate. Therefore, we obtain the Bayesian estimate by computing the mean of the posterior distribution.

Owing to the complexity of the model and the nature of the data, the posterior distribution is very complicated, and it is difficult and tedious to compute the posterior mean directly from p(θ|Z). Hence, the technique of data augmentation [32] is employed in the posterior analysis. Let Ω = (ω_1, ..., ω_n) be the matrix of latent variables of the model, and let Y = (y_1, ..., y_n) be the matrix of latent continuous measurements underlying the matrix of observed dichotomous data Z. In the analysis, the observed data Z are augmented with Ω and Y, which can be considered as hypothetical missing data, to form a complete data set (Z, Ω, Y). Inspired by recent developments in statistical computing, a large number of observations will be sampled from p(θ, Ω, Y|Z) by the Gibbs sampler [33], and the statistics of interest, for example the Bayesian estimates, will be obtained via standard data analysis methods. To implement the Gibbs sampler, we start with initial values (θ^(0), Ω^(0), Y^(0)), simulate (θ^(1), Ω^(1), Y^(1)), ..., and continue as follows. At the jth iteration, with current values θ^(j), Ω^(j) and Y^(j):

(a) Generate θ^(j+1) from p(θ|Ω^(j), Y^(j), Z);
(b) Generate Ω^(j+1) from p(Ω|θ^(j+1), Y^(j), Z);
(c) Generate Y^(j+1) from p(Y|θ^(j+1), Ω^(j+1), Z).   (7)

There are three main steps, but step (a) involves components that correspond to Λ, Π, Φ and Ψ_δ. Hence, it is divided into the following substeps, which generate Λ^(j+1) from p(Λ|Π^(j), Φ^(j), Ψ_δ^(j), Ω^(j), Y^(j), Z), Π^(j+1) from p(Π|Λ^(j+1), Φ^(j), Ψ_δ^(j), Ω^(j), Y^(j), Z), Φ^(j+1) from p(Φ|Λ^(j+1), Π^(j+1), Ψ_δ^(j), Ω^(j), Y^(j), Z) and Ψ_δ^(j+1) from p(Ψ_δ|Λ^(j+1), Π^(j+1), Φ^(j+1), Ω^(j), Y^(j), Z).
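As a rough illustration of the sampling scheme in (7), a skeleton of the Gibbs loop might look as follows; the draw_theta, draw_Omega and draw_Y arguments are hypothetical placeholders standing in for samplers from the full conditional distributions given in Section 3.1 and Appendix A.

```python
# Sketch of the Gibbs sampler in (7); draw_* are placeholder functions that
# sample from the full conditional distributions of Section 3.1 / Appendix A.
def gibbs_sampler(Z, init_theta, init_Omega, init_Y,
                  draw_theta, draw_Omega, draw_Y,
                  n_burn=5000, n_keep=5000):
    theta, Omega, Y = init_theta, init_Omega, init_Y
    kept = []
    for j in range(n_burn + n_keep):
        theta = draw_theta(Omega, Y, Z)   # step (a), itself split into the
                                          # Lambda, Pi, Phi, Psi_delta substeps
        Omega = draw_Omega(theta, Y, Z)   # step (b)
        Y = draw_Y(theta, Omega, Z)       # step (c)
        if j >= n_burn:                   # discard burn-in iterations
            kept.append((theta, Omega, Y))
    return kept
```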
It has been shown [33, 34] that, under mild conditions and after a sufficiently large number of iterations, the joint distribution of (θ^(j), Ω^(j), Y^(j)) converges at an exponential rate to the desired posterior distribution [θ, Ω, Y|Z]. The number of iterations required for convergence of the Gibbs sampler, say J, can be revealed by plots of the simulated sequences of the individual parameters: at convergence, parallel sequences generated with different starting values should mix well. Another method for monitoring convergence is described in Gelman [35]. Basically, the 'estimated potential scale reduction (EPSR)' values [35] corresponding to the parameters are calculated sequentially as the runs proceed, and convergence is achieved when the EPSR values are less than 1.2. To obtain a more nearly independent sample, observations may be collected in cycles with indices t = J + c, J + 2c, ..., J + Tc for some spacing c [36]. However, in most practical applications a small c suffices for many statistical analyses [37, 38]. We use c = 1 in our empirical study.
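For readers who wish to compute the EPSR themselves, a minimal sketch for a single parameter is given below (an added illustration, not the authors' code); it uses the usual between-chain and within-chain variance comparison of Gelman [35], with values below about 1.2 signalling convergence as noted above.

```python
import numpy as np

def epsr(chains):
    """Estimated potential scale reduction for one parameter.
    `chains` is an (m, T) array of m parallel sequences of length T."""
    chains = np.asarray(chains, dtype=float)
    m, T = chains.shape
    chain_means = chains.mean(axis=1)
    B = T * chain_means.var(ddof=1)            # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()      # within-chain variance
    var_plus = (T - 1) / T * W + B / T         # pooled variance estimate
    return np.sqrt(var_plus / W)               # convergence when below about 1.2
```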
3.1. Conditional distributions

The conditional distributions involved in (7) are required. In particular, it is necessary to specify the prior distributions of the components of θ when deriving the conditional distribution of θ given Ω, Y and Z in step (a). For brevity, the conditional distributions are discussed for the situation in which all parameters are unknown; the treatment of parameters fixed for identifying the model can easily be handled via the method given in Shi and Lee [11]. In general Bayesian analysis, conjugate prior distributions have been found to be flexible and convenient [39, 40], and this kind of prior distribution has been applied widely in Bayesian analyses of SEMs [41–43]. Hence, we shall use the following well-known conjugate prior distributions:

p(Λ_k) =D N[Λ_0k, H_0k],  p(Λ_ωk|ψ_δk) =D N[Λ_0ωk, ψ_δk H_0ωk],  p(ψ_δk^{-1}) =D Gamma[α_0k, β_0k],  p(Φ) =D IW_{q2}[R_0, ρ_0]   (8)

where '=D' denotes equality in distribution, ψ_δk is the kth diagonal element of Ψ_δ, Λ_k and Λ_ωk are the kth rows of Λ and Π, respectively, and Λ_0k, Λ_0ωk, α_0k, β_0k, ρ_0 and the positive definite matrices H_0k, H_0ωk and R_0 are hyperparameters whose values are assumed to be given by the prior information. In general, prior information can be obtained from casual observation or the theoretical considerations of experts, the analysis of past data, or other empirical information. The magnitude of the prior variances associated with Λ_k and Λ_ωk can be taken to be small or large by selecting appropriate H_0k or H_0ωk. When there is a lack of accurate prior knowledge, as pointed out by Kass and Raftery [44], priors are often picked for convenience, because the effect of the prior on the Bayesian estimates is small when the sample is fairly large.

Now, note that given Ω and Y, the SEM defined in (1) and (2) with dichotomous variables becomes the familiar regression model with continuous normal variables. Under some mild assumptions on the prior distributions, it can be shown that the conditional distributions of the components of θ are the familiar normal, Gamma and inverted Wishart distributions. Brief derivations are given in Appendix A. Simulating observations from these distributions is fast and straightforward.

In deriving the conditional distribution p(Ω|θ, Y, Z), we first note that given Y, the underlying model becomes one with continuous data and is relatively easy to handle. Moreover, as the ω_i, i = 1, ..., n, are mutually independent,

p(Ω|θ, Y, Z) = p(Ω|θ, Y) = ∏_{i=1}^n p(ω_i|θ, y_i)   (9)

It follows from derivations similar to those in Shi and Lee [45] that

[ω_i|θ, y_i] =D N[Σ*Λ'Ψ^{-1}y_i, Σ*],  i = 1, ..., n   (10)

where Σ* = (Σ_ω^{-1} + Λ'Ψ^{-1}Λ)^{-1}, Σ_ω is the covariance matrix of ω_i, and Ψ is fixed as the identity matrix for identifying the model. The conditional distribution of Ω given θ, Y and Z can be obtained via (9) and (10).

We now consider p(Y|θ, Ω, Z). As the y_i are mutually independent, it follows that

p(Y|θ, Ω, Z) = ∏_{i=1}^n p(y_i|θ, ω_i, z_i)

Moreover, it follows from the definition of the model and (4) that

p(y_ik|θ, ω_i, z_i) ∝ N[Λ_kω_i, 1] I_{(−∞,0)}(y_ik) if z_ik = 0;  N[Λ_kω_i, 1] I_{(0,∞)}(y_ik) if z_ik = 1   (11)

where I_A(y) is an indicator function that takes the value 1 if y is in A, and 0 otherwise. As the conditional distributions involved in steps (b) and (c) are also familiar (see (10) and (11)), drawing observations from them is straightforward and fast. Consequently, the programming and computational burden involved in the Gibbs sampler is not heavy.

3.2. Some basic statistical inference

Statistical inference for the model can be based on the simulated sample of observations from p(θ, Ω, Y|Z), namely {(θ^(t), Ω^(t), Y^(t)): t = 1, ..., T}. The Bayesian estimate of θ and the associated numerical standard error estimates can be obtained from

θ̂ = Ê(θ|Z) = T^{-1} ∑_{t=1}^T θ^(t)   (12)

v̂ar(θ|Z) = (T − 1)^{-1} ∑_{t=1}^T (θ^(t) − θ̂)(θ^(t) − θ̂)'   (13)

For any individual i, let ω_i be the corresponding vector of latent variables, and let E(ω_i|z_i) and var(ω_i|z_i) be its posterior mean and posterior covariance matrix. A Bayesian estimate ω̂_i can be obtained on the basis of {Ω^(t); t = 1, ..., T} as follows:

ω̂_i = Ê(ω_i|z_i) = T^{-1} ∑_{t=1}^T ω_i^(t)   (14)

where ω_i^(t) is the ith column of Ω^(t). It can be shown that ω̂_i is a consistent estimate of E(ω_i|z_i) [34]. A consistent estimate of var(ω_i|z_i) can be obtained as

v̂ar(ω_i|z_i) = (T − 1)^{-1} ∑_{t=1}^T (ω_i^(t) − ω̂_i)(ω_i^(t) − ω̂_i)'   (15)

Moreover, it is desirable to obtain the highest posterior density (HPD) interval of an individual parameter [46, 47]. Obtaining HPD intervals is not straightforward in the context of the current model, and an algorithm [47] for doing this is given in Appendix B.
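The summaries in (12)–(15) and the HPD intervals of Appendix B are simple functions of the retained draws. The sketch below is an added illustration (not the authors' program) that computes posterior means, standard error estimates and a Chen–Shao HPD interval for a scalar parameter.

```python
import numpy as np

def posterior_summary(theta_draws):
    """theta_draws: (T, d) array of retained draws of theta."""
    theta_hat = theta_draws.mean(axis=0)                    # equation (12)
    cov_hat = np.cov(theta_draws, rowvar=False, ddof=1)     # equation (13)
    se = np.sqrt(np.diag(np.atleast_2d(cov_hat)))           # numerical std errors
    return theta_hat, se

def hpd_interval(draws, alpha=0.05):
    """Chen-Shao HPD interval for a scalar parameter (Appendix B)."""
    x = np.sort(np.asarray(draws, dtype=float))
    T = len(x)
    k = int(np.floor((1 - alpha) * T))
    widths = x[k:] - x[:T - k]        # all 100(1-alpha)% credible intervals
    j = int(np.argmin(widths))        # the narrowest one is the HPD interval
    return x[j], x[j + k]
```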
A fundamental issue in data analysis is the assessment of the plausibility of a proposed model. The classical, non-Bayesian method in SEM is to perform a goodness-of-fit test based on the asymptotic distribution of a test statistic that measures the discrepancy between the posited model and the sample covariance matrix. When dealing with more complicated models such as the current one, it is difficult to derive the asymptotic distribution of such a statistic. In Bayesian analysis, a simple method for assessing the goodness-of-fit of a posited model to a data set is the posterior predictive p-value (PP p-value) introduced by Meng [48] on the basis of the posterior assessment in Rubin [49]. It has been shown that this method is computationally and conceptually simple, and is useful for model checking in a wide variety of complicated situations [50]. Moreover, the required computation is a by-product of common Bayesian simulation procedures such as the Gibbs sampler. A brief outline of this method as applied to our model is given in Appendix C.

4. AN ILLUSTRATIVE EXAMPLE

The proposed methodology is used to assess the causal effects of some latent variables on patient non-adherence to medication in the CUHK compliance study conducted by the Departments of Medicine and Therapeutics, Community and Family Medicine, and Pharmacy at the Chinese University of Hong Kong. A total of 853 ethnic Chinese patients diagnosed as suffering from hypertension were randomly selected from hospitals and clinics in Hong Kong to serve as subjects for the study. To demonstrate the methodology, suppose we are interested in studying how patient 'non-adherence' is affected by 'knowledge of medication' and 'health condition' by analysing the related portion of the whole data set. Six dichotomous manifest variables are selected as indicators for the first two latent variables mentioned above. Three polytomous manifest variables measured on a five-point scale are used as indicators for the last latent variable. As these polytomous variables are heavily skewed (see the frequencies of the last three variables in Table I), we transform them to dichotomous variables by grouping the first four categories on the left together, in order to provide an illustration of the proposed method for dichotomous variables. The information lost owing to this grouping should not be substantial. Translations of the corresponding questions from Chinese into English are listed in Table I, together with their frequencies. For brevity, we deleted a small number of observations with missing entries; the remaining sample size is 837.

Table I. Questions associated with the manifest variables. Frequencies of (yes '1'/no '0') are in parentheses.

y1: Did you have any surplus in the previously prescribed drugs? (175/662)
y2: Did you stop/reduce/increase the dosage? (69/768)
y3: Did you forget to take medications? (391/446)
y4: Do you feel you have hypertension? (363/474)
y5: Do you know the reasons for taking drugs? (650/187)
y6: Do you know the reasons for taking drugs for a long term? (605/232)
y7: In the past two weeks, did you have emotional problems such as upset, hot temper, etc.? (387/450)
y8: In the past two weeks, did your health cause any difficulties in daily activities? (181/656)
y9: In the past two weeks, did your health cause any difficulties in social activities? (177/660)

The resulting data set is analysed using a model as defined in (1) and (2). Although other structures for the loading matrix could be considered, we choose the structure that gives non-overlapping latent factors for clear interpretation. Hence the following specification of the loading matrix is used:

Λ' = [ 1  λ21  λ31  0    0    0    0  0    0
       0  0    0    1    λ52  λ62  0  0    0
       0  0    0    0    0    0    1  λ83  λ93 ]

where the λ_ij's are the unknown factor loading parameters, and the 1's and 0's are fixed in the estimation to achieve an identified model. From the meanings of the questions (see Table I), it is clear that this structure gives three non-overlapping factors (latent variables), which can be interpreted as the 'non-adherence, η', 'knowledge of medication, ξ1' and 'health condition, ξ2' of the patients.
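One way to encode this loading structure in a program (an illustrative convention of ours, not taken from the paper) is to store the fixed values and mark the positions of the free loadings λ21, λ31, λ52, λ62, λ83 and λ93, which are the only entries updated in the Gibbs sampler.

```python
import numpy as np

# Transposed loading matrix Lambda' (3 factors x 9 manifest variables):
# 1s and 0s are fixed for identification, np.nan marks the free loadings.
nan = np.nan
Lam_T = np.array([
    [1.0, nan, nan, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],   # eta: non-adherence (y1-y3)
    [0.0, 0.0, 0.0, 1.0, nan, nan, 0.0, 0.0, 0.0],   # xi1: knowledge (y4-y6)
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, nan, nan],   # xi2: health condition (y7-y9)
])
free_mask = np.isnan(Lam_T)        # positions of lambda_21, ..., lambda_93
```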
As our main interest is in studying the linear causal effects of 'knowledge of medication' and 'health condition' on 'non-adherence', the structural equation is chosen to be a linear regression model which regresses η on ξ1 and ξ2 as follows:

η = γ1ξ1 + γ2ξ2 + δ   (16)

where γ1 and γ2 are unknown parameters. The other unknown parameters include the variances and covariance of ξ1 and ξ2, namely φ11, φ22 and φ21, and the variance ψ_δ of δ. There are a total of 12 unknown parameters in this model. A path diagram of the model is given in Figure 1.

Figure 1. Path diagram of the model for the patient non-adherence study.

The proposed Gibbs sampler algorithm is used to obtain Bayesian estimates of the parameters; estimates of the latent variables can also be obtained as by-products. In specifying the prior distributions, subjective hyperparameter values giving conjugate distributions with small variances should be used in situations where we have reliable prior knowledge, for example from closely related data or the knowledge of experts. Here, as an illustration of analysing data sets in the general situation with no good prior information, we use the following data-dependent prior inputs. We first conduct an auxiliary Bayesian estimation with non-informative priors to obtain prior inputs for the hyperparameter values of Λ_0k, Λ_0ωk and R_0 in the conjugate prior distributions used in the actual estimation. Consequently, the values of the hyperparameters are given by α_0k = 8, β_0k = 10, ρ_0 = 8, H_0k = H_0ωk = 4I, Λ_0k = Λ̃_k, Λ_0ωk = Λ̃_ωk and R_0 = I, where Λ̃_k and Λ̃_ωk are estimates obtained from the auxiliary estimation. With some arbitrary starting values, we observe that the Gibbs sampler converges in about 5000 iterations. To reveal the convergence, plots of the EPSR values and of some parameter values against the iteration number are presented in Figures 2 and 3, respectively.

Figure 2. EPSR values against the number of iterations.

Figure 3. Plots of three parallel sequences corresponding to different starting values of (a) λ31; (b) λ52; (c) λ83; (d) γ1; (e) γ2; (f) φ12, against iterations.

To obtain the Bayesian estimates and the HPD intervals, T = 5000 observations are collected after discarding the first 5000 burn-in iterations. These results are presented in Table II. Based on these estimates, the estimated covariance matrices of the manifest random vector y and the latent random vector ω can be computed. Sometimes it is also desirable to transform the estimates to a completely standardized (CS) solution in which both the manifest and the latent variables are standardized, so that these estimated covariance matrices become correlation matrices. A detailed way to compute the CS solution is given in Joreskog and Sorbom [7]. For completeness, the CS solution obtained from our Bayesian estimates is also presented in Table II.

Table II. Bayesian solution and its completely standardized solution. HPD intervals are in parentheses.

Parameter   Bayesian estimate          CS solution
λ11         1.0 (fixed)                0.97
λ21         3.66 (2.61, 4.97)          0.97
λ31         0.12 (0.06, 0.19)          1.31
λ42         1.0 (fixed)                0.40
λ52         4.32 (3.54, 5.12)          0.88
λ62         3.67 (2.98, 4.35)          0.85
λ73         1.0 (fixed)                0.57
λ83         4.32 (3.53, 4.80)          0.95
λ93         4.14 (3.61, 5.13)          0.95
φ11         0.19 (0.14, 0.23)          1.0
φ21         −0.18 (−0.21, −0.13)       −0.59
φ22         0.49 (0.36, 0.58)          1.0
γ1          −1.21 (−1.49, −0.81)       −0.38
γ2          0.90 (0.73, 1.20)          0.45
ψδ          0.89 (0.58, 1.07)          0.46

The PP p-value for the goodness-of-fit test of the posited model is 0.414. On the basis of the results given in Gelman et al. [50], which suggest that a PP p-value close to 0.5 indicates a good fit, the obtained PP p-value indicates that the proposed model fits the data well.
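As a quick arithmetic check on the CS column of Table II (an added illustration using the usual standardization formulas, with v̂ar(η) = γ̂'Φ̂γ̂ + ψ̂_δ ≈ 1.96 implied by the estimates above):

corr(ξ1, ξ2) = φ̂21/√(φ̂11 φ̂22) = −0.18/√(0.19 × 0.49) ≈ −0.59

γ̂1 √(φ̂11/v̂ar(η)) = −1.21 × √(0.19/1.96) ≈ −0.38,  γ̂2 √(φ̂22/v̂ar(η)) = 0.90 × √(0.49/1.96) ≈ 0.45

which are consistent with the CS values reported for φ21, γ1 and γ2.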
The most important interpretations of the results are as follows:

(i) From the estimates γ̂1 and γ̂2 in the structural equation and the corresponding HPD intervals, we see that ξ1 and ξ2 have significant causal effects on η. Moreover, it is clear from the estimated structural equation η = −1.21ξ1 + 0.90ξ2 that better 'knowledge of medication, ξ1' has a negative effect on 'non-adherence', whilst weaker or worse 'health condition, ξ2' has a positive effect on 'non-adherence'. Hence, it is desirable to educate patients better about their illnesses and to encourage them to pay more attention to their health. It should be pointed out that, as the simple factor analysis model only gives estimates of the correlations among η, ξ1 and ξ2 rather than estimates of the causal effects γ1 and γ2, the insight given by the above structural equation could not be achieved with that model.

(ii) From the completely standardized solution, we see that the correlation estimate between ξ1 and ξ2 is −0.59. Hence, we arrive at the expected conclusion that better 'knowledge of medication, ξ1' and weaker 'health condition, ξ2' are negatively correlated.

Using a SUN Enterprise 4500 Server, the required computing time is roughly 8 minutes. Hence, the proposed algorithm is feasible for analysing practical data sets. Owing to the generality of the model and the discrete nature of the dichotomous variables, the analysis of this example cannot be carried out using WinBUGS. However, the computer code for obtaining our results is available from the authors upon request.

5. DISCUSSION

As in much of the research in the behavioural, educational and social sciences, latent variables are frequently encountered in medical and psychiatric studies. A clear understanding of the interrelationships among these latent variables, as well as their relationships with manifest variables, is important in making correct medical decisions. Structural equation modelling is an important multivariate method for achieving this purpose. Under the basic assumption that the data are normally distributed, this methodology has been applied widely to practical problems. However, the assumption is violated when analysing dichotomous data, which are very common in medical research. In this paper, a Bayesian approach is developed to analyse SEMs with dichotomous data.

There are at least two advantages to using a Bayesian approach for our problem. First, recent powerful tools in statistical computing can be utilized in developing an efficient and dependable algorithm. Secondly, on the basis of past records and experience, good prior information is usually available in medical studies, and the Bayesian approach can incorporate this to produce good statistical results. In this paper, we show how to obtain the Bayesian estimates of the unknown parameters and latent variables in the model, their standard error estimates, the HPD intervals, and a PP p-value assessment of the goodness-of-fit of the posited model.
On the basis of the simulated observations, we can also perform other statistical analyses, such as residual analysis, identification of outliers, and diagnosis of the model assumptions. Clearly, it would be useful to develop procedures for model selection and comparison on the basis of the Bayes factor [44]. It is well known that the computation of the Bayes factor is highly non-trivial; see, for example, DiCiccio et al. [51], among others. The development of an efficient method for computing the Bayes factor in the context of the current SEM with dichotomous variables is an interesting research topic. Medical data are comparatively more complicated than data in the behavioural and social sciences. These complicated data suggest a number of extensions for future research, for example the development of statistical methods for analysing models with fixed covariates, non-linear models, and multi-sample and multilevel models with continuous, dichotomous, polytomous or missing data.

APPENDIX A: CONDITIONAL DISTRIBUTION OF θ GIVEN Ω, Y AND Z

Let θ_y be the vector of unknown parameters in Λ, and let θ_ω be the vector of unknown parameters associated with the structural model defined in equation (2). It is natural to assume that the prior distributions of θ_y and θ_ω are independent, with p(Y|Ω, θ) = p(Y|Ω, θ_y) and p(Ω|θ) = p(Ω|θ_ω). We have

p(θ_y, θ_ω|Ω, Y) ∝ [p(Y|Ω, θ_y)p(θ_y)][p(Ω|θ_ω)p(θ_ω)]

As the first term on the right-hand side of this expression depends only on θ_y, whilst the second term depends only on θ_ω, the marginal conditional densities of θ_y and θ_ω are proportional to p(Y|Ω, θ_y)p(θ_y) and p(Ω|θ_ω)p(θ_ω), respectively. Consequently, these conditional densities can be treated separately.

We first consider the marginal conditional distribution of θ_y. For k ≠ h, it is assumed that Λ_k and Λ_h are independent. Let Y_k be the kth row of Y; it can be shown by reasoning similar to that in Lee and Zhu [42] and Shi and Lee [45] that

[Λ_k|Y, Ω] =D N[a_k, A_k]

where A_k = (H_0k^{-1} + ΩΩ')^{-1} and a_k = A_k[H_0k^{-1}Λ_0k + ΩY_k'].

Now consider the conditional distribution of θ_ω. Let Ω_1 = (η_1, ..., η_n) and Ω_2 = (ξ_1, ..., ξ_n). As the distribution of ξ_i only involves Φ, and as it is reasonable to assume that the prior distribution of Φ is independent of the prior distributions of Π and Ψ_δ, it follows that

p(Ω|θ_ω)p(θ_ω) ∝ [p(Ω_1|Ω_2, Π, Ψ_δ)p(Π, Ψ_δ)][p(Ω_2|Φ)p(Φ)]

Hence, the marginal conditional densities of (Π, Ψ_δ) and Φ can again be treated separately. For k ≠ h, it is assumed that (Λ_ωk, ψ_δk) and (Λ_ωh, ψ_δh) are independent. Let IW_{q2}[·, ·] denote the inverted Wishart distribution of dimension q2, and let Ω_{1k} be the kth row of Ω_1. It can be shown [42, 45] that, for k = 1, ..., q1,

[Λ_ωk|Ω, ψ_δk] =D N[a_ωk, ψ_δk A_ωk],  [ψ_δk^{-1}|Ω] =D Gamma[n/2 + α_0k, β_k]

[Φ|Ω] = [Φ|Ω_2] =D IW_{q2}[(Ω_2Ω_2' + R_0^{-1}), n + ρ_0]

where A_ωk = (H_0ωk^{-1} + ΩΩ')^{-1}, a_ωk = A_ωk[H_0ωk^{-1}Λ_0ωk + ΩΩ_{1k}'], and β_k = β_0k + 2^{-1}(Ω_{1k}Ω_{1k}' − a_ωk'A_ωk^{-1}a_ωk + Λ_0ωk'H_0ωk^{-1}Λ_0ωk). The situation with fixed parameters can be handled similarly, as in Lee and Zhu [42]; details are not presented here, to save space.
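A sketch of how draws from these conditional distributions might be coded is given below (an added illustration, not the authors' program); the Gamma is parametrized here by shape and rate, and the inverted Wishart by degrees of freedom and scale, which are assumptions that must be matched to the conventions intended in (8). In practice, only the free elements of each row are drawn, with the rows of Ω corresponding to fixed loadings removed.

```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(1)

def draw_lambda_k(Omega, Y_k, Lam0k, H0k_inv):
    """Draw the kth row of Lambda from N[a_k, A_k] (Appendix A),
    with psi_epsilon_k fixed at 1.  Omega is q x n, Y_k has length n."""
    A_k = np.linalg.inv(H0k_inv + Omega @ Omega.T)
    a_k = A_k @ (H0k_inv @ Lam0k + Omega @ Y_k)
    return rng.multivariate_normal(a_k, A_k)

def draw_psi_delta_k_inv(n, alpha0k, beta_k):
    """Draw psi_delta_k^{-1} from Gamma[n/2 + alpha0k, beta_k];
    beta_k is treated as a rate, so numpy's scale is 1/beta_k."""
    return rng.gamma(shape=n / 2 + alpha0k, scale=1.0 / beta_k)

def draw_phi(Omega2, R0_inv, rho0):
    """Draw Phi from IW[Omega2 Omega2' + R0^{-1}, n + rho0];
    assumes scipy's (df, scale) convention matches this notation."""
    n = Omega2.shape[1]
    return invwishart.rvs(df=n + rho0,
                          scale=Omega2 @ Omega2.T + R0_inv,
                          random_state=rng)
```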
APPENDIX B

Let θ* be a component of θ. The (1 − α) HPD interval of θ* is given by H(α) = {θ*: p(θ*|Z) ≥ c}, where c is chosen so that

1 − α = ∫_{H(α)} p(θ*|Z) dθ*

The computation of the HPD interval requires knowing c and then calculating H(α). This task is typically not straightforward unless one is dealing with a simple model, such as a standard normal model with continuous data. For the complicated model in this paper, the HPD intervals presented in the real example are estimated via the Chen–Shao algorithm [47] as follows:

Step 1. Obtain an MCMC sample {θ*_t; t = 1, ..., T} from p(θ*|Z) via the proposed Gibbs sampler.
Step 2. Sort {θ*_t; t = 1, ..., T} to obtain the ordered values θ*_(1) ≤ θ*_(2) ≤ ... ≤ θ*_(T).
Step 3. Compute the 100(1 − α) per cent credible intervals (θ*_(j), θ*_(j+[(1−α)T])), for j = 1, 2, ..., T − [(1 − α)T].
Step 4. The 100(1 − α) per cent HPD interval is the interval with the smallest width among all the credible intervals computed in Step 3.

APPENDIX C

Meng [48] introduced a Bayesian counterpart of the classical p-value by defining a posterior predictive p-value that depends both on the data and on the choice of priors. His procedure is used here to establish a goodness-of-fit assessment of the posited model under the null hypothesis H0 that the proposed model defined in (1) and (2) is plausible. More specifically, the posterior predictive p-value is defined as

p_B = Pr{D(Y^rep|θ, Ω) ≥ D(Y|θ, Ω) | Z, H0}

where Y^rep denotes a replication of Y and D(·|·) is a discrepancy variable. The probability is taken over the joint posterior distribution of (Y^rep, θ, Ω) given H0 and Z, where

p(Y^rep, θ, Ω, Y|Z, H0) = p(Y^rep|θ, Ω, Y) p(θ, Ω, Y|Z, H0)

For our model, we choose the χ² discrepancy variable

D(Y|θ, Ω) = ∑_{i=1}^n (y_i − Λω_i)'Ψ^{-1}(y_i − Λω_i)

so that, given (θ, Ω), D(Y^rep|θ, Ω) is distributed as χ²(pn). Recall that Ψ = I. Here, implicitly, the partition (η_i', ξ_i')' of ω_i is required to satisfy the model defined in (2). The posterior predictive p-value (PP p-value) based on this discrepancy variable is given by

p_B(Z) = ∫ Pr{χ²(pn) ≥ D(Y|θ, Ω)} p(θ, Ω, Y|Z) dθ dΩ dY

A Rao–Blackwellized type estimate of p_B(Z) is

p̂_B(Z) = J^{-1} ∑_{j=1}^J Pr{χ²(pn) ≥ D(Y^(j)|θ^(j), Ω^(j))}

The computation of p̂_B(Z) is straightforward, as D(Y^(j)|θ^(j), Ω^(j)) can be calculated at each iteration and the tail-area probability of the χ² distribution can be obtained using any standard statistical software. The null hypothesis H0 is rejected if p̂_B(Z) is close to 0.0 or 1.0, whilst the posited model can be regarded as plausible if p̂_B(Z) is near 0.5. See references [48–50] for more detailed discussions of the theoretical and practical aspects of the posterior predictive p-value.
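A sketch of the Rao–Blackwellized estimate p̂_B(Z) computed from the retained Gibbs draws is given below (an added illustration); chi2.sf gives the upper-tail probability of the χ²(pn) distribution.

```python
import numpy as np
from scipy.stats import chi2

def pp_p_value(Y_draws, Omega_draws, Lam_draws, p, n):
    """Estimate the posterior predictive p-value of Appendix C.
    Each *_draws argument is a list of the J retained Gibbs draws."""
    tail_probs = []
    for Y, Omega, Lam in zip(Y_draws, Omega_draws, Lam_draws):
        resid = Y - Omega @ Lam.T          # Y and Omega are n x p and n x q; Psi = I
        D = np.sum(resid ** 2)             # discrepancy D(Y | theta, Omega)
        tail_probs.append(chi2.sf(D, df=p * n))
    return float(np.mean(tail_probs))      # values near 0.5 indicate adequate fit
```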
ACKNOWLEDGEMENTS

The work described in this paper was fully supported by a grant from the Research Grants Council of the HKSAR (project number CUHK 4243/02H). The authors are grateful to Juliana C. N. Chan, Associate Professor, Department of Medicine and Therapeutics, CUHK, and Grace Chan, Pharmacist, Prince of Wales Hospital, Hong Kong, for introducing them to the CUHK study on hypertensive patients and for providing the data for the real example, and to the editor and a referee for valuable comments. The assistance of Michael K. H. Leung is also acknowledged.

REFERENCES

1. Czajkowski SM, Margaret AC, Ashley WS. Adherence and the placebo effect. In The Handbook of Health Behavior Change, 2nd edn, Sally AS, Eleanor BS, Judith KO, Wendy LM (eds). Springer: New York, 1998; 513–534.
2. Rand CS, Kathleen W. Measuring adherence with medication regimens in clinical care and research. In The Handbook of Health Behavior Change, 2nd edn, Sally AS, Eleanor BS, Judith KO, Wendy LM (eds). Springer: New York, 1998; 71–103.
3. Andre T, Lynda B. Knowledge of acquired immune deficiency syndrome and sexual responsibility among high school students. Youth & Society 1991; 22:339–361.
4. DiMatteo MR, Heidi SL. Promoting adherence to courses of treatment: mutual collaboration in the physician-patient relationship. In Health Communication Research: A Guide to Developments and Directions, Lorraine DJ, Bernard KD (eds). Greenwood: Westport, CT, 1998; 71–98.
5. Beisecker AE. Older persons' medical encounters and their outcomes. Research on Aging 1996; 18:9–31.
6. Lawley DN, Maxwell AE. Factor Analysis as a Statistical Method, 2nd edn. American Elsevier: New York, 1971.
7. Joreskog KG, Sorbom D. LISREL 8: Structural Equation Modeling with the SIMPLIS Command Language. Scientific Software International: Hove and London, 1996.
8. Bentler PM. EQS: Structural Equation Program Manual. BMDP Statistical Software: Los Angeles, 1992.
9. Sammel MD, Ryan LM. Latent variables with fixed effects. Biometrics 1996; 52:220–243.
10. Sammel MD, Ryan LM, Legler JM. Latent variable models for mixed discrete and continuous outcomes. Journal of the Royal Statistical Society, Series B 1997; 59:667–678.
11. Shi JQ, Lee SY. Latent variable models with mixed continuous and polytomous data. Journal of the Royal Statistical Society, Series B 2000; 62:77–87.
12. Everitt BS. An Introduction to Latent Variable Models. Chapman and Hall: London, 1984.
13. Bollen KA. Structural Equations with Latent Variables. Wiley: New York, 1989.
14. Bentler PM, Chou CP. Practical issues in structural modeling. Sociological Methods and Research 1989; 16:78–117.
15. Bentler PM, Stein JA. Structural equation models in medical research. Statistical Methods in Medical Research 1992; 1:159–181.
16. Beacon HJ, Thompson SG. The analysis of complex patterns of longitudinal binary response: an example of transient dysphagia following radiotherapy. Statistics in Medicine 1998; 17:2551–2561.
17. Palta M. Latent variables, measurement error and methods for analyzing longitudinal binary and ordinal data. Statistics in Medicine 1999; 18:385–396.
18. Chan CN, Chan YW, Cheung CK, Swaminathan R, Lan MC, Cockram S, Woo J. The metabolic syndrome in Hong Kong Chinese: the interrelationship among its components analyzed by structural equation modeling. Diabetes Care 1996; 19:953–959.
19. Batista-Foguet JM, Coenders G, Ferragud MA. Using structural equation models to evaluate the magnitude of measurement error in blood pressure. Statistics in Medicine 2001; 20:2351–2368.
20. Olsson U. Maximum likelihood estimation of the polychoric correlation coefficient. Psychometrika 1979; 44:443–460.
21. Lee SY, Poon WY, Bentler PM. Structural equation models with continuous and polytomous variables. Psychometrika 1992; 57:89–106.
22. Song XY, Lee SY. Bayesian estimation and model selection of multivariate linear model with polytomous variables. Multivariate Behavioral Research 2002; 37:453–477.
23. Christofferson A. Factor analysis of dichotomized variables. Psychometrika 1975; 40:5–32.
24. Bartholomew DJ. Latent Variable Models and Factor Analysis. Oxford University Press: New York, 1987.
25. Bock RD, Aitkin M. Marginal maximum likelihood estimation of item parameters: application of an EM algorithm. Psychometrika 1981; 46:443–459.
26. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society, Series B 1977; 39:1–38.
27. Meng XL, Schilling S. Fitting full-information item factor models and an empirical investigation of bridge sampling.
Journal of the American Statistical Association 1996; 91:1254–1267.
28. Wei GCG, Tanner MA. A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms. Journal of the American Statistical Association 1990; 85:699–704.
29. Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E. Equation of state calculations by fast computing machines. Journal of Chemical Physics 1953; 21:1087–1092.
30. Hastings WK. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 1970; 57:97–109.
31. Chan GMC. The effects of treatment compliance on clinical outcomes in patients with chronic diseases. PhD thesis, Department of Medicine and Therapeutics, The Chinese University of Hong Kong.
32. Tanner MA, Wong WH. The calculation of posterior distributions by data augmentation (with discussion). Journal of the American Statistical Association 1987; 82:528–540.
33. Geman S, Geman D. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence 1984; 6:721–741.
34. Geyer CJ. Practical Markov chain Monte Carlo (with discussion). Statistical Science 1992; 7:473–511.
35. Gelman A. Inference and monitoring convergence. In Markov Chain Monte Carlo in Practice, Gilks WR, Richardson S, Spiegelhalter DJ (eds). Chapman & Hall: London, 1996; 131–144.
36. Gelfand AE, Smith AFM. Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association 1990; 85:398–409.
37. Zeger SL, Karim MR. Generalized linear models with random effects: a Gibbs sampling approach. Journal of the American Statistical Association 1991; 86:79–86.
38. Albert JH, Chib S. Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association 1993; 88:669–679.
39. Lindley DV, Smith AFM. Bayes estimates for the linear model (with discussion). Journal of the Royal Statistical Society, Series B 1972; 34:1–42.
40. Broemeling LD. Bayesian Analysis of Linear Models. Marcel Dekker: New York, 1985.
41. Arminger G, Muthen BO. A Bayesian approach to nonlinear latent variable models using the Gibbs sampler and the Metropolis-Hastings algorithm. Psychometrika 1998; 63:271–300.
42. Lee SY, Zhu HT. Statistical analysis of nonlinear structural equation models with continuous and polytomous data. British Journal of Mathematical and Statistical Psychology 2000; 53:209–232.
43. Song XY, Lee SY. Bayesian estimation and test for factor analysis model with continuous and polytomous data in several populations. British Journal of Mathematical and Statistical Psychology 2001; 54:237–263.
44. Kass RE, Raftery AE. Bayes factors. Journal of the American Statistical Association 1995; 90:773–795.
45. Shi JQ, Lee SY. Bayesian sampling-based approach for factor analysis model with continuous and polytomous data. British Journal of Mathematical and Statistical Psychology 1998; 51:233–252.
46. Casella G, Berger RL. Statistical Inference. Duxbury Press: Belmont, CA, 1996.
47. Chen MH, Shao QM, Ibrahim JG. Monte Carlo Methods in Bayesian Computation. Springer: New York, 2000.
48. Meng XL. Posterior predictive p-values. Annals of Statistics 1994; 22:1142–1160.
49. Rubin DB. Bayesianly justifiable and relevant frequency calculations for the applied statistician. Annals of Statistics 1984; 12:1151–1172.
50. Gelman A, Meng XL, Stern H. Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica 1996; 6:733–807.
51. DiCiccio TJ, Kass RE, Raftery A, Wasserman L. Computing Bayes factors by combining simulation and asymptotic approximations. Journal of the American Statistical Association 1997; 92:903–915.