Factor analysis: A primer for asthma researchers

Editorial
Kristin A. Riekert, PhD, and Michelle Eakin, PhD
Baltimore, Md
In the past few years, factor analysis (FA) has been increasingly
used to examine the relationship between variables known to be
associated with asthma health status including spirometry, atopy,
inflammation, symptom frequency, quality of life, β-agonist use,
and exacerbations. FA is a multistep analytic process that seeks to
define the underlying latent constructs among a larger set of
observed variables. Variables that correlate with one another
but are distinct from other variables are combined into factors
that are assumed to reflect the same underlying disease process.
Thus, a primary advantage of FA is that it reduces a large number
of disease features to a smaller, more manageable number of
independent and analyzable factors. FA does not have any a priori
hypotheses about the data structure. As such, the specific
variables included in the model strongly affect the output of the factor analysis and the interpretation of which latent constructs are
represented. Moreover, there are multiple valid options for
conducting FA, and there are no tests of significance. As a result,
FA is dependent on the researchers’ judgment about (1) the
consistency of results using different statistical approaches and
rules of thumb, (2) the interpretability of the output, and (3)
comparability with previous research. Herein lies the challenge
of FA.
Because FA is based on the correlation between variables, one
of the most important decisions is selecting which variables to
include in the model. Ideally the researcher has access to the
universe of possible variables from which to choose those that are
interrelated and make sense on the basis of known empirical data
or hypothesized mechanisms. Unfortunately, for most asthma
studies, the FA is a secondary analysis of an existing dataset, and
the variables available may not include relevant domains, or
domains present may not be measured thoroughly. For example,
Holt et al1 did not have quality-of-life data, and symptom data
were based on diary cards, which contrasts with the survey approach used by Schatz et al,2 which in turn did not have measures
of atopy or inflammation that were included to a varying degree in
other studies (Table I). This is important because failure to measure an important factor may distort the apparent relationships
among the measured factors.6 The complexity of the variables included in the analysis can also affect the results. In asthma, this is
a significant limitation because many of the constructs of interest
(eg, spirometry, atopy, patient-reported outcomes) are confounded
by measurement metrics (eg, mL vs a 5-point scale) and levels of
patient burden (1-time blood test vs daily diary completion).
Thus, subtle differences in factors between published studies
may merely reflect differences in variable selection and measures
and not inconsistencies in the true underlying constructs.
From the Division of Pulmonary and Critical Care Medicine, Johns Hopkins School of
Medicine.
Supported in part by HL075344 from the National Heart, Lung, and Blood Institute.
Disclosure of potential conflict of interest: The authors have declared that they have no
conflict of interest.
Received for publication March 24, 2008; accepted for publication March 25, 2008.
Reprint requests: Kristin Riekert, PhD, Assistant Professor, Division of Pulmonary and
Critical Care Medicine, Johns Hopkins University, 5501 Hopkins Bayview Circle,
JHAAC Rm 4B.74, Baltimore, MD 21224. E-mail: [email protected].
J Allergy Clin Immunol 2008;121:1181-3.
0091-6749/$34.00
© 2008 American Academy of Allergy, Asthma & Immunology
doi:10.1016/j.jaci.2008.03.023
The 2 most common methods for extracting factors in FA are
principal FA (PFA) and principal components analysis (PCA).
The only difference between these methods is the type of variance
included in the analysis. PCA uses the total variance of all the
variables, and PFA uses only the variance that each variable
shares with other variables and excludes a single variable’s unique
and error variance. Holt et al1 conducted a PFA, whereas most
other asthma studies have used PCA (Table I). PCA and PFA typically yield equivalent results, particularly if the sample size is
large and there are numerous variables. However, PFA can result
in factors with negative eigenvalues, which are artifacts of the
methodology that can affect how many factors are retained and
should not be included in further analyses or interpretation.
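The difference between the 2 extraction methods can be illustrated numerically. The following is a minimal sketch, not the procedure of any cited study: it assumes a small hypothetical correlation matrix and assumes that squared multiple correlations are used as the initial communality estimates for PFA (one common choice among several). PCA decomposes the correlation matrix with 1s on the diagonal, whereas PFA decomposes a reduced matrix whose diagonal holds the communality estimates, which is why negative eigenvalues can appear.

```python
import numpy as np

def pca_eigenvalues(R):
    """Eigenvalues of the full correlation matrix (total variance, 1s on the diagonal)."""
    return np.sort(np.linalg.eigvalsh(R))[::-1]

def pfa_eigenvalues(R):
    """Eigenvalues of the reduced correlation matrix (shared variance only).

    Assumption: the diagonal is replaced by the squared multiple correlation
    (SMC) of each variable with all the others as the communality estimate.
    """
    smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    reduced = R.copy()
    np.fill_diagonal(reduced, smc)
    return np.sort(np.linalg.eigvalsh(reduced))[::-1]

# Hypothetical correlation matrix for 4 observed variables (illustration only).
R = np.array([
    [1.0, 0.6, 0.1, 0.1],
    [0.6, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.5],
    [0.1, 0.1, 0.5, 1.0],
])

pca_ev = pca_eigenvalues(R)
pfa_ev = pfa_eigenvalues(R)
print("PCA eigenvalues:", np.round(pca_ev, 3))
print("PFA eigenvalues:", np.round(pfa_ev, 3))
```

In this example the PCA eigenvalues sum to the number of variables, whereas the PFA eigenvalues sum to the total of the communality estimates and the smallest ones fall below zero.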
After factors have been extracted, researchers must use their
judgment to determine how many factors should be retained.
There are several criteria available, and typically multiple
methods are examined to determine whether the most scientifically meaningful factors are being retained. The 2 most common
methods for selecting the number of factors are the Kaiser
criterion (eg, only factors that have eigenvalues greater than
1 are retained) and scree test (identifies the number of factors
by plotting the latent roots against the number of factors in order
of extraction [ie, scree plot] and examining the slope).6 Holt et al1
used an unconventional method for factor selection that uses the
mean of the eigenvalues as the cutoff point for factor retention.
They appear to have erroneously included negative eigenvalues
in the mean calculation, thereby inflating the number of factors retained. Using the more traditional approaches to factor selection,
the data provided by the authors indicate that 3 factors would be
retained by using the Kaiser criterion and 3 to 5 factors using the
scree test. Holt et al1 did not comment on whether they also examined the interpretability of a 3-factor or 4-factor solution. Scientific judgment of the practical significance and interpretability
of the results is one of the most important steps in determining
which factors to retain.
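The effect of including negative eigenvalues in a mean-eigenvalue cutoff can be seen with a small calculation. This is a sketch under stated assumptions: the eigenvalues below are hypothetical and are not the values reported by Holt et al, and the function names are illustrative only. Negative eigenvalues pull the mean down, so more factors clear the cutoff.

```python
import numpy as np

def kaiser_count(eigenvalues):
    """Number of factors with an eigenvalue greater than 1 (Kaiser criterion)."""
    return int(np.sum(np.asarray(eigenvalues, dtype=float) > 1.0))

def mean_cutoff_count(eigenvalues, drop_negative=True):
    """Number of factors whose eigenvalue exceeds the mean eigenvalue.

    With drop_negative=True, negative eigenvalues (artifacts of PFA) are
    excluded before the mean is computed.
    """
    ev = np.asarray(eigenvalues, dtype=float)
    pool = ev[ev > 0] if drop_negative else ev
    return int(np.sum(ev > pool.mean()))

# Hypothetical PFA eigenvalues in order of extraction (not from any study).
ev = [2.9, 1.6, 1.1, 0.6, 0.4, 0.2, -0.1, -0.2, -0.3, -0.4]

n_kaiser = kaiser_count(ev)
n_mean_with_negatives = mean_cutoff_count(ev, drop_negative=False)
n_mean_without_negatives = mean_cutoff_count(ev, drop_negative=True)

# Successive drops between eigenvalues; a scree plot shows these as the slope.
scree_drops = -np.diff(ev)

print(n_kaiser, n_mean_with_negatives, n_mean_without_negatives)
```

With these illustrative values, the mean cutoff retains more factors when the negative eigenvalues are left in the calculation than when they are dropped.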
After completing the PFA or PCA, the results are likely to be
difficult to interpret. Rotation is therefore used to improve the
utility of the solution and does not change the quality of the
mathematical fit. There are 2 types of rotation: orthogonal (which
results in uncorrelated factors) and oblique (which allows factors
to be correlated). Orthogonal rotation provides factors that are
easier to interpret and describe succinctly in a manuscript, but it is
extremely rare for factors to be independent and uncorrelated. In
reality, both methods tend to result in equivalent patterns.
Researchers often conduct both types of rotations and, if similar,
present the orthogonal results as did Holt et al.1
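As an illustration of why rotation leaves the mathematical fit unchanged, the sketch below applies a varimax rotation (one widely used orthogonal rotation; the editorial does not state which rotation each study used) to a hypothetical unrotated loading matrix. The communality of each variable, that is, the sum of its squared loadings across factors, is the same before and after rotation.

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a (variables x factors) loading matrix.

    Returns the rotated loadings and the orthogonal rotation matrix.
    """
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = L @ rotation
        # Gradient of the varimax criterion, solved through an SVD.
        target = rotated**3 - (gamma / p) * rotated @ np.diag(np.sum(rotated**2, axis=0))
        u, s, vt = np.linalg.svd(L.T @ target)
        rotation = u @ vt
        new_criterion = np.sum(s)
        if new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return L @ rotation, rotation

# Hypothetical unrotated loadings for 4 variables on 2 factors (illustration only).
unrotated = np.array([
    [0.7, 0.5],
    [0.8, 0.4],
    [0.6, -0.5],
    [0.7, -0.6],
])

rotated, rotation = varimax(unrotated)
communalities_before = np.sum(unrotated**2, axis=1)
communalities_after = np.sum(rotated**2, axis=1)
print(np.round(rotated, 3))
```

An oblique rotation would relax the requirement that the rotation matrix be orthogonal, which allows the factors to correlate.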
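TABLE I. Factor analytic procedures and criteria used in studies evaluating the components of asthma health status

Article: Holt et al (this issue)1
Constructs included in FA: Atopy (biological); β-agonist use (diary); exacerbations (diary); inflammation (biological); spirometry; asthma symptoms (diary)
Extraction strategy: PFA
Rotation: Orthogonal; oblique
Selection of no. of factors: EV above mean of all EVs
Criterion used for factor loading: .40
Test of robustness: Time

Article: Pillai et al (2008)3
Constructs included in FA: Atopy (biological and survey); spirometry; asthma symptoms (survey); rhinitis symptoms (survey)
Extraction strategy: NR
Rotation: Oblique
Selection of no. of factors: Kaiser criterion; scree test
Criterion used for factor loading: .45
Test of robustness: Age; sex; race; siblings

Article: Schatz et al (2005)2
Constructs included in FA: Exacerbations (survey); quality of life (survey); asthma symptoms (survey)
Extraction strategy: PCA
Rotation: Orthogonal
Selection of no. of factors: Kaiser criterion
Criterion used for factor loading: .40
Test of robustness: Education; income; nonwhite; smokers

Article: Juniper et al (2004)4
Constructs included in FA: β-Agonist use (diary); quality of life (survey); spirometry; asthma symptoms (diary)
Extraction strategy: PCA
Rotation: Orthogonal
Selection of no. of factors: Kaiser criterion
Criterion used for factor loading: NR
Test of robustness: Active treatment group; multiple trials; time

Article: Leung et al (2004)5
Constructs included in FA: Spirometry; exhaled NO; atopy (biological); inflammation (biological)
Extraction strategy: PCA
Rotation: Orthogonal
Selection of no. of factors: Kaiser criterion
Criterion used for factor loading: .45
Test of robustness: None
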
EV, Eigenvalue; NR, not reported or unclear.
The next step is to determine which variables load on the
factors. Factor loadings are the correlations between the observed
variables and the factor. Because there is no statistical test, the
rule of thumb is to examine the magnitude of factor loadings.
Values greater than ±0.30 are considered to meet the minimal
level, whereas loadings of ±0.40 are considered more important
to that factor.6 Ideally the goal is to have each variable load on
only 1 factor; however, in practice, some variables load greater
than 0.40 on more than 1 factor or may not load on any factor.
If this occurs, a researcher may decide to delete them from the
model and rerun the model to obtain a new factor solution. In contrast with previous asthma studies,2-5 Holt et al1 did not have any
variable load on multiple factors.
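The rule of thumb for reading a loading matrix can be written as a short check. This is a sketch only: the variable names and loadings are hypothetical, and it assumes that a loading counts as salient when its absolute value is at least 0.40. Each variable is flagged as loading on 1 factor, on more than 1 factor, or on none.

```python
import numpy as np

def classify_loadings(loadings, names, salient=0.40):
    """Label each variable by the factors on which its absolute loading is salient."""
    magnitudes = np.abs(np.asarray(loadings, dtype=float))
    labels = {}
    for name, row in zip(names, magnitudes):
        hits = np.flatnonzero(row >= salient) + 1  # 1-based factor numbers
        if hits.size == 0:
            labels[name] = "no factor"
        elif hits.size == 1:
            labels[name] = f"factor {hits[0]}"
        else:
            labels[name] = "cross-loads on factors " + ", ".join(str(h) for h in hits)
    return labels

# Hypothetical variables and rotated loadings (illustration only).
names = ["FEV1", "symptom score", "rescue inhaler use", "IgE"]
loadings = [
    [0.82, 0.10],
    [0.15, 0.77],
    [0.45, 0.52],
    [0.22, -0.18],
]

labels = classify_loadings(loadings, names)
for name, label in labels.items():
    print(f"{name}: {label}")
```

Variables flagged as cross-loading or as loading on no factor are the candidates a researcher might drop before rerunning the model.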
Secondary analyses often include testing the robustness of the
factors identified. This is done by repeating the FA in a second
sample, in a subsample, or over time. Previous asthma studies
have found little difference when replicating the results over
time,1,4 using multiple samples,4 or stratifying on demographics
such as age, sex, race, or socioeconomic variables,2,3 suggesting
that asthma status factors are robust. The final step is to interpret
the scientific meaning of the pattern of factor loadings and offer a
label for each factor. The label or name of the factor is not
assigned by the analysis but is developed by the researcher to
describe best the pattern of factor loadings and can be symbolic,
descriptive, or causal. It is important to note that factors may
represent the method of measurement such as self-report versus
laboratory values rather than a biological or mechanistic
relationship.
A common error is overinterpretation of FA by ascribing
greater meaning to the factors than can be presumed from the FA
results alone. In the article by Holt et al,1 for example, there is an
explicit assumption that the results of the FA suggest that the factors abstracted are meaningfully related to the assessment of a patient’s asthma status. Although it is true that there are previous
empirical data to support that the variables included in the FA
are important for the assessment of asthma status, the FA itself
cannot test the hypothesis that a multicomponent assessment
has clinical utility. FA also does not support the recommendation
by Holt et al1 that the selection of 1 variable from each factor is
sufficient for a comprehensive asthma assessment. What is clear
from this article and others, however, is that variables known to
be associated with asthma outcomes are not a unidimensional
construct.
A critical next step is to test the relevance of these constructs to
the prediction of important health outcomes. For example, what is
the relative importance of each factor identified by Holt et al1 to
specialist ratings of asthma severity, response to treatment, or the
occurrence of future exacerbations? Is the strength of the association between factors equal across all outcomes, or are some factors more sensitive to change? A benefit of FA is that factor scores
can be calculated (weighting each variable proportionally to its
loading on the factor) and used as variables in subsequent analyses. Factor scores are influenced by measurement error, so they are not
perfect measures of the underlying construct; however, they can
be considered as weighted composite scores. This empirical approach to data reduction may increase statistical precision,
thereby reducing the sample size required.7 To date, no asthma
study has evaluated the clinical utility or predictive ability of
factor scores.
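A weighted composite of this kind can be sketched in a few lines. The example assumes the simplest scoring approach, in which each standardized variable is weighted by its loading and summed; statistical packages also offer regression-based scoring, which is not shown here. The data and loadings are simulated and hypothetical.

```python
import numpy as np

def weighted_composite_scores(X, loadings):
    """Factor scores as loading-weighted sums of standardized variables.

    X is (participants x variables); loadings is (variables x factors).
    """
    X = np.asarray(X, dtype=float)
    z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    return z @ np.asarray(loadings, dtype=float)

# Simulated data: 50 participants, 4 observed variables (illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))

# Hypothetical rotated loadings for 4 variables on 2 factors.
loadings = np.array([
    [0.80, 0.10],
    [0.75, 0.05],
    [0.10, 0.70],
    [0.05, 0.65],
])

scores = weighted_composite_scores(X, loadings)
print(scores.shape)
```

Each column of the result is one score per participant per factor and could be entered as a predictor or outcome in a subsequent model.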
The article by Holt et al1 in this issue supports the asthma
guidelines8 and a growing body of literature suggesting that
asthma status is not unidimensional. The minor differences between studies conducting FAs more likely reflect the selection
of variables and varying analytic approaches rather than substantial differences in the underlying constructs. To advance our scientific knowledge of which constructs are essential to assess to
properly assign treatment and minimize negative health outcomes, future studies need to go beyond merely evaluating factor
structure to testing the relative importance of each factor to relevant asthma health outcomes.
REFERENCES
1. Holt EW, Cook EF, Covar RA, Spahn J, Fuhlbrigge AL. Identifying the components
of asthma health status in children with mild to moderate asthma. J Allergy Clin
Immunol 2008;121:1175-80.
2. Schatz M, Zeiger RS, Vollmer WM, Mosen D, Apter AJ, Stibolt TB, et al. Validation
of a beta-agonist long-term asthma control scale derived from computerized pharmacy data. J Allergy Clin Immunol 2006;117:995-1000.
3. Pillai SG, Tang Y, van den Oord E, Klotsman M, Barnes K, Carlsen K, et al. Factor
analysis in the Genetics of Asthma International Network family study identifies five
major quantitative asthma phenotypes. Clin Exp Allergy 2008;38:421-9.
4. Juniper EF, Wisniewski ME, Cox FM, Emmett AH, Nielsen KE, O’Byrne PM.
Relationship between quality of life and clinical status in asthma: a factor analysis.
Eur Respir J 2004;23:287-91.
5. Leung TF, Wong GW, Ko FW, Lam CW, Fok TF. Clinical and atopic parameters and
airway inflammatory markers in childhood asthma: a factor analysis. Thorax 2005;
60:822-6.
6. Tabachnick BG, Fidell LS. Using multivariate statistics. 4th ed. New York: Allyn
and Bacon; 2001.
7. Freemantle N, Calvert M, Wood J, Eastaugh J, Griffin C. Composite outcomes in
randomized trials: greater precision but with greater uncertainty? JAMA 2003;
289:2554-9.
8. National Asthma Education and Prevention Program Expert Panel Report 3:
guidelines for the diagnosis and management of asthma. Bethesda (MD): National
Institutes of Health; 2007. Pub no. 08-4051.