Best practices for your confirmatory factor analysis

A JASP and lavaan tutorial

Author
Affiliation

Pablo Rogers

Universidade Federal de Uberlândia

Published

March 13, 2024

Other Formats
Abstract

This supplementary material provides the complete R/lavaan implementation for the confirmatory factor analysis (CFA) tutorial presented in Rogers (2024). The analyses use the WHOQOL-BREF instrument — a 24-item quality of life measure covering four domains: Psychological, Physical, Social, and Environment — with data from a Brazilian sample of 1,047 respondents (Rogers, 2021). The codes that generates the outputs for this tutorial article can be found in the notebook: https://phdpablo.github.io/cfa-brm/index-preview.html

Keywords

Confirmatory Factor Analysis, Structural Equation Modeling, Internal Structural Validity, JASP, lavaan

1 Introduction

A central challenge in applied CFA is that items are typically measured on ordinal Likert scales, yet many textbooks and software default to Maximum Likelihood (ML) estimation, which assumes continuous, multivariate normal data. As discussed in Rogers (2024), the appropriate approach for ordinal data is the Diagonally Weighted Least Squares (DWLS) estimator — implemented as WLSMV in lavaan — which operates on polychoric correlations rather than Pearson correlations and does not assume normality of observed variables.

ImportantTutorial Structure

This document is organized as a manuscript project: the Article presents the analytical narrative and results without showing code, while the Article Notebook displays the full R code behind every result. Both views share the same content — the only difference is code visibility.

2 Setup

We begin by loading all required R packages. The lavaan package provides the core CFA estimation engine; semTools extends it with reliability functions and additional diagnostics; semPlot produces path diagrams; psych provides descriptive statistics and omega coefficients for bifactor models; simsem enables Monte Carlo simulation for power analysis; and dynamic computes the Dynamic Fit Index (DFI) cutoffs.

We set global options and a seed for reproducibility. The seed ensures that all simulation-based results (power analysis) are exactly reproducible.

NoteA Note on Code Design

Throughout this tutorial, you will notice that some code patterns are repeated across sections — for example, extracting fit indices, formatting factor loadings, and building comparison tables. In a production environment, these routines would typically be modularized into reusable R functions and sourced from separate scripts.

We deliberately chose not to do this. Each analysis section is self-contained, with all code written explicitly and in sequence. This design sacrifices automation and conciseness in favor of transparency: the reader can follow each step without navigating between files or tracing function definitions. For a tutorial whose primary goal is to teach CFA best practices, we believe clarity of exposition is more valuable than code elegance.

Readers who wish to adapt this workflow for their own research are encouraged to refactor repeated patterns into functions and organize them into modular scripts — this is, in fact, a best practice for reproducible research projects.

3 Step 1: ETL — Extract, Transform, Load

The WHOQOL-BREF dataset is loaded directly from the Mendeley Data repository (Rogers, 2021). The data file contains N = 1,047 observations with 24 items. A separate label file provides the variable names following the naming convention: item number followed by a suffix indicating its domain (_P = Psychological, _F = Physical, _S = Social, _A = Environment). Items Q3, Q4, and Q26 have already been reverse-coded in the source data so that higher values consistently indicate better quality of life.

Table 1 shows the first six observations (n). All variables are integer-coded (1–5), reflecting the five-point Likert response format.

Table 1: First rows of the WHOQOL-BREF dataset
n1 n2 n3 n4 n5 n6
Q3_F 5 5 5 5 3 2
Q4_F 4 5 4 5 4 4
Q10_F 2 3 3 5 3 4
Q15_F 5 3 4 5 4 4
Q16_F 3 3 5 5 4 4
Q17_F 4 3 2 4 4 4
Q18_F 3 2 2 5 4 4
Q5_P 3 3 4 4 3 3
Q6_P 3 2 4 5 5 4
Q7_P 2 2 2 5 3 3
Q11_P 2 3 4 5 2 4
Q19_P 2 3 3 4 4 4
Q26_P 4 4 2 5 4 4
Q20_S 4 2 4 4 4 4
Q21_S 2 3 5 5 5 2
Q22_S 2 2 5 5 4 4
Q8_A 4 3 3 4 2 3
Q9_A 3 2 2 4 2 3
Q12_A 3 2 3 5 3 2
Q13_A 4 2 4 5 3 4
Q14_A 3 3 3 4 2 3
Q23_A 2 2 5 5 5 4
Q24_A 2 4 5 3 3 2
Q25_A 2 2 5 5 3 4
Source: Article Notebook

Table 2 presents the descriptive statistics for all 24 items. Means range from approximately 3.0 to 4.0, with standard deviations around 0.8–1.1, indicating moderate variability across items. No floor or ceiling effects are evident, and skewness values are generally within acceptable bounds (|sk| < 1), supporting the use of polychoric correlations. These statistics are consistent with the original validation study (Rogers, 2024).

Table 2: Descriptive statistics for all WHOQOL-BREF items
vars n mean sd median trimmed mad min max range skew kurtosis se
Q3_F 1 1047 4.17 0.93 4 4.31 1.48 1 5 4 -0.99 0.27 0.03
Q4_F 2 1047 4.10 1.01 4 4.26 1.48 1 5 4 -1.02 0.31 0.03
Q10_F 3 1047 3.61 0.91 4 3.65 1.48 1 5 4 -0.18 -0.43 0.03
Q15_F 4 1047 4.42 0.79 5 4.57 0.00 1 5 4 -1.52 2.41 0.02
Q16_F 5 1047 3.50 1.03 4 3.49 1.48 2 5 3 -0.13 -1.16 0.03
Q17_F 6 1047 3.72 0.92 4 3.78 0.00 2 5 3 -0.51 -0.53 0.03
Q18_F 7 1047 3.75 0.96 4 3.81 1.48 2 5 3 -0.50 -0.64 0.03
Q5_P 8 1047 3.29 0.82 3 3.35 1.48 1 5 4 -0.44 -0.09 0.03
Q6_P 9 1047 3.90 0.99 4 4.02 1.48 1 5 4 -0.88 0.48 0.03
Q7_P 10 1047 3.61 0.86 4 3.66 1.48 1 5 4 -0.53 0.12 0.03
Q11_P 11 1047 3.82 0.99 4 3.92 1.48 1 5 4 -0.62 -0.09 0.03
Q19_P 12 1047 3.72 0.95 4 3.77 1.48 2 5 3 -0.42 -0.70 0.03
Q26_P 13 1047 3.91 0.92 4 4.03 0.00 1 5 4 -1.19 1.68 0.03
Q20_S 14 1047 3.64 0.89 4 3.67 1.48 2 5 3 -0.25 -0.66 0.03
Q21_S 15 1047 3.46 1.00 4 3.45 1.48 2 5 3 -0.05 -1.09 0.03
Q22_S 16 1047 3.58 0.88 4 3.60 1.48 2 5 3 -0.16 -0.69 0.03
Q8_A 17 1047 2.94 0.98 3 2.97 1.48 1 5 4 -0.11 -0.43 0.03
Q9_A 18 1047 3.47 0.91 4 3.52 1.48 1 5 4 -0.64 0.26 0.03
Q12_A 19 1047 3.45 1.10 3 3.50 1.48 1 5 4 -0.25 -0.54 0.03
Q13_A 20 1047 3.93 0.82 4 3.98 1.48 1 5 4 -0.57 0.15 0.03
Q14_A 21 1047 3.34 0.96 3 3.34 1.48 1 5 4 -0.29 -0.26 0.03
Q23_A 22 1047 3.98 0.97 4 4.09 1.48 2 5 3 -0.70 -0.45 0.03
Q24_A 23 1047 3.73 0.97 4 3.79 1.48 2 5 3 -0.40 -0.79 0.03
Q25_A 24 1047 3.83 0.98 4 3.91 1.48 2 5 3 -0.50 -0.72 0.03
NoteData Overview

The dataset contains N = 1047 observations on 24 variables. All items are measured on a 5-point Likert scale (1–5), representing ordinal data. This ordinal nature is fundamental: it requires the use of polychoric correlations and DWLS estimation rather than Pearson correlations and ML.

4 Population Model

Before fitting CFA models to our sample data, we define a population model that serves two purposes: (1) computing Dynamic Fit Index (DFI) cutoffs tailored to this specific model structure and sample size, and (2) conducting a priori power analysis via simulation.

The population model is based on the meta-analytic factor loadings from Lin & Yao (2022), who validated the WHOQOL-BREF factor structure using a meta-analysis of exploratory factor analyses combined with social network analysis. Factor correlations were set at 0.3, reflecting the upper bound reported in Lin & Yao (2022) (range 0.08–0.30). Residual variances were derived analytically as 1 − λ² (except for Q3 and Q4, whose residuals are adjusted for their predicted correlation).

Given the ordinal nature of the variables, we assume equidistant thresholds following a linearity assumption with proportions of approximately 12%, 23%, 31%, 23%, and 12%. For a five-point Likert scale, this translates to threshold values of −1.2, −0.4, 0.4, and 1.2. While it is unlikely that thresholds from previous studies will be available, if response frequencies per category per item are provided, thresholds can be estimated using the inverse normal distribution.

ImportantImportant Caveat

This population model should be understood as a teaching illustration. The loadings from Lin & Yao (2022) come primarily from exploratory factor analysis studies using principal components with varimax rotation — a method that has well-known limitations (Rogers, 2022). Ideally, one should seek a robust national survey that has employed CFA with ordinal data for the WHOQOL-BREF, providing more appropriate starting values.

4.1 Dynamic Fit Index (DFI)

The Dynamic Fit Index (McNeish, 2023) provides simulation-based cutoff values for fit indices that are tailored to a specific model structure and sample size, rather than relying on generic rules of thumb (e.g., CFI > .95, RMSEA < .06). The DFI computation requires the population model defined above.

Your DFI cutoffs: 
            SRMR  RMSEA CFI   Magnitude
Level-0     0.024 0.012 0.998 NONE     
Specificity 95%   95%   95%            
                                       
Level-1     0.038 0.037 0.985 0.37     
Sensitivity 95%   95%   95%            
                                       
Level-2     0.064 0.083 0.934 0.599    
Sensitivity 95%   95%   95%            
                                       
Level-3     0.078 0.105 0.904 0.58     
Sensitivity 95%   95%   95%            

 Notes:
  -Number of levels is based on the number of factors in the model
  -'Sensitivity' is % of hypothetically misspecified models correctly identified by cutoff in DFI simulation
  -Cutoffs with 95% sensitivity are reported when possible
  -If sensitivity is <50%, cutoffs will be supressed 

 The distributions for each level are in the Plots tab 
[[1]]


[[2]]


[[3]]

Note that we could compute a separate DFI for each estimated model (assuming each in turn as the population model). However, this would sacrifice comparability across models. Moreover, DFI methods for bifactor models are not yet available, and those for hierarchical models are still in development.

5 4-Factor CFA Model

The first model tested is the standard four correlated factor model, which represents the theoretical structure of the WHOQOL-BREF (Lin & Yao, 2022).

5.1 Model Specification

Each of the 24 items loads on exactly one of the four latent factors: Psychological (6 items), Physical (7 items), Social (3 items), and Environment (8 items). Latent factors are allowed to freely correlate, reflecting the expected interrelations among quality of life domains. A residual covariance between items Q3 and Q4 is included based on prior theoretical and empirical evidence, as both items assess related aspects within the Physical domain.

5.2 Model Estimation

The estimation uses DWLS (WLSMV in lavaan) with all items declared as ordinal (ordered = TRUE), and factor variances are fixed to 1 (std.lv = TRUE) for identification.

5.3 Model Evaluation

TipHow to Read CFA Output

When evaluating a CFA model, watch for: (1) Heywood cases — standardized loadings > 1 or negative variance estimates, which indicate estimation problems; (2) Overall fit — SRMR < .08, RMSEA < .06, CFI/TLI > .95 using the scaled versions for WLSMV (For reference only, as the correct approach would be to use the cutoffs derived from the DFI); (3) Local fit — standardized residuals > |2|, and modification indices (MI) > 3.84; (4) Reliability — GLB and composite reliability (omega) should exceed .70 for adequate internal consistency.

Table 3: Fit indices for the 4-factor CFA model
χ² df p CFI TLI RMSEA RMSEA 90% CI SRMR
1732.82 245 0 0.941 0.934 0.076 [0.073, 0.08] 0.057
Table 4: Standardized factor loadings for the 4-factor model
Factor Item Std.Loading SE p
psycho Q5_P 0.740 0.017 < .001
psycho Q6_P 0.670 0.018 < .001
psycho Q7_P 0.664 0.018 < .001
psycho Q11_P 0.659 0.020 < .001
psycho Q19_P 0.860 0.012 < .001
psycho Q26_P 0.640 0.020 < .001
physical Q3_F 0.508 0.026 < .001
physical Q4_F 0.423 0.029 < .001
physical Q10_F 0.817 0.012 < .001
physical Q15_F 0.639 0.025 < .001
physical Q16_F 0.593 0.023 < .001
physical Q17_F 0.921 0.007 < .001
physical Q18_F 0.886 0.009 < .001
social Q20_S 0.818 0.018 < .001
social Q21_S 0.583 0.027 < .001
social Q22_S 0.751 0.020 < .001
environment Q8_A 0.510 0.028 < .001
environment Q9_A 0.624 0.024 < .001
environment Q12_A 0.712 0.021 < .001
environment Q13_A 0.649 0.023 < .001
environment Q14_A 0.766 0.020 < .001
environment Q23_A 0.571 0.027 < .001
environment Q24_A 0.482 0.028 < .001
environment Q25_A 0.529 0.028 < .001
Table 5: Factor correlations for the 4-factor model
Factor 1 Factor 2 r SE p
psycho physical 0.909 0.010 < .001
psycho social 0.771 0.019 < .001
psycho environment 0.725 0.021 < .001
physical social 0.638 0.024 < .001
physical environment 0.630 0.022 < .001
social environment 0.550 0.028 < .001

For the complete lavaan output — including unstandardized estimates, thresholds, variance parameters, and R² values — run the following command in your R console after fitting the model:

summary(est_4fa, fit.measures = TRUE, standardized = TRUE, rsq = TRUE)

This produces a comprehensive printout that is useful for diagnostic purposes but too extensive for publication. The key results have been extracted into the tables above.

Examining the output, the scaled chi-square test is statistically significant (expected for large N), but this alone is not informative for model evaluation. The scaled fit indices reveal a mediocre fit: the CFI and TLI values fall slightly below the conventional .95 threshold, and the RMSEA exceeds .06. The SRMR is within acceptable range. This pattern is consistent with a Level 2 misspecification according to DFI standards — the model captures the general structure but has notable local misfits.

Regarding factor loadings, most items show adequate standardized loadings (Std.all > .50), indicating substantial association with their respective factors. However, Q4 (Physical) shows a relatively lower loading, and the R² values for Q4 (Physical) and Q24 (Environment) are notably low, suggesting these items are not well-explained by their assigned factor.

5.3.1 Modification Indices

Modification indices (MI) estimate the expected decrease in the chi-square statistic if a currently fixed parameter were freed. An MI > 3.84 is statistically significant at α = .05, and practically relevant MIs typically exceed 10. However, not all high MIs should be acted upon — only those with theoretical justification.

Table 6: Top 20 modification indices for the 4-factor model
lhs op rhs mi epc
289 Q5_P ~~ Q14_A 177.218 0.278
254 environment =~ Q5_P 175.788 0.503
544 Q24_A ~~ Q25_A 124.552 0.282
216 physical =~ Q5_P 119.025 -1.039
467 Q17_F ~~ Q18_F 92.645 0.162
263 environment =~ Q15_F 89.086 0.347
214 psycho =~ Q24_A 80.033 -0.333
231 physical =~ Q24_A 78.731 -0.289
200 psycho =~ Q10_F 55.182 0.631
265 environment =~ Q17_F 48.628 -0.251
266 environment =~ Q18_F 44.622 -0.236
280 Q5_P ~~ Q17_F 44.099 -0.193
281 Q5_P ~~ Q18_F 41.613 -0.188
258 environment =~ Q19_P 37.766 -0.254
428 Q10_F ~~ Q18_F 34.964 -0.123
252 social =~ Q24_A 33.765 -0.206
203 psycho =~ Q17_F 33.378 -0.549
394 Q3_F ~~ Q15_F 32.487 0.181
212 psycho =~ Q14_A 31.896 0.221
223 physical =~ Q21_S 30.562 0.233

The modification indices reveal a critical finding: item Q5 appears in multiple high-MI entries, suggesting cross-loadings on factors other than Psychological. This is a strong empirical signal that Q5 may not function as intended within the four-factor structure. Additionally, several MIs suggest correlated residuals between items across domains, reflecting the high correlation between Physical and Psychological factors.

WarningRed Flag: Item Q5

When a single item appears repeatedly among the highest modification indices, this suggests a systematic misfit rather than isolated local strain. Freeing individual parameters (e.g., adding cross-loadings) is generally not recommended as a post hoc strategy — it risks capitalizing on sample-specific patterns. The better approach is to evaluate whether the item should be removed from the model.

5.3.2 Residual Correlations

Standardized residuals greater than |2| indicate item pairs whose empirical correlation is poorly reproduced by the model. The correlation residuals below show the discrepancies between model-implied and observed correlations.

Figure 1: Heatmap of standardized residual correlations for the 4-factor model

Figure 1 provides a global view of local misfit. Color intensity reflects the absolute magnitude of standardized residuals — darker red indicates poorer reproduction of that pairwise correlation by the model, regardless of direction. Cell labels retain the sign: positive values indicate the model underestimates the correlation, negative values indicate overestimation. Pairs exceeding |2| deserve closer inspection.

The residual analysis confirms the patterns identified by the modification indices: the largest standardized residuals involve Q5 and items from other domains, reinforcing the conclusion that this item introduces systematic misfit.

For the complete residual matrices — including the correlation residuals and the full 24×24 standardized residual matrix — run the following commands in your R console after fitting the model:

residuals(est_4fa, type = "cor")
residuals(est_4fa, type = "standardized")$cov

The first command returns the difference between observed and model-implied correlations. The second returns standardized residuals, where values exceeding |2| indicate item pairs whose association is poorly reproduced by the model. The table and heatmap above summarize the most relevant results from these matrices.

5.3.3 Reliability Coefficients

We compute reliability using semTools::reliability(), which provides: alpha (Cronbach’s), omega (composite reliability, preferred for CFA), omega2 (Bentler’s), omega3 (McDonald’s), and AVE (average variance extracted). AVE > .50 indicates that the factor captures more variance from its items than is due to measurement error.

Table 7: Reliability coefficients for the 4-factor model
psycho physical social environment
alpha 0.820 0.823 0.687 0.787
alpha.ord 0.854 0.863 0.741 0.820
omega 0.821 0.801 0.707 0.794
omega2 0.821 0.801 0.707 0.794
omega3 0.823 0.811 0.723 0.797
avevar 0.504 0.500 0.524 0.375

All four domains show composite reliability (omega) above .70, indicating adequate internal consistency. The AVE values are acceptable for most domains, with the Environment domain showing a slightly lower value — consistent with its broader and more heterogeneous item content. The Social domain, despite having only three items, shows strong reliability due to the high factor loadings of Q20, Q21, and Q22.

5.3.4 Preliminary Evaluation Summary

Based on the overall and local fit analysis:

  • Overall fit is mediocre, compatible with a Level 2 misspecification (DFI)
  • Local indices indicate: (a) high correlation between Physical and Psychological factors (suggesting potential hierarchical structure);
    1. low factor loadings for Q4 and Q24; (c) item Q5 is systematically problematic — it appears in many high MIs and contributes to local misfit
  • Reliability is adequate for all domains

5.4 Path Diagrams

The path diagram below visualize the estimated model. Latent factors are shown as ovals, observed items as rectangles, and standardized factor loadings as edge labels. The circle layout better displays interfactor correlations.

NoteLayout Note

Consider this layout for educational purposes, as for your article you should create something better formatted, which will almost always be done by hand.

Figure 2: Path diagram of the 4-factor CFA model (circle layout)

The circle layout in Figure 2 reveals the strong correlations among all four factors, particularly between Psychological and Physical domains. This pattern motivates the exploration of bifactor and second-order models in the following sections.

6 4-Factor Bifactor Model

Given the high interfactor correlations observed in the 4-factor model, a bifactor structure is a natural alternative. In a bifactor model, a general factor (here, overall Quality of Life — QOL) loads directly on all items alongside specific group factors for each domain. The critical assumption is that all factors — general and specific — are orthogonal (uncorrelated).

This model tests whether there is a meaningful general QOL construct underlying all items, and how much unique variance remains in each specific domain after accounting for the general factor. Item Q5 is excluded from this model due to convergence problems, which further supports the preliminary evidence that Q5 is problematic.

NoteOn Bifactor Models

Bifactor models have gained popularity in psychometrics because they often provide better fit indices than correlated-factor models. However, as discussed in Rogers (2024), some authors argue that this improved fit is partly an artifact of the greater number of parameters, and that the orthogonality assumption rarely holds in practice. These considerations will be important when comparing models.

6.1 Model Specification

In bifactor model syntax, the general factor must be specified first. The Q3–Q4 residual correlation is no longer needed because both items now load on the general factor, which absorbs their shared variance.

6.2 Model Estimation

The orthogonal = TRUE argument is required for bifactor models, enforcing zero correlations between all factors.

6.3 Model Evaluation

Table 8: Fit indices for the bifactor 4-factor CFA model
χ² df p CFI TLI RMSEA RMSEA 90% CI SRMR
1212.68 207 0 0.958 0.949 0.068 [0.064, 0.072] 0.049
Table 9: Standardized factor loadings for the bifactor 4-factor model
Factor Item Std.Loading SE p
QOL Q6_P 0.632 0.022 < .001
QOL Q7_P 0.668 0.019 < .001
QOL Q11_P 0.666 0.020 < .001
QOL Q19_P 0.851 0.012 < .001
QOL Q26_P 0.634 0.022 < .001
QOL Q3_F 0.422 0.030 < .001
QOL Q4_F 0.349 0.031 < .001
QOL Q10_F 0.785 0.014 < .001
QOL Q15_F 0.584 0.027 < .001
QOL Q16_F 0.573 0.023 < .001
QOL Q17_F 0.875 0.012 < .001
QOL Q18_F 0.842 0.013 < .001
QOL Q20_S 0.598 0.021 < .001
QOL Q21_S 0.452 0.026 < .001
QOL Q22_S 0.541 0.024 < .001
QOL Q8_A 0.372 0.028 < .001
QOL Q9_A 0.454 0.026 < .001
QOL Q12_A 0.499 0.024 < .001
QOL Q13_A 0.461 0.025 < .001
QOL Q14_A 0.527 0.024 < .001
QOL Q23_A 0.380 0.029 < .001
QOL Q24_A 0.246 0.031 < .001
QOL Q25_A 0.328 0.029 < .001
psycho Q6_P 0.696 0.479 0.146
psycho Q7_P 0.059 0.051 0.247
psycho Q11_P 0.021 0.032 0.502
psycho Q19_P 0.085 0.066 0.2
psycho Q26_P 0.135 0.102 0.184
physical Q3_F 0.690 0.042 < .001
physical Q4_F 0.606 0.039 < .001
physical Q10_F 0.116 0.030 < .001
physical Q15_F 0.290 0.037 < .001
physical Q16_F 0.059 0.037 0.107
physical Q17_F 0.268 0.031 < .001
physical Q18_F 0.257 0.031 < .001
social Q20_S 0.566 0.052 < .001
social Q21_S 0.269 0.035 < .001
social Q22_S 0.567 0.052 < .001
environment Q8_A 0.300 0.032 < .001
environment Q9_A 0.386 0.029 < .001
environment Q12_A 0.485 0.026 < .001
environment Q13_A 0.453 0.028 < .001
environment Q14_A 0.441 0.028 < .001
environment Q23_A 0.468 0.029 < .001
environment Q24_A 0.603 0.028 < .001
environment Q25_A 0.518 0.027 < .001

The bifactor model shows improved fit compared to the original 4-factor model, with slightly better CFI, TLI, and RMSEA values. However, the improvement is modest and the fit remains in the mediocre range (Level 2 of DFI).

Examining the standardized loadings, the general QOL factor captures substantial variance from most items. Notably, the Psychological specific factor shows very low loadings — most of the variance that was previously attributed to the Psychological domain is now absorbed by the general factor. This pattern suggests that psychological well-being items are primarily indicators of overall QOL rather than a distinct domain.

Table 10: Top 20 modification indices for the bifactor model
lhs op rhs mi epc
462 Q17_F ~~ Q18_F 155.946 0.215
280 environment =~ Q15_F 114.813 0.300
387 Q3_F ~~ Q4_F 58.434 0.819
283 environment =~ Q18_F 46.170 -0.177
286 environment =~ Q22_S 45.651 0.170
539 Q24_A ~~ Q25_A 44.606 0.231
282 environment =~ Q17_F 42.692 -0.170
166 social ~~ environment 38.239 0.196
269 social =~ Q23_A 36.821 0.232
164 physical ~~ social 34.825 -0.274
162 psycho ~~ social 33.459 0.280
252 social =~ Q6_P 31.214 0.184
526 Q12_A ~~ Q14_A 31.078 0.152
509 Q22_S ~~ Q23_A 30.368 0.165
512 Q8_A ~~ Q9_A 29.681 0.156
391 Q3_F ~~ Q17_F 29.502 -0.244
392 Q3_F ~~ Q18_F 28.622 -0.239
448 Q15_F ~~ Q25_A 26.451 0.154
241 physical =~ Q20_S 23.444 -0.174
408 Q4_F ~~ Q18_F 23.223 -0.198
Figure 3: Heatmap of standardized residual correlations for the bifactor 4-factor model

6.3.1 Reliability

For bifactor models, omega from psych::omegaFromSem() provides the most appropriate reliability decomposition: omega hierarchical (ωH) estimates the proportion of total score variance attributable to the general factor, while omega subscale (ωS) estimates the proportion attributable to each specific factor after controlling for the general factor.

Omega Coefficients - Bifactor Model
Dimension ωT ωH ωS
General Factor 0.942 0.837 0.111
Psychological 0.864 0.797 0.067
Physical 0.926 0.731 0.195
Social 0.758 0.427 0.331
Environmental 0.847 0.376 0.471

The omega hierarchical value indicates that the general QOL factor accounts for a substantial proportion of reliable variance. The Psychological specific factor contributes very little unique reliable variance beyond the general factor, confirming the pattern observed in the loadings. The Physical, Social, and Environment specific factors retain more meaningful unique variance.

6.3.2 Preliminary Evaluation Summary

  • Overall fit has improved but remains mediocre (Level 2 of DFI, although not directly comparable)
  • The general (G) factor is statistically relevant and explains substantial variance across all items
  • The Psychological specific factor loses most of its importance — its variance is almost entirely captured by the general QOL factor
  • The orthogonality assumption required for this model is a strong restriction that may not hold empirically

6.4 Path Diagram

The tree3 layout is specifically designed for bifactor diagrams, placing the general factor centrally and specific factors on the sides.

Figure 4: Path diagram of the bifactor CFA model
NotePath Diagram Layout

You can test other formatting parameters (edge.label.cex, label.cex, edge.label.position, etc) to simulate path diagrams with better layouts.

7 Second-Order CFA Model

An alternative to the bifactor model is the second-order (hierarchical) model. Here, the general QOL factor does not load directly on items. Instead, it loads on the four first-order factors, which in turn load on the observed items. This structure implies that the influence of the general factor on individual items is mediated by the domain-specific factors — a conceptually different interpretation from the bifactor model.

7.1 Model Specification

Item Q5 is excluded for comparability with the bifactor model. A technical requirement is that the Psychological factor variance must be fixed to zero for model convergence (empirical underidentification), suggesting that essentially all Psychological domain variance is explained by the higher-order QOL factor.

WarningConvergence Note

Fixing the Psychological factor variance to zero (psycho ~~ 0*psycho) is necessary for model convergence. This is not merely a technical trick — it has substantive meaning: nearly all variance in the Psychological domain is captured by the general QOL factor. This finding is consistent with the bifactor results, where the Psychological specific factor had minimal importance.

7.2 Model Estimation

Note that std.lv is not used here because the second-order structure imposes its own identification constraints through the higher-order factor loadings.

7.3 Model Evaluation

The second-order model fit is comparable to the other models, though the convergence issue with the Psychological domain is a notable limitation. The second-order loadings (QOL → first-order factors) reveal the relative contribution of each domain to overall quality of life.

Table 11: Fit indices for the 2nd order 4-factor CFA model
χ² df p CFI TLI RMSEA RMSEA 90% CI SRMR
1285.17 226 0 0.956 0.951 0.067 [0.063, 0.071] 0.054
Table 12: Standardized factor loadings for the 2nd order 4-factor model
Factor Item Std.Loading SE p
psycho Q6_P 0.660 0.019 < .001
psycho Q7_P 0.678 0.018 < .001
psycho Q11_P 0.671 0.020 < .001
psycho Q19_P 0.872 0.012 < .001
psycho Q26_P 0.648 0.020 < .001
physical Q3_F 0.509 0.026 < .001
physical Q4_F 0.422 0.029 < .001
physical Q10_F 0.811 0.012 < .001
physical Q15_F 0.642 0.025 < .001
physical Q16_F 0.590 0.023 < .001
physical Q17_F 0.923 0.007 < .001
physical Q18_F 0.888 0.009 < .001
social Q20_S 0.818 0.019 < .001
social Q21_S 0.588 0.027 < .001
social Q22_S 0.747 0.020 < .001
environment Q8_A 0.510 0.028 < .001
environment Q9_A 0.630 0.025 < .001
environment Q12_A 0.715 0.021 < .001
environment Q13_A 0.663 0.023 < .001
environment Q14_A 0.731 0.021 < .001
environment Q23_A 0.579 0.027 < .001
environment Q24_A 0.492 0.028 < .001
environment Q25_A 0.541 0.027 < .001
QOL psycho 1.000 0.000 NA
QOL physical 0.912 0.010 < .001
QOL social 0.740 0.020 < .001
QOL environment 0.685 0.021 < .001
Table 13: Top 20 modification indices for the 2nd order model
lhs op rhs mi epc
533 Q24_A ~~ Q25_A 115.469 0.274
252 environment =~ Q15_F 94.778 0.673
456 Q17_F ~~ Q18_F 82.991 0.156
206 psycho =~ Q24_A 74.190 -0.449
280 QOL =~ Q24_A 74.190 -0.449
222 physical =~ Q24_A 72.314 -0.523
192 psycho =~ Q10_F 54.814 1.152
266 QOL =~ Q10_F 54.814 1.152
255 environment =~ Q18_F 36.927 -0.394
254 environment =~ Q17_F 36.682 -0.395
417 Q10_F ~~ Q18_F 32.049 -0.119
383 Q3_F ~~ Q15_F 31.584 0.179
195 psycho =~ Q17_F 28.227 -0.929
269 QOL =~ Q17_F 28.227 -0.929
242 social =~ Q24_A 27.741 -0.206
503 Q22_S ~~ Q23_A 24.588 0.147
532 Q23_A ~~ Q25_A 24.194 0.141
208 physical =~ Q6_P 22.864 -1.054
215 physical =~ Q22_S 22.662 -0.599
438 Q15_F ~~ Q13_A 22.272 0.148
Figure 5: Heatmap of standardized residual correlations for the 2nd order 4-factor model

7.3.1 Reliability

For the first-order factors, we use semTools::reliability(). For the second-order QOL factor, semTools::reliabilityL2() provides the appropriate decomposition.

Table 14: Reliability coefficients for first-order factors (second-order model)
psycho physical social environment
alpha 0.793 0.823 0.687 0.787
alpha.ord 0.831 0.863 0.741 0.820
omega 0.794 0.801 0.708 0.795
omega2 0.794 0.801 0.708 0.795
omega3 0.794 0.810 0.725 0.802
avevar 0.505 0.499 0.525 0.377
Table 15: Second-order factor (QOL) reliability
omegaL1 omegaL2 partialOmegaL1
0.838 0.899 0.926

The second-order QOL factor shows adequate reliability, indicating that the four first-order domains collectively serve as reliable indicators of overall quality of life. First-order reliability values remain consistent with those observed in the standard 4-factor model.

7.4 Path Diagram

Figure 6: Path diagram of the second-order CFA model

8 4-Factor Model Without Item Q5

The analyses above converged on a consistent finding: item Q5 is problematic. It appeared in multiple high modification indices in the original 4-factor model, caused convergence problems in the bifactor model, and its removal was necessary for comparability across alternative models. This is a common scenario in applied CFA — items that appear sound from a content validity perspective may not perform well empirically in a given sample or cultural context.

8.1 Model Specification

This section presents the revised 4-factor model excluding Q5, retaining the correlated residuals between Q3 and Q4. This model will serve as the basis for the final model comparison and post hoc power analysis.

8.2 Model Estimation

8.3 Model Evaluation

Table 16: Fit indices for the 4-factor CFA model (Q5 excluded)
χ² df p CFI TLI RMSEA RMSEA 90% CI SRMR
1285.46 223 0 0.956 0.95 0.067 [0.064, 0.071] 0.053
Table 17: Standardized factor loadings for the 4-factor model (Q5 excluded)
Factor Item Std.Loading SE p
psycho Q6_P 0.659 0.019 < .001
psycho Q7_P 0.677 0.018 < .001
psycho Q11_P 0.671 0.019 < .001
psycho Q19_P 0.869 0.012 < .001
psycho Q26_P 0.648 0.020 < .001
physical Q3_F 0.510 0.026 < .001
physical Q4_F 0.423 0.029 < .001
physical Q10_F 0.811 0.012 < .001
physical Q15_F 0.643 0.025 < .001
physical Q16_F 0.590 0.023 < .001
physical Q17_F 0.923 0.007 < .001
physical Q18_F 0.888 0.009 < .001
social Q20_S 0.819 0.019 < .001
social Q21_S 0.586 0.027 < .001
social Q22_S 0.748 0.020 < .001
environment Q8_A 0.510 0.028 < .001
environment Q9_A 0.629 0.025 < .001
environment Q12_A 0.715 0.021 < .001
environment Q13_A 0.663 0.023 < .001
environment Q14_A 0.732 0.021 < .001
environment Q23_A 0.580 0.027 < .001
environment Q24_A 0.492 0.028 < .001
environment Q25_A 0.542 0.027 < .001
Table 18: Factor correlations for the 4-factor model (Q5 excluded)
Factor 1 Factor 2 r SE p
psycho physical 0.922 0.010 < .001
psycho social 0.755 0.019 < .001
psycho environment 0.659 0.024 < .001
physical social 0.638 0.024 < .001
physical environment 0.630 0.022 < .001
social environment 0.550 0.028 < .001

After removing Q5, the overall fit shows improvement. All remaining items exhibit adequate standardized loadings, and the model converges without issues. The Q5 removal resolves the most prominent source of local misfit identified in the original 4-factor model.

Table 19: Top 20 modification indices for the 4-factor model (Q5 excluded)
lhs op rhs mi epc
510 Q24_A ~~ Q25_A 114.934 0.273
252 environment =~ Q15_F 94.407 0.351
433 Q17_F ~~ Q18_F 84.659 0.157
222 physical =~ Q24_A 78.505 -0.281
206 psycho =~ Q24_A 77.213 -0.289
254 environment =~ Q17_F 43.449 -0.232
255 environment =~ Q18_F 42.877 -0.228
192 psycho =~ Q10_F 42.388 0.704
208 physical =~ Q6_P 36.087 -0.700
242 social =~ Q24_A 33.243 -0.203
394 Q10_F ~~ Q18_F 31.926 -0.118
360 Q3_F ~~ Q15_F 31.211 0.178
214 physical =~ Q21_S 31.008 0.235
198 psycho =~ Q21_S 26.547 0.277
509 Q23_A ~~ Q25_A 23.889 0.140
415 Q15_F ~~ Q13_A 21.582 0.146
376 Q4_F ~~ Q15_F 21.431 0.147
416 Q15_F ~~ Q14_A 20.568 0.143
480 Q22_S ~~ Q23_A 18.885 0.131
224 social =~ Q6_P 18.702 0.220

The modification indices are notably reduced compared to the original model. While some MIs remain above 3.84, they are smaller in magnitude and do not show the systematic pattern of misfit associated with Q5.

Figure 7: Heatmap of standardized residual correlations for the 4-factor model (Q5 excluded)
Table 20: Reliability coefficients for the revised 4-factor model
psycho physical social environment
alpha 0.793 0.823 0.687 0.787
alpha.ord 0.831 0.863 0.741 0.820
omega 0.793 0.801 0.708 0.795
omega2 0.793 0.801 0.708 0.795
omega3 0.792 0.811 0.725 0.802
avevar 0.504 0.500 0.524 0.377

Reliability for the Psychological domain remains adequate even after removing Q5, confirming that the item was not essential for capturing the construct. The remaining five items (Q6, Q7, Q11, Q19, Q26) provide a reliable measurement of psychological well-being. All other domains maintain their previous reliability levels.

8.4 Path Diagram

Figure 8: Path diagram of the revised 4-factor CFA model (without Q5)

9 Model Comparison

We now systematically compare the four models on key fit indices: SRMR, RMSEA (scaled), and CFI (scaled). These three indices capture complementary aspects of model fit — SRMR assesses the average discrepancy in correlations, RMSEA penalizes for model complexity, and CFI compares the model against a baseline independence model.

Table 21: Fit indices comparison across all CFA models
Model SRMR RMSEA CFI
4-Factor (all items) 0.057 0.076 0.941
Bifactor 0.049 0.068 0.958
Second-Order 0.054 0.067 0.956
4-Factor (no Q5) 0.053 0.067 0.956

Table 21 shows that the four models exhibit a similar level of misspecification, although the original 4-factor model (with all items) performs slightly worse. The bifactor model shows a marginally better fit, which aligns with findings in the literature — this is likely due to the greater number of free parameters in bifactor structures, which mechanically tend to improve fit indices. As noted by Rogers (2024), some authors argue that bifactor models have been used to “salvage” traditional scales that are no longer sustainable under modern estimation and item selection techniques.

ImportantModel Selection Recommendation

A known limitation of bifactor models is the assumption that specific factors are uncorrelated — an assumption that frequently does not hold in practice. Additionally, bifactor models can suffer from empirical underidentification and may capitalize on sample-specific variance.

The second-order model also showed convergence problems, requiring the Psychological factor variance to be fixed to zero.

Based on empirical evidence and practical considerations, the four correlated factor model excluding item Q5 is recommended for measuring WHOQOL-BREF in this sample. This model:

  • Has acceptable overall and local fit
  • Avoids the strong orthogonality assumption of bifactor models
  • Is easier to interpret and communicate to non-technical audiences
  • Has no convergence issues
  • Maintains adequate reliability (omega > .70) across all domains
  • Is the most parsimonious adequate representation of the data

10 Post Hoc Power Analysis

The final step in our CFA workflow is a post hoc power analysis assessing whether the sample size (N = 1047) provides adequate statistical power for parameter estimation and misfit detection. We use Monte Carlo simulation via the simsem package: data are repeatedly generated from the fitted model and re-estimated, yielding empirical distributions of parameter estimates and fit indices.

NoteWhy Post Hoc Power Analysis?

While a priori power analysis is ideal for study planning, post hoc simulation-based power analysis provides essential information about: (1) parameter stability — whether estimates are consistent across replications; (2) power to detect non-zero parameters — whether loadings and correlations are reliably different from zero; (3) coverage — whether 95% confidence intervals perform as expected; and (4) bias — whether estimates and standard errors are unbiased.

We use the revised four-factor model (without Q5) as both the generating and analysis model. By generating data from the fitted lavaan object, we preserve the ordinal nature of the items and the estimated polychoric structure.

10.1 Define Cutoff Values

10.2 Run Simulation

Warning

To reduce computational time, we run 100 replications. For final published results, 1,000–5,000 replications are recommended for stable estimates. More replications yield narrower confidence intervals around power estimates but increase computation time proportionally.

10.3 Fit Index Cutoffs

Simulation-based cutoffs at α = .05 provide model-specific and sample-specific alternatives to generic rules of thumb. These cutoffs represent the 95th (or 5th, for CFI) percentile of fit index distributions under the assumption that the model is correctly specified.

Table 22: Simulation-based fit index cutoffs at alpha = 0.05
95%
srmr 0.0238
rmsea.scaled 0.0122
cfi.scaled 1.0000

10.4 Power to Detect Misfit

Table 23 shows the proportion of replications where fit indices exceeded rule-of-thumb cutoffs. Low values (near 0) indicate that the correctly specified model consistently shows good fit — which is the desired result.

Table 23: Proportion of replications indicating worse fit than rule-of-thumb cutoffs
cfi.scaled rmsea.scaled srmr
1 0 0
TipInterpreting Power to Detect Misfit

Low proportions in Table 23 mean that the model fits well in most replications — we have low power to reject it when it is correctly specified, which is exactly what we want. If these proportions were high, it would suggest that even when the model is correct, our sample size causes the fit indices to exceed cutoffs (i.e., the cutoffs are too strict for our model and N).

10.5 Sampling Distributions of Fit Indices

Figure 9 shows the empirical distributions of SRMR, RMSEA, and CFI across 100 replications, with rule-of-thumb cutoffs overlaid as vertical lines. The distributions should be concentrated well below (SRMR, RMSEA) or above (CFI) the cutoffs.

Figure 9: Sampling distributions of fit indices — Power of rejecting misspecified models

Figure 10 presents the same distributions with simulation-based cutoffs at α = .05. The shaded areas represent the rejection regions.

Figure 10: Sampling distributions of fit indices with simulation-based cutoffs at alpha = 0.05

10.6 Detailed Parameter Summary

The comprehensive parameter summary below reports, for each model parameter: coverage (proportion of 95% CIs containing the true value), power (proportion rejecting H₀: parameter = 0), relative bias in estimates, and relative bias in standard errors.

To evaluate other parameters, such as the population value (true), mean estimate, standard deviation of estimates, mean standard error, etc., consider the command below:

#| label: param-summary
summaryParam(my_modelci, detail = TRUE, digits = 3)
Table 24: Simulation Performance Statistics (n = 1,047, replications = 100)
Parameter Power Coverage Bias % SE Bias %
psycho=~Q6_P 1 0.96 0.004 0.060
psycho=~Q7_P 1 0.92 0.001 -0.067
psycho=~Q11_P 1 0.94 0.004 0.062
psycho=~Q19_P 1 0.93 0.005 -0.019
psycho=~Q26_P 1 0.96 0.006 0.125
physical=~Q3_F 1 0.98 0.002 -0.022
physical=~Q4_F 1 0.97 0.000 0.044
physical=~Q10_F 1 0.91 0.004 -0.066
physical=~Q15_F 1 0.92 -0.001 -0.076
physical=~Q16_F 1 0.92 0.003 -0.091
physical=~Q17_F 1 0.95 -0.001 0.011
physical=~Q18_F 1 0.93 0.001 -0.029
social=~Q20_S 1 0.94 0.005 0.029
social=~Q21_S 1 0.96 -0.006 0.070
social=~Q22_S 1 0.96 0.003 -0.032
environment=~Q8_A 1 0.95 -0.006 -0.026
environment=~Q9_A 1 0.98 -0.001 0.116
environment=~Q12_A 1 0.96 0.003 0.013
environment=~Q13_A 1 0.97 0.000 -0.011
environment=~Q14_A 1 0.93 -0.003 -0.040
environment=~Q23_A 1 0.97 -0.005 0.173
environment=~Q24_A 1 0.97 0.001 0.067
environment=~Q25_A 1 0.99 0.002 0.082
Q3_F~~Q4_F 1 0.97 -0.001 0.110
psycho~~physical 1 0.95 -0.004 -0.579
psycho~~social 1 0.95 -0.003 -0.093
psycho~~environment 1 0.94 0.005 0.056
physical~~social 1 0.98 -0.001 0.046
physical~~environment 1 0.88 0.009 -0.121
social~~environment 1 0.92 0.006 -0.124

Only few threshold parameters show power below .80 (see full report), and thresholds are generally not parameters of interest in CFA. All factor loadings and factor correlations are estimated with adequate power, indicating that the sample size is sufficient for reliable parameter estimation.

TipHow to Read the Parameter Summary

Key benchmarks for evaluating simulation quality (Rogers, 2024):

  • Power > .80: Adequate power to detect non-zero parameters
  • Coverage 0.91–0.98: Confidence intervals perform as expected (neither too liberal nor too conservative)
  • Relative Bias < |0.10|: Parameter estimates are unbiased (within 10% of true values)
  • Relative SE Bias < |0.10|: Standard errors accurately estimate sampling variability

Parameters outside these ranges deserve attention. However, thresholds are generally not parameters of interest in CFA. The focus should be on factor loadings, factor correlations, and residual variances.

The simulation results confirm that the sample size of N = 1047 provides adequate power for estimating all parameters of substantive interest. Factor loadings and correlations are estimated with power > .80, coverage near .95, and negligible bias. This supports the reliability and generalizability of the CFA results presented throughout this tutorial.

11 Summary and Conclusions

This supplementary material demonstrated a complete CFA workflow for ordinal data using the WHOQOL-BREF instrument, implementing the best practices described in Rogers (2024):

  1. Data preparation (Section 3): Loaded data directly from Mendeley Data repository, verified structure and descriptive statistics
  2. Population model (Section 4): Defined a teaching example based on Lin & Yao (2022) with analytically derived residual variances and equidistant thresholds
  3. Model specification and estimation: Tested four competing structures:
    • Original 4-factor model with all 24 items (Section 5)
    • Bifactor model with general QOL factor (Section 6)
    • Second-order hierarchical model (Section 7)
    • Revised 4-factor model excluding problematic item Q5 (Section 8)
  4. Model evaluation: Applied DWLS estimation appropriate for ordinal data, examining overall fit (SRMR, RMSEA, CFI), local fit (MI, residuals), reliability (alpha, omega, AVE), and path diagrams
  5. Model comparison (Section 9): Systematically compared all models, recommending the revised 4-factor model
  6. Power analysis (Section 10): Assessed statistical adequacy via Monte Carlo simulation
ImportantKey Findings
  1. Always use appropriate estimators for ordinal data: DWLS/WLSMV operates on polychoric correlations and does not assume normality of observed variables — using ML with ordinal Likert data leads to biased estimates
  2. Item Q5 is systematically problematic across multiple model structures and should be excluded
  3. The revised 4-factor correlated model (without Q5) provides the best balance of fit, interpretability, and statistical properties
  4. Bifactor models improve fit mechanically but impose strong orthogonality assumptions that rarely hold; they should not be preferred solely on the basis of fit indices
  5. All four domains show adequate reliability (omega > .70) in the recommended model
  6. Sample size N = 1047 provides adequate power for all parameters of substantive interest
  7. Examine both overall and local fit — global indices can mask specific, actionable problems
  8. Modification indices should guide, not dictate: Only make theoretically justified changes, and be alert to items that appear repeatedly in high MIs

For the full theoretical and methodological framework underlying these analyses, including parallel JASP implementations, see Rogers (2024), OSF repository, and Github Repository

References

Lin, L.-C., & Yao, G. (2022). Validation of the factor structure of the WHOQOL-BREF using meta-analysis of exploration factor analysis and social network analysis. Psychological Assessment, 34(7), 660–670. https://doi.org/10.1037/pas0001122
McNeish, D. (2023). Dynamic Fit Index Cutoffs for Factor Analysis with Likert, Ordinal, or Binary Responses. PsyArXiv Preprints. https://doi.org/10.31234/osf.io/tp35s
Rogers, P. (2021). Data for "Best Practices for Your Exploratory Factor Analysis: A Factor Tutorial" [dataset]. In RAC-Revista de Administração Contemporânea. Mendeley Data, V2. https://doi.org/10.17632/rdky78bk8r.2
Rogers, P. (2022). Best Practices for Your Exploratory Factor Analysis: A Factor Tutorial. Revista de Administração Contemporânea, 26(6). https://doi.org/10.1590/1982-7849rac2022210085.en
Rogers, P. (2024). Best practices for your confirmatory factor analysis: A JASP and lavaan tutorial. Behavior Research Methods. https://doi.org/10.3758/s13428-024-02375-7