
R for Psychologists (BSc and MSc) at LMU Munich

7.2 Structural equation models

7.2.1 Loading and installing the required packages and the data set

As always, we first set the working directory and load the required data set GESIS.csv into RStudio.
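A minimal setup sketch. It assumes the lavaan and semPlot packages are used (cfa() and semPaths() appear later in this section) and that GESIS.csv is a comma-separated file in the working directory. The folder path and the object name gesis are our own placeholders.

```r
# Install the packages once if they are missing:
# install.packages(c("lavaan", "semPlot"))
library(lavaan)   # model specification and fitting, e.g. cfa()
library(semPlot)  # path diagrams, e.g. semPaths()

# Set the working directory (hypothetical path, adjust to your folder):
# setwd("path/to/your/folder")

# Read the data only if the file is present. A comma-separated file is
# assumed; for a semicolon-separated file use read.csv2() instead.
if (file.exists("GESIS.csv")) {
  gesis <- read.csv("GESIS.csv")
  # The PANAS items are said to be in columns 32 to 51
  print(names(gesis)[32:51])
}
```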

7.2.2 Formulating structural assumptions I: PANAS

Theory of the Positive and Negative Affect Schedule (PANAS)

We want to set up a structural equation model for the Positive and Negative Affect Schedule (PANAS). The structural equation model is intended to be a formal representation of existing theoretical assumptions. In contrast to exploratory factor analysis, a structural equation model requires specific hypotheses about the structure of the data. We can derive these from the research literature, for example:

“The Positive and Negative Affect Schedule (PANAS) is a 20-item self-report measure of positive and negative affect developed by Watson, Clark, and Tellegen (1988b). NA and PA reflect dispositional dimensions, with high NA epitomized by subjective distress and unpleasurable engagement, and low NA by the absence of these feelings. By contrast, PA represents the extent to which an individual experiences pleasurable engagement with the environment. Thus, emotions such as enthusiasm and alertness are indicative of high PA, whilst lethargy and sadness characterize low PA (Watson & Clark, 1984).” (Crawford & Henry, 2004)

“The most controversial characteristic of the PANAS is the purported independence of its subscales. It has been argued that it is counter-intuitive to regard happiness and sadness as unrelated constructs (Costa & McCrae, 1980) and indeed all measures of PA and NA developed prior to the PANAS have proven at least moderately negatively correlated.” (Crawford & Henry, 2004)

The PANAS thus comprises 20 items and is assumed to measure two factors: positive affect (PA) and negative affect (NA). Each latent construct is measured by 10 items, so it is a two-factor model with 10 indicators per factor. In our model we initially assume (as the original theory did) that the two subscales are orthogonal, i.e. independent of one another.

Transferring the assumptions into a formalized graphical representation

In a first step, we translate our theoretical assumptions about the structure of our data into a formal causal model. The formalized representation should include both the measurement model (assumptions about the link between the manifest variables, i.e. the items, and the latent constructs) and the structural model (assumptions about the relations among the latent constructs). In addition, it should distinguish between latent and manifest variables, directed and undirected paths, and endogenous and exogenous variables. For this we use the Greek designations of the model parameters: \(\lambda\) (lambda) = loadings of the items, \(\gamma\) (gamma) = loadings (path coefficients) of latent variables, \(\epsilon\) (epsilon) = error variables of the manifest variables, \(\zeta\) (zeta) = error variables of the latent dependent variables, \(\xi\) (xi) = latent independent variables, \(\eta\) (eta) = latent dependent variables, and X = indicators of the latent variables.

The orthogonality (i.e. independence) of the two latent factors can be seen from the absence of a covariance (double-headed arrow) between the two ellipses. The illustration uses "unit loading identification", which can be recognized by the dashed path to the first indicator variable of each factor.

Formalization in R

In the next step we want to transfer the model we have created to R. To do this, we first have to look at the data set to find out how our manifest variables X1 to X20 are named. The corresponding PANAS items are in columns 32 to 51 of our data set. As we can see, each item is assigned to one of the two factors, PA or NA.

In our formalized representation, we assign every manifest variable X1 to X20 to one of our latent constructs on the basis of the theory: all negative items to one latent construct and all positive items to the other. The latent constructs can be named arbitrarily; for the readability of the code it is worth choosing names that are as meaningful as possible. (Note: NA is a reserved word in R that stands for "not available", i.e. a missing value. Even though short, succinct factor names are convenient, NA cannot be used as the name of the negative affect factor, because reserved words cannot be used as variable names.)

Model specification in lavaan syntax
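A sketch of how the model string panas1 could be assembled. It assumes that the PANAS item names in columns 32 to 51 start with the prefixes PA_ and NA_ (as NA_bekuemmert does); if the data frame gesis has not been loaded, hypothetical placeholder names are used instead. The factor names positive_affect and negative_affect are our own choice.

```r
# PANAS item names: taken from columns 32 to 51 if the data frame `gesis`
# is loaded; otherwise hypothetical placeholders with the assumed
# prefixes "PA_" and "NA_"
if (exists("gesis")) {
  panas_items <- names(gesis)[32:51]
} else {
  panas_items <- c(paste0("PA_", 1:10), paste0("NA_", 1:10))
}

pa_items <- grep("^PA_", panas_items, value = TRUE)
na_items <- grep("^NA_", panas_items, value = TRUE)

# lavaan syntax: "=~" reads "is measured by". NA itself is avoided as a
# factor name because it is a reserved word in R.
panas1 <- paste0(
  "positive_affect =~ ", paste(pa_items, collapse = " + "), "\n",
  "negative_affect =~ ", paste(na_items, collapse = " + ")
)
cat(panas1)
```

Building the string from the column names avoids typing twenty item names by hand. Writing the string out literally is equally valid lavaan syntax.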

We can now specify the model we have set up in R, namely in lavaan syntax. To do this, we save a text object (string) containing the structural information in a new variable panas1. Latent variables are defined with the operator =~, covariances with the operator ~~, and regressions with the operator ~. We can give the latent constructs any name (preferably a meaningful one). The indicators must be referred to by their variable names in our data set.

Visualization

Once the model structure is specified in the string variable, we can also display it graphically in R. To do this, we first have to fit the model we just specified with the cfa() function. With an additional argument we specify that the latent variables in our model are independent of each other, so no covariances between the factors are estimated (unless they appear explicitly in our model specification).
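The following sketch fits and plots an orthogonal two-factor model. To keep it self-contained, it simulates data with lavaan's simulateData() from a small made-up population model (three hypothetical items per factor, PA_1 to NA_3). It assumes the orthogonality constraint corresponds to cfa()'s orthogonal = TRUE argument and that semPaths()'s exoCov = FALSE is what suppresses the factor covariance in the figure. Both are our reading of the arguments the text refers to.

```r
library(lavaan)
library(semPlot)

# Made-up population model: two uncorrelated factors with three
# hypothetical items each; residual variances chosen so that every
# item has variance 1. Used only to simulate example data.
pop_model <- "
  pa =~ 0.7*PA_1 + 0.6*PA_2 + 0.8*PA_3
  na =~ 0.7*NA_1 + 0.5*NA_2 + 0.6*NA_3
  pa ~~ 1*pa
  na ~~ 1*na
  pa ~~ 0*na
  PA_1 ~~ 0.51*PA_1
  PA_2 ~~ 0.64*PA_2
  PA_3 ~~ 0.36*PA_3
  NA_1 ~~ 0.51*NA_1
  NA_2 ~~ 0.75*NA_2
  NA_3 ~~ 0.64*NA_3
"
set.seed(123)
sim_data <- simulateData(pop_model, sample.nobs = 500)

# Analysis model: only the loading structure is specified
panas_small <- "
  positive_affect =~ PA_1 + PA_2 + PA_3
  negative_affect =~ NA_1 + NA_2 + NA_3
"

# orthogonal = TRUE: the covariance between the factors is not estimated
fit_small <- cfa(panas_small, data = sim_data, orthogonal = TRUE)

# exoCov = FALSE: no covariance arcs between exogenous variables are drawn
semPaths(fit_small, whatLabels = "est", exoCov = FALSE)
```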

With the semPaths() function we can now display the fitted model graphically. An additional argument is needed to draw the factors in the figure as orthogonal (i.e. without a covariance).

Exercise 1: Big Five

Columns 54 to 63 of the data set contain the items of a Big Five short scale, which measures personality in the dimensions neuroticism, extraversion, openness, agreeableness and conscientiousness with two items each. Set up a structural equation model for the personality dimensions measured here. Assume that the five dimensions are orthogonal, that is, independent of each other.

(a) Draw the complete structural equation model. When identifying the manifest variables, use the correct variable names from the data set and name the latent factors sensibly. Also, mark all fixed parameters in the model.

(b) Specify the model structure in the lavaan syntax and save the result in the variable bigfive_model.

(c) Fit the model bigfive_model with the cfa() function and store the result in the variable bigfive_fit. Then display the fitted model graphically using the semPaths() function. (Note: When fitting the model we get the warning "Could not compute standard errors! The information matrix could not be inverted. This may be a symptom that the model is not identified". We ignore this for the moment and will deal with it in a later section. Here, fitting is only needed to obtain the graphical representation.)

7.2.3 Formulating structural assumptions II: Big One

Formulating structural assumptions

In our previous model we assumed that the five personality factors are independent of each other. However, studies indicate that there are higher-order structures above the five dimensions.

“Thus, the accumulating evidence strongly suggests the existence of higher-order dimensions in the Big Five personality domain. According to this hypothesis, the higher-order dimensions are organized across two levels. First higher-order level consists of two meta-traits (Big Two), Stability and Plasticity (or Alpha and Beta in Digman's terms), while the second, highest-order level comprises general factor of personality (GFP or The Big One).” (Musek, 2007)
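Musek's hierarchy maps directly onto lavaan syntax, because a latent variable may itself appear on the right-hand side of =~. The sketch below writes out the three-level model described in this section: Stability (neuroticism, agreeableness, conscientiousness), Plasticity (extraversion, openness), and a general factor above both. Only O_artistic and O_fantasy are item names from the text; N_1, E_1 and the other item names are hypothetical. Instead of fitting the model, the sketch uses lavaanify() to inspect which loadings lavaan would fix.

```r
library(lavaan)

# Higher-order Big Five model. O_artistic and O_fantasy come from the
# text; all other item names are hypothetical placeholders.
bigone_model <- "
  # first order: personality dimensions, two items each
  neuroticism       =~ N_1 + N_2
  extraversion      =~ E_1 + E_2
  openness          =~ O_artistic + O_fantasy
  agreeableness     =~ A_1 + A_2
  conscientiousness =~ C_1 + C_2

  # second order: meta-traits (Big Two)
  stability  =~ neuroticism + agreeableness + conscientiousness
  plasticity =~ extraversion + openness

  # third order: general factor of personality (Big One)
  general =~ stability + plasticity
"

# Parameter table without data: auto.fix.first fixes the first loading of
# every factor to 1, auto.var adds (residual) variances for all variables
pt <- lavaanify(bigone_model, auto.fix.first = TRUE, auto.var = TRUE)
print(pt[pt$op == "=~", c("lhs", "op", "rhs", "free")])  # free = 0: fixed
```

The endogenous first- and second-order factors receive residual variances in the parameter table. These correspond to the error terms described below.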

In an alternative model, we now assume a superordinate two-factor structure: a factor Stability, on which neuroticism, agreeableness and conscientiousness load, and a factor Plasticity, on which extraversion and openness load. These two factors are in turn supposed to load on a general factor of personality.

Transferring the assumptions into a formalized graphical representation

The previously exogenous latent variables neuroticism, extraversion, openness, agreeableness and conscientiousness have now become endogenous latent variables in our new model, because their variance is explained by the superordinate factors. They therefore now have an error term, since their variance is not predicted 100% by the respective superordinate factor. The same applies to the superordinate factors Stability and Plasticity, whose variance is explained by the general factor (an exogenous variable). In each case, the path to the first indicator is shown as a dashed line (i.e. fixed at 1).

Formalization in lavaan syntax and visualization

Exercise 2: PANAS

You have the following information about the factor structure of the PANAS:

“The structure of PANAS, as proposed by the authors, is bi-dimensional, as PA and NA are separate and highly, but not absolutely, independent dimensions (Tellegen et al., 1999; Watson & Clark, 1997). Several studies supported this two-factor structure. [...] Alternative three-factor structures with good fit indices have also been proposed and tested by Gaudreau, Sanchez, and Blondin (2006), Killgore (2000) and Mehrabian (1997). Mehrabian (1997) tested a model where PA was maintained as one factor and NA was divided into two conceptually meaningful factors: Afraid (scared, nervous, afraid, guilty, ashamed, and jittery) and Upset (distressed, irritated, hostile, and upset).” (Galinha, Pereira & Esteves, 2013)

(a) Draw the alternative three-factor model by Mehrabian (1997). When identifying the manifest variables, use the correct variable names from the data set and name the latent factors sensibly. Also, mark all fixed parameters in the model.

(b) Mark the measurement model and the structural model in your illustration. For every element of your model, indicate whether it is a manifest or latent, endogenous or exogenous variable.

(c) Create the drawn model in R and display it graphically. Optional: You can change the parameters layout (tree, circle, spring, tree2, circle2) and rotation (1, 2, 3, 4) to find a clear representation of your model.

7.2.4 Identifiability of structural equation models

To begin with, we look at a very simple structural equation model: the openness facet of the Big Five short scale used above. The model contains only two manifest variables, the items O_artistic and O_fantasy, and the latent construct openness.

When fitting the model, we receive a warning that the model could not be fully estimated and a message that the model may not be identified.

We can display the model graphically and check its identifiability.

We only have two manifest variables in our model, which gives three known parameters: the variances of the two manifest variables and their covariance. We can also use the formula from the lecture: with \(p\) manifest variables there are \(p(p+1)/2\) known parameters, here \(2 \cdot 3 / 2 = 3\).

However, we have four parameters to estimate: the two error variances of the manifest variables, one non-fixed loading, and the variance of the latent exogenous variable openness.
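The parameter count for the openness model is plain arithmetic. Here is a base-R sketch of it (the helper n_known() is our own), which also takes the difference between known and unknown parameters:

```r
# Known parameters: variances and covariances of p manifest variables
n_known <- function(p) p * (p + 1) / 2

p <- 2                  # O_artistic and O_fantasy
known <- n_known(p)     # 2 variances + 1 covariance = 3
# Unknown: 2 error variances + 1 free loading + 1 factor variance
unknown <- 2 + 1 + 1
df <- known - unknown   # negative: the model is under-identified
df
```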

The difference between the number of known and unknown parameters gives the degrees of freedom of our model test. Here the degrees of freedom are below 0 (3 known parameters - 4 unknown parameters = -1). The model is therefore under-identified and cannot be estimated.

Exercise 3: Determine the number of parameters and degrees of freedom

A three-factor personality model is to be tested that consists only of the three dimensions neuroticism, extraversion and openness. The three dimensions are intended to load on a superordinate general factor of personality. In addition, two method factors are included, which are assumed to explain variance in the manifest variables beyond the three personality dimensions. All negatively polarized items load on method factor 1. All items that address two content aspects load on method factor 2. The graphical model looks like this:

(a) How many known parameters are there in the model?

(b) How many unknown parameters are there in the model?

(c) Calculate the degrees of freedom and make a statement about the identifiability of the model.

Identifiability of the PANAS

In the previous section we looked at the two-factor, orthogonal model for the PANAS. We can now calculate the degrees of freedom for this model as well:

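Known parameters

With 20 manifest variables there are \(20 \cdot 21 / 2 = 210\) known parameters (20 variances and 190 covariances).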

Unknown parameters

20 error variances of the manifest variables, 18 unfixed loadings, and two variances of the exogenous latent variables. So there are 40 unknown parameters.
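The same counting for the orthogonal PANAS model can be written as a small base-R sketch (n_known() is again our own helper):

```r
n_known <- function(p) p * (p + 1) / 2

p <- 20          # manifest variables (items)
n_factors <- 2   # positive and negative affect

known <- n_known(p)          # 210 variances and covariances
unknown <- p +               # 20 error variances
  (p - n_factors) +          # 18 free loadings (one per factor fixed at 1)
  n_factors                  # 2 factor variances (orthogonal: no covariance)
df <- known - unknown
c(known = known, unknown = unknown, df = df)
```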

Degrees of freedom

The model has 210 - 40 = 170 degrees of freedom. It is therefore over-identified and can be estimated by lavaan.

Metric of the latent variables

Next, we look at the output of the model for the first time, using the summary() command. With the arguments chosen here, no fit indices are shown (we will look at them later), but the standardized solution is displayed in addition.

In the output we find the estimated loadings on the latent variables (section "Latent Variables"), the covariances (section "Covariances") and the (error) variances of the manifest and latent variables (section "Variances"). If a variance estimate has a dot in front of the variable name, it refers to the variance of the error term of that variable. Variable names without a dot refer to the variance of the latent variable itself. Incidentally, long variable names are automatically abbreviated in the output so that it fits better on the screen, so don't be surprised if the names don't exactly match those in the model syntax.

Under Degrees of Freedom we can see that our calculation of the degrees of freedom (df = 170) was correct.

As we can see, the first loading of each latent variable is always fixed at 1. This not only makes the model identifiable but also gives the latent variable a metric: the first loading defines the unit in which all other loadings are expressed. For the latent variable negative affect, the loading of NA_bekuemmert defines the unit of measurement. The loading of NA_angry is 1.006 times as high as the loading of NA_bekuemmert, whereas the loading of NA_guilty is only 0.674 times as high.

Unless specified otherwise, lavaan automatically fixes the loading of the first indicator listed. We can fix a different loading by putting a different manifest variable first in our model specification.

Another possibility is to fix the loading of the desired manifest variable to 1 by writing 1* in front of the indicator. We then also have to free the first loading, which is otherwise fixed at 1 by default, by writing NA* in front of it (NA here means "not available", not "negative affect"!). If we fix the loading of NA_guilty to 1 instead of that of NA_bekuemmert, we obtain a different metric.

A new metric does not change the overall model fit (see next section), as can be seen from the "Model Test User Model" section (same \(\chi^2\) value, same p-value). The two models are algebraically equivalent but are interpreted differently because of the different metric.
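The equivalence of the two metrics can be checked on simulated data. This sketch assumes a one-factor model with four hypothetical items (item_1 to item_4) and made-up population values. Model A uses lavaan's default marker, the first indicator. Model B frees the first loading with NA* and fixes item_3 with 1*. The chi-square statistics should coincide, and model B's loadings should equal model A's loadings divided by A's loading of item_3.

```r
library(lavaan)

# Made-up one-factor population model with four hypothetical items,
# used only to simulate example data
pop_model <- "
  f =~ 0.8*item_1 + 0.8*item_2 + 0.55*item_3 + 0.6*item_4
  f ~~ 1*f
  item_1 ~~ 0.36*item_1
  item_2 ~~ 0.36*item_2
  item_3 ~~ 0.70*item_3
  item_4 ~~ 0.64*item_4
"
set.seed(1)
sim_data <- simulateData(pop_model, sample.nobs = 500)

# Model A: lavaan fixes the loading of the first indicator (item_1) to 1
model_a <- "negative_affect =~ item_1 + item_2 + item_3 + item_4"
# Model B: NA* frees the first loading, 1* fixes the loading of item_3
model_b <- "negative_affect =~ NA*item_1 + item_2 + 1*item_3 + item_4"

fit_a <- cfa(model_a, data = sim_data)
fit_b <- cfa(model_b, data = sim_data)

# Same global fit: identical chi-square, df and p-value
fitMeasures(fit_a, c("chisq", "df", "pvalue"))
fitMeasures(fit_b, c("chisq", "df", "pvalue"))

# Loadings in syntax order (item_1 ... item_4) under each metric
pe_a <- parameterEstimates(fit_a)
pe_b <- parameterEstimates(fit_b)
load_a <- pe_a$est[pe_a$op == "=~"]
load_b <- pe_b$est[pe_b$op == "=~"]

# Rescaling model A to the item_3 metric reproduces model B
round(cbind(rescaled_a = load_a / load_a[3], b = load_b), 3)
```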

7.2.5 Check model fit

With the help of fit indices, we can now check whether our specified structure fits the data, i.e. whether the model-implied and the empirical covariance matrices differ significantly from one another. To do this, we fit our model and output the result with fit indices and standardized model parameters.

We test the less strict PANAS model, which allows the latent factors to correlate:

In the first part of the output we can now read the following information:

Exact model fit (see “Model Test User Model”)

The chi-square test is significant (p < .05), so we have to reject H0: the model-implied and the empirical covariance matrices differ significantly from each other. Note, however, that with a large sample such as the present one (n = 916), even small deviations can lead to a significant model test.

Relative model fit compared to the null model (in which all variables are uncorrelated)

The Comparative Fit Index (CFI), at 0.841, is below the recommended value of > .95.

Absolute model fit

The RMSEA of 0.080 is above the value of <= .06 recommended for samples with n > 250. Only the SRMR, at 0.063, is below the recommended cut-off of <= .11.

Overall, we would have to reject the model, since the chi-square test, the CFI and the RMSEA all indicate poor model fit.
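The comparison with the cut-offs can be written out in base R. The sketch hard-codes the values reported above and the cut-offs quoted in this section. With a fitted lavaan object, fitMeasures() returns these indices by name.

```r
# Fit indices as reported in the text for the correlated PANAS model.
# With a fitted lavaan object `fit` they could be obtained via
# lavaan::fitMeasures(fit, c("cfi", "rmsea", "srmr")).
reported <- c(cfi = 0.841, rmsea = 0.080, srmr = 0.063)

# Cut-offs as stated in the text
meets_cutoff <- c(
  cfi   = reported[["cfi"]] > 0.95,
  rmsea = reported[["rmsea"]] <= 0.06,
  srmr  = reported[["srmr"]] <= 0.11
)
meets_cutoff  # CFI and RMSEA fail, SRMR passes
```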

Local model fit

We can also examine the local fit of the model. Even if a model were judged acceptable globally, it may be misspecified in individual places (e.g. manifest variables do not load significantly on their latent constructs, or specified covariances are not significant). In the case of poor model fit, we can also check whether some of the items are unsuitable or incorrectly specified.
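One way to inspect local fit is to list the loadings with their z-tests from parameterEstimates(). The sketch simulates data from a made-up correlated two-factor model (hypothetical items PA_1 to NA_3) so that it runs on its own. Marker loadings fixed at 1 have no test statistic.

```r
library(lavaan)

# Made-up population model: two correlated factors, three hypothetical
# items each, item variances of 1. Used only to simulate example data.
pop_model <- "
  pa =~ 0.7*PA_1 + 0.6*PA_2 + 0.8*PA_3
  na =~ 0.7*NA_1 + 0.5*NA_2 + 0.6*NA_3
  pa ~~ 1*pa
  na ~~ 1*na
  pa ~~ -0.3*na
  PA_1 ~~ 0.51*PA_1
  PA_2 ~~ 0.64*PA_2
  PA_3 ~~ 0.36*PA_3
  NA_1 ~~ 0.51*NA_1
  NA_2 ~~ 0.75*NA_2
  NA_3 ~~ 0.64*NA_3
"
set.seed(42)
sim_data <- simulateData(pop_model, sample.nobs = 500)

fit <- cfa("positive_affect =~ PA_1 + PA_2 + PA_3
            negative_affect =~ NA_1 + NA_2 + NA_3",
           data = sim_data)

# Loadings with standard errors, z-values and p-values
pe <- parameterEstimates(fit)
loadings <- pe[pe$op == "=~", c("lhs", "rhs", "est", "se", "z", "pvalue")]
print(loadings)

# Free loadings that are not significant at alpha = .05
# (marker loadings fixed at 1 have pvalue NA and are skipped)
loadings[!is.na(loadings$pvalue) & loadings$pvalue >= 0.05, ]
```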

In the present case, all specified loadings are significant, so the manifest variables do serve as indicators of the latent constructs.

Standardized values

Requesting the standardized solution in the summary() function additionally outputs standardized loadings and variances (see the corresponding columns of the output).

To illustrate what the standardized solution shows, let's look at the definition equation and the structural equation for a single manifest variable in a model like the one we tested. Since each manifest variable loads on only one latent variable, the equations have the same form for every manifest variable \(X_j\) (\(j = 1, \dots, 20\)):

\[
\begin{aligned}
&\text{Definition equation:} & X_{ji} &= \lambda_j \cdot \xi_i + \epsilon_{ji}\\
&\text{Structural equation:} & Var(X_j) &= \lambda_j^2 \cdot Var(\xi) + Var(\epsilon_j)
\end{aligned}
\]

In the standardized case, the variances of all manifest and latent variables (but not of the error variables) are scaled to 1. This simplifies the structural equation considerably:

\[
\begin{aligned}
&\text{Standardized variances:} & Var(X_j)_{std} &= 1\\
& & Var(\xi)_{std} &= 1\\
&\text{Structural equation:} & Var(X_j)_{std} &= \lambda_{j,std}^2 \cdot Var(\xi)_{std} + Var(\epsilon_j)_{std}\\
&\text{Inserted:} & 1 &= \lambda_{j,std}^2 + Var(\epsilon_j)_{std}
\end{aligned}
\]

So the variance of an endogenous variable consists of two parts. The systematic part is due to the influence of a latent variable, in the example \(\xi\). The size of the systematic part of the variance of the endogenous variable can be calculated from the squared standardized loading. The unsystematic part is due to the influence of the error; its size is given by the standardized error variance. We also see that the systematic part can be calculated from the unsystematic part if, as in the standardized case, we know that both add up to exactly 1.
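The split into a systematic and an unsystematic part is a one-line computation. Below is a base-R sketch with a hypothetical standardized loading of 0.6; the function name decompose_variance() is our own.

```r
# Split the standardized variance (= 1) of an item into a systematic part
# (squared standardized loading) and an unsystematic part (error variance)
decompose_variance <- function(lambda_std) {
  systematic <- lambda_std^2
  c(systematic = systematic, unsystematic = 1 - systematic)
}

parts <- decompose_variance(0.6)  # hypothetical standardized loading
parts        # systematic 0.36, unsystematic 0.64
sum(parts)   # the two parts add up to 1
```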


The standardized values allow loadings to be compared with one another. If we want to estimate the systematic proportion of variance, squaring the standardized loading shows that 32.2% of the variance of the item NA_guilty is explained by the latent variable negative affect.


In the section Variances we now also find the standardized error variances (those with a dot in front of the variable name) and the variances of the latent exogenous variables. Adding the standardized error variance of NA_guilty (the unsystematic part) to the systematic part calculated above gives, as expected, a sum of 1.

The systematic variance component of an item (the part not due to error) can therefore also be obtained simply by subtracting its standardized error variance from 1.

Violation of the assumption of normally distributed data

If the assumption of normal distribution is violated, the estimated \(\chi^2\) values are inflated, which can lead to a well-fitting model being rejected. In this case we can use the Bollen-Stine bootstrap method to obtain a corrected p-value for the model fit. In our case, however, the corrected p-value is also significant, meaning that the poor fit of the model is not due to the skewed distribution of the data. To evaluate the local model fit, we can also have the standard errors estimated using the bootstrap method.
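Here is a sketch of the bootstrap options, assuming that lavaan's test = "bollen.stine" and se = "bootstrap" arguments are the ones meant here. It simulates data from a made-up one-factor model (hypothetical items x1 to x4) and uses only 200 bootstrap draws so that it runs quickly. The Bollen-Stine p-value then appears in the model test section of summary().

```r
library(lavaan)

# Made-up one-factor population model, used only to simulate example data
pop_model <- "
  f =~ 0.8*x1 + 0.7*x2 + 0.6*x3 + 0.5*x4
  f ~~ 1*f
  x1 ~~ 0.36*x1
  x2 ~~ 0.51*x2
  x3 ~~ 0.64*x3
  x4 ~~ 0.75*x4
"
set.seed(2024)
sim_data <- simulateData(pop_model, sample.nobs = 300)

model <- "f =~ x1 + x2 + x3 + x4"

# test = "bollen.stine": bootstrap-corrected p-value for the chi-square test
# se = "bootstrap":      bootstrap standard errors for the local fit
# bootstrap = 200:       number of bootstrap draws (kept small so the
#                        sketch runs fast; use more, e.g. 1000, in practice)
fit_boot <- cfa(model, data = sim_data,
                test = "bollen.stine", se = "bootstrap", bootstrap = 200)
summary(fit_boot)
```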