# Advice on Exploratory Factor Analysis

## Introduction
Exploratory Factor Analysis (EFA) is a process, which can be carried out in SPSS, for validating scales of items in a questionnaire. The purpose of an EFA is to describe a multidimensional data set using fewer variables. Once a questionnaire has been validated, a further process called Confirmatory Factor Analysis (CFA) can be used. This is supported by AMOS, a ‘sister’ package to SPSS.
There are two forms of EFA, known as Factor Analysis (FA) and Principal Component Analysis (PCA). The reduced dimensions produced by an FA are known as factors, whereas those produced by a PCA are known as components. A PCA will always produce a solution, but an FA may fail to converge.
FA analyses the relationship between the individual item variances and the common variance shared between items, whereas PCA analyses the relationship between the individual item variances and the total (both common and error) variance. FA is therefore preferable to PCA in the early stages of an analysis, as it allows you to measure the proportion of each item's variance that it shares with the other items, known as its communality. As dimension reduction techniques seek to identify items with a shared variance, it is advisable to remove any item with a communality below 0.2 (Child, 2006). Items with low communalities may indicate additional factors, which could be explored in further studies by developing and measuring additional items (Costello and Osborne, 2005).
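The following Python sketch uses only the standard library and makes the communality rule concrete. It computes each item's communality as the sum of its squared loadings across the retained factors, which holds for an orthogonal solution, and flags items below the 0.2 threshold. The item names and loadings are hypothetical and are not taken from SPSS output.

```python
# Hypothetical rotated loadings from an orthogonal (e.g. varimax) solution:
# item -> loadings on each retained factor.
loadings = {
    "Q1": [0.72, 0.10],
    "Q2": [0.65, 0.21],
    "Q3": [0.15, 0.58],
    "Q4": [0.20, 0.25],
}


def communalities(table):
    """Communality of each item: sum of its squared loadings (orthogonal solution)."""
    return {item: sum(x * x for x in row) for item, row in table.items()}


def low_communality_items(table, threshold=0.2):
    """Items whose communality is below the threshold (0.2 after Child, 2006)."""
    return [item for item, h2 in communalities(table).items() if h2 < threshold]


if __name__ == "__main__":
    for item, h2 in communalities(loadings).items():
        print(f"{item}: {h2:.3f}")
    print("Candidates for removal:", low_communality_items(loadings))  # ['Q4']
```

For an oblique solution, communalities cannot be obtained by simply squaring the pattern loadings. In that case, read them from the SPSS communalities table instead.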
There are different EFA extraction methods. If your sample is the only data set you will analyse (i.e. it is effectively a population as far as the EFA is concerned), it is advisable to use the Principal Axis Factoring method. If instead you are developing an instrument to be used with other data sets in the future, it is advisable to use a sample-based method such as Maximum Likelihood or Kaiser’s alpha factoring (Field, 2013: 674-675).
You also need to decide whether to rotate the factors and, if so, which type of rotation to use. An orthogonal rotation can improve on the unrotated solution, but it forces the factors to be uncorrelated with each other. The most popular orthogonal rotation technique is varimax. An oblique rotation allows some correlation between the factors, which can strengthen the intercorrelations between the items within each factor. Although Reise et al. (2000) give several reasons why it should be considered, an oblique solution is more difficult to interpret, so it is advisable to consider it only if the orthogonal solution is unacceptable. Field (2013: 681) recommends using either the direct oblimin or the promax rotation with the default parameter settings. An oblique rotation produces two factor matrices, called the pattern matrix and the structure matrix. It is the pattern matrix that should be analysed in the same way as the single rotated factor matrix obtained from an orthogonal rotation.
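The relationship between the two oblique matrices can be sketched in a few lines of Python. Under the standard common factor model, the structure matrix (item-factor correlations) equals the pattern matrix multiplied by the factor correlation matrix. Structure loadings therefore combine an item's direct relationship with a factor and its indirect relationship through correlated factors. This is why the pattern matrix is the one to interpret. The pattern and factor correlation values below are hypothetical.

```python
# Hypothetical oblique solution: 3 items (rows) by 2 factors (columns).
pattern = [
    [0.70, 0.05],
    [0.10, 0.65],
    [0.60, 0.00],
]

# Hypothetical factor correlation matrix (phi) from the oblique rotation.
phi = [
    [1.0, 0.3],
    [0.3, 1.0],
]


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# Structure loadings = pattern loadings x factor correlations.
structure = matmul(pattern, phi)

if __name__ == "__main__":
    for p_row, s_row in zip(pattern, structure):
        print("pattern", p_row, "structure", [round(x, 3) for x in s_row])
```

The third item has a pattern loading of zero on factor 2. Its structure loading on factor 2 is nevertheless 0.18, which comes entirely from the 0.3 correlation between the factors. With an identity matrix in place of `phi` (an orthogonal solution), the two matrices coincide.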
After the EFA has been carried out, there is a validation process. There are different ways to extract and double-check the derived scales. For a successful analysis, the average correlation between the items within each derived scale should be higher than the average correlation between the scales. The proportion of the total variance explained by the retained factors should be noted; as a general rule this should be at least 50% (Streiner, 1994). The adequacy of the sample size should also be checked, and for small samples the average communality should be checked too. Finally, a test for multicollinearity based on the size of the determinant of the correlation matrix should be carried out.
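The first validation criterion is simple arithmetic on correlation tables, sketched below in Python with hypothetical item names and correlations. The sketch averages the pairwise item correlations within each scale. It then averages those scale means to give an overall within-factor figure, as in the worked example, where the overall value is the mean of the scale averages. Finally, it averages the pairwise correlations between the saved factor scores to give the between-factor figure. It assumes all items are keyed in the same direction, so signed correlations are averaged.

```python
from itertools import combinations
from statistics import mean


def pair(a, b):
    """Order-independent key for a pair of variables."""
    return frozenset((a, b))


# Hypothetical correlations between items within two 3-item scales.
item_corr = {
    pair("A1", "A2"): 0.50, pair("A1", "A3"): 0.45, pair("A2", "A3"): 0.40,
    pair("B1", "B2"): 0.35, pair("B1", "B3"): 0.30, pair("B2", "B3"): 0.40,
}
scales = {"F1": ["A1", "A2", "A3"], "F2": ["B1", "B2", "B3"]}

# Hypothetical correlation between the saved regression scores of F1 and F2.
score_corr = {pair("F1", "F2"): 0.20}


def average_correlation(corr, names):
    """Mean correlation over every distinct pair of the named variables."""
    return mean(corr[pair(a, b)] for a, b in combinations(names, 2))


def within_and_between(item_corr, scales, score_corr):
    """Overall within-factor average (mean of scale averages) and between-factor average."""
    within = mean(average_correlation(item_corr, items) for items in scales.values())
    between = average_correlation(score_corr, list(scales))
    return within, between


if __name__ == "__main__":
    within, between = within_and_between(item_corr, scales, score_corr)
    print(f"within = {within:.3f}, between = {between:.3f}")  # within = 0.400, between = 0.200
```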
## Step-by-step approach
1. Before carrying out an EFA the values of the bivariate correlation matrix of all items should be analysed. It is easier to do this in Excel. High values are an indication of multicollinearity, although they are not a necessary condition (see Rockwell, 1975). Field (2013: 686) suggests removing one of a pair of items with bivariate correlation scores greater than 0.8. There is no statistical means for deciding which item of a pair to remove – this should be based on a qualitative interpretation.
2. Decide on the appropriate method and rotation (probably varimax to start with) and run the analysis.
3. Remove any items with communalities less than 0.2 and re-run.
4. Optimize the number of factors. The default number in SPSS is given by Kaiser’s criterion (eigenvalue > 1), which often retains too many factors. You are looking for as many factors as possible, each with at least 3 items having a loading greater than 0.4 and low cross-loadings. Fix the number of factors to extract and re-run.
5. Remove any items with no factor loadings > 0.3 and re-run.
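6. Remove any items with cross-loadings > 75% (i.e. where an item's loading on a second factor is more than 75% of its highest loading) one at a time, starting with the item with the lowest maximum absolute loading across all the factors, and re-run after each removal.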
7. Once the solution has stabilized, check the average within-factor and between-factor correlations. To obtain scores for the factors, run a PCA on the items identified for each factor and save the regression scores. If the difference between the average within-factor and between-factor correlations is not acceptable, try an oblique rotation instead.
8. Provided the average within factor correlation is now higher than the average between factor correlation, a number of final checks should be made:
a. Check that the proportion of the total variance explained by the retained factors is at least 50%.
b. Check the adequacy of the sample size using the KMO statistic. A minimum acceptable score for this test is 0.5 (Kaiser, 1974).
c. If the sample size is less than 300, check the average communality of the retained items. An average value above 0.6 is acceptable for samples of fewer than 100, and an average value between 0.5 and 0.6 is acceptable for sample sizes between 100 and 200 (MacCallum et al., 1999).
d. The determinant of the correlation matrix should be greater than 0.00001 (Field, 2013: 686). A lower value might indicate that groups of three or more questions have high intercorrelations, so the correlation threshold for item removal in step 1 should be reduced until this condition is satisfied.
e. The Cronbach’s alpha coefficient for each scale can also be calculated.
9. If the goal of the analysis is to create scales of unique items, the group of items loading on each factor should be interpreted so that each factor can be given a meaningful name.
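Steps 5 and 6 amount to a simple decision rule, sketched below in Python. The sketch takes a rotated loading table (hypothetical values) and returns the item or items to drop before the next re-run. The guide does not define the cross-loading ratio explicitly. The sketch assumes that a cross-loading above 75% means an item's second-highest absolute loading exceeds 75% of its highest. It only chooses which items to drop. The factor analysis itself still has to be re-run in SPSS after each removal, because all the loadings change.

```python
# Hypothetical rotated loadings (item -> loading on each retained factor).
loadings = {
    "Q1": [0.72, 0.10, 0.05],
    "Q2": [0.55, 0.48, 0.02],  # second loading is about 87% of the highest
    "Q3": [0.12, 0.61, 0.20],
    "Q4": [0.28, 0.22, 0.10],  # no loading above 0.3
    "Q5": [0.45, 0.40, 0.15],  # second loading is about 89% of the highest
}


def max_abs_loading(row):
    return max(abs(x) for x in row)


def cross_loading_ratio(row):
    """Second-highest absolute loading as a fraction of the highest (assumed definition)."""
    ordered = sorted((abs(x) for x in row), reverse=True)
    if len(ordered) < 2 or ordered[0] == 0:
        return 0.0
    return ordered[1] / ordered[0]


def next_items_to_remove(table, min_loading=0.3, max_ratio=0.75):
    """Items to drop before the next re-run, or [] once the solution is stable."""
    # Step 5: drop every item with no absolute loading above min_loading.
    weak = [item for item, row in table.items() if max_abs_loading(row) <= min_loading]
    if weak:
        return weak
    # Step 6: among cross-loading items, drop only the one with the lowest
    # maximum loading, since the loadings change after each re-run.
    cross = [item for item, row in table.items() if cross_loading_ratio(row) > max_ratio]
    if cross:
        return [min(cross, key=lambda item: max_abs_loading(table[item]))]
    return []


if __name__ == "__main__":
    print(next_items_to_remove(loadings))  # ['Q4']
```

With these values, the successive passes would remove Q4, then Q5, then Q2, after which no item breaks either rule.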

## Worked example

A total of 171 business men and women responded to a questionnaire on entrepreneurship. It was constructed from 8 groups of questions derived from existing questionnaires and comprised 39 questions in total. Each question used a five-point Likert response scale. As the data from the questionnaire were to be used in a further analysis, it was decided to carry out an EFA using the Principal Axis Factoring technique and a varimax rotation. A Pearson bivariate correlation of all the items was carried out in Excel, with conditional formatting set to highlight any correlation with an absolute value greater than 0.8.
The correlation table contained 10 unique pairs of items with an absolute correlation greater than 0.8, the lowest of which was 0.922. As this was markedly higher than the threshold, one item was removed from each of these pairs based on a qualitative analysis of the items, leaving 29 items. An EFA was then run on the remaining 29 items using Principal Axis Factoring with a varimax rotation. The KMO statistic and the determinant of the correlation matrix were requested, all factors with eigenvalues greater than 1 were retained, and all factor loadings less than 0.3 were suppressed.

The communalities of the initial solution were inspected. All were larger than 0.2, so all the items were retained.
This produced an initial solution of 8 factors. However, the 7th and 8th factors did not have 3 items with loadings > 0.4 in the rotated factor matrix, so they were excluded and the analysis was re-run to extract only 6 factors, giving the output shown on the left.
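The factor-count decision above combines two rules. The first is Kaiser's criterion, which SPSS applies by default. The second requires each factor to have at least 3 items loading above 0.4. A minimal Python sketch of both counts follows. The eigenvalues and rotated loadings are hypothetical, and cross-loadings are ignored here.

```python
# Hypothetical eigenvalues of the initial (unrotated) solution, largest first.
eigenvalues = [5.10, 2.30, 1.60, 1.20, 1.05, 0.90, 0.70]

# Hypothetical rotated loadings for the 5 factors retained by Kaiser's criterion:
# one row per item, one column per factor.
rotated = [
    [0.71, 0.10, 0.05, 0.20, 0.10],
    [0.65, 0.15, 0.12, 0.05, 0.08],
    [0.58, 0.22, 0.10, 0.45, 0.02],
    [0.12, 0.69, 0.08, 0.10, 0.05],
    [0.05, 0.61, 0.20, 0.15, 0.41],
    [0.18, 0.52, 0.11, 0.09, 0.10],
    [0.10, 0.09, 0.66, 0.12, 0.07],
    [0.07, 0.14, 0.59, 0.43, 0.06],
    [0.15, 0.05, 0.48, 0.11, 0.12],
]


def kaiser_count(eigs):
    """Number of factors with eigenvalue > 1 (the SPSS default)."""
    return sum(1 for e in eigs if e > 1)


def strong_item_counts(matrix, cutoff=0.4):
    """For each factor (column), the number of items with |loading| > cutoff."""
    return [sum(1 for row in matrix if abs(row[f]) > cutoff) for f in range(len(matrix[0]))]


def supported_factor_count(matrix, cutoff=0.4, min_items=3):
    """Number of factors with at least `min_items` items loading above `cutoff`."""
    return sum(1 for count in strong_item_counts(matrix, cutoff) if count >= min_items)


if __name__ == "__main__":
    print("Kaiser's criterion:", kaiser_count(eigenvalues))  # 5
    print("Strong items per factor:", strong_item_counts(rotated))  # [3, 3, 3, 2, 1]
    print("Factors to extract on re-run:", supported_factor_count(rotated))  # 3
```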

However, many items in the rotated factor matrix (highlighted) cross-loaded on more than one factor at more than 75% or had a highest loading < 0.4. These were removed one at a time, starting with the item whose highest loading was the lowest (KSA2), and the analysis was re-run after each removal.

In the subsequent re-runs, the number of factors had to be reduced to 5 and then to 4 so that each factor kept at least three items with loadings > 0.4. A stable solution with 18 items was eventually reached after 13 steps (see right). Item KM4 loaded on both Factor 1 and Factor 3, but the cross-loading was < 75%, so it was included only in the third scale.
The items loading on each factor were noted in order to create the trial scales.

| Factor | Items |
|---|---|
| 1 | KSA1, KSA8, KL4, KM5, KSB3, KI2 |
| 2 | KST3, KST5, KSA3, KSA4 |
| 3 | KL1, KM1, KM4, KSB1, KSB2 |
| 4 | KSA7, KL2, KL3 |

A single-component PCA was then run for each scale in turn, as shown below, and the regression factor scores were saved.
The within-scale correlations were calculated in Excel and averaged for each scale.

This yielded the following results:

| Factor | 1 | 2 | 3 | 4 | Overall |
|---|---|---|---|---|---|
| Average within-factor correlation | 0.419 | 0.461 | 0.379 | 0.361 | 0.405 |

The regression scores for the scales were exported to Excel and a correlation analysis was run, yielding the results shown on the right.
The average within-factor correlation (0.405) was only slightly higher than the average between-factor correlation (0.365). This was considered unacceptable, as the within-factor correlations should have been considerably higher, so an oblique factor rotation was then carried out.
A Principal Axis FA with a direct oblimin rotation (Delta = 0) was then run on the same 29 items as the original FA. Because of slow convergence, the maximum number of iterations for the rotation was increased to 100 during the re-runs.
A 4 factor solution eventually stabilized after 15 steps with 17 items as shown below. One item was removed for having communality < 0.2. KM4 was not included in Factor 1 because of its cross-loading on Factor 2 (even though this was < 75%).

| Factor | Items |
|---|---|
| 1 | KSA1, KSA8, KL4, KM5, KI2 |
| 2 | KL1, KM1, KM4, KSB1 |
| 3 | KST3, KST4, KST5, KSA3, KSA4 |
| 4 | KST1, KSA6, KSA7 |

The average within factor correlation was 0.404. The average between factor correlation was 0.276. This was a much better result than the orthogonal rotation and was considered acceptable.
Finally, validation checks were run. The KMO statistic was 0.819 (very good). The correlation matrix determinant was 0.002 (much higher than the critical value of 0.00001). The 4 factors explained 59.5% of the variation in the data, which was also acceptable.
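Both the determinant and the overall KMO statistic can be computed from the correlation matrix alone. The Python sketch below uses only the standard library and applies Gauss-Jordan elimination to obtain the determinant and inverse of a small hypothetical correlation matrix. The KMO statistic is the sum of squared off-diagonal correlations divided by that sum plus the sum of squared partial correlations. Each partial correlation is taken from the inverse as `-inv[i][j] / sqrt(inv[i][i] * inv[j][j])`. This is a sketch for checking small matrices, not a replacement for the SPSS output.

```python
import math

# Hypothetical correlation matrix for three items.
R = [
    [1.0, 0.5, 0.4],
    [0.5, 1.0, 0.3],
    [0.4, 0.3, 1.0],
]


def det_and_inverse(m, eps=1e-12):
    """Determinant and inverse by Gauss-Jordan elimination with partial pivoting.

    Returns (0.0, None) if the matrix is (numerically) singular.
    """
    n = len(m)
    # Augment each row with the corresponding row of the identity matrix.
    a = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < eps:
            return 0.0, None
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign of the determinant
        p = a[col][col]
        det *= p
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return det, [row[n:] for row in a]


def kmo(r):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    _, inv = det_and_inverse(r)
    if inv is None:
        raise ValueError("correlation matrix is singular")
    n = len(r)
    sum_r2 = sum_p2 = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                partial = -inv[i][j] / math.sqrt(inv[i][i] * inv[j][j])
                sum_r2 += r[i][j] ** 2
                sum_p2 += partial ** 2
    return sum_r2 / (sum_r2 + sum_p2)


if __name__ == "__main__":
    det, _ = det_and_inverse(R)
    print(f"determinant = {det:.4f}, above 0.00001: {det > 0.00001}")
    print(f"KMO = {kmo(R):.3f}, at least 0.5: {kmo(R) >= 0.5}")
```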
The extracted communalities were exported to Excel and their average was calculated (see below right). This was slightly lower than recommended for the sample size: according to MacCallum et al. (1999), for sample sizes between 100 and 200 the average communality should be between 0.5 and 0.6. It was noted that the communalities of the three items on the fourth factor (highlighted) were all relatively low.
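The sample-size rule used here can be written as a small lookup. The sketch below encodes only the thresholds quoted in this guide (after MacCallum et al., 1999) and treats the boundaries as inclusive, which is an assumption. It returns no verdict for sample sizes above 200, for which the guide gives no threshold. The communality values are hypothetical; only the sample size of 171 comes from the worked example.

```python
from statistics import mean


def required_average_communality(n):
    """Minimum average communality quoted in this guide for sample size n.

    Returns None when the guide gives no threshold (n > 200).
    """
    if n < 100:
        return 0.6
    if n <= 200:
        return 0.5
    return None


def check_average_communality(communalities, n):
    """Return (average, threshold, acceptable); acceptable is None without a threshold."""
    average = mean(communalities)
    threshold = required_average_communality(n)
    if threshold is None:
        return average, None, None
    return average, threshold, average >= threshold


# Hypothetical extracted communalities for a sample of 171 respondents.
extracted = [0.62, 0.55, 0.41, 0.48, 0.39]

if __name__ == "__main__":
    avg, need, ok = check_average_communality(extracted, 171)
    print(f"average = {avg:.3f}, threshold = {need}, acceptable = {ok}")
```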
A single-component PCA was carried out on each of the 4 scales in turn and the regression scores were saved. As a double check, a Cronbach’s alpha reliability analysis was also run on each scale.
This yielded the following results:

| Scale | Eigenvalue | Cronbach’s alpha |
|---|---|---|
| Factor 1 | 2.869 | 0.814 |
| Factor 2 | 2.266 | 0.744 |
| Factor 3 | 2.721 | 0.787 |
| Factor 4 | 1.606 | 0.561 |

The first 3 scales have acceptable Cronbach’s alpha scores, at least 4 items with loadings > 0.6 and acceptable eigenvalues.

The low Cronbach’s alpha score for the 4th scale is consistent with it having only 3 items with loadings > 0.6, a low eigenvalue and a low average communality, indicating that it should be used only with caution.

The scales should then be interpreted qualitatively and given appropriate names (omitted here).
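The Cronbach’s alpha values reported above can be reproduced outside SPSS with the usual formula, alpha = k / (k - 1) * (1 - sum of item variances / variance of the total score), where k is the number of items in the scale. The Python sketch below applies it to hypothetical responses from five respondents on a three-item scale, using sample variances throughout.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for one scale.

    `items` holds one list of scores per item; every list has one score per
    respondent, in the same respondent order. Sample variances (n - 1) are used.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's scale total
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical responses of five respondents to a three-item scale.
scale = [
    [4, 5, 3, 2, 4],
    [4, 4, 3, 2, 5],
    [3, 5, 2, 2, 4],
]

if __name__ == "__main__":
    print(f"alpha = {cronbach_alpha(scale):.3f}")  # alpha = 0.929
```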

## References
Child, D. (2006) The Essentials of Factor Analysis. 3rd edn. New York: Continuum.
Costello, A. B. and Osborne, J. W. (2005) Best practices in exploratory factor analysis: four recommendations for getting the most from your analysis. Practical Assessment, Research & Evaluation, 10(7), pp. 1-9.
Field, A. (2013) Discovering Statistics using SPSS. 4th edn. London: SAGE.
Guadagnoli, E. and Velicer, W. F. (1988) Relation of sample size to the stability of component patterns. Psychological Bulletin, 103(2), pp. 265-275.
Kaiser, H. F. (1974) An index of factorial simplicity. Psychometrika, 39(1), pp. 31-36.
MacCallum, R. C., Widaman, K. F., Zhang, S. and Hong, S. (1999) Sample size in factor analysis. Psychological Methods, 4(1), pp. 84-99.
Reise, S. P., Waller, N. G. and Comrey, A. L. (2000) Factor analysis and scale revision. Psychological Assessment, 12(3), pp. 287-297.
Rockwell, R. C. (1975) Assessment of multicollinearity: the Haitovsky test of the determinant. Sociological Methods & Research, 3(3), pp. 308-320.
Stevens, J. P. (2012) Applied Multivariate Statistics for the Social Sciences. 5th edn. London: Routledge.
Streiner, D. L. (1994) Figuring out factors: the use and misuse of factor analysis. Canadian Journal of Psychiatry, 39(3), pp. 135-140.
Tabachnick, B. G. and Fidell, L. S. (2014) Using Multivariate Statistics. 6th edn. Harlow: Pearson.