Chapter 3 Canonical Correlation Analysis (CCA)
3.1 What is CCA?
- Seeks the weighted linear composit for each variate (sets of D.V. or I.V.) to maximize the overlap in their distributions.
- Labeling of DV and IV is arbitrary. The procedure looks for relationships and not causation.
- Goal is to maximize the correlation (not the variance extracted as in most other techniques).
- Lacks specificity in interpreting results, that may limit its usefulness in many situations.
CCA helps us answer the questions:
- What is the best way to understand how the variables in two sets are related? , mathematically speaking:
- what linear combinations of the set \(X\) variables (\(u\)) and the set Y variables (\(t\)) will maximize their correlation?
3.1.1 Canonical R (\(R_c\))
It represents the overlapping variance between two variates which are linear composites of each set of variables.
3.2 Assumptions for CCA
- Multiple continuous variables for DVs and IVs or categorical with dummy coding.
- Assumes linear relationship between any two variables and between variates.
- Multivariate normality is necessary to perform CCA.
- Multicollinearity in either variate confounds interpretation of canonical results.
3.3 Objectives of CCA
- Determine the magnitude of the relationships that may existe between two sets of variables.
- Derive a variate(s) for each set of criterion and predictor variables such that the variate(s) of each set is maximally correlated.
- Explain the nature of whatever relationships exist between the sets of criterion and predictor variables.
- Seek the max correlation of shared variance between the two sides of the equation.
3.4 Terms used in the context of a CCA analysis
- Canonical correlation: Correlation between two sets; the largest possible correlation that can be found between linear combinations.
- Canonical variate: The linear combinations created from the IV set and DV set.
- Canonical weights: weights used to create the liniear combinations; interpreted like regression coefficients.
- Canonical loadings: correlations between each variable and its variate; interpreted like loadings in PCA.
- Canonical cross-loadings: Correlation of each observed independent or dependent variable with opposite canonical variate.
3.5 Interpreting canonical variates
- Canonical weights
- larger wight contributes more to the function.
- negative weight indicates an inverse relationship with other variables.
- always look out for multicollinearity, it can skew the whole analysis.
- Canonical Loadings.
- A direct assessment of each variable´s contribution to its respective canonical variate.
- Larger loadings are interpreted as more important to deriving the canonical variate.
- Correlation between the original variable and its canonoical variate.
- Canonical Cross-Loadings
- Measure of correlation of each original D.V. with the independent canonical variate.
- Direct assessment of the relationship between each D.V. and the independent variate.
- Provides a more pure measure of the dependent and independent variable relationship.
- Preferred approach to interpretation.
3.6 Considerations when working with CCA
- Small samples sizes may have an adverse effect.
- Suggested minimun sample size = 10 * # of values.
- Selection of variables to be included:
- Select them with domain knowledge or theoretical basis.
- Inclusion of irrelevant or deletion of relevant variables may adversely affect the entire canonical solution.
- All I.V.s must be interrelated and all D.V.s must be interrelated.
- Composition of D.V. and I.V. variates is critical to producing practical results.
3.7 Limitations of CCA
- \(R_c\) (canonical R) reflects only the variance shared by the linear composites, not the variances extracted from the variables.
- Canonical weights are subject to a great deal of instability, particularly when there is multicollinearity.
- Interpretation difficult because rotation is not possible.
- Precise statistics have not been developed to interpret canonical analysis.
This chapter is under construction.
