Chapter 3 Canonical Correlation Analysis (CCA)

3.1 What is CCA?

  • Seeks the weighted linear composite for each variate (set of D.V.s or I.V.s) that maximizes the overlap in their distributions.
  • Labeling of DV and IV is arbitrary. The procedure looks for relationships and not causation.
  • Goal is to maximize the correlation (not the variance extracted as in most other techniques).
  • Lacks specificity in interpreting results, which may limit its usefulness in many situations.

CCA helps us answer the questions:

  • What is the best way to understand how the variables in two sets are related? Mathematically speaking:
    • What linear combinations of the set \(X\) variables (\(u\)) and the set \(Y\) variables (\(t\)) will maximize their correlation?
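
The question above can be written as an optimization problem. The weight vectors \(a\) and \(b\) and the covariance blocks \(\Sigma_{XX}\), \(\Sigma_{YY}\), \(\Sigma_{XY}\) are notation assumed here for illustration; only \(u\) and \(t\) come from the text.

\[
u = Xa, \qquad t = Yb, \qquad
(a^{*}, b^{*}) = \arg\max_{a,\,b}\ \operatorname{corr}(u, t)
= \arg\max_{a,\,b}\ \frac{a^{\top}\Sigma_{XY}\,b}{\sqrt{a^{\top}\Sigma_{XX}\,a}\ \sqrt{b^{\top}\Sigma_{YY}\,b}}
\]

Because the ratio does not change when \(a\) or \(b\) is rescaled, the variates are usually constrained to unit variance, \(a^{\top}\Sigma_{XX}\,a = b^{\top}\Sigma_{YY}\,b = 1\).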

3.1.1 Canonical R (\(R_c\))

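\(R_c\) is the correlation between the two canonical variates, each of which is a linear composite of one set of variables. Its square, \(R_c^2\), represents the overlapping (shared) variance between the two variates.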
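
As a minimal sketch of how \(R_c\) could be computed, the Python snippet below uses NumPy with a QR decomposition followed by a singular value decomposition. This is one standard numerical route, not necessarily the one the chapter has in mind. The function name `canonical_correlations` and the synthetic data are assumptions made for illustration, and both sets are assumed to have full column rank.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between the column sets X and Y, largest first."""
    # Center each variable; correlations do not depend on location.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column space of each centered set.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # The singular values of Qx' Qy are the canonical correlations.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)  # guard against rounding slightly above 1

# Synthetic data (illustrative only): one shared signal z links the two sets.
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)
X = np.column_stack([z + rng.normal(size=n), rng.normal(size=n), rng.normal(size=n)])
Y = np.column_stack([z + rng.normal(size=n), rng.normal(size=n)])

r = canonical_correlations(X, Y)
print("R_c  :", r)
print("R_c^2:", r ** 2)  # shared variance between each pair of variates
```

The number of canonical correlations equals the number of variables in the smaller set, and the first one is never smaller than the largest absolute pairwise correlation between a variable in one set and a variable in the other.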

3.2 Assumptions for CCA

  • Multiple continuous variables for the DVs and IVs, or categorical variables with dummy coding.
  • Assumes linear relationship between any two variables and between variates.
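  • Multivariate normality is required for the significance tests of the canonical functions; the canonical correlations themselves can be computed without it.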
  • Multicollinearity in either variate confounds interpretation of canonical results.
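
One way to screen a variable set for multicollinearity before running CCA is the variance inflation factor (VIF). The VIF is a common diagnostic that the text does not name, so treat the sketch below as one possible check rather than a prescribed step. The function name and the synthetic data are assumptions.

```python
import numpy as np

def variance_inflation_factors(M):
    """VIF of each column of M, read off the diagonal of the inverse correlation matrix."""
    R = np.corrcoef(M, rowvar=False)
    return np.diag(np.linalg.inv(R))

# Synthetic data (illustrative only).
rng = np.random.default_rng(1)
n = 400
a, b = rng.normal(size=n), rng.normal(size=n)
independent = np.column_stack([a, b, rng.normal(size=n)])
# Third column is almost an exact sum of the first two.
collinear = np.column_stack([a, b, a + b + 0.05 * rng.normal(size=n)])

vif_independent = variance_inflation_factors(independent)
vif_collinear = variance_inflation_factors(collinear)
print("VIF, unrelated columns:", vif_independent)
print("VIF, collinear columns:", vif_collinear)
```

A VIF near 1 means a variable is nearly uncorrelated with the rest of its set. Values above roughly 10 are often read as a warning sign, although that cutoff is a rule of thumb and not a formal test.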

3.3 Objectives of CCA

  • Determine the magnitude of the relationships that may exist between two sets of variables.
  • Derive a variate(s) for each set of criterion and predictor variables such that the variate(s) of each set is maximally correlated.
  • Explain the nature of whatever relationships exist between the sets of criterion and predictor variables.
  • Seek the maximum correlation, and therefore the maximum shared variance, between the two sides of the equation.

3.4 Terms used in the context of a CCA analysis

  • Canonical correlation: Correlation between the two sets; the largest possible correlation that can be found between a linear combination of one set and a linear combination of the other.
  • Canonical variate: The linear combinations created from the IV set and DV set.
  • Canonical weights: Weights used to create the linear combinations; interpreted like regression coefficients.
  • Canonical loadings: correlations between each variable and its variate; interpreted like loadings in PCA.
  • Canonical cross-loadings: Correlation of each observed independent or dependent variable with opposite canonical variate.
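
The terms above can be made concrete with a short NumPy sketch that computes each quantity on synthetic data. The function names `cca` and `column_corr`, the dictionary keys, and the data are assumptions made for illustration. The weights returned here are raw (unstandardized) weights, scaled so that every variate has unit sample variance, and both sets are assumed to have full column rank.

```python
import numpy as np

def column_corr(M, V):
    """Correlation of every column of M with every column of V."""
    Mz = (M - M.mean(axis=0)) / M.std(axis=0, ddof=1)
    Vz = (V - V.mean(axis=0)) / V.std(axis=0, ddof=1)
    return Mz.T @ Vz / (M.shape[0] - 1)

def cca(X, Y):
    """Canonical correlations, weights, variates, loadings and cross-loadings."""
    n = X.shape[0]
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    Qx, Rx = np.linalg.qr(Xc)
    Qy, Ry = np.linalg.qr(Yc)
    U, r, Vt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    # Canonical weights, scaled so each variate has unit sample variance.
    A = np.linalg.solve(Rx, U) * np.sqrt(n - 1)
    B = np.linalg.solve(Ry, Vt.T) * np.sqrt(n - 1)
    u, t = Xc @ A, Yc @ B  # canonical variates of the X set and the Y set
    return {
        "correlations": r,
        "x_weights": A, "y_weights": B,
        "x_variates": u, "y_variates": t,
        "x_loadings": column_corr(X, u),        # variable vs. its own variate
        "y_loadings": column_corr(Y, t),
        "x_cross_loadings": column_corr(X, t),  # variable vs. opposite variate
        "y_cross_loadings": column_corr(Y, u),
    }

# Synthetic data (illustrative only): one shared signal z links the two sets.
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)
X = np.column_stack([z + rng.normal(size=n), rng.normal(size=n), rng.normal(size=n)])
Y = np.column_stack([z + rng.normal(size=n), rng.normal(size=n)])

res = cca(X, Y)
for name in ("correlations", "x_weights", "x_loadings", "x_cross_loadings"):
    print(name, res[name], sep="\n")
```

Each column of the weight, loading, and cross-loading matrices belongs to one canonical function, ordered from the largest canonical correlation to the smallest. The sign of a column is arbitrary: flipping the signs of both variates in a pair leaves their correlation unchanged.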

3.5 Interpreting canonical variates

  • Canonical weights
    • A larger weight contributes more to the function.
    • A negative weight indicates an inverse relationship with the other variables.
    • Always look out for multicollinearity; it can skew the whole analysis.
  • Canonical Loadings
    • A direct assessment of each variable's contribution to its respective canonical variate.
    • Larger loadings are interpreted as more important to deriving the canonical variate.
    • Correlation between the original variable and its canonical variate.
  • Canonical Cross-Loadings
    • Measure of correlation of each original D.V. with the independent canonical variate.
    • Direct assessment of the relationship between each D.V. and the independent variate.
    • Provides a purer measure of the relationship between the dependent and independent variables.
    • Preferred approach to interpretation.
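
For sample canonical variates obtained in the usual way, loadings and cross-loadings within the same canonical function are linked by a simple identity. The indices \(j\) (variable) and \(i\) (canonical function) are notation assumed here; \(x_j\) is a variable from the \(X\) set, \(u_i\) its own variate, and \(t_i\) the opposite variate.

\[
\operatorname{corr}(x_j, t_i) = R_{c,i}\,\operatorname{corr}(x_j, u_i)
\]

The same relation holds with the roles of the two sets swapped. Within one function, a cross-loading is therefore the corresponding loading shrunk by that function's canonical correlation, so a variable with a large loading on a weakly correlated function still has a small cross-loading.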

3.6 Considerations when working with CCA

  • Small sample sizes may have an adverse effect.
  • Suggested minimum sample size = 10 * # of variables.
  • Selection of variables to be included:
    • Select them with domain knowledge or theoretical basis.
    • Inclusion of irrelevant or deletion of relevant variables may adversely affect the entire canonical solution.
  • All I.V.s must be interrelated and all D.V.s must be interrelated.
  • Composition of D.V. and I.V. variates is critical to producing practical results.
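
The sample-size suggestion above is simple arithmetic. The sketch below reads it as 10 observations per variable, counting the variables in both sets; that reading, the function name, and the parameter names are assumptions.

```python
def minimum_sample_size(n_x_vars, n_y_vars, per_variable=10):
    """Rule-of-thumb minimum n: per_variable observations for every variable in both sets."""
    if n_x_vars < 1 or n_y_vars < 1:
        raise ValueError("each set needs at least one variable")
    return per_variable * (n_x_vars + n_y_vars)

# Example: 3 variables in one set and 2 in the other.
print(minimum_sample_size(3, 2))  # 50
```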

3.7 Limitations of CCA

  • \(R_c\) (canonical R) reflects only the variance shared by the linear composites, not the variances extracted from the variables.
  • Canonical weights are subject to a great deal of instability, particularly when there is multicollinearity.
  • Interpretation is difficult because rotation is not possible.
  • Precise statistics have not been developed to interpret canonical analysis.
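
The instability of canonical weights under multicollinearity can be illustrated with a small bootstrap experiment. Bootstrapping is a technique chosen here for illustration and is not prescribed by the text. All function names and the synthetic data are assumptions. The sketch compares how much the first-function weight of one variable varies across resamples when its set does or does not contain a near-duplicate of it.

```python
import numpy as np

def first_weights(X, Y):
    """Standardized canonical weights of the X set for the first canonical function."""
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    Yc = Y - Y.mean(axis=0)
    Qx, Rx = np.linalg.qr(Xz)
    Qy, _ = np.linalg.qr(Yc)
    U, _, _ = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    a = np.linalg.solve(Rx, U[:, 0]) * np.sqrt(X.shape[0] - 1)
    # The sign of a variate is arbitrary: fix it so the variate
    # correlates positively with the first X variable.
    if (Xz @ a) @ Xz[:, 0] < 0:
        a = -a
    return a

def bootstrap_weight_sd(X, Y, n_boot=200, seed=0):
    """Standard deviation of each first-function weight across bootstrap resamples."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    draws = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        draws.append(first_weights(X[idx], Y[idx]))
    return np.std(draws, axis=0, ddof=1)

# Synthetic data (illustrative only): x1 carries the signal shared with Y.
rng = np.random.default_rng(42)
n = 300
z = rng.normal(size=n)
x1 = z + rng.normal(size=n)
Y = np.column_stack([z + rng.normal(size=n), rng.normal(size=n)])

X_clean = np.column_stack([x1, rng.normal(size=n), rng.normal(size=n)])
# Second column is a near-duplicate of x1, which creates multicollinearity.
X_collinear = np.column_stack([x1, x1 + 0.1 * rng.normal(size=n), rng.normal(size=n)])

sd_clean = bootstrap_weight_sd(X_clean, Y)
sd_collinear = bootstrap_weight_sd(X_collinear, Y)
print("bootstrap SD of weights, unrelated columns:", sd_clean)
print("bootstrap SD of weights, collinear columns:", sd_collinear)
```

With the near-duplicate column present, the weight of `x1` tends to swing widely from one resample to the next, because the two collinear variables can trade weight between them without changing the variate much.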

This chapter is under construction.