When two sets of data are strongly linked together, we say they have a high correlation.
Correlation is positive when the values increase together, and
correlation is negative when one value decreases as the other increases.
In common usage it most often refers to how close two variables are to having a linear relationship with each other. Here are sample values and the corresponding shapes for correlation.
Pearson’s correlation coefficient
This is the most commonly used correlation coefficient.
The population correlation coefficient ρX,Y between two random variables X and Y with expected values μX and μY and standard deviations σX and σY is defined as ρX,Y = cov(X, Y) / (σX σY) = E[(X − μX)(Y − μY)] / (σX σY).
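The definition can be translated into code almost term by term. The following is a minimal sketch in plain Python, not a library implementation: the function name `pearson_r` and the sample values are made up for illustration, and the data is treated as a whole population (dividing by n).

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation computed directly from the definition (hypothetical helper)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance: average product of the deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    # Population standard deviations of each variable.
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if std_x == 0 or std_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / (std_x * std_y)

# Illustrative data: y is exactly 2 * x, so the coefficient should be very close to 1.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
r = pearson_r(x, y)
print(r)
```

Because the same divisor appears in the covariance and in both standard deviations, dividing by n or by n - 1 gives the same coefficient.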
Pearson’s correlation coefficient using Python
When calculated using scipy (scipy.stats.pearsonr), it returns Pearson’s correlation coefficient and the two-tailed p-value.
When calculated using numpy (numpy.corrcoef), it returns the correlation coefficient matrix of the variables.
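A short sketch of both calls, assuming numpy and scipy are installed. The sample arrays are invented for illustration. For two variables, `numpy.corrcoef` returns a 2x2 matrix whose off-diagonal entries hold the coefficient.

```python
import numpy as np
from scipy import stats

# Made-up sample data with a roughly linear relationship.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# scipy: the correlation coefficient and the two-tailed p-value.
r, p_value = stats.pearsonr(x, y)
print("scipy r:", r, "p-value:", p_value)

# numpy: the full correlation coefficient matrix.
corr_matrix = np.corrcoef(x, y)
print(corr_matrix)
print("numpy r:", corr_matrix[0, 1])
```

The diagonal of the matrix is 1 because each variable is perfectly correlated with itself.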
Spearman’s rank correlation coefficient
Spearman’s rank correlation coefficient, or Spearman’s rho, is named after Charles Spearman and is often denoted by the Greek letter rho (ρ). It is defined as the Pearson correlation coefficient between the ranked variables.
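The "Pearson on ranks" definition can be checked numerically. This sketch assumes scipy is available and uses invented data. It ranks both variables with `scipy.stats.rankdata`, applies `pearsonr` to the ranks, and compares the result with `scipy.stats.spearmanr`.

```python
import numpy as np
from scipy import stats

# Made-up data: y grows with x, but not linearly (y = (x / 10) ** 2).
x = np.array([10, 20, 30, 40, 50])
y = np.array([1, 4, 9, 16, 25])

# Replace each value by its rank, then apply Pearson to the ranks.
rank_x = stats.rankdata(x)
rank_y = stats.rankdata(y)
rho_from_ranks, _ = stats.pearsonr(rank_x, rank_y)

# scipy's built-in Spearman coefficient for comparison.
rho, _ = stats.spearmanr(x, y)

print(rho_from_ranks, rho)
```

Both values agree, and since y always increases with x, the ranks line up perfectly even though the relationship is not linear.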
Kendall rank correlation coefficient
The Kendall rank correlation coefficient, commonly referred to as Kendall’s tau coefficient (after the Greek letter τ), is a statistic used to measure the ordinal association between two measured quantities. A tau test is a non-parametric hypothesis test for statistical dependence based on the tau coefficient.
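One way to make "ordinal association" concrete is to count pairs of observations that are ordered the same way in both variables (concordant) or in opposite ways (discordant). The sketch below implements the simplest variant, often called tau-a, which makes no correction for ties. The helper name `kendall_tau_a` and the data are illustrative assumptions. Library routines such as `scipy.stats.kendalltau` use a tie-corrected variant by default, so results can differ when ties are present.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total pairs (hypothetical helper)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    concordant = 0
    discordant = 0
    # Compare every pair of observations exactly once.
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1   # both variables move in the same direction
        elif product < 0:
            discordant += 1   # the variables move in opposite directions
        # product == 0 means a tie, which tau-a counts as neither
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs

# Made-up data: 7 concordant and 3 discordant pairs out of 10.
x = [1, 2, 3, 4, 5]
y = [3, 1, 2, 5, 4]
tau = kendall_tau_a(x, y)
print(tau)  # (7 - 3) / 10 = 0.4
```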
Python code for calculating Pearson’s, Spearman’s and Kendall’s coefficients.
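A minimal sketch of computing all three coefficients with `scipy.stats`, assuming scipy is installed. The sample lists are made up for illustration. Each function returns the coefficient together with a p-value.

```python
from scipy import stats

# Made-up data: mostly increasing, with neighbouring values swapped.
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]

pearson_r, pearson_p = stats.pearsonr(x, y)
spearman_rho, spearman_p = stats.spearmanr(x, y)
kendall_tau, kendall_p = stats.kendalltau(x, y)

print("Pearson's r:   ", pearson_r, "p-value:", pearson_p)
print("Spearman's rho:", spearman_rho, "p-value:", spearman_p)
print("Kendall's tau: ", kendall_tau, "p-value:", kendall_p)
```

In this example Pearson’s and Spearman’s coefficients coincide because the values are already ranks, while Kendall’s tau is lower because it counts pairwise orderings rather than distances between ranks.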
A correlation coefficient can take a value between -1 and 1:
- 1 is a perfect positive correlation
- 0 is no correlation (the values don’t seem linked at all)
- -1 is a perfect negative correlation
Important points to be noted
- Correlation is not causation
- Pearson’s coefficient works only if there is a linear relationship between the two variables.
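The second point can be illustrated with a small sketch, assuming numpy is installed and using invented data. Here y is completely determined by x (y = x²), yet Pearson’s coefficient is essentially zero because the relationship is not linear.

```python
import numpy as np

# Made-up data: x is symmetric around 0 and y depends on x exactly.
x = np.linspace(-3, 3, 61)
y = x ** 2

# Off-diagonal entry of the correlation matrix is Pearson's coefficient.
r = np.corrcoef(x, y)[0, 1]
print(r)  # very close to 0 despite the exact dependence
```

A coefficient near 0 therefore rules out only a linear link, so plotting the data before interpreting the number is usually worthwhile.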