Correlation, also called dependence or association, is any statistical relationship, whether causal or not, between two random variables or bivariate data. It covers a broad class of measures of the relationship between two mathematical variables or measured data values, with the Pearson correlation coefficient as a special case. In common usage, though, it most often refers to how close two variables are to having a linear relationship with each other.
The strength of the linear association between two variables is quantified by the correlation coefficient.
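The formula for the correlation coefficient r of n paired observations (x_i, y_i) is as below:

$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} $$

where $\bar{x}$ and $\bar{y}$ are the sample means of the two variables.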
- The correlation coefficient always takes a value between -1 and 1.
- A value of 1 or -1 indicates perfect correlation (all points would lie along a straight line in this case).
- A correlation value close to 0 indicates no linear association between the variables. The closer the value of r is to 0, the greater the variation around the line of best fit.
- A positive correlation indicates a positive association between the variables (increasing values in one variable correspond to increasing values in the other variable).
- A negative correlation indicates a negative association between the variables (increasing values in one variable correspond to decreasing values in the other variable).
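The properties listed above can be checked numerically. The following is a minimal sketch in plain Python; the helper name `pearson_r` and the sample data are assumptions made for illustration. It computes r for points on a rising line, points on a falling line, and points on a symmetric parabola, where the variables are clearly related but not linearly.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [-2, -1, 0, 1, 2]  # made-up data
r_up = pearson_r(xs, [2 * x + 1 for x in xs])     # points on a rising line
r_down = pearson_r(xs, [-3 * x + 4 for x in xs])  # points on a falling line
r_curve = pearson_r(xs, [x * x for x in xs])      # symmetric parabola

print(r_up, r_down, r_curve)
```

The parabola case is a reminder that a value of r near 0 rules out a linear association only; the two variables can still depend on each other.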
The square of the correlation coefficient, r², is a useful value in linear regression. This value represents the fraction of the variation in one variable that may be explained by the other variable. Thus, if a correlation of 0.8 is observed between two variables (say, height and weight), then a linear regression model attempting to explain either variable in terms of the other variable will account for 64% of the variability in the data [1].
Because the least-squares regression line always passes through the means of x and y, the regression line may be entirely described by the means, standard deviations, and correlation of the two variables under investigation.
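As an illustration of both statements, the sketch below builds the least-squares line from nothing but the means, standard deviations, and correlation, and then compares r² with the fraction of variance the line explains. The height and weight values are made up, and all variable names are assumptions for this example.

```python
import math

heights = [150, 160, 170, 180, 190]  # made-up data (cm)
weights = [52, 58, 69, 71, 85]       # made-up data (kg)

n = len(heights)
mean_x = sum(heights) / n
mean_y = sum(weights) / n
sxx = sum((x - mean_x) ** 2 for x in heights)
syy = sum((y - mean_y) ** 2 for y in weights)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(heights, weights))

r = sxy / math.sqrt(sxx * syy)
sd_x = math.sqrt(sxx / (n - 1))
sd_y = math.sqrt(syy / (n - 1))

# Regression line described by means, standard deviations, and r
slope = r * sd_y / sd_x
intercept = mean_y - slope * mean_x  # forces the line through (mean_x, mean_y)

# Fraction of the variation in weight explained by the fitted line
predicted = [intercept + slope * x for x in heights]
ss_res = sum((y - p) ** 2 for y, p in zip(weights, predicted))
explained = 1 - ss_res / syy

print(f"r^2 = {r ** 2:.4f}, explained fraction = {explained:.4f}")
```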
Pearson correlation coefficient
Pearson's correlation coefficient is a measure of the linear correlation between two variables X and Y. It has a value between +1 and −1 [2]. It is obtained by dividing the covariance of the two variables by the product of their standard deviations.
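Formula for Pearson correlation coefficient:

$$ \rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y} $$

where $\operatorname{cov}(X, Y)$ is the covariance of X and Y, and $\sigma_X$ and $\sigma_Y$ are their standard deviations.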
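The definition above (covariance divided by the product of the standard deviations) can be sketched directly in plain Python. The function names and the data are assumptions for illustration. The sample versions with an n - 1 divisor are used here; the divisor cancels in the ratio, so the population versions would give the same coefficient.

```python
import math

def covariance(xs, ys):
    """Sample covariance (n - 1 divisor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def std_dev(xs):
    """Sample standard deviation: square root of the variance."""
    return math.sqrt(covariance(xs, xs))

def pearson(xs, ys):
    return covariance(xs, ys) / (std_dev(xs) * std_dev(ys))

x = [1, 2, 3, 4, 5]  # made-up data
y = [2, 4, 5, 4, 5]
r = pearson(x, y)
print(f"Pearson r = {r:.4f}")
```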
Rank correlation coefficients
Spearman’s rank correlation coefficient
The Spearman correlation coefficient is defined as the Pearson correlation coefficient between the ranked variables [3].
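This definition translates almost literally into code: rank each variable, then apply the Pearson formula to the ranks. The sketch below is one way to do it in plain Python. The helper names are assumptions, and giving tied values the average of their positions is a common convention rather than something stated above.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman(xs, ys):
    return pearson(average_ranks(xs), average_ranks(ys))

x = [1, 2, 3, 4, 5]        # made-up data
y = [v ** 3 for v in x]    # monotone but non-linear in x
rho = spearman(x, y)
r = pearson(x, y)
print(f"Spearman rho = {rho:.4f}, Pearson r = {r:.4f}")
```

Because y increases whenever x does, the ranks agree exactly and the Spearman coefficient is 1, while the Pearson coefficient on the raw values stays below 1.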
Kendall rank correlation coefficient
The Kendall correlation between two variables will be high when observations have a similar rank (or an identical rank, for a correlation of 1) in the two variables, and low when observations have a dissimilar rank (or a fully reversed rank, for a correlation of -1). Here, rank means the relative position label of an observation within a variable: 1st, 2nd, 3rd, etc. [4]
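One common way to make this precise is to count pairs of observations: a pair is concordant when both variables order it the same way and discordant when they order it oppositely. The sketch below assumes the simplest variant, often called tau-a, which divides the difference of the two counts by the total number of pairs and makes no adjustment for ties. The function name and data are assumptions for illustration.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """Kendall tau-a: (concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        sign = (xs[j] - xs[i]) * (ys[j] - ys[i])
        if sign > 0:
            concordant += 1   # both variables order the pair the same way
        elif sign < 0:
            discordant += 1   # the variables order the pair oppositely
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)

x = [1, 2, 3, 4, 5]  # made-up data
tau_same = kendall_tau_a(x, [1, 2, 3, 4, 5])      # identical ranks
tau_reversed = kendall_tau_a(x, [5, 4, 3, 2, 1])  # fully reversed ranks
tau_mixed = kendall_tau_a(x, [2, 1, 4, 3, 5])     # two swapped neighbours
print(tau_same, tau_reversed, tau_mixed)
```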
Goodman and Kruskal’s gamma
Goodman and Kruskal’s gamma is a measure of rank correlation, i.e., the similarity of the orderings of the data when ranked by each of the quantities [5].
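Gamma is usually computed from the same concordant and discordant pair counts as Kendall's coefficient, but pairs tied on either variable are dropped from the denominator. A minimal sketch under that standard definition follows; the function name and the small tied data set are assumptions for illustration.

```python
from itertools import combinations

def goodman_kruskal_gamma(xs, ys):
    """Gamma = (C - D) / (C + D); pairs tied on x or y are ignored."""
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        sign = (xs[j] - xs[i]) * (ys[j] - ys[i])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # sign == 0 means a tie on at least one variable: skip the pair
    return (concordant - discordant) / (concordant + discordant)

# Made-up ordinal data with several ties
x = [1, 1, 2, 2, 3, 3]
y = [1, 2, 1, 2, 2, 3]
gamma = goodman_kruskal_gamma(x, y)
print(f"gamma = {gamma:.2f}")
```

Ignoring tied pairs is what makes gamma convenient for ordinal data with few distinct levels, where ties are common.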
You can find his report here
Now let us calculate these correlations using Python. You can find the code below.
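A minimal sketch of such a calculation, assuming SciPy is installed and using made-up sample data, might look like this. The functions `pearsonr`, `spearmanr`, and `kendalltau` from `scipy.stats` each return the coefficient together with a p-value. Goodman and Kruskal's gamma is left out here because this sketch does not assume a ready-made SciPy function for it.

```python
from scipy import stats

# Made-up paired observations with no ties
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 4, 3, 7, 8, 6, 9]

pearson_r, pearson_p = stats.pearsonr(x, y)
spearman_rho, spearman_p = stats.spearmanr(x, y)
kendall_tau, kendall_p = stats.kendalltau(x, y)

print(f"Pearson r    = {pearson_r:.3f} (p = {pearson_p:.4f})")
print(f"Spearman rho = {spearman_rho:.3f} (p = {spearman_p:.4f})")
print(f"Kendall tau  = {kendall_tau:.3f} (p = {kendall_p:.4f})")
```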
The output is as below: