# What is correlation and how to find it using Python

When two sets of data are strongly linked together, we say they have a high correlation.

Correlation is positive when the values increase together, and negative when one value decreases as the other increases.

In common usage, correlation most often refers to how close two variables are to having a linear relationship with each other.

[Figure: sample values and shapes for different correlations]

#### Pearson’s correlation coefficient

This is the most commonly used correlation coefficient.

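The population correlation coefficient ρX,Y between two random variables X and Y with expected values μX and μY and standard deviations σX and σY is defined as

$$\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{E[(X-\mu_X)(Y-\mu_Y)]}{\sigma_X \sigma_Y}$$

#### Pearson’s correlation coefficient using Python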

When calculated using SciPy (`scipy.stats.pearsonr`), it returns Pearson’s correlation coefficient and the two-tailed p-value.

When calculated using NumPy (`numpy.corrcoef`), it returns the correlation coefficient matrix of the variables.
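To make the definition concrete, here is a minimal sketch that computes Pearson’s coefficient by hand, as the covariance divided by the product of the standard deviations, and compares it with the off-diagonal entry of NumPy’s matrix. The helper name `pearson_from_definition` is made up for this illustration, and the data are the same sample values used in the script in this post.

```python
import numpy as np

# Sample data (same values as in the script in this post)
a = np.array([10, 20, 30, 40, 50, 60], dtype=float)
b = np.array([9, 9, 10, 8, 9, 10], dtype=float)


def pearson_from_definition(x, y):
    """Hypothetical helper: cov(x, y) / (std(x) * std(y)), population form."""
    dx = x - x.mean()            # deviations from the mean of x
    dy = y - y.mean()            # deviations from the mean of y
    covariance = (dx * dy).mean()
    # np.std defaults to the population standard deviation (ddof=0),
    # which matches the 1/n covariance used above
    return covariance / (x.std() * y.std())


r_manual = pearson_from_definition(a, b)
r_numpy = np.corrcoef(a, b)[0, 1]   # off-diagonal entry of the 2x2 matrix

print('r_manual : ', r_manual)
print('r_numpy  : ', r_numpy)
```

Both values should agree up to floating-point rounding. The diagonal of the matrix is always 1 because each variable is perfectly correlated with itself.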

#### Spearman’s rank correlation coefficient

Spearman’s rank correlation coefficient, or Spearman’s rho, is named after Charles Spearman and is often denoted by the Greek letter ρ (rho). It is defined as the Pearson correlation coefficient between the ranked variables.
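The "Pearson on ranks" definition can be checked directly. The sketch below ranks both samples with `scipy.stats.rankdata` (tied values get the average rank by default), takes Pearson’s coefficient of the ranks, and compares it with `scipy.stats.spearmanr`. The variable names are only illustrative.

```python
import numpy as np
from scipy import stats

# Sample data (same values as in the script in this post)
a = [10, 20, 30, 40, 50, 60]
b = [9, 9, 10, 8, 9, 10]

# Replace each value by its rank; tied values share the average rank
rank_a = stats.rankdata(a)
rank_b = stats.rankdata(b)

# Pearson's coefficient computed on the ranks
rho_from_ranks = np.corrcoef(rank_a, rank_b)[0, 1]

# SciPy's direct computation, for comparison
rho_scipy, p_value = stats.spearmanr(a, b)

print('rho_from_ranks : ', rho_from_ranks)
print('rho_scipy      : ', rho_scipy)
```

Because only the ranks matter, Spearman’s rho does not change under any strictly increasing transformation of the data, which makes it a reasonable choice for monotonic but non-linear relationships.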

#### Kendall rank correlation coefficient

The Kendall rank correlation coefficient, commonly referred to as Kendall’s tau coefficient (after the Greek letter τ), is a statistic used to measure the ordinal association between two measured quantities. A tau test is a non-parametric hypothesis test for statistical dependence based on the tau coefficient.
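One way to see what "ordinal association" means is to count pairs of observations: a pair is concordant when both variables move in the same direction and discordant when they move in opposite directions. The sketch below implements the tau-b variant, which corrects for ties and which, as far as I know, is what `scipy.stats.kendalltau` computes by default. The function name `kendall_tau_b` is made up for this illustration, and the O(n²) loop is meant for clarity, not speed.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

from scipy import stats

# Sample data (same values as in the script in this post)
a = [10, 20, 30, 40, 50, 60]
b = [9, 9, 10, 8, 9, 10]


def kendall_tau_b(x, y):
    """Hypothetical helper: Kendall's tau-b by counting all pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1      # both variables move in the same direction
        elif product < 0:
            discordant += 1      # the variables move in opposite directions
        # product == 0 means a tie in x or y; it counts for neither

    n = len(x)
    total_pairs = n * (n - 1) // 2
    # Number of pairs tied within x and within y
    ties_x = sum(c * (c - 1) // 2 for c in Counter(x).values())
    ties_y = sum(c * (c - 1) // 2 for c in Counter(y).values())
    return (concordant - discordant) / sqrt(
        (total_pairs - ties_x) * (total_pairs - ties_y)
    )


tau_manual = kendall_tau_b(a, b)
tau_scipy, p_value = stats.kendalltau(a, b)

print('tau_manual : ', tau_manual)
print('tau_scipy  : ', tau_scipy)
```

For this data there are 7 concordant pairs, 4 discordant pairs and 4 pairs tied in `b`, so the result is 3 / sqrt(15 × 11).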

#### Python code for calculating Pearson’s, Spearman’s and Kendall’s coefficients

```python
################################################################################################
# name: correlation_coefficient_01.py
# desc: correlation coefficient
# date: 2018-12-22
# Author: conquistadorjd
################################################################################################
import numpy as np
from scipy import stats

# Sample data
a = [10, 20, 30, 40, 50, 60]
b = [9, 9, 10, 8, 9, 10]

# Using scipy to calculate Pearson's coefficient
pearsonr_val = stats.pearsonr(a, b)
print('pearsonr_val : ', pearsonr_val)
# pearsonr_val : (0.2130214807490179, 0.6853010393640564)

# Using numpy
corrcoef_val = np.corrcoef(a, b)
print('corrcoef_val : ', corrcoef_val)
# corrcoef_val : [[1. 0.21302148]
# [0.21302148 1. ]]

# Using scipy to calculate Spearman's coefficient
spearmanr_val = stats.spearmanr(a, b)
print('spearmanr_val : ', spearmanr_val)
# spearmanr_val : SpearmanrResult(correlation=0.24688535993934707, pvalue=0.6371960853462737)

# Using scipy to calculate Kendall's tau
kendalltau_val = stats.kendalltau(a, b)
print('kendalltau_val : ', kendalltau_val)
# kendalltau_val : KendalltauResult(correlation=0.2335496832484569, pvalue=0.5374525191136282)

print('*** Program ended ***')
```

A correlation coefficient takes a value between -1 and 1:

• 1 is a perfect positive correlation
• 0 is no correlation (the values don’t seem linked at all)
• -1 is a perfect negative correlation
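The two extremes are easy to reproduce with made-up data: any exact straight-line relationship gives a coefficient of 1 or -1, depending on the sign of the slope. The arrays below are invented purely for illustration.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)

y_up = 2 * x + 1       # rises with x along a straight line
y_down = -3 * x + 10   # falls with x along a straight line

r_up = np.corrcoef(x, y_up)[0, 1]
r_down = np.corrcoef(x, y_down)[0, 1]

print('r_up   : ', r_up)     # approximately 1
print('r_down : ', r_down)   # approximately -1
```

Note that the steepness of the line does not matter; only how tightly the points follow it.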

Important points to note:

• Correlation is not causation
• Pearson’s coefficient captures only linear relationships between two variables; a strong non-linear relationship can still give a value close to 0.
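The limitation to linear relationships can be illustrated with a small, made-up example: below, `y` is completely determined by `x` (it is simply `x` squared), yet Pearson’s coefficient comes out as essentially zero because the relationship is symmetric rather than linear.

```python
import numpy as np

# Invented data: y depends entirely on x, but not linearly
x = np.array([-3, -2, -1, 0, 1, 2, 3], dtype=float)
y = x ** 2

r_nonlinear = np.corrcoef(x, y)[0, 1]
print('r_nonlinear : ', r_nonlinear)   # essentially zero
```

So a coefficient near 0 is evidence against a linear relationship, not against any relationship at all; plotting the data is a cheap way to catch cases like this.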
