Correlation


This tool calculates several measures of association between two datasets: the covariance (sample-based or population-based), the Pearson correlation coefficient (rho), the coefficient of determination (R²), and the Kendall and Spearman rank correlations.

Enter the values of each dataset separated by spaces.

Covariance

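The covariance of two statistical series measures how the two series vary together: it is positive when they tend to move in the same direction and negative when they tend to move in opposite directions. Independent series have zero covariance, but a zero covariance alone does not imply independence.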

Calculation of covariance from population data
X and Y are two population datasets,

`X = {x_1, x_2, ..., x_N}`
`Y = {y_1, y_2, ..., y_N}`
We denote by `bar x` the arithmetic mean of the X series, `bar x = 1/N.sum_{i=1}^{i=N}x_i`
The arithmetic mean of the Y series is `bar y`, `bar y = 1/N.sum_{i=1}^{i=N}y_i`
The covariance of the X and Y series is calculated as follows:

`\sigma _{xy} = \frac{1}{N}sum_{i=1}^{i=N} (x_i - bar x) (y_i - bar y)`
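
As an illustration, the population formula can be sketched in a few lines of Python. The function name `population_covariance` and the sample values are assumptions made for this example; they are not part of the tool.

```python
def population_covariance(xs, ys):
    """Population covariance: sum of products of deviations, divided by N."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("series must be non-empty and of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


# Input mimics the tool: values separated by spaces (hypothetical data).
X = [float(v) for v in "1 2 3 4 5".split()]
Y = [float(v) for v in "2 4 6 8 10".split()]
print(population_covariance(X, Y))  # 4.0
```

Here the deviations multiply to 8, 2, 0, 2 and 8, which sum to 20; dividing by N = 5 gives 4.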

Calculation of covariance from sample data
In this case, values are available only for a sample and not for the entire population, so the population covariance has to be estimated from the sample. X and Y are two sample series,

`X={x_1,x_2,...,x_n}`
`Y={y_1,y_2,...,y_n}`

The averages of the two samples are `bar x` and `bar y`,

`bar x = 1/n.sum_{i=1}^{i=n}x_i`

`bar y = 1/n.sum_{i=1}^{i=n}y_i`

The unbiased estimator of the population covariance divides by n - 1 instead of n:

`\sigma _{xy} = \frac{1}{n-1}sum_{i=1}^{i=n} (x_i - bar x) (y_i - bar y)`
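
A minimal Python sketch of the sample estimator follows; it differs from the population formula only in the divisor. The name `sample_covariance` and the data are assumptions chosen for illustration.

```python
def sample_covariance(xs, ys):
    """Unbiased sample covariance: divides by n - 1 rather than n."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length and at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical sample data.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 6.0, 8.0, 10.0]
print(sample_covariance(X, Y))  # 5.0
```

With the same five pairs, the sum of products is still 20, but dividing by n - 1 = 4 gives 5 instead of 4. At least two values are required, since the divisor is zero for a single pair. In Python 3.10 and later, `statistics.covariance` from the standard library also computes the sample covariance.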

Pearson Correlation Coefficient

What is commonly called 'correlation' in statistics is the linear (Pearson) correlation coefficient of two series. It is equal to the quotient of their covariance by the product of their standard deviations, and it always lies between -1 and 1.

X and Y are two datasets,

`X = {x_1, x_2, ..., x_N}`
`Y = {y_1, y_2, ..., y_N}`
We denote by `bar x` the arithmetic mean of the X series, `bar x = 1/N.sum_{i=1}^{i=N}x_i`

The arithmetic mean of the Y series is `bar y`, `bar y = 1/N.sum_{i=1}^{i=N}y_i`

The correlation coefficient of the X and Y series is calculated as follows:

`r = \frac{sum_{i=1}^{i=N} (x_i - bar x) (y_i - bar y)}{sqrt(sum_{i=1}^{i=N} (x_i - bar x)^2) . sqrt(sum_{i=1}^{i=N} (y_i - bar y)^2)}`
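
The formula above can be sketched in Python as follows. The function name `pearson_r`, the check for constant series and the data are assumptions made for this example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: sum of cross-deviations over the product of
    the square roots of the sums of squared deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length and at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        # A constant series has zero standard deviation: r is undefined.
        raise ValueError("correlation is undefined for a constant series")
    return sxy / (math.sqrt(sxx) * math.sqrt(syy))


# Hypothetical data.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]
print(round(pearson_r(X, Y), 4))  # 0.7746
```

The 1/N factors of the covariance and of the two standard deviations cancel, which is why the formula contains only sums. For these values the numerator is 6 and the denominator is sqrt(10) times sqrt(6), giving r of about 0.7746.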

Coefficient of determination R²

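The coefficient of determination indicates how well a linear regression predicts the data: it is the proportion of the variance of the observed values that is explained by the regression. For a least-squares fit with an intercept, it lies between 0 and 1, and a value close to 1 indicates a good fit.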

How to calculate the coefficient of determination?

X is a dataset `X = {x_1, x_2, ..., x_N}`
We denote by `bar x` the arithmetic mean of the X series, `bar x = 1/N.sum_{i=1}^{i=N}x_i`
The coefficient of determination of the X series is calculated as follows:

`R^2 = 1 - \frac{sum_{i=1}^{i=N} (x_i - hat x_i)^2}{sum_{i=1}^{i=N} (x_i - bar x)^2}`

where `{hat x_1, hat x_2,..., hat x_N}` are the values predicted for the X series by the linear regression.
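
As a rough sketch, R² can be computed from the observed and predicted values as shown below. The text does not say which explanatory variable the regression uses, so this example assumes a least-squares line fitted against a hypothetical series `T`; the helper names `fit_line` and `r_squared` are also assumptions.

```python
def fit_line(ts, xs):
    """Least-squares line x = slope * t + intercept (illustrative helper)."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_x = sum(xs) / n
    stt = sum((t - mean_t) ** 2 for t in ts)
    stx = sum((t - mean_t) * (x - mean_x) for t, x in zip(ts, xs))
    slope = stx / stt
    return slope, mean_x - slope * mean_t


def r_squared(observed, predicted):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((x - p) ** 2 for x, p in zip(observed, predicted))
    ss_tot = sum((x - mean_obs) ** 2 for x in observed)
    return 1 - ss_res / ss_tot


# Hypothetical data: X is regressed on an assumed explanatory series T.
T = [1.0, 2.0, 3.0, 4.0, 5.0]
X = [2.0, 4.0, 5.0, 4.0, 5.0]
slope, intercept = fit_line(T, X)
X_hat = [slope * t + intercept for t in T]
print(round(r_squared(X, X_hat), 4))  # 0.6
```

Here the fitted line is x = 0.6 t + 2.2, the residual sum of squares is 2.4 and the total sum of squares is 6, so R² = 1 - 2.4/6 = 0.6. For a simple linear regression like this one, R² equals the square of the Pearson correlation coefficient between the two series.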

See also

Standard deviation
Arithmetic mean
Linear Regression