# Correlation


This tool calculates several measures of association between two datasets: covariance (population-based or sample-based), Pearson correlation coefficient (rho), coefficient of determination (R²), Kendall correlation and Spearman correlation.

Enter the numbers of each dataset separated by one or more spaces.

## Covariance

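The covariance of two statistical series measures how the two series vary together: it is positive when they tend to move in the same direction, negative when they tend to move in opposite directions, and zero when there is no linear relationship between them.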

### Calculation of covariance from population data

X and Y are two population datasets of size $N$:

$$X = \{x_1, x_2, \ldots, x_N\}, \qquad Y = \{y_1, y_2, \ldots, y_N\}$$

We denote by $\bar{x}$ the arithmetic mean of X and by $\bar{y}$ the arithmetic mean of Y:

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i$$

The covariance of X and Y is calculated as follows:

$$\sigma_{xy} = \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})$$
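As an illustration, the population formula can be sketched in plain Python. The function name `population_covariance` and the sample numbers are hypothetical, chosen only for this example; this is not necessarily how the tool itself is implemented.

```python
def population_covariance(xs, ys):
    """Population covariance: sum of products of deviations, divided by N."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    mean_x = sum(xs) / n  # arithmetic mean of X
    mean_y = sum(ys) / n  # arithmetic mean of Y
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


# Hypothetical data, used only to exercise the function.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(population_covariance(x, y))  # sum of products is 6, divided by N = 5
```

With these numbers the means are 3 and 4, the products of deviations sum to 6, and the result is 6 / 5 = 1.2.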

### Calculation of covariance from sample data

In this case, values are available only for a sample and not for the entire population, so the covariance of the population has to be estimated from the sample.

X and Y are two sample series of size $n$:

$$X = \{x_1, x_2, \ldots, x_n\}, \qquad Y = \{y_1, y_2, \ldots, y_n\}$$

The means of the two samples are $\bar{x}$ and $\bar{y}$:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$

The unbiased estimator of the population covariance, written $s_{xy}$ here to distinguish it from the population value $\sigma_{xy}$, divides by $n - 1$ instead of $n$:

$$s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$
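The sample estimator differs from the population formula only in its divisor. A minimal Python sketch follows; the function name `sample_covariance` and the data are hypothetical and serve only as an example.

```python
def sample_covariance(xs, ys):
    """Unbiased sample covariance: sum of products of deviations, divided by n - 1."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("at least two observations are required")
    mean_x = sum(xs) / n  # sample mean of X
    mean_y = sum(ys) / n  # sample mean of Y
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical data, used only to exercise the function.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(sample_covariance(x, y))  # sum of products is 6, divided by n - 1 = 4
```

For these five observations the sample covariance is 6 / 4 = 1.5, larger than the value obtained when dividing by 5. The gap between the two divisors shrinks as the sample size grows.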

## Pearson Correlation Coefficient

In statistics, 'correlation' usually refers to the linear (Pearson) correlation coefficient of two series, which is equal to the quotient of their covariance by the product of their standard deviations. Its value always lies between -1 and 1.

X and Y are two datasets of size $N$:

$$X = \{x_1, x_2, \ldots, x_N\}, \qquad Y = \{y_1, y_2, \ldots, y_N\}$$

We denote by $\bar{x}$ the arithmetic mean of the X series and by $\bar{y}$ the arithmetic mean of the Y series:

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i$$

The correlation coefficient of X and Y is calculated as follows:

$$r = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2} \cdot \sqrt{\sum_{i=1}^{N} (y_i - \bar{y})^2}}$$
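The formula for r can be sketched in Python as below. The function name `pearson_r` and the data are hypothetical. Raising an error when one series is constant is a choice made for this sketch, since the denominator is then zero and r is undefined.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed directly from the deviation sums."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length, at least 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of products of deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator terms: sums of squared deviations of each series.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a series is constant")
    return sxy / (math.sqrt(sxx) * math.sqrt(syy))


# Hypothetical data, used only to exercise the function.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 4))  # 6 / sqrt(10 * 6), about 0.7746
```

Because the same divisor (N or n - 1) would appear in the covariance and in both standard deviations, it cancels out, which is why r does not depend on whether the data are treated as a population or a sample.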

## Coefficient of determination R²

The coefficient of determination indicates how well a linear regression predicts the observed values: the closer R² is to 1, the closer the predictions are to the data.

### How to calculate the coefficient of determination

X is a dataset of observed values, $X = \{x_1, x_2, \ldots, x_N\}$, and $\{\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_N\}$ are the values predicted for it by the linear regression.

We denote by $\bar{x}$ the arithmetic mean of the X series:

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$$

The coefficient of determination is calculated as follows:

$$R^2 = 1 - \frac{\sum_{i=1}^{N} (x_i - \hat{x}_i)^2}{\sum_{i=1}^{N} (x_i - \bar{x})^2}$$
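The page does not say which regression produces the predicted values, so the sketch below assumes a simple ordinary least-squares line fitted between two series. The helper names `fit_line` and `r_squared` and the data are hypothetical. In `r_squared`, `observed` plays the role of the x_i in the formula and `predicted` the role of the hat x_i.

```python
def fit_line(xs, ys):
    """Least-squares fit of ys = slope * xs + intercept (assumed regression model)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length, at least 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not be constant")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(observed, predicted):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when the observed values are constant")
    return 1 - ss_res / ss_tot


# Hypothetical data: regress `target` on `explanatory`, then score the fit.
explanatory = [1, 2, 3, 4, 5]
target = [2, 4, 5, 4, 5]
slope, intercept = fit_line(explanatory, target)
predicted = [slope * x + intercept for x in explanatory]
print(round(r_squared(target, predicted), 4))  # about 0.6
```

For a simple linear regression with an intercept, R² equals the square of the Pearson correlation coefficient of the two series. For this data both are 36 / 60 = 0.6.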