# Linear Regression


This tool calculates a statistical linear regression. Specifically, it computes the following elements:
- Linear regression line,
- Total sum of squares (TSS or SST),
- Explained sum of squares (ESS),
- Residual sum of squares (RSS),
- Mean square residual,
- Degrees of freedom,
- Residual standard deviation,
- Correlation coefficient,
- Coefficient of determination (R² or r²),
- Regression variance,
- 95% confidence interval,
- 95% prediction interval.

Enter the numbers of each series separated by spaces.

## Simple Linear Regression Line

The purpose of a simple linear regression is to establish a linear relationship between a single variable Y, called the dependent variable, and a single variable X, called the independent variable.

Graphical representation of a linear regression: the variable X = \{x_1, x_2, \dots, x_n\} is plotted on the x-axis and the variable Y = \{y_1, y_2, \dots, y_n\} on the y-axis.

Computing a linear regression amounts to estimating two parameters \beta_0 and \beta_1 that define the regression line:

y = \beta_1 x + \beta_0

The most commonly used method for estimating \beta_0 and \beta_1 is the least-squares method.

Estimators of \beta_0 and \beta_1:

We denote \bar{x} the arithmetic mean of the X series, \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i

We denote \bar{y} the arithmetic mean of the Y series, \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i

\hat{\beta}_1 = \frac{\text{cov}(X,Y)}{\text{var}(X)} = \frac{\sum_{i=1}^{n} (x_i - \bar{x}) (y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}

\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
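
As an illustration, here is a minimal Python sketch of these least-squares estimators (the function name `fit_line` and the sample values are assumptions for the example, not part of the tool):

```python
def fit_line(xs, ys):
    """Least-squares estimators for the line y = beta_1 * x + beta_0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # hat beta_1 = cov(X, Y) / var(X)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    beta_1 = num / den  # undefined if all x_i are equal (den == 0)
    # hat beta_0 = y_bar - hat beta_1 * x_bar
    beta_0 = y_bar - beta_1 * x_bar
    return beta_0, beta_1

# Example: points lying close to y = 2x + 1
beta_0, beta_1 = fit_line([1, 2, 3, 4], [3.1, 4.9, 7.2, 8.8])
print(beta_0, beta_1)  # roughly 1.15 and 1.94
```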

## Estimate y0 for x0

Once the regression line has been calculated as explained above, the variable Y can be estimated for any value of the variable X using the line equation and the estimators of \beta_1 and \beta_0:

\hat{y}_0 = \hat{\beta}_1 x_0 + \hat{\beta}_0
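
Continuing the sketch, estimating y_0 is a single evaluation of the fitted line (the `predict` helper below is hypothetical and reuses the `fit_line` sketch above):

```python
# Reusing the hypothetical fit_line from the previous sketch
beta_0, beta_1 = fit_line([1, 2, 3, 4], [3.1, 4.9, 7.2, 8.8])

def predict(beta_0, beta_1, x_0):
    """Estimate hat y_0 = hat beta_1 * x_0 + hat beta_0 for a new x_0."""
    return beta_1 * x_0 + beta_0

print(predict(beta_0, beta_1, 5.0))  # about 1.15 + 1.94 * 5 = 10.85
```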

## ESS, RSS, TSS and coefficient of determination (R²)

To assess the quality of a linear regression, i.e. its ability to predict the dependent variable (Y), several parameters are used, including:

- ESS or Explained Sum of Squares: this is the variation explained by the regression, where \hat{y}_i = \hat{\beta}_1 x_i + \hat{\beta}_0 denotes the value fitted by the regression line at x_i. It is calculated as follows:

\text{ESS} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2

- RSS or Residual Sum of Squares: this is the variation not explained by the regression. It is calculated as follows:

\text{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2

- TSS or Total Sum of Squares: this is the total variation. For a least-squares fit with an intercept, it decomposes exactly as TSS = ESS + RSS, and it is calculated as follows:

\text{TSS} = \text{ESS} + \text{RSS} = \sum_{i=1}^{n} (y_i - \bar{y})^2

- R² or coefficient of determination, defined by:

R^2 = \frac{\text{ESS}}{\text{TSS}} = 1 - \frac{\text{RSS}}{\text{TSS}}

It follows that 0 \le R^2 \le 1.

The closer R² is to 1, the better the predictive quality of the linear regression model: the cloud of points is tightly clustered around the regression line. Conversely, the closer R² is to 0, the worse the quality of the prediction.
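
As a sketch, these quantities can be computed in a few lines of Python, reusing the hypothetical `fit_line` from the first example:

```python
def r_squared(xs, ys):
    """Coefficient of determination via ESS, RSS and TSS."""
    beta_0, beta_1 = fit_line(xs, ys)            # estimators from the sketch above
    y_bar = sum(ys) / len(ys)
    y_hat = [beta_1 * x + beta_0 for x in xs]    # fitted values hat y_i
    ess = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained variation
    rss = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))   # residual variation
    tss = ess + rss                              # total variation (least squares with intercept)
    return ess / tss

print(r_squared([1, 2, 3, 4], [3.1, 4.9, 7.2, 8.8]))  # about 0.996: points very close to the line
```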