# Linear Regression

This tool performs a simple linear regression. Namely, it computes the following quantities:

- Linear regression line,

- Total sum of squares (TSS or SST),

- Explained sum of squares (ESS),

- Residual sum of squares (RSS),

- Mean square residual,

- Degrees of freedom,

- Residual standard deviation,

- Correlation coefficient,

- Coefficient of determination (R² or r²),

- Regression variance,

- 95% confidence interval,

- 95% prediction interval.

## Simple Linear Regression Line

The purpose of a simple linear regression is to establish a linear relationship between a single variable Y, called the dependent variable, and a single variable X, called the independent variable.

Graphical representation of a linear regression:

Variable `X = {x_1, x_2,...,x_n}` on the x-axis

Variable `Y = {y_1, y_2,...,y_n}` on the y-axis

Computing a linear regression is equivalent to estimating two parameters, `beta_0` (the intercept) and `beta_1` (the slope), that define the regression line:

`y = beta_1 . x + beta_0`

The most commonly used method for estimating `beta_0` and `beta_1` is the least-squares method.

Estimators for `beta_0` and `beta_1`:

We denote `bar x` the arithmetic mean of the X series, `bar x = 1/n . sum_{i=1}^{i=n} x_i`

We denote `bar y` the arithmetic mean of the Y series, `bar y = 1/n . sum_{i=1}^{i=n} y_i`

`hat beta_1 = \frac{\text{cov}(X,Y)}{\text{var}(X)} = \frac{sum_{i=1}^{i=n} (x_i - bar x) (y_i - bar y)}{sum_{i=1}^{i=n} (x_i - bar x)^2}`

`hat beta_0 = bar y - hat beta_1 . bar x`
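The two estimators above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the tool's actual implementation: the function name `fit_line` and the sample data are assumptions made for the example.

```python
def fit_line(xs, ys):
    """Return (beta0_hat, beta1_hat) for the least-squares line y = beta1*x + beta0.

    Hypothetical helper written for this example.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")

    # Arithmetic means of the X and Y series
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Numerator and denominator of the slope estimator
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    beta1_hat = sxy / sxx
    beta0_hat = y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat


# Made-up data lying exactly on the line y = 2x + 1
beta0_hat, beta1_hat = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(beta0_hat, beta1_hat)  # 1.0 2.0
```

Note that the `1/n` factors of the covariance and the variance cancel in the ratio, which is why the code only needs the two sums.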

## Estimate `y_0` for `x_0`

Once the regression line has been calculated as explained above, the variable Y can be estimated for any value `x_0` of the variable X using the line equation and the estimators of `beta_1` and `beta_0`:

`hat y_0 = hat beta_1 . x_0 + hat beta_0`
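As a minimal sketch of this step, the prediction is a single evaluation of the line equation. The function name `predict` and the coefficient values used below are assumptions chosen for illustration only.

```python
def predict(x0, beta0_hat, beta1_hat):
    """Estimate y for a given x0 from the fitted intercept and slope.

    Hypothetical helper written for this example.
    """
    return beta1_hat * x0 + beta0_hat


# Assumed estimates: intercept 1.0 and slope 2.0
print(predict(5, 1.0, 2.0))  # 11.0
```

The estimate is most trustworthy for values of `x_0` inside the range of the observed X series; outside that range it is an extrapolation.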

## ESS, RSS, TSS and coefficient of determination (R²)

To assess the quality of a linear regression, i.e. its ability to predict the dependent variable (Y), several parameters are used, including,

- ESS or Explained Sum of Squares: this is the variation explained by the regression. It is calculated as follows,

`ESS = sum_{i=1}^{i=n} (hat y_i - bar y)^2`

- RSS or Residual Sum of Squares: this is the variation not explained by the regression. It is calculated as follows,

`RSS = sum_{i=1}^{i=n} (y_i - hat y_i)^2`

- TSS or Total Sum of Squares: this is the total variation. It is calculated as follows,

`TSS = ESS + RSS = sum_{i=1}^{i=n} (y_i - bar y)^2`

- R² or coefficient of determination: this is the share of the total variation explained by the regression. It is defined by,

`R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS}`

Since ESS and RSS are both non-negative and add up to TSS, we see that `0 <= R^2 <= 1`.

The closer R² is to 1, the better the quality of the prediction by the linear regression model: the data points cluster tightly around the regression line. Conversely, the closer R² is to 0, the worse the quality of the prediction.
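The four quantities above can be computed together, as in the following sketch. It is an illustration under stated assumptions rather than the tool's actual code: the function name `regression_quality`, the dictionary keys and the sample data are all invented for the example.

```python
def regression_quality(xs, ys):
    """Return ESS, RSS, TSS and R^2 for the least-squares line through (xs, ys).

    Hypothetical helper written for this example.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")

    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Least-squares estimators of the slope and the intercept
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    beta1_hat = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    beta0_hat = y_bar - beta1_hat * x_bar

    # Fitted values hat y_i on the regression line
    fitted = [beta1_hat * x + beta0_hat for x in xs]

    ess = sum((f - y_bar) ** 2 for f in fitted)            # explained variation
    rss = sum((y - f) ** 2 for y, f in zip(ys, fitted))    # residual variation
    tss = sum((y - y_bar) ** 2 for y in ys)                # total variation
    if tss == 0:
        raise ValueError("all y values are identical; R^2 is undefined")

    return {"ESS": ess, "RSS": rss, "TSS": tss, "R2": 1 - rss / tss}


# Made-up sample data
q = regression_quality([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(q)
```

For this sample the fitted line is `y = 0.6 . x + 2.2`, giving ESS ≈ 3.6, RSS ≈ 2.4, TSS = 6 and R² ≈ 0.6, so ESS + RSS matches TSS up to floating-point rounding.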