## What is linear regression?

In a simple linear regression model, the relationship between the dependent variable (Y) and the independent variable (X) is represented by a straight line:

Y_{i} = f(X_{i}, β) + e_{i}

**Where:**

- “Y_{i}” is the dependent variable (output/response)
- “X_{i}” is the independent variable
- “f” is a linear function
- “β” is the unknown parameter (for a straight line, the intercept and the slope)
- “e_{i}” are the error terms
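To make the model concrete, the sketch below generates observations from it, assuming the linear function takes the form f(X_{i}, β) = β_{0} + β_{1}X_{i}. The function name `simulate_observations`, the parameter values, and the choice of Gaussian noise are illustrative assumptions, not part of the model definition itself.

```python
import random

def simulate_observations(xs, beta0, beta1, sigma, seed=0):
    """Return Y_i = beta0 + beta1 * X_i + e_i with e_i ~ Normal(0, sigma)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]

xs = [1, 2, 3, 4, 5]
# Hypothetical parameters: intercept 2.0, slope 0.5, noise standard deviation 1.0
ys = simulate_observations(xs, beta0=2.0, beta1=0.5, sigma=1.0)

# The error terms are whatever remains after subtracting the straight line
errors = [y - (2.0 + 0.5 * x) for x, y in zip(xs, ys)]
print(errors)
```

Setting `sigma=0.0` removes the error terms, so every point falls exactly on the line.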

## Important Considerations:

While Linear Regression is a powerful and widely used statistical technique, it's essential to consider its assumptions and limitations:

**Linearity:**

The relationship between the independent and dependent variables must be linear. If the relationship is nonlinear, other methods may be more appropriate.

**Independence:**

The observations should be independent of each other. Time series and spatial data often violate this because neighboring observations tend to be correlated, so other techniques may be more suitable in those cases.

**Homoscedasticity:**

The variance of the error terms should be constant across all levels of the independent variable.

**Normality:**

The error terms should be normally distributed.

When using Linear Regression, always validate the assumptions and evaluate the model's performance using appropriate metrics, such as the coefficient of determination (R-squared), residual analysis, and cross-validation.
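As a rough illustration of two of these checks, the sketch below computes the residuals and R-squared for a fitted line. The helper names `residuals` and `r_squared` are hypothetical, and the data and coefficients are borrowed from Example 1 later in this section.

```python
def residuals(xs, ys, slope, intercept):
    """Observed minus predicted value for each data point."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum(r ** 2 for r in residuals(xs, ys, slope, intercept))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Data and fitted coefficients from Example 1
xs = [5, 22, 19, 8, 33, 10]
ys = [9, 71, 31, 12, 44, 28]
slope, intercept = 1.5707, 7.107

res = residuals(xs, ys, slope, intercept)
print([round(r, 2) for r in res])
print(round(r_squared(xs, ys, slope, intercept), 3))
```

On this data R-squared comes out at roughly 0.52, and the residual at X = 22 is noticeably larger than the others, which is the kind of pattern residual analysis is meant to surface.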

## Examples:

In this section, we’ll walk through calculating the linear regression equation for a pair of data sets.

**Example 1:**

Calculate the regression equation if:

X = 5, 22, 19, 8, 33, 10

Y = 9, 71, 31, 12, 44, 28

**Solution:**

**Step 1:** Calculate the mean of each data set

**Mean of X** = μ_{x} = ΣX / n

μ_{x} = (5 + 22 + 19 + 8 + 33 + 10) / 6

μ_{x} = 97 / 6

μ_{x} ≈ 16.17

**Mean of Y** = μ_{y} = ΣY / n

μ_{y} = (9 + 71 + 31 + 12 + 44 + 28) / 6

μ_{y} = 195 / 6

μ_{y} = 32.5
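The two means can be checked with a few lines of Python. This is only a sketch of the arithmetic in Step 1, and the variable names are illustrative.

```python
xs = [5, 22, 19, 8, 33, 10]
ys = [9, 71, 31, 12, 44, 28]

mean_x = sum(xs) / len(xs)  # 97 / 6
mean_y = sum(ys) / len(ys)  # 195 / 6

print(round(mean_x, 2), round(mean_y, 2))  # 16.17 32.5
```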

**Step 2:** Make a table:

| X_{i} | Y_{i} | X_{i} * Y_{i} | X_{i}^{2} | Y_{i}^{2} |
|---|---|---|---|---|
| 5 | 9 | 45 | 25 | 81 |
| 22 | 71 | 1562 | 484 | 5041 |
| 19 | 31 | 589 | 361 | 961 |
| 8 | 12 | 96 | 64 | 144 |
| 33 | 44 | 1452 | 1089 | 1936 |
| 10 | 28 | 280 | 100 | 784 |
| Σ X_{i} = 97 | Σ Y_{i} = 195 | Σ X_{i} * Y_{i} = 4024 | Σ X_{i}^{2} = 2123 | Σ Y_{i}^{2} = 8947 |
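The table can be rebuilt programmatically, which is a convenient way to double-check the column sums. A minimal sketch, with illustrative variable names:

```python
xs = [5, 22, 19, 8, 33, 10]
ys = [9, 71, 31, 12, 44, 28]

# One row per data point: X, Y, X*Y, X^2, Y^2
rows = [(x, y, x * y, x ** 2, y ** 2) for x, y in zip(xs, ys)]

# Column sums correspond to the last row of the table
sums = [sum(col) for col in zip(*rows)]
print(sums)  # [97, 195, 4024, 2123, 8947]
```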

**Step 3:** Calculate slope “**m**”

m = {(n * ∑X_{i} * Y_{i}) − (∑X_{i} * ∑Y_{i})} / {(n * ∑X_{i}^{2}) − (∑X_{i})^{2}}

m = {(6 * 4024) − (97 * 195)} / {(6 * 2123) − 9409}

m = (24144 − 18915) / (12738 − 9409)

m = 5229 / 3329

**m = 1.5707**
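The slope formula translates directly into code. The function name `slope_from_sums` below is a hypothetical helper, and the inputs are the column sums from the table in Step 2.

```python
def slope_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Least-squares slope: (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2)."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = n * sum_x2 - sum_x ** 2
    return numerator / denominator

# Sums taken from the table in Step 2
m = slope_from_sums(n=6, sum_x=97, sum_y=195, sum_xy=4024, sum_x2=2123)
print(round(m, 4))  # 1.5707
```

The denominator is zero when every X value is identical, so a real implementation would need to guard against that case.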

**Step 4:** Calculate Y-intercept

b = {(∑Y_{i}) − (m * ∑X_{i})} / n

b = {(195) − (1.5707 * 97)} / 6

b = (195 − 152.3579) / 6

b = 42.6421 / 6

**b = 7.107**
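The intercept can be verified the same way. The sketch below also computes it from the means found in Step 1, using the algebraically equivalent form b = μ_{y} − m * μ_{x}. The variable names are illustrative, and the slope is the rounded value from Step 3.

```python
xs = [5, 22, 19, 8, 33, 10]
ys = [9, 71, 31, 12, 44, 28]
n = len(xs)

m = 1.5707  # slope from Step 3, rounded to four decimals

# Intercept from the sums, as in Step 4
b = (sum(ys) - m * sum(xs)) / n

# Equivalent form using the means from Step 1
b_from_means = sum(ys) / n - m * (sum(xs) / n)

print(round(b, 3), round(b_from_means, 3))  # 7.107 7.107
```

Both forms agree because dividing the Step 4 formula through by n turns each sum into a mean.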

**Step 5:** Make the linear equation

Y = (slope)x + (y-intercept)