What is linear regression?
In a simple Linear Regression model, the relationship between the dependent variable (Y) and the independent variable (X) is represented by a straight line:
Yi = f(Xi, β) + ei
Where:
- “Yi” is the dependent variable (output/response)
- “Xi” is the independent variable (input/predictor)
- “f” is a linear function
- “β” is the vector of unknown parameters (for a straight line, the intercept and the slope)
- “ei” are the error terms
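For the one-predictor case used in the worked example later in this article, the linear function can be written out explicitly. This is a sketch: the names β0 (intercept) and β1 (slope) are assumed notation, and they play the same roles as the “b” and “m” computed in Example 1.

```latex
Y_i = \beta_0 + \beta_1 X_i + e_i, \qquad \beta = (\beta_0, \beta_1)
```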
Important Considerations:
While Linear Regression is a powerful and widely used statistical technique, it's essential to consider its assumptions and limitations:
Linearity:
The relationship between the independent and dependent variables must be linear. If the relationship is nonlinear, other methods may be more appropriate.
Independence:
The observations should be independent of each other. In cases of time series or spatial data, other techniques may be more suitable.
Homoscedasticity:
The variance of the error terms should be constant across all levels of the independent variable.
Normality:
The error terms should be normally distributed.
When using Linear Regression, always validate these assumptions and evaluate the model's performance using appropriate tools, such as the coefficient of determination (R-squared), residual analysis, and cross-validation.
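As an illustration of two of these checks, the sketch below computes the residuals and R-squared for a given line. The function name `r_squared` and the toy data are hypothetical, chosen so the results are easy to verify by hand; it is a minimal example, not a full diagnostic routine.

```python
def r_squared(xs, ys, slope, intercept):
    """Return (R-squared, residuals) for the line y = slope * x + intercept."""
    mean_y = sum(ys) / len(ys)
    # Residual = observed value minus the value predicted by the line
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    ss_res = sum(r ** 2 for r in residuals)        # variation the line leaves unexplained
    ss_tot = sum((y - mean_y) ** 2 for y in ys)    # total variation around the mean of y
    return 1 - ss_res / ss_tot, residuals


# Hypothetical toy data lying exactly on y = 2x + 1
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

r2_good, res_good = r_squared(xs, ys, slope=2.0, intercept=1.0)
r2_flat, res_flat = r_squared(xs, ys, slope=0.0, intercept=6.0)  # horizontal line at the mean of y

print(r2_good, res_good)  # 1.0 [0.0, 0.0, 0.0, 0.0]
print(r2_flat, res_flat)  # 0.0 [-3.0, -1.0, 1.0, 3.0]
```

A line that passes through every point gives R-squared of 1, while a horizontal line at the mean of Y explains none of the variation and gives 0. Plotting the residuals against X is a common way to look for violations of the linearity and homoscedasticity assumptions.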
Examples:
In this section, we’ll walk through calculating the linear regression equation for two paired data sets.
Example 1:
Calculate the regression equation if:
X = 5, 22, 19, 8, 33, 10
Y = 9, 71, 31, 12, 44, 28
Solution:
Step 1: Calculate the mean of the data sets
Mean of X = μx = ΣX / n
μx = (5 + 22 + 19 + 8 + 33 + 10) / 6
μx = 97 / 6
μx = 16.17
Mean of Y = μy = ΣY / n
μy = (9 + 71 + 31 + 12 + 44 + 28) / 6
μy = 195 / 6
μy = 32.5
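Step 1 can be checked with a few lines of Python. This is a minimal sketch; the variable names `X`, `Y`, `mean_x`, and `mean_y` are illustrative choices, and the data are the values given in Example 1.

```python
X = [5, 22, 19, 8, 33, 10]
Y = [9, 71, 31, 12, 44, 28]
n = len(X)

# Mean = sum of the values divided by the number of values
mean_x = sum(X) / n
mean_y = sum(Y) / n

print(sum(X), sum(Y))            # 97 195
print(round(mean_x, 2), mean_y)  # 16.17 32.5
```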
Step 2: Make a table:
Xi | Yi | Xi * Yi | Xi² | Yi² |
5 | 9 | 45 | 25 | 81 |
22 | 71 | 1562 | 484 | 5041 |
19 | 31 | 589 | 361 | 961 |
8 | 12 | 96 | 64 | 144 |
33 | 44 | 1452 | 1089 | 1936 |
10 | 28 | 280 | 100 | 784 |
Σ Xi = 97 | Σ Yi = 195 | Σ Xi * Yi = 4024 | Σ Xi² = 2123 | Σ Yi² = 8947 |
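The table's columns and totals can be reproduced programmatically, which is a useful guard against arithmetic slips. The sketch below uses illustrative variable names (`sum_xy`, `sum_x2`, `sum_y2`) that are not part of the original text.

```python
X = [5, 22, 19, 8, 33, 10]
Y = [9, 71, 31, 12, 44, 28]

# One row per observation: x, y, x*y, x squared, y squared
rows = [(x, y, x * y, x ** 2, y ** 2) for x, y in zip(X, Y)]
for row in rows:
    print(" | ".join(str(value) for value in row))

# Column totals, matching the last row of the table
sum_x, sum_y, sum_xy, sum_x2, sum_y2 = (sum(column) for column in zip(*rows))
print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # 97 195 4024 2123 8947
```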
Step 3: Calculate slope “m”
m = {(n * ΣXi * Yi) − (ΣXi * ΣYi)} / {(n * ΣXi²) − (ΣXi)²}
m = {(6 * 4024) − (97 * 195)} / {(6 * 2123) − (97)²}
m = (24144 − 18915) / (12738 − 9409)
m = 5229 / 3329
m = 1.5707
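The slope calculation can be sketched in Python using the column totals from Step 2. The variable names are illustrative; the totals are the ones listed in the table.

```python
n = 6
sum_x, sum_y = 97, 195
sum_xy, sum_x2 = 4024, 2123

# Numerator and denominator of the least-squares slope formula
numerator = n * sum_xy - sum_x * sum_y   # 24144 - 18915
denominator = n * sum_x2 - sum_x ** 2    # 12738 - 9409

m = numerator / denominator
print(numerator, denominator)  # 5229 3329
print(round(m, 4))             # 1.5707
```

Note that the second term of the numerator is the product ΣXi * ΣYi, not a difference.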
Step 4: Calculate Y-intercept
b = {(∑Yi) − (m * ∑Xi)} / n
b = {(195) − (1.5707 * 97)} / 6
b = 7.107
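The intercept is somewhat sensitive to how the slope was rounded. The sketch below (with illustrative variable names) computes b once with the slope rounded to four decimals, as in the hand calculation, and once with the unrounded slope.

```python
n = 6
sum_x, sum_y = 97, 195

# Slope rounded to four decimals, as used in the hand calculation
m_rounded = 1.5707
b_rounded = (sum_y - m_rounded * sum_x) / n

# Unrounded slope from Step 3: 5229 / 3329
m_exact = 5229 / 3329
b_exact = (sum_y - m_exact * sum_x) / n

print(round(b_rounded, 3))  # 7.107
print(round(b_exact, 3))    # 7.106
```

The two results differ in the third decimal place, so it is generally safer to carry full precision through the calculation and round only the final answer.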
Step 5: Make the linear equation
Y = (slope)x + (y-intercept)
Y = 1.5707x + 7.107
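Steps 1 through 5 can be combined into one small reusable function. This is a minimal sketch under the formulas above, not a replacement for a statistics library; the name `fit_line` and the input x = 20 used for the prediction are hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length, at least 2")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = (n * sum_xy - sum_x * sum_y) / denominator
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept


X = [5, 22, 19, 8, 33, 10]
Y = [9, 71, 31, 12, 44, 28]

m, b = fit_line(X, Y)
print(f"Y = {m:.4f}x + {b:.4f}")  # Y = 1.5707x + 7.1063

# Predict Y for a hypothetical new input x = 20
prediction = m * 20 + b
print(round(prediction, 2))       # 38.52
```

Because the function keeps the slope at full precision, its intercept (about 7.106) differs slightly from a hand calculation that rounds the slope first.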