Today, we were introduced to the idea of linear regression using multiple variables. This technique is essential when we have more than one predictor, especially if the predictors are highly correlated, like the Obesity and Inactivity data.
The multi-variable linear regression model was built using the scikit-learn package (imported as sklearn), which provides many built-in functions for linear regression.
Expression: Y = B0 + B1*X_obesity + B2*X_inactivity + e
Result for 2 variables:
B0: 1.6535991518559392
Coefficients: [('B1', 0.23246991917672563), ('B2', 0.11106296576800405)]
R-squared = 0.34073967115731385
As expected, the R-squared value improves compared to the single-variable model: adding a predictor can never lower R-squared on the data used for fitting.
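The two-predictor fit described above can be sketched roughly as follows. This is only an illustration: the data is synthetic (the real obesity and inactivity dataset is not reproduced here), and the variable names and the numbers used to generate the data are assumptions, so the printed values will not match the results quoted above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# ASSUMPTION: synthetic stand-in data, not the dataset used in this post.
rng = np.random.default_rng(0)
n = 300
x_obesity = rng.normal(18.0, 1.5, n)
x_inactivity = 0.8 * x_obesity + rng.normal(0.0, 1.0, n)  # correlated with obesity
y = 1.5 + 0.25 * x_obesity + 0.10 * x_inactivity + rng.normal(0.0, 0.7, n)

# Two-predictor model: Y = B0 + B1*X_obesity + B2*X_inactivity + e
X_two = np.column_stack([x_obesity, x_inactivity])
model_two = LinearRegression().fit(X_two, y)

b0 = model_two.intercept_
coefficients = list(zip(["B1", "B2"], model_two.coef_))
r2_two = model_two.score(X_two, y)  # score() returns R-squared for regressors

# Single-predictor model (obesity only) for comparison
X_one = x_obesity.reshape(-1, 1)
r2_one = LinearRegression().fit(X_one, y).score(X_one, y)

print("B0:", b0)
print("Coefficients:", coefficients)
print("R-squared (two predictors):", r2_two)
print("R-squared (one predictor):", r2_one)
```

Both R-squared values here are computed on the same data the models were fitted to, which is why the two-predictor value cannot come out lower.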
Now, we introduce one more predictor variable: the product of inactivity and obesity.
Y = B0 + B1*X_inactivity + B2*X_obesity + B3*X_inactivity*X_obesity + e
Result:
B0: -10.06467453049291
Coefficients: [('INACTIVE', 1.1533595376219894), ('OBESE', 0.743039972401428), ('Obese x inactive', -0.049635909945020235)]
R-squared = 0.36458725864661756
As expected, the performance has increased again, albeit by a very small margin.
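Adding the product term amounts to appending one extra column to the predictor matrix before fitting. A minimal sketch, again on synthetic data with assumed names and values (not the dataset from this post), might look like this:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# ASSUMPTION: synthetic stand-in data, not the dataset used in this post.
rng = np.random.default_rng(0)
n = 300
x_obesity = rng.normal(18.0, 1.5, n)
x_inactivity = 0.8 * x_obesity + rng.normal(0.0, 1.0, n)
y = 1.5 + 0.25 * x_obesity + 0.10 * x_inactivity + rng.normal(0.0, 0.7, n)

# Base model: Y = B0 + B1*X_inactivity + B2*X_obesity + e
X_base = np.column_stack([x_inactivity, x_obesity])
r2_base = LinearRegression().fit(X_base, y).score(X_base, y)

# Interaction model: add the column X_inactivity * X_obesity
X_inter = np.column_stack([x_inactivity, x_obesity, x_inactivity * x_obesity])
model_inter = LinearRegression().fit(X_inter, y)
r2_inter = model_inter.score(X_inter, y)

labels = ["INACTIVE", "OBESE", "Obese x inactive"]
print("B0:", model_inter.intercept_)
print("Coefficients:", list(zip(labels, model_inter.coef_)))
print("R-squared (base):", r2_base)
print("R-squared (with interaction):", r2_inter)
```

Because the product column is built by hand, the fitting call itself is unchanged; only the input matrix grows by one column.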
Now, let's try adding two more predictors: X_inactivity^2 and X_obesity^2.
Y = B0 + B1*X_inactivity + B2*X_obesity + B3*X_inactivity*X_obesity + B4*X_inactivity^2 + B5*X_obesity^2 + e
Result:
B0: -11.590588857309138
Coefficients: [('INACTIVE', 0.47789170236400547), ('OBESE', 1.489656837325879), ('Obese x inactive', 0.01970764143007776), ('Inactivity sqaure', -0.01973523748870601), ('Obesity square', -0.04942722743255474)]
Score = 0.38471232946078504
The score has improved yet again.
It seems that adding higher powers of the predictors is an effective way of improving the score of the model. Strictly speaking, the model is now quadratic in the predictors, although it is still linear in the coefficients, which is why the same linear regression routine can fit it. But repeating this process indefinitely to get a nearly perfect score can lead to overfitting, rendering the model ineffective at predicting new data.
To validate the model properly, we need to test its accuracy on new data that was not used for training. But as we have limited data available, other validation techniques must be explored.
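One commonly used option when data is limited is k-fold cross-validation, where every observation is held out exactly once. The sketch below is only an illustration of that idea, not a method taken from this post: the data is synthetic, and the choices of 5 folds and polynomial degrees 1 to 3 are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# ASSUMPTION: synthetic stand-in data, not the dataset used in this post.
rng = np.random.default_rng(0)
n = 300
x_obesity = rng.normal(18.0, 1.5, n)
x_inactivity = 0.8 * x_obesity + rng.normal(0.0, 1.0, n)
y = 1.5 + 0.25 * x_obesity + 0.10 * x_inactivity + rng.normal(0.0, 0.7, n)
X = np.column_stack([x_inactivity, x_obesity])

# 5 shuffled folds: each fold is scored by a model fitted on the other four.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for degree in (1, 2, 3):
    pipeline = make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        LinearRegression(),
    )
    in_sample_r2 = pipeline.fit(X, y).score(X, y)  # fitted and scored on all data
    cv_scores = cross_val_score(pipeline, X, y, cv=cv, scoring="r2")
    results[degree] = (in_sample_r2, cv_scores.mean())
    print(f"degree {degree}: in-sample R-squared = {in_sample_r2:.4f}, "
          f"cross-validated R-squared = {cv_scores.mean():.4f}")
```

The in-sample R-squared can only go up as the degree increases, whereas the cross-validated value is measured on held-out folds and so gives a more honest indication of whether the extra terms help on unseen data.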