Week 2 – 20th Sept

Today, I explored validation techniques for smaller data sets, namely K-fold cross-validation.

To start, the linear regression model was retrained using 70% of the data as the training set and 30% as the test set. Here are the results obtained:

As we can see, the model shows similar performance, with an R-squared of approximately 0.38.
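For reference, here is a minimal sketch of this 70/30 split evaluation in scikit-learn. Since my dataset isn't shown in this post, the built-in diabetes dataset stands in for it:

```python
from sklearn.datasets import load_diabetes  # stand-in dataset, not the post's own data
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)

# Hold out 30% of the rows as the test set; the remaining 70% is used for training
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = LinearRegression().fit(X_train, y_train)

# For regressors, .score() returns the R-squared value on the given data
print("Test R^2:", model.score(X_test, y_test))
```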

Next, the same model was tested again using K-fold cross-validation with 5 folds. Here are the results for the linear and polynomial regression models:

The linear and polynomial models both show similar mean R-squared values of around 0.30, which is lower than the score obtained without cross-validation.
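A rough sketch of the 5-fold evaluation using scikit-learn's cross_val_score, again with the diabetes dataset as a stand-in and a degree-2 polynomial assumed for the polynomial model (the actual degree isn't stated above):

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X, y = load_diabetes(return_X_y=True)

models = {
    "linear": LinearRegression(),
    "polynomial (degree 2)": make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
}

# cross_val_score with cv=5 fits each model on 4 folds and scores (R^2) on the
# remaining fold, rotating through all 5 folds
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name}: mean R^2 = {scores.mean():.2f} (std {scores.std():.2f})")
```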

The polynomial regression score will tend to keep increasing with higher polynomial degrees if we validate on the same data the model was fitted to (or keep reusing one test set to pick the degree), because the extra flexibility lets the model overfit; cross-validation gives a more honest estimate of generalisation.
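To illustrate this, the small sketch below sweeps the polynomial degree on a synthetic 1-D dataset (standing in for the real data) and compares the R-squared measured on the data used for fitting against the 5-fold cross-validated R-squared. The fit score keeps climbing with the degree, whereas the cross-validated score typically peaks and then falls off:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic 1-D data (quadratic trend plus noise) stands in for the real dataset
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(80, 1))
y = 0.5 * X[:, 0] ** 2 - X[:, 0] + rng.normal(scale=1.0, size=80)

for degree in (1, 2, 5, 10):
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    fit_r2 = model.fit(X, y).score(X, y)               # R^2 on the data used for fitting
    cv_r2 = cross_val_score(model, X, y, cv=5).mean()  # mean R^2 on held-out folds
    print(f"degree {degree:2d}: fit R^2 = {fit_r2:.2f}, CV R^2 = {cv_r2:.2f}")
```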
