ML Question

I am new to ML, and I have an intuition question. To start with a linear regression model, we run the regression on the training data and get the model's parameters. Then we use these parameters on the testing data and calculate our cost. Using gradient descent (GD), we could iterate until we find the parameters that minimize the cost function. My question is this: OLS by definition gives us the best-fitting line, right? But when we minimize the cost function, we find that the parameters obtained from the regression weren't the best. Why is that? Is it because of estimation error, and is ML's job here to eliminate that estimation error? Did I even get the process right? Maybe I am missing something, IDK. Please LMK, thanks.
Got it figured out: it seems GD is an alternative to OLS. I'm just throwing it out there for those who are curious as well.
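For anyone who wants to see this concretely, here is a minimal sketch (not from the original posts) that fits the same line two ways: the closed-form least-squares solution and plain gradient descent on the mean squared error. The synthetic data, the learning rate `lr`, and the iteration count are all assumptions chosen for illustration. On the same training data, with the same squared-error cost, both should land on essentially the same parameters.

```python
import numpy as np

# Synthetic data (assumed for illustration): y = 2 + 3x + noise
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-1, 1, size=n)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=n)
X = np.column_stack([np.ones(n), x])  # column of ones for the intercept

# Closed-form OLS via a least-squares solver
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Gradient descent on the same cost: mean squared error on the training data
beta_gd = np.zeros(2)
lr = 0.1  # learning rate (assumed; small enough to converge on this data)
for _ in range(5000):
    grad = (2.0 / n) * X.T @ (X @ beta_gd - y)  # gradient of the MSE
    beta_gd -= lr * grad

print("OLS:", beta_ols)
print("GD: ", beta_gd)
```

So GD is just another way to reach the least-squares minimum; any difference between the two comes from stopping GD early or evaluating the cost on different data, not from GD finding a "better" line.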
Under the Gauss-Markov assumptions, OLS is BLUE (Best Linear Unbiased Estimator), but that doesn't necessarily make it the model with the best prediction performance. There are several reasons for that. First of all, fitting the training data best doesn't imply the best out-of-sample prediction performance, because of overfitting. I mean, with a flexible enough model you can always fit the training data perfectly, and in that case you would overfit (your model will depend heavily on your training data and will be poor at predicting new points). Secondly, we sometimes deliberately choose biased estimators in order to get better predictive power (for instance Lasso and Ridge). This is because OLS estimates can have high variance, especially when there are many independent variables or they are strongly correlated: even though the estimator is unbiased, the parameters you obtain differ vastly from one training set to another. In that case, other non-OLS methods can give you better predictive outcomes (even if they are biased). Finally, OLS is only the best LINEAR unbiased estimator. There are many non-linear models which may give you better results.
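To illustrate the variance point, here is a rough sketch (my own, not from the posts above) that refits OLS and Ridge on many freshly drawn training sets and compares how much the coefficients move around. Everything in it is an assumption made for the demo: the two nearly collinear predictors, the sample size, the penalty `lam=1.0`, the no-intercept model, and the hypothetical helper names `make_data` and `fit`. Ridge is computed from its closed form, (X'X + lam*I)^(-1) X'y, and `lam=0` gives plain OLS.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n=30):
    """Draw a small training set with two highly correlated predictors."""
    z = rng.normal(size=n)
    X = np.column_stack([z + 0.05 * rng.normal(size=n),
                         z + 0.05 * rng.normal(size=n)])
    y = X[:, 0] + X[:, 1] + rng.normal(size=n)  # true coefficients are (1, 1)
    return X, y

def fit(X, y, lam):
    """Closed-form ridge solution; lam=0 reduces to OLS."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

ols_coefs, ridge_coefs = [], []
for _ in range(500):  # 500 independent training sets
    X, y = make_data()
    ols_coefs.append(fit(X, y, lam=0.0))
    ridge_coefs.append(fit(X, y, lam=1.0))

# Variance of each coefficient across training sets
ols_var = np.var(ols_coefs, axis=0)
ridge_var = np.var(ridge_coefs, axis=0)
print("OLS coefficient variance:  ", ols_var)
print("Ridge coefficient variance:", ridge_var)
```

In this setup the Ridge coefficients should vary far less from one training set to the next than the OLS ones, which is the bias-for-variance trade described above. Whether that trade actually improves prediction depends on the data and on the choice of penalty.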