I'm trying to minimize the RMSE of a linear model on the Boston housing data set. Here is a basic baseline that trains on the first 400 rows and tests on the remaining 106:
library(Metrics)
df <- MASS::Boston
train <- df[1:400, ]
test <- df[401:506, ]
Boston_lm <- lm(medv ~., data = train)
Boston_lm_RMSE <- Metrics::rmse(actual = test$medv,
                                predicted = predict(object = Boston_lm, newdata = test))
# 6.155792
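For reference, `Metrics::rmse()` returns the root mean squared error, sqrt(mean((actual - predicted)^2)). A minimal base-R sketch of the same calculation (the function name `rmse_manual` is hypothetical, introduced only for illustration) lets you check the number without the Metrics package:

```r
# Hypothetical helper: root mean squared error, the same quantity Metrics::rmse() computes
rmse_manual <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Same split and model as above: train on rows 1:400, test on rows 401:506
df <- MASS::Boston
train <- df[1:400, ]
test <- df[401:506, ]
Boston_lm <- lm(medv ~ ., data = train)
rmse_manual(actual = test$medv, predicted = predict(Boston_lm, newdata = test))
```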
However, if I change the split so that the model trains on the first 300 rows and tests on the remaining 206, the test RMSE is very different:
df <- MASS::Boston
train <- df[1:300, ]
test <- df[301:506, ]
Boston_lm <- lm(medv ~., data = train)
Boston_lm_RMSE <- Metrics::rmse(actual = test$medv,
                                predicted = predict(object = Boston_lm, newdata = test))
# 19.13284
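The two snippets differ only in the row index where the data is split. A hedged sketch that wraps that step in a function (the name `split_rmse` and its `n_train` argument are hypothetical) makes the dependence explicit. Because `MASS::Boston` is used in its stored row order, changing `n_train` changes not only how many rows are tested but also which rows end up in the test set.

```r
# Hypothetical helper: fit lm(medv ~ .) on the first n_train rows of the data
# and return the RMSE on the remaining rows (same head/tail split as in the question).
split_rmse <- function(n_train, data = MASS::Boston) {
  stopifnot(n_train > 0, n_train < nrow(data))
  train <- data[seq_len(n_train), ]
  test  <- data[(n_train + 1):nrow(data), ]
  fit  <- lm(medv ~ ., data = train)
  pred <- predict(fit, newdata = test)
  sqrt(mean((test$medv - pred)^2))  # RMSE, same formula as Metrics::rmse()
}

split_rmse(400)  # first snippet: rows 1:400 train, rows 401:506 test
split_rmse(300)  # second snippet: rows 1:300 train, rows 301:506 test
```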
Is there a way to find the train/test split that gives the lowest RMSE on the test set without looping over a range of possible split points?