I have a BERT model that I want to fine-tune. I start by splitting my training data into a training set and a validation set. During fine-tuning, I monitor the validation loss to check that the model is learning properly, and if it does not improve for a certain number of epochs (the patience parameter), I apply early stopping.
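Concretely, my first run looks roughly like the sketch below. It assumes the HuggingFace Transformers `Trainer` API (a recent version that accepts `eval_strategy`); `train_ds` and `val_ds` are placeholders for my tokenized splits, and the patience of 3 and epoch cap of 20 are just example values:

```python
from transformers import (
    AutoModelForSequenceClassification,
    EarlyStoppingCallback,
    Trainer,
    TrainingArguments,
)

# Fresh pretrained model to fine-tune.
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

args = TrainingArguments(
    output_dir="bert-finetune",
    num_train_epochs=20,                # generous upper bound; early stopping ends the run sooner
    eval_strategy="epoch",              # compute validation loss after every epoch
    save_strategy="epoch",
    load_best_model_at_end=True,        # restore the best checkpoint when training stops
    metric_for_best_model="eval_loss",
    greater_is_better=False,            # lower validation loss is better
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,             # placeholder: my tokenized training split
    eval_dataset=val_ds,                # placeholder: my tokenized validation split
    # Stop if eval_loss fails to improve for 3 consecutive evaluations.
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()
```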
Once fine-tuning is done, I want to re-train the model on 100% of the training data, including the validation set used earlier, to make full use of the available data.
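For that second run, the setup would look something like the following sketch (again assuming the HuggingFace `Trainer`; `concatenate_datasets` recombines the two splits, and `NUM_EPOCHS` is a placeholder for exactly the value I don't know how to choose):

```python
from datasets import concatenate_datasets
from transformers import (
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

# Recombine the earlier splits so the model sees 100% of the training data.
full_ds = concatenate_datasets([train_ds, val_ds])

NUM_EPOCHS = 3  # placeholder; this is exactly the value in question

# Start again from the pretrained checkpoint and train on everything.
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

final_args = TrainingArguments(
    output_dir="bert-final",
    num_train_epochs=NUM_EPOCHS,
    eval_strategy="no",    # no held-out set remains, so there is nothing to evaluate on
    save_strategy="epoch",
)

final_trainer = Trainer(
    model=model,
    args=final_args,
    train_dataset=full_ds,
)
final_trainer.train()
```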
My question is: during this final round of fine-tuning on the full dataset, how many epochs should I train for, given that I no longer have a validation set to monitor, and therefore no early-stopping signal to guard against overfitting?