I am trying to replicate the loss value that tf.keras reports after training. My understanding is that history.history['loss'] returned by model.fit should contain the average loss for each epoch, but it does not match the value shown in the progress bar.
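Loosely, this is the relationship I expected to hold (per_batch_losses is just shorthand for the losses logged during the epoch, not a variable from my actual code):
import numpy as np
# What I expected: the recorded epoch loss is (approximately) the mean of the per-batch losses.
assert np.isclose(history.history['loss'][0], np.mean(per_batch_losses), rtol=1e-3)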
Here's the code snippet:
history = model.fit(X, y, batch_size=32, shuffle=True, epochs=1, verbose=1)
# Progress bar output: 4050/4050 ━━━━━━━━━━━━━━━━━━━━ 66s 13ms/step - loss: 290.9271
print(history.history)  # Output: {'loss': [56.9575309753418]}
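For completeness, the rest of the setup is nothing unusual. The data and model below are simplified placeholders, not my actual pipeline (which I can't share); only the shapes are chosen to match the 4050 steps of batch size 32 seen above:
import numpy as np
import tensorflow as tf

# Placeholder data and model so the snippet above is runnable end to end.
# My real features, targets, and architecture are different.
X = np.random.rand(4050 * 32, 20).astype(np.float32)
y = np.random.rand(4050 * 32, 1).astype(np.float32)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")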
I created a custom callback to collect losses from each batch:
import numpy as np
from tensorflow.keras.callbacks import LambdaCallback

# Collect the loss logged at the end of every batch
batch_losses = []
collect_losses = LambdaCallback(on_batch_end=lambda batch, logs: batch_losses.append(logs['loss']))
model.fit(X, y, batch_size=32, shuffle=True, epochs=1, verbose=1, callbacks=[collect_losses])
batch_losses_array = np.array(batch_losses, dtype=np.float32)
Here are the results:
print(float(np.mean(batch_losses_array))) # Output: 290.98486328125
print(batch_losses_array[-1]) # Output: 56.95753
So history.history['loss'] appears to contain the loss of the last batch (56.9575), not the average over the epoch. The simple mean of the per-batch losses (290.9849) is very close to the value shown in the progress bar (290.9271), but there is still a small discrepancy.
All of my data is np.float32 and the TensorFlow version is 2.18.0. The dataset size is an exact multiple of the batch size (32), so there is no smaller final batch and the sample-weighted average of the batch losses should equal the simple average.
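For reference, the weighted-average check I had in mind is just the following; with a constant batch size of 32 for all 4050 batches it reduces to the plain mean:
# Weight each batch's loss by its sample count; with a constant batch size
# every weight is equal, so this is identical to np.mean(batch_losses_array).
batch_sizes = np.full(len(batch_losses_array), 32, dtype=np.float32)
weighted_mean = float(np.average(batch_losses_array, weights=batch_sizes))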
My questions are:
- Is history.history['loss'] supposed to return the last batch's loss or the average loss per epoch?
- How can I directly obtain the loss value reported in the progress bar (290.9271) after training?
Thanks in advance for any insights!