Jun 30, 2021
Personally I think that performance on the test set is all that matters.
Generalisation can be tested by training multiple models on different splits of the data, for example with K-fold cross-validation.
I would guess that a smaller difference between train and test performance is more likely to mean better generalisation, but it's not guaranteed.
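To make the K-fold idea concrete, here is a rough sketch in plain Python. It reports train error, test error and the gap between them for each fold. Everything in it is an assumption made for illustration: the synthetic data, the simple least-squares line used as a stand-in model, and the helper names `k_fold_indices`, `fit_line`, `mse` and `cross_validate`. Swap in your own model and metric.

```python
import random
from statistics import mean

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (hypothetical stand-in model)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def mse(model, xs, ys):
    """Mean squared error of the fitted line on the given points."""
    a, b = model
    return mean((a * x + b - y) ** 2 for x, y in zip(xs, ys))

def cross_validate(xs, ys, k=5, seed=0):
    """Train one model per fold; return a (train_mse, test_mse) pair for each."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        x_tr, y_tr = [xs[i] for i in train], [ys[i] for i in train]
        x_te, y_te = [xs[i] for i in held_out], [ys[i] for i in held_out]
        model = fit_line(x_tr, y_tr)
        scores.append((mse(model, x_tr, y_tr), mse(model, x_te, y_te)))
    return scores

# Synthetic data, made up for illustration: y = 2x + 1 plus Gaussian noise.
rng = random.Random(42)
xs = [i / 10 for i in range(100)]
ys = [2 * x + 1 + rng.gauss(0, 0.5) for x in xs]

scores = cross_validate(xs, ys, k=5)
for train_mse, test_mse in scores:
    print(f"train={train_mse:.3f} test={test_mse:.3f} gap={test_mse - train_mse:+.3f}")
print("mean test MSE:", round(mean(t for _, t in scores), 3))
```

The spread of the test scores across folds tells you how stable the result is, and the per-fold gap lets you check whether a small train/test difference actually goes along with a good test score on your data.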