Re: high-dimensional spaces. They work because most of the real-life data you want to predict falls in the same patches of the space that the training data occupied. They do not extrapolate.
To get more generalisation, model parsimony may be the answer. For example, E = mc^2 involves just three quantities and a couple of operations. The laws of physics come out of this search for parsimony. The breadth of situations where they apply is staggering.
They are products of the human brain, which is certainly complex but has somehow distilled the data down into these laws.
Possibly this parsimony requirement is due to the limited-bandwidth communication between human brains. If we had just one big brain, maybe it would not have arisen?
Also, physics may be more amenable to parsimonious representations. Biological systems are the result of billions of years of spaghetti programming via evolution, and there may not be many parsimonious representations available there.
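
To make the interpolation versus extrapolation point concrete, here is a toy sketch in Python. Everything in it is made up for illustration: the training range (1 to 2 kg), the test masses, and the function names. A nearest-neighbour lookup stands in for a flexible model that only knows the patch of space it was trained on, and a one-parameter fit E = k*m stands in for a parsimonious law.

```python
# Toy illustration only: the data, ranges, and names are assumptions
# chosen for this example, not taken from any real model or dataset.

C = 3.0e8  # speed of light in m/s (rounded)

def true_energy(mass):
    """The 'law' that generates the data: E = m * c^2."""
    return mass * C ** 2

# Training data covers only masses between 1 and 2 kg.
train_m = [1.0 + 0.1 * i for i in range(11)]
train_e = [true_energy(m) for m in train_m]

def nearest_neighbour(mass):
    """Flexible model: return the energy of the closest training mass."""
    idx = min(range(len(train_m)), key=lambda i: abs(train_m[i] - mass))
    return train_e[idx]

# Parsimonious model: E = k * m, with the single parameter k fitted by
# least squares through the origin.
k = sum(m * e for m, e in zip(train_m, train_e)) / sum(m * m for m in train_m)

def one_parameter(mass):
    """Parsimonious model: one fitted constant times the mass."""
    return k * mass

def relative_error(predicted, actual):
    return abs(predicted - actual) / actual

inside, outside = 1.52, 10.0  # one mass inside the training patch, one far outside
for name, model in [("nearest neighbour", nearest_neighbour),
                    ("E = k*m", one_parameter)]:
    err_in = relative_error(model(inside), true_energy(inside))
    err_out = relative_error(model(outside), true_energy(outside))
    print(f"{name}: error inside = {err_in:.3g}, error outside = {err_out:.3g}")
```

Inside the training patch both models do fine. At 10 kg the lookup can only return what it saw at 2 kg, so it is off by about 80%, while the one-parameter fit stays accurate. Of course this only works because the data really does follow a one-parameter law, which is the caveat about biology above.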