Thank you Gunther for bringing this up. It's been bothering me quite a bit over time as well.
Forgive the naive question, but does the query planner's cost estimator only track a single cost estimate that gets accumulated and compared across plan variants? Or does it keep a range or a probability distribution? I suspect the former, but I bet either of the latter would fix this rapidly.
The cases that frustrate me are the ones where a nested loop (NL) is chosen over something like a hash join (HJ). If the planner's estimate turns out slightly off on the low side, NL still beats HJ, but only by a relatively small amount. If it is slightly off on the high side, NL gets punished tremendously, due to the big-O penalty it pays relative to HJ. Giving the planner some notion of the distribution might help it better assess the consequences of being slightly off in its estimates. If it notices that being off on a plan involving an NL pushes the tail of the distribution into hours instead of seconds, it could avoid that plan even if the plan looks slightly faster in the mean.
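To make that concrete, here is a rough Python sketch of the idea. Everything in it is an assumption made up for illustration: the cost formulas, the constants, the row counts, and the function names (`nl_cost`, `hj_cost`, `cost_range`, and so on) are a toy model, not how the real planner computes costs. It just compares picking a plan by a single point estimate against picking it by the worst case over a few multiplicative estimation errors.

```python
# Toy cost model. All constants and formulas below are illustrative
# assumptions, not the planner's actual costing.
NL_COST_PER_PAIR = 0.01   # cost of comparing one (outer, inner) row pair
HJ_BUILD_PER_ROW = 1.5    # cost of hashing one inner row
HJ_PROBE_PER_ROW = 1.0    # cost of probing the hash table with one outer row


def nl_cost(outer_rows, inner_rows):
    # Nested loop: cost grows with the product of the two inputs.
    return outer_rows * inner_rows * NL_COST_PER_PAIR


def hj_cost(outer_rows, inner_rows):
    # Hash join: cost grows with the sum of the two inputs.
    return inner_rows * HJ_BUILD_PER_ROW + outer_rows * HJ_PROBE_PER_ROW


PLANS = {"NL": nl_cost, "HJ": hj_cost}


def cost_range(plan, est_outer, inner_rows, error_factors):
    # (min, max) cost if the true outer row count is est_outer * factor.
    costs = [PLANS[plan](est_outer * f, inner_rows) for f in error_factors]
    return min(costs), max(costs)


def choose_by_point_estimate(est_outer, inner_rows):
    # Single-number comparison: cheapest plan at the estimated row count.
    return min(PLANS, key=lambda p: PLANS[p](est_outer, inner_rows))


def choose_by_worst_case(est_outer, inner_rows, error_factors):
    # Risk-averse comparison: cheapest plan in its worst considered case.
    return min(
        PLANS,
        key=lambda p: cost_range(p, est_outer, inner_rows, error_factors)[1],
    )


if __name__ == "__main__":
    est_outer, inner_rows = 100, 10_000
    factors = (0.1, 1, 10)  # estimate off by up to 10x in either direction
    for name in PLANS:
        print(name, cost_range(name, est_outer, inner_rows, factors))
    print("point estimate picks:", choose_by_point_estimate(est_outer, inner_rows))
    print("worst case picks:", choose_by_worst_case(est_outer, inner_rows, factors))
```

With these made-up numbers the point estimate favors NL by a modest margin, while the worst case over the error factors favors HJ by a wide one, which is the asymmetry I mean. A real version would presumably weigh a proper distribution (mean plus some tail percentile) rather than just the max over three hand-picked factors.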
<fantasy> If I ever find time, maybe I'll try to play around with this idea and see how it performs... </fantasy>
-dave-