Also, it might help if you could provide one or more queries, with numbers, where this optimization shows an improvement.
I can't provide the real queries where we encountered the problem because they are internal. However, I showed a simplified version of them in my first post.
On our queries, the change made quite a difference: execution time dropped from 31.4 seconds to 7.2 seconds, roughly a 4.4x speedup. EXPLAIN ANALYZE also shows that memory use dropped significantly and that the sort no longer had to spill to disk.