The second patch tries to enforce work_mem more strictly. That would be impossible if we kept all the BufFile structs in memory, so instead it slices the batches into chunks that fit into work_mem, and uses a single "overflow" file for the slices not currently in memory. These overflow files can't be counted into work_mem, but we should need only very few of them: for example, with work_mem=4MB a slice is 128 batches, so we need 128x fewer overflow files (compared to one file per batch).
I want to see if I understand the implications of the per-slice-overflow patch for hashjoin execution: when attempting to double the number of batches, if the memory that the BufFile structs would occupy after doubling would exceed work_mem, the batches are split into slices that fit into memory. This means that, to ensure correct results, each slice's batches have to be loaded into memory separately and probed in turn. Is this right?
I'm not entirely sure which of those approaches is the right one. The first one is clearly just "damage control" for cases where the hash side turned out to be much larger than we expected. With good estimates we probably would not have picked a hash join for those cases (that is, we should have realized we can't stay within work_mem and prohibited the hash join).
The second patch, however, makes hash join viable for some of those cases, and it seems to work pretty well (there are some numbers in the message posted to the pgsql-performance thread). So I kinda like this second one.
So, my initial reaction after taking a look at the patches is that I prefer the first approach--increasing the resize threshold. The second patch, the per-slice-overflow patch, adds a major new mechanic to hashjoin in order to address what is, based on my understanding, an edge case.