Admittedly I don't have a patch, just a bunch of handwaving. One reason I haven't attempted to write it is that, although I know how to do the non-parallel version for outer joins (a BufFile full of match bits kept in sync with the outer tuples), I haven't figured out how to do it for parallel-aware hash join, because there each loop over the outer batch could see different tuples in each participant. You could use the match bit in the HashJoinTuple header instead, but then you'd have to write all the tuples out again, which is more I/O than I want to do. I'll probably start another thread about that.
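For the non-parallel case, the match-bit approach might look roughly like this. It is a minimal in-memory sketch, not PostgreSQL code: the names (`MatchBits`, `chunked_left_join`) are hypothetical, a byte array stands in for the BufFile, and a linear scan of each inner chunk stands in for the hash table probe. What makes it work is that a single process rescans the outer batch in the same order for every inner chunk, so bit i always refers to outer tuple i. That is exactly the assumption that breaks in a parallel-aware hash join, where each participant may see different outer tuples on each loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a BufFile of match bits: one bit per outer tuple,
 * indexed by the tuple's ordinal position in the outer batch file. */
typedef struct MatchBits
{
    size_t   ntuples;
    uint8_t *bits;
} MatchBits;

static bool match_bits_init(MatchBits *mb, size_t ntuples)
{
    mb->ntuples = ntuples;
    /* +1 so calloc never gets a zero size */
    mb->bits = calloc(ntuples / 8 + 1, 1);
    return mb->bits != NULL;
}

static void match_bits_set(MatchBits *mb, size_t i)
{
    assert(i < mb->ntuples);
    mb->bits[i / 8] |= (uint8_t) (1u << (i % 8));
}

static bool match_bits_get(const MatchBits *mb, size_t i)
{
    assert(i < mb->ntuples);
    return (mb->bits[i / 8] >> (i % 8)) & 1u;
}

/* Left outer join of outer_keys against inner_keys, with the inner side
 * processed chunk_size tuples at a time (as if it did not fit in memory).
 * Returns the number of matching pairs. The ordinal positions of outer
 * tuples that never matched (to be emitted NULL-extended) are written to
 * unmatched, which must have room for nouter entries. */
static size_t chunked_left_join(const int *outer_keys, size_t nouter,
                                const int *inner_keys, size_t ninner,
                                size_t chunk_size,
                                size_t *unmatched, size_t *nunmatched)
{
    MatchBits mb;
    size_t nmatches = 0;

    assert(chunk_size > 0);
    if (!match_bits_init(&mb, nouter))
        abort();

    for (size_t start = 0; start < ninner; start += chunk_size)
    {
        size_t end = start + chunk_size < ninner ? start + chunk_size : ninner;

        /* Rescan the whole outer batch once per inner chunk. Tuple i always
         * comes back at position i, so bit i stays in sync with it. */
        for (size_t i = 0; i < nouter; i++)
            for (size_t j = start; j < end; j++)   /* stands in for a hash probe */
                if (outer_keys[i] == inner_keys[j])
                {
                    nmatches++;
                    match_bits_set(&mb, i);
                }
    }

    /* Final pass: a bit still clear means no match in any chunk. */
    *nunmatched = 0;
    for (size_t i = 0; i < nouter; i++)
        if (!match_bits_get(&mb, i))
            unmatched[(*nunmatched)++] = i;

    free(mb.bits);
    return nmatches;
}
```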
Going back to the idea of using the match bit in the HashJoinTuple header and writing out all of the outer side for every chunk of the inner side, I was wondering whether we could do something like mmap'ing the outer side file, so that the workers in a parallel hash join could update a match bit in each tuple in place and avoid writing the whole outer side out each time.
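A rough sketch of the in-place idea follows, under loud assumptions: outer records are fixed-size (real MinimalTuples are variable-length, so per-tuple offsets would have to be tracked), and the spill file is a plain stdio temp file rather than a BufFile, which has no mmap interface. `OuterRec`, `spill_outer`, `mark_matches_in_place` and `read_matched` are hypothetical names. The outer batch is written once; each inner chunk then maps the file with POSIX `mmap` and sets match flags in place, so nothing is rewritten. It is single-process only: with parallel workers the flag updates would need to be atomic, and the per-participant visibility problem described above still applies.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical fixed-size outer tuple as stored in the spill file. */
typedef struct OuterRec
{
    int32_t key;
    uint8_t matched;   /* stands in for the match bit in the tuple header */
    uint8_t pad[3];
} OuterRec;

/* Write the outer batch once. Returns NULL on failure. */
static FILE *spill_outer(const int32_t *keys, size_t n)
{
    FILE *f = tmpfile();   /* opened read/write, so it can be mapped shared */
    if (f == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
    {
        OuterRec r = { .key = keys[i], .matched = 0 };
        if (fwrite(&r, sizeof r, 1, f) != 1)
        {
            fclose(f);
            return NULL;
        }
    }
    if (fflush(f) != 0)
    {
        fclose(f);
        return NULL;
    }
    return f;
}

/* Probe one inner chunk against the mapped outer file, setting the match
 * flag of each matching outer record in place. Returns false on failure. */
static bool mark_matches_in_place(FILE *f, size_t n,
                                  const int32_t *chunk, size_t nchunk)
{
    assert(f != NULL);
    if (n == 0)
        return true;   /* a zero-length mapping is invalid */

    size_t len = n * sizeof(OuterRec);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fileno(f), 0);
    if (p == MAP_FAILED)
        return false;

    OuterRec *recs = p;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < nchunk; j++)
            if (recs[i].key == chunk[j])
                recs[i].matched = 1;   /* single writer; parallel workers would need atomics */

    bool ok = msync(p, len, MS_SYNC) == 0;
    munmap(p, len);
    return ok;
}

/* Read record i's match flag back through ordinary file I/O; -1 on error. */
static int read_matched(FILE *f, size_t i)
{
    OuterRec r;
    off_t off = (off_t) (i * sizeof r);
    if (pread(fileno(f), &r, sizeof r, off) != (ssize_t) sizeof r)
        return -1;
    return r.matched;
}
```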