Re: Avoiding hash join batch explosions with extreme skew and weird stats - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Avoiding hash join batch explosions with extreme skew and weird stats
Date
Msg-id 20190607140540.gx736kravrzna57o@development
In response to Re: Avoiding hash join batch explosions with extreme skew and weird stats  (Melanie Plageman <melanieplageman@gmail.com>)
List pgsql-hackers
On Thu, Jun 06, 2019 at 04:37:19PM -0700, Melanie Plageman wrote:
>On Thu, May 16, 2019 at 3:22 PM Thomas Munro <thomas.munro@gmail.com> wrote:
>
>> Admittedly I don't have a patch, just a bunch of handwaving.  One
>> reason I haven't attempted to write it is because although I know how
>> to do the non-parallel version using a BufFile full of match bits in
>> sync with the tuples for outer joins, I haven't figured out how to do
>> it for parallel-aware hash join, because then each loop over the outer
>> batch could see different tuples in each participant.  You could use
>> the match bit in HashJoinTuple header, but then you'd have to write
>> all the tuples out again, which is more IO than I want to do.  I'll
>> probably start another thread about that.
>>
>>
>Going back to the idea of using the match bit in the HashJoinTuple header
>and writing out all of the outer side for every chunk of the inner
>side, I was wondering if there was something we could do that was kind
>of like mmap'ing the outer side file to give the workers in parallel
>hashjoin the ability to update a match bit in the tuple in place and
>avoid writing the whole outer side out each time.
>

I think this was one of the things we discussed in Ottawa: we could pass
the index of the tuple (within the batch) along with the tuple, so that
each worker knows which bit to set.
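
A minimal sketch of that idea in C, assuming a bitmap with one bit per
outer tuple that every participant can see. The names MatchBitmap,
match_bitmap_create, match_bitmap_set, match_bitmap_test and
match_bitmap_free are made up for illustration and are not PostgreSQL
APIs. The sketch uses malloc and C11 atomics to stay self-contained; a
real version would presumably keep the bitmap in shared memory and use
the tree's own atomics instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bitmap: one match bit per outer tuple in the batch. */
typedef struct MatchBitmap
{
    size_t        ntuples;  /* number of outer tuples in the batch */
    atomic_uchar *bits;     /* (ntuples + 7) / 8 bytes, all zero at start */
} MatchBitmap;

static MatchBitmap *
match_bitmap_create(size_t ntuples)
{
    size_t       nbytes = (ntuples + 7) / 8;
    MatchBitmap *bm = malloc(sizeof(MatchBitmap));

    if (bm == NULL)
        return NULL;
    bm->ntuples = ntuples;
    bm->bits = malloc((nbytes > 0 ? nbytes : 1) * sizeof(atomic_uchar));
    if (bm->bits == NULL)
    {
        free(bm);
        return NULL;
    }
    for (size_t i = 0; i < nbytes; i++)
        atomic_init(&bm->bits[i], 0);
    return bm;
}

/*
 * Called by a worker when the outer tuple at position tupindex in the
 * batch finds a match.  The atomic OR lets several workers set bits in
 * the same byte without losing each other's updates.
 */
static void
match_bitmap_set(MatchBitmap *bm, size_t tupindex)
{
    assert(tupindex < bm->ntuples);
    atomic_fetch_or(&bm->bits[tupindex / 8],
                    (unsigned char) (1u << (tupindex % 8)));
}

/* Has the outer tuple at tupindex been matched by any worker so far? */
static bool
match_bitmap_test(MatchBitmap *bm, size_t tupindex)
{
    assert(tupindex < bm->ntuples);
    return (atomic_load(&bm->bits[tupindex / 8]) >> (tupindex % 8)) & 1;
}

static void
match_bitmap_free(MatchBitmap *bm)
{
    free(bm->bits);
    free(bm);
}
```

Because each worker only needs the index that travels with the tuple,
the outer tuples themselves never have to be rewritten; after the last
loop over the outer batch, the bits still unset would mark the tuples
that never found a match.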

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services