Re: Avoiding hash join batch explosions with extreme skew and weird stats - Mailing list pgsql-hackers

From Melanie Plageman
Subject Re: Avoiding hash join batch explosions with extreme skew and weird stats
Msg-id CAAKRu_Z2qKMvdD3=J7-Gk1-0eu94NSHNDkL5E4EnGEdS=hTX0w@mail.gmail.com
In response to Re: Avoiding hash join batch explosions with extreme skew and weird stats  (Melanie Plageman <melanieplageman@gmail.com>)
Responses Re: Avoiding hash join batch explosions with extreme skew and weird stats  (David Kimura <david.g.kimura@gmail.com>)
Re: Avoiding hash join batch explosions with extreme skew and weird stats  (Melanie Plageman <melanieplageman@gmail.com>)
List pgsql-hackers
I've attached a patch which should address some of the previous feedback
about code complexity. Two of my co-workers and I wrote what is
essentially a new prototype of the idea. It uses the main state machine
to route emitting unmatched tuples instead of introducing a separate
state. The logic for falling back is also more developed.

In addition to many assorted TODOs in the code, there are a few major
projects left:
- Batch 0 falling back
- Stripe barrier deadlock
- Performance improvements and testing

I will address the stripe barrier deadlock here. David is going to send
a separate email about batch 0 falling back.

There is a deadlock hazard in parallel hashjoin (pointed out by Thomas
Munro in the past): workers attached to the stripe_barrier emit tuples
and then wait on that barrier.
I believe this can be addressed, starting with the following
relatively unoptimized solution:
- after probing a stripe in a batch, a worker sets the status of that
  batch to "tentatively done" and saves the stripe_barrier phase
- if that worker is not the only worker attached to that batch, it
  detaches from both stripe and batch barriers and moves on to other
  batches
- if that worker is the only worker attached to the batch, it will
  proceed to load the next stripe of that batch, and, once it has
  finished loading, it will set the status of the batch back to "not
  done" for itself
- when a worker that detached encounters that batch again, if the
  stripe_barrier phase has not moved forward, it will mark that batch as
  done for itself. If the stripe_barrier phase has moved forward, it can
  join in probing this batch for the current stripe.
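
To make the steps above concrete, here is a rough single-threaded sketch
of the per-worker decisions. It is not code from the attached patch: the
type and function names (SharedBatch, WorkerBatchState,
after_probing_stripe, revisit_batch) are hypothetical, the stripe_barrier
phase is modeled as a plain integer, and no real locking or barrier
machinery is shown.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-worker status of a batch (names are illustrative). */
typedef enum { BATCH_NOT_DONE, BATCH_TENTATIVELY_DONE, BATCH_DONE } BatchStatus;

/* Hypothetical shared state; a real version would live in shared memory. */
typedef struct {
    int attached_workers;   /* workers attached to the batch barrier */
    int stripe_phase;       /* stand-in for the stripe_barrier phase */
} SharedBatch;

/* Hypothetical worker-local bookkeeping for one batch. */
typedef struct {
    BatchStatus status;
    int  saved_stripe_phase;
    bool attached;
} WorkerBatchState;

typedef enum { ACT_LOAD_NEXT_STRIPE, ACT_MOVE_ON } AfterProbeAction;

/* Called by a worker once it has finished probing the current stripe. */
AfterProbeAction
after_probing_stripe(SharedBatch *b, WorkerBatchState *w)
{
    assert(w->attached && b->attached_workers > 0);

    /* Mark the batch "tentatively done" and remember the stripe phase. */
    w->status = BATCH_TENTATIVELY_DONE;
    w->saved_stripe_phase = b->stripe_phase;

    if (b->attached_workers > 1)
    {
        /* Not alone: detach instead of waiting, then go find other work. */
        b->attached_workers--;
        w->attached = false;
        return ACT_MOVE_ON;
    }

    /* Sole attached worker: load the next stripe, which advances the phase. */
    b->stripe_phase++;
    w->status = BATCH_NOT_DONE;
    return ACT_LOAD_NEXT_STRIPE;
}

/* Called when a detached worker comes across the batch again.
 * Returns true if the worker rejoins to probe the current stripe. */
bool
revisit_batch(SharedBatch *b, WorkerBatchState *w)
{
    assert(!w->attached && w->status == BATCH_TENTATIVELY_DONE);

    if (b->stripe_phase == w->saved_stripe_phase)
    {
        /* Phase has not moved forward: done with this batch for good. */
        w->status = BATCH_DONE;
        return false;
    }

    /* Phase moved forward: reattach and help probe the current stripe. */
    b->attached_workers++;
    w->attached = true;
    w->status = BATCH_NOT_DONE;
    return true;
}
```

The point of the sketch is that no worker waits on the barrier after
emitting tuples: it either detaches and moves on, or it is the only one
attached and so has nobody to wait for.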