Re: Avoiding hash join batch explosions with extreme skew and weird stats - Mailing list pgsql-hackers

From	Melanie Plageman
Subject	Re: Avoiding hash join batch explosions with extreme skew and weird stats
Date	June 9, 2020 00:12:25
Msg-id	CAAKRu_YKsO6=GMN_6SMeJwuRXEbb1o2mtReHT-GULXt9mtnKYA@mail.gmail.com
In response to	Re: Avoiding hash join batch explosions with extreme skew and weird stats (Melanie Plageman <melanieplageman@gmail.com>)
Responses	Re: Avoiding hash join batch explosions with extreme skew and weird stats
List	pgsql-hackers

On Wed, May 27, 2020 at 7:25 PM Melanie Plageman <melanieplageman@gmail.com> wrote:

> I've attached a rebased patch which includes the "provisionally detach"
> deadlock hazard fix approach

Alas, the "provisional detach" logic proved incorrect (see last point in
the list of changes included in the patch at bottom).

> Also, we kept the batch 0 spilling patch David Kimura authored [1]
> separate so it could be discussed separately because we still had some
> questions.

The serial batch 0 spilling is in the attached patch. Parallel batch 0
spilling is still in a separate patch that David Kimura is working on.

I've attached a rebased and updated patch with a few fixes:

- semi-join fallback works now
- serial batch 0 spilling in main patch
- added instrumentation for stripes to the parallel case
- SharedBits uses same SharedFileset as SharedTuplestore
- reverted the optimization to allow workers to re-attach to a batch and
  help out with stripes if they are sure they pose no deadlock risk

For the last point, I discovered a pretty glaring problem with this
optimization: I did not include the bitmap created by a worker while
working on its first participating stripe in the final combined bitmap.
I was only combining the last bitmap file each worker worked on.

I had the workers make a new bitmap each time they attached to the
batch and participated, because having them keep an open file tracking
information for a batch they are no longer attached to, on the chance
that they might return and work on that batch, was a synchronization
nightmare. It was difficult to figure out when to close the file if
they never returned, and hard to make sure that the combining worker
actually combined all the files from all participants who were ever
active.

I am sure I can hack around those problems, but I think we need a better
solution overall. After reverting those changes, loading and probing of
stripes after stripe 0 is serial. This is not only sub-optimal, it also
means that all the synchronization variables and code complexity around
coordinating work on fallback batches are practically wasted. So,
workers have to be able to collaborate on stripes after the first
stripe.

This version of the patch has correct results and no deadlock hazard;
however, it lacks parallelism on stripes after stripe 0. I am looking
for ideas on how to address the deadlock hazard more efficiently.

The next big TODOs are:
- come up with a better solution to the potential tuple emitting/barrier
  waiting deadlock issue
- complete parallel batch 0 spilling

Melanie Plageman

Attachment

v9-0001-Implement-Adaptive-Hashjoin.patch
