Re: Avoiding hash join batch explosions with extreme skew and weird stats - Mailing list pgsql-hackers

From David Kimura
Subject Re: Avoiding hash join batch explosions with extreme skew and weird stats
Date
Msg-id CAHnPFjSV8u=D85RnorugR-5-RR73msDghuQ1sRRnwbVa6S-Oyg@mail.gmail.com
In response to Re: Avoiding hash join batch explosions with extreme skew and weird stats  (Melanie Plageman <melanieplageman@gmail.com>)
Responses Re: Avoiding hash join batch explosions with extreme skew and weird stats  (David Kimura <david.g.kimura@gmail.com>)
List pgsql-hackers
On Wed, Apr 29, 2020 at 4:39 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
>
> In addition to many assorted TODOs in the code, there are a few major
> projects left:
> - Batch 0 falling back
> - Stripe barrier deadlock
> - Performance improvements and testing
>

Batch 0 never spills. That behavior is an artifact of the existing design, which as an optimization special-cases batch 0 to fill the initial hash table. This means batch 0 can skip the load phase and doesn't need to create a batch file.

However, in the pathological case where all tuples hash to batch 0, there is no
way to redistribute those tuples to other batches. So the existing hash join
implementation allows work_mem to be exceeded for batch 0.

In the adaptive hash join approach, there is another way to deal with a batch
that exceeds work_mem. If increasing the number of batches does not help, the
batch can be split into stripes that individually fit in work_mem. Doing this
requires spilling the excess tuples to batch files. The attached patch adds
logic to create a batch 0 file for serial hash join so that, even in the
pathological case, we do not need to exceed work_mem.

Thanks,
David

Attachment
