Re: Avoiding hash join batch explosions with extreme skew and weird stats - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Avoiding hash join batch explosions with extreme skew and weird stats
Date
Msg-id 20190516163451.r7crbjgsssn6eew6@development
In response to Avoiding hash join batch explosions with extreme skew and weird stats  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: Avoiding hash join batch explosions with extreme skew and weird stats  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
On Thu, May 16, 2019 at 01:22:31PM +1200, Thomas Munro wrote:
> ...
>
>Here's a quick hack to show that a 95% cut-off fixes those examples.
>I don't really know how to choose the number, but I suspect it should
>be much closer to 100 than 50.  I think this is the easiest of three
>fundamental problems that need to be solved in this area.  The others
>are: accounting for per-partition overheads as Tomas pointed out, and
>providing an actual fallback strategy that respects work_mem when
>extreme skew is detected OR per-partition overheads dominate.  I plan
>to experiment with nested loop hash join (or whatever you want to call
>it: the thing where you join every arbitrary fragment of the hash
>table against the outer batch, and somehow deal with outer match
>flags) when time permits.
>

I think this is a step in the right direction, but as I said on the other
thread(s), I think we should not disable growth forever, but rather recheck
it once in a while. Otherwise we'll end up in a sad situation with
non-uniform data sets, as pointed out by Hubert Zhang in [1]. That's
probably even more likely with this less strict logic, using 95% as the
threshold (instead of 100%).

I kinda like the idea of increasing the spaceAllowed value. Essentially,
if we decide adding batches would be pointless, increasing the memory
budget is the only thing we can do anyway.
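
Something along these lines is what I have in mind - just a rough sketch,
not a patch. growEnabled and spaceAllowed are the existing HashJoinTable
fields, but GROW_RECHECK_TUPLES and growDisabledTuples are made-up names
for the re-check bookkeeping:

/*
 * Rough sketch only. When a split looks pointless (say, >95% of tuples
 * would end up in one child batch), don't set growEnabled = false forever.
 * Instead bump the memory budget (the only thing we can do at that point
 * anyway) and re-check after some more tuples, so a later, less skewed
 * part of the input can still trigger a split.
 */
#define GROW_RECHECK_TUPLES   100000    /* made-up constant */

static void
ExecHashDisableGrowthTemporarily(HashJoinTable hashtable)
{
    hashtable->growEnabled = false;
    hashtable->spaceAllowed *= 2;       /* splitting won't help, allow more memory */
    hashtable->growDisabledTuples = 0;  /* made-up counter field */
}

/* ... and in ExecHashTableInsert(), roughly ... */
    if (!hashtable->growEnabled &&
        ++hashtable->growDisabledTuples >= GROW_RECHECK_TUPLES)
        hashtable->growEnabled = true;  /* give splitting another chance */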

The problem however is that we only ever look at a single next bit - it may
be that doubling the number of batches would not help, but doubling it
several times would actually reduce the memory usage. For example, assume
there are 2 distinct values in the batch, with hash values (in binary)

  101010000
  101010111

and assume we currently use only those first few (shared) bits to pick the
batch. Clearly, splitting batches is going to do nothing until we get to
the 000 vs. 111 parts.
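
To make that a bit more concrete, here's a small stand-alone toy (not
PostgreSQL code - the batchno formula mimics how ExecHashGetBucketAndBatch
takes the batch number from the bits above the bucket bits, but the number
of bucket bits and the two hash values are made up for the example):

#include <stdio.h>
#include <stdint.h>

#define LOG2_NBUCKETS 10    /* pretend nbuckets = 1024 */

/* batch number taken from the bits above the bucket bits */
static int
batchno(uint32_t hashvalue, int nbatch)
{
    return (hashvalue >> LOG2_NBUCKETS) & (nbatch - 1);
}

int
main(void)
{
    /*
     * Two made-up hash values whose first six batch bits are the same
     * (101010, as in the example above), and which only differ in the
     * bits examined by later doublings.
     */
    uint32_t a = 0x02a << LOG2_NBUCKETS;    /* batch bits ...000 101010 */
    uint32_t b = 0x1ea << LOG2_NBUCKETS;    /* batch bits ...111 101010 */

    for (int nbatch = 2; nbatch <= 1024; nbatch *= 2)
        printf("nbatch = %4d: batchno(a) = %3d  batchno(b) = %3d%s\n",
               nbatch, batchno(a, nbatch), batchno(b, nbatch),
               batchno(a, nbatch) == batchno(b, nbatch) ? "  <- same batch" : "");

    return 0;
}

With those two values the first six doublings are useless - both values
keep falling into the same child batch - and only the seventh doubling
actually separates them.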

At first I thought this is rather unlikely and we can ignore it, but I'm
not really sure about that - it may actually be pretty likely. We may get
to the 101010 batch with a sufficiently large data set, and then there's
~50% probability the next bit is the same (assuming two distinct values).
So this may be quite an issue, I think.
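
FWIW, assuming the remaining bits of the two hash values behave like
independent coin flips, it works out to

  P(still in the same batch after k more doublings) = 2^{-k}
  E(doublings until the values separate) = \sum_{k >= 1} k * 2^{-k} = 2

i.e. on average one of the doublings is wasted, and with probability 2^{-k}
at least k of them are - each one doubling the number of batches (and the
per-batch overhead) for no benefit.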

regards


[1] https://www.postgresql.org/message-id/CAB0yrekv%3D6_T_eUe2kOEvWUMwufcvfd15SFmCABtYFOkxCFdfA%40mail.gmail.com

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



