Re: accounting for memory used for BufFile during hash joins - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: accounting for memory used for BufFile during hash joins
Date
Msg-id 20190529145517.sj2poqmb3cr4cg6w@development
Whole thread Raw
In response to Re: accounting for memory used for BufFile during hash joins  (Hubert Zhang <hzhang@pivotal.io>)
Responses Re: accounting for memory used for BufFile during hash joins
List pgsql-hackers
On Tue, May 28, 2019 at 03:40:01PM +0800, Hubert Zhang wrote:
>On Sat, May 4, 2019 at 8:34 AM Tomas Vondra <tomas.vondra@2ndquadrant.com>
>wrote:
>
>Hi Tomas
>
>I read your second patch which uses overflow buf files to reduce the total
>number of batches.
>It would solve the hash join OOM problem what you discussed above: 8K per
>batch leads to batch bloating problem.
>
>I mentioned in another thread:
>
>https://www.postgresql.org/message-id/flat/CAB0yrekv%3D6_T_eUe2kOEvWUMwufcvfd15SFmCABtYFOkxCFdfA%40mail.gmail.com
>There is another hashjoin OOM problem which disables splitting batches too
>early. PG uses a flag hashtable->growEnable to determine whether to split
>batches. Once one splitting failed(all the tuples are assigned to only one
>batch of two split ones) The growEnable flag would be turned off forever.
>
>The is an opposite side of batch bloating problem. It only contains too few
>batches and makes the in-memory hash table too large to fit into memory.
>

Yes. There are deffinitely multiple separate issues in the hashjoin code,
and the various improvements discussed in this (and other) thread usually
address just a subset of them. We need to figure out how to combine them
or maybe devise some more generic solution.

So I think we need to take a step back, and figure out how to combine
these improvements - otherwise we might commit a fix for one issue, making
it much harder/impossible to improve the other issues.

The other important question is whether we see these cases as outliers
(and the solutions as last-resort-attempt-to-survive kind of fix) or more
widely applicable optimizations. I've seen some interesting speedups with
the overflow-batches patch, but my feeling is we should really treat it as
a last-resort to survive. 

I had a chat about this with Thomas Munro yesterday. Unfortunately, some
beer was involved but I do vaguely remember he more or less convinced me
the BNL (block nested loop join) might be the right approach here. We
don't have any patch for that yet, though :-(

>Here is the tradeoff: one batch takes more than 8KB(8KB makes sense, due to
>performance), in-memory hash table takes memory as well and splitting
>batched may(not must) reduce the in-memory hash table size but introduce
>more batches(and thus more memory usage 8KB*#batch).
>Can we conclude that it would be worth to splitting if satisfy:
>(The reduced memory of in-memory hash table) - (8KB * number of new
>batches) > 0
>

Something like that, yes.

>So I'm considering to combine our patch with your patch to fix join OOM
>problem. No matter the OOM is introduced by (the memory usage of in-memory
>hash table) or (8KB * number of batches).
>
>nbatch_inmemory in your patch could also use the upper rule to redefine.
>
>What's your opinion?
>

One of the issues with my "overflow batches" patch, pointed out to me by
Thomas yesterday, is that it only works with non-parallel hash join. And
we don't know how to make it work in the parallel mode :-(


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Dead stores in src/common/sha2.c
Next
From: Michael Paquier
Date:
Subject: Re: docs about FKs referencing partitioned tables