Re: accounting for memory used for BufFile during hash joins - Mailing list pgsql-hackers

From Melanie Plageman
Subject Re: accounting for memory used for BufFile during hash joins
Date
Msg-id CAAKRu_ZTy=a2jqj0++jDwke1_ssJTBB0ezU2t8OwogmXDbYdYw@mail.gmail.com
In response to accounting for memory used for BufFile during hash joins  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: accounting for memory used for BufFile during hash joins  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers


On Fri, May 3, 2019 at 5:34 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

> The second patch tries to enforce work_mem more strictly. That would be
> impossible if we were to keep all the BufFile structs in memory, so
> instead it slices the batches into chunks that fit into work_mem, and
> then uses a single "overflow" file for slices currently not in memory.
> These extra slices can't be counted into work_mem, but we should need
> just very few of them. For example with work_mem=4MB the slice is 128
> batches, so we need 128x less overflow files (compared to per-batch).
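To make sure I follow the arithmetic: with one overflow file per slice instead of one BufFile per batch, the file count drops by the slice size. A quick sketch (the helper name is mine; batches_per_slice=128 is taken from the work_mem=4MB example above):

```python
def overflow_file_counts(nbatch, batches_per_slice=128):
    """Compare file counts: one BufFile per batch vs. one overflow
    file per slice of batches_per_slice batches (default matches the
    work_mem=4MB example, where a slice is 128 batches)."""
    per_batch_files = nbatch
    per_slice_files = -(-nbatch // batches_per_slice)  # ceiling division
    return per_batch_files, per_slice_files

# e.g. 4096 batches: 4096 per-batch files vs. 32 per-slice overflow files
print(overflow_file_counts(4096))
```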

I want to see if I understand the implications of the per-slice-overflow patch
for execution of hashjoin:
For each bucket in the hashtable, when attempting to double the number of
batches, if the memory that the BufFile structs would occupy after the doubling
would exceed work_mem, split the batches into slices that fit into memory.
This means that, for each probe-side tuple hashing to that bucket, you have to
load every slice of each batch separately into memory to ensure correct results.
Is this right?
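To check that we're picturing the same execution shape, here is a rough sketch of what I understand probing a sliced batch to look like. All of the names and structure here are mine for illustration, not the patch's:

```python
def probe_batch(slices, probe_tuples, key):
    """Sketch of probing one batch whose build side was split into
    slices, each small enough to fit in work_mem. The probe side is
    rescanned once per slice to ensure no matches are missed."""
    matches = []
    for build_slice in slices:
        # build an in-memory hash table from just this slice's tuples
        table = {}
        for t in build_slice:
            table.setdefault(key(t), []).append(t)
        # probe every probe-side tuple against this slice's table
        for p in probe_tuples:
            for t in table.get(key(p), []):
                matches.append((t, p))
    return matches
```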

> I'm not entirely sure which of those approaches is the right one. The
> first one is clearly just a "damage control" for cases where the hash
> side turned out to be much larger than we expected. With good estimates
> we probably would not have picked a hash join for those (that is, we
> should have realized we can't keep work_mem and prohibit hash join).
>
> The second patch however makes hash join viable for some of those cases,
> and it seems to work pretty well (there are some numbers in the message
> posted to pgsql-performance thread). So I kinda like this second one.

So, my initial reaction after taking a look at the patches is that I prefer the
first approach--increasing the resize threshold. The second patch, the
per-slice-overflow patch, adds a major new mechanic to hashjoin in order to
address what is, based on my understanding, an edge case.

--
Melanie Plageman
