Re: accounting for memory used for BufFile during hash joins - Mailing list pgsql-hackers

From Melanie Plageman
Subject Re: accounting for memory used for BufFile during hash joins
Date
Msg-id CAAKRu_ZTy=a2jqj0++jDwke1_ssJTBB0ezU2t8OwogmXDbYdYw@mail.gmail.com
In response to accounting for memory used for BufFile during hash joins  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: accounting for memory used for BufFile during hash joins  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers


On Fri, May 3, 2019 at 5:34 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

> The second patch tries to enforce work_mem more strictly. That would be
> impossible if we were to keep all the BufFile structs in memory, so
> instead it slices the batches into chunks that fit into work_mem, and
> then uses a single "overflow" file for slices currently not in memory.
> These extra slices can't be counted into work_mem, but we should need
> just very few of them. For example with work_mem=4MB the slice is 128
> batches, so we need 128x less overflow files (compared to per-batch).
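To make sure I follow the arithmetic: with one overflow file per slice instead of one BufFile per batch, the file count drops by the slice size. A quick sketch (the helper name is mine; batches_per_slice=128 is taken from the work_mem=4MB example above):

```python
def overflow_file_counts(nbatch, batches_per_slice=128):
    """Compare file counts: one BufFile per batch vs. one overflow
    file per slice of batches_per_slice batches (default matches the
    work_mem=4MB example, where a slice is 128 batches)."""
    per_batch_files = nbatch
    per_slice_files = -(-nbatch // batches_per_slice)  # ceiling division
    return per_batch_files, per_slice_files

# e.g. 4096 batches: 4096 per-batch files vs. 32 per-slice overflow files
print(overflow_file_counts(4096))
```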

I want to see if I understand the implications of the per-slice-overflow patch
for execution of hashjoin:
For each bucket in the hashtable, when attempting to double the number of
batches, if the memory that the BufFile structs would occupy after the doubling
would exceed work_mem, split the batches into slices that fit into memory.
This means that, for each probe-side tuple hashing to that bucket, you have to
load every slice of each batch separately into memory to ensure correct results.
Is this right?
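To check that we're picturing the same execution shape, here is a rough sketch of what I understand probing a sliced batch to look like. All of the names and structure here are mine for illustration, not the patch's:

```python
def probe_batch(slices, probe_tuples, key):
    """Sketch of probing one batch whose build side was split into
    slices, each small enough to fit in work_mem. The probe side is
    rescanned once per slice to ensure no matches are missed."""
    matches = []
    for build_slice in slices:
        # build an in-memory hash table from just this slice's tuples
        table = {}
        for t in build_slice:
            table.setdefault(key(t), []).append(t)
        # probe every probe-side tuple against this slice's table
        for p in probe_tuples:
            for t in table.get(key(p), []):
                matches.append((t, p))
    return matches
```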

> I'm not entirely sure which of those approaches is the right one. The
> first one is clearly just a "damage control" for cases where the hash
> side turned out to be much larger than we expected. With good estimates
> we probably would not have picked a hash join for those (that is, we
> should have realized we can't keep work_mem and prohibit hash join).
>
> The second patch however makes hash join viable for some of those cases,
> and it seems to work pretty well (there are some numbers in the message
> posted to pgsql-performance thread). So I kinda like this second one.

So, my initial reaction after taking a look at the patches is that I prefer the
first approach--increasing the resize threshold. The second patch, the
per-slice-overflow patch, adds a major new mechanic to hashjoin in order to
address what is, based on my understanding, an edge case.

--
Melanie Plageman
