Re: OOM in hash join - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: OOM in hash join
Msg-id CA+hUKGKDbv+5uiJZDdB1wttkMPFs9CDb6=02Qxitq4am-KBM_A@mail.gmail.com
In response to OOM in hash join (Konstantin Knizhnik <knizhnik@garret.ru>)

On Fri, Apr 14, 2023 at 10:59 PM Konstantin Knizhnik <knizhnik@garret.ru> wrote:
> Too small a value of work_mem causes memory overflow in parallel hash join
> because of too large a number of batches.

Yeah.  Not only in parallel hash join, but in any hash join
(admittedly parallel hash join has higher per-batch overheads; that is
perhaps something we could improve).  That's why we tried to invent an
alternative strategy where, at some point, you loop over batches N
times instead of making more batches:

https://www.postgresql.org/message-id/flat/CA+hUKGKWWmf=WELLG=aUGbcugRaSQbtm0tKYiBut-B2rVKX63g@mail.gmail.com
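To make that concrete, here's a back-of-the-envelope model (mine, for
illustration only; the 1 GB inner side and the assumption of one ~8 kB
buffered temp file per side per batch are made-up round numbers, and
parallel hash join's real per-batch cost is higher):

#include <stdio.h>

#define BUFFILE_BUF 8192.0      /* assumed per-batch-file buffer, bytes */

int
main(void)
{
    double inner_bytes = 1e9;   /* 1 GB of inner-side tuples */

    printf("%10s %15s %15s %15s\n",
           "nbatch", "hash table", "batch files", "total");
    for (long nbatch = 1; nbatch <= (1L << 20); nbatch <<= 1)
    {
        double hashtable = inner_bytes / nbatch;   /* one batch in memory */
        double files = 2.0 * nbatch * BUFFILE_BUF; /* inner + outer files */

        printf("%10ld %15.0f %15.0f %15.0f\n",
               nbatch, hashtable, files, hashtable + files);
    }
    return 0;
}

Under that model the total bottoms out near nbatch =
sqrt(inner_bytes / (2 * 8192)), around 250 batches for 1 GB; doubling
past that point makes the footprint strictly larger, no matter how
small work_mem is.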

That thread starts out talking about 'extreme skew', etc., but the more
general problem is that, at some point, even with perfectly evenly
distributed keys, adding more batches requires more memory than you
can save by doing so.  Sure, it's a problem that we don't account for
that memory properly, as complained about here:

https://www.postgresql.org/message-id/flat/20190504003414.bulcbnge3rhwhcsh@development
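For what it's worth, a guard along these lines is roughly what
accounting for it properly could imply (a sketch only; the names are
made up and this is not how ExecHashIncreaseNumBatches decides
anything today): before doubling nbatch, weigh the buffer memory the
new batch files would add against the hash table memory the doubling
would save, and stop once it no longer pays.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define BUFFILE_BUF 8192        /* assumed per-batch-file buffer, bytes */

/* Hypothetical check: would doubling nbatch shrink the total footprint? */
static bool
doubling_still_helps(size_t hashtable_bytes, long nbatch)
{
    size_t saved = hashtable_bytes / 2;               /* hash table halves */
    size_t added = (size_t) nbatch * 2 * BUFFILE_BUF; /* new files, both sides */

    return saved > added;
}

int
main(void)
{
    size_t hashtable = 1024 * 1024 * 1024;  /* 1 GB resident batch */
    long   nbatch = 1;

    while (doubling_still_helps(hashtable, nbatch))
    {
        nbatch *= 2;
        hashtable /= 2;
    }
    printf("stop doubling at nbatch = %ld (hash table of %zu bytes)\n",
           nbatch, hashtable);
    return 0;
}

With those numbers it stops at nbatch = 256 and a ~4 MB hash table,
which matches the minimum in the model above.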

If you did have perfect prediction of every byte you would need, maybe
you could say: oh well, we just don't have enough memory for a hash
join, so let's do a sort/merge join instead.  But you can't, because (1)
some types aren't merge-joinable, and (2) in reality sometimes you've
already started the hash join due to imperfect stats, so it's too late
to change strategies.


