Re: Memory leak from ExecutorState context? - Mailing list pgsql-hackers

From Jehan-Guillaume de Rorthais
Subject Re: Memory leak from ExecutorState context?
Date
Msg-id 20230310195114.6d0c5406@karst
Whole thread Raw
In response to Re: Memory leak from ExecutorState context?  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: Memory leak from ExecutorState context?  (Jehan-Guillaume de Rorthais <jgdr@dalibo.com>)
List pgsql-hackers
Hi,

> So I guess the best thing would be to go through these threads, see what
> the status is, restart the discussion and propose what to do. If you do
> that, I'm happy to rebase the patches, and maybe see if I could improve
> them in some way.

OK! It took me some time, but I did it. I'll try to sum up the situation as
simply as possible.

I reviewed the following threads:

* Out of Memory errors are frustrating as heck!
  2019-04-14 -> 2019-04-28
  https://www.postgresql.org/message-id/flat/bc138e9f-c89e-9147-5395-61d51a757b3b%40gusw.net

  This discussion stalled, waiting for OP, but ideas there ignited all other
  discussions.

* accounting for memory used for BufFile during hash joins
  2019-05-04 -> 2019-09-10
  https://www.postgresql.org/message-id/flat/20190504003414.bulcbnge3rhwhcsh%40development

  This was suppose to push forward a patch discussed on previous thread, but
  it actually took over it and more ideas pops from there.

* Replace hashtable growEnable flag
  2019-05-15 -> 2019-05-16
  https://www.postgresql.org/message-id/flat/CAB0yrekv%3D6_T_eUe2kOEvWUMwufcvfd15SFmCABtYFOkxCFdfA%40mail.gmail.com

  This one quickly merged to the next one.

* Avoiding hash join batch explosions with extreme skew and weird stats
  2019-05-16 -> 2020-09-24
  https://www.postgresql.org/message-id/flat/CA%2BhUKGKWWmf%3DWELLG%3DaUGbcugRaSQbtm0tKYiBut-B2rVKX63g%40mail.gmail.com

  Another thread discussing another facet of the problem, but eventually end up
  discussing / reviewing the BNLJ implementation.

Five possible fixes/ideas were discussed all over these threads:


1. "move BufFile stuff into separate context"
   last found patch: 2019-04-21
   https://www.postgresql.org/message-id/20190421114618.z3mpgmimc3rmubi4%40development
   https://www.postgresql.org/message-id/attachment/100822/0001-move-BufFile-stuff-into-separate-context.patch

   This patch helps with observability/debug by allocating the bufFiles in the
   appropriate context instead of the "ExecutorState" one.

   I suppose this simple one has been forgotten in the fog of all other
   discussions. Also, this probably worth to be backpatched.

2. "account for size of BatchFile structure in hashJoin"
   last found patch: 2019-04-22
   https://www.postgresql.org/message-id/20190428141901.5dsbge2ka3rxmpk6%40development
   https://www.postgresql.org/message-id/attachment/100951/v2-simple-rebalance.patch

   This patch seems like a good first step:

   * it definitely helps older versions where other patches discussed are way
     too invasive to be backpatched
   * it doesn't step on the way of other discussed patches

   While looking at the discussions around this patch, I was wondering if the
   planner considers the memory allocation of bufFiles. But of course, Melanie
   already noticed that long before I was aware of this problem and discussion:

   2019-07-10: «I do think that accounting for Buffile overhead when estimating
   the size of the hashtable during ExecChooseHashTableSize() so it can be
     used during planning is a worthwhile patch by itself (though I know it
     is not even part of this patch).»
   https://www.postgresql.org/message-id/CAAKRu_Yiam-%3D06L%2BR8FR%2BVaceb-ozQzzMqRiY2pDYku1VdZ%3DEw%40mail.gmail.com

   Tomas Vondra agreed with this in his answer, but no new version of the patch
   where produced.

   Finally, Melanie was pushing the idea to commit this patch no matter other
   pending patches/ideas:

   2019-09-05: «If Tomas or someone else has time to pick up and modify BufFile
     accounting patch, committing that still seems like the nest logical
     step.»
   https://www.postgresql.org/message-id/CAAKRu_b6%2BjC93WP%2BpWxqK5KAZJC5Rmxm8uquKtEf-KQ%2B%2B1Li6Q%40mail.gmail.com

   Unless I'm wrong, no one down voted this.

3. "per slice overflow file"
   last found patch: 2019-05-08
   https://www.postgresql.org/message-id/20190508150844.rij36rtuk4lhvztw%40development
   https://www.postgresql.org/message-id/attachment/101080/v4-per-slice-overflow-file-20190508.patch

   This patch has been withdraw after an off-list discussion with Thomas Munro
   because of a missing parallel hashJoin implementation. Plus, before any
   effort started on the parallel implementation, the BNLJ idea appeared and
   seemed more appealing.

   See:
   https://www.postgresql.org/message-id/20190529145517.sj2poqmb3cr4cg6w%40development

   By the time, it still seems to have some interest despite the BNLJ patch:

   2019-07-10: «If slicing is made to work for parallel-aware hashjoin and the
   code is in a committable state (and probably has the threshold I mentioned
     above), then I think that this patch should go in.»
   https://www.postgresql.org/message-id/CAAKRu_Yiam-%3D06L%2BR8FR%2BVaceb-ozQzzMqRiY2pDYku1VdZ%3DEw%40mail.gmail.com

   But this might have been disapproved later by Tomas:

   2019-09-10: «I have to admit I kinda lost track [...] My feeling is that we
     should get the BNLJ committed first, and then maybe use some of those
     additional strategies as fallbacks (depending on which issues are still
     unsolved by the BNLJ).»
   https://www.postgresql.org/message-id/20190910134751.x64idfqj6qgt37om%40development

4. "Block Nested Loop Join"
   last found patch: 2020-08-31
   https://www.postgresql.org/message-id/CAAKRu_aLMRHX6_y%3DK5i5wBMTMQvoPMO8DT3eyCziTHjsY11cVA%40mail.gmail.com
   https://www.postgresql.org/message-id/attachment/113608/v11-0001-Implement-Adaptive-Hashjoin.patch

   Most of the discussion was consideration about the BNLJ parallel and
   semi-join implementation. Melanie put a lot of work on this. This looks like
   the most advanced patch so far and add a fair amount of complexity.

   There were some open TODOs, but Melanie was waiting for some more review and
   feedback on v11 first.

5. Only split the skewed batches
   Discussion: 2019-07-11
   https://www.postgresql.org/message-id/CA%2BTgmoYqpbzC1g%2By0bxDFkpM60Kr2fnn0hVvT-RfVWonRY2dMA%40mail.gmail.com
   https://www.postgresql.org/message-id/CAB0yremvswRAT86Afb9MZ_PaLHyY9BT313-adCHbhMJ%3Dx_GEcg%40mail.gmail.com

   Robert Haas pointed out that current implementation and discussion were not
   really responding to the skew in a very effective way. He's considering
   splitting batches unevenly. Hubert Zhang stepped in, detailed some more and
   volunteer to work on such a patch.

   No one reacted.

   It seems to me this is an interesting track to explore. This looks like a
   good complement of 2. ("account for size of BatchFile structure in hashJoin"
   patch). However, this idea probably couldn't be backpatched.

   Note that it could help with 3. as well by slicing only the remaining skewed
   values.

   Also, this could have some impact on the "Block Nested Loop Join" patch if
   the later is kept to deal with the remaining skewed batches.

> I was hoping we'd solve this by the BNL, but if we didn't get that in 4
> years, maybe we shouldn't stall and get at least an imperfect stop-gap
> solution ...

Indeed. So, to sum-up:

* Patch 1 could be rebased/applied/backpatched
* Patch 2 is worth considering to backpatch
* Patch 3 seemed withdrawn in favor of BNLJ
* Patch 4 is waiting for some more review and has some TODO
* discussion 5 worth few minutes to discuss before jumping on previous topics

1 & 2 are imperfect solution but doesn't weight much and could be backpatched.
4 & 5 are long-term solutions for a futur major version needing some more
discussions, test and reviews.
3 is not 100% buried, but a last round in the arena might settle its destiny for
good.

Hopefully this sum-up is exhaustive and will help clarify this 3-years-old
topic.

Regards,



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: pg_usleep for multisecond delays
Next
From: Tom Lane
Date:
Subject: Re: RLS makes COPY TO process child tables