Thread: Add mention of execution time memory for enable_partitionwise_* GUCs

Over on [1], there's a complaint about a query OOMing because the use
of enable_partitionwise_aggregate caused a plan with 1000 Hash
Aggregate nodes.

The only mention in the docs is of the additional memory and CPU
requirements for query planning when that GUC is enabled. There's no
mention that execution could use work_mem * nparts more memory. I
think that's bad and we should fix it.
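
For anyone wanting to see the shape of the problem, here's a rough
sketch (untested; the table and partition counts are made up):

  -- A hash-partitioned table with 8 partitions.
  CREATE TABLE t (a int, b int) PARTITION BY HASH (a);
  SELECT format('CREATE TABLE t_%s PARTITION OF t FOR VALUES
                 WITH (MODULUS 8, REMAINDER %s)', i, i)
  FROM generate_series(0, 7) i \gexec

  SET enable_partitionwise_aggregate = on;

  -- Depending on stats and costs, this can produce one HashAggregate
  -- per partition, and each of those can use up to work_mem before
  -- spilling, i.e. up to work_mem * 8 in total here.
  EXPLAIN (COSTS OFF) SELECT a, count(*) FROM t GROUP BY a;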

I've attached my proposal to fix that.

David

[1] https://postgr.es/m/3603c380-d094-136e-e333-610914fb3e80%40gmx.net

Attachment
On Thu, Jul 18, 2024 at 4:03 AM David Rowley <dgrowleyml@gmail.com> wrote:
>
> Over on [1], there's a complaint about a query OOMing because the use
> of enable_partitionwise_aggregate caused a plan with 1000 Hash
> Aggregate nodes.
>
> The only mention in the docs is of the additional memory and CPU
> requirements for query planning when that GUC is enabled. There's no
> mention that execution could use work_mem * nparts more memory. I
> think that's bad and we should fix it.
>
> I've attached my proposal to fix that.

If those GUCs are enabled, the planner consumes a large amount of
memory and also takes longer, irrespective of whether a partitionwise
plan is used or not. That's why the default is false. If the majority
of those joins use nested loops, or use index scans instead of
sorting, memory consumption won't be as large. Saying that it "can"
result in a large increase in execution memory is not accurate. But I
agree that we need to mention the effect of work_mem on partitionwise
join/aggregation.
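
As a sketch of what I mean (t1 and t2 here are hypothetical
stand-ins for two identically partitioned tables with btree indexes
on the join key a):

  -- t1 and t2: hypothetical, identically partitioned tables with
  -- btree indexes on the join key a.
  SET enable_partitionwise_join = on;
  SET enable_hashjoin = off;  -- only to force the shape, for illustration

  -- The per-partition joins can come out as Merge Joins over Index
  -- Scans, which do not buffer work_mem worth of rows the way Hash
  -- Joins or explicit Sorts do.
  EXPLAIN (COSTS OFF) SELECT * FROM t1 JOIN t2 USING (a);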

I had an offlist email exchange with Dimitrios where I suggested that
we should mention this in the work_mem description. I.e. in the
work_mem description, change "Note that a complex query might perform
several sort and hash operations" to "Note that a complex query or a
query using partitionwise aggregates or joins might perform several
sort and hash operations". And in the description of the
enable_partitionwise_* GUCs, mention that "Each partitionwise join or
aggregation which performs sorting or hashing may consume work_mem
worth of memory, increasing the total memory consumed during query
execution."

--
Best Wishes,
Ashutosh Bapat



On Thu, 18 Jul 2024 at 21:24, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
> If those GUCs are enabled, the planner consumes a large amount of
> memory and also takes longer, irrespective of whether a partitionwise
> plan is used or not. That's why the default is false. If the majority
> of those joins use nested loops, or use index scans instead of
> sorting, memory consumption won't be as large. Saying that it "can"
> result in a large increase in execution memory is not accurate. But I
> agree that we need to mention the effect of work_mem on partitionwise
> join/aggregation.

hmm? please tell me what word other than "can" best describes
something that is possible to happen but does not always happen under
all circumstances.

David



On Thu, Jul 18, 2024 at 3:33 PM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Thu, 18 Jul 2024 at 21:24, Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> > If those GUCs are enabled, the planner consumes a large amount of
> > memory and also takes longer, irrespective of whether a partitionwise
> > plan is used or not. That's why the default is false. If the majority
> > of those joins use nested loops, or use index scans instead of
> > sorting, memory consumption won't be as large. Saying that it "can"
> > result in a large increase in execution memory is not accurate. But I
> > agree that we need to mention the effect of work_mem on partitionwise
> > join/aggregation.
>
> hmm? please tell me what word other than "can" best describes
> something that is possible to happen but does not always happen under
> all circumstances.

May I suggest "may"? :) [1], [2], [3].

My point is, we need to highlight the role of work_mem. So modify both
the descriptions.

[1] https://www.thesaurus.com/e/grammar/can-vs-may/
[2] https://www.britannica.com/dictionary/eb/qa/modal-verbs-may-might-can-could-and-ought
[3] https://www.merriam-webster.com/grammar/when-to-use-can-and-may

--
Best Wishes,
Ashutosh Bapat



On Thu, 18 Jul 2024 at 22:28, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Thu, Jul 18, 2024 at 3:33 PM David Rowley <dgrowleyml@gmail.com> wrote:
> > hmm? please tell me what word other than "can" best describes
> > something that is possible to happen but does not always happen under
> > all circumstances.
>
> May I suggest "may"? :) [1], [2], [3].

Is this a wind-up?

If it's not, I disagree that "may" is a better choice.  The
possibility example in your first link says "It may rain tomorrow.
(possibility)", but that's something someone would only say if there
was some evidence to support that, e.g. ominous clouds on the horizon
at dusk, or >0% chance of precipitation on the weather forecast.
Nobody is going to say that unless there's some supporting evidence.
For the executor using work_mem * nparts, we've no evidence either.
It's just a >0% possibility with no supporting evidence.

> My point is, we need to highlight the role of work_mem. So modify both
> the descriptions.

I considered writing about work_mem, but felt I wanted to keep it as
brief as possible and just have some words that might make someone
think twice.  The details in the work_mem documentation should inform
the reader that work_mem is per executor node.  It likely wouldn't
hurt to have more documentation around which executor node types can
use a work_mem, which use work_mem * hash_mem_multiplier and which use
neither. We tend to not write too much about executor nodes in the
documents, so I'm not proposing that for this patch.
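
That said, for anyone curious, here's a quick way to see the
difference (a sketch; the exact "Memory"/"Disk" detail lines vary by
version and plan shape):

  SET work_mem = '4MB';
  SET hash_mem_multiplier = 2.0;

  -- A Sort node is capped at work_mem (4MB here) before spilling to disk.
  EXPLAIN (ANALYZE, COSTS OFF)
  SELECT g FROM generate_series(1, 1000000) g ORDER BY g;

  -- Hash-based nodes such as HashAggregate are allowed work_mem *
  -- hash_mem_multiplier (8MB here) before they spill.
  EXPLAIN (ANALYZE, COSTS OFF)
  SELECT g % 100000 AS k, count(*) FROM generate_series(1, 1000000) g
  GROUP BY k;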

David

> [1] https://www.thesaurus.com/e/grammar/can-vs-may/



On Thu, Jul 18, 2024 at 4:24 PM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Thu, 18 Jul 2024 at 22:28, Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> >
> > On Thu, Jul 18, 2024 at 3:33 PM David Rowley <dgrowleyml@gmail.com> wrote:
> > > hmm? please tell me what word other than "can" best describes
> > > something that is possible to happen but does not always happen under
> > > all circumstances.
> >
> > May I suggest "may"? :) [1], [2], [3].
>
> Is this a wind-up?
>
> If it's not, I disagree that "may" is a better choice.  The
> possibility example in your first link says "It may rain tomorrow.
> (possibility)", but that's something someone would only say if there
> was some evidence to support that, e.g. ominous clouds on the horizon
> at dusk, or >0% chance of precipitation on the weather forecast.
> Nobody is going to say that unless there's some supporting evidence.
> For the executor using work_mem * nparts, we've no evidence either.
> It's just a >0% possibility with no supporting evidence.

I am not a native English speaker and might have made a mistake when
interpreting the definitions. Will leave that aside.

>
> > My point is, we need to highlight the role of work_mem. So modify both
> > the descriptions.
>
> I considered writing about work_mem, but felt I wanted to keep it as
> brief as possible and just have some words that might make someone
> think twice.  The details in the work_mem documentation should inform
> the reader that work_mem is per executor node.  It likely wouldn't
> hurt to have more documentation around which executor node types can
> use a work_mem, which use work_mem * hash_mem_multiplier and which use
> neither. We tend to not write too much about executor nodes in the
> documents, so I'm not proposing that for this patch.

Something I didn't write in my first reply but wanted to discuss was
the intention behind adding those GUCs. Sorry for missing it in my
first email. According to [1], these GUCs were added because of the
increased memory consumption and time taken during planning. The
execution time memory consumption issue was known even back then, but
it was not the reason for defaulting them to off. But your patch
proposes to change the narrative. In the same thread [1], you will
find the discussion about turning the default to ON once we have fixed
the planner's memory and time consumption. We have patches addressing
those issues [2] [3]. With your proposed changes we will almost never
have a chance to turn those GUCs ON by default. That seems a rather
sad prospect.

I am fine if we want to mention that the executor may consume a large
amount of memory when these GUCs are turned ON. Users may decide to
turn those OFF if they cannot afford to spend that much memory during
execution. But I don't like tying execution time memory consumption to
the default being OFF.

[1] https://www.postgresql.org/message-id/CA+TgmoYggDp6k-HXNAgrykZh79w6nv2FevpYR_jeMbrORDaQrA@mail.gmail.com
[2] https://www.postgresql.org/message-id/CAExHW5stmOUobE55pMt83r8UxvfCph%2BPvo5dNpdrVCsBgXEzDQ%40mail.gmail.com
[3] https://www.postgresql.org/message-id/flat/CAJ2pMkZNCgoUKSE+_5LthD+KbXKvq6h2hQN8Esxpxd+cxmgomg@mail.gmail.com


--
Best Wishes,
Ashutosh Bapat



On Fri, 19 Jul 2024 at 17:24, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
> According to [1], these GUCs were added because of the
> increased memory consumption and time taken during planning. The
> execution time memory consumption issue was known even back then, but
> it was not the reason for defaulting them to off. But your patch
> proposes to change the narrative. In the same thread [1], you will
> find the discussion about turning the default to ON once we have fixed
> the planner's memory and time consumption. We have patches addressing
> those issues [2] [3]. With your proposed changes we will almost never
> have a chance to turn those GUCs ON by default. That seems a rather
> sad prospect.

Sad prospect for who? If the day comes and we're considering enabling
these GUCs by default, I think we'll likely also be considering if
it'll be sad for users who get an increased likelihood of OOM kills
because the chosen plan uses $nparts times more memory to execute than
the old plan.

Can you honestly say that you have no concern about increased executor
memory consumption if we switched on these GUCs by default? I was
certainly concerned about it in [5], so I dropped that patch after
realising what could happen.

> I am fine if we want to mention that the executor may consume a large
> amount of memory when these GUCs are turned ON. Users may decide to
> turn those OFF if they cannot afford to spend that much memory during
> execution. But I don't like tying execution time memory consumption to
> the default being OFF.

If you were to fix the planner issues then this text would need to be
revisited anyway. However, I seriously question your judgment that
fixing those alone would be enough to allow us to switch these on by
default.  It seems unlikely that the planner would use anything close
to a realistic work_mem setting in extra memory per partition, but
that's exactly what the executor could do, given enough data per
partition.

I'd say any analysis that found planner memory and time to be the only
reasons to turn these GUCs off by default was incomplete. Maybe it was
a case of stopping the search for reasons once enough had been found
to make the choice. If not, then they found the molehill and failed to
notice the mountain.

David

[4] https://postgr.es/m/3603c380-d094-136e-e333-610914fb3e80%40gmx.net
[5] https://postgr.es/m/CAApHDvojKdBR3MR59JXmaCYbyHB6Q_5qPRU+dy93En8wm+XiDA@mail.gmail.com



From: Dimitrios Apostolou
Thank you for the patch improving the docs; I think it's a clear
improvement over what was there before.

On Thu, 18 Jul 2024, David Rowley wrote:

> I considered writing about work_mem, but felt I wanted to keep it as
> brief as possible and just have some words that might make someone
> think twice.  The details in the work_mem documentation should inform
> the reader that work_mem is per executor node.  It likely wouldn't
> hurt to have more documentation around which executor node types can
> use a work_mem, which use work_mem * hash_mem_multiplier and which use
> neither. We tend to not write too much about executor nodes in the
> documents, so I'm not proposing that for this patch.

This is the only part I think is missing, since we now know
(measurements in [1], reproducible scenario in [2]) that the number of
partitions plays an important role in sizing the RAM of the server.
The issue is just too big not to mention that the worst case will be
n_partitions * work_mem.
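
To put rough numbers on it: assuming the default work_mem of 4MB and,
say, 1000 partitions, a single partitionwise Hash Aggregate stage
could in the worst case approach 1000 * 4MB = ~4GB, and since
hash-based nodes are actually allowed work_mem * hash_mem_multiplier
(2.0 by default these days), the ceiling doubles to ~8GB.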


[1] https://www.postgresql.org/message-id/d26e67d3-74bc-60aa-bf24-2a8fb83efe9c%40gmx.net

[2] https://www.postgresql.org/message-id/af6ed790-a5fe-19aa-1141-927595604c01%40gmx.net


I would also like to add an entry about this issue, with links to the
above pages, to the TODO page at [3], as this is the only bugtracker
I'm aware of. Am I doing it right by bringing it up for approval on
this list?

[3] https://wiki.postgresql.org/wiki/Todo



Thanks,
Dimitris




On Fri, 19 Jul 2024 at 17:24, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
> I am fine if we want to mention that the executor may consume a large
> amount of memory when these GUCs are turned ON. Users may decide to
> turn those OFF if they cannot afford to spend that much memory during
> execution. But I don't like tying execution time memory consumption to
> the default being OFF.

Would the attached address your concern about the reasons for defaulting to off?

David

Attachment
On Mon, Jul 29, 2024 at 10:37 AM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Fri, 19 Jul 2024 at 17:24, Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> > I am fine if we want to mention that the executor may consume a large
> > amount of memory when these GUCs are turned ON. Users may decide to
> > turn those OFF if they cannot afford to spend that much memory during
> > execution. But I don't like tying execution time memory consumption to
> > the default being OFF.
>
> Would the attached address your concern about the reasons for defaulting to off?

Thanks. This looks better. Nitpick

+        child partitions.  With this setting enabled, the number of executor
+        nodes whose memory usage is restricted by <varname>work_mem</varname>

This sentence appears to say that the memory usage of "all" nodes is
restricted by work_mem. I think what you want to convey is - nodes,
whose memory usage is subjected to <varname>work_mem</varname>
setting, ....

Or break it into two sentences:

With this setting enabled, the number of executor nodes appearing in
the final plan can increase in linear proportion to the number of
partitions being scanned. Each of those nodes may use up to
<varname>work_mem</varname> memory. This can ...

I note that the work_mem documentation does not talk about executor
nodes; instead it uses the term "query operations".

--
Best Wishes,
Ashutosh Bapat



On Tue, 30 Jul 2024 at 21:12, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
> Thanks. This looks better. Nitpick
>
> +        child partitions.  With this setting enabled, the number of executor
> +        nodes whose memory usage is restricted by <varname>work_mem</varname>
>
> This sentence appears to say that the memory usage of "all" nodes is
> restricted by work_mem. I think what you want to convey is - nodes,
> whose memory usage is subjected to <varname>work_mem</varname>
> setting, ....

I'm open to improving the sentence but I don't quite follow why
"subjected to" is better than "restricted by". It seems to remove
meaning without saving any words. With "restricted by" we can deduce
that the memory cannot go over work_mem, whereas "subjected to" only
gives us some indication that the memory usage somehow relates to the
work_mem setting and doesn't rule out that the memory used could be >=
work_mem.

> Or break it into two sentences:
>
> With this setting enabled, the number of executor nodes appearing in
> the final plan can increase in linear proportion to the number of
> partitions being scanned. Each of those nodes may use up to
> <varname>work_mem</varname> memory. This can ...

... but that's incorrect. This means that all additional nodes that
appear in the plan as a result of enabling the setting can use up to
work_mem.  That's not the case as some of the new nodes might be
Nested Loops or Merge Joins and they're not going to use up to
work_mem.

As a compromise, I've dropped the word "executor" and it now just says
"the number of nodes whose memory usage is restricted by work_mem
appearing in the final plan".  I'd have used "plan nodes" but it seems
strange to use "plan nodes" then later "in the final plan".

> I note that the work_mem documentation does not talk about executor
> nodes; instead it uses the term "query operations".

I much prefer "nodes" to "operations".  If someone started asking me
about "query operations", I'd have to confirm what they meant. An I/O
is an operation that can occur during the execution of a query. Is
that a "query operation"? IMO, the wording you're proposing is less
concise. I'm not a fan of the terminology when talking about plan
nodes. I'd rather see the work_mem docs modified than copy what it
says.

Please have a look at: git grep -B 1 -i -F "plan node" -- doc/*

that seems like a popular term.

David

Attachment
On Tue, Jul 30, 2024 at 5:38 PM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Tue, 30 Jul 2024 at 21:12, Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
> > Thanks. This looks better. Nitpick
> >
> > +        child partitions.  With this setting enabled, the number of executor
> > +        nodes whose memory usage is restricted by <varname>work_mem</varname>
> >
> > This sentence appears to say that the memory usage of "all" nodes is
> > restricted by work_mem. I think what you want to convey is - nodes,
> > whose memory usage is subjected to <varname>work_mem</varname>
> > setting, ....
>
> I'm open to improving the sentence but I don't quite follow why
> "subjected to" is better than "restricted by". It seems to remove
> meaning without saving any words. With "restricted by" we can deduce
> that the memory cannot go over work_mem, whereas "subjected to" only
> gives us some indication that the memory usage somehow relates to the
> work_mem setting and doesn't rule out that the memory used could be >=
> work_mem.
>
> > Or break it into two sentences:
> >
> > With this setting enabled, the number of executor nodes appearing in
> > the final plan can increase in linear proportion to the number of
> > partitions being scanned. Each of those nodes may use up to
> > <varname>work_mem</varname> memory. This can ...
>
> ... but that's incorrect. This means that all additional nodes that
> appear in the plan as a result of enabling the setting can use up to
> work_mem.  That's not the case as some of the new nodes might be
> Nested Loops or Merge Joins and they're not going to use up to
> work_mem.

Any wording which indicates that "some" of those nodes "may" use up to
"work_mem" memory "each" is fine with me. If you think that your
current wording conveys that meaning, I am OK with it.

>
> I much prefer "nodes" to "operations".  If someone started asking me
> about "query operations", I'd have to confirm what they meant. An I/O
> is an operation that can occur during the execution of a query. Is
> that a "query operation"? IMO, the wording you're proposing is less
> concise. I'm not a fan of the terminology when talking about plan
> nodes. I'd rather see the work_mem docs modified than copy what it
> says.

We need to use consistent terms in both places. Somebody reading the
new text and then referring to the work_mem description will wonder
whether "query operations" are the same thing as "executor nodes" or
not. But I get your point that a "query operation" doesn't necessarily
mean an operation performed by an executor node, especially when those
operations are not specified directly in the query. We can commit your
version and see if users find it confusing. Maybe they already know
that operations and nodes are interchangeable in this context.

I noticed a previous discussion about work_mem documentation [1]. But
that didn't change the first sentence in the description.

[1] https://www.postgresql.org/message-id/66590882-F48C-4A25-83E3-73792CF8C51F%40amazon.com


--
Best Wishes,
Ashutosh Bapat



On Wed, 31 Jul 2024 at 16:15, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
> We can commit your
> version and see if users find it confusing.

Ok. I've now pushed the patch. Thanks for reviewing it.

David