Re: Add the ability to limit the amount of memory that can be allocated to backends. - Mailing list pgsql-hackers

From James Hunter
Subject Re: Add the ability to limit the amount of memory that can be allocated to backends.
Date
Msg-id CAJVSvF6i_1Em6VPZ9po5wyTubGwifvfNFLrOYrdgT-e1GmR5Fw@mail.gmail.com
Whole thread Raw
In response to Re: Add the ability to limit the amount of memory that can be allocated to backends.  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers
On Mon, Dec 30, 2024 at 2:56 PM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Tue, 31 Dec 2024 at 10:11, James Hunter <james.hunter.pg@gmail.com> wrote:
> > Does PostgreSQL currently rescan Hash Joins when they are "no longer
> > needed," to free work_mem early? If so, then I would try to reuse this
> > existing logic to decide which nodes need work_mem concurrently.
> >
> > If not, then all nodes that use work_mem actually use it
> > "concurrently," because we don't free that work_mem until we call
> > ExecutorEnd().
>
> The problem with that statement is that Hash Join isn't the only node
> type that uses work_mem. Plenty of other node types do.  Have a look
> at MultiExecBitmapOr(). You can see logic there that does tbm_free()
> after the tbm_union() call. Going by that, it seems there is at least
> one place where we might free some work_mem memory before allocating
> another lot.

OK, then as a first approximation, we can assume that all nodes that
use work_mem, use it concurrently. This assumption seems to hold for
Hash Join and [Hash] Agg. PostgreSQL already considers the hash
operations "more important," in some sense, than other operators that
use work_mem -- this is why they get to multiply by
hash_mem_multiplier (defaulted to 2.0).

I think the above approximation answers Tomas's first concern. For
now, we can assume that the amount of work memory used by a query is
the sum of the individual work memories used by all of its operators.
(In the future, we can be more precise, for operators that free their
work memory as soon as they have no more rows to produce.)

Dividing backend_work_mem up and distributing it to the query's
operators would, I think, work something like the following, using a
two-phase algorithm:

1. Planner records the nbytes it estimated, for a given Path, on that
Path. However, it would continue to make its costing (and
partitioning, for parallel Hash Join) decisions, same as before, using
"work_mem [* hash_mem_multiplier]". (So "work_mem" behaves sort of
like a hint to the optimizer.)

2. At runtime, executor checks backend_work_mem. It sums up all the
nbytes fields from all the query's Paths, and compares to
backend_work_mem. And then it sets each node's work_mem field to some
fraction of the total backend_work_mem, based on nbytes and work_mem.

3. Each node then runs exactly as it does now, except instead of
checking the global work_mem [* hash_mem_multiplier] GUCs, it checks
its own work_mem field.

The "proportion" chosen in (2) should probably take into account
nbytes, work_mem, hash_mem_multiplier, etc. I think we'd want it to
overprovision work memory for nodes/paths with small nbytes, to
mitigate error in planner estimates.

James



pgsql-hackers by date:

Previous
From: Tatsuo Ishii
Date:
Subject: Re: Row pattern recognition
Next
From: James Hunter
Date:
Subject: Re: Add the ability to limit the amount of memory that can be allocated to backends.