From: James Hunter
Subject: Re: Proposal: "query_work_mem" GUC, to distribute working memory to the query's individual operators
Msg-id: CAJVSvF5n3_uEGW5GZSRehDuTfz7XVDohbn7tVJ+2ZnweQEVFrQ@mail.gmail.com
In response to: Re: Proposal: "query_work_mem" GUC, to distribute working memory to the query's individual operators (James Hunter <james.hunter.pg@gmail.com>)
List: pgsql-hackers

Attaching a new revision, which substantially reworks the previous one.

For the previous revision, I ran into problems (exposed by CI tests)
when trying to get my "subPlan" list to work, because it meant we
had two pointers into a single SubPlan, which breaks both
serialization and copyObject().

This led to a new approach. The former Patch 1 is no longer needed,
because that "subPlan" logic never worked anyway.

Now I store the workmem info in Lists, first on the PlannerGlobal and
then transferred to the PlannedStmt. Every [Sub]Plan that needs
working memory now gets a "workmem_id" index into these Lists. Since
it's just an index, it survives serialization and copyObject().
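
To illustrate why an index is safer here than a second pointer, this is a
toy sketch in plain C, not code from the patch. ToyPlan, ToyPlannedStmt,
WorkMemEntry, and the 1-based workmem_id convention (0 meaning "no working
memory") are all assumptions made up for the example; the real structs
live in the PostgreSQL source tree and look different.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; NOT the actual PostgreSQL structs. */
typedef struct WorkMemEntry
{
    int         estimate_kb;    /* planner's estimate */
    int         limit_kb;       /* filled in at executor startup */
} WorkMemEntry;

typedef struct ToyPlan
{
    int         workmem_id;     /* assumed 1-based index; 0 = none */
} ToyPlan;

typedef struct ToyPlannedStmt
{
    int         nentries;
    WorkMemEntry *entries;      /* top-level list of workmem info */
    ToyPlan     plan;
} ToyPlannedStmt;

/*
 * Deep copy, in the spirit of copyObject(): the entries array is
 * duplicated, and the integer index is copied by value, so it stays
 * valid in the copy without any pointer fix-up.
 */
static ToyPlannedStmt *
toy_copy_stmt(const ToyPlannedStmt *src)
{
    ToyPlannedStmt *dst;

    assert(src->nentries > 0);
    dst = malloc(sizeof(ToyPlannedStmt));
    assert(dst != NULL);
    dst->nentries = src->nentries;
    dst->entries = malloc(sizeof(WorkMemEntry) * src->nentries);
    assert(dst->entries != NULL);
    memcpy(dst->entries, src->entries,
           sizeof(WorkMemEntry) * src->nentries);
    dst->plan = src->plan;
    return dst;
}

/* Resolve a plan's workmem_id against the statement that owns it. */
static WorkMemEntry *
toy_lookup(ToyPlannedStmt *stmt, const ToyPlan *plan)
{
    assert(plan->workmem_id >= 1 && plan->workmem_id <= stmt->nentries);
    return &stmt->entries[plan->workmem_id - 1];
}
```

After a copy, toy_lookup() on the copy resolves to the copy's own entry,
which is the property a second raw pointer into the same node cannot give.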

So the workmem info can now be successfully round-tripped. It also
makes it easier (and faster) for an extension to adjust workmem limits
for an entire query, since all of the query's workmem info is
available directly from the PlannedStmt -- without requiring us to
traverse the Plan + Expr trees. (My example hook/extension shrank by
a couple hundred LoC since the previous revision, because now it can
just loop over a List, instead of needing to walk a Plan tree.)

So, now we have:

 - Patch 1: adds a workmem limit to the PlannerGlobal, inside
createplan.c, and stores the corresponding workmem_id on the Plan or
SubPlan. The List is copied from the PlannerGlobal to the PlannedStmt,
as normal. We trivially set the workmem limit inside
ExecAssignWorkMem(), called from InitPlan.

This patch is a no-op, since it just copies existing GUC values to the
workmem limit, and then applies that limit inside ExecInitNode().

 - Patch 2: copies the planner's workmem estimate to the PlannerGlobal
/ PlannedStmt, to allow an extension to set the workmem limit
intelligently (without needing to traverse the Plan or SubPlan).

This patch is a no-op, since it just records an estimate on the
PlannerGlobal / PlannedStmt, but doesn't do anything with it (yet).

 - Patch 3: displays the workmem info we set in Patches 1 and 2, via a
new EXPLAIN (work_mem on) option. Also adds a unit test.

 - Patch 4: adds a hook and extension that show how to override the
default workmem limits, to implement a query_work_mem GUC.
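
As a rough picture of the "no-op" default in Patch 1, here is a
standalone C sketch, not the patch's ExecAssignWorkMem(). The
WorkMemSlot struct, the uses_hash flag, and toy_assign_default_limits()
are hypothetical names. It assumes the "existing GUC values" are
work_mem, plus hash_mem_multiplier for hash-based data structures; that
split is my reading of current PostgreSQL behavior rather than something
stated above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical slot for one data structure's working-memory limit. */
typedef struct WorkMemSlot
{
    int         uses_hash;      /* assumption: hash-based structures
                                 * scale by hash_mem_multiplier */
    int         limit_kb;       /* limit to be filled in */
} WorkMemSlot;

/*
 * Fill every slot from the GUC values alone.  No estimate is consulted,
 * so the resulting limits match what each node would have used anyway.
 */
static void
toy_assign_default_limits(WorkMemSlot *slots, size_t nslots,
                          int work_mem_kb, double hash_mem_multiplier)
{
    assert(work_mem_kb > 0 && hash_mem_multiplier >= 1.0);

    for (size_t i = 0; i < nslots; i++)
    {
        if (slots[i].uses_hash)
            slots[i].limit_kb = (int) (work_mem_kb * hash_mem_multiplier);
        else
            slots[i].limit_kb = work_mem_kb;
    }
}
```

A hook, as in Patch 4, would replace this loop with one that also looks
at the estimates from Patch 2.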

I think this version is pretty close to a finished design proposal:

 * top-level list(s) of workmem info;
 * Plans and SubPlans that need workmem "registering" themselves
during createplan.c;
 * exec nodes reading their workmem limits from the PlannedStmt, via
plan->workmem_id (or variants, in cases where a [Sub]Plan has multiple
data structures of *different* sizes);
 * InitPlan() calls a function or hook to fill in the actual workmem limits;
 * Workmem info copied / serialized to PQ workers, and stored in Plan
cache (but the limit is always overwritten inside InitPlan()); and
 * Hook / extension reads the workmem info and sets a sensible limit,
based on its own heuristic.

Patch 4 shows that we can pretty easily (400 lines, including
comments) propagate a per-query workmem limit to individual
[Sub]Plans' data structures, in a reasonable way.

Compared to the previous revision, this patch set:
 - eliminates the Plan traversal in execWorkMem.c and workmem.c;
 - removes the "SubPlan" logic from setrefs.c, leaving setrefs unchanged; and
 - sets the estimate and reserves a slot for the limit, inside createplan.c.

So now the logic to assign workmem limits is just a for-loop in
execWorkMem.c; and it's just 2 for-loops + 1 sort in the workmem
extension.
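
To give a feel for the "2 for-loops + 1 sort" shape, here is a
self-contained C sketch. It is not the heuristic from the attached
extension: the smallest-estimate-first, equal-share policy below is one
I made up for illustration, and ToyWorkMem, toy_distribute(), and the
min_kb floor are hypothetical names and parameters.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-data-structure workmem info. */
typedef struct ToyWorkMem
{
    long        estimate_kb;    /* planner's estimate */
    long        limit_kb;       /* limit chosen by the extension */
} ToyWorkMem;

/* qsort comparator over pointers: ascending by estimate. */
static int
cmp_by_estimate(const void *a, const void *b)
{
    const ToyWorkMem *x = *(const ToyWorkMem *const *) a;
    const ToyWorkMem *y = *(const ToyWorkMem *const *) b;

    return (x->estimate_kb > y->estimate_kb) -
           (x->estimate_kb < y->estimate_kb);
}

/*
 * Split query_work_mem_kb across all entries.  Entries are visited
 * smallest estimate first; each gets min(estimate, equal share of what
 * is left), but never less than min_kb.  Sorting an array of pointers
 * leaves the entries in place, so index-based lookups stay valid.
 */
static void
toy_distribute(ToyWorkMem *entries, size_t n,
               long query_work_mem_kb, long min_kb)
{
    ToyWorkMem **order;
    long        remaining = query_work_mem_kb;

    if (n == 0)
        return;
    order = malloc(n * sizeof(*order));
    assert(order != NULL);

    /* Loop 1: collect pointers to the entries. */
    for (size_t i = 0; i < n; i++)
        order[i] = &entries[i];

    qsort(order, n, sizeof(*order), cmp_by_estimate);

    /* Loop 2: hand out limits, smallest estimate first. */
    for (size_t i = 0; i < n; i++)
    {
        long        share = remaining / (long) (n - i);
        long        grant = order[i]->estimate_kb < share ?
                            order[i]->estimate_kb : share;

        if (grant < min_kb)
            grant = min_kb;
        order[i]->limit_kb = grant;
        remaining -= grant;
        if (remaining < 0)
            remaining = 0;
    }
    free(order);
}
```

With estimates of 1000, 100, and 5000 kB and a 3000 kB budget, this
policy grants 1000, 100, and 1900 kB. Note that the min_kb floor can push
the total above the budget when the budget is very small; whether that is
acceptable is a policy decision for the extension.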

Questions, comments?

Thanks,
James
