Re: generic plans and "initial" pruning - Mailing list pgsql-hackers
From: Amit Langote
Subject: Re: generic plans and "initial" pruning
Msg-id: CA+HiwqH8N-SxEB6SysEBsYNgV_KJs66k9Z2SNmqVzbBP-60yWg@mail.gmail.com
In response to: Re: generic plans and "initial" pruning (Amit Langote <amitlangote09@gmail.com>)
Responses: Re: generic plans and "initial" pruning
List: pgsql-hackers
Hi Tomas,

On Mon, Dec 2, 2024 at 3:36 AM Tomas Vondra <tomas@vondra.me> wrote:
> Hi,
>
> I took a look at this patch, mostly to familiarize myself with the
> pruning etc. I have a bunch of comments, but all of that is minor,
> perhaps even nitpicking - with prior feedback from David, Tom and
> Robert, I can't really compete with that.

Thanks for looking at this. These are helpful.

> FWIW the patch needs a rebase, there's a minor bitrot - but it was
> simply enough to fix for a review / testing.
>
>
> 0001
> ----
>
> 1) But if we don't expect this error to actually happen, do we really
> need to make it ereport()? Maybe it should be plain elog(). I mean, it's
> "can't happen" and thus doesn't need translations etc.
>
> if (!bms_equal(relids, pruneinfo->relids))
>     ereport(ERROR,
>             errcode(ERRCODE_INTERNAL_ERROR),
>             errmsg_internal("mismatching PartitionPruneInfo found at part_prune_index %d",
>                             part_prune_index),
>             errdetail_internal("plan node relids %s, pruneinfo relids %s",
>                                bmsToString(relids),
>                                bmsToString(pruneinfo->relids)));

I'm fine with elog() here even if it causes the message to be longer:

elog(ERROR, "mismatching PartitionPruneInfo found at part_prune_index %d (plan node relids %s, pruneinfo relids %s)",
     part_prune_index, bmsToString(relids), bmsToString(pruneinfo->relids));

> Perhaps it should even be an assert?

I am not sure about that. Having a message handy might be good if a user
ends up hitting this case for whatever reason, like trying to run a
corrupted plan.

> 2) unnecessary newline added to execPartition.h

Perhaps you meant "removed". Fixed.

> 3) this comment in EState doesn't seem very helpful
>
> List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */

Agreed, fixed to be like the comment for es_rteperminfos:

List *es_part_prune_infos; /* List of PartitionPruneInfo */

> 5) PlannerGlobal
>
> /* List of PartitionPruneInfo contained in the plan */
> List *partPruneInfos;
>
> Why does this say "contained in the plan" unlike the other fields? Is
> there some sort of difference? I'm not saying it's wrong.
Ok, maybe the following is a bit more helpful and like the comment for
other fields:

/* "flat" list of PartitionPruneInfos */
List *partPruneInfos;

> 0002
> ----
>
> 1) Isn't it weird/undesirable partkey_datum_from_expr() loses some of
> the asserts? Would the assert be incorrect in the new implementation, or
> are we removing it simply because we happen to not have one of the fields?

The former -- the asserts would be incorrect in the new implementation --
because the new implementation uses a standalone ExprContext, independent
of the parent PlanState (when available), for both types of runtime
pruning. The old asserts, particularly the second one, weren't asserting
something very useful anyway, IMO. What I mean is that it isn't critical
to the code that follows that the ExprContext provided in the
PartitionPruneContext be the same as the parent PlanState's
ps_ExprContext, nor that the PlanState be available at all.

> 2) inconsistent spelling: run-time vs. runtime

I assume you meant in this comment:

 * estate   The EState for the query doing runtime pruning

Fixed by using "run-time", which is a more commonly used term in the
source code than "runtime".

> 3) PartitionPruneContext.is_valid - I think I'd rename the flag to
> "initialized" or something like that. The "is_valid" is a bit confusing,
> because it might seem the context can get invalidated later, but AFAICS
> that's not the case - we just initialize it lazily.

Agree that "initialized" is better, so renamed.

> 0003
> ----
>
> 1) In InitPlan I'd move
>
> estate->es_part_prune_infos = plannedstmt->partPruneInfos;
>
> before the comment, which is more about ExecDoInitialPruning.

Makes sense, done.

> 2) I'm not quite sure what "exec" partition pruning is?
>
> /*
>  * ExecInitPartitionPruning
>  *    Initialize the data structures needed for runtime "exec" partition
>  *    pruning and return the result of initial pruning, if available.
>
> Is that the same thing as "runtime pruning"?
"Exec" pruning refers to pruning performed during execution, using
PARAM_EXEC parameters. In contrast, "init" pruning occurs during plan
initialization, using parameters whose values remain constant during
execution, such as PARAM_EXTERN parameters and stable functions.

Before this patch, the ExecInitPartitionPruning function, called during
ExecutorStart(), performed "init" pruning and set up state in the
PartitionPruneState for subsequent "exec" pruning during ExecutorRun().
With this patch, "init" pruning is performed well before this function is
called, leaving it solely responsible for setting up the state for "exec"
pruning. It may be worth renaming the function to better reflect this new
role, rather than updating only the comment. Actually, that is what I
decided to do in the attached, along with some other adjustments like
moving ExecDoInitialPruning() to execPartition.c from execMain.c, fixing
up some obsolete comments, etc.

> 0004
> ----
>
> 1) typo: paraller/parallel

Oops, fixed.

> 2) What about adding an assert to ExecFindMatchingSubPlans, to check
> valisubplan_rtis is not NULL? It's just mentioned in a comment, but
> better to explicitly enforce that?

Good idea, done.

> 2) It may not be quite clear why ExecInitUpdateProjection() switches to
> mt_updateColnosLists. Should that be explained in a comment, somewhere?

There is a comment in the ModifyTableState struct definition:

/*
 * List of valid updateColnosLists. Contains only those belonging to
 * unpruned relations from ModifyTable.updateColnosLists.
 */
List *mt_updateColnosLists;

It seems redundant to reiterate this in ExecInitUpdateProjection().

> 3) unnecessary newline in ExecLookupResultRelByOid

Removed.

> 0005
> ----
>
> 1) auto_explain.c - So what happens if the plan gets invalidated? The
> hook explain_ExecutorStart returns early, but then what? Does that break
> the user session somehow, or what?
It will get called again after ExecutorStartExt() loops back to do
ExecutorStart() with a new updated plan tree.

> 2) Isn't it a bit fragile if this requires every extension to update
> and add the ExecPlanStillValid() calls to various places?

The ExecPlanStillValid() call only needs to be added immediately after
the call to standard_ExecutorStart() in an extension's
ExecutorStart_hook() implementation.

> What if an
> extension doesn't do that? What weirdness will happen?

For starters, QueryDesc.planstate won't contain a PlanState tree, nor
will the EState contain the other state information that InitPlan()
populates based on the PlannedStmt.

> Maybe it'd be
> possible to at least check this in some other executor hook? Or at least
> we could ensure the check was done in assert-enabled builds? Or
> something to make extension authors aware of this?

I've added a note in the commit message, but if that's not enough, one
idea might be to change the return type of ExecutorStart_hook so that
extensions implementing it are forced to be adjusted -- say, from void to
bool, to indicate whether standard_ExecutorStart() succeeded and thus
created a "valid" plan. I had that in previous versions of the patch.
Thoughts?

> Aside from going through the patches, I did a simple benchmark to see
> how this works in practice. I did a simple test, with pgbench -S and
> variable number of partitions/clients. I also varied the number of locks
> per transaction, because I was wondering if it may interact with the
> fast-path improvements. See the attached xeon.sh script and CSV with
> results from the 44/88-core machine.
>
> There's also two PDFs visualizing the results, to show the impact as a
> difference between "master" (no patches) vs. "pruning" build with v57
> applied. As usual, "green" is good (faster), read is "bad" (slower).
>
> For most combinations of parameters, there's no impact on throughput.
> Anything in 99-101% is just regular noise, possibly even more.
> I'm trying to reduce the noise a bit more, but this seems acceptable.
> I'd like to discuss three "cases" I see in the results:

Thanks for doing these benchmarks. I'll reply separately to discuss the
individual cases.

> costing / auto mode
> -------------------
>
> Anyway, this leads me to a related question - not quite a "bug" in the
> patch, but something to perhaps think about. And that's costing, and
> what "auto" should do.
>
> There are two PNG charts, showing throughput for runs with -M prepared
> and 1000 partitions. Each chart shows throughput for the three cache
> modes, and different client counts. There's a clear distinction between
> "master" and "patched" runs - the "generic" plans performed terribly, by
> orders of magnitude. With the patches it beats the "custom" plans.
>
> Which is great! But it also means that while "auto" used to do the right
> thing, with the patches that's not the case.
>
> AFAIK that's because we don't consider the runtime pruning when costing
> the plans, so the cost is calculated as if no pruning happened. And so
> it seems way more expensive than it should ... and it loses with the
> custom scans. Is that correct, or do I understand this wrong?

That's correct. The planner does not consider runtime pruning when
assigning costs to Append or MergeAppend paths in
create_{merge}append_path().

> Just to be clear, I'm not claiming the patch has to deal with this. I
> suppose it can be handled as a future improvement, and I'm not even sure
> there's a good way to consider this during costing. For example, can we
> estimate how many partitions will be pruned?

There have been discussions about this in the 2017 development thread of
run-time pruning [1] and likely at some later point in other threads. One
simple approach mentioned at [1] is to consider that only 1 partition
will be scanned for queries containing WHERE partkey = $1, because only
1 partition can contain matching rows with that condition.
I agree that this should be dealt with sooner rather than later, so that
users get generic plans even without having to use force_generic_plan.

I'll post the updated patches tomorrow.

--
Thanks, Amit Langote

[1] https://www.postgresql.org/message-id/CA%2BTgmoZv8sd9cKyYtHwmd_13%2BBAjkVKo%3DECe7G98tBK5Ejwatw%40mail.gmail.com