Re: generic plans and "initial" pruning - Mailing list pgsql-hackers

From Amul Sul
Subject Re: generic plans and "initial" pruning
Date
Msg-id CAAJ_b95Yf-rY0f0KQC+iWY9xzBZ74VhOgpF-sZ2CsYjLj9-xNw@mail.gmail.com
Whole thread Raw
In response to Re: generic plans and "initial" pruning  (Amit Langote <amitlangote09@gmail.com>)
Responses Re: generic plans and "initial" pruning
List pgsql-hackers
On Fri, Dec 31, 2021 at 7:56 AM Amit Langote <amitlangote09@gmail.com> wrote:
>
> On Tue, Dec 28, 2021 at 22:12 Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
>>
>> On Sat, Dec 25, 2021 at 9:06 AM Amit Langote <amitlangote09@gmail.com> wrote:
>> >
>> > Executing generic plans involving partitions is known to become slower
>> > as partition count grows due to a number of bottlenecks, with
>> > AcquireExecutorLocks() showing at the top in profiles.
>> >
>> > Previous attempt at solving that problem was by David Rowley [1],
>> > where he proposed delaying locking of *all* partitions appearing under
>> > an Append/MergeAppend until "initial" pruning is done during the
>> > executor initialization phase.  A problem with that approach that he
>> > has described in [2] is that leaving partitions unlocked can lead to
>> > race conditions where the Plan node belonging to a partition can be
>> > invalidated when a concurrent session successfully alters the
>> > partition between AcquireExecutorLocks() saying the plan is okay to
>> > execute and then actually executing it.
>> >
>> > However, using an idea that Robert suggested to me off-list a little
>> > while back, it seems possible to determine the set of partitions that
>> > we can safely skip locking.  The idea is to look at the "initial" or
>> > "pre-execution" pruning instructions contained in a given Append or
>> > MergeAppend node when AcquireExecutorLocks() is collecting the
>> > relations to lock and consider relations from only those sub-nodes
>> > that survive performing those instructions.   I've attempted
>> > implementing that idea in the attached patch.
>> >
>>
>> In which cases, we will have "pre-execution" pruning instructions that
>> can be used to skip locking partitions? Can you please give a few
>> examples where this approach will be useful?
>
>
> This is mainly to be useful for prepared queries, so something like:
>
> prepare q as select * from partitioned_table where key = $1;
>
> And that too when execute q(…) uses a generic plan. Generic plans are problematic because it must contain nodes for
allpartitions (without any plan time pruning), which means CheckCachedPlan() has to spend time proportional to the
numberof partitions to determine that the plan is still usable / has not been invalidated; most of that is
AcquireExecutorLocks().
>
> Other bottlenecks, not addressed in this patch, pertain to some executor startup/shutdown subroutines that process
therange table of a PlannedStmt in its entirety, whose length is also proportional to the number of partitions when the
planis generic. 
>
>> The benchmark is showing good results, indeed.
>
Indeed.

Here are few comments for v1 patch:

+   /* Caller error if we get here without contains_init_steps */
+   Assert(pruneinfo->contains_init_steps);

-       prunedata = prunestate->partprunedata[i];
-       pprune = &prunedata->partrelprunedata[0];

-       /* Perform pruning without using PARAM_EXEC Params */
-       find_matching_subplans_recurse(prunedata, pprune, true, &result);
+   if (parentrelids)
+       *parentrelids = NULL;

You got two blank lines after Assert.
--

+   /* Set up EState if not in the executor proper. */
+   if (estate == NULL)
+   {
+       estate = CreateExecutorState();
+       estate->es_param_list_info = params;
+       free_estate = true;
    }

... [Skip]

+   if (free_estate)
+   {
+       FreeExecutorState(estate);
+       estate = NULL;
    }

I think this work should be left to the caller.
--

    /*
     * Stuff that follows matches exactly what ExecCreatePartitionPruneState()
     * does, except we don't need a PartitionPruneState here, so don't call
     * that function.
     *
     * XXX some refactoring might be good.
     */

+1, while doing it would be nice if foreach_current_index() is used
instead of the i & j sequence in the respective foreach() block, IMO.
--

+                   while ((i = bms_next_member(validsubplans, i)) >= 0)
+                   {
+                       Plan   *subplan = list_nth(subplans, i);
+
+                       context->relations =
+                           bms_add_members(context->relations,
+                                           get_plan_scanrelids(subplan));
+                   }

I think instead of get_plan_scanrelids() the
GetLockableRelations_worker() can be used; if so, then no need to add
get_plan_scanrelids() function.
--

     /* Nodes containing prunable subnodes. */
+       case T_MergeAppend:
+           {
+               PlannedStmt *plannedstmt = context->plannedstmt;
+               List       *rtable = plannedstmt->rtable;
+               ParamListInfo params = context->params;
+               PartitionPruneInfo *pruneinfo;
+               Bitmapset  *validsubplans;
+               Bitmapset  *parentrelids;

...
                if (pruneinfo && pruneinfo->contains_init_steps)
                {
                    int     i;
...
                   return false;
                }
            }
            break;

Most of the declarations need to be moved inside the if-block.

Also, initially, I was a bit concerned regarding this code block
inside GetLockableRelations_worker(), what if (pruneinfo &&
pruneinfo->contains_init_steps) evaluated to false? After debugging I
realized that plan_tree_walker() will do the needful -- a bit of
comment would have helped.
--

+       case T_CustomScan:
+           foreach(lc, ((CustomScan *) plan)->custom_plans)
+           {
+               if (walker((Plan *) lfirst(lc), context))
+                   return true;
+           }
+           break;

Why not plan_walk_members() call like other nodes?

Regards,
Amul



pgsql-hackers by date:

Previous
From: Amit Langote
Date:
Subject: Re: sqlsmith: ERROR: XX000: bogus varno: 2
Next
From: SATYANARAYANA NARLAPURAM
Date:
Subject: Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes