Re: initial pruning in parallel append - Mailing list pgsql-hackers

From Amit Langote
Subject Re: initial pruning in parallel append
Date
Msg-id CA+HiwqEUR6XTaSfbmU50XgZXPy2GXgrJ-juFuLFTTTY=ZoBXrQ@mail.gmail.com
Whole thread Raw
In response to Re: initial pruning in parallel append  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: initial pruning in parallel append
List pgsql-hackers
On Tue, Aug 8, 2023 at 11:16 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Aug 8, 2023 at 2:58 AM Amit Langote <amitlangote09@gmail.com> wrote:
> > Or we could consider something like the patch I mentioned in my 1st
> > email.  The idea there was to pass the pruning result via a separate
> > channel, not the DSM chunk linked into the PlanState tree.  To wit, on
> > the leader side, ExecInitParallelPlan() puts the serialized
> > List-of-Bitmapset into the shm_toc with a dedicated PARALLEL_KEY,
> > alongside PlannedStmt, ParamListInfo, etc.  The List-of-Bitmpaset is
> > initialized during the leader's ExecInitNode().  On the worker side,
> > ExecParallelGetQueryDesc() reads the List-of-Bitmapset string and puts
> > the resulting node into the QueryDesc, that ParallelQueryMain() then
> > uses to do ExecutorStart() which copies the pointer to
> > EState.es_part_prune_results.  ExecInitAppend() consults
> > EState.es_part_prune_results and uses the Bitmapset from there, if
> > present, instead of performing initial pruning.
>
> This also seems reasonable.
>
> > I'm assuming it's not
> > too ugly if ExecInitAppend() uses IsParallelWorker() to decide whether
> > it should be writing to EState.es_part_prune_results or reading from
> > it -- the former if in the leader and the latter in a worker.
>
> I don't think that's too ugly. I mean you have to have an if statement
> someplace.

Yes, that makes sense.

It's just that I thought maybe I haven't thought hard enough about
options before adding a new IsParallelWorker(), because I don't find
too many instances of IsParallelWorker() in the generic executor code.
I think that's because most parallel worker-specific logic lives in
execParallel.c or in Exec*Worker() functions outside that file, so the
generic code remains parallel query agnostic as much as possible.

> > If we
> > are to go with this approach we will need to un-revert ec386948948c,
> > which moved PartitionPruneInfo nodes out of Append/MergeAppend nodes
> > to a List in PlannedStmt (copied into EState.es_part_prune_infos),
> > such that es_part_prune_results mirrors es_part_prune_infos.
>
> The comment for the revert (which was
> 5472743d9e8583638a897b47558066167cc14583) points to
> https://www.postgresql.org/message-id/4191508.1674157166@sss.pgh.pa.us
> as the reason, but it's not very clear to me why that email led to
> this being reverted. In any event, I agree that if we go with your
> idea to pass this via a separate PARALLEL_KEY, unreverting that patch
> seems to make sense, because otherwise I think we don't have a fast
> way to find the nodes that contain the state that we care about.

OK, I've attached the unreverted patch that adds
EState.es_part_prune_infos as 0001.

0002 adds EState.es_part_prune_results.   Parallel query leader stores
the bitmapset of initially valid subplans by performing initial
pruning steps contained in a given PartitionPruneInfo into that list
at the same index as the PartitionPruneInfo's index in
es_part_prune_infos.  ExecInitParallelPlan() serializes
es_part_prune_results and stores it in the DSM.  A worker initializes
es_part_prune_results in its own EState by reading the leader's value
from the DSM and for each PartitionPruneInfo in its own copy of
EState.es_part_prune_infos, gets the set of initially valid subplans
by referring to es_part_prune_results in lieu of performing initial
pruning again.

Should workers, as Tom says, instead do the pruning and cross-check
the result to give an error if it doesn't match the leader's?  The
error message can't specifically point out which, though a user would
at least know that they have functions in their database with wrong
volatility markings.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

Attachment

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: logical decoding issue with concurrent ALTER TYPE
Next
From: David Rowley
Date:
Subject: Re: [PoC] Reducing planning time when tables have many partitions