Re: Assert failure on running a completed portal again
From: Robert Haas
Subject: Re: Assert failure on running a completed portal again
Msg-id: CA+TgmoaOmbdK7vS=Wxovd0SJzPV1s1Sabbv5WaFHR_uXnjScOQ@mail.gmail.com
In response to: Re: Assert failure on running a completed portal again (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
On Tue, Dec 10, 2024 at 12:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Did you look at the commit message for
> > 691b8d59281b5177f16fe80858df921f77a8e955?  IMHO, that's pretty clear
> > about what the goals of this were.
>
> Well, it's not very clear about what implementation limitations we
> are trying to protect.

OK, I see that I was vague about that. The core problem here is that
when we start parallel workers, we do a one-time state transfer from
the leader to the workers. If the leader makes any changes to any of
that state, or the workers do, then they're out of sync, and very bad
things will happen. To prevent that, we need a mechanism that keeps
any of that state from changing while there are active workers, and
that mechanism is parallel mode. The leader enters parallel mode (via
EnterParallelMode()) before launching any workers and must exit it
only after they're all gone; the workers run in parallel mode the
whole time. Hence, we are (hopefully) guaranteed that the relevant
state can't change under us.

Hence, the most fundamental restriction here is that ExecutePlan() had
better not reach the call to ExitParallelMode() at the end of the
function while there are still workers active. Since workers don't
exit until they've run their portion of the plan to completion (except
in case of error), we need to run the plan to completion in the leader
too, before ExecutePlan() escapes from its loop; otherwise, some
worker could have filled up its output queue with tuples and then
blocked waiting for us to read them, which we never would. (See the
first sketch at the end of this message for the shape of that code.)

In the case at issue, if the plan was already run to completion, then
there shouldn't be any workers active any more, and trying to
re-execute the completed tree should not launch any new ones, so in
theory there's no problem, but...

> Hmm.  As committed, the re-execution will happen with
> use_parallel_mode disabled, which might make that safer, or maybe it's
> less safe?  Do you think we need something closer to the upthread
> proposal to not run the executor at all during the "re-execution"?
> (I'd still be inclined to put that short-circuit into ExecutePlan
> not anything higher-level.)

...hypothetically, there could be code that doesn't enjoy being called
on the same plan tree first in parallel mode and then later in
non-parallel mode. I can't really see any good reason for such code to
exist. For example, ExecGather() bases its decision about whether to
launch workers on gather->num_workers > 0 &&
estate->es_use_parallel_mode, but it looks to me like all the
decisions about whether to read from workers or close down workers are
based on which workers actually got launched, without further
reference to the criteria that ExecGather() used to decide whether to
launch them in the first place (see the second sketch below). That
seems like the right way for things to be coded, and I think that
probably everything actually is coded that way. I just wanted to point
out that it isn't theoretically impossible for this change to run into
some lower-level problem.

> My recollection is that even before parallelism we had some plan node
> types that didn't cope well with being called again after they'd once
> returned EOF (ie a null tuple).  So maybe a defense against trying to
> do that would be wise.  It could probably be as simple as checking
> estate->es_finished.

It's not a crazy idea, but it might be better to fix those node types.
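To make the above concrete, here is roughly the shape of the code in
question -- a simplified sketch of ExecutePlan(), paraphrased from
memory rather than copied from execMain.c:

/*
 * Heavily simplified sketch of ExecutePlan().  Most arguments and all
 * error handling are elided.  The thing to notice is that
 * ExitParallelMode() sits after the loop, so any path that escapes
 * the loop early had better not leave workers still running.
 */
static void
ExecutePlan(EState *estate, PlanState *planstate,
            bool use_parallel_mode, uint64 numberTuples)
{
    uint64      current_tuple_count = 0;

    if (use_parallel_mode)
        EnterParallelMode();

    for (;;)
    {
        TupleTableSlot *slot = ExecProcNode(planstate);

        if (TupIsNull(slot))
            break;          /* plan ran to completion; workers are done */

        /* ... send the tuple to the destination ... */

        current_tuple_count++;
        if (numberTuples && numberTuples == current_tuple_count)
            break;          /* early exit: the case we must not reach
                             * with workers still active */
    }

    if (use_parallel_mode)
        ExitParallelMode();
}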
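The second sketch is the ExecGather() split I described -- again a
paraphrase of nodeGather.c, not the exact code:

/*
 * Launching is conditioned on the plan and on es_use_parallel_mode...
 */
if (gather->num_workers > 0 && estate->es_use_parallel_mode)
{
    ParallelContext *pcxt;

    /* ... create or reinitialize the parallel context ... */
    LaunchParallelWorkers(pcxt);
    node->nworkers_launched = pcxt->nworkers_launched;
    /* ... set up one tuple queue reader per launched worker ... */
}

/*
 * ... whereas reading from the workers and shutting them down look
 * only at node->nworkers_launched (and the readers created above),
 * i.e. at what actually got launched, never again at the criteria
 * that controlled the launch decision.
 */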
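As for the es_finished defense, I imagine it would amount to only a
few lines at the top of ExecutePlan() -- a hypothetical sketch, with
the caveat that I haven't checked that es_finished (which
ExecutorFinish sets) has the right lifetime for this:

/* Refuse to run a plan again once it has been marked finished. */
if (estate->es_finished)
    return;

But, as I say, fixing the offending node types seems like the better
investment.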
It might just be me, but I find the comments in the executor to be
pretty lacking, and there's a good amount of boilerplate that seems to
have been copy-and-pasted around without the author totally
understanding what was happening. That never gets better if we just
paper over the bugs.

--
Robert Haas
EDB: http://www.enterprisedb.com