Thread: Re: pgsql: Don't enter parallel mode when holding interrupts.

Re: pgsql: Don't enter parallel mode when holding interrupts.

From: Robert Haas
On Wed, Sep 18, 2024 at 3:27 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> On Wed, 2024-09-18 at 02:58 +0000, Noah Misch wrote:
> > Don't enter parallel mode when holding interrupts.
> >
> > Doing so caused the leader to hang in wait_event=ParallelFinish, which
> > required an immediate shutdown to resolve.  Back-patch to v12 (all
> > supported versions).
> >
> > Francesco Degrassi
> >
> > Discussion: https://postgr.es/m/CAC-SaSzHUKT=vZJ8MPxYdC_URPfax+yoA1hKTcF4ROz_Q6z0_Q@mail.gmail.com
>
> Does that warrant mention on this page?
> https://www.postgresql.org/docs/current/when-can-parallel-query-be-used.html

IMHO, no. This seems too low-level and too odd to mention.

TBH, I'm kind of surprised to learn that it's possible to start
executing a query while holding an LWLock. I see Tom is expressing
some doubts on the original thread, too. I wonder if we should instead
be erroring out if an LWLock is held at the start of query execution
-- or even earlier, like when we try to call a plpgsql function while
holding one. Leaving parallel query aside, what would prevent us from
attempting to reacquire the exact same LWLock that we already hold and
self-deadlocking? Or attempting to acquire some other LWLock and
deadlocking that way? I don't really feel like this is a parallel
query problem. I don't think we should be trying to run any
user-defined code while holding an LWLock, unless that code is written
in C (or C++, Rust, etc.). Trying to run procedural code at that point
doesn't seem reasonable.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: pgsql: Don't enter parallel mode when holding interrupts.

From: Noah Misch
On Thu, Sep 19, 2024 at 09:25:05AM -0400, Robert Haas wrote:
> On Wed, Sep 18, 2024 at 3:27 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> > On Wed, 2024-09-18 at 02:58 +0000, Noah Misch wrote:
> > > Don't enter parallel mode when holding interrupts.
> > >
> > > Doing so caused the leader to hang in wait_event=ParallelFinish, which
> > > required an immediate shutdown to resolve.  Back-patch to v12 (all
> > > supported versions).
> > >
> > > Francesco Degrassi
> > >
> > > Discussion: https://postgr.es/m/CAC-SaSzHUKT=vZJ8MPxYdC_URPfax+yoA1hKTcF4ROz_Q6z0_Q@mail.gmail.com
> >
> > Does that warrant mention on this page?
> > https://www.postgresql.org/docs/current/when-can-parallel-query-be-used.html
> 
> IMHO, no. This seems too low-level and too odd to mention.

Agreed.  If I were documenting it, I would document it with the material for
writing opclasses.  It's probably too esoteric to document even there.

> TBH, I'm kind of surprised to learn that it's possible to start
> executing a query while holding an LWLock. I see Tom is expressing
> some doubts on the original thread, too. I wonder if we should instead
> be erroring out if an LWLock is held at the start of query execution
> -- or even earlier, like when we try to call a plpgsql function while
> holding one. Leaving parallel query aside, what would prevent us from
> attempting to reacquire the exact same LWLock that we already hold and
> self-deadlocking? Or attempting to acquire some other LWLock and
> deadlocking that way? I don't really feel like this is a parallel
> query problem. I don't think we should be trying to run any
> user-defined code while holding an LWLock, unless that code is written
> in C (or C++, Rust, etc.). Trying to run procedural code at that point
> doesn't seem reasonable.

Nothing prevents those lwlock deadlocks.  If you think it's worth breaking the
things folks use today (see original thread) in order to prevent that, please
do share that on the original thread.  I'm fine either way.  I think, given
infinite resources across both postgresql.org and all extension maintainers, I
would do what you're thinking in v18, while in back branches I would change
"erroring out" to "warn when assertions are enabled".  I also think it's a
low-priority bug, given the only known ways to reach it are C code or a custom
opclass.  Since resources aren't infinite, I'm inclined toward one of (a) stop
here or (b) in all branches, "warn when assertions are enabled" and maybe
block the plancache route discussed on the original thread.
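
For readers following the trade-off above, a toy model may help. The sketch
below is not PostgreSQL code and uses no PostgreSQL API: ToyLWLock,
held_locks, acquire, release, and run_user_code are all hypothetical names,
and Python's non-reentrant threading.Lock merely stands in for an LWLock. It
shows the self-deadlock from reacquiring a lock that is already held, and the
policies discussed: error out, warn, or do nothing when user-defined code
starts while a lock is held.

```python
import threading
import warnings


class ToyLWLock:
    """Toy stand-in for a non-reentrant lightweight lock (not PostgreSQL's LWLock)."""

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()  # threading.Lock is not reentrant

    def try_acquire(self):
        # Non-blocking, so the demo can report a would-be deadlock instead of hanging.
        return self._lock.acquire(blocking=False)

    def release(self):
        self._lock.release()


# Hypothetical bookkeeping: the toy locks this "backend" currently holds.
held_locks = []


def acquire(lock):
    """Acquire a toy lock, reporting a self-deadlock instead of blocking forever."""
    if not lock.try_acquire():
        if lock in held_locks:
            raise RuntimeError(f"self-deadlock: {lock.name} is already held")
        raise RuntimeError(f"{lock.name} is busy")
    held_locks.append(lock)


def release(lock):
    """Release a toy lock and drop it from the bookkeeping list."""
    held_locks.remove(lock)
    lock.release()


def run_user_code(func, policy="error"):
    """Run func; if any toy lock is held, apply the chosen policy first.

    policy="error" refuses to run, policy="warn" only warns, and any other
    value runs func unconditionally (the "stop here" option).
    """
    if held_locks:
        names = ", ".join(held.name for held in held_locks)
        if policy == "error":
            raise RuntimeError(f"refusing to run user code while holding: {names}")
        if policy == "warn":
            warnings.warn(f"running user code while holding: {names}")
    return func()


if __name__ == "__main__":
    lock = ToyLWLock("toy_lock")
    acquire(lock)
    try:
        acquire(lock)  # reacquiring the same lock would block forever
    except RuntimeError as exc:
        print(exc)
    try:
        run_user_code(lambda: "ran", policy="error")
    except RuntimeError as exc:
        print(exc)
    release(lock)
    print(run_user_code(lambda: "ran", policy="error"))
```

In this model the "warn" policy never changes behavior, which is roughly why
it suits back branches: existing callers keep working while the condition
becomes visible. The "error" policy changes behavior for any caller that
currently holds a lock.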