Re: BUG #16040: PL/PGSQL RETURN QUERY statement never uses a parallel plan - Mailing list pgsql-bugs

From Pavel Stehule
Subject Re: BUG #16040: PL/PGSQL RETURN QUERY statement never uses a parallel plan
Date
Msg-id CAFj8pRBfi7fNtRLsAsCo0Y-mm+X6Y=9Z1ce8bocbWP5MQ65irQ@mail.gmail.com
Whole thread Raw
In response to Re: BUG #16040: PL/PGSQL RETURN QUERY statement never uses a parallel plan  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs


ne 22. 3. 2020 v 4:23 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
PG Bug reporting form <noreply@postgresql.org> writes:
> [ $SUBJECT ]

I got around to looking at this today, and what I find is that the
problem is that exec_stmt_return_query() uses a portal (i.e. a cursor)
to read the results of the query.  That seemed like a good idea, back
in the late bronze age, because it allowed plpgsql to fetch the query
results a few rows at a time and not risk blowing out memory with a huge
SPI result.  However, the parallel-query infrastructure refuses to
parallelize when the query is being read via a cursor.

I think that the latter restriction is probably sane, because we don't
want to suspend execution of a parallel query while we've got worker
processes waiting.  And there might be some implementation restrictions
lurking under it too --- that's not a part of the code I know in any
detail.

However, there's no fundamental reason why exec_stmt_return_query has
to use a cursor.  It's going to run the query to completion immediately
anyway, and shove all the result rows into a tuplestore.  What we lack
is a way to get the SPI query to pass its results directly to a
tuplestore, without the SPITupleTable intermediary.  (Note that the
tuplestore can spill a large result to disk, whereas SPITupleTable
can't do that.)

So, attached is a draft patch to enable that.  By getting rid of the
intermediate SPITupleTable, this should improve the performance of
RETURN QUERY somewhat even without considering the possibility of
parallelizing the source query.  I've not tried to measure that though.
I've also not looked for other places that could use this new
infrastructure, but there may well be some.

One thing I'm not totally pleased about with this is adding another
SPI interface routine using the old parameter-values API (that is,
null flags as char ' '/'n').  That was the path of least resistance
given the other moving parts in pl_exec.c and spi.c, but maybe we
should try to modernize that before we set it in stone.

Another thing standing between this patch and committability is suitable
additions to the SPI documentation.  But I saw no value in writing that
before the previous point is settled.

I will go add this to the next commitfest (for v14), but I wonder
if we should try to squeeze it into v13?  This isn't the only
complaint we've gotten about non-parallelizability of RETURN QUERY.

+1

Pavel


                        regards, tom lane

pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: BUG #16040: PL/PGSQL RETURN QUERY statement never uses a parallel plan
Next
From: Andres Freund
Date:
Subject: Re: BUG #16293: postgres segfaults and returns SQLSTATE 08006