Thread: [PATCH] Allow parallelism for plpgsql return expression after commit 556f7b7
[PATCH] Allow parallelism for plpgsql return expression after commit 556f7b7
#0 ExecutePlan (queryDesc=0x589c390, operation=CMD_SELECT, sendTuples=true, numberTuples=2, direction=ForwardScanDirection, dest=0xe15ca0 <spi_printtupDR>) at execMain.c:1654
#1 0x000000000075edb6 in standard_ExecutorRun (queryDesc=0x589c390, direction=ForwardScanDirection, count=2, execute_once=true) at execMain.c:366
#2 0x00007f5749c9b8d8 in explain_ExecutorRun (queryDesc=0x589c390, direction=ForwardScanDirection, count=2, execute_once=true) at auto_explain.c:334
#3 0x000000000075ec25 in ExecutorRun (queryDesc=0x589c390, direction=ForwardScanDirection, count=2, execute_once=true) at execMain.c:310
#4 0x00000000007c4a48 in _SPI_pquery (queryDesc=0x589c390, fire_triggers=true, tcount=2) at spi.c:2980
#5 0x00000000007c44a9 in _SPI_execute_plan (plan=0x5878780, options=0x7ffc6ad467e0, snapshot=0x0, crosscheck_snapshot=0x0, fire_triggers=true) at spi.c:2747
#6 0x00000000007c135f in SPI_execute_plan_with_paramlist (plan=0x5878780, params=0x0, read_only=false, tcount=2) at spi.c:765
#7 0x00007f5749eb4a8b in exec_run_select (estate=0x7ffc6ad46ba0, expr=0x5892b80, maxtuples=2, portalP=0x0) at pl_exec.c:5840 <-- maxtuples = 2
#8 0x00007f5749eb46fe in exec_eval_expr (estate=0x7ffc6ad46ba0, expr=0x5892b80, isNull=0x7ffc6ad46bc0, rettype=0x7ffc6ad46bc4, rettypmod=0x7ffc6ad468e8) at pl_exec.c:5734
Consider the following simple repro –
postgres=# create table test_tab(a int);
CREATE TABLE
postgres=# insert into test_tab (a) SELECT generate_series(1, 1000000);
INSERT 0 1000000
postgres=# analyse test_tab;
ANALYZE
postgres=# create function test_plpgsql() returns int
language plpgsql
as
$$
begin
return (select count(*) from test_tab where a between 5.0 and 999999.0);
end;
$$;
postgres=# LOAD 'auto_explain';
LOAD
postgres=# SET auto_explain.log_min_duration = 0;
SET
postgres=# SET auto_explain.log_analyze = true;
SET
postgres=# SET auto_explain.log_nested_statements = true;
SET
postgres=# select test_plpgsql();
test_plpgsql
--------------
999995
(1 row)
Plan logged in logfile -
Query Text: (select count(*) from test_tab where a between 5.0 and 999999.0)
Result (cost=13763.77..13763.78 rows=1 width=8) (actual time=912.108..912.110 rows=1 loops=1)
InitPlan 1
-> Finalize Aggregate (cost=13763.76..13763.77 rows=1 width=8) (actual time=912.103..912.104 rows=1 loops=1)
-> Gather (cost=13763.54..13763.75 rows=2 width=8) (actual time=912.096..912.098 rows=1 loops=1)
Workers Planned: 2
Workers Launched: 0
-> Partial Aggregate (cost=12763.54..12763.55 rows=1 width=8) (actual time=912.095..912.096 rows=1 loops=1)
-> Parallel Seq Scan on test_tab (cost=0.00..12758.33 rows=2083 width=0) (actual time=0.022..812.253 rows=999995 loops=1)
Filter: (((a)::numeric >= 5.0) AND ((a)::numeric <= 999999.0))
Rows Removed by Filter: 5
Patch to fix this issue is attached. Proposed fix should not cause any regression because the number of returned rows is anyway being checked later inside exec_eval_expr(…).
Plan logged after fix –
Query Text: (select count(*) from test_tab where a between 5.0 and 999999.0)
Result (cost=13763.77..13763.78 rows=1 width=8) (actual time=324.397..328.007 rows=1.00 loops=1)
InitPlan 1
-> Finalize Aggregate (cost=13763.76..13763.77 rows=1 width=8) (actual time=324.391..327.999 rows=1.00 loops=1)
-> Gather (cost=13763.54..13763.75 rows=2 width=8) (actual time=324.052..327.989 rows=3.00 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=12763.54..12763.55 rows=1 width=8) (actual time=320.254..320.255 rows=1.00 loops=3)
-> Parallel Seq Scan on test_tab (cost=0.00..12758.33 rows=2083 width=0) (actual time=0.029..286.410 rows=333331.67 loops=3)
Filter: (((a)::numeric >= 5.0) AND ((a)::numeric <= 999999.0))
Rows Removed by Filter: 2
Thanks & Regards,
Dipesh
Attachment
Re: [PATCH] Allow parallelism for plpgsql return expression after commit 556f7b7
On Mon, May 5, 2025 at 11:19 AM DIPESH DHAMELIYA <dipeshdhameliya125@gmail.com> wrote: > > Hello everyone, > > With the commit 556f7b7bc18d34ddec45392965c3b3038206bb62, Any plpgsql function that returns scalar value would not be ableto use parallelism to evaluate a return statement. It will not be considered for parallel execution because we are passingmaxtuples = 2 to exec_run_select from exec_eval_expr to evaluate the return expression of the function. > I could not find commit '556f7b7bc18d34ddec45392965c3b3038206bb62' in git log on the master branch, but here is my analysis after looking at your patch. I don't think we can remove the 'maxtuples' parameter from exec_run_select(). In this particular case, the query itself is returning a single tuple, so we are good. Still, in other cases where the query returns more tuples, it makes sense to stop the execution as soon as we have got enough tuples otherwise, it will do the execution until we produce all the tuples. Consider the below example where we just need to use the first tuple, but if we apply your patch, the executor will end up processing all the tuples, and it will impact the performance. So IMHO, the benefit you get by enabling a parallelism in some cases may hurt badly in other cases, as you will end up processing more tuples than required. CREATE OR REPLACE FUNCTION get_first_user_email() RETURNS TEXT AS $$ DECLARE user_email TEXT; BEGIN user_email = (SELECT email FROM users); RETURN user_email; END; $$ LANGUAGE plpgsql; -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: [PATCH] Allow parallelism for plpgsql return expression after commit 556f7b7
> On Tue, May 20, 2025 at 11:57 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Mon, May 5, 2025 at 11:19 AM DIPESH DHAMELIYA > <dipeshdhameliya125@gmail.com> wrote: > > > > Hello everyone, > > > > With the commit 556f7b7bc18d34ddec45392965c3b3038206bb62, Any plpgsql function that returns scalar value would not beable to use parallelism to evaluate a return statement. It will not be considered for parallel execution because we arepassing maxtuples = 2 to exec_run_select from exec_eval_expr to evaluate the return expression of the function. > > > I could not find commit '556f7b7bc18d34ddec45392965c3b3038206bb62' in > git log on the master branch, but here is my analysis after looking at > your patch. Here is the github link to commit - https://github.com/postgres/postgres/commit/556f7b7bc18d34ddec45392965c3b3038206bb62 and discussion - https://www.postgresql.org/message-id/flat/20241206062549.710dc01cf91224809dd6c0e1%40sraoss.co.jp > > I don't think we can remove the 'maxtuples' parameter from > exec_run_select(). In this particular case, the query itself is > returning a single tuple, so we are good. Still, in other cases where > the query returns more tuples, it makes sense to stop the execution as > soon as we have got enough tuples otherwise, it will do the execution > until we produce all the tuples. Consider the below example where we > just need to use the first tuple, but if we apply your patch, the > executor will end up processing all the tuples, and it will impact the > performance. So IMHO, the benefit you get by enabling a parallelism > in some cases may hurt badly in other cases, as you will end up > processing more tuples than required. > > CREATE OR REPLACE FUNCTION get_first_user_email() > RETURNS TEXT AS $$ > DECLARE > user_email TEXT; > BEGIN > user_email = (SELECT email FROM users); > RETURN user_email; > END; > $$ LANGUAGE plpgsql; > I understand but aren't we blocking parallelism for genuine cases with a very complex query where parallelism can help to some extent to improve execution time? Users can always rewrite a query (for example using TOP clause) if they are expecting one tuple to be returned.