Thread: Explain says 8 workers planned, only 1 executed
Hi all,
I have a long running query that I have tweaked along with config (e.g. min_parallel_table_scan_size) to execute nicely and very fast in parallel which works as expected executed directly from psql client. The query is then embedded in a psql function like "return query select * from function_that_executes_the_query()".
I am checking the explain output (using execute explain $query) just before executing inside my function and it the plan is identical to what I would expect, planning 8 workers. However, this query actually only uses 1 worker and takes many times longer than when ran directly on the psql command line with the same server configuration parameters.
Why would the explain output be different from the executed plan? Is this a limitation of plpgsql functions? Is there any way to debug this further?
If it is meaningful during parallel execution I notice lots of "postgres: parallel worker" proceses in top and when executing from my function just a single "postgres: $user $db $host($pid) SELECT" processes.
Best regards,
Alastair
On 3/21/20 10:25 AM, Alastair McKinley wrote: > Hi all, > > I have a long running query that I have tweaked along with config (e.g. > min_parallel_table_scan_size) to execute nicely and very fast in > parallel which works as expected executed directly from psql client. > The query is then embedded in a psql function like "return query select > * from function_that_executes_the_query()". Postgres version? What is happening in function_that_executes_the_query()? You might want to take a look at below to see any of the conditions apply: https://www.postgresql.org/docs/12/when-can-parallel-query-be-used.html > > I am checking the explain output (using execute explain $query) just > before executing inside my function and it the plan is identical to what > I would expect, planning 8 workers. However, this query actually only > uses 1 worker and takes many times longer than when ran directly on the > psql command line with the same server configuration parameters. > > Why would the explain output be different from the executed plan? Is > this a limitation of plpgsql functions? Is there any way to debug this > further? > > If it is meaningful during parallel execution I notice lots of > "postgres: parallel worker" proceses in top and when executing from my > function just a single "postgres: $user $db $host($pid) SELECT" processes. > > Best regards, > > Alastair > > -- Adrian Klaver adrian.klaver@aklaver.com
Hi Adrian,
Thanks for getting back to me.
Postgres version is:
PostgreSQL 11.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
I simplified it to the following structure:
create function f() returns setof my_type as
$$
declare
q text;
output text;
begin
select generate_query1() into q; -- the query q executes in parallel with 8 workers if executed standalone
for output in execute('explain ' || q) loop
raise notice '%',output; -- this plan says 8 workers will be launched
end loop;
return query execute q; -- this launches one worker
select generate_query2() into q;
for output in execute('explain ' || q) loop
raise notice '%',output; -- this plan says 8 workers will be launched
end loop;
return query execute q; -- this also launches one worker
end;
language plpgsql;
Should this work in principle or am I missing something subtle about parallel dynamic queries in plpgsql functions? Does the outer function need to be parallel safe?
Might a stored proc work better?
Best regards,
Alastair
From: Adrian Klaver <adrian.klaver@aklaver.com>
Sent: 21 March 2020 17:38
To: Alastair McKinley <a.mckinley@analyticsengines.com>; pgsql-general@lists.postgresql.org <pgsql-general@lists.postgresql.org>
Subject: Re: Explain says 8 workers planned, only 1 executed
Sent: 21 March 2020 17:38
To: Alastair McKinley <a.mckinley@analyticsengines.com>; pgsql-general@lists.postgresql.org <pgsql-general@lists.postgresql.org>
Subject: Re: Explain says 8 workers planned, only 1 executed
On 3/21/20 10:25 AM, Alastair McKinley wrote:
> Hi all,
>
> I have a long running query that I have tweaked along with config (e.g.
> min_parallel_table_scan_size) to execute nicely and very fast in
> parallel which works as expected executed directly from psql client.
> The query is then embedded in a psql function like "return query select
> * from function_that_executes_the_query()".
Postgres version?
What is happening in function_that_executes_the_query()?
You might want to take a look at below to see any of the conditions apply:
https://www.postgresql.org/docs/12/when-can-parallel-query-be-used.html
>
> I am checking the explain output (using execute explain $query) just
> before executing inside my function and it the plan is identical to what
> I would expect, planning 8 workers. However, this query actually only
> uses 1 worker and takes many times longer than when ran directly on the
> psql command line with the same server configuration parameters.
>
> Why would the explain output be different from the executed plan? Is
> this a limitation of plpgsql functions? Is there any way to debug this
> further?
>
> If it is meaningful during parallel execution I notice lots of
> "postgres: parallel worker" proceses in top and when executing from my
> function just a single "postgres: $user $db $host($pid) SELECT" processes.
>
> Best regards,
>
> Alastair
>
>
--
Adrian Klaver
adrian.klaver@aklaver.com
> Hi all,
>
> I have a long running query that I have tweaked along with config (e.g.
> min_parallel_table_scan_size) to execute nicely and very fast in
> parallel which works as expected executed directly from psql client.
> The query is then embedded in a psql function like "return query select
> * from function_that_executes_the_query()".
Postgres version?
What is happening in function_that_executes_the_query()?
You might want to take a look at below to see any of the conditions apply:
https://www.postgresql.org/docs/12/when-can-parallel-query-be-used.html
>
> I am checking the explain output (using execute explain $query) just
> before executing inside my function and it the plan is identical to what
> I would expect, planning 8 workers. However, this query actually only
> uses 1 worker and takes many times longer than when ran directly on the
> psql command line with the same server configuration parameters.
>
> Why would the explain output be different from the executed plan? Is
> this a limitation of plpgsql functions? Is there any way to debug this
> further?
>
> If it is meaningful during parallel execution I notice lots of
> "postgres: parallel worker" proceses in top and when executing from my
> function just a single "postgres: $user $db $host($pid) SELECT" processes.
>
> Best regards,
>
> Alastair
>
>
--
Adrian Klaver
adrian.klaver@aklaver.com
Unfortunately, return query will never use parallel workers. See: https://stackoverflow.com/q/58079898/895640 and https://www.postgresql.org/message-id/16040-eaacad11fecfb198@postgresql.org
On Sat, Mar 21, 2020 at 1:59 PM Alastair McKinley <a.mckinley@analyticsengines.com> wrote:
Hi Adrian,Thanks for getting back to me.Postgres version is:PostgreSQL 11.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bitI simplified it to the following structure:create function f() returns setof my_type as$$declareq text;output text;beginselect generate_query1() into q; -- the query q executes in parallel with 8 workers if executed standalonefor output in execute('explain ' || q) loopraise notice '%',output; -- this plan says 8 workers will be launchedend loop;return query execute q; -- this launches one workerselect generate_query2() into q;for output in execute('explain ' || q) loopraise notice '%',output; -- this plan says 8 workers will be launchedend loop;return query execute q; -- this also launches one workerend;language plpgsql;Should this work in principle or am I missing something subtle about parallel dynamic queries in plpgsql functions? Does the outer function need to be parallel safe?Might a stored proc work better?Best regards,AlastairFrom: Adrian Klaver <adrian.klaver@aklaver.com>
Sent: 21 March 2020 17:38
To: Alastair McKinley <a.mckinley@analyticsengines.com>; pgsql-general@lists.postgresql.org <pgsql-general@lists.postgresql.org>
Subject: Re: Explain says 8 workers planned, only 1 executedOn 3/21/20 10:25 AM, Alastair McKinley wrote:
> Hi all,
>
> I have a long running query that I have tweaked along with config (e.g.
> min_parallel_table_scan_size) to execute nicely and very fast in
> parallel which works as expected executed directly from psql client.
> The query is then embedded in a psql function like "return query select
> * from function_that_executes_the_query()".
Postgres version?
What is happening in function_that_executes_the_query()?
You might want to take a look at below to see any of the conditions apply:
https://www.postgresql.org/docs/12/when-can-parallel-query-be-used.html
>
> I am checking the explain output (using execute explain $query) just
> before executing inside my function and it the plan is identical to what
> I would expect, planning 8 workers. However, this query actually only
> uses 1 worker and takes many times longer than when ran directly on the
> psql command line with the same server configuration parameters.
>
> Why would the explain output be different from the executed plan? Is
> this a limitation of plpgsql functions? Is there any way to debug this
> further?
>
> If it is meaningful during parallel execution I notice lots of
> "postgres: parallel worker" proceses in top and when executing from my
> function just a single "postgres: $user $db $host($pid) SELECT" processes.
>
> Best regards,
>
> Alastair
>
>
--
Adrian Klaver
adrian.klaver@aklaver.com
Hi Jeremy,
Thanks for solving the mystery. I think this might be a missing point in section 15.2 in the docs.
I wonder will this ever be improved or should I just write to temporary tables instead of return query?
Best regards,
Alastair
From: Jeremy Smith <jeremy@musicsmith.net>
Sent: 21 March 2020 20:50
To: Alastair McKinley <a.mckinley@analyticsengines.com>
Cc: Adrian Klaver <adrian.klaver@aklaver.com>; pgsql-general@lists.postgresql.org <pgsql-general@lists.postgresql.org>
Subject: Re: Explain says 8 workers planned, only 1 executed
Sent: 21 March 2020 20:50
To: Alastair McKinley <a.mckinley@analyticsengines.com>
Cc: Adrian Klaver <adrian.klaver@aklaver.com>; pgsql-general@lists.postgresql.org <pgsql-general@lists.postgresql.org>
Subject: Re: Explain says 8 workers planned, only 1 executed
Unfortunately, return query will never use parallel workers. See: https://stackoverflow.com/q/58079898/895640 and https://www.postgresql.org/message-id/16040-eaacad11fecfb198@postgresql.org
On Sat, Mar 21, 2020 at 1:59 PM Alastair McKinley <a.mckinley@analyticsengines.com> wrote:
Hi Adrian,Thanks for getting back to me.Postgres version is:PostgreSQL 11.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bitI simplified it to the following structure:create function f() returns setof my_type as$$declareq text;output text;beginselect generate_query1() into q; -- the query q executes in parallel with 8 workers if executed standalonefor output in execute('explain ' || q) loopraise notice '%',output; -- this plan says 8 workers will be launchedend loop;return query execute q; -- this launches one workerselect generate_query2() into q;for output in execute('explain ' || q) loopraise notice '%',output; -- this plan says 8 workers will be launchedend loop;return query execute q; -- this also launches one workerend;language plpgsql;Should this work in principle or am I missing something subtle about parallel dynamic queries in plpgsql functions? Does the outer function need to be parallel safe?Might a stored proc work better?Best regards,AlastairFrom: Adrian Klaver <adrian.klaver@aklaver.com>
Sent: 21 March 2020 17:38
To: Alastair McKinley <a.mckinley@analyticsengines.com>; pgsql-general@lists.postgresql.org <pgsql-general@lists.postgresql.org>
Subject: Re: Explain says 8 workers planned, only 1 executedOn 3/21/20 10:25 AM, Alastair McKinley wrote:
> Hi all,
>
> I have a long running query that I have tweaked along with config (e.g.
> min_parallel_table_scan_size) to execute nicely and very fast in
> parallel which works as expected executed directly from psql client.
> The query is then embedded in a psql function like "return query select
> * from function_that_executes_the_query()".
Postgres version?
What is happening in function_that_executes_the_query()?
You might want to take a look at below to see any of the conditions apply:
https://www.postgresql.org/docs/12/when-can-parallel-query-be-used.html
>
> I am checking the explain output (using execute explain $query) just
> before executing inside my function and it the plan is identical to what
> I would expect, planning 8 workers. However, this query actually only
> uses 1 worker and takes many times longer than when ran directly on the
> psql command line with the same server configuration parameters.
>
> Why would the explain output be different from the executed plan? Is
> this a limitation of plpgsql functions? Is there any way to debug this
> further?
>
> If it is meaningful during parallel execution I notice lots of
> "postgres: parallel worker" proceses in top and when executing from my
> function just a single "postgres: $user $db $host($pid) SELECT" processes.
>
> Best regards,
>
> Alastair
>
>
--
Adrian Klaver
adrian.klaver@aklaver.com
Alastair McKinley <a.mckinley@analyticsengines.com> writes: > Thanks for solving the mystery. I think this might be a missing point in section 15.2 in the docs. > I wonder will this ever be improved or should I just write to temporary tables instead of return query? I just posted a patch to improve that [1], but it's not something we'd be likely to back-patch into existing releases. regards, tom lane [1] https://www.postgresql.org/message-id/1741.1584847383%40sss.pgh.pa.us