Thread: Explain says 8 workers planned, only 1 executed

Explain says 8 workers planned, only 1 executed

From
Alastair McKinley
Date:
Hi all,

I have a long running query that I have tweaked along with config (e.g. min_parallel_table_scan_size) to execute nicely and very fast in parallel which works as expected executed directly from psql client.  The query is then embedded in a psql function like "return query select * from function_that_executes_the_query()".

I am checking the explain output (using execute explain $query) just before executing inside my function and it the plan is identical to what I would expect, planning 8 workers.  However, this query actually only uses 1 worker and takes many times longer than when ran directly on the psql command line with the same server configuration parameters.

Why would the explain output be different from the executed plan? Is this a limitation of plpgsql functions? Is there any way to debug this further?

If it is meaningful during parallel execution I notice lots of "postgres: parallel worker" proceses in top and when executing from my function just a single "postgres: $user $db $host($pid) SELECT" processes.

Best regards,

Alastair


Re: Explain says 8 workers planned, only 1 executed

From
Adrian Klaver
Date:
On 3/21/20 10:25 AM, Alastair McKinley wrote:
> Hi all,
> 
> I have a long running query that I have tweaked along with config (e.g. 
> min_parallel_table_scan_size) to execute nicely and very fast in 
> parallel which works as expected executed directly from psql client.  
> The query is then embedded in a psql function like "return query select 
> * from function_that_executes_the_query()".

Postgres version?

What is happening in function_that_executes_the_query()?

You might want to take a look at below to see any of the conditions apply:

https://www.postgresql.org/docs/12/when-can-parallel-query-be-used.html

> 
> I am checking the explain output (using execute explain $query) just 
> before executing inside my function and it the plan is identical to what 
> I would expect, planning 8 workers.  However, this query actually only 
> uses 1 worker and takes many times longer than when ran directly on the 
> psql command line with the same server configuration parameters.
> 
> Why would the explain output be different from the executed plan? Is 
> this a limitation of plpgsql functions? Is there any way to debug this 
> further?
> 
> If it is meaningful during parallel execution I notice lots of 
> "postgres: parallel worker" proceses in top and when executing from my 
> function just a single "postgres: $user $db $host($pid) SELECT" processes.
> 
> Best regards,
> 
> Alastair
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Explain says 8 workers planned, only 1 executed

From
Alastair McKinley
Date:
Hi Adrian,

Thanks for getting back to me.

Postgres version is:

PostgreSQL 11.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit

I simplified it to the following structure:

create function f() returns setof my_type as
$$
declare
q text;
output text;
begin
    select generate_query1() into q; -- the query q executes in parallel with 8 workers if executed standalone
    for output in execute('explain ' || q) loop
        raise notice '%',output;  -- this plan says 8 workers will be launched
    end loop;
    return query execute q; -- this launches one worker

   select generate_query2() into q;
    for output in execute('explain ' || q) loop
        raise notice '%',output;  -- this plan says 8 workers will be launched
    end loop;
    return query execute q; -- this also launches one worker
end;
language plpgsql;

Should this work in principle or am I missing something subtle about parallel dynamic queries in plpgsql functions?  Does the outer function need to be parallel safe?
Might a stored proc work better?

Best regards,

Alastair



From: Adrian Klaver <adrian.klaver@aklaver.com>
Sent: 21 March 2020 17:38
To: Alastair McKinley <a.mckinley@analyticsengines.com>; pgsql-general@lists.postgresql.org <pgsql-general@lists.postgresql.org>
Subject: Re: Explain says 8 workers planned, only 1 executed
 
On 3/21/20 10:25 AM, Alastair McKinley wrote:
> Hi all,
>
> I have a long running query that I have tweaked along with config (e.g.
> min_parallel_table_scan_size) to execute nicely and very fast in
> parallel which works as expected executed directly from psql client. 
> The query is then embedded in a psql function like "return query select
> * from function_that_executes_the_query()".

Postgres version?

What is happening in function_that_executes_the_query()?

You might want to take a look at below to see any of the conditions apply:

https://www.postgresql.org/docs/12/when-can-parallel-query-be-used.html

>
> I am checking the explain output (using execute explain $query) just
> before executing inside my function and it the plan is identical to what
> I would expect, planning 8 workers.  However, this query actually only
> uses 1 worker and takes many times longer than when ran directly on the
> psql command line with the same server configuration parameters.
>
> Why would the explain output be different from the executed plan? Is
> this a limitation of plpgsql functions? Is there any way to debug this
> further?
>
> If it is meaningful during parallel execution I notice lots of
> "postgres: parallel worker" proceses in top and when executing from my
> function just a single "postgres: $user $db $host($pid) SELECT" processes.
>
> Best regards,
>
> Alastair
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com

Re: Explain says 8 workers planned, only 1 executed

From
Jeremy Smith
Date:

On Sat, Mar 21, 2020 at 1:59 PM Alastair McKinley <a.mckinley@analyticsengines.com> wrote:
Hi Adrian,

Thanks for getting back to me.

Postgres version is:

PostgreSQL 11.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit

I simplified it to the following structure:

create function f() returns setof my_type as
$$
declare
q text;
output text;
begin
    select generate_query1() into q; -- the query q executes in parallel with 8 workers if executed standalone
    for output in execute('explain ' || q) loop
        raise notice '%',output;  -- this plan says 8 workers will be launched
    end loop;
    return query execute q; -- this launches one worker

   select generate_query2() into q;
    for output in execute('explain ' || q) loop
        raise notice '%',output;  -- this plan says 8 workers will be launched
    end loop;
    return query execute q; -- this also launches one worker
end;
language plpgsql;

Should this work in principle or am I missing something subtle about parallel dynamic queries in plpgsql functions?  Does the outer function need to be parallel safe?
Might a stored proc work better?

Best regards,

Alastair



From: Adrian Klaver <adrian.klaver@aklaver.com>
Sent: 21 March 2020 17:38
To: Alastair McKinley <a.mckinley@analyticsengines.com>; pgsql-general@lists.postgresql.org <pgsql-general@lists.postgresql.org>
Subject: Re: Explain says 8 workers planned, only 1 executed
 
On 3/21/20 10:25 AM, Alastair McKinley wrote:
> Hi all,
>
> I have a long running query that I have tweaked along with config (e.g.
> min_parallel_table_scan_size) to execute nicely and very fast in
> parallel which works as expected executed directly from psql client. 
> The query is then embedded in a psql function like "return query select
> * from function_that_executes_the_query()".

Postgres version?

What is happening in function_that_executes_the_query()?

You might want to take a look at below to see any of the conditions apply:

https://www.postgresql.org/docs/12/when-can-parallel-query-be-used.html

>
> I am checking the explain output (using execute explain $query) just
> before executing inside my function and it the plan is identical to what
> I would expect, planning 8 workers.  However, this query actually only
> uses 1 worker and takes many times longer than when ran directly on the
> psql command line with the same server configuration parameters.
>
> Why would the explain output be different from the executed plan? Is
> this a limitation of plpgsql functions? Is there any way to debug this
> further?
>
> If it is meaningful during parallel execution I notice lots of
> "postgres: parallel worker" proceses in top and when executing from my
> function just a single "postgres: $user $db $host($pid) SELECT" processes.
>
> Best regards,
>
> Alastair
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com

Re: Explain says 8 workers planned, only 1 executed

From
Alastair McKinley
Date:
Hi Jeremy,

Thanks for solving the mystery.  I think this might be a missing point in section 15.2 in the docs.

I wonder will this ever be improved or should I just write to temporary tables instead of return query?

Best regards,

Alastair

From: Jeremy Smith <jeremy@musicsmith.net>
Sent: 21 March 2020 20:50
To: Alastair McKinley <a.mckinley@analyticsengines.com>
Cc: Adrian Klaver <adrian.klaver@aklaver.com>; pgsql-general@lists.postgresql.org <pgsql-general@lists.postgresql.org>
Subject: Re: Explain says 8 workers planned, only 1 executed
 

On Sat, Mar 21, 2020 at 1:59 PM Alastair McKinley <a.mckinley@analyticsengines.com> wrote:
Hi Adrian,

Thanks for getting back to me.

Postgres version is:

PostgreSQL 11.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit

I simplified it to the following structure:

create function f() returns setof my_type as
$$
declare
q text;
output text;
begin
    select generate_query1() into q; -- the query q executes in parallel with 8 workers if executed standalone
    for output in execute('explain ' || q) loop
        raise notice '%',output;  -- this plan says 8 workers will be launched
    end loop;
    return query execute q; -- this launches one worker

   select generate_query2() into q;
    for output in execute('explain ' || q) loop
        raise notice '%',output;  -- this plan says 8 workers will be launched
    end loop;
    return query execute q; -- this also launches one worker
end;
language plpgsql;

Should this work in principle or am I missing something subtle about parallel dynamic queries in plpgsql functions?  Does the outer function need to be parallel safe?
Might a stored proc work better?

Best regards,

Alastair



From: Adrian Klaver <adrian.klaver@aklaver.com>
Sent: 21 March 2020 17:38
To: Alastair McKinley <a.mckinley@analyticsengines.com>; pgsql-general@lists.postgresql.org <pgsql-general@lists.postgresql.org>
Subject: Re: Explain says 8 workers planned, only 1 executed
 
On 3/21/20 10:25 AM, Alastair McKinley wrote:
> Hi all,
>
> I have a long running query that I have tweaked along with config (e.g.
> min_parallel_table_scan_size) to execute nicely and very fast in
> parallel which works as expected executed directly from psql client. 
> The query is then embedded in a psql function like "return query select
> * from function_that_executes_the_query()".

Postgres version?

What is happening in function_that_executes_the_query()?

You might want to take a look at below to see any of the conditions apply:

https://www.postgresql.org/docs/12/when-can-parallel-query-be-used.html

>
> I am checking the explain output (using execute explain $query) just
> before executing inside my function and it the plan is identical to what
> I would expect, planning 8 workers.  However, this query actually only
> uses 1 worker and takes many times longer than when ran directly on the
> psql command line with the same server configuration parameters.
>
> Why would the explain output be different from the executed plan? Is
> this a limitation of plpgsql functions? Is there any way to debug this
> further?
>
> If it is meaningful during parallel execution I notice lots of
> "postgres: parallel worker" proceses in top and when executing from my
> function just a single "postgres: $user $db $host($pid) SELECT" processes.
>
> Best regards,
>
> Alastair
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com

Re: Explain says 8 workers planned, only 1 executed

From
Tom Lane
Date:
Alastair McKinley <a.mckinley@analyticsengines.com> writes:
> Thanks for solving the mystery.  I think this might be a missing point in section 15.2 in the docs.
> I wonder will this ever be improved or should I just write to temporary tables instead of return query?

I just posted a patch to improve that [1], but it's not something we'd be
likely to back-patch into existing releases.

            regards, tom lane

[1] https://www.postgresql.org/message-id/1741.1584847383%40sss.pgh.pa.us