Thread: "actual time" in QUERY PLAN for parallel operation when loops is bigger than 1
"actual time" in QUERY PLAN for parallel operation when loops is bigger than 1
The following documentation comment has been logged on the website: Page: https://www.postgresql.org/docs/14/using-explain.html Description: https://www.postgresql.org/docs/14/using-explain.html The doc says "the loops value reports the total number of executions of the node, and the actual time and rows values shown are averages per-execution. This is done to make the numbers comparable with the way that the cost estimates are shown." But I found for parallel operation, the above description maybe not correct. For example postgres=# create table c(id int); CREATE TABLE postgres=# insert into c select generate_series(1,1000000); INSERT 0 1000000 postgres=# explain analyze select count(*) from c; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------- Finalize Aggregate (cost=10633.55..10633.56 rows=1 width=8) (actual time=290.460..290.508 rows=1 loops=1) -> Gather (cost=10633.33..10633.54 rows=2 width=8) (actual time=289.605..290.484 rows=3 loops=1) Workers Planned: 2 Workers Launched: 2 -> Partial Aggregate (cost=9633.33..9633.34 rows=1 width=8) (actual time=188.336..188.337 rows=1 loops=3) -> Parallel Seq Scan on c (cost=0.00..8591.67 rows=416667 width=0) (actual time=0.030..140.036 rows=333333 loops=3) Planning Time: 0.331 ms Execution Time: 290.607 ms (8 rows) postgres=# According to PG-doc, the "Parallel Seq Scan" node cost 140.036*3=420ms, but the total cost for this SQL is only 290ms. Is the output of this explain correct?
Re: "actual time" in QUERY PLAN for parallel operation when loops is bigger than 1
> https://www.postgresql.org/docs/14/using-explain.html > The doc says > > "the loops value reports the total number of executions of the node, and the > > actual time and rows values shown are averages per-execution. This is done > > to make the numbers comparable with the way that the cost estimates are > > shown." > > But I found for parallel operation, the above description maybe not > > correct. > > For example > > postgres=# create table c(id int); > > CREATE TABLE > > postgres=# insert into c select generate_series(1,1000000); > > INSERT 0 1000000 > > postgres=# explain analyze select count(*) from c; > > QUERY PLAN > -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > > Finalize Aggregate (cost=10633.55..10633.56 rows=1 width=8) (actual > > time=290.460..290.508 rows=1 loops=1) > > -> Gather (cost=10633.33..10633.54 rows=2 width=8) (actual > > time=289.605..290.484 rows=3 loops=1) > > Workers Planned: 2 > > Workers Launched: 2 > > -> Partial Aggregate (cost=9633.33..9633.34 rows=1 width=8) > > (actual time=188.336..188.337 rows=1 loops=3) > > -> Parallel Seq Scan on c (cost=0.00..8591.67 rows=416667 > > width=0) (actual time=0.030..140.036 rows=333333 loops=3) > > Planning Time: 0.331 ms > > Execution Time: 290.607 ms > > (8 rows) > > postgres=# > > According to PG-doc, the "Parallel Seq Scan" node cost 140.036*3=420ms, but > > the total cost for this SQL is only 290ms. > > Is the output of this explain correct? Parallel query is explained in https://www.postgresql.org/docs/14/how-parallel-query-works.html and https://www.postgresql.org/docs/14/parallel-plans.html The docs seem clear to me that as the nodes are executed in parallel then the time execution time is not 140.036*3. The 140.036value is actual time the Parallel Seq Scan nodes ran for but there were up to 2 running in parallel.
Attachment
RE: "actual time" in QUERY PLAN for parallel operation when loops is bigger than 1
Wednesday, November 17, 2021 6:29 AM, nikolai.berkoff <nikolai.berkoff@pm.me> wrote: > > Parallel query is explained in > https://www.postgresql.org/docs/14/how-parallel-query-works.html and > https://www.postgresql.org/docs/14/parallel-plans.html > > The docs seem clear to me that as the nodes are executed in parallel then the > time execution time is not 140.036*3. The 140.036 value is actual time the Parallel > Seq Scan nodes ran for but there were up to 2 running in parallel. Thanks for your reply. I read your references but still confused about the 'loops' in parallel query result. > -> Parallel Seq Scan on c (cost=0.00..8591.67 rows=416667 width=0) (actual time=0.030..140.036 rows=333333 loops=3) In my previous example, actual row number is 333333*3=1e6(which is correct), so I think the actual time is 140.036*3ms. Do your think the loops(3) has no meaning for parallel scan node when calculate actual time? Regards, Tang > -----Original Message----- > From: nikolai.berkoff <nikolai.berkoff@pm.me> > Sent: Wednesday, November 17, 2021 6:29 AM > To: Tang, Haiying/唐 海英 <tanghy.fnst@fujitsu.com>; > pgsql-docs@lists.postgresql.org > Subject: Re: "actual time" in QUERY PLAN for parallel operation when loops is > bigger than 1 > > > https://www.postgresql.org/docs/14/using-explain.html > > The doc says > > > > > "the loops value reports the total number of executions of the node, and the > > > > > actual time and rows values shown are averages per-execution. This is done > > > > > to make the numbers comparable with the way that the cost estimates are > > > > > shown." > > > > > But I found for parallel operation, the above description maybe not > > > > > correct. > > > > > For example > > > > > postgres=# create table c(id int); > > > > > CREATE TABLE > > > > > postgres=# insert into c select generate_series(1,1000000); > > > > > INSERT 0 1000000 > > > > > postgres=# explain analyze select count(*) from c; > > > > > QUERY PLAN > > > --------------------------------------------------------------------------- > --------------------------------------------------------------------------- > --------------------------------------------------------------------------- > --------------------------------------------------------------------------- > --------------------------------------------------------------------------- > --------------------------------------------------------------------------- > --------------------------------------------------------------------------- > --------------------------------------------------------------------------- > --------------------------------------------------------------------------- > ----------------------------------------------------------- > > > > > Finalize Aggregate (cost=10633.55..10633.56 rows=1 width=8) (actual > > > > > time=290.460..290.508 rows=1 loops=1) > > > > > -> Gather (cost=10633.33..10633.54 rows=2 width=8) (actual > > > > > time=289.605..290.484 rows=3 loops=1) > > > > > Workers Planned: 2 > > > > > Workers Launched: 2 > > > > > -> Partial Aggregate (cost=9633.33..9633.34 rows=1 width=8) > > > > > (actual time=188.336..188.337 rows=1 loops=3) > > > > > -> Parallel Seq Scan on c (cost=0.00..8591.67 rows=416667 > > > > > width=0) (actual time=0.030..140.036 rows=333333 loops=3) > > > > > Planning Time: 0.331 ms > > > > > Execution Time: 290.607 ms > > > > > (8 rows) > > > > > postgres=# > > > > > According to PG-doc, the "Parallel Seq Scan" node cost 140.036*3=420ms, but > > > > > the total cost for this SQL is only 290ms. > > > > > Is the output of this explain correct? > > Parallel query is explained in > https://www.postgresql.org/docs/14/how-parallel-query-works.html and > https://www.postgresql.org/docs/14/parallel-plans.html > > The docs seem clear to me that as the nodes are executed in parallel then the > time execution time is not 140.036*3. The 140.036 value is actual time the Parallel > Seq Scan nodes ran for but there were up to 2 running in parallel.
Re: "actual time" in QUERY PLAN for parallel operation when loops is bigger than 1
Wednesday, November 17, 2021 6:29 AM, nikolai.berkoff <nikolai.berkoff@pm.me> wrote:
>
> Parallel query is explained in
> https://www.postgresql.org/docs/14/how-parallel-query-works.html and
> https://www.postgresql.org/docs/14/parallel-plans.html
>
> The docs seem clear to me that as the nodes are executed in parallel then the
> time execution time is not 140.036*3. The 140.036 value is actual time the Parallel
> Seq Scan nodes ran for but there were up to 2 running in parallel.
Thanks for your reply.
I read your references but still confused about the 'loops' in parallel query result.
> -> Parallel Seq Scan on c (cost=0.00..8591.67 rows=416667 width=0) (actual time=0.030..140.036 rows=333333 loops=3)
In my previous example, actual row number is 333333*3=1e6(which is correct), so I think the actual time is 140.036*3ms.
Do your think the loops(3) has no meaning for parallel scan node when calculate actual time?
RE: "actual time" in QUERY PLAN for parallel operation when loops is bigger than 1
On Wednesday, November 17, 2021 4:19 PM, Guillaume Lelarge guillaume@lelarge.info wrote:
>> -> Parallel Seq Scan on c (cost=0.00..8591.67 rows=416667 width=0) (actual time=0.030..140.036 rows=333333 loops=3)
>In my previous example, actual row number is 333333*3=1e6(which is correct), so I think the actual time is 140.036*3ms.
>Do your think the loops(3) has no meaning for parallel scan node when calculate actual time?
>
>As far as I understand it, you have to multiply the number of rows by the number of loops, but this doesn't apply to duration at least for parallel >queries.
Yes, I got your point.
I’m not familiar with PostgreSQL planner/executor, but if the code is correct, then maybe some modification should be done at [1] as below:
Before:
Multiply by the loops
value to get the total time actually spent in the node.
After:
Multiply by the loops
value to get the total time actually spent in the node (for node in the parallel portion of the plan, this is not needed).
[1] https://www.postgresql.org/docs/14/using-explain.html
What do you think?
Regards,
Tang