Home > mailing lists

Re: Parallel Seq Scan - Mailing list pgsql-hackers

From	Amit Kapila
Subject	Re: Parallel Seq Scan
Date	November 13, 2015 16:17:51
Msg-id	CAA4eK1KGXPz-SzhgsdU2H3p8qO0NxRBfjT3BDvKZTXPDDb2jtQ@mail.gmail.com Whole thread Raw
In response to	Re: Parallel Seq Scan (Robert Haas <robertmhaas@gmail.com>)
List	pgsql-hackers

Tree view

On Fri, Nov 13, 2015 at 10:56 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Nov 12, 2015 at 10:39 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > The number of shared buffers hit could be different across different runs
> > because the read sequence of parallel workers can't be guaranteed, also
> > I don't think same is even guaranteed for Seq Scan node,
>
> The number of hits could be different. However, it seems like any
> sequential scan, parallel or not, should have a number of accesses
> (hit + read) equal to the size of the relation. Not sure if that's
> what is happening here.
>

After patch provided above to fix the issue reported by Pavel, that is

the behaviour, but I think there are few more things which we might

want to consider, just refer the below plan:

Total pages in table

--------------------------------

postgres=# select relname,relpages from pg_class where relname like '%t2%';

relname | relpages

---------+----------

t2 | 5406

(1 row)

Parallel Plan

-----------------------------

postgres=# explain (analyze,buffers,timing) select count(*) from t2 where c1 % 1

0 = 0;

QUERY PLAN

--------------------------------------------------------------------------------

------------------------------------------------

Aggregate (cost=8174.90..8174.91 rows=1 width=0) (actual time=1055.294..1055.2

94 rows=1 loops=1)

Buffers: shared hit=446 read=5054

-> Gather (cost=0.00..8162.40 rows=5000 width=0) (actual time=79.787..959.6

51 rows=100000 loops=1)

Number of Workers: 2

Buffers: shared hit=446 read=5054

-> Parallel Seq Scan on t2 (cost=0.00..8162.40 rows=5000 width=0) (ac

tual time=30.771..2518.844 rows=100000 loops=1)

Filter: ((c1 % 10) = 0)

Rows Removed by Filter: 900000

Buffers: shared hit=352 read=5054

Planning time: 0.170 ms

Execution time: 1059.400 ms

(11 rows)

Lets focus on Buffers and actual time in the above plan:

Buffers - At Parallel Seq Scan node, it shows total of 5406 (352+5054)

buffers which tallys with what is expected. However at Gather node,

it shows 5500 (446+5054) and the reason for the same is that we

accumulate overall buffer usage for parallel execution of worker which

includes start of node as well, refer ParallelQueryMain() and when the

that gets counted even towards the buffer calculation of Gather node.

The theory behind collecting overall buffer usage for parallel execution

was that we need it for pg_stat_statements where the stats is accumulated

for overall execution not on node-by-node basis refer queryDesc->totaltime

usage in standard_ExecutorRun().

I think here we need to decide what is the right value to display at

Gather node:

1. Display the same number of buffers at Gather node as at Parallel

Seq Scan node.

2. Display the number of buffers at Parallel Seq Scan node plus the

additional buffers used by parallel workers for miscellaneous work

like ExecutorStart(), etc.

3. Don't account for buffers used for parallel workers.

4. Anything better?

Also in conjuction with above, we need to see what should be accounted for

pg_stat_statements?

actual_time -

Actual time at Gather node: actual time = 79.787..959.651

Actual time at Parallel Seq Scan node = 30.771..2518.844

Time at Parallel Seq Scan node is more than time at Gather node as

the time for parallel workers is also accumulated for Parallel Seq Scan

node, whereas some doesn't get accounted for Gather node. Now it

could be confusing for users because time displayed at Parallel Seq

Scan node will be equal to - time_taken_by_worker-1 +

time_taken_by_worker-2 + ...

This time could be more than the actual time taken by query because each

worker is getting executed parallely but we have accounted the time such

that each one is executing serially.

I think the time for fetching the tuples from workers is already accounted

for Gather node, so may be for Parallel Seq Scan node we can omit

adding the time for each of the parallel workers.

Thoughts?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

pgsql-hackers by date:

From: Thom Brown
Date: 13 November 2015, 15:48:34
Subject: Re: Parallel Seq Scan

From: Amit Kapila
Date: 13 November 2015, 16:34:07
Subject: Re: Parallel Seq Scan

Re: Parallel Seq Scan - Mailing list pgsql-hackers

Previous

Next