Thread: Parallel Index Scans

Parallel Index Scans

From
Amit Kapila
Date:
As of now, the driving table for a parallel query is accessed by
parallel sequential scan, which limits its usage to a certain degree.
Parallelising index scans would extend parallel query to many more
cases.  This patch enables parallelism for btree scans.  Supporting
parallel index scans for other index types like hash, gist, and
spgist can be done as separate patches.

The basic idea is quite similar to parallel heap scans: each worker
(including the leader whenever possible) scans a block and then gets
the next block that needs to be scanned.  The parallelism is
implemented at the leaf level of the btree.  The first worker to
start a btree scan descends to a leaf, while the others wait until it
has reached the leaf.  After reading the leaf block, that worker sets
the next block to be read, wakes the first worker waiting to scan the
next block, and proceeds to scan tuples from the block it has read.
Similarly, each worker, after reading a block, sets the next block to
be read and wakes up the first waiting worker.  This is achieved by
using the condition variable patch [1] proposed by Robert.
Parallelism is supported for both forward and backward scans.
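The handoff described above can be sketched as a small state machine.  This is an illustrative sketch only: the type and function names (SharedBTScan, scan_seize, scan_release) are invented for the example; the real patch keeps this state in shared memory guarded by a lock, and waiting workers sleep on a condition variable rather than asserting.

```c
#include <assert.h>
#include <stdbool.h>

#define P_NONE 0                /* sentinel: no next leaf page */

typedef enum
{
    BTPARALLEL_NOT_INITIALIZED, /* no worker has descended to a leaf yet */
    BTPARALLEL_ADVANCING,       /* a worker is computing the next page */
    BTPARALLEL_IDLE,            /* next page is published, free to seize */
    BTPARALLEL_DONE             /* scan has reached the end */
} BTParallelState;

typedef struct
{
    BTParallelState state;
    unsigned    next_page;      /* next leaf page workers should read */
} SharedBTScan;

/*
 * A worker "seizes" the scan: it takes the published next page and marks
 * the scan ADVANCING so that other workers wait until a new page is
 * published.  Returns false when the scan is finished.
 */
static bool
scan_seize(SharedBTScan *shared, unsigned *page)
{
    if (shared->state == BTPARALLEL_DONE)
        return false;
    /* real code: sleep on a condition variable while ADVANCING */
    assert(shared->state == BTPARALLEL_IDLE);
    *page = shared->next_page;
    shared->state = BTPARALLEL_ADVANCING;
    return true;
}

/*
 * After reading its page, a worker publishes the next page to read and
 * wakes the first waiting worker (in the real code, via a condition
 * variable signal).
 */
static void
scan_release(SharedBTScan *shared, unsigned next_page)
{
    shared->next_page = next_page;
    shared->state = (next_page == P_NONE) ? BTPARALLEL_DONE : BTPARALLEL_IDLE;
}
```

A worker loop would then alternate scan_seize/scan_release until seize returns false, at which point that worker is done with the scan.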

The optimizer chooses parallelism based on the number of pages in the
index relation, and the CPU cost of evaluating the rows is divided
equally among workers.  The Index Scan node is made parallel-aware and
can be used beneath Gather as shown below:

Current Plan for Index Scans
----------------------------------------
 Index Scan using idx2 on test  (cost=0.42..7378.96 rows=2433 width=29)
   Index Cond: (c < 10)


Parallel version of plan
----------------------------------
 Gather  (cost=1000.42..1243.40 rows=2433 width=29)
   Workers Planned: 1
   ->  Parallel Index Scan using idx2 on test  (cost=0.42..0.10 rows=1431 width=29)
         Index Cond: (c < 10)


Parallel index scans can be used in parallelising aggregate queries as
well.  For example, given a query like:

select count(*) from t1 where c1 > 1000 and c1 < 1100 and c2 = 'aaa' group by c2;

the below forms of parallel plans are possible:

 Finalize HashAggregate
   Group Key: c2
   ->  Gather
         Workers Planned: 1
         ->  Partial HashAggregate
               Group Key: c2
               ->  Parallel Index Scan using idx_t1_partial on t1
                     Index Cond: ((c1 > 1000) AND (c1 < 1100))
                     Filter: (c2 = 'aaa'::bpchar)

OR

Finalize GroupAggregate
   Group Key: c2
   ->  Sort
         ->  Gather
               Workers Planned: 1
               ->  Partial GroupAggregate
                     Group Key: c2
                     ->  Parallel Index Scan using idx_t1_partial on t1
                           Index Cond: ((c1 > 1000) AND (c1 < 1100))
                           Filter: (c2 = 'aaa'::bpchar)

In the second plan (GroupAggregate), the Sort + Gather step would be
replaced with GatherMerge once we have a GatherMerge node as proposed
by Rushabh [2].  Note that the above examples are just meant to
explain the usage of parallel index scan; actual plans will be
selected based on cost.

Performance tests
----------------------------
This test has been performed on a community machine (hydra, POWER-7).

Initialize pgbench with 3000 scale factor (./pgbench -i -s 3000 postgres)

Count the rows in pgbench_accounts based on values of aid and bid

Serial plan
------------------
set max_parallel_workers_per_gather=0;

postgres=# explain analyze select count(aid) from pgbench_accounts
where aid > 1000 and aid < 90000000 and bid > 800 and bid < 900;

                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=4714590.52..4714590.53 rows=1 width=8) (actual time=35684.425..35684.425 rows=1 loops=1)
   ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts  (cost=0.57..4707458.12 rows=2852961 width=4) (actual time=29210.743..34385.271 rows=9900000 loops=1)
         Index Cond: ((aid > 1000) AND (aid < 90000000))
         Filter: ((bid > 800) AND (bid < 900))
         Rows Removed by Filter: 80098999
 Planning time: 0.183 ms
 Execution time: 35684.459 ms
(7 rows)


Parallel Plan
-------------------
set max_parallel_workers_per_gather=2;

postgres=# explain analyze select count(aid) from pgbench_accounts
where aid > 1000 and aid < 90000000 and bid > 800 and bid < 900;

                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=3924773.13..3924773.14 rows=1 width=8) (actual time=15033.105..15033.105 rows=1 loops=1)
   ->  Gather  (cost=3924772.92..3924773.12 rows=2 width=8) (actual time=15032.986..15033.093 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=3923772.92..3923772.92 rows=1 width=8) (actual time=15030.354..15030.354 rows=1 loops=3)
               ->  Parallel Index Scan using pgbench_accounts_pkey on pgbench_accounts  (cost=0.57..3920801.08 rows=1188734 width=4) (actual time=12476.068..14600.410 rows=3300000 loops=3)
                     Index Cond: ((aid > 1000) AND (aid < 90000000))
                     Filter: ((bid > 800) AND (bid < 900))
                     Rows Removed by Filter: 26699666
 Planning time: 0.244 ms
 Execution time: 15036.081 ms
(11 rows)

The above is the median of 3 runs; all runs gave almost the same
execution time.  Here, we can notice that execution time is reduced by
more than half with two workers, and I have tested with four workers,
where the time is reduced to one-fourth (9128.420 ms) of the serial
plan.  I think these results are quite similar to what we got for
parallel sequential scans.  Another thing to note is that
parallelising index scans is more beneficial if there is a Filter
which removes many rows fetched from the Index Scan, or if the Filter
is costly (for example, a filter containing a costly function
execution).  This observation is also quite similar to what we have
observed with Parallel Sequential Scans.

I think we can parallelise Index Only Scans as well, but I have not
evaluated that; it can certainly be done as a separate patch in the
future.

Contributions
--------------------
First patch (parallel_index_scan_v1.patch) implements parallelism at
the IndexAM level - Rahila Syed and Amit Kapila, based on design
inputs and suggestions by Robert Haas.
Second patch (parallel_index_opt_exec_support_v1.patch) provides
optimizer and executor support for parallel index scans - Amit Kapila.

To use these patches, first apply the condition variable patch [1],
then parallel_index_scan_v1.patch, and then
parallel_index_opt_exec_support_v1.patch.

Thoughts?

[1] - https://www.postgresql.org/message-id/CAEepm%3D0zshYwB6wDeJCkrRJeoBM%3DjPYBe%2B-k_VtKRU_8zMLEfA%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CAGPqQf09oPX-cQRpBKS0Gq49Z%2Bm6KBxgxd_p9gX8CKk_d75HoQ%40mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: Parallel Index Scans

From
Amit Kapila
Date:
On Thu, Oct 13, 2016 at 8:48 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> As of now, the driving table for parallel query is accessed by
> parallel sequential scan which limits its usage to a certain degree.
> Parallelising index scans would further increase the usage of parallel
> query in many more cases.  This patch enables the parallelism for the
> btree scans.  Supporting parallel index scan for other index types
> like hash, gist, spgist can be done as separate patches.
>

I would like to have input on the method of selecting parallel
workers for scanning an index.  Currently the patch selects the number
of workers based on the size of the index relation, and the upper
limit on parallel workers is max_parallel_workers_per_gather.  This is
quite similar to what we do for parallel sequential scan, except that
for parallel seq. scan we use the parallel_workers option if provided
by the user during Create Table.  The user can provide the
parallel_workers option as below:

Create Table .... With (parallel_workers = 4);

Is it desirable to have a similar option for parallel index scans? If
yes, what should the interface be?  One possible way could be to
allow the user to provide it during Create Index as below:

Create Index .... With (parallel_workers = 4);

If the above syntax looks sensible, then we might need to think about
what should be used for parallel index builds.  It seems to me that
the parallel tuple sort patch [1] proposed by Peter G. uses the above
syntax for getting the parallel workers input from the user for
parallel index builds.

Another point which needs some thought is whether it is a good idea to
use index relation size to calculate parallel workers for an index
scan.  I think ideally for index scans it should be based on the
number of pages to be fetched/scanned from the index.


[1] - https://www.postgresql.org/message-id/CAM3SWZTmkOFEiCDpUNaO4n9-1xcmWP-1NXmT7h0Pu3gM2YuHvg%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Index Scans

From
Rahila Syed
Date:
>Another point which needs some thoughts is whether it is good idea to
>use index relation size to calculate parallel workers for index scan.
>I think ideally for index scans it should be based on number of pages
>to be fetched/scanned from index.
IIUC, it's not possible to know the exact number of pages scanned from
an index in advance.
What we are essentially making parallel is the scan of the leaf pages,
so it would make sense to base the number of workers on the number of
leaf pages.  Having said that, I think it will not make much
difference compared to the existing method, because currently the
total number of index pages is used to calculate the number of
workers.  As far as I understand, in large indexes the difference
between the number of leaf pages and the total number of pages is not
significant; in other words, internal pages form a small fraction of
the total pages.  Also, the calculation is based on the log of the
number of pages, so it will make even less difference.
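To make the log-scale point concrete, here is a rough sketch of the kind of heuristic being discussed, modeled on what parallel sequential scan does: one more worker each time the relation size triples past a minimum threshold, capped by max_parallel_workers_per_gather.  The function name and the 512-page threshold are assumptions for illustration, not the patch's actual values.

```c
#include <assert.h>

/*
 * Illustrative size-based worker heuristic: 0 workers below a minimum
 * page threshold, then one additional worker each time the page count
 * triples, capped at max_workers.  Because the result grows with the
 * log of the page count, using leaf pages instead of total pages would
 * rarely change the answer.
 */
static int
workers_for_pages(double index_pages, int max_workers)
{
    const double min_pages = 512.0; /* assumed threshold for the example */
    double      limit = min_pages * 3.0;
    int         workers;

    if (index_pages < min_pages)
        return 0;
    workers = 1;
    while (index_pages >= limit && workers < max_workers)
    {
        workers++;
        limit *= 3.0;
    }
    return workers;
}
```

For instance, tripling a 100,000-page index to 300,000 pages adds only one worker under this scheme, which is why the leaf-vs-total distinction washes out.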

Thank you,
Rahila Syed







Re: Parallel Index Scans

From
Amit Kapila
Date:
On Tue, Oct 18, 2016 at 4:08 PM, Rahila Syed <rahilasyed90@gmail.com> wrote:
>>Another point which needs some thoughts is whether it is good idea to
>>use index relation size to calculate parallel workers for index scan.
>>I think ideally for index scans it should be based on number of pages
>>to be fetched/scanned from index.
> IIUC, its not possible to know the exact number of pages scanned from an
> index
> in advance.

We can't find the exact number of index pages to be scanned, but I
think we can find the estimated number of pages to be fetched (refer
to cost_index).
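For illustration, the estimate being referred to is roughly of the following shape: selectivity times total index pages, clamped to at least one page.  This is a simplified sketch with an invented function name, not the server's actual cost_index/genericcostestimate logic, which is considerably more involved.

```c
#include <assert.h>

/*
 * Simplified sketch: estimated index pages fetched is the index
 * selectivity times the total index pages, clamped to a minimum of one.
 * Illustrative only; the real planner estimate accounts for much more.
 */
static double
estimated_index_pages_fetched(double index_selectivity, double index_pages)
{
    double pages = index_selectivity * index_pages;

    return (pages < 1.0) ? 1.0 : pages;
}
```

An estimate of this shape could then feed the worker-count heuristic instead of the raw relation size.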

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Index Scans

From
Peter Geoghegan
Date:
On Mon, Oct 17, 2016 at 8:08 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> Create Index .... With (parallel_workers = 4);
>
> If above syntax looks sensible, then we might need to think what
> should be used for parallel index build.  It seems to me that parallel
> tuple sort patch [1] proposed by Peter G. is using above syntax for
> getting the parallel workers input from user for parallel index
> builds.

Apparently you see a similar issue with other major database systems,
where similar storage parameters are kind of "overloaded" like this
(they are used both by index creation and by the optimizer in
considering whether it should use a parallel index scan). That can be
a kind of gotcha for their users, but maybe it's still worth it. In
any case, the complaints I saw about that were from users who used
parallel CREATE INDEX with the equivalent of my parallel_workers index
storage parameter, and then unexpectedly found this also forced the
use of parallel index scan. Not the other way around.

Ideally, the parallel_workers storage parameter will rarely be
necessary because the optimizer will generally do the right thing in
all cases.

-- 
Peter Geoghegan



Re: Parallel Index Scans

From
Amit Kapila
Date:
On Thu, Oct 20, 2016 at 7:39 AM, Peter Geoghegan <pg@heroku.com> wrote:
> On Mon, Oct 17, 2016 at 8:08 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> Create Index .... With (parallel_workers = 4);
>>
>> If above syntax looks sensible, then we might need to think what
>> should be used for parallel index build.  It seems to me that parallel
>> tuple sort patch [1] proposed by Peter G. is using above syntax for
>> getting the parallel workers input from user for parallel index
>> builds.
>
> Apparently you see a similar issue with other major database systems,
> where similar storage parameter things are kind of "overloaded" like
> this (they are used by both index creation, and by the optimizer in
> considering whether it should use a parallel index scan). That can be
> a kind of a gotcha for their users, but maybe it's still worth it.
>

I have also checked and found that you are right.  SQL Server uses the
max degree of parallelism (MAXDOP) parameter, which I think is common
for all SQL statements.

> In
> any case, the complaints I saw about that were from users who used
> parallel CREATE INDEX with the equivalent of my parallel_workers index
> storage parameter, and then unexpectedly found this also forced the
> use of parallel index scan. Not the other way around.
>

I can understand that it can be confusing to users, so another option
could be to provide separate parameters like parallel_workers_build
and parallel_workers, where the first is used for index builds and the
second for scans.  My personal opinion is to have one parameter, so
that users have one less thing to learn about parallelism.

> Ideally, the parallel_workers storage parameter will rarely be
> necessary because the optimizer will generally do the right thing in
> all case.
>

Yeah, we can choose not to provide any parameter for parallel index
scans, but some users might want a parameter similar to the one for
parallel table scans, so it could be handy for them.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Index Scans

From
Peter Geoghegan
Date:
On Wed, Oct 19, 2016 at 8:07 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> I have also checked and found that you are right.  In SQL Server, they
> are using max degree of parallelism (MAXDOP) parameter which is I
> think is common for all the sql statements.

It's not just that one that does things this way, for what it's worth.

> I can understand that it can be confusing to users, so other option
> could be to provide separate parameters like parallel_workers_build
> and parallel_workers where first can be used for index build and
> second can be used for scan.  My personal opinion is to have one
> parameter, so that users have one less thing to learn about
> parallelism.

That's my first instinct too, but I don't really have an opinion yet.

I think that this is the kind of thing where it could make sense to
take a "wait and see" approach, and then make a firm decision
immediately prior to beta. This is what we did in deciding the name of
and fine details around what ultimately became the
max_parallel_workers_per_gather GUC (plus related GUCs and storage
parameters).

-- 
Peter Geoghegan



Re: Parallel Index Scans

From
Robert Haas
Date:
On Wed, Oct 19, 2016 at 11:07 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> Ideally, the parallel_workers storage parameter will rarely be
>> necessary because the optimizer will generally do the right thing in
>> all case.
>
> Yeah, we can choose not to provide any parameter for parallel index
> scans, but some users might want to have a parameter similar to
> parallel table scans, so it could be handy for them to use.

I think the parallel_workers reloption should override the degree of
parallelism for any sort of parallel scan on that table.  Had I
intended it to apply only to sequential scans, I would have named it
differently.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Parallel Index Scans

From
Amit Kapila
Date:
On Thu, Oct 20, 2016 at 10:33 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Oct 19, 2016 at 11:07 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>> Ideally, the parallel_workers storage parameter will rarely be
>>> necessary because the optimizer will generally do the right thing in
>>> all case.
>>
>> Yeah, we can choose not to provide any parameter for parallel index
>> scans, but some users might want to have a parameter similar to
>> parallel table scans, so it could be handy for them to use.
>
> I think the parallel_workers reloption should override the degree of
> parallelism for any sort of parallel scan on that table.  Had I
> intended it to apply only to sequential scans, I would have named it
> differently.
>

I think there is a big difference in the size of the relation to scan
between a parallel sequential scan and a parallel (range) index scan,
which could make it difficult for the user to choose the value of this
parameter.  Why do you think the parallel_workers reloption should
suffice for all types of scans on a table?  The only justification I
can think of is that fewer config knobs make life easier.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Index Scans

From
Robert Haas
Date:
On Fri, Oct 21, 2016 at 9:27 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> I think the parallel_workers reloption should override the degree of
>> parallelism for any sort of parallel scan on that table.  Had I
>> intended it to apply only to sequential scans, I would have named it
>> differently.
>
> I think there is big difference of size of relation to scan between
> parallel sequential scan and parallel (range) index scan which could
> make it difficult for user to choose the value of this parameter.  Why
> do you think that the parallel_workers reloption should suffice all
> type of scans for a table?  I could only think of providing it based
> on thinking that lesser config knobs makes life easier.

Well, we could do that, but it would be fairly complicated and it
doesn't seem to me to be the right place to focus our efforts.  I'd
rather try to figure out some way to make the planner smarter, because
even if users can override the number of workers on a
per-table-per-scan-type basis, they're probably still going to find
using parallel query pretty frustrating unless we make the
number-of-workers formula smarter than it is today.  Anyway, even if
we do decide to add more reloptions than just parallel_degree someday,
couldn't that be left for a separate patch?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Parallel Index Scans

From
Amit Kapila
Date:
On Fri, Oct 21, 2016 at 10:55 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Oct 21, 2016 at 9:27 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>> I think the parallel_workers reloption should override the degree of
>>> parallelism for any sort of parallel scan on that table.  Had I
>>> intended it to apply only to sequential scans, I would have named it
>>> differently.
>>
>> I think there is big difference of size of relation to scan between
>> parallel sequential scan and parallel (range) index scan which could
>> make it difficult for user to choose the value of this parameter.  Why
>> do you think that the parallel_workers reloption should suffice all
>> type of scans for a table?  I could only think of providing it based
>> on thinking that lesser config knobs makes life easier.
>
> Well, we could do that, but it would be fairly complicated and it
> doesn't seem to me to be the right place to focus our efforts.  I'd
> rather try to figure out some way to make the planner smarter, because
> even if users can override the number of workers on a
> per-table-per-scan-type basis, they're probably still going to find
> using parallel query pretty frustrating unless we make the
> number-of-workers formula smarter than it is today.  Anyway, even if
> we do decide to add more reloptions than just parallel_degree someday,
> couldn't that be left for a separate patch?
>

That makes sense to me.  As of now, the patch doesn't consider
reloptions for parallel index scans.  So I think we can leave it as it
is, and later, as a separate patch, decide whether the table's
reloption or a separate reloption for the index would be the better
choice.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Parallel Index Scans

From
Amit Kapila
Date:
On Sat, Oct 22, 2016 at 9:07 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Fri, Oct 21, 2016 at 10:55 PM, Robert Haas <robertmhaas@gmail.com> wrote:

I have rebased the patch (parallel_index_scan_v2) on top of the latest
commit e8ac886c (condition variables).  I have removed the usage of
ConditionVariablePrepareToSleep as it is no longer mandatory.  I have
also updated the docs for the wait event introduced by this patch
(thanks to Dilip for noticing it).  There is no change in the
parallel_index_opt_exec_support patch, but I am attaching it here for
easier reference.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: Parallel Index Scans

From
Haribabu Kommi
Date:



Moved to next CF with "needs review" status.

Regards,
Hari Babu
Fujitsu Australia

Re: [HACKERS] Parallel Index Scans

From
Rafia Sabih
Date:
Hello,
On evaluating parallel index scans on TPC-H benchmark queries, I came across some interesting results.

For scale factor 20, queries 4, 6, and 14 give significant performance improvements with parallel index scans:
Q  | Head | PI
4   | 14     | 11
6   | 27     |  9
14 | 20     | 12

To confirm that the proposed patch is scalable, I tested it at scale factor 300.  There, some queries switched to bitmap index scans instead of parallel index scans, but other queries still gave significant performance improvements:
Q  | Head  | PI
4   | 207    | 168
14 | 2662  | 1576
15 | 847    | 190

All the performance numbers given above are in seconds. The experimental setup used in this exercise is as follows:
Server parameter settings: 
work_mem = 64 MB, 
max_parallel_workers_per_gather = 4, 
random_page_cost = seq_page_cost = 0.1 = parallel_tuple_cost, 
shared_buffers = 1 GB

Logical schema: Some additional indexes were created to ensure the use of indexes, 
on lineitem table -- l_shipdate, l_returnflag, l_shipmode, 
on orders table -- o_comment, o_orderdate, and 
on customer table -- c_mktsegment.

Machine used: IBM Power, 4 socket machine, 512 GB RAM

The main observations about the utility of this patch concern the availability of appropriate indexes and choosing a suitable value of random_page_cost based on the RAM and DB sizes.  For example, in these experiments I ensured a warm-cache environment, so giving a higher value to random_page_cost than seq_page_cost does not make much sense and would inhibit the use of indexes.  Also, the value of this parameter needs to be calibrated based on the underlying hardware; there is recent work in this direction that gives a mechanism to do this calibration offline, and they experimented with PostgreSQL parameters as well [1].

Please find the attached files for a detailed look at these results.  The file pi_perf_tpch.ods gives the performance numbers and the graphs for both scale factors.  The attached zip folder gives the explain analyze output for these queries both on head and with the parallel index patch.





--
Regards,
Rafia Sabih
Attachment

Re: [HACKERS] Parallel Index Scans

From
Anastasia Lubennikova
Date:
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           tested, passed
Documentation:            tested, passed

Hi, thank you for the patch.
Results are very promising. Do you see any drawbacks of this feature or anything that requires more testing?
I'm willing to do a review. I haven't done benchmarks yet, but I've read the patch, and here are some
notes and questions about it.

I saw the discussion about parameters in the thread above, and I agree that we'd better concentrate
on the patch itself and add them later if necessary.

1. Can't we simply use "if (scan->parallel_scan != NULL)" instead of xs_temp_snap flag?

+    if (scan->xs_temp_snap)
+        UnregisterSnapshot(scan->xs_snapshot);

I must say that I'm quite new to all this parallel stuff. If you give me a link
on where to read about snapshots for parallel workers, my review will be more helpful.
Anyway, it would be great to have more comments about it in the code.

2. Would you mind renaming 'amestimateparallelscan' to, say, 'amparallelscan_spacerequired'
or something like that? As far as I understand there is nothing to estimate; we know this size
for sure. I guess you've chosen this name because of 'heap_parallelscan_estimate',
but now it looks similar to 'amestimate', which refers to index scan cost for the optimizer.
That leads to the next question.

3. Are there any changes in cost estimation? I didn't find related changes in the patch.
A parallel scan is expected to be faster, and the optimizer definitely should know that.

4. +    uint8        ps_pageStatus;    /* state of scan, see below */
There is no description below. I'd make the comment more helpful:
/* state of scan. See possible flag values in nbtree.h */
And why do you call it pageStatus? What does it have to do with a page?

5. The comment for _bt_parallel_seize() says:
"False indicates that we have reached the end of scan for current scankeys and for that we return block number as P_NONE."
What is the reason to check (blkno == P_NONE) after checking (status == false) in _bt_first() (see code below)? If the comment is correct, we'll never reach _bt_parallel_done().

+        blkno = _bt_parallel_seize(scan, &status);
+        if (status == false)
+        {
+            BTScanPosInvalidate(so->currPos);
+            return false;
+        }
+        else if (blkno == P_NONE)
+        {
+            _bt_parallel_done(scan);
+            BTScanPosInvalidate(so->currPos);
+            return false;
+        }

6. To avoid code duplication, I would wrap this into a function:

+    /* initialize moreLeft/moreRight appropriately for scan direction */
+    if (ScanDirectionIsForward(dir))
+    {
+        so->currPos.moreLeft = false;
+        so->currPos.moreRight = true;
+    }
+    else
+    {
+        so->currPos.moreLeft = true;
+        so->currPos.moreRight = false;
+    }
+    so->numKilled = 0;            /* just paranoia */
+    so->markItemIndex = -1;        /* ditto */

And after that we can also get rid of _bt_parallel_readpage(), which only
brings another level of indirection to the code.
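Concretely, item 6's suggestion amounts to a helper along these lines.  This is a sketch against stripped-down stand-ins for the server's BTScanOpaque and ScanDirection types; the helper name is just a suggestion, not anything in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the server's types, for illustration only. */
typedef enum { BackwardScanDirection = -1, ForwardScanDirection = 1 } ScanDirection;

typedef struct
{
    bool moreLeft;
    bool moreRight;
    int  numKilled;
    int  markItemIndex;
} BTScanPosStub;

/*
 * Factor the duplicated per-scan initialization out of _bt_first() and
 * _bt_parallel_readpage() into one place.
 */
static void
bt_init_scan_pos(BTScanPosStub *pos, ScanDirection dir)
{
    /* initialize moreLeft/moreRight appropriately for scan direction */
    if (dir == ForwardScanDirection)
    {
        pos->moreLeft = false;
        pos->moreRight = true;
    }
    else
    {
        pos->moreLeft = true;
        pos->moreRight = false;
    }
    pos->numKilled = 0;         /* just paranoia */
    pos->markItemIndex = -1;    /* ditto */
}
```

Both call sites would then shrink to a single call, and the direction logic lives in one place.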

7. Just a couple of typos I've noticed:

* Below flags are used indicate the state of parallel scan.
  -> Below flags are used TO indicate the state of parallel scan.

* On success, release lock and pin on buffer on success.
  -> On success, release lock and pin on buffer.

8. I didn't find a description of the feature in documentation.
Probably we need to add a paragraph to the "Parallel Query" chapter. 

I will send another review of the performance by the end of the week.

The new status of this patch is: Waiting on Author

Re: [HACKERS] Parallel Index Scans

From
Robert Haas
Date:
Thanks for reviewing!  A few quick thoughts from me, since I wrote a
bunch of the design for this patch.

On Wed, Dec 21, 2016 at 10:16 AM, Anastasia Lubennikova
<lubennikovaav@gmail.com> wrote:
> 1. Can't we simply use "if (scan->parallel_scan != NULL)" instead of xs_temp_snap flag?
>
> +       if (scan->xs_temp_snap)
> +               UnregisterSnapshot(scan->xs_snapshot);
>
> I must say that I'm quite new with all this parallel stuff. If you give me a link,
> where to read about snapshots for parallel workers, my review will be more helpful.
> Anyway, it would be great to have more comments about it in the code.

I suspect it would be better to keep those two things formally
separate, even though they may always be the same right now.

> 2. Don't you mind to rename 'amestimateparallelscan' to let's say 'amparallelscan_spacerequired'
> or something like this? As far as I understand there is nothing to estimate, we know this size
> for sure. I guess that you've chosen this name because of 'heap_parallelscan_estimate'.
> But now it looks similar to 'amestimate' which refers to indexscan cost for optimizer.
> That leads to the next question.

"estimate" is being used this way quite widely now, in places like
ExecParallelEstimate.  So if we're going to change the terminology we
should do it broadly.

> 3. Are there any changes in cost estimation? I didn't find related changes in the patch.
> Parallel scan is expected to be faster and optimizer definitely should know that.

Generally the way that's reflected in the optimizer is by giving the
parallel scan a lower row count.  See cost_seqscan() for an
example.
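As a sketch of that derating (modeled on the leader-contribution heuristic in cost_seqscan(); the 0.3 constant is that heuristic's, and none of this is the patch's actual code):

```c
#include <assert.h>

/* Each worker is assumed to handle an equal share of the rows; the leader
 * also participates, but contributes less as it spends more time servicing
 * workers (heuristically, 30% less per planned worker). */
static double
parallel_divisor(int nworkers)
{
    double divisor = nworkers;
    double leader_contribution = 1.0 - (0.3 * nworkers);

    if (leader_contribution > 0.0)
        divisor += leader_contribution;
    return divisor;
}

/* Row count charged to each participant of the parallel scan. */
static double
parallel_rows(double total_rows, int nworkers)
{
    return total_rows / parallel_divisor(nworkers);
}
```

With one planned worker, 2433 total rows come out to roughly 1431 rows per participant, which lines up with the rows=1431 shown in the example parallel plan at the top of the thread.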

In general, you'll probably find a lot of parallels between this patch
and ee7ca559fcf404f9a3bd99da85c8f4ea9fbc2e92, which is probably a good
thing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Parallel Index Scans

From
Amit Kapila
Date:
On Wed, Dec 21, 2016 at 8:46 PM, Anastasia Lubennikova
<lubennikovaav@gmail.com> wrote:
> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:       tested, passed
> Spec compliant:           tested, passed
> Documentation:            tested, passed
>
> Hi, thank you for the patch.
> Results are very promising. Do you see any drawbacks of this feature or something that requires more testing?
>

I think you can focus on the handling of array scan keys for testing.
In general, one of my colleagues has shown interest in testing this
patch and I think he has tested as well but never posted his findings.
I will request him to share his findings and what kind of tests he has
done, if any.

> I'm willing to do a review.

Thanks, that will be helpful.


> I saw the discussion about parameters in the thread above. And I agree that we'd better concentrate
> on the patch itself and add them later if necessary.
>
> 1. Can't we simply use "if (scan->parallel_scan != NULL)" instead of xs_temp_snap flag?
>
> +       if (scan->xs_temp_snap)
> +               UnregisterSnapshot(scan->xs_snapshot);
>

I agree with what Rober has told in his reply.  We do same way for
heap, refer heap_endscan().

> I must say that I'm quite new with all this parallel stuff. If you give me a link,
> where to read about snapshots for parallel workers, my review will be more helpful.
>

You can read transam/README.parallel.  Refer to the "State Sharing"
portion of that README to learn more about it.

> Anyway, it would be great to have more comments about it in the code.
>

We share the snapshot to ensure that reads in both the master backend
and the worker backends use the same snapshot.  There is no harm in
adding comments, but I think it is better to be consistent with
similar heapam code.  After reading README.parallel, if you still feel
that we should add more comments in the code, then we can definitely
do that.

> 2. Don't you mind to rename 'amestimateparallelscan' to let's say 'amparallelscan_spacerequired'
> or something like this?

Sure, I am open to other names, but IMHO, let's keep "estimate" in the
name to keep it consistent with other parallel stuff. Refer
execParallel.c to see how widely this word is used.

> As far as I understand there is nothing to estimate, we know this size
> for sure. I guess that you've chosen this name because of 'heap_parallelscan_estimate'.
> But now it looks similar to 'amestimate' which refers to indexscan cost for optimizer.
> That leads to the next question.
>

Do you mean 'amcostestimate'?  If you want we can rename it
amparallelscanestimate to be consistent with amcostestimate.

> 3. Are there any changes in cost estimation?
>

Yes.

> I didn't find related changes in the patch.
> Parallel scan is expected to be faster and optimizer definitely should know that.
>

You can find the relevant changes in
parallel_index_opt_exec_support_v2.patch, refer cost_index().

> 4. +    uint8           ps_pageStatus;  /* state of scan, see below */
> There is no description below. I'd make the comment more helpful:
> /* state of scan. See possible flags values in nbtree.h */

makes sense. Will change.

> And why do you call it pageStatus? What does it have to do with page?
>

During the scan, this tells us whether the next page is available for
scan.  Another option could be to name it scanStatus, but I am not sure
that is better.  Do you think it will be clear if we add a comment like
"indicates whether next page is available for scan" to this variable?
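To make the "page status" reading concrete, the shared field could encode a small state machine along these lines (the names mirror the BTPARALLEL_* flags being discussed, but are a guess for illustration, not necessarily the patch's):

```c
#include <assert.h>

/* Possible states of a parallel btree scan's shared ps_pageStatus field. */
typedef enum
{
    BTPARALLEL_NOT_INITIALIZED, /* no participant has started the scan yet */
    BTPARALLEL_ADVANCING,       /* one participant is finding the next page */
    BTPARALLEL_IDLE,            /* next page is set and free for the taking */
    BTPARALLEL_DONE             /* no more pages; scan has ended */
} BTPS_State;
```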

> 5. Comment for _bt_parallel_seize() says:
> "False indicates that we have reached the end of scan for
>  current scankeys and for that we return block number as P_NONE."
>
>  What is the reason to check (blkno == P_NONE) after checking (status == false)
>  in _bt_first() (see code below)? If comment is correct
>  we'll never reach _bt_parallel_done()
>
> +               blkno = _bt_parallel_seize(scan, &status);
> +               if (status == false)
> +               {
> +                       BTScanPosInvalidate(so->currPos);
> +                       return false;
> +               }
> +               else if (blkno == P_NONE)
> +               {
> +                       _bt_parallel_done(scan);
> +                       BTScanPosInvalidate(so->currPos);
> +                       return false;
> +               }
>

The first time the master backend or a worker hits the last page (calls
this API), it will return P_NONE, and after that when any worker tries
to fetch the next page, it will return status as false.  I think we can
expand the comment to explain this clearly.  Let me know if you need
more clarification, and I can explain it in detail.
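A toy model of that contract (all names and the shared-state shape here are illustrative, not the patch's code) may make it concrete: the first participant to run off the end sees status == true with blkno == P_NONE and is the one that calls _bt_parallel_done(); every later caller gets status == false immediately, which is why _bt_first() needs both checks.

```c
#include <assert.h>
#include <stdbool.h>

#define P_NONE 0                /* "no page", as in nbtree */

/* Toy shared state for one parallel scan (the real state lives in shared
 * memory and is protected by a lock/condition variable). */
typedef struct
{
    bool scan_done;             /* set once some participant saw the end */
    unsigned int next_page;     /* next leaf page, or P_NONE at the end */
} ToyParallelScan;

/* Returns the page to scan; *status == false means another participant
 * already finished the scan for the current scan keys. */
static unsigned int
toy_parallel_seize(ToyParallelScan *ps, bool *status)
{
    if (ps->scan_done)
    {
        *status = false;
        return P_NONE;
    }
    *status = true;
    return ps->next_page;       /* P_NONE here means we hit the end first */
}

/* The participant that saw P_NONE with status == true reports completion. */
static void
toy_parallel_done(ToyParallelScan *ps)
{
    ps->scan_done = true;
}
```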

> 6. To avoid code duplication, I would wrap this into the function
>
> +       /* initialize moreLeft/moreRight appropriately for scan direction */
> +       if (ScanDirectionIsForward(dir))
> +       {
> +               so->currPos.moreLeft = false;
> +               so->currPos.moreRight = true;
> +       }
> +       else
> +       {
> +               so->currPos.moreLeft = true;
> +               so->currPos.moreRight = false;
> +       }
> +       so->numKilled = 0;                      /* just paranoia */
> +       so->markItemIndex = -1;         /* ditto */
>

Okay, I think we can write a separate function (probably inline
function) for above.

> And after that we can also get rid of _bt_parallel_readpage() which only
> bring another level of indirection to the code.
>

See, this function is responsible for multiple actions like
initializing moreLeft/moreRight positions, reading the next page, and
dropping the lock/pin.  So replicating all these actions in the caller
will make the caller's code less readable than it is now.  Consider
this point and let me know your view on the same.

> 7. Just a couple of typos I've noticed:
>
>  * Below flags are used indicate the state of parallel scan.
>  * Below flags are used TO indicate the state of parallel scan.
>
> * On success, release lock and pin on buffer on success.
> * On success release lock and pin on buffer.
>

Will fix.

> 8. I didn't find a description of the feature in documentation.
> Probably we need to add a paragraph to the "Parallel Query" chapter.
>

Yes, I am aware of that and I think it makes sense to add it now
rather than waiting until the end.

> I will send another review of performance until the end of the week.
>

Okay, you can refer Rafia's mail above for non-default settings she
has used in her performance tests with TPC-H.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Parallel Index Scans

From
Amit Kapila
Date:
On Thu, Dec 22, 2016 at 9:49 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Dec 21, 2016 at 8:46 PM, Anastasia Lubennikova
> <lubennikovaav@gmail.com> wrote:
>> The following review has been posted through the commitfest application:
>> make installcheck-world:  tested, passed
>> Implements feature:       tested, passed
>> Spec compliant:           tested, passed
>> Documentation:            tested, passed
>>
>> Hi, thank you for the patch.
>> Results are very promising. Do you see any drawbacks of this feature or something that requires more testing?
>>
>
> I think you can focus on the handling of array scan keys for testing.
> In general, one of my colleagues has shown interest in testing this
> patch and I think he has tested as well but never posted his findings.
> I will request him to share his findings and what kind of tests he has
> done, if any.
>
> I'm willing to do a review.
>
> Thanks, that will be helpful.
>
>
>> I saw the discussion about parameters in the thread above. And I agree that we'd better concentrate
>> on the patch itself and add them later if necessary.
>>
>> 1. Can't we simply use "if (scan->parallel_scan != NULL)" instead of xs_temp_snap flag?
>>
>> +       if (scan->xs_temp_snap)
>> +               UnregisterSnapshot(scan->xs_snapshot);
>>
>
> I agree with what Rober has told in his reply.
>

Typo.
/Rober/Robert Haas

Thanks to Michael Paquier for noticing it and informing me offline.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Parallel Index Scans

From
tushar
Date:
On 12/22/2016 09:49 AM, Amit Kapila wrote:
> I think you can focus on the handling of array scan keys for testing.
> In general, one of my colleagues has shown interest in testing this
> patch and I think he has tested as well but never posted his findings.
> I will request him to share his findings and what kind of tests he has
> done, if any.
Sure, we (Prabhat and I) have done some testing of this feature 
internally but never published the test scripts on this forum. PFA the 
SQL scripts (along with the expected .out files) we used for testing, 
for your ready reference.

In addition, we generated an LCOV (code coverage) report and 
compared the files changed by the "Parallel index scan" patch.
You can see the numbers for "with patch" vs. "without patch" (.pdf 
file is attached)

-- 
regards,tushar


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Parallel Index Scans

From
tushar
Date:
On 12/22/2016 01:35 PM, tushar wrote:
> On 12/22/2016 09:49 AM, Amit Kapila wrote:
>> I think you can focus on the handling of array scan keys for testing.
>> In general, one of my colleagues has shown interest in testing this
>> patch and I think he has tested as well but never posted his findings.
>> I will request him to share his findings and what kind of tests he has
>> done, if any.
> Sure, We (Prabhat and I) have done some testing for this feature 
> internally but never published the test-scripts on this forum. PFA the 
> sql scripts ( along with the expected .out files) we have used for 
> testing for your ready reference.
>
> In addition we had generated the LCOV (code coverage) report and 
> compared the files which are changed for the "Parallel index scan" patch.
> You can see the numbers for  "with patch" V/s "Without patch" (.pdf 
> file is attached)
>
In addition to that, we ran sqlsmith against the PG v10 + PIS (parallel 
index scan) patches and found a crash, but it occurs on plain PG v10 
(without applying any patches) as well:

postgres=# select
             70 as c0,
             pg_catalog.has_server_privilege(
               cast(ref_0.indexdef as text),
               cast(cast(coalesce((select name from pg_catalog.pg_settings
                 limit 1 offset 16), null) as text) as text)) as c1,
             pg_catalog.pg_export_snapshot() as c2,
             ref_0.indexdef as c3,
             ref_0.indexname as c4
           from pg_catalog.pg_indexes as ref_0
           where (ref_0.tablespace = ref_0.tablespace) or (46 = 22)
           limit 103;
TRAP: FailedAssertion("!(keylen < 64)", File: "hashfunc.c", Line: 139)
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
 
The connection to the server was lost. Attempting reset:
2016-12-23 11:19:50.627 IST [2314] LOG:  server process (PID 2322) was
terminated by signal 6: Aborted
2016-12-23 11:19:50.627 IST [2314] DETAIL:  Failed process was running:
select 70 as c0, pg_catalog.has_server_privilege(cast(ref_0.indexdef as
text), cast(cast(coalesce((select name from pg_catalog.pg_settings limit 1
offset 16), null) as text) as text)) as c1, pg_catalog.pg_export_snapshot()
as c2, ref_0.indexdef as c3, ref_0.indexname as c4 from
pg_catalog.pg_indexes as ref_0 where (ref_0.tablespace = ref_0.tablespace)
or (46 = 22) limit 103;
 
2016-12-23 11:19:50.627 IST [2314] LOG:  terminating any other active 
server processes
2016-12-23 11:19:50.627 IST [2319] WARNING:  terminating connection 
because of crash of another server process
2016-12-23 11:19:50.627 IST [2319] DETAIL:  The postmaster has commanded 
this server process to roll back the current transaction and exit, 
because another server process exited abnormally and possibly corrupted 
shared memory.
2016-12-23 11:19:50.627 IST [2319] HINT:  In a moment you should be able 
to reconnect to the database and repeat your command.
2016-12-23 11:19:50.629 IST [2323] FATAL:  the database system is in 
recovery mode
Failed.
!> 2016-12-23 11:19:50.629 IST [2314] LOG:  all server processes 
terminated; reinitializing
2016-12-23 11:19:50.658 IST [2324] LOG:  database system was 
interrupted; last known up at 2016-12-23 11:19:47 IST
2016-12-23 11:19:50.810 IST [2324] LOG:  database system was not 
properly shut down; automatic recovery in progress
2016-12-23 11:19:50.812 IST [2324] LOG:  invalid record length at 
0/155E408: wanted 24, got 0
2016-12-23 11:19:50.812 IST [2324] LOG:  redo is not required
2016-12-23 11:19:50.819 IST [2324] LOG:  MultiXact member wraparound 
protections are now enabled
2016-12-23 11:19:50.822 IST [2314] LOG:  database system is ready to 
accept connections
2016-12-23 11:19:50.822 IST [2328] LOG:  autovacuum launcher started

-- 
regards,tushar




Re: [HACKERS] Parallel Index Scans

From
Robert Haas
Date:
On Fri, Dec 23, 2016 at 1:35 AM, tushar <tushar.ahuja@enterprisedb.com> wrote:
> In addition to that, we  run the sqlsmith against PG v10+PIS (parallel index
> scan) patches and found a crash  but that is coming on plain  PG v10
> (without applying any patches) as well

So why are you reporting it here rather than on a separate thread?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Parallel Index Scans

From
tushar
Date:
On 12/23/2016 05:38 PM, Robert Haas wrote:
> So why are you reporting it here rather than on a separate thread?
We found it while testing parallel index scan, and later it turned out 
to be a crash in general.
Sure, makes sense, will do that.

-- 
regards,tushar




Re: [HACKERS] Parallel Index Scans

From
Rahila Syed
Date:
>> 5. Comment for _bt_parallel_seize() says:
>> "False indicates that we have reached the end of scan for
>>  current scankeys and for that we return block number as P_NONE."
>>
>>  What is the reason to check (blkno == P_NONE) after checking (status == false)
>>  in _bt_first() (see code below)? If comment is correct
>>  we'll never reach _bt_parallel_done()
>>
>> +               blkno = _bt_parallel_seize(scan, &status);
>> +               if (status == false)
>> +               {
>> +                       BTScanPosInvalidate(so->currPos);
>> +                       return false;
>> +               }
>> +               else if (blkno == P_NONE)
>> +               {
>> +                       _bt_parallel_done(scan);
>> +                       BTScanPosInvalidate(so->currPos);
>> +                       return false;
>> +               }
>>

>The first time master backend or worker hits last page (calls this
>API), it will return P_NONE and after that any worker tries to fetch
>next page, it will return status as false.  I think we can expand a
>comment to explain it clearly.  Let me know, if you need more
>clarification, I can explain it in detail.

Probably this was confusing because we have not mentioned
that P_NONE can be returned even when status = TRUE and
not just when status is false.

I think the comment above the function can be modified as follows:

+ /*
+ * True indicates that the block number returned is either valid including P_NONE
+ * and scan is continued or block number is invalid and scan has just
+ * begun.

Thank you,
Rahila Syed



Re: [HACKERS] Parallel Index Scans

From
Anastasia Lubennikova
Date:
22.12.2016 07:19, Amit Kapila:
> On Wed, Dec 21, 2016 at 8:46 PM, Anastasia Lubennikova
> <lubennikovaav@gmail.com> wrote:
>> The following review has been posted through the commitfest application:
>> make installcheck-world:  tested, passed
>> Implements feature:       tested, passed
>> Spec compliant:           tested, passed
>> Documentation:            tested, passed
>>
>> Hi, thank you for the patch.
>> Results are very promising. Do you see any drawbacks of this feature or something that requires more testing?
>>
> I think you can focus on the handling of array scan keys for testing.
> In general, one of my colleagues has shown interest in testing this
> patch and I think he has tested as well but never posted his findings.
> I will request him to share his findings and what kind of tests he has
> done, if any.


Please check the code related to buffer locking and pinning once again.
I got the following warning. Here are the steps to reproduce it
(the config is default except "autovacuum = off"):

pgbench -i -s 100 test
pgbench -c 10 -T 120 test
    SELECT count(aid) FROM pgbench_accounts
    WHERE aid > 1000 AND aid < 900000 AND bid > 800 AND bid < 900;
WARNING:  buffer refcount leak: [8297] (rel=base/12289/16459,
blockNum=2469, flags=0x93800000, refcount=1 1)
 count
-------
     0
(1 row)

postgres=# select 16459::regclass;
       regclass
-----------------------
 pgbench_accounts_pkey


>> 2. Don't you mind to rename 'amestimateparallelscan' to let's say 'amparallelscan_spacerequired'
>> or something like this?
> Sure, I am open to other names, but IMHO, lets keep "estimate" in the
> name to keep it consistent with other parallel stuff. Refer
> execParallel.c to see how widely this word is used.
>
>> As far as I understand there is nothing to estimate, we know this size
>> for sure. I guess that you've chosen this name because of 'heap_parallelscan_estimate'.
>> But now it looks similar to 'amestimate' which refers to indexscan cost for optimizer.
>> That leads to the next question.
>>
> Do you mean 'amcostestimate'?  If you want we can rename it
> amparallelscanestimate to be consistent with amcostestimate.

I think that 'amparallelscanestimate' seems less ambiguous than 
amestimateparallelscan.
But it's up to you. There are enough comments to understand the purpose 
of this field.
>
>> And why do you call it pageStatus? What does it have to do with page?
>>
> During scan this tells us whether next page is available for scan.
> Another option could be to name it as scanStatus, but not sure if that
> is better.  Do you think if we add a comment like "indicates whether
> next page is available for scan" for this variable then it will be
> clear?

Yes, I think it describes the flag better.
>> 5. Comment for _bt_parallel_seize() says:
>> "False indicates that we have reached the end of scan for
>>   current scankeys and for that we return block number as P_NONE."
>>
>>   What is the reason to check (blkno == P_NONE) after checking (status == false)
>>   in _bt_first() (see code below)? If comment is correct
>>   we'll never reach _bt_parallel_done()
>>
>> +               blkno = _bt_parallel_seize(scan, &status);
>> +               if (status == false)
>> +               {
>> +                       BTScanPosInvalidate(so->currPos);
>> +                       return false;
>> +               }
>> +               else if (blkno == P_NONE)
>> +               {
>> +                       _bt_parallel_done(scan);
>> +                       BTScanPosInvalidate(so->currPos);
>> +                       return false;
>> +               }
>>
> The first time master backend or worker hits last page (calls this
> API), it will return P_NONE and after that any worker tries to fetch
> next page, it will return status as false.  I think we can expand a
> comment to explain it clearly.  Let me know, if you need more
> clarification, I can explain it in detail.
>
Got it,
I think you can add this explanation to the comment for 
_bt_parallel_seize().

>> 6. To avoid code duplication, I would wrap this into the function
>>
>> +       /* initialize moreLeft/moreRight appropriately for scan direction */
>> +       if (ScanDirectionIsForward(dir))
>> +       {
>> +               so->currPos.moreLeft = false;
>> +               so->currPos.moreRight = true;
>> +       }
>> +       else
>> +       {
>> +               so->currPos.moreLeft = true;
>> +               so->currPos.moreRight = false;
>> +       }
>> +       so->numKilled = 0;                      /* just paranoia */
>> +       so->markItemIndex = -1;         /* ditto */
>>
> Okay, I think we can write a separate function (probably inline
> function) for above.
>
>> And after that we can also get rid of _bt_parallel_readpage() which only
>> bring another level of indirection to the code.
>>
> See, this function is responsible for multiple actions like
> initializing moreLeft/moreRight positions, reading next page, dropping
> the lock/pin.  So replicating all these actions in the caller will
> make the code in caller less readable as compared to now.  Consider
> this point and let me know your view on same.

Thank you for the clarification; now I agree with your implementation.
I just missed that we also handle the lock in this function.


Performance results with 2 parallel workers are about 1.5-3 times 
better, just like in your tests.
So, no doubt, this feature will be useful.
But I'm trying to find the worst cases for this feature. And I suppose 
we should test parallel index scans with concurrent insertions. The 
more parallel readers we have, the higher the concurrency.
I doubt that it can significantly decrease performance, because the 
number of parallel readers is not that big, but it is worth testing.

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] Parallel Index Scans

From
Amit Kapila
Date:
On Fri, Dec 23, 2016 at 6:42 PM, Anastasia Lubennikova
<a.lubennikova@postgrespro.ru> wrote:
> 22.12.2016 07:19, Amit Kapila:
>>
>> On Wed, Dec 21, 2016 at 8:46 PM, Anastasia Lubennikova
>> <lubennikovaav@gmail.com> wrote:
>>>
>>> The following review has been posted through the commitfest application:
>>> make installcheck-world:  tested, passed
>>> Implements feature:       tested, passed
>>> Spec compliant:           tested, passed
>>> Documentation:            tested, passed
>>>
>>> Hi, thank you for the patch.
>>> Results are very promising. Do you see any drawbacks of this feature or
>>> something that requires more testing?
>>>
>> I think you can focus on the handling of array scan keys for testing.
>> In general, one of my colleagues has shown interest in testing this
>> patch and I think he has tested as well but never posted his findings.
>> I will request him to share his findings and what kind of tests he has
>> done, if any.
>
>
>
> Check please code related to buffer locking and pinning once again.
> I got the warning. Here are the steps to reproduce it:
> Except "autovacuum = off" config is default.
>
> pgbench -i -s 100 test
> pgbench -c 10 -T 120 test
>
>     SELECT count(aid) FROM pgbench_accounts
>     WHERE aid > 1000 AND aid < 900000 AND bid > 800 AND bid < 900;
> WARNING:  buffer refcount leak: [8297] (rel=base/12289/16459, blockNum=2469,
> flags=0x93800000, refcount=1 1)
>  count
>

A similar problem occurred while testing the "parallel index only
scan" patch, and Rafia has included the fix in her patch [1], which
ideally should be included in this patch, so I have copied the fix
from her patch.  Apart from that, I observed that a similar problem can
happen for backward scans, so I have fixed that as well.

>
>>> 2. Don't you mind to rename 'amestimateparallelscan' to let's say
>>> 'amparallelscan_spacerequired'
>>> or something like this?
>>
>> Sure, I am open to other names, but IMHO, lets keep "estimate" in the
>> name to keep it consistent with other parallel stuff. Refer
>> execParallel.c to see how widely this word is used.
>>
>>> As far as I understand there is nothing to estimate, we know this size
>>> for sure. I guess that you've chosen this name because of
>>> 'heap_parallelscan_estimate'.
>>> But now it looks similar to 'amestimate' which refers to indexscan cost
>>> for optimizer.
>>> That leads to the next question.
>>>
>> Do you mean 'amcostestimate'?  If you want we can rename it
>> amparallelscanestimate to be consistent with amcostestimate.
>
>
> I think that 'amparallelscanestimate' seems less ambiguous than
> amestimateparallelscan.
> But it's up to you. There are enough comments to understand the purpose of
> this field.
>

Okay, then let's leave it as it is, because we also have
aminitparallelscan, which would need to be renamed to
amparallelscaninit if we renamed amestimateparallelscan.

>>
>>> And why do you call it pageStatus? What does it have to do with page?
>>>
>> During the scan this tells us whether the next page is available for
>> scan.  Another option could be to name it scanStatus, but I am not sure
>> that is better.  Do you think that if we add a comment like "indicates
>> whether next page is available for scan" for this variable then it will
>> be clear?
>
>
> Yes, I think it describes the flag better.

Changed as per above suggestion.

>>>
>>> 5. Comment for _bt_parallel_seize() says:
>>> "False indicates that we have reached the end of scan for
>>>   current scankeys and for that we return block number as P_NONE."
>>>
>>>   What is the reason to check (blkno == P_NONE) after checking (status ==
>>> false)
>>>   in _bt_first() (see code below)? If comment is correct
>>>   we'll never reach _bt_parallel_done()
>>>
>>> +               blkno = _bt_parallel_seize(scan, &status);
>>> +               if (status == false)
>>> +               {
>>> +                       BTScanPosInvalidate(so->currPos);
>>> +                       return false;
>>> +               }
>>> +               else if (blkno == P_NONE)
>>> +               {
>>> +                       _bt_parallel_done(scan);
>>> +                       BTScanPosInvalidate(so->currPos);
>>> +                       return false;
>>> +               }
>>>
>> The first time the master backend or a worker hits the last page (i.e.,
>> calls this API), it will return P_NONE, and after that, when any worker
>> tries to fetch the next page, it will return status as false.  I think
>> we can expand the comment to explain this clearly.  Let me know if you
>> need more clarification and I can explain it in detail.
>>
> Got it,
> I think you can add this explanation to the comment for
> _bt_parallel_seize().
>

Expanded the comment as discussed.
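To make the handoff concrete, here is a hypothetical, simplified model of the contract being documented (the names, states, and logic below are illustrative stand-ins, not the patch's actual code): the first caller that observes the end of the scan gets the sentinel page P_NONE with status = true, and every later caller gets status = false.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative sentinel, mimicking nbtree's P_NONE. */
#define P_NONE 0xFFFFFFFFu

typedef enum
{
	SCAN_IDLE,					/* a page number is available to hand out */
	SCAN_DONE					/* some worker has already hit the last page */
} ScanState;

typedef struct
{
	ScanState	state;
	unsigned	next_page;		/* next page to scan, or P_NONE at the end */
} SharedScan;

/*
 * Simplified stand-in for _bt_parallel_seize(): hand out the next page.
 * The first caller to see next_page == P_NONE gets it with *status = true
 * and marks the scan done; all subsequent callers get *status = false.
 */
static unsigned
seize(SharedScan *shared, bool *status)
{
	if (shared->state == SCAN_DONE)
	{
		*status = false;		/* scan already finished by another worker */
		return P_NONE;
	}
	*status = true;
	if (shared->next_page == P_NONE)
		shared->state = SCAN_DONE;	/* we are the worker that hit the end */
	return shared->next_page;
}
```

In this model, the worker that receives (P_NONE, true) is the one responsible for ending the scan (the _bt_parallel_done() call in the _bt_first() hunk quoted above); everyone else sees status = false and simply gives up.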

>>> 6. To avoid code duplication, I would wrap this into the function
>>>
>>> +       /* initialize moreLeft/moreRight appropriately for scan direction
>>> */
>>> +       if (ScanDirectionIsForward(dir))
>>> +       {
>>> +               so->currPos.moreLeft = false;
>>> +               so->currPos.moreRight = true;
>>> +       }
>>> +       else
>>> +       {
>>> +               so->currPos.moreLeft = true;
>>> +               so->currPos.moreRight = false;
>>> +       }
>>> +       so->numKilled = 0;                      /* just paranoia */
>>> +       so->markItemIndex = -1;         /* ditto */
>>>
>> Okay, I think we can write a separate function (probably inline
>> function) for above.
>>

Added the above code in a separate inline function.
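For illustration, a minimal sketch of what such an inline helper could look like; the struct and function names here are hypothetical stand-ins for the patch's BTScanOpaque fields and ScanDirectionIsForward() test.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for the scan-position fields in BTScanOpaque. */
typedef struct
{
	bool		moreLeft;
	bool		moreRight;
	int			numKilled;
	int			markItemIndex;
} ScanPos;

/* Initialize moreLeft/moreRight appropriately for the scan direction,
 * mirroring the duplicated block quoted above. */
static inline void
scanpos_init(ScanPos *pos, bool forward)
{
	pos->moreLeft = !forward;
	pos->moreRight = forward;
	pos->numKilled = 0;			/* just paranoia */
	pos->markItemIndex = -1;	/* ditto */
}
```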

>>> And after that we can also get rid of _bt_parallel_readpage() which only
>>> bring another level of indirection to the code.
>>>
>> See, this function is responsible for multiple actions like
>> initializing moreLeft/moreRight positions, reading next page, dropping
>> the lock/pin.  So replicating all these actions in the caller will
>> make the code in caller less readable as compared to now.  Consider
>> this point and let me know your view on same.
>
>
> Thank you for clarification, now I agree with your implementation.
> I've just missed that we also handle lock in this function.
>

Okay.

> 7. Just a couple of typos I've noticed:
>
>  * Below flags are used indicate the state of parallel scan.
>  * Below flags are used TO indicate the state of parallel scan.
>
> * On success, release lock and pin on buffer on success.
> * On success release lock and pin on buffer.
>

Fixed.

> 8. I didn't find a description of the feature in documentation.
> Probably we need to add a paragraph to the "Parallel Query" chapter.
>

Added the description in parallel_index_opt_exec_support_v3.patch.

>
> Performance results with 2 parallel workers are about 1.5-3 times better,
> just like in your tests.
> So, no doubt, this feature will be useful.

Thanks for the tests.

> But I'm trying to find the worst cases for this feature. And I suppose we
> should test parallel index scans with concurrent insertions. The more
> parallel readers we have, the higher the concurrency.
> I doubt that it can significantly decrease performance, because the number
> of parallel readers is not that big,
>

I am not sure such a test is meaningful for this patch, because
parallelism is generally used for large data reads, and in such cases
there are usually not many concurrent writes.


Thanks for your valuable inputs.


[1] - https://www.postgresql.org/message-id/CAOGQiiNx4Ra9A-RyxjrgECownmVJ64EVpVgfN8ACR-MLupGnng%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Index Scans

From
Amit Kapila
Date:
On Fri, Dec 23, 2016 at 5:48 PM, Rahila Syed <rahilasyed90@gmail.com> wrote:
>>> 5. Comment for _bt_parallel_seize() says:
>>> "False indicates that we have reached the end of scan for
>>>  current scankeys and for that we return block number as P_NONE."
>>>
>>>  What is the reason to check (blkno == P_NONE) after checking (status ==
>>> false)
>>>  in _bt_first() (see code below)? If comment is correct
>>>  we'll never reach _bt_parallel_done()
>>>
>>> +               blkno = _bt_parallel_seize(scan, &status);
>>> +               if (status == false)
>>> +               {
>>> +                       BTScanPosInvalidate(so->currPos);
>>> +                       return false;
>>> +               }
>>> +               else if (blkno == P_NONE)
>>> +               {
>>> +                       _bt_parallel_done(scan);
>>> +                       BTScanPosInvalidate(so->currPos);
>>> +                       return false;
>>> +               }
>>>
>>The first time master backend or worker hits last page (calls this
>>API), it will return P_NONE and after that any worker tries to fetch
>>next page, it will return status as false.  I think we can expand a
>>comment to explain it clearly.  Let me know, if you need more
>>clarification, I can explain it in detail.
>
> Probably this was confusing because we have not mentioned
> that P_NONE can be returned even when status = TRUE and
> not just when status is false.
>
> I think, the comment above the function can be modified as follows,
>
> + /*
> + * True indicates that the block number returned is either valid including
> P_NONE
> + * and scan is continued or block number is invalid and scan has just
> + * begun.
>

I think the modification (including P_NONE and scan is continued)
suggested by you can confuse the reader, because if the returned block
number is P_NONE, then we don't continue the scan.  I have used
slightly different wording in the patch I just posted; please check
and see if that looks fine to you.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Parallel Index Scans

From
Anastasia Lubennikova
Date:
27.12.2016 17:33, Amit Kapila:
> On Fri, Dec 23, 2016 at 6:42 PM, Anastasia Lubennikova
> <a.lubennikova@postgrespro.ru> wrote:
>> 22.12.2016 07:19, Amit Kapila:
>>> On Wed, Dec 21, 2016 at 8:46 PM, Anastasia Lubennikova
>>> <lubennikovaav@gmail.com> wrote:
>>>> The following review has been posted through the commitfest application:
>>>> make installcheck-world:  tested, passed
>>>> Implements feature:       tested, passed
>>>> Spec compliant:           tested, passed
>>>> Documentation:            tested, passed
>>>>
>>>> Hi, thank you for the patch.
>>>> Results are very promising. Do you see any drawbacks of this feature or
>>>> something that requires more testing?
>>>>
>>> I think you can focus on the handling of array scan keys for testing.
>>> In general, one of my colleagues has shown interest in testing this
>>> patch and I think he has tested as well but never posted his findings.
>>> I will request him to share his findings and what kind of tests he has
>>> done, if any.
>>>
>> Please check the code related to buffer locking and pinning once again.
>> I got the warning. Here are the steps to reproduce it:
>> The config is default, except for "autovacuum = off".
>>
>> pgbench -i -s 100 test
>> pgbench -c 10 -T 120 test
>>
>>     SELECT count(aid) FROM pgbench_accounts
>>     WHERE aid > 1000 AND aid < 900000 AND bid > 800 AND bid < 900;
>> WARNING:  buffer refcount leak: [8297] (rel=base/12289/16459, blockNum=2469,
>> flags=0x93800000, refcount=1 1)
>>  count
>>
> A similar problem occurred while testing the "parallel index only
> scan" patch, and Rafia included the fix in her patch [1]; that fix
> ideally belongs in this patch as well, so I have copied it from hers.
> Apart from that, I observed that a similar problem can happen for
> backward scans, so I have fixed that too.

I confirm that this problem is solved.

>> But I'm trying to find the worst cases for this feature. And I suppose we
>> should test parallel index scans with concurrent insertions. The more
>> parallel readers we have, the higher the concurrency.
>> I doubt that it can significantly decrease performance, because the number
>> of parallel readers is not that big,
>>
> I am not sure such a test is meaningful for this patch, because
> parallelism is generally used for large data reads, and in such cases
> there are usually not many concurrent writes.

I didn't find any case of noticeable performance degradation,
so set status to "Ready for committer".
Thank you for this patch.
-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] Parallel Index Scans

From
Robert Haas
Date:
On Fri, Jan 13, 2017 at 9:28 AM, Anastasia Lubennikova
<a.lubennikova@postgrespro.ru> wrote:
> I didn't find any case of noticeable performance degradation,
> so set status to "Ready for committer".

The very first hunk of doc changes looks like it makes the whitespace
totally wrong - surely it can't be right to have 0-space indents in C
code.

+    The <literal>index_size</> parameter indicate the size of generic parallel

indicate -> indicates
size of generic -> size of the generic

+   index-type-specific parallel information which will be stored immediatedly

Typo.

+   Initialize the parallel index scan state.  It will be used to initialize
+   index-type-specific parallel information which will be stored immediatedly
+   after generic parallel information required for parallel index scans.  The
+   required state information will be set in <literal>target</>.
+  </para>
+
+   <para>
+     The <function>aminitparallelscan</> and
<function>amestimateparallelscan</>
+     functions need only be provided if the access method supports
<quote>parallel</>
+     index scans.  If it doesn't, the <structfield>aminitparallelscan</> and
+     <structfield>amestimateparallelscan</> fields in its
<structname>IndexAmRoutine</>
+     struct must be set to NULL.
+   </para>

Inconsistent indentation.  <quote> seems like a strange choice of tag.

+    /* amestimateparallelscan is optional; assume no-op if not
provided by AM */

The fact that amestimateparallelscan is optional even when parallel
index scans are supported is undocumented.  Similarly for the other
functions, which also seem to be optional but not documented as such.
The code and documentation need to match.

+    void       *amtarget = (char *) ((void *) target) + offset;

Passing an unaligned pointer to the AM sounds like a recipe for
crashes on obscure platforms that can't tolerate alignment violations,
and possibly bad performance on others.  I'd arrange to MAXALIGN the size
of the generic structure in index_parallelscan_estimate and
index_parallelscan_initialize.  Also, why pass the size of the generic
structure to the AM-specific estimate routine, anyway?  It can't
legally return a smaller value, and we can add_size() just as well as
the AM-specific code.  Wouldn't it make more sense for the AM-specific
code to return the amount of space that is needed for AM-specific
stuff, and let the generic code deal with the generic stuff?
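To make the alignment arithmetic concrete, a simplified sketch of the suggested layout (the MAXALIGN definition here is a stand-in assuming an 8-byte maximum alignment; the real macro in c.h is platform-dependent, and `am_state_offset` is a hypothetical helper name):

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for PostgreSQL's MAXALIGN, assuming MAXIMUM_ALIGNOF == 8.
 * Rounds a length up to the next multiple of the maximum alignment. */
#define MAXIMUM_ALIGNOF 8
#define MAXALIGN(LEN) \
	(((size_t) (LEN) + (MAXIMUM_ALIGNOF - 1)) & ~((size_t) (MAXIMUM_ALIGNOF - 1)))

/*
 * Offset at which the AM-specific parallel state begins: the size of the
 * generic parallel-scan structure rounded up, so the pointer handed to
 * the AM is always suitably aligned no matter the generic struct's size.
 */
static size_t
am_state_offset(size_t generic_size)
{
	return MAXALIGN(generic_size);
}
```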

+ *    status - True indicates that the block number returned is valid and scan
+ *             is continued or block number is invalid and scan has just begun
+ *             or block number is P_NONE and scan is finished.  False indicates
+ *             that we have reached the end of scan for current
scankeys and for
+ *             that we return block number as P_NONE.

It is hard to parse a sentence with that many "and" and "or" clauses,
especially since English, unlike C, does not have strict operator
precedence rules. Perhaps you could reword to make it more clear.
Also, does that survive pgindent with that indentation?

+    BTParallelScanDesc btscan = (BTParallelScanDesc) OffsetToPointer(
+                                                (void *) scan->parallel_scan,
+                                             scan->parallel_scan->ps_offset);

You could avoid these uncomfortable line breaks by declaring the
variable on one line and the initializing it on a separate line.

+        SpinLockAcquire(&btscan->ps_mutex);
+        pageStatus = btscan->ps_pageStatus;
+        if (so->arrayKeyCount < btscan->ps_arrayKeyCount)
+            *status = false;
+        else if (pageStatus == BTPARALLEL_DONE)
+            *status = false;
+        else if (pageStatus != BTPARALLEL_ADVANCING)
+        {
+            btscan->ps_pageStatus = BTPARALLEL_ADVANCING;
+            nextPage = btscan->ps_nextPage;
+            exit_loop = true;
+        }
+        SpinLockRelease(&btscan->ps_mutex);

IMHO, this needs comments.

+ * It updates the value of next page that allows parallel scan to move forward
+ * or backward depending on scan direction.  It wakes up one of the sleeping
+ * workers.

This construction is commonly used in India but sounds awkward to
other English-speakers, or at least to me.  You can either drop the
word "it" and just start with the verb "Updates the value of ..." or
you can replace the first instance of "It" with "This function".
Although actually, I think this whole comment needs rewriting.  Maybe
something like "Save information about scan position and wake up next
worker to continue scan."

+ * This must be called when there are no pages left to scan. Notify end of
+ * parallel scan to all the other workers associated with this scan.

Suggest: When there are no pages left to scan, this function should be
called to notify other workers.  Otherwise, they might wait forever
for the scan to advance to the next page.

+        if (status == false)

if (!status) is usually preferred for bools.  (Multiple instances.)

+#define BTPARALLEL_NOT_INITIALIZED 0x01
+#define BTPARALLEL_ADVANCING 0x02
+#define BTPARALLEL_DONE 0x03
+#define BTPARALLEL_IDLE 0x04

Let's change this to an enum.  We can keep the names of the members
as-is, just use typedef enum { ... } instead of #defines.

+#define OffsetToPointer(base, offset)\
+((void *)((char *)base + offset))

Blech.  Aside from the bad formatting, this is an awfully generic
thing to stick into relscan.h.  I'm not sure we should have it at all,
but certainly not in this file.

+/*
+ * BTParallelScanDescData contains btree specific shared information required
+ * for parallel scan.
+ */
+typedef struct BTParallelScanDescData
+{
+    BlockNumber ps_nextPage;    /* latest or next page to be scanned */
+    uint8        ps_pageStatus;    /* indicates whether next page is available
+                                 * for scan. see nbtree.h for possible states
+                                 * of parallel scan. */
+    int            ps_arrayKeyCount;        /* count indicating number of array
+                                         * scan keys processed by parallel
+                                         * scan */
+    slock_t        ps_mutex;        /* protects above variables */
+    ConditionVariable cv;        /* used to synchronize parallel scan */
+}    BTParallelScanDescData;

Why are the states declared in a separate header file from the variable
that uses them?  Let's put them all in the same place.

Why do all of these fields except for the last one have a ps_ prefix,
but the last one doesn't?

I assume "ps" stands for "parallel scan" but maybe "btps" would be
better since this is btree-specific.

ps_nextPage sometimes contains something other than the next page, so
maybe we should choose a different name, like ps_currentPage or
ps_scanPage.

This is not a totally complete review - there are some things I have
deeper questions about and need to examine more closely - but let's
get the simple stuff tidied up first.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Parallel Index Scans

From
Amit Kapila
Date:
On Fri, Jan 13, 2017 at 7:58 PM, Anastasia Lubennikova
<a.lubennikova@postgrespro.ru> wrote:
> 27.12.2016 17:33, Amit Kapila:
>
>
>> A similar problem occurred while testing the "parallel index only
>> scan" patch, and Rafia included the fix in her patch [1]; that fix
>> ideally belongs in this patch as well, so I have copied it from hers.
>> Apart from that, I observed that a similar problem can happen for
>> backward scans, so I have fixed that too.
>>
> I confirm that this problem is solved.
>
>>> But I'm trying to find the worst cases for this feature. And I suppose we
>>> should test parallel index scans with concurrent insertions. The more
>>> parallel readers we have, the higher the concurrency.
>>> I doubt that it can significantly decrease performance, because the number
>>> of parallel readers is not that big,
>>>
>> I am not sure such a test is meaningful for this patch, because
>> parallelism is generally used for large data reads, and in such cases
>> there are usually not many concurrent writes.
>>
> I didn't find any case of noticeable performance degradation,
> so set status to "Ready for committer".
>

Thank you for the review!


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Parallel Index Scans

From
Amit Kapila
Date:
On Fri, Jan 13, 2017 at 11:06 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jan 13, 2017 at 9:28 AM, Anastasia Lubennikova
> <a.lubennikova@postgrespro.ru> wrote:
>> I didn't find any case of noticeable performance degradation,
>> so set status to "Ready for committer".
>
> The very first hunk of doc changes looks like it makes the whitespace
> totally wrong - surely it can't be right to have 0-space indents in C
> code.
>

Fixed.

> +    The <literal>index_size</> parameter indicate the size of generic parallel
>
> indicate -> indicates
> size of generic -> size of the generic
>

Fixed.

> +   index-type-specific parallel information which will be stored immediatedly
>
> Typo.
>

Fixed.

> +   Initialize the parallel index scan state.  It will be used to initialize
> +   index-type-specific parallel information which will be stored immediatedly
> +   after generic parallel information required for parallel index scans.  The
> +   required state information will be set in <literal>target</>.
> +  </para>
> +
> +   <para>
> +     The <function>aminitparallelscan</> and
> <function>amestimateparallelscan</>
> +     functions need only be provided if the access method supports
> <quote>parallel</>
> +     index scans.  If it doesn't, the <structfield>aminitparallelscan</> and
> +     <structfield>amestimateparallelscan</> fields in its
> <structname>IndexAmRoutine</>
> +     struct must be set to NULL.
> +   </para>
>
> Inconsistent indentation.

Fixed.

>  <quote> seems like a strange choice of tag.
>

I have seen that <quote> is used in indexam.sgml in multiple places to
refer to "bitmap" and "plain" index scans, so I thought of using the
same for "parallel" index scans.

> +    /* amestimateparallelscan is optional; assume no-op if not
> provided by AM */
>
> The fact that amestimateparallelscan is optional even when parallel
> index scans are supported is undocumented.
>

Okay, I have added that information in docs.

>  Similarly for the other
> functions, which also seem to be optional but not documented as such.
> The code and documentation need to match.
>

All the functions introduced by this patch are documented in
indexam.sgml as optional.  I am not sure which other place you are
expecting an update.

> +    void       *amtarget = (char *) ((void *) target) + offset;
>
> Passing an unaligned pointer to the AM sounds like a recipe for
> crashes on obscure platforms that can't tolerate alignment violations,
> and possibly bad performance on others.  I'd arrange to MAXALIGN the size
> of the generic structure in index_parallelscan_estimate and
> index_parallelscan_initialize.

Right, changed as per suggestion.

>  Also, why pass the size of the generic
> structure to the AM-specific estimate routine, anyway?  It can't
> legally return a smaller value, and we can add_size() just as well as
> the AM-specific code.  Wouldn't it make more sense for the AM-specific
> code to return the amount of space that is needed for AM-specific
> stuff, and let the generic code deal with the generic stuff?
>

Makes sense, so changed accordingly.

> + *    status - True indicates that the block number returned is valid and scan
> + *             is continued or block number is invalid and scan has just begun
> + *             or block number is P_NONE and scan is finished.  False indicates
> + *             that we have reached the end of scan for current
> scankeys and for
> + *             that we return block number as P_NONE.
>
> It is hard to parse a sentence with that many "and" and "or" clauses,
> especially since English, unlike C, does not have strict operator
> precedence rules. Perhaps you could reword to make it more clear.
>

Okay, I have changed the comment.

> Also, does that survive pgindent with that indentation?
>

Yes.

> +    BTParallelScanDesc btscan = (BTParallelScanDesc) OffsetToPointer(
> +                                                (void *) scan->parallel_scan,
> +                                             scan->parallel_scan->ps_offset);
>
> You could avoid these uncomfortable line breaks by declaring the
> variable on one line and the initializing it on a separate line.
>

Okay, changed.

> +        SpinLockAcquire(&btscan->ps_mutex);
> +        pageStatus = btscan->ps_pageStatus;
> +        if (so->arrayKeyCount < btscan->ps_arrayKeyCount)
> +            *status = false;
> +        else if (pageStatus == BTPARALLEL_DONE)
> +            *status = false;
> +        else if (pageStatus != BTPARALLEL_ADVANCING)
> +        {
> +            btscan->ps_pageStatus = BTPARALLEL_ADVANCING;
> +            nextPage = btscan->ps_nextPage;
> +            exit_loop = true;
> +        }
> +        SpinLockRelease(&btscan->ps_mutex);
>
> IMHO, this needs comments.
>

Sure, added a comment.

> + * It updates the value of next page that allows parallel scan to move forward
> + * or backward depending on scan direction.  It wakes up one of the sleeping
> + * workers.
>
> This construction is commonly used in India but sounds awkward to
> other English-speakers, or at least to me.  You can either drop the
> word "it" and just start with the verb "Updates the value of ..." or
> you can replace the first instance of "It" with "This function".
> Although actually, I think this whole comment needs rewriting.  Maybe
> something like "Save information about scan position and wake up next
> worker to continue scan."
>

Changed as per suggestion.

> + * This must be called when there are no pages left to scan. Notify end of
> + * parallel scan to all the other workers associated with this scan.
>
> Suggest: When there are no pages left to scan, this function should be
> called to notify other workers.  Otherwise, they might wait forever
> for the scan to advance to the next page.
>
> +        if (status == false)
>
> if (!status) is usually preferred for bools.  (Multiple instances.)
>
> +#define BTPARALLEL_NOT_INITIALIZED 0x01
> +#define BTPARALLEL_ADVANCING 0x02
> +#define BTPARALLEL_DONE 0x03
> +#define BTPARALLEL_IDLE 0x04
>
> Let's change this to an enum.  We can keep the names of the members
> as-is, just use typedef enum { ... } instead of #defines.
>

Changed as per suggestion.

> +#define OffsetToPointer(base, offset)\
> +((void *)((char *)base + offset))
>
> Blech.  Aside from the bad formatting, this is an awfully generic
> thing to stick into relscan.h.

Agreed and moved to c.h where some similar defines are present.

>  I'm not sure we should have it at all,
> but certainly not in this file.
>

Yeah, but I think there is no harm in keeping it, and maybe we can
start using it in code at other places as well.

> +/*
> + * BTParallelScanDescData contains btree specific shared information required
> + * for parallel scan.
> + */
> +typedef struct BTParallelScanDescData
> +{
> +    BlockNumber ps_nextPage;    /* latest or next page to be scanned */
> +    uint8        ps_pageStatus;    /* indicates whether next page is available
> +                                 * for scan. see nbtree.h for possible states
> +                                 * of parallel scan. */
> +    int            ps_arrayKeyCount;        /* count indicating number of array
> +                                         * scan keys processed by parallel
> +                                         * scan */
> +    slock_t        ps_mutex;        /* protects above variables */
> +    ConditionVariable cv;        /* used to synchronize parallel scan */
> +}    BTParallelScanDescData;
>
> Why are the states declared in a separate header file from the variable
> that uses them?   Let's put them all in the same place.
>

Agreed and changed accordingly.

> Why do all of these fields except for the last one have a ps_ prefix,
> but the last one doesn't?
>

No specific reason, so changed as per suggestion.

> I assume "ps" stands for "parallel scan" but maybe "btps" would be
> better since this is btree-specific.
>

Changed as per suggestion.

> ps_nextPage sometimes contains something other than the next page, so
> maybe we should choose a different name, like ps_currentPage or
> ps_scanPage.
>

Changed as per suggestion.


I have also rebased the optimizer/executor support patch
(parallel_index_opt_exec_support_v4.patch) and added a test case in
it.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Parallel Index Scans

From
Robert Haas
Date:
On Mon, Jan 16, 2017 at 7:11 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> Fixed.

Thanks for the update.  Some more comments:

It shouldn't be necessary for MultiExecBitmapIndexScan to modify the
IndexScanDesc.  That seems really broken.  If a parallel scan isn't
supported here (and I'm sure that's the case right now) then no such
IndexScanDesc should be getting created.

WAIT_EVENT_PARALLEL_INDEX_SCAN is in fact btree-specific.  There's no
guarantee that any other AMs the implement parallel index scans will
use that wait event, and they might have others instead.  I would make
it a lot more specific, like WAIT_EVENT_BTREE_PAGENUMBER.  (Waiting
for the page number needed to continue a parallel btree scan to become
available.)

Why do we need separate AM support for index_parallelrescan()?  Why
can't index_rescan() cover that case?  If the AM-specific method is
getting the IndexScanDesc, it can see for itself whether it is a
parallel scan.

I'd rename PS_State to BTPS_State, to match the other renamings.

If we're going to update all of the AMs to set the new support
functions to NULL, we should also update contrib/bloom.

index_parallelscan_estimate has multiple lines that go over 80
characters for no really good reason.  Separate the initialization of
index_scan from the variable declaration.  Do the same for
amindex_size.  Also, you don't really need to encase the end of the
function in an "else" block when the "if" block is guaranteed to
return.

Several function header comments still use the style where the first
word of the description is "It".  Say "this function" or similar the
first time, instead of "it". Then when you say "it" later, it's clear
that it refers back to where you said "this function".

index_parallelscan_initialize also has a line more than 80 characters
that looks easy to fix by splitting the declaration from the
initialization.

I think it's a bad idea to add a ParallelIndexScanDesc argument to
index_beginscan().  That function is used in lots of places, and
somebody might think that they are allowed to actually pass a non-NULL
value there, which they aren't: they must go through
index_beginscan_parallel.  I think that the additional argument should
only be added to index_beginscan_internal, and
index_beginscan_parallel should remain unchanged.  Either that, or get
rid of index_beginscan_parallel altogether and have everyone use
index_beginscan directly, and put the snapshot-restore logic in that
function.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Parallel Index Scans

From
Haribabu Kommi
Date:


On Mon, Jan 16, 2017 at 11:11 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

> Changed as per suggestion.
>
> I have also rebased the optimizer/executor support patch
> (parallel_index_opt_exec_support_v4.patch) and added a test case in
> it.

Thanks for the patch. Here are comments found during review.

parallel_index_scan_v4.patch:


+ amtarget = (char *) ((void *) target) + offset;

Can the above calculation be moved after the NULL check?

+ * index_beginscan_parallel - join parallel index scan

The name and the description don't sync properly; any better description?

+ BTPARALLEL_DONE,
+ BTPARALLEL_IDLE
+} PS_State;

The order of the above two enum values can be changed according to their use.

+ /* Check if the scan for current scan keys is finished */
+ if (so->arrayKeyCount < btscan->btps_arrayKeyCount)
+ *status = false;

I didn't clearly understand in which scenario arrayKeyCount can be less
than btps_arrayKeyCount.


+BlockNumber
+_bt_parallel_seize(IndexScanDesc scan, bool *status)

The return value of the above function is validated only in the
_bt_first function, but in other cases it is not.  From the function
description, it is possible to return P_NONE for workers too with
status as true.  I feel it is better to handle the P_NONE case
internally, so that callers just check the status.  Am I missing
anything?


+extern BlockNumber _bt_parallel_seize(IndexScanDesc scan, bool *status);
+extern void _bt_parallel_release(IndexScanDesc scan, BlockNumber scan_page);

Any better names for the above functions, as these functions will
provide/set the next page that needs to be read?


parallel_index_opt_exec_support_v4.patch:

+#include "access/parallel.h"

Why is it required to be included in nbtree.c? I didn't find
any code changes in the patch.


+ /* reset (parallel) index scan */
+ if (node->iss_ScanDesc)
+ {

Why is this if check required? There is an assert check in later function calls.


Regards,
Hari Babu
Fujitsu Australia

Re: [HACKERS] Parallel Index Scans

From
Rahila Syed
Date:
>+ /* Check if the scan for current scan keys is finished */
>+ if (so->arrayKeyCount < btscan->btps_arrayKeyCount)
>+ *status = false;

>I didn't clearly understand, in which scenario the arrayKeyCount is less
>than btps_arrayKeyCount?
Consider following array scan keys

select * from test2 where j=ANY(ARRAY[1000,2000,3000]);

By the time the current worker has finished reading the heap tuples corresponding
to array key 1000 (arrayKeyCount = 0), some other worker might have advanced the scan to
array key 2000 (btps_arrayKeyCount = 1). In this case, when the current worker fetches the next
page to scan, it must advance its scan keys before scanning the next page of the parallel scan.
I hope this helps.

>+BlockNumber
>+_bt_parallel_seize(IndexScanDesc scan, bool *status)

>The return value of the above function is validated only in _bt_first
>function, but in other cases it is not.
In other cases it is validated in _bt_readnextpage() which is called after
_bt_parallel_seize().

>From the function description,
>it is possible to return P_NONE for the workers also with status as
>true. I feel it is better to handle the P_NONE case internally only
>so that callers just check for the status. Am I missing anything?

In case the next block is P_NONE and status is true, the code
calls _bt_parallel_done() to notify the other workers, followed by
BTScanPosInvalidate(). A similar check for block = P_NONE also
happens in the existing code. See the following in _bt_readnextpage():


            if (blkno == P_NONE || !so->currPos.moreRight)
            {
                _bt_parallel_done(scan);
                BTScanPosInvalidate(so->currPos);
                return false;
            }

So, to keep it consistent with the existing code, the check
is kept outside _bt_parallel_seize().

Thank you,
Rahila Syed



Re: [HACKERS] Parallel Index Scans

From
Amit Kapila
Date:
On Tue, Jan 17, 2017 at 11:27 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Jan 16, 2017 at 7:11 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
>
> WAIT_EVENT_PARALLEL_INDEX_SCAN is in fact btree-specific.  There's no
> guarantee that any other AMs that implement parallel index scans will
> use that wait event, and they might have others instead.  I would make
> it a lot more specific, like WAIT_EVENT_BTREE_PAGENUMBER.  (Waiting
> for the page number needed to continue a parallel btree scan to become
> available.)
>

WAIT_EVENT_BTREE_PAGENUMBER - NUMBER sounds slightly awkward. How
about just WAIT_EVENT_BTREE_PAGE?  We can keep the description as
you suggested.

> Why do we need separate AM support for index_parallelrescan()?  Why
> can't index_rescan() cover that case?

The reason is that sometimes index_rescan is called when we just have to
update the runtime scankeys of the index scan, and we don't want to reset the
parallel scan for that.  Refer to the ExecReScanIndexScan() changes in patch
parallel_index_opt_exec_support_v4.  Rescan is called from the below place
during an index scan.

ExecIndexScan(IndexScanState *node)
{
    /*
     * If we have runtime keys and they've not already been set up, do it now.
     */
    if (node->iss_NumRuntimeKeys != 0 && !node->iss_RuntimeKeysReady)
        ExecReScan((PlanState *) node);
    ...

>  If the AM-specific method is
> getting the IndexScanDesc, it can see for itself whether it is a
> parallel scan.
>

I think if we want to do it that way then we need to pass some additional
information related to runtime scan keys to the index_rescan method and
then probably to the AM-specific rescan method.  That sounds scary.


>
> I think it's a bad idea to add a ParallelIndexScanDesc argument to
> index_beginscan().  That function is used in lots of places, and
> somebody might think that they are allowed to actually pass a non-NULL
> value there, which they aren't: they must go through
> index_beginscan_parallel.  I think that the additional argument should
> only be added to index_beginscan_internal, and
> index_beginscan_parallel should remain unchanged.
>

If we go that way then we need to set a few parameters like heapRelation
and xs_snapshot in index_beginscan_parallel as we are doing in
index_beginscan.  Does going that way sound better to you?

>  Either that, or get
> rid of index_beginscan_parallel altogether and have everyone use
> index_beginscan directly, and put the snapshot-restore logic in that
> function.
>

I think there is value in retaining index_beginscan_parallel as that
is parallel to heap_beginscan_parallel.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Parallel Index Scans

From
Amit Kapila
Date:
On Wed, Jan 18, 2017 at 6:25 AM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
>
>
> On Mon, Jan 16, 2017 at 11:11 PM, Amit Kapila <amit.kapila16@gmail.com>
> wrote:
>>
>
> + * index_beginscan_parallel - join parallel index scan
>
> The name and the description doesn't sync properly, any better description?
>

This can be called by both the worker and the leader of a parallel index
scan.  What problem do you see with it?  heap_beginscan_parallel has a
similar description, so I am not sure changing it here alone makes sense.


>
> +extern BlockNumber _bt_parallel_seize(IndexScanDesc scan, bool *status);
> +extern void _bt_parallel_release(IndexScanDesc scan, BlockNumber
> scan_page);
>
> Any better names for the above functions, as these function will provide/set
> the next page that needs to be read.
>

These functions also set the state of the scan.  IIRC, these names were
agreed on by Robert and Rahila as well (suggested offlist by
Robert).  I am open to changing them if you or others have any better
suggestions.

>
> + /* reset (parallel) index scan */
> + if (node->iss_ScanDesc)
> + {
>
> Why this if check required? There is an assert check in later function
> calls.
>

This is required because we don't initialize the scan descriptor for
parallel-aware nodes during ExecInitIndexScan.  It gets initialized
later, at execution time, when we initialize the DSM.  Now, it is
quite possible that a Gather node can occur on the inner side of a join, in
which case Rescan can be called before execution even starts.  This is
the reason why we have a similar check in ExecReScanSeqScan, which was
added for parallel sequential scans (f0661c4e).  Does that answer
your question?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Parallel Index Scans

From
Robert Haas
Date:
On Wed, Jan 18, 2017 at 8:03 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Tue, Jan 17, 2017 at 11:27 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Mon, Jan 16, 2017 at 7:11 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> WAIT_EVENT_PARALLEL_INDEX_SCAN is in fact btree-specific.  There's no
>> guarantee that any other AMs that implement parallel index scans will
>> use that wait event, and they might have others instead.  I would make
>> it a lot more specific, like WAIT_EVENT_BTREE_PAGENUMBER.  (Waiting
>> for the page number needed to continue a parallel btree scan to become
>> available.)
>
> WAIT_EVENT_BTREE_PAGENUMBER - NUMBER sounds slightly awkward. How
> about just WAIT_EVENT_BTREE_PAGE?  We can keep the description as
> you suggested.

Sure.

>> Why do we need separate AM support for index_parallelrescan()?  Why
>> can't index_rescan() cover that case?
>
> The reason is that sometime index_rescan is called when we have to
> just update runtime scankeys in index and we don't want to reset
> parallel scan for that.  Refer ExecReScanIndexScan() changes in patch
> parallel_index_opt_exec_support_v4.  Rescan is called from below place
> during index scan.

Hmm, tricky.  OK, I'll think about that some more.

>> I think it's a bad idea to add a ParallelIndexScanDesc argument to
>> index_beginscan().  That function is used in lots of places, and
>> somebody might think that they are allowed to actually pass a non-NULL
>> value there, which they aren't: they must go through
>> index_beginscan_parallel.  I think that the additional argument should
>> only be added to index_beginscan_internal, and
>> index_beginscan_parallel should remain unchanged.
>
> If we go that way then we need to set few parameters like heapRelation
> and xs_snapshot in index_beginscan_parallel as we are doing in
> index_beginscan. Does going that way sound better to you?

It's pretty minor code duplication; I don't think it's an issue.

> I think there is value in retaining index_beginscan_parallel as that
> is parallel to heap_beginscan_parallel.

OK.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Parallel Index Scans

From
Robert Haas
Date:
On Mon, Jan 16, 2017 at 7:11 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> Fixed.

With respect to the second patch
(parallel_index_opt_exec_support_v4.patch), I'm hoping it can use the
new function from Dilip's bitmap heap scan patch set.  See commit
716c7d4b242f0a64ad8ac4dc48c6fed6557ba12c.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Parallel Index Scans

From
Haribabu Kommi
Date:


On Wed, Jan 18, 2017 at 6:55 PM, Rahila Syed <rahilasyed90@gmail.com> wrote:
>+ /* Check if the scan for current scan keys is finished */
>+ if (so->arrayKeyCount < btscan->btps_arrayKeyCount)
>+ *status = false;

>I didn't clearly understand in which scenario arrayKeyCount is less
>than btps_arrayKeyCount?
Consider following array scan keys

select * from test2 where j=ANY(ARRAY[1000,2000,3000]);

By the time the current worker has finished reading the heap tuples corresponding
to array key 1000 (arrayKeyCount = 0), some other worker might have advanced the scan to
array key 2000 (btps_arrayKeyCount = 1). In this case, when the current worker fetches the next
page to scan, it must advance its scan keys before scanning the next page of the parallel scan.
I hope this helps.

Thanks for the details.
One worker increments both arrayKeyCount and btps_arrayKeyCount. As
btps_arrayKeyCount is present in shared memory, the other workers see the update
and hit the above condition.
 
>+BlockNumber
>+_bt_parallel_seize(IndexScanDesc scan, bool *status)

>The return value of the above function is validated only in _bt_first
>function, but in other cases it is not.
In other cases it is validated in _bt_readnextpage() which is called after
_bt_parallel_seize().

>From the function description,
>it is possible to return P_NONE for the workers also with status as
>true. I feel it is better to handle the P_NONE case internally only
>so that callers just check for the status. Am I missing anything?

In case the next block is P_NONE and status is true, the code
calls _bt_parallel_done() to notify the other workers, followed by
BTScanPosInvalidate(). A similar check for block = P_NONE also
happens in the existing code. See the following in _bt_readnextpage():


            if (blkno == P_NONE || !so->currPos.moreRight)
            {
                _bt_parallel_done(scan);
                BTScanPosInvalidate(so->currPos);
                return false;
            }

So, to keep it consistent with the existing code, the check
is kept outside _bt_parallel_seize().

Thanks. Got it.

Regards,
Hari Babu
Fujitsu Australia

Re: [HACKERS] Parallel Index Scans

From
Amit Kapila
Date:
On Tue, Jan 17, 2017 at 11:27 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Jan 16, 2017 at 7:11 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> Fixed.
>
> Thanks for the update.  Some more comments:
>
> It shouldn't be necessary for MultiExecBitmapIndexScan to modify the
> IndexScanDesc.  That seems really broken.  If a parallel scan isn't
> supported here (and I'm sure that's the case right now) then no such
> IndexScanDesc should be getting created.
>

Fixed.

> WAIT_EVENT_PARALLEL_INDEX_SCAN is in fact btree-specific.  There's no
> guarantee that any other AMs that implement parallel index scans will
> use that wait event, and they might have others instead.  I would make
> it a lot more specific, like WAIT_EVENT_BTREE_PAGENUMBER.  (Waiting
> for the page number needed to continue a parallel btree scan to become
> available.)
>

Changed as per discussion.

> Why do we need separate AM support for index_parallelrescan()?  Why
> can't index_rescan() cover that case?  If the AM-specific method is
> getting the IndexScanDesc, it can see for itself whether it is a
> parallel scan.
>

Left as it is based on yesterday's discussion.

> I'd rename PS_State to BTPS_State, to match the other renamings.
>
> If we're going to update all of the AMs to set the new support
> functions to NULL, we should also update contrib/bloom.
>
> index_parallelscan_estimate has multiple lines that go over 80
> characters for no really good reason.  Separate the initialization of
> index_scan from the variable declaration.  Do the same for
> amindex_size.  Also, you don't really need to encase the end of the
> function in an "else" block when the "if" block is guaranteed to
> return.
>
> Several function header comments still use the style where the first
> word of the description is "It".  Say "this function" or similar the
> first time, instead of "it". Then when you say "it" later, it's clear
> that it refers back to where you said "this function".
>
> index_parallelscan_initialize also has a line more than 80 characters
> that looks easy to fix by splitting the declaration from the
> initialization.
>

Fixed all the above.

> I think it's a bad idea to add a ParallelIndexScanDesc argument to
> index_beginscan().  That function is used in lots of places, and
> somebody might think that they are allowed to actually pass a non-NULL
> value there, which they aren't: they must go through
> index_beginscan_parallel.  I think that the additional argument should
> only be added to index_beginscan_internal, and
> index_beginscan_parallel should remain unchanged.  Either that, or get
> rid of index_beginscan_parallel altogether and have everyone use
> index_beginscan directly, and put the snapshot-restore logic in that
> function.
>

Changed as per yesterday's discussion.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Parallel Index Scans

From
Amit Kapila
Date:
On Wed, Jan 18, 2017 at 6:25 AM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
>
>
> On Mon, Jan 16, 2017 at 11:11 PM, Amit Kapila <amit.kapila16@gmail.com>
> wrote:
>>
>>
>> Changed as per suggestion.
>>
>>
>> I have also rebased the optimizer/executor support patch
>> (parallel_index_opt_exec_support_v4.patch) and added a test case in
>> it.
>
>
> Thanks for the patch. Here are comments found during review.
>
> parallel_index_scan_v4.patch:
>
>
> + amtarget = (char *) ((void *) target) + offset;
>
> The above calculation can be moved after the NULL check?
>
> + * index_beginscan_parallel - join parallel index scan
>
> The name and the description don't match; any better description?
>
> + BTPARALLEL_DONE,
> + BTPARALLEL_IDLE
> +} PS_State;
>
> The order of above two enum values can be changed according to their use.
>

Changed code as per your suggestion.


>
> parallel_index_opt_exec_support_v4.patch:
>
> +#include "access/parallel.h"
>
> Why is this include required in nbtree.c? I didn't find
> any code changes in the patch that need it.
>

Removed.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Parallel Index Scans

From
Amit Kapila
Date:
On Thu, Jan 19, 2017 at 12:28 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Jan 16, 2017 at 7:11 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> Fixed.
>
> With respect to the second patch
> (parallel_index_opt_exec_support_v4.patch), I'm hoping it can use the
> new function from Dilip's bitmap heap scan patch set.  See commit
> 716c7d4b242f0a64ad8ac4dc48c6fed6557ba12c.
>

Updated patch has used the function from above commit.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Parallel Index Scans

From
Robert Haas
Date:
On Thu, Jan 19, 2017 at 4:26 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> Fixed.

I think that the whole idea of GatherSupportsBackwardScan is wrong.
Gather doesn't support a backwards scan under any circumstances,
whether the underlying node does or not.  You can read the tuples
once, in order, and you can't back up.  That's what supporting a
backward scan means: you can back up and then move forward again.
It's there to support cursor operations.

Also note this comment in ExecSupportsBackwardScan, which seems just
as relevant to parallel index scans as anything else:
    /*
     * Parallel-aware nodes return a subset of the tuples in each worker, and
     * in general we can't expect to have enough bookkeeping state to know
     * which ones we returned in this worker as opposed to some other worker.
     */
    if (node->parallel_aware)
        return false;

If all of that were no issue, the considerations in
TargetListSupportsBackwardScan could be a problem, too.  But I think
there shouldn't be any issue having Gather just continue to return
false.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Parallel Index Scans

From
Robert Haas
Date:
On Wed, Jan 18, 2017 at 8:03 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> Why do we need separate AM support for index_parallelrescan()?  Why
>> can't index_rescan() cover that case?
>
> The reason is that sometime index_rescan is called when we have to
> just update runtime scankeys in index and we don't want to reset
> parallel scan for that.

Why not?  I see the code, but the comments don't seem to offer any
justification for that.  And it seems wrong to me.  If the scan keys
change, surely that only makes sense if we restart the scan.  You
can't just blindly continue the same scan if the keys have changed,
can you?

I think the reason that this isn't breaking for you is that it's
difficult or impossible to get a parallel index scan someplace where
the keys would change at runtime.  Normally, the parallel scan is on
the driving relation, and so there are no runtime keys.  We currently
have no way for a parallel scan to occur on the inner side of a nested
loop unless there's an intervening Gather node - and in that case the
keys can't change without relaunching the workers.  It's hard to see
how it could work otherwise.  For example, suppose you had something
like this:

Gather
-> Nested Loop
  -> Parallel Seq Scan on a
  -> Hash Join
    -> Seq Scan on b
    -> Parallel Shared Hash
      -> Parallel Index Scan on c
          Index Cond: c.x = a.x

Well, the problem here is that there's nothing to ensure that various
workers who are cooperating to build the hash table all have the same
value for a.x, nor is there any thing to ensure that they'll all get
done with the shared hash table at the same time.  So this is just
chaos.  I think we have to disallow this kind of plan as nonsensical.
Given that, I'm not sure a reset of the runtime keys can ever really
happen.  Have you investigated this?

I extracted the generic portions of this infrastructure (i.e. not the
btree-specific stuff) and spent some time working on it today.  The
big thing I fixed was the documentation, which you added in a fairly
illogical part of the file.  You had all of the new functions for
supporting parallel scans in a section that explicitly says it's for
mandatory functions, and which precedes the section on regular
non-parallel scans.  I moved the documentation for all three new
methods to the same place, added some explanation of parallel scans in
general, and rewrote the descriptions for the individual functions to
be more clear.  Also, in indexam.c, I adjusted things to use
RELATION_CHECKS in a couple of places, did some work on comments and
coding style, and fixed a place that should have used the new
OffsetToPointer macro but instead hand-rolled the thing with the casts
backwards.  Adding an integer to a value of type "void *" does not
work on all compilers.  The patch I ended up with is attached.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Attachment

Re: [HACKERS] Parallel Index Scans

From
Haribabu Kommi
Date:


On Thu, Jan 19, 2017 at 1:18 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Jan 18, 2017 at 6:25 AM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
>
>
> On Mon, Jan 16, 2017 at 11:11 PM, Amit Kapila <amit.kapila16@gmail.com>
> wrote:
>>
>
> + * index_beginscan_parallel - join parallel index scan
>
> The name and the description doesn't sync properly, any better description?
>

This can be called by both the worker and the leader of a parallel index
scan.  What problem do you see with it?  heap_beginscan_parallel has a
similar description, so I am not sure changing it here alone makes sense.

Ok.
 

>
> +extern BlockNumber _bt_parallel_seize(IndexScanDesc scan, bool *status);
> +extern void _bt_parallel_release(IndexScanDesc scan, BlockNumber
> scan_page);
>
> Any better names for the above functions? These functions will provide/set
> the next page that needs to be read.
>

These functions also set the state of scan.  IIRC, these names were
being agreed between Robert and Rahila as well (suggested offlist by
Robert).  I am open to change if you or others have any better
suggestions.

I didn't find any better names other than the following,

_bt_get_next_parallel_page
_bt_set_next_parallel_page
 
>
> + /* reset (parallel) index scan */
> + if (node->iss_ScanDesc)
> + {
>
> Why this if check required? There is an assert check in later function
> calls.
>

This is required because we don't initialize the scan descriptor for
parallel-aware nodes during ExecInitIndexScan.  It gets initialized
later, at execution time, when we initialize the DSM.  Now, it is
quite possible that a Gather node can occur on the inner side of a join, in
which case Rescan can be called before execution even starts.  This is
the reason why we have a similar check in ExecReScanSeqScan, which was
added for parallel sequential scans (f0661c4e).  Does that answer
your question?

Thanks for the details. Got it.


Regards,
Hari Babu
Fujitsu Australia

Re: [HACKERS] Parallel Index Scans

From
Amit Kapila
Date:
On Fri, Jan 20, 2017 at 12:59 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Jan 19, 2017 at 4:26 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> Fixed.
>
>
> If all of that were no issue, the considerations in
> TargetListSupportsBackwardScan could be a problem, too.  But I think
> there shouldn't be any issue having Gather just continue to return
> false.
>

You are right.  I had added that code under the assumption that if
the underlying node (in this case the index scan) can support a backward
scan, Gather can also support it.  I forgot/missed that
ExecSupportsBackwardScan is meant to support cursor operations.  Will fix it
in the next version of the patch.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Parallel Index Scans

From
Amit Kapila
Date:
On Fri, Jan 20, 2017 at 3:41 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Jan 18, 2017 at 8:03 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>> Why do we need separate AM support for index_parallelrescan()?  Why
>>> can't index_rescan() cover that case?
>>
>> The reason is that sometime index_rescan is called when we have to
>> just update runtime scankeys in index and we don't want to reset
>> parallel scan for that.
>
> Why not?  I see the code, but the comments don't seem to offer any
> justification for that.  And it seems wrong to me.  If the scan keys
> change, surely that only makes sense if we restart the scan.  You
> can't just blindly continue the same scan if the keys have changed,
> can you?
>

Sure, if the scan keys have changed then we can't continue, but this is
the case where the runtime keys are initialized for the first time.

if (node->iss_NumRuntimeKeys != 0 && !node->iss_RuntimeKeysReady)

In the above check, the second part of the check
(!node->iss_RuntimeKeysReady) ensures that it is for the first time.
Now, let me give you an example to explain what can go wrong if we
allow resetting the parallel scan in this case.  Consider a query like
select * from t1 where c1 < parallel_index(10);.  In this, if we allow
resetting the parallel scan descriptor during the first-time initialization of
runtime keys, it can easily corrupt the parallel scan state.  Suppose
the leader has taken the lead and is scanning some page when a worker reaches
ExecReScanIndexScan() to initialize its keys; if the worker resets the
parallel scan, it will corrupt the shared parallel scan state.

If you agree with the above explanation, then I will expand the
comments in next update.


Just in case you want to debug the above query, below is the schema
and necessary steps.

create or replace function parallel_index(a integer) returns integer
as $$
begin
    return a + 1;
end;
$$ language plpgsql STABLE PARALLEL SAFE;


create table t1(c1 int, c2 char(20));
insert into t1 values(generate_series(1,300000),'aaa');
create index idx_t1 on t1 (c1);

set max_parallel_workers_per_gather=1;
set parallel_setup_cost=0;
set parallel_tuple_cost=0;
set min_parallel_relation_size=0;
set enable_bitmapscan=off;
set enable_seqscan=off;

select * from t1 where c1 < parallel_index(1000);

> I think the reason that this isn't breaking for you is that it's
> difficult or impossible to get a parallel index scan someplace where
> the keys would change at runtime.  Normally, the parallel scan is on
> the driving relation, and so there are no runtime keys.  We currently
> have no way for a parallel scan to occur on the inner side of a nested
> loop unless there's an intervening Gather node - and in that case the
> keys can't change without relaunching the workers.  It's hard to see
> how it could work otherwise.  For example, suppose you had something
> like this:
>
> Gather
> -> Nested Loop
>   -> Parallel Seq Scan on a
>   -> Hash Join
>     -> Seq Scan on b
>     -> Parallel Shared Hash
>       -> Parallel Index Scan on c
>           Index Cond: c.x = a.x
>
> Well, the problem here is that there's nothing to ensure that various
> workers who are cooperating to build the hash table all have the same
> value for a.x, nor is there any thing to ensure that they'll all get
> done with the shared hash table at the same time.  So this is just
> chaos.  I think we have to disallow this kind of plan as nonsensical.
> Given that, I'm not sure a reset of the runtime keys can ever really
> happen.  Have you investigated this?
>

Having parallelism on the right side under Gather will only be possible
after Robert's Parallel Hash patch, so maybe some investigation is
needed when we review that patch.  Let me know if you want me to
investigate anything more after my explanation above.

> I extracted the generic portions of this infrastructure (i.e. not the
> btree-specific stuff) and spent some time working on it today.  The
> big thing I fixed was the documentation, which you added in a fairly
> illogical part of the file.
>

Hmm, it is not illogical.  All the functions are described in the same
order as they are declared in the IndexAmRoutine structure, and I have
followed the same order.  I think both amestimateparallelscan and
aminitparallelscan should be added one paragraph down, which says (The
purpose of an index .. The scan-related functions that an index access
method must or may provide are:).

>  You had all of the new functions for
> supporting parallel scans in a section that explicitly says it's for
> mandatory functions, and which precedes the section on regular
> non-parallel scans.
>

I think that section should say "must or may provide" instead of "must
provide", as the functions amcanreturn and amproperty are optional and
are described in that section.

>  I moved the documentation for all three new
> methods to the same place, added some explanation of parallel scans in
> general, and rewrote the descriptions for the individual functions to
> be more clear.
>

I think the place where you have added these new functions breaks the
existing convention, which is to describe them in the order they are
declared in IndexAmRoutine.  Apart from that, the extracted patch looks
good to me.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Parallel Index Scans

From
Robert Haas
Date:
On Fri, Jan 20, 2017 at 9:29 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> Sure, if scan keys have changed then we can't continue, but this is
> the case where runtime keys are first time initialized.
>
> if (node->iss_NumRuntimeKeys != 0 && !node->iss_RuntimeKeysReady)
>
> In the above check, the second part of the check
> (!node->iss_RuntimeKeysReady) ensures that it is for the first time.
> Now, let me give you an example to explain what bad can happen if we
> allow resetting parallel scan in this case.  Consider a query like
> select * from t1 where c1 < parallel_index(10);, in this if we allow
> resetting parallel scan descriptor during first time initialization of
> runtime keys, it can easily corrupt the parallel scan state.  Suppose
> leader has taken the lead and is scanning some page and worker reaches
> to initialize its keys in ExecReScanIndexScan(), if worker resets the
> parallel scan, then it will corrupt the state of the parallel scan
> state.

Hmm, I see.  So the problem if I understand it correctly is that every
participating process needs to update the backend-private state for
the runtime keys and only one of those processes can update the shared
state.  But in the case of a "real" rescan, even the shared state
needs to be reset.  OK, makes sense.

Why does btparallelrescan cater to the case where scan->parallel_scan
== NULL?  I would assume it should never get called in that case.
Also, I think ExecReScanIndexScan needs significantly better comments.
After some thought I see what it's doing - mostly anyway - but I was
quite confused at first.  I still don't completely understand why it
needs this if-test:

+       /* reset (parallel) index scan */
+       if (node->iss_ScanDesc)
+       {

>> I extracted the generic portions of this infrastructure (i.e. not the
>> btree-specific stuff) and spent some time working on it today.  The
>> big thing I fixed was the documentation, which you added in a fairly
>> illogical part of the file.
>
> Hmm, it is not illogical. All the functions are described in the same
> order as they are declared in IndexAmRoutine structure and I have
> followed the same.

I see.  Sorry, I didn't realize that was what you were going for.

> I think both amestimateparallelscan and
> aminitparallelscan should be added one para down which says (The
> purpose of an index .. The scan-related functions that an index access
> method must or may provide are:).

I think it's a good idea to put all three of those functions together
in the listing, similar to what we did in
69d34408e5e7adcef8ef2f4e9c4f2919637e9a06 for FDWs.  After all they are
closely related in purpose, and it may be easiest to understand if
they are next to each other in the listing.  I suggest that we move
them to the end in IndexAmRoutine similar to the way FdwRoutine was
done; in other words, my idea is to make the structure consistent with
the way that I revised the documentation instead of making the
documentation consistent with the order you picked for the structure
members.  What I like about that is that it gives a good opportunity
to include some general remarks on parallel index scans in a central
place, as I did in that patch.  Also, it makes it easier for people
who care about parallel index scans to find all of the related things
(since they are together) and for people who don't care about them to
ignore it all (for the same reason).  What do you think about that
approach?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Parallel Index Scans

From
Amit Kapila
Date:
On Sat, Jan 21, 2017 at 1:15 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jan 20, 2017 at 9:29 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> Sure, if scan keys have changed then we can't continue, but this is
>> the case where runtime keys are first time initialized.
>>
>> if (node->iss_NumRuntimeKeys != 0 && !node->iss_RuntimeKeysReady)
>>
>> In the above check, the second part of the check
>> (!node->iss_RuntimeKeysReady) ensures that it is for the first time.
>> Now, let me give you an example to explain what bad can happen if we
>> allow resetting parallel scan in this case.  Consider a query like
>> select * from t1 where c1 < parallel_index(10);, in this if we allow
>> resetting parallel scan descriptor during first time initialization of
>> runtime keys, it can easily corrupt the parallel scan state.  Suppose
>> leader has taken the lead and is scanning some page and worker reaches
>> to initialize its keys in ExecReScanIndexScan(), if worker resets the
>> parallel scan, then it will corrupt the state of the parallel scan
>> state.
>
> Hmm, I see.  So the problem if I understand it correctly is that every
> participating process needs to update the backend-private state for
> the runtime keys and only one of those processes can update the shared
> state.  But in the case of a "real" rescan, even the shared state
> needs to be reset.  OK, makes sense.
>

Exactly.

> Why does btparallelrescan cater to the case where scan->parallel_scan
> == NULL?  I would assume it should never get called in that case.
>

Okay, will modify the patch accordingly.

> Also, I think ExecReScanIndexScan needs significantly better comments.
> After some thought I see what's it's doing - mostly anyway - but I was
> quite confused at first.  I still don't completely understand why it
> needs this if-test:
>
> +       /* reset (parallel) index scan */
> +       if (node->iss_ScanDesc)
> +       {
>

I have mentioned the reason towards the end of the e-mail [1] (Refer
line, This is required because ..).  Basically, this is required to
make plans like below work sanely.

Nested Loop
  -> Seq Scan on a
  -> Gather
    -> Parallel Index Scan on b
          Index Cond: b.x = 15

I understand that such plans don't make much sense, but we do support
them, and I have seen a somewhat similar plan getting selected in the
TPC-H benchmark.  Let me know if this needs more explanation.

>
> I think it's a good idea to put all three of those functions together
> in the listing, similar to what we did in
> 69d34408e5e7adcef8ef2f4e9c4f2919637e9a06 for FDWs.  After all they are
> closely related in purpose, and it may be easiest to understand if
> they are next to each other in the listing.  I suggest that we move
> them to the end in IndexAmRoutine similar to the way FdwRoutine was
> done; in other words, my idea is to make the structure consistent with
> the way that I revised the documentation instead of making the
> documentation consistent with the order you picked for the structure
> members.  What I like about that is that it gives a good opportunity
> to include some general remarks on parallel index scans in a central
> place, as I did in that patch.  Also, it makes it easier for people
> who care about parallel index scans to find all of the related things
> (since they are together) and for people who don't care about them to
> ignore it all (for the same reason).  What do you think about that
> approach?
>

Sounds sensible.  Updated patch based on that approach is attached.  I
will rebase the remaining work based on this patch and send them
separately.

[1] -
https://www.postgresql.org/message-id/CAA4eK1%2BnBiCxtxcNuzpaiN%2BnrRrRB5YDgoaqb3hyn%3DYUxL-%2BOw%40mail.gmail.com
-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Parallel Index Scans

From
Amit Kapila
Date:
On Sat, Jan 21, 2017 at 12:23 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Sat, Jan 21, 2017 at 1:15 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> I think it's a good idea to put all three of those functions together
>> in the listing, similar to what we did in
>> 69d34408e5e7adcef8ef2f4e9c4f2919637e9a06 for FDWs.  After all they are
>> closely related in purpose, and it may be easiest to understand if
>> they are next to each other in the listing.  I suggest that we move
>> them to the end in IndexAmRoutine similar to the way FdwRoutine was
>> done; in other words, my idea is to make the structure consistent with
>> the way that I revised the documentation instead of making the
>> documentation consistent with the order you picked for the structure
>> members.  What I like about that is that it gives a good opportunity
>> to include some general remarks on parallel index scans in a central
>> place, as I did in that patch.  Also, it makes it easier for people
>> who care about parallel index scans to find all of the related things
>> (since they are together) and for people who don't care about them to
>> ignore it all (for the same reason).  What do you think about that
>> approach?
>>
>
> Sounds sensible.  Updated patch based on that approach is attached.
>

In spite of being careful, I missed reorganizing the functions in
genam.h which I have done in attached patch.

>  I
> will rebase the remaining work based on this patch and send them
> separately.
>

Rebased patches are attached.  I have addressed a few review comments
in these patches.
parallel_index_scan_v6 - Changed the function btparallelrescan so that
it always expects a valid parallel scan descriptor.
parallel_index_opt_exec_support_v6 - Removed the usage of
GatherSupportsBackwardScan.  Expanded the comments in
ExecReScanIndexScan.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Attachment

Re: [HACKERS] Parallel Index Scans

From
Amit Kapila
Date:
On Fri, Jan 20, 2017 at 7:29 AM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
>
> On Thu, Jan 19, 2017 at 1:18 AM, Amit Kapila <amit.kapila16@gmail.com>
> wrote:
>> >
>> > +extern BlockNumber _bt_parallel_seize(IndexScanDesc scan, bool
>> > *status);
>> > +extern void _bt_parallel_release(IndexScanDesc scan, BlockNumber
>> > scan_page);
>> >
>> > Any better names for the above functions, as these function will
>> > provide/set
>> > the next page that needs to be read.
>> >
>>
>> These functions also set the state of scan.  IIRC, these names were
>> being agreed between Robert and Rahila as well (suggested offlist by
>> Robert).  I am open to change if you or others have any better
>> suggestions.
>
>
> I didn't find any better names other than the following,
>
> _bt_get_next_parallel_page
> _bt_set_next_parallel_page
>

I am not sure using *_next_* here will convey the message because for
backward scans we set the last page.  I am open to changing the names
of the functions if the committer and/or others prefer the names
suggested by you.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Parallel Index Scans

From
Robert Haas
Date:
On Mon, Jan 23, 2017 at 1:03 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> In spite of being careful, I missed reorganizing the functions in
> genam.h which I have done in attached patch.

Cool.  Committed parallel-generic-index-scan.2.patch.  Thanks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Parallel Index Scans

From
Haribabu Kommi
Date:


On Mon, Jan 23, 2017 at 5:07 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Jan 20, 2017 at 7:29 AM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
>
> On Thu, Jan 19, 2017 at 1:18 AM, Amit Kapila <amit.kapila16@gmail.com>
> wrote:
>> >
>> > +extern BlockNumber _bt_parallel_seize(IndexScanDesc scan, bool
>> > *status);
>> > +extern void _bt_parallel_release(IndexScanDesc scan, BlockNumber
>> > scan_page);
>> >
>> > Any better names for the above functions, as these function will
>> > provide/set
>> > the next page that needs to be read.
>> >
>>
>> These functions also set the state of scan.  IIRC, these names were
>> being agreed between Robert and Rahila as well (suggested offlist by
>> Robert).  I am open to change if you or others have any better
>> suggestions.
>
>
> I didn't find any better names other than the following,
>
> _bt_get_next_parallel_page
> _bt_set_next_parallel_page
>

I am not sure using *_next_* here will convey the message because for
backward scans we set the last page.  I am open to changing the names
of functions if committer and or others prefer the names suggested by
you.

OK. I am fine with it.
I don't have any other comments on the patch.

Regards,
Hari Babu
Fujitsu Australia

Re: [HACKERS] Parallel Index Scans

From
Amit Kapila
Date:
On Fri, Jan 27, 2017 at 6:53 AM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
>
>
> On Mon, Jan 23, 2017 at 5:07 PM, Amit Kapila <amit.kapila16@gmail.com>
> wrote:
>>
>> On Fri, Jan 20, 2017 at 7:29 AM, Haribabu Kommi
>> <kommi.haribabu@gmail.com> wrote:
>> > I didn't find any better names other than the following,
>> >
>> > _bt_get_next_parallel_page
>> > _bt_set_next_parallel_page
>> >
>>
>> I am not sure using *_next_* here will convey the message because for
>> backward scans we set the last page.  I am open to changing the names
>> of functions if committer and or others prefer the names suggested by
>> you.
>
>
> OK. I am fine with it.
> I don't have any other comments on the patch.
>

Thanks for the review.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Parallel Index Scans

From
Robert Haas
Date:
On Mon, Jan 23, 2017 at 1:03 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> parallel_index_opt_exec_support_v6 - Removed the usage of
> GatherSupportsBackwardScan.  Expanded the comments in
> ExecReScanIndexScan.

I looked through this and in general it looks reasonable to me.
However, I did notice one thing that I think is wrong.  In the
parallel bitmap heap scan patch, the second argument to
compute_parallel_worker() is the number of pages that the parallel
scan is expected to fetch from the heap.  In this patch, it's the
total number of pages in the index.  The former seems to me to be
better, because the point of having a threshold relation size for
parallelism is that we don't want to use a lot of workers to scan a
small number of pages -- the distribution of work won't be even, and
the potential savings are limited.  If we've got a big index but are
using a very selective qual to pull out only one or a small number of
rows on a single page or a small handful of pages, we shouldn't
generate a parallel path for that.

Now, against that theory, the GUC that controls the behavior of
compute_parallel_worker() is called min_parallel_relation_size, which
might make you think that the decision is supposed to be based on the
whole size of some relation.  But I think that just means we need to
rename the GUC to something like min_parallel_scan_size.  Possibly we
also ought to consider reducing the default value somewhat, because it
seems like both sequential and index scans can benefit even when
scanning less than 8MB.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Parallel Index Scans

From
Amit Kapila
Date:
On Sat, Jan 28, 2017 at 1:34 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Jan 23, 2017 at 1:03 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> parallel_index_opt_exec_support_v6 - Removed the usage of
>> GatherSupportsBackwardScan.  Expanded the comments in
>> ExecReScanIndexScan.
>
> I looked through this and in general it looks reasonable to me.
> However, I did notice one thing that I think is wrong.  In the
> parallel bitmap heap scan patch, the second argument to
> compute_parallel_worker() is the number of pages that the parallel
> scan is expected to fetch from the heap.  In this patch, it's the
> total number of pages in the index.  The former seems to me to be
> better, because the point of having a threshold relation size for
> parallelism is that we don't want to use a lot of workers to scan a
> small number of pages -- the distribution of work won't be even, and
> the potential savings are limited.  If we've got a big index but are
> using a very selective qual to pull out only one or a small number of
> rows on a single page or a small handful of pages, we shouldn't
> generate a parallel path for that.
>

Agreed that it makes sense to consider only the number of pages to
scan for the computation of parallel workers.  I think for an index
scan we should consider both the index and heap pages that need to be
scanned (the costing of an index scan considers both index and heap
pages).  I think considering heap pages matters more when the finally
selected rows are scattered across heap pages or when we need to
apply a filter on rows after fetching them from the heap.  OTOH, we
could consider just the pages in the index, as that is mainly where
the parallelism works.  In the attached patch
(parallel_index_opt_exec_support_v7.patch), I have considered both
index and heap pages; let me know if you think some other way is
better.  I have also prepared a separate independent patch
(compute_index_pages_v1) on HEAD to compute index pages, which can be
used by parallel index scan.  There is no change in the
parallel_index_scan (parallel btree scan) patch, so I am not
attaching its new version.
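For reference, the worker computation itself scales logarithmically
with whatever page count it is given, so the question above is only
about which page count to pass in.  A sketch of the tripling
heuristic (the constants and exact rule here are illustrative, not a
copy of compute_parallel_worker()):

```python
def compute_parallel_worker(pages, min_parallel_size=1024, max_workers=8):
    """Pick a worker count from the number of pages the scan will touch:
    one worker once the threshold is reached, one more each time the
    page count triples.  Constants are illustrative (1024 pages = 8MB
    with the default block size)."""
    if pages < min_parallel_size:
        return 0                      # too small to parallelize
    workers, threshold = 1, min_parallel_size
    while pages >= threshold * 3 and workers < max_workers:
        workers += 1
        threshold *= 3
    return workers

# Driving the computation with index pages plus the heap pages expected
# to be fetched, rather than the whole index size:
index_pages, heap_pages_to_fetch = 300, 3000
assert compute_parallel_worker(index_pages + heap_pages_to_fetch) == 2
# A very selective scan over a big index stays serial:
assert compute_parallel_worker(50) == 0
```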

> Now, against that theory, the GUC that controls the behavior of
> compute_parallel_worker() is called min_parallel_relation_size, which
> might make you think that the decision is supposed to be based on the
> whole size of some relation.  But I think that just means we need to
> rename the GUC to something like min_parallel_scan_size.  Possibly we
> also ought consider reducing the default value somewhat, because it
> seems like both sequential and index scans can benefit even when
> scanning less than 8MB.
>

Agreed, but let's consider it separately.


The order in which the patches need to be applied:
compute_index_pages_v1.patch, parallel_index_scan_v6.patch[1],
parallel_index_opt_exec_support_v7.patch


[1] - https://www.postgresql.org/message-id/CAA4eK1J%3DLSBpDx7i_izGJxGVUryqPe-2SKT02De-PrQvywiMxw%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Attachment

Re: [HACKERS] Parallel Index Scans

From
Rahila Syed
Date:
Hello,

>Agreed, that it makes sense to consider only the number of pages to
>scan for computation of parallel workers.  I think for index scan we
>should consider both index and heap pages that need to be scanned
>(costing of index scan considers both index and heap pages).  I think
>where considering heap pages matters more is when the finally selected
>rows are scattered across heap pages or we need to apply a filter on
>rows after fetching from the heap.  OTOH, we can consider just pages
>in the index as that is where mainly the parallelism works
IMO, considering just index pages will give a better estimate of the
work to be done in parallel, since the amount of work (the number of
pages) divided amongst the workers is independent of the number of
heap pages scanned.

Thank you,
Rahila Syed




On Mon, Jan 30, 2017 at 2:52 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Jan 28, 2017 at 1:34 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Jan 23, 2017 at 1:03 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> parallel_index_opt_exec_support_v6 - Removed the usage of
>> GatherSupportsBackwardScan.  Expanded the comments in
>> ExecReScanIndexScan.
>
> I looked through this and in general it looks reasonable to me.
> However, I did notice one thing that I think is wrong.  In the
> parallel bitmap heap scan patch, the second argument to
> compute_parallel_worker() is the number of pages that the parallel
> scan is expected to fetch from the heap.  In this patch, it's the
> total number of pages in the index.  The former seems to me to be
> better, because the point of having a threshold relation size for
> parallelism is that we don't want to use a lot of workers to scan a
> small number of pages -- the distribution of work won't be even, and
> the potential savings are limited.  If we've got a big index but are
> using a very selective qual to pull out only one or a small number of
> rows on a single page or a small handful of pages, we shouldn't
> generate a parallel path for that.
>

Agreed, that it makes sense to consider only the number of pages to
scan for computation of parallel workers.  I think for index scan we
should consider both index and heap pages that need to be scanned
(the costing of an index scan considers both index and heap pages).  I think
where considering heap pages matters more is when the finally selected
rows are scattered across heap pages or we need to apply a filter on
rows after fetching from the heap.  OTOH, we can consider just pages
in the index as that is where mainly the parallelism works.  In the
attached patch (parallel_index_opt_exec_support_v7.patch), I have
considered both index and heap pages, let me know if you think some
other way is better.  I have also prepared a separate independent
patch (compute_index_pages_v1) on HEAD to compute index pages which
can be used by parallel index scan. There is no change in
parallel_index_scan (parallel btree scan) patch, so I am not attaching
its new version.

> Now, against that theory, the GUC that controls the behavior of
> compute_parallel_worker() is called min_parallel_relation_size, which
> might make you think that the decision is supposed to be based on the
> whole size of some relation.  But I think that just means we need to
> rename the GUC to something like min_parallel_scan_size.  Possibly we
> also ought consider reducing the default value somewhat, because it
> seems like both sequential and index scans can benefit even when
> scanning less than 8MB.
>

Agreed, but let's consider it separately.


The order in which the patches need to be applied:
compute_index_pages_v1.patch, parallel_index_scan_v6.patch[1],
parallel_index_opt_exec_support_v7.patch


[1] - https://www.postgresql.org/message-id/CAA4eK1J%3DLSBpDx7i_izGJxGVUryqPe-2SKT02De-PrQvywiMxw%40mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] Parallel Index Scans

From
Robert Haas
Date:
On Tue, Jan 24, 2017 at 4:51 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Jan 23, 2017 at 1:03 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> In spite of being careful, I missed reorganizing the functions in
>> genam.h which I have done in attached patch.
>
> Cool.  Committed parallel-generic-index-scan.2.patch.  Thanks.

Reviewing parallel_index_scan_v6.patch:

I think it might be better to swap the return value and the out
parameter for _bt_parallel_seize; that is, return a bool, and have
callers ignore the value of the out parameter (e.g. *pageno).

I think _bt_parallel_advance_scan should be renamed something that
includes the word "keys", like _bt_parallel_advance_array_keys.

The hunk in indexam.c looks like a leftover that can be omitted.

+/*
+ * Below flags are used to indicate the state of parallel scan.

They aren't really flags any more; they're members of an enum.  I
think you could just leave this sentence out entirely and start right
in with descriptions of the individual values.  But maybe all of those
descriptions should end in a period (instead of one ending in a period
but not the other three) since they read like complete sentences.

+ * btinitparallelscan - Initializing BTParallelScanDesc for parallel btree scan

Initializing -> initialize

+ *  btparallelrescan() -- reset parallel scan

Previous two prototypes have one dash, this one has two.  Make it
consistent, please.

+     * Ideally, we don't need to acquire spinlock here, but being
+     * consistent with heap_rescan seems to be a good idea.

How about: In theory, we don't need to acquire the spinlock here,
because there shouldn't be any other workers running at this point,
but we do so for consistency.

+ * _bt_parallel_seize() -- returns the next block to be scanned for forward
+ *      scans and latest block scanned for backward scans.

I think the description should be more like "Begin the process of
advancing the scan to a new page.  Other scans must wait until we call
bt_parallel_release() or bt_parallel_done()."  And likewise
_bt_parallel_release() should say something like "Complete the process
of advancing the scan to a new page.  We now have the new value for
btps_scanPage; some other backend can now begin advancing the scan."
And _bt_parallel_done should say something like "Mark the parallel
scan as complete."
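A minimal single-process model of that protocol (the state names
follow the patch's enum, but the Python here is only an illustrative
sketch of the seize/release/done contract, not the patch's C code):

```python
NOT_INITIALIZED, ADVANCING, IDLE, DONE = range(4)

class BTParallelScan:
    def __init__(self, first_page=0):
        self.status = IDLE
        self.scan_page = first_page     # models btps_scanPage

    def seize(self):
        """Begin advancing the scan; only one participant at a time may
        be between seize and release.  Returns (ok, page).  A real
        implementation would sleep on a condition variable instead of
        raising while another participant holds ADVANCING."""
        if self.status == DONE:
            return False, None
        if self.status == ADVANCING:
            raise RuntimeError("would wait on condition variable here")
        self.status = ADVANCING
        return True, self.scan_page     # page this participant will read

    def release(self, next_page):
        """Complete advancing: publish the new scan_page so some other
        participant can begin advancing the scan."""
        self.scan_page = next_page
        self.status = IDLE

    def done(self):
        """Mark the parallel scan as complete."""
        self.status = DONE

scan = BTParallelScan(first_page=1)
ok, page = scan.seize()
assert ok and page == 1 and scan.status == ADVANCING
scan.release(next_page=2)               # others may now advance the scan
ok, page = scan.seize()
assert ok and page == 2
scan.done()
assert scan.seize() == (False, None)    # no further pages handed out
```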

I am a bit mystified about how this manages to work with array keys.
_bt_parallel_done() won't set btps_pageStatus to BTPARALLEL_DONE
unless so->arrayKeyCount >= btscan->btps_arrayKeyCount, but
_bt_parallel_advance_scan() won't do anything unless btps_pageStatus
is already BTPARALLEL_DONE.  It seems like one of those two things has
to be wrong.

_bt_readpage's header comment should be updated to note that, in the
case of a parallel scan, _bt_parallel_seize should have been called
prior to entering this function, and _bt_parallel_release will be
called prior to return (or this could alternatively be phrased in
terms of btps_pageStatus on entry/exit).

_bt_readnextpage isn't very clear about the meaning of its blkno
argument.  It looks like it always has to be valid when scanning
forward, but only in the parallel case when scanning backward?  That
is a little odd.  Another, possibly related thing that is odd is that
when _bt_steppage() finds no tuples and decides to advance to a new
page again, there's a very short comment in the forward case and a
very long comment in the backward case:
           /* nope, keep going */
vs.
            /*
             * For parallel scans, get the last page scanned as it is quite
             * possible that by the time we try to fetch previous page, other
             * worker has also decided to scan that previous page.  We could
             * avoid that by doing _bt_parallel_release once we have read the
             * current page, but it is bad to make other workers wait till we
             * read the page.
             */

Now it seems to me that these cases are symmetric and the issues are
identical.  It's basically that, while we were judging whether the
current page has useful contents, some other process could have
advanced the scan (since _bt_readpage un-seized it).

-            /* check for deleted page */
             page = BufferGetPage(so->currPos.buf);
             TestForOldSnapshot(scan->xs_snapshot, rel, page);
             opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+            /* check for deleted page */

This is an independent change; committed.

What kind of testing has been done to ensure that this doesn't have
concurrency bugs?  What's the highest degree of parallelism that's
been tested?  Did that test include array keys?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Parallel Index Scans

From
Amit Kapila
Date:
On Tue, Jan 31, 2017 at 5:53 PM, Rahila Syed <rahilasyed90@gmail.com> wrote:
> Hello,
>
>>Agreed, that it makes sense to consider only the number of pages to
>>scan for computation of parallel workers.  I think for index scan we
>>should consider both index and heap pages that need to be scanned
>>(costing of index scan consider both index and heap pages).   I thin
>>where considering heap pages matter more is when the finally selected
>>rows are scattered across heap pages or we need to apply a filter on
>>rows after fetching from the heap.  OTOH, we can consider just pages
>>in the index as that is where mainly the parallelism works
> IMO, considering just index pages will give a better estimate of work to be
> done
> in parallel. As the amount of work/number of pages divided amongst workers
> is irrespective of
> the number of heap pages scanned.
>

Yeah, I understand that point, and I can see there is a strong
argument for doing it that way, but let's wait and see what others,
including Robert, have to say about this point.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Parallel Index Scans

From
Rahila Syed
Date:
Hello Robert,

>I am a bit mystified about how this manages to work with array keys.
>_bt_parallel_done() won't set btps_pageStatus to BTPARALLEL_DONE
>unless so->arrayKeyCount >= btscan->btps_arrayKeyCount, but
>_bt_parallel_advance_scan() won't do anything unless btps_pageStatus
>is already BTPARALLEL_DONE.  It seems like one of those two things has
>to be wrong.

btps_pageStatus is set to BTPARALLEL_DONE only by the first worker that
is performing the scan for the latest array key and has encountered the
end of the scan.  This is ensured by the following check in
_bt_parallel_done():

   if (so->arrayKeyCount >= btscan->btps_arrayKeyCount &&
       btscan->btps_pageStatus != BTPARALLEL_DONE)

Thus, BTPARALLEL_DONE marks the end of the scan only for the latest
array key.  This ensures that when any worker reaches
_bt_advance_array_keys(), it advances the latest scan (tracked by
btscan->btps_arrayKeyCount) only when the scan for the latest key has
ended, which it detects by checking if (btps_pageStatus ==
BTPARALLEL_DONE).  Otherwise, the worker just advances its local
so->arrayKeyCount.
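The rule can be captured in a small standalone model (a sketch only,
with simplified names; the real checks live in _bt_parallel_done() and
the array-key advance function):

```python
NOT_INITIALIZED, DONE = "not_initialized", "done"

class Shared:
    def __init__(self):
        self.page_status = NOT_INITIALIZED
        self.array_key_count = 0     # models btps_arrayKeyCount

class Worker:
    def __init__(self, shared):
        self.shared = shared
        self.array_key_count = 0     # models so->arrayKeyCount

    def parallel_done(self):
        # Only a worker scanning the *latest* array key may end the scan;
        # a worker still on an older key must not mark it done.
        s = self.shared
        if self.array_key_count >= s.array_key_count and s.page_status != DONE:
            s.page_status = DONE

    def advance_array_keys(self):
        # Advance the shared scan to the next key only once the scan for
        # the latest key has actually finished; otherwise just catch up
        # the worker's local count.
        s = self.shared
        self.array_key_count += 1
        if s.page_status == DONE and self.array_key_count > s.array_key_count:
            s.array_key_count = self.array_key_count
            s.page_status = NOT_INITIALIZED   # restart scan for the new key

shared = Shared()
fast, slow = Worker(shared), Worker(shared)
fast.parallel_done()                  # fast worker finishes key 0
fast.advance_array_keys()             # ...and moves the scan to key 1
assert shared.array_key_count == 1
slow.parallel_done()                  # stale: slow is still on key 0
assert shared.page_status != DONE     # so it cannot end the key-1 scan
```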

I hope this provides some clarification.

Thank you,
Rahila Syed

 



On Wed, Feb 1, 2017 at 3:52 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Jan 24, 2017 at 4:51 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Jan 23, 2017 at 1:03 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> In spite of being careful, I missed reorganizing the functions in
>> genam.h which I have done in attached patch.
>
> Cool.  Committed parallel-generic-index-scan.2.patch.  Thanks.

Reviewing parallel_index_scan_v6.patch:

I think it might be better to swap the return value and the out
parameter for _bt_parallel_seize; that is, return a bool, and have
callers ignore the value of the out parameter (e.g. *pageno).

I think _bt_parallel_advance_scan should be renamed something that
includes the word "keys", like _bt_parallel_advance_array_keys.

The hunk in indexam.c looks like a leftover that can be omitted.

+/*
+ * Below flags are used to indicate the state of parallel scan.

They aren't really flags any more; they're members of an enum.  I
think you could just leave this sentence out entirely and start right
in with descriptions of the individual values.  But maybe all of those
descriptions should end in a period (instead of one ending in a period
but not the other three) since they read like complete sentences.

+ * btinitparallelscan - Initializing BTParallelScanDesc for parallel btree scan

Initializing -> initialize

+ *  btparallelrescan() -- reset parallel scan

Previous two prototypes have one dash, this one has two.  Make it
consistent, please.

+     * Ideally, we don't need to acquire spinlock here, but being
+     * consistent with heap_rescan seems to be a good idea.

How about: In theory, we don't need to acquire the spinlock here,
because there shouldn't be any other workers running at this point,
but we do so for consistency.

+ * _bt_parallel_seize() -- returns the next block to be scanned for forward
+ *      scans and latest block scanned for backward scans.

I think the description should be more like "Begin the process of
advancing the scan to a new page.  Other scans must wait until we call
bt_parallel_release() or bt_parallel_done()."  And likewise
_bt_parallel_release() should say something like "Complete the process
of advancing the scan to a new page.  We now have the new value for
btps_scanPage; some other backend can now begin advancing the scan."
And _bt_parallel_done should say something like "Mark the parallel
scan as complete."

I am a bit mystified about how this manages to work with array keys.
_bt_parallel_done() won't set btps_pageStatus to BTPARALLEL_DONE
unless so->arrayKeyCount >= btscan->btps_arrayKeyCount, but
_bt_parallel_advance_scan() won't do anything unless btps_pageStatus
is already BTPARALLEL_DONE.  It seems like one of those two things has
to be wrong.

_bt_readpage's header comment should be updated to note that, in the
case of a parallel scan, _bt_parallel_seize should have been called
prior to entering this function, and _bt_parallel_release will be
called prior to return (or this could alternatively be phrased in
terms of btps_pageStatus on entry/exit).

_bt_readnextpage isn't very clear about the meaning of its blkno
argument.  It looks like it always has to be valid when scanning
forward, but only in the parallel case when scanning backward?  That
is a little odd.  Another, possibly related thing that is odd is that
when _bt_steppage() finds no tuples and decides to advance to a new
page again, there's a very short comment in the forward case and a
very long comment in the backward case:

            /* nope, keep going */
vs.
            /*
             * For parallel scans, get the last page scanned as it is quite
             * possible that by the time we try to fetch previous page, other
             * worker has also decided to scan that previous page.  We could
             * avoid that by doing _bt_parallel_release once we have read the
             * current page, but it is bad to make other workers wait till we
             * read the page.
             */

Now it seems to me that these cases are symmetric and the issues are
identical.  It's basically that, while we were judging whether the
current page has useful contents, some other process could have
advanced the scan (since _bt_readpage un-seized it).
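For readers following along, the seize/release handoff being discussed can be modeled as a toy, single-threaded sketch.  The field and function names loosely follow the patch, but the condition-variable waiting, locking, and buffer handling are omitted; this only illustrates the protocol ordering, not the actual implementation:

```c
#include <assert.h>
#include <stdbool.h>

#define P_NONE (-1)              /* invented sentinel for "no next page" */

typedef struct
{
    int     btps_scanPage;       /* next page to hand out to a worker */
    bool    advancing;           /* some worker is advancing the scan */
    bool    done;                /* scan is complete */
} ToyParallelScan;

/* Begin advancing the scan to a new page; in the real patch, other
 * workers sleep on a condition variable while 'advancing' is set. */
static bool
toy_seize(ToyParallelScan *ps, int *blkno)
{
    if (ps->done)
        return false;
    assert(!ps->advancing);      /* never contended in this 1-thread model */
    ps->advancing = true;
    *blkno = ps->btps_scanPage;
    return true;
}

/* Complete the advance: publish the successor page so another worker can
 * seize it while we go off and process the tuples we just read. */
static void
toy_release(ToyParallelScan *ps, int next_blkno)
{
    ps->btps_scanPage = next_blkno;
    ps->advancing = false;
}

/* Mark the whole scan complete. */
static void
toy_done(ToyParallelScan *ps)
{
    ps->done = true;
}
```

The point of the ordering is the one the thread keeps coming back to: a worker publishes the next page via toy_release() immediately after reading its own page, before spending time on the tuples, so other workers are not kept waiting; the price is that, between release and the next seize, some other worker may already have advanced the scan.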

-            /* check for deleted page */
             page = BufferGetPage(so->currPos.buf);
             TestForOldSnapshot(scan->xs_snapshot, rel, page);
             opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+            /* check for deleted page */

This is an independent change; committed.

What kind of testing has been done to ensure that this doesn't have
concurrency bugs?  What's the highest degree of parallelism that's
been tested?  Did that test include array keys?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] Parallel Index Scans

From
Amit Kapila
Date:
On Wed, Feb 1, 2017 at 3:52 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Jan 24, 2017 at 4:51 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Mon, Jan 23, 2017 at 1:03 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>> In spite of being careful, I missed reorganizing the functions in
>>> genam.h which I have done in attached patch.
>>
>> Cool.  Committed parallel-generic-index-scan.2.patch.  Thanks.
>
> Reviewing parallel_index_scan_v6.patch:
>
> I think it might be better to swap the return value and the out
> parameter for _bt_parallel_seize; that is, return a bool, and have
> callers ignore the value of the out parameter (e.g. *pageno).
>

makes sense, so changed accordingly.

> I think _bt_parallel_advance_scan should be renamed something that
> includes the word "keys", like _bt_parallel_advance_array_keys.
>

Changed as per suggestion.

> The hunk in indexam.c looks like a leftover that can be omitted.
>

It is not a leftover hunk.  Earlier, the patch had the same check in
btparallelrescan, but based on your comment upthread [1] (why does
btparallelrescan cater to the case where scan->parallel_scan == NULL?),
it has been moved to indexam.c.


> +/*
> + * Below flags are used to indicate the state of parallel scan.
>
> They aren't really flags any more; they're members of an enum.  I
> think you could just leave this sentence out entirely and start right
> in with descriptions of the individual values.  But maybe all of those
> descriptions should end in a period (instead of one ending in a period
> but not the other three) since they read like complete sentences.
>
> + * btinitparallelscan - Initializing BTParallelScanDesc for parallel btree scan
>
> Initializing -> initialize
>
> + *  btparallelrescan() -- reset parallel scan
>
> Previous two prototypes have one dash, this one has two.  Make it
> consistent, please.
>
> +     * Ideally, we don't need to acquire spinlock here, but being
> +     * consistent with heap_rescan seems to be a good idea.
>
> How about: In theory, we don't need to acquire the spinlock here,
> because there shouldn't be any other workers running at this point,
> but we do so for consistency.
>
> + * _bt_parallel_seize() -- returns the next block to be scanned for forward
> + *      scans and latest block scanned for backward scans.
>
> I think the description should be more like "Begin the process of
> advancing the scan to a new page.  Other scans must wait until we call
> bt_parallel_release() or bt_parallel_done()."  And likewise
> _bt_parallel_release() should say something like "Complete the process
> of advancing the scan to a new page.  We now have the new value for
> btps_scanPage; some other backend can now begin advancing the scan."
> And _bt_parallel_done should say something like "Mark the parallel
> scan as complete."
>

Changed the code as per your above suggestions.

> I am a bit mystified about how this manages to work with array keys.
> _bt_parallel_done() won't set btps_pageStatus to BTPARALLEL_DONE
> unless so->arrayKeyCount >= btscan->btps_arrayKeyCount, but
> _bt_parallel_advance_scan() won't do anything unless btps_pageStatus
> is already BTPARALLEL_DONE.
>

This is just to ensure that btps_arrayKeyCount is advanced, and
btps_pageStatus is changed to BTPARALLEL_DONE, exactly once per array
element.  It goes something like this: if we have an array with values
[1,2,3], all the workers will complete the scan with key 1, one of them
will mark btps_pageStatus as BTPARALLEL_DONE, and the first one to hit
_bt_parallel_advance_scan will increment btps_arrayKeyCount; the same
then happens for keys 2 and 3.  It is quite possible that by the time
one of the participants advances its local key, the scan for that key
is already finished; we handle that situation in _bt_parallel_seize()
with the check below:

if (so->arrayKeyCount < btscan->btps_arrayKeyCount)
*status = false;

I think Rahila has also mentioned something along these lines; let us
know if we are missing something here.  Do you want more comments in
the code to explain this handling, and if so, where (on top of
function _bt_parallel_advance_scan)?
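As a sanity check of the description above, here is a minimal single-process model of the three operations.  Field and function names mirror the patch, but all locking, waiting, and page bookkeeping are stripped out, so treat this as an illustration of the state transitions only:

```c
#include <assert.h>
#include <stdbool.h>

typedef enum
{
    BTPARALLEL_NOT_INITIALIZED,
    BTPARALLEL_DONE
} ToyPageStatus;

typedef struct
{
    ToyPageStatus btps_pageStatus;
    int           btps_arrayKeyCount;  /* advanced once per array element */
} ToyShared;

typedef struct
{
    int arrayKeyCount;                 /* this worker's local key number */
} ToyWorker;

/* Mark the scan done, but only if the caller is still on the current key;
 * a laggard worker on an old key must not mark the new key's scan done. */
static void
toy_parallel_done(ToyWorker *so, ToyShared *btscan)
{
    if (so->arrayKeyCount >= btscan->btps_arrayKeyCount)
        btscan->btps_pageStatus = BTPARALLEL_DONE;
}

/* Advance to the next array key; only the first worker to arrive after
 * BTPARALLEL_DONE actually bumps the shared counter and resets the scan. */
static void
toy_advance_array_keys(ToyWorker *so, ToyShared *btscan)
{
    so->arrayKeyCount++;
    if (btscan->btps_pageStatus == BTPARALLEL_DONE &&
        so->arrayKeyCount > btscan->btps_arrayKeyCount)
    {
        btscan->btps_arrayKeyCount++;
        btscan->btps_pageStatus = BTPARALLEL_NOT_INITIALIZED;
    }
}

/* The check quoted above: a worker whose local key lags the shared key
 * learns that the scan for its key has already finished. */
static bool
toy_seize(ToyWorker *so, ToyShared *btscan)
{
    if (so->arrayKeyCount < btscan->btps_arrayKeyCount)
        return false;
    return btscan->btps_pageStatus != BTPARALLEL_DONE;
}
```

Running two toy workers through the first key shows btps_arrayKeyCount advancing exactly once even though both workers call toy_advance_array_keys(), which is the invariant the paragraph above describes.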


>  It seems like one of those two things has
> to be wrong.
>
> _bt_readpage's header comment should be updated to note that, in the
> case of a parallel scan, _bt_parallel_seize should have been called
> prior to entering this function, and _bt_parallel_release will be
> called prior to return (or this could alternatively be phrased in
> terms of btps_pageStatus on entry/exit).
>

Changed as per suggestion.

> _bt_readnextpage isn't very clear about the meaning of its blkno
> argument.  It looks like it always has to be valid when scanning
> forward, but only in the parallel case when scanning backward?
>

It can be invalid only for a non-parallel backward scan, and in that
case the appropriate value for so->currPos will be set.  Refer to
_bt_steppage().  I have updated the comments.

> That
> is a little odd.  Another, possibly related thing that is odd is that
> when _bt_steppage() finds no tuples and decides to advance to a new
> page again, there's a very short comment in the forward case and a
> very long comment in the backward case:
>
>             /* nope, keep going */
> vs.
>             /*
>              * For parallel scans, get the last page scanned as it is quite
>              * possible that by the time we try to fetch previous page, other
>              * worker has also decided to scan that previous page.  We could
>              * avoid that by doing _bt_parallel_release once we have read the
>              * current page, but it is bad to make other workers wait till we
>              * read the page.
>              */
>
> Now it seems to me that these cases are symmetric and the issues are
> identical.  It's basically that, while we were judging whether the
> current page has useful contents, some other process could have
> advanced the scan (since _bt_readpage un-seized it).
>

Yeah, but the reason for the difference in comments is that for
non-parallel backward scans there is no code at that place to move to
the previous page; it relies on the next call to _bt_walk_left(),
whereas parallel scans can't simply rely on _bt_walk_left().  I have
slightly modified the comments for the backward scan case; see if that
looks better, and if not, suggest what you think is better.

> -            /* check for deleted page */
>              page = BufferGetPage(so->currPos.buf);
>              TestForOldSnapshot(scan->xs_snapshot, rel, page);
>              opaque = (BTPageOpaque) PageGetSpecialPointer(page);
> +            /* check for deleted page */
>
> This is an independent change; committed.
>

Thanks.

> What kind of testing has been done to ensure that this doesn't have
> concurrency bugs?
>

We used large tables for parallel index scans (both forward and
backward).  These tests were done by Tushar, and you can find the
detailed report upthread [2].  Apart from that, the patch has been
tested with TPC-H queries at various scale factors; it is used in
multiple queries, and we have verified the results of those as well.
The TPC-H tests were done by Rafia.

Tushar has done some further extensive testing of this patch.  Tushar,
can you please share your test results?

> What's the highest degree of parallelism that's
> been tested?

7

>  Did that test include array keys?
>

Yes.

Note - The order in which the patches need to be applied:
compute_index_pages_v1.patch, parallel_index_scan_v7.patch,
parallel_index_opt_exec_support_v7.patch.  The first and third patches
are posted upthread [3].

[1] - https://www.postgresql.org/message-id/CA%2BTgmoZv0%2BcLUV7fZRo76_PB9cfu5mBCVmoXKmaqrc7F30nJzw%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/1d6353a0-63cb-65d9-a70c-0913899d5b06%40enterprisedb.com
[3] - https://www.postgresql.org/message-id/CAA4eK1KowGSYYVpd2qPpaPPA5R90r%2B%2BQwDFbrRECTE9H_HvpOg%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Parallel Index Scans

From
tushar
Date:
On 02/01/2017 06:50 PM, Amit Kapila wrote:
Used large table parallel index scans (both forward and backward
scans).  These tests have been done by Tushar and you can find
detailed report up thread [2].  Apart from that, the patch has been
tested with TPC-H queries at various scale factors and it is being
used in multiple queries and we have verified the results of same as
well.  TPC-H tests have been done by Rafia.

Tushar has done some further extensive test of this patch.  Tushar,
can you please share your test results?
Yes, we have:
0) Tested on a high-end machine with the following configuration

[edb@ip-10-0-38-61 pg_log]$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                128
On-line CPU(s) list:   0-127
Thread(s) per core:    2
Core(s) per socket:    16
Socket(s):             4
NUMA node(s):          4
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E7-8880 v3 @ 2.30GHz

[edb@ip-10-0-38-61 pg_log]$ df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        961G   60K  961G   1% /dev
tmpfs           961G  556K  961G   1% /dev/shm
/dev/xvda1      197G  156G   42G  80% /

[edb@ip-10-0-38-61 pg_log]$ free
             total       used       free     shared    buffers     cached
Mem:    2014742800  170971292 1843771508     142668     166128  162463396
-/+ buffers/cache:    8341768 2006401032
Swap:            0          0          0

1) Executed the testcases with multiple clients (e.g. ran our testcase file against 4 different psql terminals of the same server simultaneously) for concurrency.
   We made an effort to execute the same set of tests (testcase.sql file) via different terminals against the same server.
2) We checked count(*) of the query before and after disabling/enabling max_parallel_workers_per_gather to make sure the end result (o/p) is consistent.
3) We were able to get parallel workers = 14 (the highest degree of parallelism) in our case.

pgbench with scale factor = 10,000 (149 GB of data in the database; 100 million rows inserted) on an Amazon instance (128 cores, 4 NUMA nodes).

We are able to see 14 workers launched out of 14 workers planned for the query below:

postgres=# \di+ pgbench_accounts_pkey
                                    List of relations
 Schema |         Name          | Type  | Owner |      Table       | Size  | Description
--------+-----------------------+-------+-------+------------------+-------+-------------
 public | pgbench_accounts_pkey | index | edb   | pgbench_accounts | 21 GB |
(1 row)

The index size is now 21 GB.

postgres=# explain analyse verbose select * from pgbench_accounts where aid <50000000 and bid <=1 ;
                                                                                  QUERY PLAN                                                                                 
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=0.57..1745380.10 rows=4691 width=97) (actual time=0.546..2316.118 rows=100000 loops=1)
   Output: aid, bid, abalance, filler
   Workers Planned: 14
   Workers Launched: 14
   ->  Parallel Index Scan using pgbench_accounts_pkey on public.pgbench_accounts  (cost=0.57..1745380.10 rows=335 width=97) (actual time=0.081..2253.234 rows=6667 loops=15)
         Output: aid, bid, abalance, filler
         Index Cond: (pgbench_accounts.aid < 50000000)
         Filter: (pgbench_accounts.bid <= 1)
         Rows Removed by Filter: 3326667
         Worker 0: actual time=0.069..2251.456 rows=7036 loops=1
         Worker 1: actual time=0.070..2256.772 rows=6588 loops=1
         Worker 2: actual time=0.071..2257.164 rows=6954 loops=1
         Worker 3: actual time=0.079..2255.166 rows=6222 loops=1
         Worker 4: actual time=0.063..2254.814 rows=6588 loops=1
         Worker 5: actual time=0.091..2253.872 rows=6588 loops=1
         Worker 6: actual time=0.093..2254.237 rows=6222 loops=1
         Worker 7: actual time=0.068..2254.749 rows=7320 loops=1
         Worker 8: actual time=0.060..2253.953 rows=6588 loops=1
         Worker 9: actual time=0.127..2253.546 rows=8052 loops=1
         Worker 10: actual time=0.091..2252.737 rows=7686 loops=1
         Worker 11: actual time=0.087..2252.056 rows=7320 loops=1
         Worker 12: actual time=0.091..2252.600 rows=7320 loops=1
         Worker 13: actual time=0.057..2252.341 rows=7686 loops=1
 Planning time: 0.165 ms
 Execution time: 2357.132 ms
(25 rows)

Even for array keys, where the index size is only in MB, we are able to see 9 workers launched out of 9 workers planned:

postgres=# set enable_bitmapscan =0;
SET
postgres=# set enable_seqscan =0;
SET
postgres=# \di+ ary_idx
                        List of relations
 Schema |  Name   | Type  | Owner |  Table  | Size  | Description
--------+---------+-------+-------+---------+-------+-------------
 public | ary_idx | index | edb   | ary_tab | 56 MB |
(1 row)

postgres=# explain analyze verbose select count(1) from ary_tab where ARRAY[7,8,9,10]=c2 and c1 = 'four';
                                                                         QUERY PLAN                                                                        
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=47083.83..47083.84 rows=1 width=8) (actual time=141.766..141.767 rows=1 loops=1)
   Output: count(1)
   ->  Gather  (cost=47083.80..47083.81 rows=9 width=8) (actual time=141.547..141.753 rows=10 loops=1)
         Output: (PARTIAL count(1))
         Workers Planned: 9
         Workers Launched: 9
         ->  Partial Aggregate  (cost=47083.80..47083.81 rows=1 width=8) (actual time=136.679..136.679 rows=1 loops=10)
               Output: PARTIAL count(1)
               Worker 0: actual time=135.215..135.215 rows=1 loops=1
               Worker 1: actual time=136.158..136.158 rows=1 loops=1
               Worker 2: actual time=136.348..136.349 rows=1 loops=1
               Worker 3: actual time=136.564..136.565 rows=1 loops=1
               Worker 4: actual time=135.759..135.760 rows=1 loops=1
               Worker 5: actual time=136.405..136.405 rows=1 loops=1
               Worker 6: actual time=136.158..136.158 rows=1 loops=1
               Worker 7: actual time=136.319..136.319 rows=1 loops=1
               Worker 8: actual time=136.597..136.597 rows=1 loops=1
               ->  Parallel Index Scan using ary_idx on public.ary_tab  (cost=0.42..47083.79 rows=4 width=0) (actual time=122.557..136.673 rows=5 loops=10)
                     Index Cond: ('{7,8,9,10}'::integer[] = ary_tab.c2)
                     Filter: (ary_tab.c1 = 'four'::text)
                     Rows Removed by Filter: 100000
                     Worker 0: actual time=135.211..135.211 rows=0 loops=1
                     Worker 1: actual time=136.153..136.153 rows=0 loops=1
                     Worker 2: actual time=136.342..136.342 rows=0 loops=1
                     Worker 3: actual time=136.559..136.559 rows=0 loops=1
                     Worker 4: actual time=135.756..135.756 rows=0 loops=1
                     Worker 5: actual time=136.402..136.402 rows=0 loops=1
                     Worker 6: actual time=136.150..136.150 rows=0 loops=1
                     Worker 7: actual time=136.314..136.314 rows=0 loops=1
                     Worker 8: actual time=136.592..136.592 rows=0 loops=1
 Planning time: 0.813 ms
 Execution time: 145.881 ms
(32 rows)

4) The LCOV/SQL report for the same can be found @ https://www.postgresql.org/message-id/1d6353a0-63cb-65d9-a70c-0913899d5b06@enterprisedb.com
-- 
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company


Re: [HACKERS] Parallel Index Scans

From
Michael Paquier
Date:
On Wed, Feb 1, 2017 at 10:20 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> makes sense, so changed accordingly.

I have moved this patch to CF 2017-03.
-- 
Michael



Re: [HACKERS] Parallel Index Scans

From
Robert Haas
Date:
On Wed, Feb 1, 2017 at 12:58 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> Yeah, I understand that point and I can see there is strong argument
> to do that way, but let's wait and see what others including Robert
> have to say about this point.

It seems to me that you can make an argument for any point of view.
In a parallel sequential scan, the smallest unit of work that can be
given to one worker is one heap page; in a parallel index scan, it's
one index page.  By that logic, as Rahila says, we ought to do this
based on the number of index pages.  On the other hand, it's weird to
use the same GUC to measure index pages at some times and heap pages
at other times, and it could result in failing to engage parallelism
where we really should do so, or using an excessively small number of
workers.  An index scan that hits 25 index pages could hit 1000 heap
pages; if it's OK to use a parallel sequential scan for a table with
1000 heap pages, why is it not OK to use a parallel index scan to scan
1000 heap pages?  I can't think of any reason.

On balance, I'm somewhat inclined to think that we ought to base
everything on heap pages, so that we're always measuring in the same
units.  That's what Dilip's patch for parallel bitmap heap scan does,
and I think it's a reasonable choice.  However, for parallel index
scan, we might want to also cap the number of workers to, say,
index_pages/10, just so we don't pick an index scan that's going to
result in a very lopsided work distribution.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Parallel Index Scans

From
Amit Kapila
Date:
On Sat, Feb 4, 2017 at 5:54 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Feb 1, 2017 at 12:58 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> Yeah, I understand that point and I can see there is strong argument
>> to do that way, but let's wait and see what others including Robert
>> have to say about this point.
>
> It seems to me that you can make an argument for any point of view.
> In a parallel sequential scan, the smallest unit of work that can be
> given to one worker is one heap page; in a parallel index scan, it's
> one index page.  By that logic, as Rahila says, we ought to do this
> based on the number of index pages.  On the other hand, it's weird to
> use the same GUC to measure index pages at some times and heap pages
> at other times, and it could result in failing to engage parallelism
> where we really should do so, or using an excessively small number of
> workers.  An index scan that hits 25 index pages could hit 1000 heap
> pages; if it's OK to use a parallel sequential scan for a table with
> 1000 heap pages, why is it not OK to use a parallel index scan to scan
> 1000 heap pages?  I can't think of any reason.
>

I think one difference is that if we want to scan 1000 heap pages with
a parallel index scan, the cost of scanning the index is additional as
compared to a parallel sequential scan.

> On balance, I'm somewhat inclined to think that we ought to base
> everything on heap pages, so that we're always measuring in the same
> units.  That's what Dilip's patch for parallel bitmap heap scan does,
> and I think it's a reasonable choice.  However, for parallel index
> scan, we might want to also cap the number of workers to, say,
> index_pages/10, just so we don't pick an index scan that's going to
> result in a very lopsided work distribution.
>

I guess in the above context you mean heap_pages or index_pages that
are expected to be *fetched* during index scan.

Yet another thought is that for parallel index scan we use
index_pages_fetched, but use either a different GUC
(min_parallel_index_rel_size) with a relatively lower default value
(say equal to min_parallel_relation_size/4 = 2MB) or directly use
min_parallel_relation_size/4 for parallel index scans.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Parallel Index Scans

From
Amit Kapila
Date:
On Sat, Feb 4, 2017 at 7:14 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Sat, Feb 4, 2017 at 5:54 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Feb 1, 2017 at 12:58 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
>> On balance, I'm somewhat inclined to think that we ought to base
>> everything on heap pages, so that we're always measuring in the same
>> units.  That's what Dilip's patch for parallel bitmap heap scan does,
>> and I think it's a reasonable choice.  However, for parallel index
>> scan, we might want to also cap the number of workers to, say,
>> index_pages/10, just so we don't pick an index scan that's going to
>> result in a very lopsided work distribution.
>>
>
> I guess in the above context you mean heap_pages or index_pages that
> are expected to be *fetched* during index scan.
>
> Yet another thought is that for parallel index scan we use
> index_pages_fetched, but use either a different GUC
> (min_parallel_index_rel_size) with a relatively lower default value
> (say equal to min_parallel_relation_size/4 = 2MB) or directly use
> min_parallel_relation_size/4 for parallel index scans.
>

I had some offlist discussion with Robert about the above point, and we
feel that keeping only heap pages for the parallel computation might
not be future proof, as for parallel index-only scans there might not
be any heap pages.  So, it is better to use a separate GUC for parallel
index (only) scans.  We can have two GUCs,
min_parallel_table_scan_size (8MB) and min_parallel_index_scan_size
(512kB), for computing parallel scans.  Parallel sequential scan and
parallel bitmap heap scan can use min_parallel_table_scan_size as a
threshold to compute parallel workers, as we do now.  For parallel
index scans, both min_parallel_table_scan_size and
min_parallel_index_scan_size can be used as thresholds; we can compute
parallel workers based both on the heap pages to be scanned and on the
index pages to be scanned, and then keep the minimum of those.  This
will help us engage parallel index scans when the index pages are
below the threshold but there are many heap pages to be scanned, and
will also keep a maximum cap on the number of workers based on index
scan size.
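The computation described above can be sketched as follows.  The "add one worker each time the size triples" rule and the exact constants are assumptions for illustration; toy_compute_parallel_worker is a stand-in, not the patch's actual compute_parallel_worker:

```c
#include <assert.h>

#define MIN_PARALLEL_TABLE_SCAN_SIZE  1024   /* 8MB in 8kB pages */
#define MIN_PARALLEL_INDEX_SCAN_SIZE  64     /* 512kB in 8kB pages */
#define MAX_PARALLEL_WORKERS_LIMIT    8      /* invented cap for the sketch */

/* One worker at the threshold, plus one more each time the number of
 * pages triples beyond it. */
static int
workers_for(double pages, int threshold)
{
    int     workers = 1;

    while (pages >= threshold * 3.0)
    {
        pages /= 3.0;
        workers++;
    }
    return workers;
}

/* Keep the minimum of the heap-based and index-based computations, so a
 * small index caps the worker count even when many heap pages are hit. */
static int
toy_compute_parallel_worker(double heap_pages, double index_pages)
{
    int     heap_workers;
    int     index_workers;

    if (heap_pages < MIN_PARALLEL_TABLE_SCAN_SIZE ||
        index_pages < MIN_PARALLEL_INDEX_SCAN_SIZE)
        return 0;                /* too small to bother parallelizing */

    heap_workers = workers_for(heap_pages, MIN_PARALLEL_TABLE_SCAN_SIZE);
    index_workers = workers_for(index_pages, MIN_PARALLEL_INDEX_SCAN_SIZE);

    if (index_workers < heap_workers)
        heap_workers = index_workers;
    if (heap_workers > MAX_PARALLEL_WORKERS_LIMIT)
        heap_workers = MAX_PARALLEL_WORKERS_LIMIT;
    return heap_workers;
}
```

Taking the minimum of the two is what prevents the lopsided-distribution case mentioned earlier in the thread: a tiny index keeps the worker count low no matter how many heap pages the scan is expected to touch.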

guc_parallel_index_scan_v1.patch - Changes the name of the existing
min_parallel_relation_size to min_parallel_table_scan_size and adds a
new GUC, min_parallel_index_scan_size, with a default value of 512kB.
This patch also adjusts the computation in compute_parallel_worker
based on the two GUCs.

compute_index_pages_v2.patch - This patch extracts the computation of
index pages to be scanned into a separate function and uses it in the
existing code.  You will notice that I have pulled up the logic of
conversion of clauses to indexquals from create_index_path to
build_index_paths, as that is required to compute the number of index
and heap pages to be scanned by the scan in patch
parallel_index_opt_exec_support_v8.patch.  This doesn't impact any
existing functionality.

parallel_index_scan_v7 - Patch to parallelize btree scans; nothing has
changed from the previous version (just rebased on the latest head).

parallel_index_opt_exec_support_v8.patch - Contains changes to compute
parallel workers using both the heap and index pages that need to be
scanned.

guc_parallel_index_scan_v1.patch and compute_index_pages_v2.patch are
independent patches; both are required by the parallel index scan
patches.

The current set of patches handles all the reported comments.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Parallel Index Scans

From
Peter Geoghegan
Date:
On Wed, Feb 8, 2017 at 10:33 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> I had some offlist discussion with Robert about the above point and we
> feel that keeping only heap pages for parallel computation might not
> be future proof as for parallel index only scans there might not be
> any heap pages.  So, it is better to use separate GUC for parallel
> index (only) scans.  We can have two guc's
> min_parallel_table_scan_size (8MB) and min_parallel_index_scan_size
> (512kB) for computing parallel scans.  The parallel sequential scan
> and parallel bitmap heap scans can use min_parallel_table_scan_size as
> a threshold to compute parallel workers as we are doing now.  For
> parallel index scans, both min_parallel_table_scan_size and
> min_parallel_index_scan_size can be used for threshold;  We can
> compute parallel workers both based on heap_pages to be scanned and
> index_pages to be scanned and then keep the minimum of those.  This
> will help us to engage parallel index scans when the index pages are
> lower than threshold but there are many heap pages to be scanned and
> will also allow keeping a maximum cap on the number of workers based
> on index scan size.

What about parallel CREATE INDEX? The patch currently uses
min_parallel_relation_size as an input into the optimizer's custom
cost model. I had wondered if that made sense. Note that another such
input is the projected size of the final index. That's the thing that
increases at logarithmic intervals as there is a linear increase in
the number of workers assigned to the operation (so it's not the size
of the underlying table).

-- 
Peter Geoghegan



Re: [HACKERS] Parallel Index Scans

From
Amit Kapila
Date:
On Thu, Feb 9, 2017 at 12:08 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> On Wed, Feb 8, 2017 at 10:33 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> I had some offlist discussion with Robert about the above point and we
>> feel that keeping only heap pages for parallel computation might not
>> be future proof as for parallel index only scans there might not be
>> any heap pages.  So, it is better to use separate GUC for parallel
>> index (only) scans.  We can have two guc's
>> min_parallel_table_scan_size (8MB) and min_parallel_index_scan_size
>> (512kB) for computing parallel scans.  The parallel sequential scan
>> and parallel bitmap heap scans can use min_parallel_table_scan_size as
>> a threshold to compute parallel workers as we are doing now.  For
>> parallel index scans, both min_parallel_table_scan_size and
>> min_parallel_index_scan_size can be used for threshold;  We can
>> compute parallel workers both based on heap_pages to be scanned and
>> index_pages to be scanned and then keep the minimum of those.  This
>> will help us to engage parallel index scans when the index pages are
>> lower than threshold but there are many heap pages to be scanned and
>> will also allow keeping a maximum cap on the number of workers based
>> on index scan size.
>
> What about parallel CREATE INDEX? The patch currently uses
> min_parallel_relation_size as an input into the optimizer's custom
> cost model. I had wondered if that made sense. Note that another such
> input is the projected size of the final index.
>

If the projected index size is available, then I think CREATE INDEX
can also use a somewhat similar formula where we cap the maximum
number of workers based on the size of the index.  Now, I am not sure
whether the GUC threshold values used for scans are realistic for the
CREATE INDEX operation.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Parallel Index Scans

From
Robert Haas
Date:
On Thu, Feb 9, 2017 at 5:34 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> What about parallel CREATE INDEX? The patch currently uses
>> min_parallel_relation_size as an input into the optimizer's custom
>> cost model. I had wondered if that made sense. Note that another such
>> input is the projected size of the final index.
>
> If projected index size is available, then I think Create Index can
> also use a somewhat similar formula where we cap the maximum number of
> workers based on the size of the index.  Now, I am not sure if the
> threshold values of guc's kept for the scan are realistic for Create
> Index operation.

I think that would be an abuse of the GUC, because the idea of the
existing GUC - and the new one we're proposing to create here - has
always been about the amount of data being fed into the parallel
operation.  In the case of CREATE INDEX, the resulting index is an
output, not an input.  So if I were Peter and wanted to reuse the
existing GUCs, I'd reuse the one for the table size, because that's
what is being scanned.  No index is going to get scanned.

Of course, it's possible that the sensible amount of parallelism for
CREATE INDEX is higher or lower than for other sequential scans, so
that might not be the right thing to do.  It might need its own knob,
or some other refinement.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Parallel Index Scans

From
Robert Haas
Date:
On Wed, Feb 1, 2017 at 8:20 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> The hunk in indexam.c looks like a leftover that can be omitted.
>
> It is not a leftover hunk. Earlier, the patch has the same check
> btparallelrescan, but based on your comment up thread [1] (Why does
> btparallelrescan cater to the case where scan->parallel_scan== NULL?),
> this has been moved to indexam.c.

That's not my point.  The point is, if it's not a parallel scan,
index_parallelrescan() should never get called in the first place.  So
therefore it shouldn't need to worry about scan->parallel_scan being
NULL.  It seems possibly reasonable to put something like
Assert(scan->parallel_scan != NULL) in there, but arranging to return
without doing anything in that case seems like it just masks possible
bugs in the calling code.  And really I'm not sure we even need the
Assert.

>> I am a bit mystified about how this manages to work with array keys.
>> _bt_parallel_done() won't set btps_pageStatus to BTPARALLEL_DONE
>> unless so->arrayKeyCount >= btscan->btps_arrayKeyCount, but
>> _bt_parallel_advance_scan() won't do anything unless btps_pageStatus
>> is already BTPARALLEL_DONE.
>
> This is just to ensure that btps_arrayKeyCount is advanced once and
> btps_pageStatus is changed to BTPARALLEL_DONE once per array element.
> So it goes something like, if we have array with values [1,2,3], then
> all the workers will complete the scan with key 1 and one of them will
> mark btps_pageStatus as BTPARALLEL_DONE and then first one to hit
> _bt_parallel_advance_scan will increment the value of
> btps_arrayKeyCount, then same will happen for key 2 and 3.  It is
> quite possible that by the time one of the participant advances it
> local key, the scan for that key is already finished and we handle
> that situation in _bt_parallel_seize() with below check:
>
> if (so->arrayKeyCount < btscan->btps_arrayKeyCount)
> *status = false;
>
> I think Rahila has also mentioned something on above lines, let us
> know if we are missing something here?  Do you want to add more
> comments in the code to explain this handling, if yes, then where (on
> top of function _bt_parallel_advance_scan)?

You know, I just misread this code.  Both of you are right, and I am wrong.

>> That
>> is a little odd.  Another, possibly related thing that is odd is that
>> when _bt_steppage() finds no tuples and decides to advance to a new
>> page again, there's a very short comment in the forward case and a
>> very long comment in the backward case:
>>
>>             /* nope, keep going */
>> vs.
>>             /*
>>              * For parallel scans, get the last page scanned as it is quite
>>              * possible that by the time we try to fetch previous page, other
>>              * worker has also decided to scan that previous page.  We could
>>              * avoid that by doing _bt_parallel_release once we have read the
>>              * current page, but it is bad to make other workers wait till we
>>              * read the page.
>>              */
>>
>> Now it seems to me that these cases are symmetric and the issues are
>> identical.  It's basically that, while we were judging whether the
>> current page has useful contents, some other process could have
>> advanced the scan (since _bt_readpage un-seized it).
>
> Yeah, but the reason of difference in comments is that for
> non-parallel backwards scans there is no code at that place to move to
> previous page and it basically relies on next call to _bt_walk_left()
> whereas for parallel-scans, we can't simply rely on _bt_walk_left().
> I have slightly modified the  comments for backward scan case, see if
> that looks better and if not, then suggest what you think is better.

Why can't we rely on _bt_walk_left?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Parallel Index Scans

From
Robert Haas
Date:
On Thu, Feb 9, 2017 at 1:33 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> compute_index_pages_v2.patch - This function extracts the computation
> of index pages to be scanned in a separate function and used it in
> existing code.  You will notice that I have pulled up the logic of
> conversion of clauses to indexquals from create_index_path to
> build_index_paths as that is required to compute the number of index
> and heap pages to be scanned by scan in patch
> parallel_index_opt_exec_support_v8.patch.  This doesn't impact any
> existing functionality.

This design presupposes that every AM that might ever want to do
parallel scans is happy with genericcostestimate()'s method of
computing the number of index pages that will be fetched.  However,
that might not be true for every possible AM.  In fact, it's already
not true for BRIN, which always reads the whole index.  Now, since
we're only concerned with btree at the moment, nothing would be
immediately broken by this approach but it seems we're just setting
ourselves up for future pain.

I have what I think might be a better idea: let's make
amcostestimate() responsible for returning a suggested parallel degree
for the path via an additional out parameter.  cost_index() can then
reduce that value if it seems like not enough heap pages will be
fetched to justify the return value, or it can override it completely
if parallel_degree is set for the relation.  Then we don't need to run
this logic twice to compute the same value, and we don't violate the
AM abstraction layer.

BTW, the changes to add_partial_path() aren't needed, because an
IndexPath only gets reused if you stick a Bitmap{Heap,And,Or}Path on
top of it, and that won't be the case with this or any other pending
patch.  If we get the ability to have a Parallel Bitmap Heap Scan that
takes a parallel index scan rather than a standard index scan as
input, then we'll need something like this.  But for now it's probably
best to just update the comments and remove the Assert().

I think you can also leave out the documentation changes from these
patches.  I'll do some general rewriting of the parallel query section
once we know exactly what capabilities we'll have in this release; I
think that will work better than trying to update them a bit at a time
for each patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Parallel Index Scans

From
Amit Kapila
Date:
On Fri, Feb 10, 2017 at 11:27 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Feb 1, 2017 at 8:20 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>> The hunk in indexam.c looks like a leftover that can be omitted.
>>
>> It is not a leftover hunk. Earlier, the patch has the same check
>> btparallelrescan, but based on your comment up thread [1] (Why does
>> btparallelrescan cater to the case where scan->parallel_scan== NULL?),
>> this has been moved to indexam.c.
>
> That's not my point.  The point is, if it's not a parallel scan,
> index_parallelrescan() should never get called in the first place.  So
> therefore it shouldn't need to worry about scan->parallel_scan being
> NULL.  It seems possibly reasonable to put something like
> Assert(scan->parallel_scan != NULL) in there, but arranging to return
> without doing anything in that case seems like it just masks possible
> bugs in the calling code.  And really I'm not sure we even need the
> Assert.
>

This is just a safety check, so probably we can change it to Assert.

>
>>> That
>>> is a little odd.  Another, possibly related thing that is odd is that
>>> when _bt_steppage() finds no tuples and decides to advance to a new
>>> page again, there's a very short comment in the forward case and a
>>> very long comment in the backward case:
>>>
>>>             /* nope, keep going */
>>> vs.
>>>             /*
>>>              * For parallel scans, get the last page scanned as it is quite
>>>              * possible that by the time we try to fetch previous page, other
>>>              * worker has also decided to scan that previous page.  We could
>>>              * avoid that by doing _bt_parallel_release once we have read the
>>>              * current page, but it is bad to make other workers wait till we
>>>              * read the page.
>>>              */
>>>
>>> Now it seems to me that these cases are symmetric and the issues are
>>> identical.  It's basically that, while we were judging whether the
>>> current page has useful contents, some other process could have
>>> advanced the scan (since _bt_readpage un-seized it).
>>
>> Yeah, but the reason of difference in comments is that for
>> non-parallel backwards scans there is no code at that place to move to
>> previous page and it basically relies on next call to _bt_walk_left()
>> whereas for parallel-scans, we can't simply rely on _bt_walk_left().
>> I have slightly modified the  comments for backward scan case, see if
>> that looks better and if not, then suggest what you think is better.
>
> Why can't we rely on _bt_walk_left?
>

The reason is mentioned in the comments, but let me try to explain
with an example.  When you reach that point in the code, it means that
either the current page (assume page number 10) doesn't contain any
matching items or it is a half-dead page, both of which indicate that
we have to move to the previous page.  Now, before checking whether
the current page contains matching items, we signal the parallel
machinery (via _bt_parallel_release) to allow workers to read the
previous page (assume page number 9).  So it is quite possible that,
after deciding that the current page (page 10) doesn't contain any
matching tuples, if we directly move to the previous page (page 9) by
using _bt_walk_left, some other worker will already have read page 9.
In short, if we directly use _bt_walk_left(), then we are prone to
returning some of the values twice, as multiple workers can read the
same page.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Parallel Index Scans

From
Robert Haas
Date:
On Fri, Feb 10, 2017 at 11:22 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> Why can't we rely on _bt_walk_left?
>
> The reason is mentioned in comments, but let me try to explain with
> some example.  When you reach that point of code, it means that either
> the current page (assume page number is 10) doesn't contain any
> matching items or it is a half-dead page, both of which indicates that
> we have to move to the previous page.   Now, before checking if the
> current page contains matching items, we signal parallel machinery
> (via _bt_parallel_release) to allow workers to read the previous page
> (assume previous page number is 9).  So it is quite possible that
> after deciding that current page (page number 10) doesn't contain any
> matching tuples if we directly move to the previous page (in this case
> it will be 9) by using _bt_walk_left, some other worker would have
> read page 9.  In short, if we directly use _bt_walk_left(), then we
> are prone to returning some of the values twice as multiple workers
> can read the same page.

But ... the entire point of the seize-and-release stuff is to avoid
this problem.  You're supposed to seize the scan, read the current
page, walk left, store the page you find in the scan, and then release
the scan.  The entire point of that stuff is that when somebody's
advancing the scan to the next page, everybody else waits for them to
get done.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Parallel Index Scans

From
Amit Kapila
Date:
On Sat, Feb 11, 2017 at 9:41 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Feb 10, 2017 at 11:22 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>> Why can't we rely on _bt_walk_left?
>>
>> The reason is mentioned in comments, but let me try to explain with
>> some example.  When you reach that point of code, it means that either
>> the current page (assume page number is 10) doesn't contain any
>> matching items or it is a half-dead page, both of which indicates that
>> we have to move to the previous page.   Now, before checking if the
>> current page contains matching items, we signal parallel machinery
>> (via _bt_parallel_release) to allow workers to read the previous page
>> (assume previous page number is 9).  So it is quite possible that
>> after deciding that current page (page number 10) doesn't contain any
>> matching tuples if we directly move to the previous page (in this case
>> it will be 9) by using _bt_walk_left, some other worker would have
>> read page 9.  In short, if we directly use _bt_walk_left(), then we
>> are prone to returning some of the values twice as multiple workers
>> can read the same page.
>
> But ... the entire point of the seize-and-release stuff is to avoid
> this problem.  You're suppose to seize the scan, read the current
> page, walk left, store the page you find in the scan, and then release
> the scan.
>

Exactly, and that is what is done in the patch.  Basically, if we find
that the current page is half-dead or doesn't contain any matching
items, then we release the current buffer, seize the scan, read the
current page, walk left, and so on.  I am slightly confused here
because it seems both of us agree on what the right thing to do is,
and according to me that is how it is implemented.  Are you just
confirming whether I have implemented it as discussed, or do you see a
problem with the way it is implemented?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Parallel Index Scans

From
Amit Kapila
Date:
On Fri, Feb 10, 2017 at 11:27 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Feb 1, 2017 at 8:20 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>> The hunk in indexam.c looks like a leftover that can be omitted.
>>
>> It is not a leftover hunk. Earlier, the patch has the same check
>> btparallelrescan, but based on your comment up thread [1] (Why does
>> btparallelrescan cater to the case where scan->parallel_scan== NULL?),
>> this has been moved to indexam.c.
>
> That's not my point.  The point is, if it's not a parallel scan,
> index_parallelrescan() should never get called in the first place.  So
> therefore it shouldn't need to worry about scan->parallel_scan being
> NULL.  It seems possibly reasonable to put something like
> Assert(scan->parallel_scan != NULL) in there, but arranging to return
> without doing anything in that case seems like it just masks possible
> bugs in the calling code.  And really I'm not sure we even need the
> Assert.
>

I have removed the check from index_parallelrescan() and ensured in
the caller that it is called only when a parallel_scan descriptor is
present.

Comments from another e-mail:
> This design presupposes that every AM that might ever want to do
> parallel scans is happy with genericcostestimate()'s method of
> computing the number of index pages that will be fetched.  However,
> that might not be true for every possible AM.  In fact, it's already
> not true for BRIN, which always reads the whole index.  Now, since
> we're only concerned with btree at the moment, nothing would be
> immediately broken by this approach but it seems we're just setting
> ourselves up for future pain.
>

I think that, to make it future-proof, we could add a generic page
estimation function.  However, I have tried an implementation based on
your suggestion below; see if that looks better to you, otherwise we
can introduce a generic page estimation API.

> I have what I think might be a better idea: let's make
> amcostestimate() responsible for returning a suggested parallel degree
> for the path via an additional out parameter.  cost_index() can then
> reduce that value if it seems like not enough heap pages will be
> fetched to justify the return value, or it can override it completely
> if parallel_degree is set for the relation.
>

I see the value of your idea, but I think it might be better to return
the number of index pages required to be scanned from amcostestimate()
and then use the already-computed value of heap_pages in cost_index()
to identify the number of workers.  This will keep the calculation
simple and avoid overriding the return value.  Now, the returned value
of index pages will only be used when a partial path is being
constructed, but I think that is okay, because we are not doing any
extra calculation to compute the number of index pages fetched.
Another argument could be that the number of index pages used for
parallelism can be different from the number of pages to be scanned,
or whatever is used in the cost computation, but I think that is also
easy to change later, when we create partial paths for indexes other
than btree.  I have implemented the above idea in the attached patch
(parallel_index_opt_exec_support_v9.patch).

>  Then we don't need to run
> this logic twice to compute the same value, and we don't violate the
> AM abstraction layer.
>

We can avoid computing the same value twice; however, with your
suggested approach, we have to do all the additional work even for the
cases where employing parallel workers is not beneficial, so I am not
sure there is a net gain.

> BTW, the changes to add_partial_path() aren't needed, because an
> IndexPath only gets reused if you stick a Bitmap{Heap,And,Or}Path on
> top of it, and that won't be the case with this or any other pending
> patch.  If we get the ability to have a Parallel Bitmap Heap Scan that
> takes a parallel index scan rather than a standard index scan as
> input, then we'll need something like this.  But for now it's probably
> best to just update the comments and remove the Assert().
>

Right, changed as per suggestion.

> I think you can also leave out the documentation changes from these
> patches.  I'll do some general rewriting of the parallel query section
> once we know exactly what capabilities we'll have in this release; I
> think that will work better than trying to update them a bit at a time
> for each patch.
>

Okay, removed the documentation part.

Patches to be used: guc_parallel_index_scan_v1.patch [1],
parallel_index_scan_v8.patch, parallel_index_opt_exec_support_v9.patch


[1] - https://www.postgresql.org/message-id/CAA4eK1%2BTnM4pXQbvn7OXqam%2Bk_HZqb0ROZUMxOiL6DWJYCyYow%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Attachment

Re: [HACKERS] Parallel Index Scans

From
Robert Haas
Date:
On Sat, Feb 11, 2017 at 6:35 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>> Why can't we rely on _bt_walk_left?
>>> The reason is mentioned in comments, but let me try to explain with
>>> some example.  When you reach that point of code, it means that either
>>> the current page (assume page number is 10) doesn't contain any
>>> matching items or it is a half-dead page, both of which indicates that
>>> we have to move to the previous page.   Now, before checking if the
>>> current page contains matching items, we signal parallel machinery
>>> (via _bt_parallel_release) to allow workers to read the previous page
>>> (assume previous page number is 9).  So it is quite possible that
>>> after deciding that current page (page number 10) doesn't contain any
>>> matching tuples if we directly move to the previous page (in this case
>>> it will be 9) by using _bt_walk_left, some other worker would have
>>> read page 9.  In short, if we directly use _bt_walk_left(), then we
>>> are prone to returning some of the values twice as multiple workers
>>> can read the same page.
>> But ... the entire point of the seize-and-release stuff is to avoid
>> this problem.  You're suppose to seize the scan, read the current
>> page, walk left, store the page you find in the scan, and then release
>> the scan.
> Exactly and that is what is done in the patch.  Basically, if we found
> that the current page is half-dead or it doesn't contain any matching
> items, then release the current buffer, seize the scan, read the
> current page, walk left and so on.  I am slightly confused here
> because it seems both of us agree on what is the right thing to do and
> according to me that is how it is implemented.  Are you just ensuring
> about whether I have implemented as discussed or do you see a problem
> with the way it is implemented?

Well, before, I thought you said that relying entirely on
_bt_walk_left couldn't work because then two people might end up
running it at the same time, and that would cause problems.  But if
you can only run _bt_walk_left while you've got the scan seized, then
that can't happen.  Evidently I'm missing something here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Parallel Index Scans

From
Amit Kapila
Date:
On Mon, Feb 13, 2017 at 5:47 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sat, Feb 11, 2017 at 6:35 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>>> Why can't we rely on _bt_walk_left?
>>>> The reason is mentioned in comments, but let me try to explain with
>>>> some example.  When you reach that point of code, it means that either
>>>> the current page (assume page number is 10) doesn't contain any
>>>> matching items or it is a half-dead page, both of which indicates that
>>>> we have to move to the previous page.   Now, before checking if the
>>>> current page contains matching items, we signal parallel machinery
>>>> (via _bt_parallel_release) to allow workers to read the previous page
>>>> (assume previous page number is 9).  So it is quite possible that
>>>> after deciding that current page (page number 10) doesn't contain any
>>>> matching tuples if we directly move to the previous page (in this case
>>>> it will be 9) by using _bt_walk_left, some other worker would have
>>>> read page 9.  In short, if we directly use _bt_walk_left(), then we
>>>> are prone to returning some of the values twice as multiple workers
>>>> can read the same page.
>>> But ... the entire point of the seize-and-release stuff is to avoid
>>> this problem.  You're suppose to seize the scan, read the current
>>> page, walk left, store the page you find in the scan, and then release
>>> the scan.
>> Exactly and that is what is done in the patch.  Basically, if we found
>> that the current page is half-dead or it doesn't contain any matching
>> items, then release the current buffer, seize the scan, read the
>> current page, walk left and so on.  I am slightly confused here
>> because it seems both of us agree on what is the right thing to do and
>> according to me that is how it is implemented.  Are you just ensuring
>> about whether I have implemented as discussed or do you see a problem
>> with the way it is implemented?
>
> Well, before, I thought you said that relying entirely on
> _bt_walk_left couldn't work because then two people might end up
> running it at the same time, and that would cause problems.  But if
> you can only run _bt_walk_left while you've got the scan seized, then
> that can't happen.  Evidently I'm missing something here.
>

I think the comment at that place is not as clear as it should be.  So
how about changing it as below:

Existing comment:
--------------------------
/*
* For parallel scans, get the last page scanned as it is quite
* possible that by the time we try to fetch previous page, other
* worker has also decided to scan that previous page.  So we
* can't rely on _bt_walk_left call.
*/

Modified comment:
--------------------------
/*
 * For parallel scans, it is quite possible that by the time we try to fetch
 * the previous page, another worker has also decided to scan that
 * previous page.  So to avoid that we need to get the last page scanned
 * from shared scan descriptor before calling _bt_walk_left.
 */


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Parallel Index Scans

From
Robert Haas
Date:
On Mon, Feb 13, 2017 at 9:04 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> I think the comment at that place is not as clear as it should be.  So
> how about changing it as below:
>
> Existing comment:
> --------------------------
> /*
> * For parallel scans, get the last page scanned as it is quite
> * possible that by the time we try to fetch previous page, other
> * worker has also decided to scan that previous page.  So we
> * can't rely on _bt_walk_left call.
> */
>
> Modified comment:
> --------------------------
> /*
>  * For parallel scans, it is quite possible that by the time we try to fetch
>  * the previous page, another worker has also decided to scan that
>  * previous page.  So to avoid that we need to get the last page scanned
>  * from shared scan descriptor before calling _bt_walk_left.
>  */

That sounds way better.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Parallel Index Scans

From
Robert Haas
Date:
On Tue, Feb 14, 2017 at 12:48 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> That sounds way better.

Here's an updated patch.  Please review my changes, which include:

* Various comment updates.
* _bt_parallel_seize now unconditionally sets *pageno to P_NONE at the
beginning, instead of doing it conditionally at the end.  This seems
cleaner to me.
* I removed various BTScanPosInvalidate calls from _bt_first in places
where they followed calls to _bt_parallel_done, because I can't see
how the scan position could be valid at that point; note that
_bt_first asserts that it is invalid on entry.
* I added a _bt_parallel_done() call to _bt_first where it apparently
returned without releasing the scan; search for SK_ROW_MEMBER.  Maybe
there's something I'm missing that makes this unnecessary, but if so
there should probably be a comment here.
* I wasn't happy with the strange calling convention where
_bt_readnextpage usually gets a valid block number but not for
non-parallel backward scans.  I had a stab at fixing that so that the
block number is always valid, but I'm not entirely sure I've got the
logic right.  Please see what you think.
* I repositioned the function prototypes you added to nbtree.h to
separate the public and non-public interfaces.

I can't easily test this because your second patch doesn't apply, so
I'd appreciate it if you could have a stab at checking whether I've
broken anything in this revision.  Also, it would be good if you could
rebase the second patch.

I think this is pretty close to committable at this point.  Whether or
not I broke anything in this revision, I don't think there's too much
left to be done here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Parallel Index Scans

From
Amit Kapila
Date:
On Wed, Feb 15, 2017 at 2:04 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Feb 14, 2017 at 12:48 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> That sounds way better.
>
> Here's an updated patch.  Please review my changes, which include:
>
> * Various comment updates.

1.
+ * BTPARALLEL_IDLE indicates that no backend is currently advancing the scan
+ * to a new page; some process can start doing that.
+ *
+ * BTPARALLEL_DONE implies that the scan is complete (including error exit).

/implies/indicates, to be consistent with other explanations.

2.
+ * of the scan (depending on thes can direction).  An invalid block number

/thes can/the scan

I have modified the patch to include above two changes.

3.
+ else if (pageStatus == BTPARALLEL_DONE)
+ {
+ /*
+ * We're done with this set of scankeys, but have not yet advanced
+ * to the next set.
+ */
+ status = false;
+ }

Here second part of the comment (but have not yet advanced ..) seems
to be slightly misleading because this state has nothing to do with
the advancement of scan keys.

I have not changed this because I am not sure what you have in mind.


> * _bt_parallel_seize now unconditionally sets *pageno to P_NONE at the
> beginning, instead of doing it conditionally at the end.  This seems
> cleaner to me.
> * I removed various BTScanPosInvalidate calls from _bt_first in places
> where they followed calls to _bt_parallel_done, because I can't see
> how the scan position could be valid at that point; note that
> _bt_first asserts that it is invalid on entry.
> * I added a _bt_parallel_done() call to _bt_first where it apparently
> returned without releasing the scan; search for SK_ROW_MEMBER.  Maybe
> there's something I'm missing that makes this unnecessary, but if so
> there should probably be a comment here.
> * I wasn't happy with the strange calling convention where
> _bt_readnextpage usually gets a valid block number but not for
> non-parallel backward scans.  I had a stab at fixing that so that the
> block number is always valid, but I'm not entirely sure I've got the
> logic right.  Please see what you think.
>

Looks good to me.

> * I repositioned the function prototypes you added to nbtree.h to
> separate the public and non-public interfaces.
>

I have verified all your changes and they look good to me.

> I can't easily test this because your second patch doesn't apply,

I have tried and it works for me on latest code except for one test
output file which could have been excluded.  I wonder whether you are
first applying the GUC related patch [1] before applying the optimizer
support related patch.  In any case, to avoid confusion I am attaching
all the three patches with this e-mail.

> so
> I'd appreciate it if you could have a stab at checking whether I've
> broken anything in this revision.  Also, it would be good if you could
> rebase the second patch.
>

I have rebased the optimizer/executor support related patch.


[1] - https://www.postgresql.org/message-id/CAA4eK1%2BTnM4pXQbvn7OXqam%2Bk_HZqb0ROZUMxOiL6DWJYCyYow%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Attachment

Re: [HACKERS] Parallel Index Scans

From
Robert Haas
Date:
On Wed, Feb 15, 2017 at 7:11 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> Here second part of the comment (but have not yet advanced ..) seems
> to be slightly misleading because this state has nothing to do with
> the advancement of scan keys.
>
> I have not changed this because I am not sure what you have in mind.

OK, I rewrote that to be (hopefully) more clear.

> I have verified all your changes and they look good to me.

Cool.  Committed.  I also changed the wait event to be BtreePage in
the docs + pg_stat_activity, and moved it into alphabetical order in
the switch and the enum.

>> I can't easily test this because your second patch doesn't apply,
>
> I have tried and it works for me on latest code except for one test
> output file which could have been excluded.  I wonder whether you are
> first applying the GUC related patch [1] before applying the optimizer
> support related patch.  In anycase, to avoid confusion I am attaching
> all the three patches with this e-mail.

Oh, duh.  I forgot about the prerequisite patch.  Sorry.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Parallel Index Scans

From
Robert Haas
Date:
On Wed, Feb 15, 2017 at 7:11 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> support related patch.  In anycase, to avoid confusion I am attaching
> all the three patches with this e-mail.

Committed guc_parallel_index_scan_v1.patch, with changes to the
documentation and GUC descriptions.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Parallel Index Scans

From
Robert Haas
Date:
On Wed, Feb 15, 2017 at 1:39 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Feb 15, 2017 at 7:11 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:>
>> support related patch.  In anycase, to avoid confusion I am attaching
>> all the three patches with this e-mail.
>
> Committed guc_parallel_index_scan_v1.patch, with changes to the
> documentation and GUC descriptions.

And committed parallel_index_opt_exec_support_v10.patch as well, with
a couple of minor tweaks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Parallel Index Scans

From
Amit Kapila
Date:
On Thu, Feb 16, 2017 at 12:27 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Feb 15, 2017 at 1:39 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Feb 15, 2017 at 7:11 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:>
>>> support related patch.  In anycase, to avoid confusion I am attaching
>>> all the three patches with this e-mail.
>>
>> Committed guc_parallel_index_scan_v1.patch, with changes to the
>> documentation and GUC descriptions.
>
> And committed parallel_index_opt_exec_support_v10.patch as well, with
> a couple of minor tweaks.
>

Thanks a lot!  I think this is a big step forward for parallelism in
PostgreSQL.  Now, we have another way to drive parallel scans and I
hope many more queries can use parallelism.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Parallel Index Scans

From
Michael Banck
Date:
Hi,

On Thu, Feb 16, 2017 at 08:14:28AM +0530, Amit Kapila wrote:
> On Thu, Feb 16, 2017 at 12:27 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> > On Wed, Feb 15, 2017 at 1:39 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> >> On Wed, Feb 15, 2017 at 7:11 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:>
> >>> support related patch.  In anycase, to avoid confusion I am attaching
> >>> all the three patches with this e-mail.
> >>
> >> Committed guc_parallel_index_scan_v1.patch, with changes to the
> >> documentation and GUC descriptions.
> >
> > And committed parallel_index_opt_exec_support_v10.patch as well, with
> > a couple of minor tweaks.
> 
> Thanks a lot!  I think this is a big step forward for parallelism in
> PostgreSQL.  Now, we have another way to drive parallel scans and I
> hope many more queries can use parallelism.

Shouldn't chapter 15.3 "Parallel Plans" in the documentation be
updated for this as well, or is this going to be taken care of as a
batch at the end of the development cycle, pending other added
parallelization features?


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer



Re: [HACKERS] Parallel Index Scans

From
Amit Kapila
Date:
On Mon, Mar 6, 2017 at 4:57 PM, Michael Banck <michael.banck@credativ.de> wrote:
> Hi,
>
> On Thu, Feb 16, 2017 at 08:14:28AM +0530, Amit Kapila wrote:
>> On Thu, Feb 16, 2017 at 12:27 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> > On Wed, Feb 15, 2017 at 1:39 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> >> On Wed, Feb 15, 2017 at 7:11 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:>
>> >>> support related patch.  In anycase, to avoid confusion I am attaching
>> >>> all the three patches with this e-mail.
>> >>
>> >> Committed guc_parallel_index_scan_v1.patch, with changes to the
>> >> documentation and GUC descriptions.
>> >
>> > And committed parallel_index_opt_exec_support_v10.patch as well, with
>> > a couple of minor tweaks.
>>
>> Thanks a lot!  I think this is a big step forward for parallelism in
>> PostgreSQL.  Now, we have another way to drive parallel scans and I
>> hope many more queries can use parallelism.
>
> Shouldn't chapter 15.3 "Parallel Plans" in the documentation be
> updated for this as well, or is this going to be taken care of as a
> batch at the end of the development cycle, pending other added
> parallelization features?
>

Robert mentioned up thread that it is better to update it once at end
rather than with each feature.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Parallel Index Scans

From
Robert Haas
Date:
On Mon, Mar 6, 2017 at 6:33 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Mon, Mar 6, 2017 at 4:57 PM, Michael Banck <michael.banck@credativ.de> wrote:
>> Hi,
>>
>> On Thu, Feb 16, 2017 at 08:14:28AM +0530, Amit Kapila wrote:
>>> On Thu, Feb 16, 2017 at 12:27 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> > On Wed, Feb 15, 2017 at 1:39 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> >> On Wed, Feb 15, 2017 at 7:11 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:>
>>> >>> support related patch.  In anycase, to avoid confusion I am attaching
>>> >>> all the three patches with this e-mail.
>>> >>
>>> >> Committed guc_parallel_index_scan_v1.patch, with changes to the
>>> >> documentation and GUC descriptions.
>>> >
>>> > And committed parallel_index_opt_exec_support_v10.patch as well, with
>>> > a couple of minor tweaks.
>>>
>>> Thanks a lot!  I think this is a big step forward for parallelism in
>>> PostgreSQL.  Now, we have another way to drive parallel scans and I
>>> hope many more queries can use parallelism.
>>
>> Shouldn't chapter 15.3 "Parallel Plans" in the documentation be
>> updated for this as well, or is this going to be taken care of as a
>> batch at the end of the development cycle, pending other added
>> parallelization features?
>>
>
> Robert mentioned up thread that it is better to update it once at end
> rather than with each feature.

I was going to do it after index and index-only scans and parallel
bitmap heap scan were committed, but I've been holding off on
committing parallel bitmap heap scan waiting for Andres to fix the
simplehash regressions, so maybe I should just go do an update now and
another one later once that goes in.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Parallel Index Scans

From
Amit Kapila
Date:
On Mon, Mar 6, 2017 at 6:49 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Mar 6, 2017 at 6:33 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> I was going to do it after index and index-only scans and parallel
> bitmap heap scan were committed, but I've been holding off on
> committing parallel bitmap heap scan waiting for Andres to fix the
> simplehash regressions, so maybe I should just go do an update now and
> another one later once that goes in.
>

As you wish, but one point to note is that as of now parallelism for
index scans can be influenced by table level parameter
parallel_workers.  It sounds slightly awkward, but if we want to keep
that way, then maybe we can update the docs to indicate the same.
Another option is to have a separate parameter for index scans.
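To make the current behavior concrete, the table-level knob mentioned above is the existing `parallel_workers` storage parameter, which as of this thread also caps workers for parallel index scans on that table's indexes (the table name here is illustrative):

```sql
-- Limit parallel scans of this table, including Parallel Index Scan
-- paths over its indexes, to at most 4 workers.
ALTER TABLE test SET (parallel_workers = 4);
```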


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Parallel Index Scans

From
Gavin Flower
Date:
On 07/03/17 02:46, Amit Kapila wrote:
> On Mon, Mar 6, 2017 at 6:49 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Mon, Mar 6, 2017 at 6:33 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> I was going to do it after index and index-only scans and parallel
>> bitmap heap scan were committed, but I've been holding off on
>> committing parallel bitmap heap scan waiting for Andres to fix the
>> simplehash regressions, so maybe I should just go do an update now and
>> another one later once that goes in.
>>
> As you wish, but one point to note is that as of now parallelism for
> index scans can be influenced by table level parameter
> parallel_workers.  It sounds slightly awkward, but if we want to keep
> that way, then maybe we can update the docs to indicate the same.
> Another option is to have a separate parameter for index scans.
>
>
My immediate gut feeling was to have separate parameters.

On thinking about it, I think they serve different use cases.  I don't 
think of workers when I think of Index scans, and I suspect I'd find 
more reasons to keep them separate if I looked deeper.


Cheers,
Gavin