Parallel Index Scans - Mailing list pgsql-hackers

From Amit Kapila
Subject Parallel Index Scans
Date
Msg-id CAA4eK1J9GQPN0UaXKX8XnW4M-OeUFja4+ySt4=7FfRJBUxZ9vQ@mail.gmail.com
Whole thread Raw
Responses Re: Parallel Index Scans  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
As of now, the driving table for parallel query is accessed by
parallel sequential scan which limits its usage to a certain degree.
Parallelising index scans would further increase the usage of parallel
query in many more cases.  This patch enables the parallelism for the
btree scans.  Supporting parallel index scan for other index types
like hash, gist, spgist can be done as separate patches.

The basic idea is quite similar to parallel heap scans which is that
each worker (including leader whenever possible) will scan a block and
then get the next block that is required to be scan. The parallelism
in implemented at the leaf level of a btree.  The first worker to
start a btree scan will scan till leaf and others will wait till the
first worker has reached till leaf.   The first worker after reading
the leaf block will set the next block to be read and wake the first
worker waiting to scan the next block and proceed with scanning tuples
from the block it has read, similarly each worker after reading the
block, sets the next block to be read and wakes up the first waiting
worker.  This is achieved by using the condition variable patch [1]
proposed by Robert.  Parallelism is supported for both forward and
backward scans.

The optimizer will choose the parallelism based on number of pages in
index relation and cpu cost for evaluating the rows is divided equally
among workers.  Index Scan node is made parallel aware and can be used
beneath Gather as shown below:

Current Plan for Index Scans
----------------------------------------
 Index Scan using idx2 on test  (cost=0.42..7378.96 rows=2433 width=29)
   Index Cond: (c < 10)


Parallel version of plan
----------------------------------
 Gather  (cost=1000.42..1243.40 rows=2433 width=29)
   Workers Planned: 1
   ->  Parallel Index Scan using idx2 on test  (cost=0.42..0.10
rows=1431 width=29)
         Index Cond: (c < 10)


The Parallel index scans can be used in parallelising aggregate
queries as well.  For example, given a query like:  select count(*)
from t1 where c1 > 1000 and c1 < 1100 and c2='aaa' Group By c2; below
form of parallel plans are possible:

 Finalize HashAggregate
   Group Key: c2
   ->  Gather
         Workers Planned: 1
         ->  Partial HashAggregate
               Group Key: c2
               ->  Parallel Index Scan using idx_t1_partial on t1
                     Index Cond: ((c1 > 1000) AND (c1 < 1100))
                     Filter: (c2 = 'aaa'::bpchar)

OR

Finalize GroupAggregate
   Group Key: c2
   ->  Sort
         ->  Gather
               Workers Planned: 1
               ->  Partial GroupAggregate
                     Group Key: c2
                     ->  Parallel Index Scan using idx_t1_partial on t1
                           Index Cond: ((c1 > 1000) AND (c1 < 1100))
                           Filter: (c2 = 'aaa'::bpchar)

In the second plan (GroupAggregate), the Sort + Gather step would be
replaced with GatherMerge, once we have a GatherMerge node as proposed
by Rushabh [2].  Note, that above examples are just taken to explain
the usage of parallel index scan, actual plans will be selected based
on cost.

Performance tests
----------------------------
This test has been performed on community m/c (hydra, POWER-7).

Initialize pgbench with 3000 scale factor (./pgbench -i -s 3000 postgres)

Count the rows in pgbench_accounts based on values of aid and bid

Serial plan
------------------
set max_parallel_workers_per_gather=0;

postgres=# explain analyze select count(aid) from pgbench_accounts
where aid > 1000 and aid < 90000000 and bid > 800 and bid < 900;

        QUERY PLAN


--------------------------------------------------------------------------------------------------------------------------------------------------------------------
----
 Aggregate  (cost=4714590.52..4714590.53 rows=1 width=8) (actual
time=35684.425..35684.425 rows=1 loops=1)
   ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts
(cost=0.57..4707458.12 rows=2852961 width=4) (actual
time=29210.743..34385.271 rows=9900000 loops
=1)
         Index Cond: ((aid > 1000) AND (aid < 90000000))
         Filter: ((bid > 800) AND (bid < 900))
         Rows Removed by Filter: 80098999
 Planning time: 0.183 ms
 Execution time: 35684.459 ms
(7 rows)


Parallel Plan
-------------------
set max_parallel_workers_per_gather=2;

postgres=# explain analyze select count(aid) from pgbench_accounts
where aid > 1000 and aid < 90000000 and bid > 800 and bid < 900;

                  QUERY PLAN


------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------
 Finalize Aggregate  (cost=3924773.13..3924773.14 rows=1 width=8)
(actual time=15033.105..15033.105 rows=1 loops=1)
   ->  Gather  (cost=3924772.92..3924773.12 rows=2 width=8) (actual
time=15032.986..15033.093 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=3923772.92..3923772.92 rows=1
width=8) (actual time=15030.354..15030.354 rows=1 loops=3)
               ->  Parallel Index Scan using pgbench_accounts_pkey on
pgbench_accounts  (cost=0.57..3920801.08 rows=1188734 width=4) (actual
time=12476.068..14600.410 rows=3300000 loops=3)
                     Index Cond: ((aid > 1000) AND (aid < 90000000))
                     Filter: ((bid > 800) AND (bid < 900))
                     Rows Removed by Filter: 26699666
 Planning time: 0.244 ms
 Execution time: 15036.081 ms
(11 rows)

The above is a median of 3 runs, all the runs gave almost same
execution time.  Here, we can notice that execution time is reduced by
more than half with two workers and I have tested with four workers
where time is reduced to one-fourth (9128.420 ms) of serial plan.  I
think these results are quite similar to what we got for parallel
sequential scans. Another thing to note is that parallelising index
scans are more beneficial if there is a Filter which removes many rows
fetched from Index Scan or if the Filter is costly (example - filter
contains costly function execution). This observation is also quite
similar to what we have observed with Parallel Sequential Scans.

I think we can parallelise Index Only Scans as well, but I have not
evaluated the same and certainly it can be done as a separate patch in
future.

Contributions
--------------------
First patch (parallel_index_scan_v1.patch) implements parallelism at
IndexAM level - Rahila Syed and Amit Kapila based on design inputs and
suggestions by Robert Haas
Second patch (parallel_index_opt_exec_support_v1.patch) provides
optimizer and executor support for parallel index scans - Amit Kapila

The order to use these patches is first apply condition variable patch
[1] then  parallel_index_scan_v1.patch and then
parallel_index_opt_exec_support_v1.patch

Thoughts?

[1] - https://www.postgresql.org/message-id/CAEepm%3D0zshYwB6wDeJCkrRJeoBM%3DjPYBe%2B-k_VtKRU_8zMLEfA%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CAGPqQf09oPX-cQRpBKS0Gq49Z%2Bm6KBxgxd_p9gX8CKk_d75HoQ%40mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: parallel.sgml
Next
From: Alvaro Herrera
Date:
Subject: Re: Non-empty default log_line_prefix