Re: Parallel Index Scans - Mailing list pgsql-hackers

From	Amit Kapila
Subject	Re: Parallel Index Scans
Date	October 18, 2016 06:08:11
Msg-id	CAA4eK1JJinL8eZ=ohLxiS3Ggyiu-fZAQH4DDu2-KgwA0q-YFqA@mail.gmail.com
In response to	Parallel Index Scans (Amit Kapila <amit.kapila16@gmail.com>)
Responses	Re: Parallel Index Scans (Rahila Syed <rahilasyed90@gmail.com>) Re: Parallel Index Scans (Peter Geoghegan <pg@heroku.com>)
List	pgsql-hackers

On Thu, Oct 13, 2016 at 8:48 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> As of now, the driving table for parallel query is accessed by
> parallel sequential scan which limits its usage to a certain degree.
> Parallelising index scans would further increase the usage of parallel
> query in many more cases.  This patch enables the parallelism for the
> btree scans.  Supporting parallel index scan for other index types
> like hash, gist, spgist can be done as separate patches.
>

I would like some input on how to select the number of parallel
workers for an index scan.  Currently the patch chooses the number of
workers based on the size of the index relation, with
max_parallel_workers_per_gather as the upper limit.  This is quite
similar to what we do for parallel sequential scan, except that
parallel seq. scan uses the parallel_workers option if the user
provided it during Create Table.  The user can provide the
parallel_workers option as below:

Create Table .... With (parallel_workers = 4);

Is it desirable to have a similar option for parallel index scans?  If
yes, then what should the interface be?  One possible way could be to
allow the user to provide it during Create Index as below:

Create Index .... With (parallel_workers = 4);

If the above syntax looks sensible, then we might need to think about
what should be used for parallel index builds.  It seems to me that the
parallel tuple sort patch [1] proposed by Peter G. uses the above
syntax to get the number of parallel workers from the user for
parallel index builds.

Another point which needs some thought is whether it is a good idea to
use the index relation size to calculate the number of parallel
workers for an index scan.  I think ideally for index scans it should
be based on the number of pages to be fetched/scanned from the index.
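
To make the difference between the two approaches concrete, here is a
rough sketch.  It is not code from the patch.  The function names, the
parameters, and the rule of adding one worker each time the page count
triples past a threshold are all assumptions chosen for illustration.
It shows a user-supplied parallel_workers value taking precedence, the
cap at max_parallel_workers_per_gather, and how feeding in an estimate
of the pages to be scanned instead of the whole index size changes the
result.

```c
#include <stdint.h>

/*
 * Hypothetical helper: estimate how many index pages a scan will touch,
 * given the index size in pages and the fraction of the index the scan
 * is expected to cover.
 */
uint32_t
estimated_pages_scanned(uint32_t index_pages, double selectivity)
{
    if (selectivity < 0.0)
        selectivity = 0.0;
    if (selectivity > 1.0)
        selectivity = 1.0;
    return (uint32_t) (index_pages * selectivity);
}

/*
 * Hypothetical helper: choose a worker count from a page count.
 *
 * reloption_workers < 0 means the user did not set parallel_workers.
 * Otherwise the user's value is used as-is.  In both cases the result
 * is capped at max_workers_per_gather.
 */
int
workers_for_pages(uint32_t pages, uint32_t threshold_pages,
                  int reloption_workers, int max_workers_per_gather)
{
    int workers;

    if (reloption_workers >= 0)
    {
        /* An explicit user setting overrides the size-based guess. */
        workers = reloption_workers;
    }
    else
    {
        uint64_t limit;

        if (threshold_pages == 0)
            threshold_pages = 1;    /* avoid a zero limit in the loop below */
        if (pages < threshold_pages)
            return 0;               /* too small to be worth parallelising */

        /* Assumed rule: one more worker each time the size triples. */
        workers = 1;
        limit = (uint64_t) threshold_pages * 3;
        while (pages >= limit)
        {
            workers++;
            limit *= 3;
        }
    }

    if (workers > max_workers_per_gather)
        workers = max_workers_per_gather;
    return workers;
}
```

With a 100000-page index, a 1024-page threshold, and a cap of 8,
workers_for_pages(100000, 1024, -1, 8) picks 5 workers from the
relation size alone.  If the scan is expected to touch only about 1% of
the index, workers_for_pages(estimated_pages_scanned(100000, 0.01),
1024, -1, 8) picks none, which is the kind of case where sizing by
index relation size would overshoot.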

[1] - https://www.postgresql.org/message-id/CAM3SWZTmkOFEiCDpUNaO4n9-1xcmWP-1NXmT7h0Pu3gM2YuHvg%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
