Re: Parallel Index Scans - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Parallel Index Scans
Date
Msg-id CAA4eK1+V9L9Tdv19S17=AqveDiSdZhWqfq8Eypvm3Go0q_EsTg@mail.gmail.com
Whole thread Raw
In response to Re: Parallel Index Scans  (Peter Geoghegan <pg@heroku.com>)
Responses Re: Parallel Index Scans  (Peter Geoghegan <pg@heroku.com>)
Re: Parallel Index Scans  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Thu, Oct 20, 2016 at 7:39 AM, Peter Geoghegan <pg@heroku.com> wrote:
> On Mon, Oct 17, 2016 at 8:08 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> Create Index .... With (parallel_workers = 4);
>>
>> If above syntax looks sensible, then we might need to think what
>> should be used for parallel index build.  It seems to me that parallel
>> tuple sort patch [1] proposed by Peter G. is using above syntax for
>> getting the parallel workers input from user for parallel index
>> builds.
>
> Apparently you see a similar issue with other major database systems,
> where similar storage parameter things are kind of "overloaded" like
> this (they are used by both index creation, and by the optimizer in
> considering whether it should use a parallel index scan). That can be
> a kind of a gotcha for their users, but maybe it's still worth it.
>

I have also checked and found that you are right.  In SQL Server, they
are using max degree of parallelism (MAXDOP) parameter which is I
think is common for all the sql statements.

> In
> any case, the complaints I saw about that were from users who used
> parallel CREATE INDEX with the equivalent of my parallel_workers index
> storage parameter, and then unexpectedly found this also forced the
> use of parallel index scan. Not the other way around.
>

I can understand that it can be confusing to users, so other option
could be to provide separate parameters like parallel_workers_build
and parallel_workers where first can be used for index build and
second can be used for scan.  My personal opinion is to have one
parameter, so that users have one less thing to learn about
parallelism.

> Ideally, the parallel_workers storage parameter will rarely be
> necessary because the optimizer will generally do the right thing in
> all case.
>

Yeah, we can choose not to provide any parameter for parallel index
scans, but some users might want to have a parameter similar to
parallel table scans, so it could be handy for them to use.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Disable autovacuum guc?
Next
From: Peter Geoghegan
Date:
Subject: Re: Parallel Index Scans