Re: Parallel Seq Scan - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Parallel Seq Scan
Date
Msg-id CAA4eK1K31AhrswmLHUufRyvgwDjajdKp6MdPWcjJnJkvXSB5xQ@mail.gmail.com
In response to Re: Parallel Seq Scan  (José Luis Tallón <jltallon@adv-solutions.net>)
List pgsql-hackers
On Fri, Dec 5, 2014 at 8:38 PM, José Luis Tallón <jltallon@adv-solutions.net> wrote:
>
> On 12/04/2014 07:35 AM, Amit Kapila wrote:
>>
>> [snip]
>>
>> The number of worker backends that can be used for
>> parallel seq scan can be configured by using a new GUC
>> parallel_seqscan_degree, the default value of which is zero
>> and it means parallel seq scan will not be considered unless
>> user configures this value.
>
>
> The number of parallel workers should be capped (of course!) at the maximum amount of "processors" (cores/vCores, threads/hyperthreads) available.
>

Also, it should take into account the MaxConnections value
configured by the user.

> Moreover, when load goes up, the relative cost of parallel working should go up as well.
> Something like:
>     p = number of cores
>     l = 1min-load
>
>     additional_cost = tuple estimate * cpu_tuple_cost * (l+1)/(c-1)
>
> (for c>1, of course)
>

How would you measure load in the above formula, and what exactly
is 'c' (is it the number of parallel workers involved)?
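
To make the question concrete, here is one possible reading of the
quoted formula, sketched in Python. It assumes that 'c' means the
number of cores (the 'p' in the quote) and that 'l' is the 1-minute
load average; neither is confirmed in this thread, and the function
name is hypothetical.

```python
def additional_parallel_cost(tuple_estimate, cpu_tuple_cost, load_1min, cores):
    """Extra cost charged to a parallel plan under load.

    Implements: tuple_estimate * cpu_tuple_cost * (l + 1) / (c - 1)
    ASSUMPTION: 'c' in the quoted formula is the number of cores.
    """
    if cores <= 1:
        # The quoted formula is only stated for c > 1.
        raise ValueError("formula is only defined for more than one core")
    return tuple_estimate * cpu_tuple_cost * (load_1min + 1) / (cores - 1)


# Same scan, same machine: the penalty grows as the load average grows.
idle = additional_parallel_cost(1000, 0.01, load_1min=0.0, cores=5)
busy = additional_parallel_cost(1000, 0.01, load_1min=3.0, cores=5)
```

On Unix-like systems, Python's os.getloadavg() returns the 1-, 5- and
15-minute load averages, which would be one possible source for 'l'.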

For now, I have handled this simply with a configuration
variable, and it seems to me that this should be good enough
for the first version. We can definitely enhance it in a future
version by dynamically allocating the number of workers based
on their availability and the needs of the query, but I think
let's leave that for another day.
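
As a rough illustration of the caps discussed above (not what the
attached patch does), the configured degree could be clamped by the
core count and by the connection slots still free. All names here are
hypothetical, and treating each worker as consuming one connection
slot is an assumption.

```python
def effective_worker_count(configured_degree, cores, max_connections, active_backends):
    """Clamp the configured parallel degree to what the system can offer.

    ASSUMPTIONS: every worker needs one free connection slot, and the
    number of workers should never exceed the number of cores.
    """
    free_slots = max(max_connections - active_backends, 0)
    return max(min(configured_degree, cores, free_slots), 0)


# A degree of zero (the default) keeps parallel seq scan disabled.
disabled = effective_worker_count(0, cores=8, max_connections=100, active_backends=10)
# A request for 16 workers on an 8-core machine is capped at 8.
capped = effective_worker_count(16, cores=8, max_connections=100, active_backends=10)
```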

>
>> In ExecutorStart phase, initiate the required number of workers
>> as per parallel seq scan plan and setup dynamic shared memory and
>> share the information required for worker to execute the scan.
>> Currently I have just shared the relId, targetlist and number
>> of blocks to be scanned by worker, however I think we might want
>> to generate a plan for each of the workers in master backend and
>> then share the same to individual worker.
>
> [snip]
>>
>> Attached patch is just to facilitate the discussion about the
>> parallel seq scan and maybe some other dependent tasks like
>> sharing of various states like combocid, snapshot with parallel
>> workers.  It is by no means ready to do any complex test, of course
>> I will work towards making it more robust both in terms of adding
>> more stuff and doing performance optimizations.
>>
>> Thoughts/Suggestions?
>
>
> Not directly (I haven't had the time to read the code yet), but I'm thinking about the ability to simply *replace* executor methods from an extension.
> This could be an alternative to providing additional nodes that the planner can include in the final plan tree, ready to be executed.
>
> The parallel seq scan nodes are definitively the best approach for "parallel query", since the planner can optimize them based on cost.
> I'm wondering about the ability to modify the implementation of some methods themselves once at execution time: given a previously planned query, chances are that, at execution time (I'm specifically thinking about prepared statements here), a different implementation of the same "node" might be more suitable and could be used instead while the condition holds.
>

The idea sounds interesting, and I think that in some cases a
different implementation of the same node might help. However,
if at this stage we focus on one kind of implementation (one that
is a win for a reasonable number of cases) and make it successful,
then doing alternative implementations later will be comparatively
easier and will have a better chance of success.

> If this latter line of thinking is too off-topic within this thread and there is any interest, we can move the comments to another thread and I'd begin work on a PoC patch. It might as well make sense to implement the executor overloading mechanism alongside the custom plan API, though.
>

Sure, please go ahead in whichever way you would like to proceed.
If you want to contribute to this area/patch, you are welcome.

> Any comments appreciated.
>
>
> Thank you for your work, Amit

Many thanks to you as well for showing interest.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
