Re: Parallel Seq Scan - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Parallel Seq Scan
Date
Msg-id CAA4eK1++HD31vgA2-C5btiX+6bLZrtoTwmwzgk4Cp9A0YEgUJw@mail.gmail.com
In response to Re: Parallel Seq Scan  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
List pgsql-hackers
On Mon, Dec 22, 2014 at 7:34 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>
> On 12/21/14, 12:42 AM, Amit Kapila wrote:
>>
>> On Fri, Dec 19, 2014 at 6:21 PM, Stephen Frost <sfrost@snowman.net> wrote:
>> a. Instead of passing the value array, just pass the tuple id, but
>> retain the buffer pin till the master backend reads the tuple based
>> on the tuple id.  This has the side effect that we have to retain the
>> buffer pin for a longer period of time, but again that might not be a
>> problem in real-world usage of parallel query.
>>
>> b. Instead of passing the value array, pass the tuple directly, which
>> the master backend could propagate to the upper layer; or otherwise
>> change some code in the master backend such that it could propagate
>> the tuple array received via the shared memory queue directly to the
>> frontend.  Basically, save the one extra cycle of form/deform tuple.
>>
>> Both of these need a new message type and handling for it in the
>> Executor code.
>>
>> Having said that, I think we can try to optimize this in multiple
>> ways; however, we need additional mechanisms and changes in the
>> Executor code, which is error prone and doesn't seem important at
>> this stage, where we want the basic feature to work.
>
>
> Would b require some means of ensuring we didn't try and pass raw tuples to frontends?

That seems to be already handled: before sending a tuple to the
frontend, we already ensure that it is deformed (refer printtup() ->
slot_getallattrs()).
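
For illustration, a simplified sketch of that existing path (the
wrapper name send_slot_to_client is hypothetical; in reality this
happens inside printtup()):

static void
send_slot_to_client(TupleTableSlot *slot)
{
    int         natts = slot->tts_tupleDescriptor->natts;
    int         i;

    /* deform the tuple: fill tts_values[]/tts_isnull[] for all attrs */
    slot_getallattrs(slot);

    for (i = 0; i < natts; i++)
    {
        if (slot->tts_isnull[i])
            continue;       /* NULLs go out as a -1 length on the wire */
        /* each datum is then run through its output function and sent */
    }
}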


> Other than that potential wrinkle, it seems like less work than a.
>

Here, I am assuming that you are referring to the *pass the tuple
directly* approach.  We would also need to devise a new protocol
message and a mechanism to pass the tuple directly via shared memory
queues; also, I think we can currently send via shared memory queues
only the things we can send via the FE/BE protocol, and we don't send
tuples directly to the frontend.  Apart from that, I am not sure how
much benefit it would give, because it would save one step of tuple
communication, but the amount of data transferred would still be
almost the same.
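
Just to make the idea concrete, a rough sketch of what the worker-side
send could look like (worker_send_tuple and worker_mqh are placeholder
names, not part of the patch):

static void
worker_send_tuple(shm_mq_handle *worker_mqh, HeapTuple tuple)
{
    shm_mq_result res;

    /* ship the raw tuple bytes; the master re-forms a slot from them */
    res = shm_mq_send(worker_mqh, tuple->t_len, tuple->t_data, false);

    if (res == SHM_MQ_DETACHED)
        ereport(ERROR,
                (errmsg("master backend has detached from the tuple queue")));
}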

This is an area of improvement that needs more investigation, and even
without it we can get a benefit in many cases, as shown upthread.
After that, I think we can try to parallelize aggregation (Simon Riggs
and David Rowley have already worked out some infrastructure for the
same), which will surely give us good benefits.  So I suggest it's
better to focus on the remaining things that would get this patch into
a shape (in terms of robustness/stability) where it can be accepted,
rather than trying to optimize tuple communication, which we can do
later as well.

> ...
>
>> I think there are mainly two things which can lead to a benefit
>> from employing parallel workers:
>> a. Better use of available I/O bandwidth
>> b. Better use of available CPUs by doing expression evaluation
>> with multiple workers.
>
>
> ...
>
>> In the above tests, it seems to me that the maximum benefit due to
>> 'a' is realized up to 4~8 workers
>
>
> I'd think a good first estimate here would be to just use effective_io_concurrency.
>

One thing we should be cautious about with this parameter is that it
is currently mapped to the number of pages that need to be prefetched,
so using it to decide the degree of parallelism could be slightly
tricky; however, I will consider it while working on the cost model.
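
Just to sketch the kind of first estimate you are suggesting
(illustrative only, not from the patch; choose_parallel_degree and
requested_workers are made-up names):

static int
choose_parallel_degree(int requested_workers)
{
    /* cap the worker count by effective_io_concurrency, per the suggestion */
    int         degree = Min(requested_workers, effective_io_concurrency);

    return Max(degree, 1);      /* never go below the master backend alone */
}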

Thanks for your suggestions.


With Regards,
Amit Kapila.
