Re: using custom scan nodes to prototype parallel sequential scan - Mailing list pgsql-hackers

From Haribabu Kommi
Subject Re: using custom scan nodes to prototype parallel sequential scan
Date
Msg-id CAJrrPGca43oS9kKKaFsunbpf3QH-PHdxiqjgB0t5oJqUxCEttQ@mail.gmail.com
In response to Re: using custom scan nodes to prototype parallel sequential scan  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: using custom scan nodes to prototype parallel sequential scan  (Amit Kapila <amit.kapila16@gmail.com>)
Re: using custom scan nodes to prototype parallel sequential scan  (Kouhei Kaigai <kaigai@ak.jp.nec.com>)
List pgsql-hackers
On Tue, Nov 11, 2014 at 2:35 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Tue, Nov 11, 2014 at 5:30 AM, Haribabu Kommi <kommi.haribabu@gmail.com>
> wrote:
>>
>> On Tue, Nov 11, 2014 at 10:21 AM, Andres Freund <andres@2ndquadrant.com>
>> wrote:
>> > On 2014-11-10 10:57:16 -0500, Robert Haas wrote:
>> >> Does parallelism help at all?
>> >
>> > I'm pretty damn sure. We can't even make mildly powerful storage
>> > fully busy right now. Heck, I can't make my workstation's storage with a
>> > raid 10 out of four spinning disks fully busy.
>> >
>> > I think some of that benefit also could be reaped by being better at
>> > hinting the OS...
>>
>> Yes, it definitely helps, and not only for I/O-bound operations.
>> It gives a good gain for queries with CPU-intensive WHERE
>> conditions.
>>
>> One more point we may need to consider: is there any overhead in passing
>> the result rows from the workers to the backend?
>
> I am not sure that overhead will be very visible if we improve the
> use of the I/O subsystem by making parallel tasks work on it.

I feel there may be an overhead because the workers need to put the result
data into shared memory and the backend has to read it from there to process
it further. If the cost of transferring the data from a worker to the backend
is higher than the cost of fetching a tuple from the scan, that overhead
becomes visible when the selectivity is high.
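
As an illustration, the kind of transfer I have in mind could look roughly
like the sketch below, built on the shm_mq queues (this is only a sketch:
the queue handles are assumed to be already attached, and DSM setup, resource
handling and exact signatures are glossed over):

    /* worker side: copy one result tuple into the shared queue */
    static void
    worker_send_tuple(shm_mq_handle *mqh, HeapTuple tup)
    {
        shm_mq_result res;

        res = shm_mq_send(mqh, tup->t_len, tup->t_data, false);
        if (res != SHM_MQ_SUCCESS)
            ereport(ERROR,
                    (errmsg("could not send tuple to master backend")));
    }

    /* master side: read the next tuple out of the queue, if any */
    static bool
    master_receive_tuple(shm_mq_handle *mqh, HeapTupleData *htup)
    {
        Size            len;
        void           *data;
        shm_mq_result   res;

        res = shm_mq_receive(mqh, &len, &data, false);
        if (res != SHM_MQ_SUCCESS)
            return false;

        htup->t_len = len;
        htup->t_data = (HeapTupleHeader) data;  /* points into queue memory */
        return true;
    }

Every result tuple is copied into the queue by the worker and read back out
by the master before it can be processed further, which is the cost I am
referring to.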

> However,
> another idea here could be that instead of passing tuple data, we just
> pass the tuple id, but in that case we have to retain the pin on the buffer
> that contains the tuple until the master backend reads from it, which might
> have its own kind of problems.

Transferring only the tuple id doesn't help in scenarios where the node
needs to do any projection.
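
To make the projection problem concrete, a rough sketch of the tuple-id
approach could look like this (again only a sketch; rel and snapshot are
assumed to be available in the master, and the heap_fetch signature may
differ between versions):

    /*
     * worker side: ship only the item pointer of a qualifying tuple; the
     * buffer holding the tuple has to stay pinned until the master reads it.
     */
    static void
    worker_send_tid(shm_mq_handle *mqh, HeapTuple tup)
    {
        ItemPointerData tid = tup->t_self;

        (void) shm_mq_send(mqh, sizeof(tid), &tid, false);
    }

    /*
     * master side: re-fetch the tuple from the heap using the received TID;
     * only after that can the target list be evaluated.
     */
    static void
    master_fetch_and_project(Relation rel, Snapshot snapshot, ItemPointer tid)
    {
        HeapTupleData   tup;
        Buffer          buf;

        ItemPointerCopy(tid, &tup.t_self);
        if (heap_fetch(rel, snapshot, &tup, &buf, false, NULL))
        {
            /* projection still has to happen here, in the master only */
            ReleaseBuffer(buf);
        }
    }

The master still has to visit the heap page and evaluate the target list
itself, so whenever the node needs any projection that work stays in the
master.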

Regards,
Hari Babu
Fujitsu Australia


