Re: Parallel Seq Scan - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Parallel Seq Scan
Date
Msg-id CAA4eK1L42kdf2jMXBc7nCP3CHPUmzm50wv1F8MeC_wW5OgsG8A@mail.gmail.com
Whole thread Raw
In response to Re: Parallel Seq Scan  ("Joshua D. Drake" <jd@commandprompt.com>)
Responses Re: Parallel Seq Scan
List pgsql-hackers
On Sat, Jan 24, 2015 at 12:24 AM, Joshua D. Drake <jd@commandprompt.com> wrote:
>
>
> On 01/23/2015 10:44 AM, Jim Nasby wrote:
>>
>> number of workers especially at slightly higher worker count.
>>
>> Those fixed chunk numbers look pretty screwy. 2, 4 and 8 workers make no
>> difference, then suddenly 16 cuts times by 1/2 to 1/3? Then 32 cuts time
>> by another 1/2 to 1/3?
>

There is variation in tests at different worker count but there is
definitely improvement from 0 to 2 worker count (if you refer my
initial mail on this data, with 2 workers there is a benefit of ~20%)
and I think we run the tests in a similar way (like compare 0 and 2
or 0 or 4 or 0 and 8), then the other effects could be minimised and
we might see better consistency, however the general trend with
fixed-chunk seems to be that scanning that way is better.

I think the real benefit with the current approach/patch can be seen
with qualifications (especially costly expression evaluation).

Further, if we want to just get the benefit of parallel I/O, then
I think we can get that by parallelising partition scan where different
table partitions reside on different disk partitions, however that is
a matter of separate patch.

>
> cached? First couple of runs gets the relations into memory?
>

Not entirely, as the table size is double than RAM, so each run
has to perform I/O.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: WITH CHECK and Column-Level Privileges
Next
From: Pavel Stehule
Date:
Subject: Re: proposal: searching in array function - array_position