Re: Parallel Seq Scan - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Parallel Seq Scan
Date
Msg-id CAA4eK1+B=c6rNNTNFcap=QXeCaEeDijqdz6dwdrdcD-T58b7ig@mail.gmail.com
In response to Re: Parallel Seq Scan  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Parallel Seq Scan
Re: Parallel Seq Scan
List pgsql-hackers
On Thu, Jan 22, 2015 at 7:23 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Jan 22, 2015 at 5:57 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > 1. Scanning block-by-block has negative impact on performance and
> > I think it will degrade more if we increase parallel count as that can lead
> > to more randomness.
> >
> > 2. Scanning in fixed chunks improves the performance. Increasing
> > parallel count to a very large number might impact the performance,
> > but I think we can have a lower bound below which we will not allow
> > multiple processes to scan the relation.
>
> I'm confused.  Your actual test numbers seem to show that the
> performance with the block-by-block approach was slightly higher with
> parallelism than without, whereas the performance with the
> chunk-by-chunk approach was lower with parallelism than without, but
> the text quoted above, summarizing those numbers, says the opposite.
>
> Also, I think testing with 2 workers is probably not enough.  I think
> we should test with 8 or even 16.
>

Below is the data with a larger number of workers. The amount of data and
the other configuration settings are the same as before; I have only
increased the parallel worker count:

Block-By-Block

Time (ms) by number of workers:

         0       2       4       8       16      24      32
Run-1    257851  287353  350091  330193  284913  338001  295057
Run-2    263241  314083  342166  347337  378057  351916  348292
Run-3    315374  334208  389907  340327  328695  330048  330102
Run-4    301054  312790  314682  352835  323926  324042  302147
Run-5    304547  314171  349158  350191  350468  341219  281315


Fixed-Chunks

Time (ms) by number of workers:

         0       2       4       8       16      24      32
Run-1    250536  266279  251263  234347   87930   50474   35474
Run-2    249587  230628  225648  193340   83036   35140    9100
Run-3    234963  220671  230002  256183  105382   62493   27903
Run-4    239111  245448  224057  189196  123780   63794   24746
Run-5    239937  222820  219025  220478  114007   77965   39766
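
To make the trend across the five runs easier to read, the timings can be averaged per worker count. The sketch below is only an illustration and is not part of the original test setup. The numbers are transcribed from the two tables; the split of the last three Run-2 values for Fixed-Chunks (83036, 35140, 9100) is an assumption, since the archived text runs the digits together.

```python
from statistics import mean

WORKERS = [0, 2, 4, 8, 16, 24, 32]

# Times in ms: one row per run, columns in WORKERS order.
BLOCK_BY_BLOCK = [
    [257851, 287353, 350091, 330193, 284913, 338001, 295057],
    [263241, 314083, 342166, 347337, 378057, 351916, 348292],
    [315374, 334208, 389907, 340327, 328695, 330048, 330102],
    [301054, 312790, 314682, 352835, 323926, 324042, 302147],
    [304547, 314171, 349158, 350191, 350468, 341219, 281315],
]

FIXED_CHUNKS = [
    [250536, 266279, 251263, 234347, 87930, 50474, 35474],
    # Assumption: the last three values of this run are split as shown.
    [249587, 230628, 225648, 193340, 83036, 35140, 9100],
    [234963, 220671, 230002, 256183, 105382, 62493, 27903],
    [239111, 245448, 224057, 189196, 123780, 63794, 24746],
    [239937, 222820, 219025, 220478, 114007, 77965, 39766],
]


def mean_per_worker_count(rows):
    """Return {worker_count: mean time in ms} across all runs."""
    # zip(*rows) transposes runs-by-workers into workers-by-runs.
    return {w: mean(col) for w, col in zip(WORKERS, zip(*rows))}


if __name__ == "__main__":
    for name, rows in (("Block-By-Block", BLOCK_BY_BLOCK),
                       ("Fixed-Chunks", FIXED_CHUNKS)):
        print(name)
        for workers, avg in mean_per_worker_count(rows).items():
            print(f"  {workers:>2} workers: {avg:10.1f} ms")
```

A mean over five runs hides the run-to-run variation visible in the tables, so it should be read as a rough summary only.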


The trend remains the same, although there is some variation.
In the block-by-block approach, performance dips (execution takes
more time) as the number of workers increases, though it stabilizes at
a somewhat higher value. I still feel the timings are random, since this
approach leads to a random scan.
In the fixed-chunk approach, performance improves with more
workers, especially at somewhat higher worker counts.
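
The difference between the two approaches can be pictured with a small model of how blocks are handed to workers. This is a hypothetical sketch, not the patch's code: it assumes "block-by-block" behaves like workers taking the next block in turn (modelled here as round-robin) and "fixed chunks" means each worker gets one contiguous range. The function names are made up for illustration.

```python
def block_by_block(nblocks, nworkers):
    """Round-robin model: worker w reads blocks w, w + n, w + 2n, ..."""
    return [list(range(w, nblocks, nworkers)) for w in range(nworkers)]


def fixed_chunks(nblocks, nworkers):
    """Each worker gets one contiguous range; sizes differ by at most one."""
    base, extra = divmod(nblocks, nworkers)
    assignment = []
    start = 0
    for w in range(nworkers):
        size = base + (1 if w < extra else 0)
        assignment.append(list(range(start, start + size)))
        start += size
    return assignment


def max_stride(blocks):
    """Largest gap between consecutive blocks read by one worker."""
    return max((b - a for a, b in zip(blocks, blocks[1:])), default=0)


if __name__ == "__main__":
    for nworkers in (2, 8, 32):
        bbb = block_by_block(1024, nworkers)
        fc = fixed_chunks(1024, nworkers)
        print(nworkers,
              max(max_stride(b) for b in bbb),
              max(max_stride(b) for b in fc))
```

In this model each worker's stride under round-robin equals the worker count, while a fixed chunk is always read sequentially. That is one possible reading of why the block-by-block scan looks more random from each worker's point of view; the real access pattern also depends on timing and on OS read-ahead, which the model ignores.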


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
