
Re: Parallel Seq Scan - Mailing list pgsql-hackers

From	Amit Kapila
Subject	Re: Parallel Seq Scan
Date	January 23, 2015 14:42:59
Msg-id	CAA4eK1+B=c6rNNTNFcap=QXeCaEeDijqdz6dwdrdcD-T58b7ig@mail.gmail.com
In response to	Re: Parallel Seq Scan (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: Parallel Seq Scan Re: Parallel Seq Scan
List	pgsql-hackers


On Thu, Jan 22, 2015 at 7:23 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Jan 22, 2015 at 5:57 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > 1. Scanning block-by-block has negative impact on performance and
> > I think it will degrade more if we increase parallel count as that can lead
> > to more randomness.
> >
> > 2. Scanning in fixed chunks improves the performance. Increasing
> > parallel count to a very large number might impact the performance,
> > but I think we can have a lower bound below which we will not allow
> > multiple processes to scan the relation.
>
> I'm confused. Your actual test numbers seem to show that the
> performance with the block-by-block approach was slightly higher with
> parallelism than without, whereas the performance with the
> chunk-by-chunk approach was lower with parallelism than without, but
> the text quoted above, summarizing those numbers, says the opposite.
>
> Also, I think testing with 2 workers is probably not enough. I think
> we should test with 8 or even 16.
>

Below is the data with a larger number of workers. The amount of data and
other configuration remain the same as before; I have only increased the
parallel worker count:

Block-By-Block
No. of workers/Time (ms)	0	2	4	8	16	24	32
Run-1	257851	287353	350091	330193	284913	338001	295057
Run-2	263241	314083	342166	347337	378057	351916	348292
Run-3	315374	334208	389907	340327	328695	330048	330102
Run-4	301054	312790	314682	352835	323926	324042	302147
Run-5	304547	314171	349158	350191	350468	341219	281315

Fixed-Chunks
No. of workers/Time (ms)	0	2	4	8	16	24	32
Run-1	250536	266279	251263	234347	87930	50474	35474
Run-2	249587	230628	225648	193340	83036	35140	9100
Run-3	234963	220671	230002	256183	105382	62493	27903
Run-4	239111	245448	224057	189196	123780	63794	24746
Run-5	239937	222820	219025	220478	114007	77965	39766

The trend remains the same, although there is some variation.
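To make the trend easier to check, the following is a minimal Python sketch that averages the five runs for each worker count and compares each average with the 0-worker baseline. The numbers are copied from the two tables above; the names `mean_per_worker_count` and `speedup_vs_serial` are made up for this illustration and do not come from the patch or the thread.

```python
from statistics import mean

# Worker counts (table columns); each inner list below is one run (table row).
WORKERS = [0, 2, 4, 8, 16, 24, 32]

# Times in ms, copied from the Block-By-Block table.
BLOCK_BY_BLOCK = [
    [257851, 287353, 350091, 330193, 284913, 338001, 295057],
    [263241, 314083, 342166, 347337, 378057, 351916, 348292],
    [315374, 334208, 389907, 340327, 328695, 330048, 330102],
    [301054, 312790, 314682, 352835, 323926, 324042, 302147],
    [304547, 314171, 349158, 350191, 350468, 341219, 281315],
]

# Times in ms, copied from the Fixed-Chunks table.
FIXED_CHUNKS = [
    [250536, 266279, 251263, 234347, 87930, 50474, 35474],
    [249587, 230628, 225648, 193340, 83036, 35140, 9100],
    [234963, 220671, 230002, 256183, 105382, 62493, 27903],
    [239111, 245448, 224057, 189196, 123780, 63794, 24746],
    [239937, 222820, 219025, 220478, 114007, 77965, 39766],
]


def mean_per_worker_count(runs):
    """Average the runs column by column, giving one mean per worker count."""
    return [mean(column) for column in zip(*runs)]


def speedup_vs_serial(means):
    """Ratio of the 0-worker mean to each mean; values above 1 are faster."""
    return [means[0] / m for m in means]


if __name__ == "__main__":
    for name, runs in (("Block-By-Block", BLOCK_BY_BLOCK),
                       ("Fixed-Chunks", FIXED_CHUNKS)):
        means = mean_per_worker_count(runs)
        ratios = speedup_vs_serial(means)
        print(name)
        for workers, avg, ratio in zip(WORKERS, means, ratios):
            print(f"  workers={workers:2d}  mean={avg:10.1f} ms  speedup={ratio:5.2f}x")
```

On these numbers, every parallel block-by-block mean is slower than its 0-worker mean, while the fixed-chunk mean at 32 workers is roughly 8.9 times faster than the 0-worker mean. With only five runs per cell and visible run-to-run variation, the means are indicative rather than precise.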

In the block-by-block approach, performance dips (execution takes
more time) with more workers, though it stabilizes at some higher
value. Still, I feel it is random, as it leads to a random scan.

In the fixed-chunk approach, performance improves with more
workers, especially at higher worker counts (16 and above).
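The difference between the two strategies can be illustrated with a small Python sketch. It is an idealization under stated assumptions, not the patch's code: block-by-block is modeled as strict round-robin (in practice workers would likely grab the next block from a shared counter, so the interleaving would be less regular), and fixed chunks are modeled as one contiguous range per worker. The function names are hypothetical.

```python
def block_by_block(n_blocks, n_workers):
    """Idealized round-robin: worker w reads blocks w, w + n_workers, ...

    Assumption: workers take turns in a fixed order, which the message
    does not state.
    """
    return [list(range(w, n_blocks, n_workers)) for w in range(n_workers)]


def fixed_chunks(n_blocks, n_workers):
    """Each worker reads one contiguous range of blocks.

    Assumption: the relation is split into n_workers near-equal ranges.
    """
    chunk = -(-n_blocks // n_workers)  # ceiling division
    return [
        list(range(w * chunk, min((w + 1) * chunk, n_blocks)))
        for w in range(n_workers)
    ]


def max_stride(blocks):
    """Largest gap between consecutive blocks read by one worker."""
    return max((b - a for a, b in zip(blocks, blocks[1:])), default=0)


if __name__ == "__main__":
    for name, plan in (("block-by-block", block_by_block(12, 4)),
                       ("fixed chunks", fixed_chunks(12, 4))):
        print(name)
        for worker, blocks in enumerate(plan):
            print(f"  worker {worker}: {blocks}  max stride {max_stride(blocks)}")
```

In this model each block-by-block worker skips ahead by the worker count on every read, so its own access pattern becomes less sequential as workers are added, whereas each fixed-chunk worker always reads consecutive blocks. That is consistent with the concern about randomness raised above, though the sketch says nothing about how the operating system or storage actually handles the combined read stream.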

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
