Re: Parallel Seq Scan - Mailing list pgsql-hackers

From Gavin Flower
Subject Re: Parallel Seq Scan
Date
Msg-id 54947BE4.8080900@archidevsys.co.nz
In response to Re: Parallel Seq Scan  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers
On 20/12/14 03:54, Heikki Linnakangas wrote:
> On 12/19/2014 04:39 PM, Stephen Frost wrote:
>> * Marko Tiikkaja (marko@joh.to) wrote:
>>> On 12/19/14 3:27 PM, Stephen Frost wrote:
>>>> We'd have to coach our users to
>>>> constantly be tweaking the enable_parallel_query (or whatever) option
>>>> for the queries where it helps and turning it off for others.  I'm not
>>>> so excited about that.
>>>
>>> I'd be perfectly (that means 100%) happy if it just defaulted to
>>> off, but I could turn it up to 11 whenever I needed it.  I don't
>>> believe to be the only one with this opinion, either.
>>
>> Perhaps we should reconsider our general position on hints then and
>> add them so users can define the plan to be used..  For my part, I don't
>> see this as all that much different.
>>
>> Consider if we were just adding HashJoin support today as an example.
>> Would we be happy if we had to default to enable_hashjoin = off?  Or if
>> users had to do that regularly because our costing was horrid? It's bad
>> enough that we have to resort to those tweaks today in rare cases.
>
> This is somewhat different. Imagine that we achieve perfect 
> parallelization, so that when you set enable_parallel_query=8, every 
> query runs exactly 8x faster on an 8-core system, by using all eight 
> cores.
>
> Now, you might still want to turn parallelization off, or at least set 
> it to a lower setting, on an OLTP system. You might not want a single 
> query to hog all CPUs to run one query faster; you'd want to leave 
> some for other queries. In particular, if you run a mix of short 
> transactions, and some background-like tasks that run for minutes or 
> hours, you do not want to starve the short transactions by giving all 
> eight CPUs to the background task.
>
> Admittedly, this is a rather crude knob to tune for such things,
> but it's quite intuitive to a DBA: how many CPU cores is one query 
> allowed to utilize? And we don't really have anything better.
>
> In real life, there's always some overhead to parallelization, so that 
> even if you can make one query run faster by doing it, you might hurt 
> overall throughput. To some extent, it's a latency vs. throughput 
> tradeoff, and it's quite reasonable to have a GUC for that because 
> people have different priorities.
>
> - Heikki
>
>
>
How about 3 numbers:
   minCPUs          # > 0
   maxCPUs          # >= minCPUs
   fractionOfCPUs   # rounded up


If you just have the *number* of CPUs, then a setting that is
appropriate for a quad-core processor may be too *small* for an
octo-core processor.

If you just have the *fraction* of CPUs, then a setting that is
appropriate for a quad-core processor may be too *large* for an
octo-core processor.
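
As a rough sketch of how the three numbers might combine (the proposal itself does not spell this out, so the clamping rule below is an assumption, and the function name `cpus_for_query` and its `total_cpus` parameter are hypothetical): take the fraction of the machine's CPUs, round it up, then keep the result between minCPUs and maxCPUs.

```python
import math


def cpus_for_query(total_cpus, min_cpus, max_cpus, fraction_of_cpus):
    """Hypothetical helper: CPUs one query may use under the three settings."""
    if min_cpus <= 0:
        raise ValueError("min_cpus must be > 0")
    if max_cpus < min_cpus:
        raise ValueError("max_cpus must be >= min_cpus")
    # Share of the machine, rounded up as suggested in the proposal.
    share = math.ceil(fraction_of_cpus * total_cpus)
    # Assumed rule: clamp to [min_cpus, max_cpus], never above what exists.
    return min(max(share, min_cpus), max_cpus, total_cpus)


# Same settings on a quad-core and an octo-core machine.
for cores in (4, 8):
    print(cores, cpus_for_query(cores, min_cpus=1, max_cpus=6,
                                fraction_of_cpus=0.5))
```

With these example values the fraction scales the allowance with the size of the machine, while the two bounds keep it from collapsing to nothing on a small box or taking over a large one.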



Cheers,
Gavin


