Re: Parallel Seq Scan - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Parallel Seq Scan
Date
Msg-id 54943C2E.6010401@vmware.com
In response to Re: Parallel Seq Scan  (Stephen Frost <sfrost@snowman.net>)
Responses Re: Parallel Seq Scan  (Gavin Flower <GavinFlower@archidevsys.co.nz>)
Re: Parallel Seq Scan  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers
On 12/19/2014 04:39 PM, Stephen Frost wrote:
> * Marko Tiikkaja (marko@joh.to) wrote:
>> On 12/19/14 3:27 PM, Stephen Frost wrote:
>>> We'd have to coach our users to
>>> constantly be tweaking the enable_parallel_query (or whatever) option
>>> for the queries where it helps and turning it off for others.  I'm not
>>> so excited about that.
>>
>> I'd be perfectly (that means 100%) happy if it just defaulted to
>> off, but I could turn it up to 11 whenever I needed it.  I don't
>> believe I'm the only one with this opinion, either.
>
> Perhaps we should reconsider our general position on hints then and
> add them so users can define the plan to be used..  For my part, I don't
> see this as all that much different.
>
> Consider if we were just adding HashJoin support today as an example.
> Would we be happy if we had to default to enable_hashjoin = off?  Or if
> users had to do that regularly because our costing was horrid?  It's bad
> enough that we have to resort to those tweaks today in rare cases.

This is somewhat different. Imagine that we achieve perfect 
parallelization, so that when you set enable_parallel_query=8, every 
query runs exactly 8x faster on an 8-core system, by using all eight cores.

Now, you might still want to turn parallelization off, or at least set 
it to a lower setting, on an OLTP system. You might not want a single
query to hog all the CPUs just to finish faster; you'd want to leave some
for other queries. In particular, if you run a mix of short
transactions and some background-like tasks that run for minutes or
hours, you do not want to starve the short transactions by giving all 
eight CPUs to the background task.

Admittedly, this is a rather crude knob to tune for such things,
but it's quite intuitive to a DBA: how many CPU cores is one query 
allowed to utilize? And we don't really have anything better.
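
To make the core-budget idea concrete, here is a toy sketch in Python.
It is only an illustration: the function name, the parameters, and the
assumption that a background query grabs every core its limit allows
are all hypothetical, and nothing here reflects actual PostgreSQL
behaviour.

```python
def cores_left_for_short_transactions(total_cores: int,
                                      per_query_limit: int,
                                      background_queries: int = 1) -> int:
    """Toy model: cores not claimed by long-running background queries.

    Assumes (hypothetically) that each background query occupies exactly
    `per_query_limit` cores, capped by what the machine has.
    """
    if total_cores < 1 or per_query_limit < 0 or background_queries < 0:
        raise ValueError("core counts and query counts must be non-negative")
    claimed = min(total_cores, per_query_limit * background_queries)
    return total_cores - claimed


if __name__ == "__main__":
    # One background task on an 8-core box, under three different limits.
    for limit in (8, 4, 1):
        free = cores_left_for_short_transactions(8, limit)
        print(f"limit={limit}: {free} core(s) left for short transactions")
```

With the limit at 8 the background task leaves nothing for the short
transactions; lowering the limit trades background speed for headroom.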

In real life, there's always some overhead to parallelization, so that 
even if you can make one query run faster by parallelizing it, you might hurt 
overall throughput. To some extent, it's a latency vs. throughput 
tradeoff, and it's quite reasonable to have a GUC for that because 
people have different priorities.
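
The tradeoff can be sketched with a small back-of-the-envelope model in
Python. The model is an assumption made purely for illustration: each
extra worker is taken to add a fixed fraction of the serial cost as
coordination overhead, and the machine is taken to be fully saturated
with identical queries. The function names and the 5% figure are
hypothetical, not measurements.

```python
def query_cpu_seconds(serial_seconds: float, workers: int,
                      overhead_per_worker: float) -> float:
    """Total CPU time one query consumes when split across `workers` cores.

    Assumed model: every worker beyond the first adds
    `overhead_per_worker` * serial cost of extra work.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return serial_seconds * (1.0 + overhead_per_worker * (workers - 1))


def latency_seconds(serial_seconds: float, workers: int,
                    overhead_per_worker: float) -> float:
    """Wall-clock time of one query, assuming the work divides evenly."""
    return query_cpu_seconds(serial_seconds, workers,
                             overhead_per_worker) / workers


def throughput_qps(total_cores: int, serial_seconds: float, workers: int,
                   overhead_per_worker: float) -> float:
    """Queries per second when all cores are kept busy with such queries."""
    return total_cores / query_cpu_seconds(serial_seconds, workers,
                                           overhead_per_worker)


if __name__ == "__main__":
    # Hypothetical numbers: an 8-second query, 8 cores, 5% overhead per worker.
    for workers in (1, 2, 4, 8):
        lat = latency_seconds(8.0, workers, 0.05)
        qps = throughput_qps(8, 8.0, workers, 0.05)
        print(f"workers={workers}: latency={lat:.2f}s, throughput={qps:.3f} q/s")
```

Under these assumptions, raising the worker count shortens each query
but lowers the number of queries the machine completes per second,
which is the choice the setting would expose.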

- Heikki



