Re: Parallel Seq Scan - Mailing list pgsql-hackers

From	Heikki Linnakangas
Subject	Re: Parallel Seq Scan
Date	December 19, 2014 14:55:08
Msg-id	54943C2E.6010401@vmware.com
In response to	Re: Parallel Seq Scan (Stephen Frost <sfrost@snowman.net>)
List	pgsql-hackers

On 12/19/2014 04:39 PM, Stephen Frost wrote:
> * Marko Tiikkaja (marko@joh.to) wrote:
>> On 12/19/14 3:27 PM, Stephen Frost wrote:
>>> We'd have to coach our users to
>>> constantly be tweaking the enable_parallel_query (or whatever) option
>>> for the queries where it helps and turning it off for others.  I'm not
>>> so excited about that.
>>
>> I'd be perfectly (that means 100%) happy if it just defaulted to
>> off, but I could turn it up to 11 whenever I needed it.  I don't
>> believe to be the only one with this opinion, either.
>
> Perhaps we should reconsider our general position on hints then and
> add them so users can define the plan to be used..  For my part, I don't
> see this as all that much different.
>
> Consider if we were just adding HashJoin support today as an example.
> Would we be happy if we had to default to enable_hashjoin = off?  Or if
> users had to do that regularly because our costing was horrid?  It's bad
> enough that we have to resort to those tweaks today in rare cases.

This is somewhat different. Imagine that we achieve perfect 
parallelization, so that when you set enable_parallel_query=8, every 
query runs exactly 8x faster on an 8-core system, by using all eight cores.

Now, you might still want to turn parallelization off, or at least set
it to a lower setting, on an OLTP system. You might not want a single
query to hog all the CPUs just to finish faster; you'd want to leave
some for other queries. In particular, if you run a mix of short
transactions and some background-like tasks that run for minutes or
hours, you do not want to starve the short transactions by giving all
eight CPUs to the background task.

Admittedly, this is a rather crude knob to tune for such things,
but it's quite intuitive to a DBA: how many CPU cores is one query 
allowed to utilize? And we don't really have anything better.

In real life, there's always some overhead to parallelization, so that 
even if you can make one query run faster by doing it, you might hurt 
overall throughput. To some extent, it's a latency vs. throughput 
tradeoff, and it's quite reasonable to have a GUC for that because 
people have different priorities.
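
A toy model makes the tradeoff concrete. It is purely illustrative:
the 5% extra CPU cost per additional worker, the 8-second query, and
the function names are all assumptions, not measurements and not
anything that exists in PostgreSQL.

```python
def parallel_cost(work_s, workers, overhead=0.05):
    """Return (latency_s, cpu_seconds) for one query in a toy model.

    Assumption: each extra worker adds `overhead` (a fraction of the
    serial work) in coordination cost, and work splits evenly.
    """
    cpu_seconds = work_s * (1 + overhead * (workers - 1))
    latency_s = cpu_seconds / workers
    return latency_s, cpu_seconds


def max_throughput(cores, work_s, workers, overhead=0.05):
    """Queries per second when all cores are kept busy with such queries."""
    _, cpu_seconds = parallel_cost(work_s, workers, overhead)
    return cores / cpu_seconds


# Compare settings on a hypothetical 8-core machine.
for workers in (1, 2, 4, 8):
    latency, cpu = parallel_cost(8.0, workers)
    qps = max_throughput(8, 8.0, workers)
    print(f"workers={workers} latency={latency:.2f}s "
          f"cpu={cpu:.2f}s throughput={qps:.2f} q/s")
```

Under these assumed numbers, going from 1 to 8 workers cuts the
latency of a single query from 8 s to about 1.35 s, but the CPU time
each query consumes rises from 8 s to 10.8 s, so peak throughput on a
saturated machine falls from 1 to roughly 0.74 queries per second.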

- Heikki
