Re: Parallel Seq Scan - Mailing list pgsql-hackers
From | Gavin Flower |
---|---|
Subject | Re: Parallel Seq Scan |
Date | |
Msg-id | 54947BE4.8080900@archidevsys.co.nz |
In response to | Re: Parallel Seq Scan (Heikki Linnakangas <hlinnakangas@vmware.com>) |
List | pgsql-hackers |
On 20/12/14 03:54, Heikki Linnakangas wrote:
> On 12/19/2014 04:39 PM, Stephen Frost wrote:
>> * Marko Tiikkaja (marko@joh.to) wrote:
>>> On 12/19/14 3:27 PM, Stephen Frost wrote:
>>>> We'd have to coach our users to
>>>> constantly be tweaking the enable_parallel_query (or whatever) option
>>>> for the queries where it helps and turning it off for others. I'm not
>>>> so excited about that.
>>>
>>> I'd be perfectly (that means 100%) happy if it just defaulted to
>>> off, but I could turn it up to 11 whenever I needed it. I don't
>>> believe to be the only one with this opinion, either.
>>
>> Perhaps we should reconsider our general position on hints then and
>> add them so users can define the plan to be used.. For my part, I don't
>> see this as all that much different.
>>
>> Consider if we were just adding HashJoin support today as an example.
>> Would we be happy if we had to default to enable_hashjoin = off? Or if
>> users had to do that regularly because our costing was horrid? It's bad
>> enough that we have to resort to those tweaks today in rare cases.
>
> This is somewhat different. Imagine that we achieve perfect
> parallelization, so that when you set enable_parallel_query=8, every
> query runs exactly 8x faster on an 8-core system, by using all eight
> cores.
>
> Now, you might still want to turn parallelization off, or at least set
> it to a lower setting, on an OLTP system. You might not want a single
> query to hog all CPUs to run one query faster; you'd want to leave
> some for other queries. In particular, if you run a mix of short
> transactions, and some background-like tasks that run for minutes or
> hours, you do not want to starve the short transactions by giving all
> eight CPUs to the background task.
>
> Admittedly, this is a rather crude knob to tune for such things,
> but it's quite intuitive to a DBA: how many CPU cores is one query
> allowed to utilize? And we don't really have anything better.
>
> In real life, there's always some overhead to parallelization, so that
> even if you can make one query run faster by doing it, you might hurt
> overall throughput. To some extent, it's a latency vs. throughput
> tradeoff, and it's quite reasonable to have a GUC for that because
> people have different priorities.
>
> - Heikki
>
>

How about 3 numbers:

  minCPUs         # > 0
  maxCPUs         # >= minCPUs
  fractionOfCPUs  # rounded up

If you just have the *number* of CPUs, then a setting that is appropriate for a quad-core processor may be too *small* for an octo-core processor.

If you just have the *fraction* of CPUs, then a setting that is appropriate for a quad-core processor may be too *large* for an octo-core processor.

Cheers,
Gavin
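One way to read the three-number proposal is as a clamp: take the fraction of the machine's CPUs, round it up, and keep the result between minCPUs and maxCPUs. The sketch below is only an illustration of that reading. The function name `cpus_for_query`, its parameter names, and the assumption that the fraction lies in (0, 1] are hypothetical; nothing here corresponds to an existing PostgreSQL setting.

```python
import math


def cpus_for_query(total_cpus, min_cpus, max_cpus, fraction_of_cpus):
    """Return how many CPUs a single query may use (hypothetical helper).

    Assumed interpretation of the proposal: ceil(fraction * total CPUs),
    clamped to the range [min_cpus, max_cpus].
    """
    if min_cpus <= 0:
        raise ValueError("min_cpus must be > 0")
    if max_cpus < min_cpus:
        raise ValueError("max_cpus must be >= min_cpus")
    if not 0 < fraction_of_cpus <= 1:
        # Assumption: the fraction is a share of the machine, so (0, 1].
        raise ValueError("fraction_of_cpus must be in (0, 1]")

    # "rounded up": a fractional CPU count is rounded to the next integer.
    wanted = math.ceil(fraction_of_cpus * total_cpus)

    # Clamp the scaled value between the two absolute bounds.
    return max(min_cpus, min(max_cpus, wanted))


if __name__ == "__main__":
    # Same settings on machines of different sizes.
    for cores in (4, 8, 16):
        print(cores, "cores ->", cpus_for_query(cores, 2, 6, 0.5))
```

With the same three settings, the fraction scales the allowance as the core count grows, while the two bounds keep it from becoming too small on a small machine or too large on a big one.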