Re: Parallel Seq Scan - Mailing list pgsql-hackers
From: Stephen Frost
Subject: Re: Parallel Seq Scan
Msg-id: 20150110180336.GQ3062@tamriel.snowman.net
In response to: Re: Parallel Seq Scan (Amit Kapila <amit.kapila16@gmail.com>)
List: pgsql-hackers
* Amit Kapila (amit.kapila16@gmail.com) wrote:
> At this moment if we can ensure that parallel plan should not be selected
> for cases where it will perform poorly is more than enough considering
> we have lots of other work left to even make any parallel operation work.

The problem with this approach is that it doesn't consider any options
between 'serial' and 'parallelize by factor X'. If the startup cost is
1000 and the factor is 32, then a seqscan which costs 31000 won't ever
be parallelized, even though a factor of 8 would have parallelized it.
You could forget about the per-process startup cost entirely, in fact,
and simply say "only parallelize if it's more than X".

Again, I don't like the idea of designing this with the assumption that
the user dictates the right level of parallelization for each and every
query. I'd love to go out and tell users "set the factor to the number
of CPUs you have and we'll just use what makes sense."

The same goes for the max number of backends. If we set the parallel
level to the number of CPUs and set the max backends to the same, then
we end up with only one parallel query running at a time, ever. That's
terrible. Now, we could set the parallel level lower or set the max
backends higher, but either way we're going to end up either using less
than we could or over-subscribing, neither of which is good.

I agree that this makes it a bit different from work_mem, but in this
case there's an overall max in the form of the maximum number of
background workers. If we had something similar for work_mem, then we
could set that higher and still trust the system to only use the amount
of memory necessary (eg: a hashjoin doesn't use all available work_mem
and neither does a sort, unless the set is larger than available
memory).

> > > Execution:
> > > Most other databases does partition level scan for partition on
> > > different disks by each individual parallel worker.
> > > However, it seems amazon dynamodb [2] also works on something
> > > similar to what I have used in patch which means on fixed
> > > blocks. I think this kind of strategy seems better than dividing
> > > the blocks at runtime because dividing randomly the blocks
> > > among workers could lead to random scan for a parallel
> > > sequential scan.
> >
> > Yeah, we also need to consider the i/o side of this, which will
> > definitely be tricky. There are i/o systems out there which are
> > faster than a single CPU and ones where a single CPU can manage
> > multiple i/o channels. There are also cases where the i/o system
> > handles sequential access nearly as fast as random and cases where
> > sequential is much faster than random. Where we can get an idea of
> > that distinction is with seq_page_cost vs. random_page_cost as
> > folks running on SSDs tend to lower random_page_cost from the
> > default to indicate that.
>
> I am not clear, do you expect anything different in execution strategy
> than what I have mentioned or does that sound reasonable to you?

What I'd like is a way to figure out the right amount of CPU for each
tablespace (0.25, 1, 2, 4, etc) and then use that many. Using a single
CPU for each tablespace is likely to starve the CPU or starve the I/O
system and I'm not sure if there's a way to address that. Note that I
intentionally said tablespace there because that's how users can tell
us what the different i/o channels are.

I realize this ends up going beyond the current scope, but the parallel
seqscan at the per-relation level will only ever be using one i/o
channel. It'd be neat if we could work out how fast that i/o channel is
vs. the CPUs and determine how many CPUs are necessary to keep up with
the i/o channel and then use more-or-less exactly that many for the
scan.
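To make the startup-cost arithmetic earlier in this mail concrete, here is a tiny sketch of the alternative: instead of a single user-supplied factor, try each worker count and keep the cheapest. The cost formula and all numbers are purely illustrative (taken from the 1000/31000 example above), not PostgreSQL's actual costing code:

```python
def parallel_cost(scan_cost, startup_cost, nworkers):
    """Illustrative cost: serial pays no startup; a parallel plan pays
    startup once per worker and divides the scan among them."""
    if nworkers <= 1:
        return scan_cost
    return startup_cost * nworkers + scan_cost / nworkers

def best_degree(scan_cost, startup_cost, max_workers):
    """Pick the worker count (1 = serial) with the lowest estimate."""
    costs = {n: parallel_cost(scan_cost, startup_cost, n)
             for n in range(1, max_workers + 1)}
    return min(costs, key=costs.get)

# The example from the mail: startup cost 1000, seqscan cost 31000.
print(parallel_cost(31000, 1000, 32))  # 32968.75 -- worse than serial 31000,
                                       # so a fixed factor of 32 never fires
print(parallel_cost(31000, 1000, 8))   # 11875.0 -- factor 8 clearly wins
print(best_degree(31000, 1000, 32))    # 6 -- the cheapest intermediate degree
```

The point of the sketch is only that an optimizer searching over degrees finds a win here, while a fixed 'parallelize by 32 or not at all' policy leaves the query serial.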
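The tablespace-sizing idea above could be sketched roughly as follows, assuming we somehow knew each i/o channel's sequential throughput and a per-CPU tuple-processing rate. Both throughput figures and the function itself are hypothetical; no such API or measurement exists in PostgreSQL today:

```python
import math

def workers_for_channel(channel_mb_per_s, per_cpu_mb_per_s, max_workers):
    """CPUs needed so one i/o channel stays saturated (scan becomes
    i/o-bound rather than CPU-bound), capped by the worker pool."""
    needed = math.ceil(channel_mb_per_s / per_cpu_mb_per_s)
    return max(1, min(needed, max_workers))

# Hypothetical numbers: a fast SSD tablespace vs. a single spinning
# disk, assuming each CPU can chew through about 400 MB/s of tuples.
print(workers_for_channel(3200, 400, 16))  # SSD channel: 8 workers
print(workers_for_channel(150, 400, 16))   # spinning disk: 1 worker
```

This is the "how many CPUs does this i/o channel deserve" question stated as code: a fast channel earns several workers, a slow one earns a single worker no matter how large the pool is.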
I agree that some of this can come later, but I worry that starting out
with a design that expects to always be told exactly how many CPUs to
use when running a parallel query will be difficult to move away from
later.

Thanks,

Stephen