Re: Parallel Seq Scan vs kernel read ahead - Mailing list pgsql-hackers
From: David Rowley
Subject: Re: Parallel Seq Scan vs kernel read ahead
Msg-id: CAApHDvqi+z0+zgUuKFJDntPnTgb8X31w0PZJw-4M5b9MFE5woQ@mail.gmail.com
In response to: Re: Parallel Seq Scan vs kernel read ahead (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: Parallel Seq Scan vs kernel read ahead
List: pgsql-hackers
On Tue, 16 Jun 2020 at 03:29, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Sat, Jun 13, 2020 at 2:13 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > The performance can vary based on qualification where some workers
> > discard more rows as compared to others, with the current system with
> > step-size as one, the probability of unequal work among workers is
> > quite low as compared to larger step-sizes.
>
> It seems like this would require incredibly bad luck, though. If the
> step size is less than 1/1024 of the relation size, and we ramp down
> for, say, the last 5% of the relation, then the worst case is that
> chunk 972 of 1024 is super-slow compared to all the other blocks, so
> that it takes longer to process chunk 972 only than it does to process
> chunks 973-1024 combined. It is not impossible, but that chunk has to
> be like 50x worse than all the others, which doesn't seem like
> something that is going to happen often enough to be worth worrying
> about very much. I'm not saying it will never happen. I'm just
> skeptical about the benefit of adding a GUC or reloption for a corner
> case like this. I think people will fiddle with it when it isn't
> really needed, and won't realize it exists in the scenarios where it
> would have helped.

I'm trying to think of likely scenarios where "lots of work at the end" is going to be common. I can think of queue processing, but everything I can think of there requires an UPDATE to the processed flag, which won't be using parallel query anyway. There's also processing based on some period of time, like "the last hour" or "today". For append-only tables the latest information is likely to be at the end of the heap, but anyone running a Seq Scan on a large relation for that should likely have added an index. If a btree is too costly, then BRIN is pretty much perfect for that case.

FWIW, I'm not really keen on adding a reloption or a GUC. I've also voiced here that I'm not even keen on the ramp-up.
To summarise what's been proposed so far:

1. Use a constant (e.g. 64) as the parallel step size.
2. Ramp up the step size over time.
3. Ramp down the step size towards the end of the scan.
4. Auto-determine a good step size based on the size of the relation.
5. Add a GUC to allow users to control or override the step size.
6. Add a reloption to allow users to control or override the step size.

Here are my thoughts on each of those:

#1 is a bad idea, as there are legitimate use-cases for using parallel query on small tables, e.g. calling some expensive parallel-safe function. Small tables are also more likely to be cached.
#2 I don't quite understand why this is useful.
#3 I understand this is to try to make it so workers all complete at around the same time.
#4 We really should be doing it this way.
#5 Having a global knob to control something that is very specific to the size of a relation does not make much sense to me.
#6 I imagine someone will have some weird use-case that works better when parallel workers get 1 page at a time. I'm not convinced that they're not doing something else wrong.

So my vote is for 4, with possibly 3 if we can come up with something smart enough * that works well in parallel. I think there's less of a need for this if we divided the relation into more chunks, e.g. 8192 or 16384.

* Perhaps when there are fewer than 2 full chunks remaining, workers can just take half of what is left, or more specifically Max(pg_next_power2(remaining_blocks) / 2, 1), which ideally would work out to allocating an amount of pages proportional to the amount of beer each mathematician receives in the "an infinite number of mathematicians walk into a bar" joke, obviously with the exception that we stop dividing when we get to 1. However, I'm not quite sure how well that can be made to work with multiple bartenders working in parallel.

David