Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation) - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation) |
Date | |
Msg-id | CAA4eK1JawFkqkP8xn1aWHTDzQLkAnsDFxVAXmbKCnOW1u4MhSA@mail.gmail.com |
In response to | Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation) (Peter Geoghegan <pg@bowt.ie>) |
Responses | Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation) |
List | pgsql-hackers |
On Thu, Jan 18, 2018 at 8:52 AM, Peter Geoghegan <pg@bowt.ie> wrote:
> On Wed, Jan 17, 2018 at 10:40 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
>>> (It might make sense to allow this if parallel_leader_participation
>>> was *purely* a testing GUC, only for use by backend hackers, but
>>> AFAICT it isn't.)
>>
>> As applied to parallel CREATE INDEX, it pretty much is just a testing
>> GUC, which is why I was skeptical about leaving support for it in the
>> patch. There's no anticipated advantage to having the leader not
>> participate -- unlike for parallel queries, where it is quite possible
>> that setting parallel_leader_participation=off could be a win, even
>> generally. If you just have a Gather over a parallel sequential scan,
>> it is unlikely that parallel_leader_participation=off will help; it
>> will most likely hurt, at least up to the point where more
>> participants become a bad idea in general due to contention.
>
> It's unlikely to hurt much, since as you yourself said,
> compute_parallel_worker() doesn't consider the leader's participation.
> Actually, if we assume that compute_parallel_worker() is perfect, then
> surely parallel_leader_participation=off would beat
> parallel_leader_participation=on for CREATE INDEX -- it would allow us
> to use the value that compute_parallel_worker() truly intended. Which
> is the opposite of what you say about
> parallel_leader_participation=off above.
>
> I am only trying to understand your perspective here. I don't think
> that parallel_leader_participation support is that important. I think
> that parallel_leader_participation=off might be slightly useful as a
> way of discouraging parallel CREATE INDEX on smaller tables, just like
> it is for parallel sequential scan (though this hinges on specifically
> disallowing "degenerate parallel scan" cases). More often, it will
> make hardly any difference if parallel_leader_participation is on or
> off.
>
>> In other words, right now, parallel_leader_participation is not
>> strictly a testing GUC, but if we make CREATE INDEX respect it, then
>> we're pushing it towards being a GUC that you don't ever want to
>> enable except for testing. I'm still not sure that's a very good
>> idea, but if we're going to do it, then surely we should be
>> consistent.
>

I see your point. OTOH, I think we should have something for testing
purposes, as that helps in catching bugs and makes it easy to write
tests that cover the worker part of the code.

>
> I'm confused. I *don't* want it to be something that you can only use
> for testing. I want to not hurt whatever case there is for the
> parallel_leader_participation GUC being something that a DBA may tune
> in production. I don't see the conflict here.
>
>> It's true that having one worker and no parallel leader
>> participation can never be better than just having the leader do it,
>> but it is also true that having two leaders and no parallel leader
>> participation can never be better than having 1 worker with leader
>> participation. I don't see a reason to treat those cases differently.
>
> You must mean "having two workers and no parallel leader participation...".
>
> The reason to treat those two cases differently is simple: One
> couldn't possibly be desirable in production, and undermines the whole
> idea of parallel_leader_participation being user visible by adding a
> sharp edge. The other is likely to be pretty harmless, especially
> because leader participation is generally pretty fudged, and our cost
> model is fairly rough. The difference here isn't what is important;
> avoiding doing something that we know couldn't possibly help under any
> circumstances is important. I think that we should do that on general
> principle.
>
> As I said in a prior e-mail, even parallel query's use of
> parallel_leader_participation is consistent with what I propose here,
> practically speaking, because a partial path without leader
> participation will always lose to a serial sequential scan path in
> practice. The fact that the optimizer will create a partial path that
> makes a useless "degenerate parallel scan" a *theoretical* possibility
> is irrelevant, because the optimizer has its own way of making sure
> that such a plan doesn't actually get picked. It has its way, and so I
> must have my own.
>

Can you please elaborate on which part of the optimizer you are
referring to, where a partial path without leader participation will
always lose to a serial sequential scan path?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com