Re: Doubts about pushing LIMIT to MergeAppendPath - Mailing list pgsql-hackers
From | Antonin Houska |
---|---|
Subject | Re: Doubts about pushing LIMIT to MergeAppendPath |
Date | |
Msg-id | 31077.1541173740@localhost Whole thread Raw |
In response to | Re: Doubts about pushing LIMIT to MergeAppendPath (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
List | pgsql-hackers |
Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > On 11/02/2018 08:16 AM, Antonin Houska wrote: > > Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > > >> OK, so the reason is that when building child paths, we don't keep > >> the pathkeys unless it matches the "interesting" pathkeys. > >> > >> So for example we may have an IndexPath, but with pathkeys=NIL if > >> the index does not match the ORDER BY we need. > > > > I don't agree that IndexPath will necessarily have pathkeys set to > > NIL in such a case. Even if the index ordering does not match ORDER > > BY clause of the query, the path can still be useful, typically for > > merge join. > > > > But wouldn't that mean there's a MergeJoin below the Limit? I just wanted to say that pathkeys of MergeAppendPath do depend on those of child paths, however the MergeAppendPath should expect any pathkeys from the children. I mentioned MergeJoin just as an example reason for IndexPath to have pathkeys unrelated to the ORDER BY clause. > AFAIK we don't push limit_tuples to that node. I guess you mean that in a plan like this Limit -> MergeJoin -> MergeAppend the Limit node would not get propagated to MergeAppend and thus it would not prevent it from providing sufficient input for the additional sort? Anyway, the optimization we discuss here is only applied if the MergeAppend path is at the top of the join tree, so there cannot be any join above it. > After looking at this a bit more after a healthy dose of coffee, I think > for this issue to happen the Limit would have to be placed immediately > above the MergeAppend node. But if the ordering does not match, there > will be an explicit Sort node in between, and we'll push the limit only > to that node (and not below). Which is probably what's happening in the > incremental sort query, BTW. Yes, the Limit should be applied "from above", when we know that it's appropriate. (Currently it is applied by create_merge_append_path(), and at the moment we don't know what will hapen to the output.) But the problem of MergeAppend is that the limit can affect cost of the child nodes and the MergeAppend node itself. That's not clearly bad because if the MergeAppendPath won without the limit, the limit should not make the cost worse. However I think the only correct strategy is to apply the limit to all paths of given relation before set_cheapest() is called, because another path can benefit from the limit even more than the one to which we're trying to apply the limit. However at that early stage we don't have the complete plan tree so we don't know if the limit would ever get propagated down, and what its value would be ... > I certainly agree correctness must not depend on costing. But I don't > think that's really the case here - what you mentioned is merely one > part of the optimization, but there are other bits that make it work. As I mentioned upthread, I think the problem is currently hidden because the limit_tuples field of MergeAppendPath is only used for cost estimates, but executor is not aware of it. > At least that's my understanding - if you could construct a counter-example > demonstrating the failure, that'd be a clear proof of course. I spent some time on it today, but w/o success. However the incremental sort example I posted upthread should be sufficient reason to do something. I just don't know what. -- Antonin Houska Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26, A-2700 Wiener Neustadt Web: https://www.cybertec-postgresql.com
pgsql-hackers by date: