Re: Parameterized-path cost comparisons need some work - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Parameterized-path cost comparisons need some work |
Date | |
Msg-id | CA+TgmoZfOxNrnMJp8NgL08Y78-fh7TMCxDtCxDpoF6VqLFAXpg@mail.gmail.com Whole thread Raw |
In response to | Re: Parameterized-path cost comparisons need some work (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Parameterized-path cost comparisons need some work
Re: Parameterized-path cost comparisons need some work |
List | pgsql-hackers |
On Tue, Apr 17, 2012 at 12:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I've been hacking away on a patch to do this, and attached is something > that I think is pretty close to committable. It needs another going-over > and some new regression test cases, but it seems to work, and it fixes a > number of things besides the above-mentioned issue. In particular, this > has a much more principled approach than HEAD does to the problem of where > to place parameterizable join clauses in the plan tree; that can be seen > in the one change in the existing regression tests, where we no longer > generate a redundant upper-level copy of an OR join clause that the old > code wasn't bright enough to get rid of. > > The patch is a bit large because I chose to revise the data representation. > Instead of each Path having its own required_outer, rows, and > param_clauses fields, now a parameterized Path has a pointer to a > ParamPathInfo struct that it shares with other Paths for the same rel and > the same parameterization. This guarantees that such paths will have the > same estimated rowcount, because we only compute that once per > parameterization, which should save some work as well as making the world > safe for add_path_precheck. Seems reasonable. > The only place where this approach proved a bit tricky was in handling > AppendPaths and MergeAppendPaths, which didn't surprise me because that > was a rough spot for the old way too (and indeed they aren't handled > completely correctly in HEAD). A parameterized path is now *required* > to enforce all clauses that the join clause movement rules assign to it; > but Append and MergeAppend don't do qual checking, and I didn't feel like > changing that. The method that I have settled on is to require all child > paths of a parameterized append to have the exact same parameterization, > IOW we push the qual checks down below the append. Now the interesting > point about that is that we want to support Appends wherein some children > are seqscans and some are indexscans (consider a partitioned table where > the parent is a dummy empty table with no indexes). The "raw" situation > there is that we'll have a plain seqscan path for the parent and then a > collection of similarly-parameterized indexscan paths for the live > partition children. To make it possible to convert that case into a > parameterized append path, I added parameterization support to seqscans > and then wrote "reparameterize_path", which changes a Path to increase > its parameterization level (and thereby assign it more pushed-down join > clauses to check at runtime). That allows us to reconfigure the seqscan > to match the other children. I've also added such support to > SubqueryScan, on the grounds that the other common use of append paths is > UNION ALL across subqueries. We might later want to add parameterization > support to other path types, but this seemed like enough for the moment. OK. > BTW, after writing the code for it I decided to remove creation of > parameterized MergeAppendPaths from allpaths.c, though there is still some > support for them elsewhere. On reflection it seemed to me that the code > was willing to create far too many of these, much more than their > potential usefulness could justify (remember that parameterized paths must > be on the inside of a nestloop, so their sort ordering is typically of > marginal use). We can put that back if we can think of a more restrictive > heuristic for when to create them. I guess the case in which this would matter is if you wrote something like A LJ (B IJ C) where B and/or C has child tables and the best method of joining them to each other is a marge join. That doesn't seem all that likely; normally a hash join or nested loop will be better. On the flip side I can't see that generating a bunch of extra paths is going to hurt you much there either; they will fall away pretty quickly if they aren't useful for merging. Now, if you have something like A IJ B or A LJ B, where B is partitioned, it's clearly a waste of time to generate parameterized paths with pathkeys. > The core of the patch is in the new functions get_baserel_parampathinfo > and get_joinrel_parampathinfo, which look up or construct ParamPathInfos, > and join_clause_is_parameterizable_for and > join_clause_is_parameterizable_within, which control > movement of parameterizable join clauses. (I'm not that thrilled with the > names of the latter two functions, anybody got a better idea?) The rest > of it is pretty much boilerplate changes and replacing ad-hoc logic with > uses of this stuff. Parameterizable is such a mouthful. I wish we had a better word. > I have a couple of other ideas in mind in the way of mop-up, but they are > not in this patch to keep it from bloating even more. First, I'm thinking > we should get rid of RelOptInfo.baserestrictcost, thus forcing all scan > cost estimators to invoke cost_qual_eval explicitly. That field has been > vestigial from a planning-speed standpoint for a long time, ever since we > started caching eval costs in RestrictInfos. The most commonly used cost > estimators don't use it anymore as of this patch, and it'd likely be best > to have a uniform coding pattern there. Second, I've gotten dissatisfied > with the terminology "required_outer" that was used in the original param > plans patch. I'm considering a global search and replace with > "param_relids" or some variant of that. Personally, I find required_outer more clear. YMMV. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: