Re: Parameterized-path cost comparisons need some work - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Parameterized-path cost comparisons need some work
Date
Msg-id CA+TgmoZfOxNrnMJp8NgL08Y78-fh7TMCxDtCxDpoF6VqLFAXpg@mail.gmail.com
Whole thread Raw
In response to Re: Parameterized-path cost comparisons need some work  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Parameterized-path cost comparisons need some work  (Alvaro Herrera <alvherre@commandprompt.com>)
Re: Parameterized-path cost comparisons need some work  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Tue, Apr 17, 2012 at 12:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I've been hacking away on a patch to do this, and attached is something
> that I think is pretty close to committable.  It needs another going-over
> and some new regression test cases, but it seems to work, and it fixes a
> number of things besides the above-mentioned issue.  In particular, this
> has a much more principled approach than HEAD does to the problem of where
> to place parameterizable join clauses in the plan tree; that can be seen
> in the one change in the existing regression tests, where we no longer
> generate a redundant upper-level copy of an OR join clause that the old
> code wasn't bright enough to get rid of.
>
> The patch is a bit large because I chose to revise the data representation.
> Instead of each Path having its own required_outer, rows, and
> param_clauses fields, now a parameterized Path has a pointer to a
> ParamPathInfo struct that it shares with other Paths for the same rel and
> the same parameterization.  This guarantees that such paths will have the
> same estimated rowcount, because we only compute that once per
> parameterization, which should save some work as well as making the world
> safe for add_path_precheck.

Seems reasonable.

> The only place where this approach proved a bit tricky was in handling
> AppendPaths and MergeAppendPaths, which didn't surprise me because that
> was a rough spot for the old way too (and indeed they aren't handled
> completely correctly in HEAD).  A parameterized path is now *required*
> to enforce all clauses that the join clause movement rules assign to it;
> but Append and MergeAppend don't do qual checking, and I didn't feel like
> changing that.  The method that I have settled on is to require all child
> paths of a parameterized append to have the exact same parameterization,
> IOW we push the qual checks down below the append.  Now the interesting
> point about that is that we want to support Appends wherein some children
> are seqscans and some are indexscans (consider a partitioned table where
> the parent is a dummy empty table with no indexes).  The "raw" situation
> there is that we'll have a plain seqscan path for the parent and then a
> collection of similarly-parameterized indexscan paths for the live
> partition children.  To make it possible to convert that case into a
> parameterized append path, I added parameterization support to seqscans
> and then wrote "reparameterize_path", which changes a Path to increase
> its parameterization level (and thereby assign it more pushed-down join
> clauses to check at runtime).  That allows us to reconfigure the seqscan
> to match the other children.  I've also added such support to
> SubqueryScan, on the grounds that the other common use of append paths is
> UNION ALL across subqueries.  We might later want to add parameterization
> support to other path types, but this seemed like enough for the moment.

OK.

> BTW, after writing the code for it I decided to remove creation of
> parameterized MergeAppendPaths from allpaths.c, though there is still some
> support for them elsewhere.  On reflection it seemed to me that the code
> was willing to create far too many of these, much more than their
> potential usefulness could justify (remember that parameterized paths must
> be on the inside of a nestloop, so their sort ordering is typically of
> marginal use).  We can put that back if we can think of a more restrictive
> heuristic for when to create them.

I guess the case in which this would matter is if you wrote something
like A LJ (B IJ C) where B and/or C has child tables and the best
method of joining them to each other is a marge join.  That doesn't
seem all that likely; normally a hash join or nested loop will be
better.  On the flip side I can't see that generating a bunch of extra
paths is going to hurt you much there either; they will fall away
pretty quickly if they aren't useful for merging.  Now, if you have
something like A IJ B or A LJ B, where B is partitioned, it's clearly
a waste of time to generate parameterized paths with pathkeys.

> The core of the patch is in the new functions get_baserel_parampathinfo
> and get_joinrel_parampathinfo, which look up or construct ParamPathInfos,
> and join_clause_is_parameterizable_for and
> join_clause_is_parameterizable_within, which control
> movement of parameterizable join clauses.  (I'm not that thrilled with the
> names of the latter two functions, anybody got a better idea?)  The rest
> of it is pretty much boilerplate changes and replacing ad-hoc logic with
> uses of this stuff.

Parameterizable is such a mouthful.  I wish we had a better word.

> I have a couple of other ideas in mind in the way of mop-up, but they are
> not in this patch to keep it from bloating even more.  First, I'm thinking
> we should get rid of RelOptInfo.baserestrictcost, thus forcing all scan
> cost estimators to invoke cost_qual_eval explicitly.  That field has been
> vestigial from a planning-speed standpoint for a long time, ever since we
> started caching eval costs in RestrictInfos.  The most commonly used cost
> estimators don't use it anymore as of this patch, and it'd likely be best
> to have a uniform coding pattern there.  Second, I've gotten dissatisfied
> with the terminology "required_outer" that was used in the original param
> plans patch.  I'm considering a global search and replace with
> "param_relids" or some variant of that.

Personally, I find required_outer more clear.  YMMV.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Josh Berkus
Date:
Subject: Re: Gsoc2012 idea, tablesample
Next
From: Alvaro Herrera
Date:
Subject: Re: libpq URI and regression testing