Re: Parameterized-path cost comparisons need some work - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Parameterized-path cost comparisons need some work
Date
Msg-id CA+TgmoY1GxgZB-Dk3YjA6bMP4r=npW-HbWVrbmMw4uDK302yRg@mail.gmail.com
Whole thread Raw
In response to Re: Parameterized-path cost comparisons need some work  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Parameterized-path cost comparisons need some work  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Thu, Apr 12, 2012 at 3:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 1. Lobotomize add_path_precheck so it always returns true for a
> parameterized path.  This sounds horrid, but in the test cases I'm using
> it seems that this only results in doing the full path construction for
> a very small number of additional paths.

Query planner engineering is hard, because it's hard to predict what
kind of queries people will write, but this seems basically sane to
me.  Given that (if I recall our previous discuss on this point
correctly) we avoid generating parameterized paths in situations where
we could have simply revised the join order instead, we should only
really be getting any parameterized paths at all in situations where
they are likely to help.  Queries involving only inner joins, for
example, should never need a parameterized path covering more than a
single baserel; and even if you've got outer joins in the mix, most of
my queries tend to look like A IJ B IJ C IJ D LJ E LJ F LJ G, rather
than A IJ (B LJ C) which is where we actually need this machinery.  If
we spend a little more effort in that case it's quite likely to be
worth it; the trick is just to keep the extra work from bleeding into
the cases where it won't help.

> 3. Rearrange plan generation so that a parameterized path always uses
> all join clauses available from the specified outer rels.  (Any that
> don't work as indexquals would have to be applied as filter conditions.)
> If we did that, then we would be back to a situation where all paths
> with the same parameterization should yield the same rowcount, thus
> justifying letting add_path_precheck work as it does now.
>
> #3 would amount to pushing quals that would otherwise be checked at the
> nestloop join node down to the lowest inner-relation level where they
> could be checked.  This is something I'd suspected would be a good idea
> to start with, but hadn't gotten around to implementing for non-index
> quals.  It had not occurred to me that it might simplify cost estimation
> to always do that.

This seems like it could be quite a significant win.  It doesn't
really matter in <= 9.1 because in an old-style parameterized nestloop
the join filter is going to get applied immediately after the index
filter anyway, though I guess it's possible that you might save a
little bit by optimizing the order in which multiple filter conditions
are applied.  But if there can be intermediate joins in there then
it's a big deal; and the fact that it makes it easier to compare paths
and prune away bad ones earlier seems like a major benefit as well.
So I would be in favor of doing this if possible, but...

> I'm going to take a closer look at #3, but it may not be practical to
> try to squeeze it into 9.2; if not, I think #1 will do as a stopgap.

....I agree with this, too.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Memory usage during sorting
Next
From: Tom Lane
Date:
Subject: Improving our clauseless-join heuristics