Re: Join Filter vs. Index Cond (performance regression 9.1->9.2+/HEAD) - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Join Filter vs. Index Cond (performance regression 9.1->9.2+/HEAD)
Date
Msg-id CA+TgmoZtjpCKZJx+fKU+armgg3AjmYM3R1ky-C91xv7=CWigLw@mail.gmail.com
Whole thread Raw
In response to Re: Join Filter vs. Index Cond (performance regression 9.1->9.2+/HEAD)  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Sun, May 31, 2015 at 12:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
>> Obviously it makes little sense to use an (a,b,c) index to look up just
>> (a,b) and then filter on c; the question is, what is the planner doing
>> that leads it to get this so wrong?
>
> It's not so astonishing as all that; compare
>
> regression=# explain select * from t1 where a=3 and b=4;
>                                QUERY PLAN
> ------------------------------------------------------------------------
>  Index Only Scan using t1_pkey on t1  (cost=0.28..8.29 rows=1 width=12)
>    Index Cond: ((a = 3) AND (b = 4))
> (2 rows)
>
> regression=# explain select * from t1 where a=3 and b=4 and c=5;
>                                QUERY PLAN
> ------------------------------------------------------------------------
>  Index Only Scan using t1_pkey on t1  (cost=0.28..8.30 rows=1 width=12)
>    Index Cond: ((a = 3) AND (b = 4) AND (c = 5))
> (2 rows)
>
> Once you're down to an estimate of one row retrieved, adding additional
> index conditions simply increases the cost (not by much, but it increases)
> without delivering any visible benefit.

But Andrew's example is equivalent to planning the second query by
putting the quals on a and b into the index qual and treating c=5 as a
post-filter condition even though the index we're using is on (a, b,
c), which I don't believe we'd ever do.  If we did, I'd find that
astonishing, too.

> I believe what probably happened in this case is that the planner
> considered both forms of the indexscan path and concluded that they were
> fuzzily the same cost and rowcount, yet the path using only t2.a and t3.b
> clearly dominated by requiring strictly fewer outer relations for
> parameters.  So it threw away the path that also had the c = t4.c
> comparison before it ever got to the join stage.  Even had it kept that
> path, the join cost estimate wouldn't have looked any better than the one
> for the join it did pick, so there would have been no certainty of picking
> the "correct" plan.

It's just hard to believe that it's ever better to treat something as
a join filter than as an index condition.  Yes, checking the index
condition isn't free, either. But it doesn't seem like it should be
particularly more expensive than checking the same thing as a join
filter.  And if there a lot of rows involved, it's going to be a whole
lot LESS expensive.

I guess it's hard for me to credit the idea that a parameterized index
path constraining a superset of the columns present in some other
parameterized path on the same index has insufficient additional
selectivity to justify its existence.  Even in a world where planner
estimates are never wrong, I doubt treating the qual as a join filter
is ever meaningfully better.  The absolute best case - or so it seems
to me - is a tie.  If we underestimate the row counts, it's a loss.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: Multixid hindsight design
Next
From: Josh Berkus
Date:
Subject: Re: pg_xlog -> pg_xjournal?