Partial-index predicate proofs got dumber in 9.2 - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Partial-index predicate proofs got dumber in 9.2 |
Date | |
Msg-id | 22779.1353014615@sss.pgh.pa.us Whole thread Raw |
List | pgsql-hackers |
I looked into the optimization regression reported here: http://archives.postgresql.org/pgsql-performance/2012-11/msg00140.php It's easy to reproduce the problem in the regression database: create index ti on tenk1 (fivethous) where fivethous is not null; explain select * from int4_tbl, tenk1 where f1 = fivethous; 9.1 and earlier will produce a nestloop-with-inner-indexscan, but 9.2 fails to (and ends up with a much-more-expensive hash join) because it doesn't think the partial index "ti" has been proven usable for the query. However, because the "=" operator is strict, no row in which fivethous is null could be relevant to the query result, so we should be able to use the partial index. The older code succeeds at this because when it is considering whether it can use an index for an inner indexscan, it will apply all the join clauses found by find_clauses_for_join() to the task of proving the index predicate true. (It's notable that this includes all join clauses mentioning the target relation, not only those that can be used with the index's columns; so we'll be able to prove things true even if the index predicate involves table columns that aren't in the index proper.) In 9.2, which has been rewritten to generate inner indexscans "bottom up" as parameterized paths, we simply skip generating parameterized paths for any index that's not marked predOK, and the predOK test is made using only the restriction clauses for that relation, not join clauses. Ooops. We could try to replicate the way the previous code did it, but it'd probably add significant expense. What I'm thinking at the moment is that check_partial_indexes() has been missing a bet all along, because there is no reason for it to confine its attention to restriction clauses. We should just have it include join clauses for the rel in the restriction list passed to predicate_implied_by(). That is, a join clause can be assumed true for the purposes of deciding whether an index is usable *whether or not we ever actually use that join clause with the index*. This claim is shaky in the presence of outer joins, though. Consider select ... from t1 left join t2 on (t1.a = t2.b) We can use a partial index on t2 that requires t2.b IS NOT NULL, since no row where b is null will affect the result. A more interesting case is select ... from t1 left join t2 on (t1.a = t2.b)where t2.c > 0; This is actually going to get simplified to a plain inner join, but even if it did not, we needn't fetch t2 rows where c is null or negative. That might result in generating some null-extended join rows that shouldn't be there, if particular values of t1.a no longer have matches, but the outer WHERE clause would eliminate such bogus rows anyway. However, that last conclusion depends on the outer WHERE clause being strict, which doesn't work for instance with select ... from t1 left join t2 on (t1.a = t2.b)where t2.c is null; We could not use an index having the predicate "c is null" to scan t2, else we might eliminate some rows that should produce matches to t1, allowing bogus null-extended rows to be produced (which would survive the outer WHERE). So I'm thinking that the correct heuristic for check_partial_indexes() is to use restriction clauses plus any join clauses that satisfy join_clause_is_movable_to(). That should prevent unsafe use of upper-level join clauses when there are outer joins below them. Thoughts? regards, tom lane
pgsql-hackers by date: