Re: Query optimization problem - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Query optimization problem
Date
Msg-id 13738.1280327696@sss.pgh.pa.us
Whole thread Raw
In response to Re: Query optimization problem  (Yeb Havinga <yebhavinga@gmail.com>)
List pgsql-hackers
Yeb Havinga <yebhavinga@gmail.com> writes:
> Robert Haas wrote:
>> On Wed, Jul 28, 2010 at 7:24 AM, Yeb Havinga <yebhavinga@gmail.com> wrote:
>>> Wouldn't it be relatively easy, to rewrite the filter expression by adding
>>> expressions, instead of replacing constants, in the disjunctive case, so the
>>> example at hand would become:
>>> 
>>> WHERE (d1.ID=234409763) or (d2.ID=234409763)
>>> AND (d2.BasedOnID=234409763) or (d2.ID=234409763)

>> Yeah, that could be done, but it's not necessarily a win from a
>> performance standpoint.

> Not necessarily a win, but on the test case no significant increase in 
> planning time.

The problem is that it could cost you a lot in execution time, because
of the useless extra filter conditions that will be applied.  The
planner isn't going to notice that those conditions are redundant.
An even worse problem is that because it doesn't know that, it's going
to underestimate the combined selectivity of the two WHERE conditions,
resulting in drastic underestimates of the numbers of rows emitted,
possibly resulting in horribly bad plan choices that kill whatever
performance improvement you got at the bottom level.

What the EquivalenceClass machinery actually buys us is the ability to
deal with a set of partially-redundant possible filter conditions and
apply only enough of them to get a correct plan.  As an example, if the
query has A=B and B=C, we could deduce A=C, but we don't want to apply
all three equality conditions at runtime.  Instead we put all three
variables into an EC, and then there is logic to determine which of the
equality clauses implied by the EC should actually get applied where.
This avoids both the useless-checks-at-runtime problem and the problem
of wrong selectivity estimates.

To do something like this without generating stupid plans, we'd need
some sort of generalized EC mechanism that could figure out which
variants of the clause made the most sense in different contexts.
Or maybe something else entirely --- but just generating a lot of
variants of a clause and throwing them all into the existing mechanism
is not workable.
        regards, tom lane


pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: ALTER TABLE SET STATISTICS requires AccessExclusiveLock
Next
From: "Joshua D. Drake"
Date:
Subject: Re: Toward a column reorder solution