Re: Weird, bad 0.5% selectivity estimate for a column equal to itself - Mailing list pgsql-performance

From Tom Lane
Subject Re: Weird, bad 0.5% selectivity estimate for a column equal to itself
Date
Msg-id 24522.1371932688@sss.pgh.pa.us
Whole thread Raw
In response to Re: Weird, bad 0.5% selectivity estimate for a column equal to itself  (Josh Berkus <josh@agliodbs.com>)
List pgsql-performance
Josh Berkus <josh@agliodbs.com> writes:
> On 06/21/2013 02:32 PM, Tom Lane wrote:
>> See DEFAULT_EQ_SEL.

> Why is it using that?  We have statistics on the column.  What reason
> would it have for using a default estimate?

The stats are generally consulted for "Var Op Constant" scenarios.
It doesn't know what to do with "Var Op Var" cases that aren't joins.
As long as we lack cross-column-correlation stats I doubt it'd be very
helpful to try to derive a stats-based number for such cases.  Of
course, "X = X" is a special case, but ...

>> But why exactly do you care?  Surely it's a stupid
>> query and you should fix it.

> (b) that query is also auto-generated by external software, so "just fix
> it" isn't as easy as it sounds.

Personally, I'll bet lunch that that external software is outright
broken, ie it probably thinks "X = X" is constant true and they found
they could save two lines of code and a few machine cycles by emitting
that rather than not emitting anything.  Of course, the amount of
parsing/planning time wasted in dealing with the useless-and-incorrect
clause exceeds what was saved by multiple orders of magnitude, but hey
it was easy.

It wouldn't take too much new code to get the planner to replace "X = X"
with "X IS NOT NULL", but I think we're probably fixing the wrong piece
of software if we do.

            regards, tom lane


pgsql-performance by date:

Previous
From: Josh Berkus
Date:
Subject: Re: Weird, bad 0.5% selectivity estimate for a column equal to itself
Next
From: Tom Kincaid
Date:
Subject: Re: PHP Postgres query slower then PgAdmin