Re: Row count estimation bug in BETWEEN? - Mailing list pgsql-general

From Tom Lane
Subject Re: Row count estimation bug in BETWEEN?
Date
Msg-id 10092.1434298145@sss.pgh.pa.us
Whole thread Raw
In response to Re: Row count estimation bug in BETWEEN?  (Yaroslav <ladayaroslav@yandex.ru>)
Responses Re: Row count estimation bug in BETWEEN?  (Yaroslav <ladayaroslav@yandex.ru>)
List pgsql-general
Yaroslav <ladayaroslav@yandex.ru> writes:
> Tom Lane-2 wrote
>> PG doesn't try to estimate inequalities exactly, because it usually
>> doesn't make enough of a difference to matter.  Currently we don't
>> even bother to distinguish say ">" from ">=" for estimation purposes,
>> though certainly we would need to in order to deal with zero-width ranges
>> with any great amount of precision.

> Thank you for your answer!

> I'm sorry, but after looking into documentation and sources
> (scalarineqsel function in selfuncs.c, clauselist_selectivity and
> addRangeClause functions in clausesel.c) and experimenting a little I've
> got an impression that PostgreSQL actually bothers to distinguish ">"
> from ">=" for estimation purposes sometimes (probably, when MCV is
> used), but in my example it uses histogram and indeed doesn't
> distinguish them.

Well, I was oversimplifying a bit.  When testing the MCV list we use the
original operator, so that if the comparison constant is equal to some
MCV entry, it will indeed matter whether you said ">" or ">=".  When
dealing with the histogram, however, we don't pay attention to the
difference.  The assumption is that the histogram represents a
continuous distribution of values in which no one value occurs often
enough to be interesting (if it did, it would be in the MCV list...).
Therefore it does not matter much whether any specific histogram entry
is exactly "=".  And of course, for comparison values that are between
histogram entries, we have no idea whatsoever whether there are any
"=" entries in the table; so even if the code did distinguish ">" from
">=", it would be unclear what to do with the knowledge.

            regards, tom lane


pgsql-general by date:

Previous
From: Yaroslav
Date:
Subject: Re: Row count estimation bug in BETWEEN?
Next
From: Anton Bushmelev
Date:
Subject: pg_last_xact_replay_timestamp lies