pgsql: Improve ineq_histogram_selectivity's behavior for non-default or - Mailing list pgsql-committers

From Tom Lane
Subject pgsql: Improve ineq_histogram_selectivity's behavior for non-default or
Msg-id E1jhJNn-0002ki-2x@gemulon.postgresql.org
List pgsql-committers
Improve ineq_histogram_selectivity's behavior for non-default orderings.

ineq_histogram_selectivity() can be invoked in situations where the
ordering we care about is not that of the column's histogram.  We could
be considering some other collation, or even more drastically, the
query operator might not agree at all with what was used to construct
the histogram.  (We'll get here for anything using scalarineqsel-based
estimators, so that's quite likely to happen for extension operators.)

Up to now we just ignored this issue and assumed we were dealing with
an operator/collation whose sort order exactly matches the histogram,
possibly resulting in junk estimates if the binary search gets confused.
It's past time to improve that, since the use of nondefault collations
is increasing.  What we can do is verify that the given operator and
collation match what's recorded in pg_statistic, and use the existing
code only if so.  When they don't match, instead execute the operator
against each histogram entry, and take the fraction of successes as our
selectivity estimate.  This gives an estimate that is probably good to
about 1/histogram_size, with no assumptions about ordering.  (The quality
of the estimate is likely to degrade near the ends of the value range,
since the two orderings probably don't agree on what is an extremal value;
but this is surely going to be more reliable than what we did before.)
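
The fallback described above can be sketched in standalone C as follows.
This is only an illustration under stated assumptions, not the code in
selfuncs.c: the names brute_force_histogram_fraction, ineq_op, ci_compare
and ci_less_than are hypothetical, plain C strings stand in for Datums,
and an ASCII case-insensitive comparison stands in for a non-default
collation.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical predicate type: true if "value OP constant" holds. */
typedef bool (*ineq_op)(const char *value, const char *constant);

/*
 * Apply the operator to every histogram entry and return the fraction of
 * entries for which it is true.  No assumption is made about the order in
 * which the entries are stored, so no binary search is involved.
 */
double
brute_force_histogram_fraction(const char *const *hist, size_t nhist,
                               ineq_op op, const char *constant)
{
    size_t matches = 0;

    assert(hist != NULL || nhist == 0);

    if (nhist == 0)
        return 0.0;     /* nothing to test; a real estimator would fall
                         * back to a default selectivity here */

    for (size_t i = 0; i < nhist; i++)
    {
        if (op(hist[i], constant))
            matches++;
    }
    return (double) matches / (double) nhist;
}

/* Stand-in for a non-default collation: ASCII case-insensitive compare. */
int
ci_compare(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0')
    {
        int ca = tolower((unsigned char) *a);
        int cb = tolower((unsigned char) *b);

        if (ca != cb)
            return ca - cb;
        a++;
        b++;
    }
    return tolower((unsigned char) *a) - tolower((unsigned char) *b);
}

/* Example operator: "<" under the case-insensitive ordering. */
bool
ci_less_than(const char *value, const char *constant)
{
    return ci_compare(value, constant) < 0;
}
```

For example, take the five entries "Apple", "Mango", "Zebra", "banana",
"cherry", which are sorted in byte order.  Asking for values less than
"carrot" under the case-insensitive operator matches "Apple" and "banana",
giving an estimate of 2/5.  A binary search that trusted the stored order
would be comparing against boundaries that are not sorted for this
operator.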

At some point we might further improve matters by storing more than one
histogram calculated according to different orderings.  But this code
would still be good fallback logic when no matches exist, so that
possibility is no reason to hold off on this change.

While here, also improve get_variable_range() to deal more honestly
with non-default collations.

This isn't back-patchable, because it requires adding another argument
to ineq_histogram_selectivity, and because it might have significant
impact on the estimation results for extension operators relying on
scalarineqsel --- mostly for the better, one hopes, but in any case
destabilizing plan choices in back branches is best avoided.

Per investigation of a report from James Lucas.

Discussion: https://postgr.es/m/CAAFmbbOvfi=wMM=3qRsPunBSLb8BFREno2oOzSBS=mzfLPKABw@mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/0c882e52a8660114234a0c4a29db919bb727e552

Modified Files
--------------
src/backend/utils/adt/like_support.c     |   4 +-
src/backend/utils/adt/selfuncs.c         | 206 ++++++++++++++++++++++---------
src/backend/utils/cache/lsyscache.c      |  62 ++++++++--
src/include/utils/lsyscache.h            |   1 +
src/include/utils/selfuncs.h             |   3 +-
src/test/regress/expected/privileges.out |   3 +
src/test/regress/sql/privileges.sql      |   3 +
7 files changed, 205 insertions(+), 77 deletions(-)
