Re: Explicit deterministic COLLATE fails with pattern matching operations on column with non-deterministic collation - Mailing list pgsql-bugs

From Tom Lane
Subject Re: Explicit deterministic COLLATE fails with pattern matching operations on column with non-deterministic collation
Msg-id 666679.1591138428@sss.pgh.pa.us
In response to Re: Explicit deterministic COLLATE fails with pattern matching operations on column with non-deterministic collation  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
I wrote:
> I guess the path of least resistance is to change the selectivity
> functions to use the query's collation; then, if you get an error
> here you would have done so at runtime anyway.  The problem of
> inconsistency with the histogram collation will be real for
> ineq_histogram_selectivity; but we had a variant of that before,
> in that always using DEFAULT_COLLATION_OID would give answers
> that were wrong for a query using a different collation.

I worked on this for a while and came up with the attached patchset.

0001 does about the minimum required to avoid this failure, by
passing the query's collation, rather than stacoll, to the operators
and selectivity functions invoked during selectivity estimation.
Unfortunately, it
doesn't seem like we could sanely back-patch this, because it requires
adding parameters to several globally-visible functions.  The odds
that some external code is calling those functions seem too high to
risk an ABI break.  So, while I'd like to squeeze this into v13,
we still need to think about what to do for v12.
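
As a rough illustration of the 0001 idea, here is a minimal stand-alone
sketch, not PostgreSQL code.  Everything prefixed toy_ or TOY_ is
hypothetical, and the two collation values are made up.  The only point
is that the collation is a parameter supplied by the caller (standing in
for the query's collation) instead of being read out of the statistics
slot, so the operator is evaluated the same way it would be at runtime.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef unsigned int Oid;

/* Hypothetical collation identifiers, invented for this sketch. */
#define TOY_COLL_CASE_SENSITIVE   1u
#define TOY_COLL_CASE_INSENSITIVE 2u

/* Toy equality operator whose result depends on the collation passed in. */
static int
toy_text_eq(Oid collation, const char *a, const char *b)
{
    if (collation != TOY_COLL_CASE_INSENSITIVE)
        return strcmp(a, b) == 0;

    for (; *a != '\0' && *b != '\0'; a++, b++)
    {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return 0;
    }
    return *a == *b;            /* equal only if both strings ended together */
}

/*
 * Sum the frequencies of the most-common values that match constval.
 * The operator is invoked with the caller-supplied collation, not with
 * whatever collation the statistics happened to be collected under.
 */
static double
toy_mcv_selectivity(const char *const *mcv_values, const double *mcv_freqs,
                    size_t nvalues, Oid collation, const char *constval)
{
    double      selec = 0.0;

    assert(mcv_values != NULL && mcv_freqs != NULL);
    for (size_t i = 0; i < nvalues; i++)
    {
        if (toy_text_eq(collation, mcv_values[i], constval))
            selec += mcv_freqs[i];
    }
    return selec;
}
```

With MCVs "abc", "ABC" and "xyz", the estimate for = 'abc' depends on
which toy collation the caller passes, which is the behavior 0001 is
after.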

0002 addresses the mentioned problem with ineq_histogram_selectivity
by having that function actually verify that the query operator and
collation match what the pg_statistic histogram was generated with.
If they don't match, all is not lost.  What we can do is just
sequentially apply the query's operator and comparison constant to
each histogram entry, and take the fraction of matches as our
selectivity estimate.  This is more or less the same insight we have
used in generic_restriction_selectivity: the histogram is a pretty
decent sample of the column, even if its ordering is not quite what
you want.
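
The fallback described above can be sketched in a few lines of
stand-alone C.  This is only an illustration under simplifying
assumptions: integers stand in for Datums, and toy_cmp_op,
toy_int_lt and toy_histogram_match_fraction are hypothetical names,
not functions from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operator signature: nonzero if "entry OP constval" holds. */
typedef int (*toy_cmp_op) (int entry, int constval);

static int
toy_int_lt(int entry, int constval)
{
    return entry < constval;
}

/*
 * Apply the operator to every histogram entry and return the fraction
 * of entries that match.  Nothing here relies on the histogram being
 * sorted according to the operator; it is used purely as a sample.
 */
static double
toy_histogram_match_fraction(const int *hist, size_t nvalues,
                             toy_cmp_op op, int constval)
{
    size_t      nmatch = 0;

    assert(hist != NULL && nvalues > 0);
    for (size_t i = 0; i < nvalues; i++)
    {
        if (op(hist[i], constval))
            nmatch++;
    }
    return (double) nmatch / (double) nvalues;
}
```

For the entries {50, 10, 40, 20, 30, 60, 80, 70} and the condition
"x < 35", three of eight entries match, so the estimate is 0.375 even
though the entries are not in the operator's sort order.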

0002 also deletes a hack I had put in get_attstatsslot() to insert a
dummy value into sslot->stacoll.  That hack isn't necessary any longer
(because indeed we aren't using sslot->stacoll's value anywhere as of
0001), and it breaks the verification check that 0002 wants to add to
ineq_histogram_selectivity, which depends on stacoll being truthful.
I also adjusted get_variable_range() to deal with collations more
honestly.

When I went to test 0002, I found out that it broke some test cases
in privileges.sql, and the reason was rather interesting.  What those
cases are relying on is getting a highly accurate selectivity
estimate for a user-defined operator, for which the only thing the
planner knows for sure is that it uses scalarltsel as the restriction
estimator.  Despite this lack of knowledge, the existing code just
blithely uses the histogram as though it is *precisely* applicable
to the user-defined operator.  (Which it is, since that operator is
just a wrapper around regular "<" ... but the system has no business
assuming that.)  So with the patch, the case exercises the new code
path that just counts matches, and that gives us only
1/default_statistics_target resolution in the selectivity estimate;
which is not enough to get the expected plan to be selected.  I worked
around this for the moment by cranking up default_statistics_target
while running the ANALYZE in that test script, but I wonder if we
should instead tweak those test cases to be more robust.
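
To make the resolution point concrete, here is a small stand-alone
comparison of the two estimation styles.  It is a sketch only: doubles
stand in for histogram boundary values, both function names are
hypothetical, and the interpolation is a simplified version of what a
sorted histogram permits, not the actual logic in
ineq_histogram_selectivity.

```c
#include <assert.h>
#include <stddef.h>

/* Count-based estimate of "x < c": the fraction of boundary values below c. */
static double
toy_count_estimate(const double *bounds, size_t n, double c)
{
    size_t      nmatch = 0;

    assert(bounds != NULL && n >= 2);
    for (size_t i = 0; i < n; i++)
    {
        if (bounds[i] < c)
            nmatch++;
    }
    return (double) nmatch / (double) n;
}

/*
 * Interpolating estimate of "x < c".  Assumes bounds[] is sorted in
 * ascending order, so the bin containing c can be located and c's
 * position inside that bin contributes a fractional bin.
 */
static double
toy_interp_estimate(const double *bounds, size_t n, double c)
{
    assert(bounds != NULL && n >= 2);
    if (c <= bounds[0])
        return 0.0;
    if (c >= bounds[n - 1])
        return 1.0;
    for (size_t i = 1; i < n; i++)
    {
        if (c < bounds[i])
        {
            double      within = (c - bounds[i - 1]) /
                                 (bounds[i] - bounds[i - 1]);

            return ((double) (i - 1) + within) / (double) (n - 1);
        }
    }
    return 1.0;                 /* not reached given the checks above */
}
```

With boundaries {0, 100, 200, 300, 400}, the constants 125 and 175 get
the same count-based estimate, because no boundary lies between them,
while the interpolating estimate tells them apart.  A test whose
expected plan depends on that finer distinction is the kind of case
that needed the larger statistics target.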

I think the combination of 0001+0002 really moves the goalposts a
long way in terms of having honest stats estimation for non-default
collations, so I'd like to sneak it into v13.  As for v12, about
the only alternatives I can think of are:

1. Do nothing, reasoning that if nobody noticed for a year, this
situation is enough of a corner case that we can leave it unfixed.
Obviously that's pretty unsatisfying.

2. Change all the stats functions to pass DEFAULT_COLLATION_OID
when invoking operator functions.  This is not too attractive
either because it essentially reverts 5e0928005; in fact, to avoid
breaking things completely we'd likely have to revert the part
of that commit that taught ANALYZE to collect stats using column
collations instead of DEFAULT_COLLATION_OID.  Then we get into
questions like what about 6b0faf723 --- it's going to be a mess.

3. Hack things up so that the core code renames all these exposed
functions to, say, ineq_histogram_selectivity_ext() and so on,
allowing the additional arguments to exist, but the old names would
still be there as ABI compatibility wrappers.  This might produce
slightly funny results for external code calling the wrappers, since
the wrappers would have to assume DEFAULT_COLLATION_OID, but it'd
avoid an ABI break at least.  I don't want to propagate such a thing
into HEAD, so this would leave us with unsightly differences between
v12 and earlier/later branches -- but there aren't *that* many places
involved.  (I'd envision this approach as back-porting 0001 but not
0002.  For one reason, there's no place for a wrapper to get the
additional operator OID needed for ineq_histogram_selectivity_ext.
For another, the results for the privilege test suggest that 0002
might have surprising effects on user-defined operators, so
back-patching it might draw more complaints.)
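
For what it's worth, the wrapper arrangement in alternative #3 would
look roughly like the following stand-alone sketch.  None of this is
the real selfuncs.c API: the toy_ functions, the simplified argument
lists, and both collation values are assumptions made up for
illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef unsigned int Oid;

/* Stand-in collation identifiers chosen for this sketch only. */
#define TOY_DEFAULT_COLLATION_OID           100u
#define TOY_CASE_INSENSITIVE_COLLATION_OID  9001u

/* Toy equality test that honors the collation it is handed. */
static int
toy_eq(Oid collation, const char *a, const char *b)
{
    if (collation != TOY_CASE_INSENSITIVE_COLLATION_OID)
        return strcmp(a, b) == 0;
    for (; *a != '\0' && *b != '\0'; a++, b++)
    {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return 0;
    }
    return *a == *b;
}

/* New entry point: same job as before, plus an explicit collation. */
double
toy_eq_fraction_ext(const char *const *values, size_t nvalues,
                    Oid collation, const char *constval)
{
    size_t      nmatch = 0;

    assert(values != NULL && nvalues > 0);
    for (size_t i = 0; i < nvalues; i++)
    {
        if (toy_eq(collation, values[i], constval))
            nmatch++;
    }
    return (double) nmatch / (double) nvalues;
}

/*
 * Old name and old argument list, kept so existing callers still link.
 * It has no collation to pass along, so it has to assume the default.
 */
double
toy_eq_fraction(const char *const *values, size_t nvalues,
                const char *constval)
{
    return toy_eq_fraction_ext(values, nvalues,
                               TOY_DEFAULT_COLLATION_OID, constval);
}
```

A caller of the old name gets the default-collation answer no matter
what collation its query actually uses, which is the "slightly funny
results" caveat: here toy_eq_fraction() reports 0.25 for 'abc' over
{"abc", "ABC", "xyz", "xyz"}, where the _ext variant with the toy
case-insensitive collation reports 0.5.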

Alternatives #2 and #3 would result in (different) changes in the
selectivity estimates v12 produces when considering columns with
non-default collations and/or queries using collations that don't
match the relevant columns.  So that might be an argument for
doing nothing in v12; people tend not to like it when minor
releases cause unexpected plan changes.  Also, #2 is probably
strictly worse than #3 on this score, since it'd move such
estimates away from reality not towards it.

Thoughts?

            regards, tom lane

diff --git a/contrib/ltree/ltree_op.c b/contrib/ltree/ltree_op.c
index 4ac2ed5e54..778dbf1e98 100644
--- a/contrib/ltree/ltree_op.c
+++ b/contrib/ltree/ltree_op.c
@@ -582,7 +582,7 @@ ltreeparentsel(PG_FUNCTION_ARGS)
     double        selec;

     /* Use generic restriction selectivity logic, with default 0.001. */
-    selec = generic_restriction_selectivity(root, operator,
+    selec = generic_restriction_selectivity(root, operator, InvalidOid,
                                             args, varRelid,
                                             0.001);

diff --git a/src/backend/utils/adt/like_support.c b/src/backend/utils/adt/like_support.c
index 286e000d4e..ae5c8f084e 100644
--- a/src/backend/utils/adt/like_support.c
+++ b/src/backend/utils/adt/like_support.c
@@ -92,6 +92,7 @@ static Pattern_Prefix_Status pattern_fixed_prefix(Const *patt,
 static Selectivity prefix_selectivity(PlannerInfo *root,
                                       VariableStatData *vardata,
                                       Oid eqopr, Oid ltopr, Oid geopr,
+                                      Oid collation,
                                       Const *prefixcon);
 static Selectivity like_selectivity(const char *patt, int pattlen,
                                     bool case_insensitive);
@@ -534,12 +535,6 @@ patternsel_common(PlannerInfo *root,
      * something binary-compatible but different.)    We can use it to identify
      * the comparison operators and the required type of the comparison
      * constant, much as in match_pattern_prefix().
-     *
-     * NOTE: this logic does not consider collations.  Ideally we'd force use
-     * of "C" collation, but since ANALYZE only generates statistics for the
-     * column's specified collation, we have little choice but to use those.
-     * But our results are so approximate anyway that it probably hardly
-     * matters.
      */
     vartype = vardata.vartype;

@@ -622,7 +617,7 @@ patternsel_common(PlannerInfo *root,
         /*
          * Pattern specifies an exact match, so estimate as for '='
          */
-        result = var_eq_const(&vardata, eqopr, prefix->constvalue,
+        result = var_eq_const(&vardata, eqopr, collation, prefix->constvalue,
                               false, true, false);
     }
     else
@@ -654,7 +649,8 @@ patternsel_common(PlannerInfo *root,
             opfuncid = get_opcode(oprid);
         fmgr_info(opfuncid, &opproc);

-        selec = histogram_selectivity(&vardata, &opproc, constval, true,
+        selec = histogram_selectivity(&vardata, &opproc, collation,
+                                      constval, true,
                                       10, 1, &hist_size);

         /* If not at least 100 entries, use the heuristic method */
@@ -666,6 +662,7 @@ patternsel_common(PlannerInfo *root,
             if (pstatus == Pattern_Prefix_Partial)
                 prefixsel = prefix_selectivity(root, &vardata,
                                                eqopr, ltopr, geopr,
+                                               collation,
                                                prefix);
             else
                 prefixsel = 1.0;
@@ -698,7 +695,8 @@ patternsel_common(PlannerInfo *root,
          * directly to the result selectivity.  Also add up the total fraction
          * represented by MCV entries.
          */
-        mcv_selec = mcv_selectivity(&vardata, &opproc, constval, true,
+        mcv_selec = mcv_selectivity(&vardata, &opproc, collation,
+                                    constval, true,
                                     &sumcommon);

         /*
@@ -1196,7 +1194,7 @@ pattern_fixed_prefix(Const *patt, Pattern_Type ptype, Oid collation,
  * population represented by the histogram --- the caller must fold this
  * together with info about MCVs and NULLs.
  *
- * We use the specified btree comparison operators to do the estimation.
+ * We use the given comparison operators and collation to do the estimation.
  * The given variable and Const must be of the associated datatype(s).
  *
  * XXX Note: we make use of the upper bound to estimate operator selectivity
@@ -1207,11 +1205,11 @@ pattern_fixed_prefix(Const *patt, Pattern_Type ptype, Oid collation,
 static Selectivity
 prefix_selectivity(PlannerInfo *root, VariableStatData *vardata,
                    Oid eqopr, Oid ltopr, Oid geopr,
+                   Oid collation,
                    Const *prefixcon)
 {
     Selectivity prefixsel;
     FmgrInfo    opproc;
-    AttStatsSlot sslot;
     Const       *greaterstrcon;
     Selectivity eq_sel;

@@ -1220,6 +1218,7 @@ prefix_selectivity(PlannerInfo *root, VariableStatData *vardata,

     prefixsel = ineq_histogram_selectivity(root, vardata,
                                            &opproc, true, true,
+                                           collation,
                                            prefixcon->constvalue,
                                            prefixcon->consttype);

@@ -1229,27 +1228,18 @@ prefix_selectivity(PlannerInfo *root, VariableStatData *vardata,
         return DEFAULT_MATCH_SEL;
     }

-    /*-------
-     * If we can create a string larger than the prefix, say
-     * "x < greaterstr".  We try to generate the string referencing the
-     * collation of the var's statistics, but if that's not available,
-     * use DEFAULT_COLLATION_OID.
-     *-------
+    /*
+     * If we can create a string larger than the prefix, say "x < greaterstr".
      */
-    if (HeapTupleIsValid(vardata->statsTuple) &&
-        get_attstatsslot(&sslot, vardata->statsTuple,
-                         STATISTIC_KIND_HISTOGRAM, InvalidOid, 0))
-         /* sslot.stacoll is set up */ ;
-    else
-        sslot.stacoll = DEFAULT_COLLATION_OID;
     fmgr_info(get_opcode(ltopr), &opproc);
-    greaterstrcon = make_greater_string(prefixcon, &opproc, sslot.stacoll);
+    greaterstrcon = make_greater_string(prefixcon, &opproc, collation);
     if (greaterstrcon)
     {
         Selectivity topsel;

         topsel = ineq_histogram_selectivity(root, vardata,
                                             &opproc, false, false,
+                                            collation,
                                             greaterstrcon->constvalue,
                                             greaterstrcon->consttype);

@@ -1278,7 +1268,7 @@ prefix_selectivity(PlannerInfo *root, VariableStatData *vardata,
      * probably off the end of the histogram, and thus we probably got a very
      * small estimate from the >= condition; so we still need to clamp.
      */
-    eq_sel = var_eq_const(vardata, eqopr, prefixcon->constvalue,
+    eq_sel = var_eq_const(vardata, eqopr, collation, prefixcon->constvalue,
                           false, true, false);

     prefixsel = Max(prefixsel, eq_sel);
diff --git a/src/backend/utils/adt/network_selfuncs.c b/src/backend/utils/adt/network_selfuncs.c
index 863efd3d76..955e0ee87f 100644
--- a/src/backend/utils/adt/network_selfuncs.c
+++ b/src/backend/utils/adt/network_selfuncs.c
@@ -137,7 +137,8 @@ networksel(PG_FUNCTION_ARGS)
      * by MCV entries.
      */
     fmgr_info(get_opcode(operator), &proc);
-    mcv_selec = mcv_selectivity(&vardata, &proc, constvalue, varonleft,
+    mcv_selec = mcv_selectivity(&vardata, &proc, InvalidOid,
+                                constvalue, varonleft,
                                 &sumcommon);

     /*
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index cfb05682bc..2332277307 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -88,11 +88,7 @@
  * (if any) is passed using the standard fmgr mechanism, so that the estimator
  * function can fetch it with PG_GET_COLLATION().  Note, however, that all
  * statistics in pg_statistic are currently built using the relevant column's
- * collation.  Thus, in most cases where we are looking at statistics, we
- * should ignore the operator collation and use the stats entry's collation.
- * We expect that the error induced by doing this is usually not large enough
- * to justify complicating matters.  In any case, doing otherwise would yield
- * entirely garbage results for ordered stats data such as histograms.
+ * collation.
  *----------
  */

@@ -149,14 +145,14 @@ get_relation_stats_hook_type get_relation_stats_hook = NULL;
 get_index_stats_hook_type get_index_stats_hook = NULL;

 static double eqsel_internal(PG_FUNCTION_ARGS, bool negate);
-static double eqjoinsel_inner(Oid opfuncoid,
+static double eqjoinsel_inner(Oid opfuncoid, Oid collation,
                               VariableStatData *vardata1, VariableStatData *vardata2,
                               double nd1, double nd2,
                               bool isdefault1, bool isdefault2,
                               AttStatsSlot *sslot1, AttStatsSlot *sslot2,
                               Form_pg_statistic stats1, Form_pg_statistic stats2,
                               bool have_mcvs1, bool have_mcvs2);
-static double eqjoinsel_semi(Oid opfuncoid,
+static double eqjoinsel_semi(Oid opfuncoid, Oid collation,
                              VariableStatData *vardata1, VariableStatData *vardata2,
                              double nd1, double nd2,
                              bool isdefault1, bool isdefault2,
@@ -194,10 +190,11 @@ static double convert_timevalue_to_scalar(Datum value, Oid typid,
 static void examine_simple_variable(PlannerInfo *root, Var *var,
                                     VariableStatData *vardata);
 static bool get_variable_range(PlannerInfo *root, VariableStatData *vardata,
-                               Oid sortop, Datum *min, Datum *max);
+                               Oid sortop, Oid collation,
+                               Datum *min, Datum *max);
 static bool get_actual_variable_range(PlannerInfo *root,
                                       VariableStatData *vardata,
-                                      Oid sortop,
+                                      Oid sortop, Oid collation,
                                       Datum *min, Datum *max);
 static bool get_actual_variable_endpoint(Relation heapRel,
                                          Relation indexRel,
@@ -235,6 +232,7 @@ eqsel_internal(PG_FUNCTION_ARGS, bool negate)
     Oid            operator = PG_GETARG_OID(1);
     List       *args = (List *) PG_GETARG_POINTER(2);
     int            varRelid = PG_GETARG_INT32(3);
+    Oid            collation = PG_GET_COLLATION();
     VariableStatData vardata;
     Node       *other;
     bool        varonleft;
@@ -268,12 +266,12 @@ eqsel_internal(PG_FUNCTION_ARGS, bool negate)
      * in the query.)
      */
     if (IsA(other, Const))
-        selec = var_eq_const(&vardata, operator,
+        selec = var_eq_const(&vardata, operator, collation,
                              ((Const *) other)->constvalue,
                              ((Const *) other)->constisnull,
                              varonleft, negate);
     else
-        selec = var_eq_non_const(&vardata, operator, other,
+        selec = var_eq_non_const(&vardata, operator, collation, other,
                                  varonleft, negate);

     ReleaseVariableStats(vardata);
@@ -287,7 +285,7 @@ eqsel_internal(PG_FUNCTION_ARGS, bool negate)
  * This is exported so that some other estimation functions can use it.
  */
 double
-var_eq_const(VariableStatData *vardata, Oid operator,
+var_eq_const(VariableStatData *vardata, Oid operator, Oid collation,
              Datum constval, bool constisnull,
              bool varonleft, bool negate)
 {
@@ -356,7 +354,7 @@ var_eq_const(VariableStatData *vardata, Oid operator,
              * eqproc returns NULL, though really equality functions should
              * never do that.
              */
-            InitFunctionCallInfoData(*fcinfo, &eqproc, 2, sslot.stacoll,
+            InitFunctionCallInfoData(*fcinfo, &eqproc, 2, collation,
                                      NULL, NULL);
             fcinfo->args[0].isnull = false;
             fcinfo->args[1].isnull = false;
@@ -458,7 +456,7 @@ var_eq_const(VariableStatData *vardata, Oid operator,
  * This is exported so that some other estimation functions can use it.
  */
 double
-var_eq_non_const(VariableStatData *vardata, Oid operator,
+var_eq_non_const(VariableStatData *vardata, Oid operator, Oid collation,
                  Node *other,
                  bool varonleft, bool negate)
 {
@@ -573,6 +571,7 @@ neqsel(PG_FUNCTION_ARGS)
  */
 static double
 scalarineqsel(PlannerInfo *root, Oid operator, bool isgt, bool iseq,
+              Oid collation,
               VariableStatData *vardata, Datum constval, Oid consttype)
 {
     Form_pg_statistic stats;
@@ -672,7 +671,7 @@ scalarineqsel(PlannerInfo *root, Oid operator, bool isgt, bool iseq,
      * to the result selectivity.  Also add up the total fraction represented
      * by MCV entries.
      */
-    mcv_selec = mcv_selectivity(vardata, &opproc, constval, true,
+    mcv_selec = mcv_selectivity(vardata, &opproc, collation, constval, true,
                                 &sumcommon);

     /*
@@ -681,6 +680,7 @@ scalarineqsel(PlannerInfo *root, Oid operator, bool isgt, bool iseq,
      */
     hist_selec = ineq_histogram_selectivity(root, vardata,
                                             &opproc, isgt, iseq,
+                                            collation,
                                             constval, consttype);

     /*
@@ -722,7 +722,7 @@ scalarineqsel(PlannerInfo *root, Oid operator, bool isgt, bool iseq,
  * if there is no MCV list.
  */
 double
-mcv_selectivity(VariableStatData *vardata, FmgrInfo *opproc,
+mcv_selectivity(VariableStatData *vardata, FmgrInfo *opproc, Oid collation,
                 Datum constval, bool varonleft,
                 double *sumcommonp)
 {
@@ -749,7 +749,7 @@ mcv_selectivity(VariableStatData *vardata, FmgrInfo *opproc,
          * operators that can return NULL.  A small side benefit is to not
          * need to re-initialize the fcinfo struct from scratch each time.
          */
-        InitFunctionCallInfoData(*fcinfo, opproc, 2, sslot.stacoll,
+        InitFunctionCallInfoData(*fcinfo, opproc, 2, collation,
                                  NULL, NULL);
         fcinfo->args[0].isnull = false;
         fcinfo->args[1].isnull = false;
@@ -813,7 +813,8 @@ mcv_selectivity(VariableStatData *vardata, FmgrInfo *opproc,
  * prudent to clamp the result range, ie, disbelieve exact 0 or 1 outputs.
  */
 double
-histogram_selectivity(VariableStatData *vardata, FmgrInfo *opproc,
+histogram_selectivity(VariableStatData *vardata,
+                      FmgrInfo *opproc, Oid collation,
                       Datum constval, bool varonleft,
                       int min_hist_size, int n_skip,
                       int *hist_size)
@@ -846,7 +847,7 @@ histogram_selectivity(VariableStatData *vardata, FmgrInfo *opproc,
              * is to not need to re-initialize the fcinfo struct from scratch
              * each time.
              */
-            InitFunctionCallInfoData(*fcinfo, opproc, 2, sslot.stacoll,
+            InitFunctionCallInfoData(*fcinfo, opproc, 2, collation,
                                      NULL, NULL);
             fcinfo->args[0].isnull = false;
             fcinfo->args[1].isnull = false;
@@ -903,7 +904,7 @@ histogram_selectivity(VariableStatData *vardata, FmgrInfo *opproc,
  * Otherwise, fall back to the default selectivity provided by the caller.
  */
 double
-generic_restriction_selectivity(PlannerInfo *root, Oid oproid,
+generic_restriction_selectivity(PlannerInfo *root, Oid oproid, Oid collation,
                                 List *args, int varRelid,
                                 double default_selectivity)
 {
@@ -946,7 +947,8 @@ generic_restriction_selectivity(PlannerInfo *root, Oid oproid,
         /*
          * Calculate the selectivity for the column's most common values.
          */
-        mcvsel = mcv_selectivity(&vardata, &opproc, constval, varonleft,
+        mcvsel = mcv_selectivity(&vardata, &opproc, collation,
+                                 constval, varonleft,
                                  &mcvsum);

         /*
@@ -955,7 +957,7 @@ generic_restriction_selectivity(PlannerInfo *root, Oid oproid,
          * population.  Otherwise use the default selectivity for the non-MCV
          * population.
          */
-        selec = histogram_selectivity(&vardata, &opproc,
+        selec = histogram_selectivity(&vardata, &opproc, collation,
                                       constval, varonleft,
                                       10, 1, &hist_size);
         if (selec < 0)
@@ -1029,6 +1031,7 @@ double
 ineq_histogram_selectivity(PlannerInfo *root,
                            VariableStatData *vardata,
                            FmgrInfo *opproc, bool isgt, bool iseq,
+                           Oid collation,
                            Datum constval, Oid consttype)
 {
     double        hist_selec;
@@ -1042,9 +1045,11 @@ ineq_histogram_selectivity(PlannerInfo *root,
      * column type.  However, to make that work we will need to figure out
      * which staop to search for --- it's not necessarily the one we have at
      * hand!  (For example, we might have a '<=' operator rather than the '<'
-     * operator that will appear in staop.)  For now, assume that whatever
-     * appears in pg_statistic is sorted the same way our operator sorts, or
-     * the reverse way if isgt is true.
+     * operator that will appear in staop.)  The collation might not agree
+     * either.  For now, just assume that whatever appears in pg_statistic is
+     * sorted the same way our operator sorts, or the reverse way if isgt is
+     * true.  This could result in a bogus estimate, but it still seems better
+     * than falling back to the default estimate.
      */
     if (HeapTupleIsValid(vardata->statsTuple) &&
         statistic_proc_security_check(vardata, opproc->fn_oid) &&
@@ -1090,6 +1095,7 @@ ineq_histogram_selectivity(PlannerInfo *root,
                 have_end = get_actual_variable_range(root,
                                                      vardata,
                                                      sslot.staop,
+                                                     collation,
                                                      &sslot.values[0],
                                                      &sslot.values[1]);

@@ -1107,17 +1113,19 @@ ineq_histogram_selectivity(PlannerInfo *root,
                     have_end = get_actual_variable_range(root,
                                                          vardata,
                                                          sslot.staop,
+                                                         collation,
                                                          &sslot.values[0],
                                                          NULL);
                 else if (probe == sslot.nvalues - 1 && sslot.nvalues > 2)
                     have_end = get_actual_variable_range(root,
                                                          vardata,
                                                          sslot.staop,
+                                                         collation,
                                                          NULL,
                                                          &sslot.values[probe]);

                 ltcmp = DatumGetBool(FunctionCall2Coll(opproc,
-                                                       sslot.stacoll,
+                                                       collation,
                                                        sslot.values[probe],
                                                        constval));
                 if (isgt)
@@ -1202,7 +1210,7 @@ ineq_histogram_selectivity(PlannerInfo *root,
                  * values to a uniform comparison scale, and do a linear
                  * interpolation within this bin.
                  */
-                if (convert_to_scalar(constval, consttype, sslot.stacoll,
+                if (convert_to_scalar(constval, consttype, collation,
                                       &val,
                                       sslot.values[i - 1], sslot.values[i],
                                       vardata->vartype,
@@ -1342,6 +1350,7 @@ scalarineqsel_wrapper(PG_FUNCTION_ARGS, bool isgt, bool iseq)
     Oid            operator = PG_GETARG_OID(1);
     List       *args = (List *) PG_GETARG_POINTER(2);
     int            varRelid = PG_GETARG_INT32(3);
+    Oid            collation = PG_GET_COLLATION();
     VariableStatData vardata;
     Node       *other;
     bool        varonleft;
@@ -1394,7 +1403,7 @@ scalarineqsel_wrapper(PG_FUNCTION_ARGS, bool isgt, bool iseq)
     }

     /* The rest of the work is done by scalarineqsel(). */
-    selec = scalarineqsel(root, operator, isgt, iseq,
+    selec = scalarineqsel(root, operator, isgt, iseq, collation,
                           &vardata, constval, consttype);

     ReleaseVariableStats(vardata);
@@ -1459,7 +1468,7 @@ boolvarsel(PlannerInfo *root, Node *arg, int varRelid)
          * A boolean variable V is equivalent to the clause V = 't', so we
          * compute the selectivity as if that is what we have.
          */
-        selec = var_eq_const(&vardata, BooleanEqualOperator,
+        selec = var_eq_const(&vardata, BooleanEqualOperator, InvalidOid,
                              BoolGetDatum(true), false, true, false);
     }
     else
@@ -2185,6 +2194,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
     JoinType    jointype = (JoinType) PG_GETARG_INT16(3);
 #endif
     SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) PG_GETARG_POINTER(4);
+    Oid            collation = PG_GET_COLLATION();
     double        selec;
     double        selec_inner;
     VariableStatData vardata1;
@@ -2235,7 +2245,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
     }

     /* We need to compute the inner-join selectivity in all cases */
-    selec_inner = eqjoinsel_inner(opfuncoid,
+    selec_inner = eqjoinsel_inner(opfuncoid, collation,
                                   &vardata1, &vardata2,
                                   nd1, nd2,
                                   isdefault1, isdefault2,
@@ -2262,7 +2272,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
             inner_rel = find_join_input_rel(root, sjinfo->min_righthand);

             if (!join_is_reversed)
-                selec = eqjoinsel_semi(opfuncoid,
+                selec = eqjoinsel_semi(opfuncoid, collation,
                                        &vardata1, &vardata2,
                                        nd1, nd2,
                                        isdefault1, isdefault2,
@@ -2275,7 +2285,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
                 Oid            commop = get_commutator(operator);
                 Oid            commopfuncoid = OidIsValid(commop) ? get_opcode(commop) : InvalidOid;

-                selec = eqjoinsel_semi(commopfuncoid,
+                selec = eqjoinsel_semi(commopfuncoid, collation,
                                        &vardata2, &vardata1,
                                        nd2, nd1,
                                        isdefault2, isdefault1,
@@ -2323,7 +2333,7 @@ eqjoinsel(PG_FUNCTION_ARGS)
  * that it's worth trying to distinguish them here.
  */
 static double
-eqjoinsel_inner(Oid opfuncoid,
+eqjoinsel_inner(Oid opfuncoid, Oid collation,
                 VariableStatData *vardata1, VariableStatData *vardata2,
                 double nd1, double nd2,
                 bool isdefault1, bool isdefault2,
@@ -2373,7 +2383,7 @@ eqjoinsel_inner(Oid opfuncoid,
          * returns NULL, though really equality functions should never do
          * that.
          */
-        InitFunctionCallInfoData(*fcinfo, &eqproc, 2, sslot1->stacoll,
+        InitFunctionCallInfoData(*fcinfo, &eqproc, 2, collation,
                                  NULL, NULL);
         fcinfo->args[0].isnull = false;
         fcinfo->args[1].isnull = false;
@@ -2520,7 +2530,7 @@ eqjoinsel_inner(Oid opfuncoid,
  * Unlike eqjoinsel_inner, we have to cope with opfuncoid being InvalidOid.
  */
 static double
-eqjoinsel_semi(Oid opfuncoid,
+eqjoinsel_semi(Oid opfuncoid, Oid collation,
                VariableStatData *vardata1, VariableStatData *vardata2,
                double nd1, double nd2,
                bool isdefault1, bool isdefault2,
@@ -2603,7 +2613,7 @@ eqjoinsel_semi(Oid opfuncoid,
          * returns NULL, though really equality functions should never do
          * that.
          */
-        InitFunctionCallInfoData(*fcinfo, &eqproc, 2, sslot1->stacoll,
+        InitFunctionCallInfoData(*fcinfo, &eqproc, 2, collation,
                                  NULL, NULL);
         fcinfo->args[0].isnull = false;
         fcinfo->args[1].isnull = false;
@@ -2851,6 +2861,7 @@ mergejoinscansel(PlannerInfo *root, Node *clause,
     Oid            op_lefttype;
     Oid            op_righttype;
     Oid            opno,
+                collation,
                 lsortop,
                 rsortop,
                 lstatop,
@@ -2875,6 +2886,7 @@ mergejoinscansel(PlannerInfo *root, Node *clause,
     if (!is_opclause(clause))
         return;                    /* shouldn't happen */
     opno = ((OpExpr *) clause)->opno;
+    collation = ((OpExpr *) clause)->inputcollid;
     left = get_leftop((Expr *) clause);
     right = get_rightop((Expr *) clause);
     if (!right)
@@ -3008,20 +3020,20 @@ mergejoinscansel(PlannerInfo *root, Node *clause,
     /* Try to get ranges of both inputs */
     if (!isgt)
     {
-        if (!get_variable_range(root, &leftvar, lstatop,
+        if (!get_variable_range(root, &leftvar, lstatop, collation,
                                 &leftmin, &leftmax))
             goto fail;            /* no range available from stats */
-        if (!get_variable_range(root, &rightvar, rstatop,
+        if (!get_variable_range(root, &rightvar, rstatop, collation,
                                 &rightmin, &rightmax))
             goto fail;            /* no range available from stats */
     }
     else
     {
         /* need to swap the max and min */
-        if (!get_variable_range(root, &leftvar, lstatop,
+        if (!get_variable_range(root, &leftvar, lstatop, collation,
                                 &leftmax, &leftmin))
             goto fail;            /* no range available from stats */
-        if (!get_variable_range(root, &rightvar, rstatop,
+        if (!get_variable_range(root, &rightvar, rstatop, collation,
                                 &rightmax, &rightmin))
             goto fail;            /* no range available from stats */
     }
@@ -3031,13 +3043,13 @@ mergejoinscansel(PlannerInfo *root, Node *clause,
      * fraction that's <= the right-side maximum value.  But only believe
      * non-default estimates, else stick with our 1.0.
      */
-    selec = scalarineqsel(root, leop, isgt, true, &leftvar,
+    selec = scalarineqsel(root, leop, isgt, true, collation, &leftvar,
                           rightmax, op_righttype);
     if (selec != DEFAULT_INEQ_SEL)
         *leftend = selec;

     /* And similarly for the right variable. */
-    selec = scalarineqsel(root, revleop, isgt, true, &rightvar,
+    selec = scalarineqsel(root, revleop, isgt, true, collation, &rightvar,
                           leftmax, op_lefttype);
     if (selec != DEFAULT_INEQ_SEL)
         *rightend = selec;
@@ -3061,13 +3073,13 @@ mergejoinscansel(PlannerInfo *root, Node *clause,
      * minimum value.  But only believe non-default estimates, else stick with
      * our own default.
      */
-    selec = scalarineqsel(root, ltop, isgt, false, &leftvar,
+    selec = scalarineqsel(root, ltop, isgt, false, collation, &leftvar,
                           rightmin, op_righttype);
     if (selec != DEFAULT_INEQ_SEL)
         *leftstart = selec;

     /* And similarly for the right variable. */
-    selec = scalarineqsel(root, revltop, isgt, false, &rightvar,
+    selec = scalarineqsel(root, revltop, isgt, false, collation, &rightvar,
                           leftmin, op_lefttype);
     if (selec != DEFAULT_INEQ_SEL)
         *rightstart = selec;
@@ -3147,10 +3159,11 @@ matchingsel(PG_FUNCTION_ARGS)
     Oid            operator = PG_GETARG_OID(1);
     List       *args = (List *) PG_GETARG_POINTER(2);
     int            varRelid = PG_GETARG_INT32(3);
+    Oid            collation = PG_GET_COLLATION();
     double        selec;

     /* Use generic restriction selectivity logic. */
-    selec = generic_restriction_selectivity(root, operator,
+    selec = generic_restriction_selectivity(root, operator, collation,
                                             args, varRelid,
                                             DEFAULT_MATCHING_SEL);

@@ -5337,9 +5350,11 @@ get_variable_numdistinct(VariableStatData *vardata, bool *isdefault)
  *
  * sortop is the "<" comparison operator to use.  This should generally
  * be "<" not ">", as only the former is likely to be found in pg_statistic.
+ * The collation must be specified too.
  */
 static bool
-get_variable_range(PlannerInfo *root, VariableStatData *vardata, Oid sortop,
+get_variable_range(PlannerInfo *root, VariableStatData *vardata,
+                   Oid sortop, Oid collation,
                    Datum *min, Datum *max)
 {
     Datum        tmin = 0;
@@ -5359,7 +5374,7 @@ get_variable_range(PlannerInfo *root, VariableStatData *vardata, Oid sortop,
      * before enabling this.
      */
 #ifdef NOT_USED
-    if (get_actual_variable_range(root, vardata, sortop, min, max))
+    if (get_actual_variable_range(root, vardata, sortop, collation, min, max))
         return true;
 #endif

@@ -5387,7 +5402,7 @@ get_variable_range(PlannerInfo *root, VariableStatData *vardata, Oid sortop,
      *
      * If there is a histogram that is sorted with some other operator than
      * the one we want, fail --- this suggests that there is data we can't
-     * use.
+     * use.  XXX consider collation too.
      */
     if (get_attstatsslot(&sslot, vardata->statsTuple,
                          STATISTIC_KIND_HISTOGRAM, sortop,
@@ -5434,14 +5449,14 @@ get_variable_range(PlannerInfo *root, VariableStatData *vardata, Oid sortop,
                 continue;
             }
             if (DatumGetBool(FunctionCall2Coll(&opproc,
-                                               sslot.stacoll,
+                                               collation,
                                                sslot.values[i], tmin)))
             {
                 tmin = sslot.values[i];
                 tmin_is_mcv = true;
             }
             if (DatumGetBool(FunctionCall2Coll(&opproc,
-                                               sslot.stacoll,
+                                               collation,
                                                tmax, sslot.values[i])))
             {
                 tmax = sslot.values[i];
@@ -5471,10 +5486,11 @@ get_variable_range(PlannerInfo *root, VariableStatData *vardata, Oid sortop,
  *        If no data available, return false.
  *
  * sortop is the "<" comparison operator to use.
+ * collation is the required collation.
  */
 static bool
 get_actual_variable_range(PlannerInfo *root, VariableStatData *vardata,
-                          Oid sortop,
+                          Oid sortop, Oid collation,
                           Datum *min, Datum *max)
 {
     bool        have_data = false;
@@ -5514,9 +5530,11 @@ get_actual_variable_range(PlannerInfo *root, VariableStatData *vardata,
             continue;

         /*
-         * The first index column must match the desired variable and sort
-         * operator --- but we can use a descending-order index.
+         * The first index column must match the desired variable, sortop, and
+         * collation --- but we can use a descending-order index.
          */
+        if (collation != index->indexcollations[0])
+            continue;            /* test first 'cause it's cheapest */
         if (!match_index_to_operand(vardata->var, 0, index))
             continue;
         switch (get_op_opfamily_strategy(sortop, index->sortopfamily[0]))
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
index 9690b4e486..15d2289024 100644
--- a/src/include/utils/selfuncs.h
+++ b/src/include/utils/selfuncs.h
@@ -144,24 +144,30 @@ extern void get_join_variables(PlannerInfo *root, List *args,
                                bool *join_is_reversed);
 extern double get_variable_numdistinct(VariableStatData *vardata,
                                        bool *isdefault);
-extern double mcv_selectivity(VariableStatData *vardata, FmgrInfo *opproc,
+extern double mcv_selectivity(VariableStatData *vardata,
+                              FmgrInfo *opproc, Oid collation,
                               Datum constval, bool varonleft,
                               double *sumcommonp);
-extern double histogram_selectivity(VariableStatData *vardata, FmgrInfo *opproc,
+extern double histogram_selectivity(VariableStatData *vardata,
+                                    FmgrInfo *opproc, Oid collation,
                                     Datum constval, bool varonleft,
                                     int min_hist_size, int n_skip,
                                     int *hist_size);
-extern double generic_restriction_selectivity(PlannerInfo *root, Oid oproid,
+extern double generic_restriction_selectivity(PlannerInfo *root,
+                                              Oid oproid, Oid collation,
                                               List *args, int varRelid,
                                               double default_selectivity);
 extern double ineq_histogram_selectivity(PlannerInfo *root,
                                          VariableStatData *vardata,
                                          FmgrInfo *opproc, bool isgt, bool iseq,
+                                         Oid collation,
                                          Datum constval, Oid consttype);
-extern double var_eq_const(VariableStatData *vardata, Oid oproid,
+extern double var_eq_const(VariableStatData *vardata,
+                           Oid oproid, Oid collation,
                            Datum constval, bool constisnull,
                            bool varonleft, bool negate);
-extern double var_eq_non_const(VariableStatData *vardata, Oid oproid,
+extern double var_eq_non_const(VariableStatData *vardata,
+                               Oid oproid, Oid collation,
                                Node *other,
                                bool varonleft, bool negate);

diff --git a/src/backend/utils/adt/like_support.c b/src/backend/utils/adt/like_support.c
index ae5c8f084e..bcfbaa1c3d 100644
--- a/src/backend/utils/adt/like_support.c
+++ b/src/backend/utils/adt/like_support.c
@@ -1217,7 +1217,7 @@ prefix_selectivity(PlannerInfo *root, VariableStatData *vardata,
     fmgr_info(get_opcode(geopr), &opproc);

     prefixsel = ineq_histogram_selectivity(root, vardata,
-                                           &opproc, true, true,
+                                           geopr, &opproc, true, true,
                                            collation,
                                            prefixcon->constvalue,
                                            prefixcon->consttype);
@@ -1238,7 +1238,7 @@ prefix_selectivity(PlannerInfo *root, VariableStatData *vardata,
         Selectivity topsel;

         topsel = ineq_histogram_selectivity(root, vardata,
-                                            &opproc, false, false,
+                                            ltopr, &opproc, false, false,
                                             collation,
                                             greaterstrcon->constvalue,
                                             greaterstrcon->consttype);
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 2332277307..208744cd3a 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -192,6 +192,10 @@ static void examine_simple_variable(PlannerInfo *root, Var *var,
 static bool get_variable_range(PlannerInfo *root, VariableStatData *vardata,
                                Oid sortop, Oid collation,
                                Datum *min, Datum *max);
+static void get_stats_slot_range(AttStatsSlot *sslot,
+                                 Oid opfuncoid, FmgrInfo *opproc,
+                                 Oid collation, int16 typLen, bool typByVal,
+                                 Datum *min, Datum *max, bool *p_have_data);
 static bool get_actual_variable_range(PlannerInfo *root,
                                       VariableStatData *vardata,
                                       Oid sortop, Oid collation,
@@ -679,7 +683,7 @@ scalarineqsel(PlannerInfo *root, Oid operator, bool isgt, bool iseq,
      * compute the resulting contribution to selectivity.
      */
     hist_selec = ineq_histogram_selectivity(root, vardata,
-                                            &opproc, isgt, iseq,
+                                            operator, &opproc, isgt, iseq,
                                             collation,
                                             constval, consttype);

@@ -1019,6 +1023,9 @@ generic_restriction_selectivity(PlannerInfo *root, Oid oproid, Oid collation,
  * satisfies the inequality condition, ie, VAR < (or <=, >, >=) CONST.
  * The isgt and iseq flags distinguish which of the four cases apply.
  *
+ * While opproc could be looked up from the operator OID, common callers
+ * also need to call it separately, so we make the caller pass both.
+ *
  * Returns -1 if there is no histogram (valid results will always be >= 0).
  *
  * Note that the result disregards both the most-common-values (if any) and
@@ -1030,7 +1037,7 @@ generic_restriction_selectivity(PlannerInfo *root, Oid oproid, Oid collation,
 double
 ineq_histogram_selectivity(PlannerInfo *root,
                            VariableStatData *vardata,
-                           FmgrInfo *opproc, bool isgt, bool iseq,
+                           Oid opoid, FmgrInfo *opproc, bool isgt, bool iseq,
                            Oid collation,
                            Datum constval, Oid consttype)
 {
@@ -1057,7 +1064,9 @@ ineq_histogram_selectivity(PlannerInfo *root,
                          STATISTIC_KIND_HISTOGRAM, InvalidOid,
                          ATTSTATSSLOT_VALUES))
     {
-        if (sslot.nvalues > 1)
+        if (sslot.nvalues > 1 &&
+            sslot.stacoll == collation &&
+            comparison_ops_are_compatible(sslot.staop, opoid))
         {
             /*
              * Use binary search to find the desired location, namely the
@@ -1332,6 +1341,49 @@ ineq_histogram_selectivity(PlannerInfo *root,
                     hist_selec = 1.0 - cutoff;
             }
         }
+        else if (sslot.nvalues > 1)
+        {
+            /*
+             * If we get here, we have a histogram but it's not sorted the way
+             * we want.  Do a brute-force search to see how many of the
+             * entries satisfy the comparison condition, and take that
+             * fraction as our estimate.  (This is identical to the inner loop
+             * of histogram_selectivity; maybe share code?)
+             */
+            LOCAL_FCINFO(fcinfo, 2);
+            int            nmatch = 0;
+
+            InitFunctionCallInfoData(*fcinfo, opproc, 2, collation,
+                                     NULL, NULL);
+            fcinfo->args[0].isnull = false;
+            fcinfo->args[1].isnull = false;
+            fcinfo->args[1].value = constval;
+            for (int i = 0; i < sslot.nvalues; i++)
+            {
+                Datum        fresult;
+
+                fcinfo->args[0].value = sslot.values[i];
+                fcinfo->isnull = false;
+                fresult = FunctionCallInvoke(fcinfo);
+                if (!fcinfo->isnull && DatumGetBool(fresult))
+                    nmatch++;
+            }
+            hist_selec = ((double) nmatch) / ((double) sslot.nvalues);
+
+            /*
+             * As above, clamp to a hundredth of the histogram resolution.
+             * This case is surely even less trustworthy than the normal one,
+             * so we shouldn't believe exact 0 or 1 selectivity.
+             */
+            {
+                double        cutoff = 0.01 / (double) (sslot.nvalues - 1);
+
+                if (hist_selec < cutoff)
+                    hist_selec = cutoff;
+                else if (hist_selec > 1.0 - cutoff)
+                    hist_selec = 1.0 - cutoff;
+            }
+        }

         free_attstatsslot(&sslot);
     }
@@ -5363,8 +5415,8 @@ get_variable_range(PlannerInfo *root, VariableStatData *vardata,
     int16        typLen;
     bool        typByVal;
     Oid            opfuncoid;
+    FmgrInfo    opproc;
     AttStatsSlot sslot;
-    int            i;

     /*
      * XXX It's very tempting to try to use the actual column min and max, if
@@ -5395,20 +5447,19 @@ get_variable_range(PlannerInfo *root, VariableStatData *vardata,
                                        (opfuncoid = get_opcode(sortop))))
         return false;

+    opproc.fn_oid = InvalidOid; /* mark this as not looked up yet */
+
     get_typlenbyval(vardata->atttype, &typLen, &typByVal);

     /*
-     * If there is a histogram, grab the first and last values.
-     *
-     * If there is a histogram that is sorted with some other operator than
-     * the one we want, fail --- this suggests that there is data we can't
-     * use.  XXX consider collation too.
+     * If there is a histogram with the ordering we want, grab the first and
+     * last values.
      */
     if (get_attstatsslot(&sslot, vardata->statsTuple,
                          STATISTIC_KIND_HISTOGRAM, sortop,
                          ATTSTATSSLOT_VALUES))
     {
-        if (sslot.nvalues > 0)
+        if (sslot.stacoll == collation && sslot.nvalues > 0)
         {
             tmin = datumCopy(sslot.values[0], typByVal, typLen);
             tmax = datumCopy(sslot.values[sslot.nvalues - 1], typByVal, typLen);
@@ -5416,57 +5467,36 @@ get_variable_range(PlannerInfo *root, VariableStatData *vardata,
         }
         free_attstatsslot(&sslot);
     }
-    else if (get_attstatsslot(&sslot, vardata->statsTuple,
-                              STATISTIC_KIND_HISTOGRAM, InvalidOid,
-                              0))
+
+    /*
+     * Otherwise, if there is a histogram with some other ordering, scan it
+     * and get the min and max values according to the ordering we want.  This
+     * of course may not find values that are really extremal according to our
+     * ordering, but it beats ignoring available data.
+     */
+    if (!have_data &&
+        get_attstatsslot(&sslot, vardata->statsTuple,
+                         STATISTIC_KIND_HISTOGRAM, InvalidOid,
+                         ATTSTATSSLOT_VALUES))
     {
+        get_stats_slot_range(&sslot, opfuncoid, &opproc,
+                             collation, typLen, typByVal,
+                             &tmin, &tmax, &have_data);
         free_attstatsslot(&sslot);
-        return false;
     }

     /*
      * If we have most-common-values info, look for extreme MCVs.  This is
      * needed even if we also have a histogram, since the histogram excludes
-     * the MCVs.  However, usually the MCVs will not be the extreme values, so
-     * avoid unnecessary data copying.
+     * the MCVs.
      */
     if (get_attstatsslot(&sslot, vardata->statsTuple,
                          STATISTIC_KIND_MCV, InvalidOid,
                          ATTSTATSSLOT_VALUES))
     {
-        bool        tmin_is_mcv = false;
-        bool        tmax_is_mcv = false;
-        FmgrInfo    opproc;
-
-        fmgr_info(opfuncoid, &opproc);
-
-        for (i = 0; i < sslot.nvalues; i++)
-        {
-            if (!have_data)
-            {
-                tmin = tmax = sslot.values[i];
-                tmin_is_mcv = tmax_is_mcv = have_data = true;
-                continue;
-            }
-            if (DatumGetBool(FunctionCall2Coll(&opproc,
-                                               collation,
-                                               sslot.values[i], tmin)))
-            {
-                tmin = sslot.values[i];
-                tmin_is_mcv = true;
-            }
-            if (DatumGetBool(FunctionCall2Coll(&opproc,
-                                               collation,
-                                               tmax, sslot.values[i])))
-            {
-                tmax = sslot.values[i];
-                tmax_is_mcv = true;
-            }
-        }
-        if (tmin_is_mcv)
-            tmin = datumCopy(tmin, typByVal, typLen);
-        if (tmax_is_mcv)
-            tmax = datumCopy(tmax, typByVal, typLen);
+        get_stats_slot_range(&sslot, opfuncoid, &opproc,
+                             collation, typLen, typByVal,
+                             &tmin, &tmax, &have_data);
         free_attstatsslot(&sslot);
     }

@@ -5475,6 +5505,61 @@ get_variable_range(PlannerInfo *root, VariableStatData *vardata,
     return have_data;
 }

+/*
+ * get_stats_slot_range: scan sslot for min/max values
+ *
+ * Subroutine for get_variable_range.
+ */
+static void
+get_stats_slot_range(AttStatsSlot *sslot, Oid opfuncoid, FmgrInfo *opproc,
+                     Oid collation, int16 typLen, bool typByVal,
+                     Datum *min, Datum *max, bool *p_have_data)
+{
+    Datum        tmin = *min;
+    Datum        tmax = *max;
+    bool        have_data = *p_have_data;
+    bool        found_tmin = false;
+    bool        found_tmax = false;
+
+    /* Look up the comparison function, if we didn't already do so */
+    if (opproc->fn_oid != opfuncoid)
+        fmgr_info(opfuncoid, opproc);
+
+    /* Scan all the slot's values */
+    for (int i = 0; i < sslot->nvalues; i++)
+    {
+        if (!have_data)
+        {
+            tmin = tmax = sslot->values[i];
+            found_tmin = found_tmax = true;
+            *p_have_data = have_data = true;
+            continue;
+        }
+        if (DatumGetBool(FunctionCall2Coll(opproc,
+                                           collation,
+                                           sslot->values[i], tmin)))
+        {
+            tmin = sslot->values[i];
+            found_tmin = true;
+        }
+        if (DatumGetBool(FunctionCall2Coll(opproc,
+                                           collation,
+                                           tmax, sslot->values[i])))
+        {
+            tmax = sslot->values[i];
+            found_tmax = true;
+        }
+    }
+
+    /*
+     * Copy the slot's values, if we found new extreme values.
+     */
+    if (found_tmin)
+        *min = datumCopy(tmin, typByVal, typLen);
+    if (found_tmax)
+        *max = datumCopy(tmax, typByVal, typLen);
+}
+

 /*
  * get_actual_variable_range
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index 63d1263502..f3bf413829 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -731,6 +731,55 @@ equality_ops_are_compatible(Oid opno1, Oid opno2)
     return result;
 }

+/*
+ * comparison_ops_are_compatible
+ *        Return true if the two given comparison operators have compatible
+ *        semantics.
+ *
+ * This is trivially true if they are the same operator.  Otherwise,
+ * we look to see if they can be found in the same btree opfamily.
+ * For example, '<' and '>=' ops match if they belong to the same family.
+ *
+ * (This is identical to equality_ops_are_compatible(), except that we
+ * don't bother to examine hash opclasses.)
+ */
+bool
+comparison_ops_are_compatible(Oid opno1, Oid opno2)
+{
+    bool        result;
+    CatCList   *catlist;
+    int            i;
+
+    /* Easy if they're the same operator */
+    if (opno1 == opno2)
+        return true;
+
+    /*
+     * We search through all the pg_amop entries for opno1.
+     */
+    catlist = SearchSysCacheList1(AMOPOPID, ObjectIdGetDatum(opno1));
+
+    result = false;
+    for (i = 0; i < catlist->n_members; i++)
+    {
+        HeapTuple    op_tuple = &catlist->members[i]->tuple;
+        Form_pg_amop op_form = (Form_pg_amop) GETSTRUCT(op_tuple);
+
+        if (op_form->amopmethod == BTREE_AM_OID)
+        {
+            if (op_in_opfamily(opno2, op_form->amopfamily))
+            {
+                result = true;
+                break;
+            }
+        }
+    }
+
+    ReleaseSysCacheList(catlist);
+
+    return result;
+}
+

 /*                ---------- AMPROC CACHES ----------                         */

@@ -3028,19 +3077,6 @@ get_attstatsslot(AttStatsSlot *sslot, HeapTuple statstuple,
     sslot->staop = (&stats->staop1)[i];
     sslot->stacoll = (&stats->stacoll1)[i];

-    /*
-     * XXX Hopefully-temporary hack: if stacoll isn't set, inject the default
-     * collation.  This won't matter for non-collation-aware datatypes.  For
-     * those that are, this covers cases where stacoll has not been set.  In
-     * the short term we need this because some code paths involving type NAME
-     * do not pass any collation to prefix_selectivity and related functions.
-     * Even when that's been fixed, it's likely that some add-on typanalyze
-     * functions won't get the word right away about filling stacoll during
-     * ANALYZE, so we'll probably need this for awhile.
-     */
-    if (sslot->stacoll == InvalidOid)
-        sslot->stacoll = DEFAULT_COLLATION_OID;
-
     if (flags & ATTSTATSSLOT_VALUES)
     {
         val = SysCacheGetAttr(STATRELATTINH, statstuple,
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index 91aed1f5a5..fecfe1f4f6 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -82,6 +82,7 @@ extern bool get_op_hash_functions(Oid opno,
                                   RegProcedure *lhs_procno, RegProcedure *rhs_procno);
 extern List *get_op_btree_interpretation(Oid opno);
 extern bool equality_ops_are_compatible(Oid opno1, Oid opno2);
+extern bool comparison_ops_are_compatible(Oid opno1, Oid opno2);
 extern Oid    get_opfamily_proc(Oid opfamily, Oid lefttype, Oid righttype,
                               int16 procnum);
 extern char *get_attname(Oid relid, AttrNumber attnum, bool missing_ok);
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
index 15d2289024..7ac4a06391 100644
--- a/src/include/utils/selfuncs.h
+++ b/src/include/utils/selfuncs.h
@@ -159,7 +159,8 @@ extern double generic_restriction_selectivity(PlannerInfo *root,
                                               double default_selectivity);
 extern double ineq_histogram_selectivity(PlannerInfo *root,
                                          VariableStatData *vardata,
-                                         FmgrInfo *opproc, bool isgt, bool iseq,
+                                         Oid opoid, FmgrInfo *opproc,
+                                         bool isgt, bool iseq,
                                          Oid collation,
                                          Datum constval, Oid consttype);
 extern double var_eq_const(VariableStatData *vardata,
diff --git a/src/test/regress/expected/privileges.out b/src/test/regress/expected/privileges.out
index c2d037b614..7caf0c9b6b 100644
--- a/src/test/regress/expected/privileges.out
+++ b/src/test/regress/expected/privileges.out
@@ -191,7 +191,10 @@ CREATE TABLE atest12 as
   SELECT x AS a, 10001 - x AS b FROM generate_series(1,10000) x;
 CREATE INDEX ON atest12 (a);
 CREATE INDEX ON atest12 (abs(a));
+-- results below depend on having quite accurate stats for atest12
+SET default_statistics_target = 10000;
 VACUUM ANALYZE atest12;
+RESET default_statistics_target;
 CREATE FUNCTION leak(integer,integer) RETURNS boolean
   AS $$begin return $1 < $2; end$$
   LANGUAGE plpgsql immutable;
diff --git a/src/test/regress/sql/privileges.sql b/src/test/regress/sql/privileges.sql
index 2ba69617dc..0ab5245b1e 100644
--- a/src/test/regress/sql/privileges.sql
+++ b/src/test/regress/sql/privileges.sql
@@ -136,7 +136,10 @@ CREATE TABLE atest12 as
   SELECT x AS a, 10001 - x AS b FROM generate_series(1,10000) x;
 CREATE INDEX ON atest12 (a);
 CREATE INDEX ON atest12 (abs(a));
+-- results below depend on having quite accurate stats for atest12
+SET default_statistics_target = 10000;
 VACUUM ANALYZE atest12;
+RESET default_statistics_target;

 CREATE FUNCTION leak(integer,integer) RETURNS boolean
   AS $$begin return $1 < $2; end$$
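
For readers skimming the 0002 patch, here is a standalone sketch of the
fallback estimate that ineq_histogram_selectivity uses when the stored
histogram's collation or operator does not match the query's: count the
histogram entries that satisfy the comparison, take that fraction, and
clamp it away from exact 0 and 1.  This is not part of the patch.  The
names value_lt_const and brute_force_hist_selec are hypothetical, plain
doubles stand in for Datums, and a simple "<" stands in for the
collation-aware operator call.

```c
/*
 * Hypothetical stand-in for invoking the query's operator with the
 * query's collation: here it is just "histogram value < constant".
 */
static int
value_lt_const(double value, double constval)
{
    return value < constval;
}

/*
 * Estimate selectivity from a histogram that is not sorted the way the
 * query needs.  The caller must ensure nvalues > 1, as the patch does.
 */
static double
brute_force_hist_selec(const double *values, int nvalues, double constval)
{
    int         nmatch = 0;
    double      selec;
    double      cutoff;

    /* Apply the comparison to every histogram entry in turn */
    for (int i = 0; i < nvalues; i++)
    {
        if (value_lt_const(values[i], constval))
            nmatch++;
    }
    selec = ((double) nmatch) / ((double) nvalues);

    /* Clamp to a hundredth of the histogram resolution, as in the patch */
    cutoff = 0.01 / (double) (nvalues - 1);
    if (selec < cutoff)
        selec = cutoff;
    else if (selec > 1.0 - cutoff)
        selec = 1.0 - cutoff;

    return selec;
}
```

With the unsorted entries {5, 1, 4, 2, 3} and a constant of 3.5, three of
the five entries match, so the estimate is 0.6.  A constant below or above
every entry yields the clamped bounds rather than exactly 0 or 1.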
