Thread: Less-silly selectivity for JSONB matching operators

Less-silly selectivity for JSONB matching operators

From
Tom Lane
Date:
While looking at a recent complaint about bad planning, I was
reminded that jsonb's @> and related operators use "contsel"
as their selectivity estimator.  This is really bad, because
(a) contsel is only a stub, yielding a fixed default estimate,
and (b) that default is 0.001, meaning we estimate these operators
as five times more selective than equality, which is surely pretty
silly.

There's a good model for improving this in ltree's ltreeparentsel():
for any "var OP constant" query, we can try applying the operator
to all of the column's MCV and histogram values, taking the latter
as being a random sample of the non-MCV values.  That code is
actually 100% generic except for the question of exactly what
default selectivity ought to be plugged in when we don't have stats.

Hence, the attached draft patch moves that logic into a generic
function in selfuncs.c, and then invents "matchsel" and "matchjoinsel"
generic estimators that have a default estimate of twice DEFAULT_EQ_SEL.
(I'm not especially wedded to that number, but it seemed like a
reasonable starting point.)

There were a couple of other operators that seemed to be inappropriately
using contsel, so I changed all of these to use matchsel:

 @>(tsquery,tsquery)    | tsq_mcontains
 <@(tsquery,tsquery)    | tsq_mcontained
 @@(text,text)          | ts_match_tt
 @@(text,tsquery)       | ts_match_tq
 -|-(anyrange,anyrange) | range_adjacent
 @>(jsonb,jsonb)        | jsonb_contains
 ?(jsonb,text)          | jsonb_exists
 ?|(jsonb,text[])       | jsonb_exists_any
 ?&(jsonb,text[])       | jsonb_exists_all
 <@(jsonb,jsonb)        | jsonb_contained
 @?(jsonb,jsonpath)     | jsonb_path_exists_opr
 @@(jsonb,jsonpath)     | jsonb_path_match_opr

Note: you might think that we should just shove this generic logic
into contsel itself, and maybe areasel and patternsel while at it.
However, that would be pretty useless for these functions' intended
usage with the geometric operators, because we collect neither MCV
nor histogram stats for the geometric data types, making the extra
complexity worthless.  Pending somebody putting some effort into
estimation for the geometric data types, I think we should just get
out of the business of having non-geometric types relying on these
estimators.

This patch is not complete, because I didn't look at changing
the contrib modules, and grep says at least some of them are using
contsel for non-geometric data types.  But I thought I'd put it up
for discussion at this stage.

            regards, tom lane

diff --git a/contrib/ltree/ltree_op.c b/contrib/ltree/ltree_op.c
index 070868f..2791037 100644
--- a/contrib/ltree/ltree_op.c
+++ b/contrib/ltree/ltree_op.c
@@ -559,8 +559,6 @@ ltree2text(PG_FUNCTION_ARGS)
 }


-#define DEFAULT_PARENT_SEL 0.001
-
 /*
  *    ltreeparentsel - Selectivity of parent relationship for ltree data types.
  */
@@ -571,101 +569,12 @@ ltreeparentsel(PG_FUNCTION_ARGS)
     Oid            operator = PG_GETARG_OID(1);
     List       *args = (List *) PG_GETARG_POINTER(2);
     int            varRelid = PG_GETARG_INT32(3);
-    VariableStatData vardata;
-    Node       *other;
-    bool        varonleft;
     double        selec;

-    /*
-     * If expression is not variable <@ something or something <@ variable,
-     * then punt and return a default estimate.
-     */
-    if (!get_restriction_variable(root, args, varRelid,
-                                  &vardata, &other, &varonleft))
-        PG_RETURN_FLOAT8(DEFAULT_PARENT_SEL);
-
-    /*
-     * If the something is a NULL constant, assume operator is strict and
-     * return zero, ie, operator will never return TRUE.
-     */
-    if (IsA(other, Const) &&
-        ((Const *) other)->constisnull)
-    {
-        ReleaseVariableStats(vardata);
-        PG_RETURN_FLOAT8(0.0);
-    }
-
-    if (IsA(other, Const))
-    {
-        /* Variable is being compared to a known non-null constant */
-        Datum        constval = ((Const *) other)->constvalue;
-        FmgrInfo    contproc;
-        double        mcvsum;
-        double        mcvsel;
-        double        nullfrac;
-        int            hist_size;
-
-        fmgr_info(get_opcode(operator), &contproc);
-
-        /*
-         * Is the constant "<@" to any of the column's most common values?
-         */
-        mcvsel = mcv_selectivity(&vardata, &contproc, constval, varonleft,
-                                 &mcvsum);
-
-        /*
-         * If the histogram is large enough, see what fraction of it the
-         * constant is "<@" to, and assume that's representative of the
-         * non-MCV population.  Otherwise use the default selectivity for the
-         * non-MCV population.
-         */
-        selec = histogram_selectivity(&vardata, &contproc,
-                                      constval, varonleft,
-                                      10, 1, &hist_size);
-        if (selec < 0)
-        {
-            /* Nope, fall back on default */
-            selec = DEFAULT_PARENT_SEL;
-        }
-        else if (hist_size < 100)
-        {
-            /*
-             * For histogram sizes from 10 to 100, we combine the histogram
-             * and default selectivities, putting increasingly more trust in
-             * the histogram for larger sizes.
-             */
-            double        hist_weight = hist_size / 100.0;
-
-            selec = selec * hist_weight +
-                DEFAULT_PARENT_SEL * (1.0 - hist_weight);
-        }
-
-        /* In any case, don't believe extremely small or large estimates. */
-        if (selec < 0.0001)
-            selec = 0.0001;
-        else if (selec > 0.9999)
-            selec = 0.9999;
-
-        if (HeapTupleIsValid(vardata.statsTuple))
-            nullfrac = ((Form_pg_statistic) GETSTRUCT(vardata.statsTuple))->stanullfrac;
-        else
-            nullfrac = 0.0;
-
-        /*
-         * Now merge the results from the MCV and histogram calculations,
-         * realizing that the histogram covers only the non-null values that
-         * are not listed in MCV.
-         */
-        selec *= 1.0 - nullfrac - mcvsum;
-        selec += mcvsel;
-    }
-    else
-        selec = DEFAULT_PARENT_SEL;
-
-    ReleaseVariableStats(vardata);
-
-    /* result should be in range, but make sure... */
-    CLAMP_PROBABILITY(selec);
+    /* Use generic restriction selectivity logic, with default 0.001. */
+    selec = generic_restriction_selectivity(root, operator,
+                                            args, varRelid,
+                                            0.001);

     PG_RETURN_FLOAT8((float8) selec);
 }
diff --git a/doc/src/sgml/xoper.sgml b/doc/src/sgml/xoper.sgml
index 132056f..610545a 100644
--- a/doc/src/sgml/xoper.sgml
+++ b/doc/src/sgml/xoper.sgml
@@ -283,6 +283,18 @@ column OP constant
    </para>

    <para>
+    Another useful built-in selectivity estimation function
+    is <function>matchsel</function>, which will work for almost any
+    binary operator, if standard MCV and/or histogram statistics are
+    collected for the input data type(s).  Its default estimate is set to
+    twice the default estimate used in <function>eqsel</function>, making
+    it most suitable for comparison operators that are somewhat less
+    strict than equality.  (Or you could call the
+    underlying <function>generic_restriction_selectivity</function>
+    function, providing a different default estimate.)
+   </para>
+
+   <para>
     There are additional selectivity estimation functions designed for geometric
     operators in <filename>src/backend/utils/adt/geo_selfuncs.c</filename>: <function>areasel</function>,
<function>positionsel</function>,
     and <function>contsel</function>.  At this writing these are just stubs, but you might want
@@ -319,6 +331,7 @@ table1.column1 OP table2.column2
       <member><function>scalarlejoinsel</function> for <literal><=</literal></member>
       <member><function>scalargtjoinsel</function> for <literal>></literal></member>
       <member><function>scalargejoinsel</function> for <literal>>=</literal></member>
+      <member><function>matchjoinsel</function> for generic matching operators</member>
       <member><function>areajoinsel</function> for 2D area-based comparisons</member>
       <member><function>positionjoinsel</function> for 2D position-based comparisons</member>
       <member><function>contjoinsel</function> for 2D containment-based comparisons</member>
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 0be26fe..2e68338 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -829,6 +829,132 @@ histogram_selectivity(VariableStatData *vardata, FmgrInfo *opproc,
 }

 /*
+ *    generic_restriction_selectivity        - Selectivity for almost anything
+ *
+ * This function estimates selectivity for operators that we don't have any
+ * special knowledge about, but are on data types that we collect standard
+ * MCV and/or histogram statistics for.  (Additional assumptions are that
+ * the operator is strict and immutable, or at least stable.)
+ *
+ * If we have "VAR OP CONST" or "CONST OP VAR", selectivity is estimated by
+ * applying the operator to each element of the column's MCV and/or histogram
+ * stats, and merging the results using the assumption that the histogram is
+ * a reasonable random sample of the column's non-MCV population.  Note that
+ * if the operator's semantics are related to the histogram ordering, this
+ * might not be such a great assumption; other functions such as
+ * scalarineqsel() are probably a better match in such cases.
+ *
+ * Otherwise, fall back to the default selectivity provided by the caller.
+ */
+double
+generic_restriction_selectivity(PlannerInfo *root, Oid operator,
+                                List *args, int varRelid,
+                                double default_selectivity)
+{
+    double        selec;
+    VariableStatData vardata;
+    Node       *other;
+    bool        varonleft;
+
+    /*
+     * If expression is not variable OP something or something OP variable,
+     * then punt and return the default estimate.
+     */
+    if (!get_restriction_variable(root, args, varRelid,
+                                  &vardata, &other, &varonleft))
+        return default_selectivity;
+
+    /*
+     * If the something is a NULL constant, assume operator is strict and
+     * return zero, ie, operator will never return TRUE.
+     */
+    if (IsA(other, Const) &&
+        ((Const *) other)->constisnull)
+    {
+        ReleaseVariableStats(vardata);
+        return 0.0;
+    }
+
+    if (IsA(other, Const))
+    {
+        /* Variable is being compared to a known non-null constant */
+        Datum        constval = ((Const *) other)->constvalue;
+        FmgrInfo    opproc;
+        double        mcvsum;
+        double        mcvsel;
+        double        nullfrac;
+        int            hist_size;
+
+        fmgr_info(get_opcode(operator), &opproc);
+
+        /*
+         * Calculate the selectivity for the column's most common values.
+         */
+        mcvsel = mcv_selectivity(&vardata, &opproc, constval, varonleft,
+                                 &mcvsum);
+
+        /*
+         * If the histogram is large enough, see what fraction of it matches
+         * the query, and assume that's representative of the non-MCV
+         * population.  Otherwise use the default selectivity for the non-MCV
+         * population.
+         */
+        selec = histogram_selectivity(&vardata, &opproc,
+                                      constval, varonleft,
+                                      10, 1, &hist_size);
+        if (selec < 0)
+        {
+            /* Nope, fall back on default */
+            selec = default_selectivity;
+        }
+        else if (hist_size < 100)
+        {
+            /*
+             * For histogram sizes from 10 to 100, we combine the histogram
+             * and default selectivities, putting increasingly more trust in
+             * the histogram for larger sizes.
+             */
+            double        hist_weight = hist_size / 100.0;
+
+            selec = selec * hist_weight +
+                default_selectivity * (1.0 - hist_weight);
+        }
+
+        /* In any case, don't believe extremely small or large estimates. */
+        if (selec < 0.0001)
+            selec = 0.0001;
+        else if (selec > 0.9999)
+            selec = 0.9999;
+
+        /* Don't forget to account for nulls. */
+        if (HeapTupleIsValid(vardata.statsTuple))
+            nullfrac = ((Form_pg_statistic) GETSTRUCT(vardata.statsTuple))->stanullfrac;
+        else
+            nullfrac = 0.0;
+
+        /*
+         * Now merge the results from the MCV and histogram calculations,
+         * realizing that the histogram covers only the non-null values that
+         * are not listed in MCV.
+         */
+        selec *= 1.0 - nullfrac - mcvsum;
+        selec += mcvsel;
+    }
+    else
+    {
+        /* Comparison value is not constant, so we can't do anything */
+        selec = default_selectivity;
+    }
+
+    ReleaseVariableStats(vardata);
+
+    /* result should be in range, but make sure... */
+    CLAMP_PROBABILITY(selec);
+
+    return selec;
+}
+
+/*
  *    ineq_histogram_selectivity    - Examine the histogram for scalarineqsel
  *
  * Determine the fraction of the variable's histogram population that
@@ -2916,6 +3042,40 @@ fail:


 /*
+ *    matchsel -- generic matching-operator selectivity support
+ *
+ * Use these for any operators that (a) are on data types for which we collect
+ * standard statistics, and (b) have behavior for which the default estimate
+ * of 2*DEFAULT_EQ_SEL is sane.  Typically that is good for match-like
+ * operators.
+ */
+
+Datum
+matchsel(PG_FUNCTION_ARGS)
+{
+    PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+    Oid            operator = PG_GETARG_OID(1);
+    List       *args = (List *) PG_GETARG_POINTER(2);
+    int            varRelid = PG_GETARG_INT32(3);
+    double        selec;
+
+    /* Use generic restriction selectivity logic. */
+    selec = generic_restriction_selectivity(root, operator,
+                                            args, varRelid,
+                                            2 * DEFAULT_EQ_SEL);
+
+    PG_RETURN_FLOAT8((float8) selec);
+}
+
+Datum
+matchjoinsel(PG_FUNCTION_ARGS)
+{
+    /* Just punt, for the moment. */
+    PG_RETURN_FLOAT8(2 * DEFAULT_EQ_SEL);
+}
+
+
+/*
  * Helper routine for estimate_num_groups: add an item to a list of
  * GroupVarInfos, but only if it's not known equal to any of the existing
  * entries.
diff --git a/src/include/catalog/pg_operator.dat b/src/include/catalog/pg_operator.dat
index 7c135da..6c9bbb3 100644
--- a/src/include/catalog/pg_operator.dat
+++ b/src/include/catalog/pg_operator.dat
@@ -3016,18 +3016,18 @@
 { oid => '3693', descr => 'contains',
   oprname => '@>', oprleft => 'tsquery', oprright => 'tsquery',
   oprresult => 'bool', oprcom => '<@(tsquery,tsquery)',
-  oprcode => 'tsq_mcontains', oprrest => 'contsel', oprjoin => 'contjoinsel' },
+  oprcode => 'tsq_mcontains', oprrest => 'matchsel', oprjoin => 'matchjoinsel' },
 { oid => '3694', descr => 'is contained by',
   oprname => '<@', oprleft => 'tsquery', oprright => 'tsquery',
   oprresult => 'bool', oprcom => '@>(tsquery,tsquery)',
-  oprcode => 'tsq_mcontained', oprrest => 'contsel', oprjoin => 'contjoinsel' },
+  oprcode => 'tsq_mcontained', oprrest => 'matchsel', oprjoin => 'matchjoinsel' },
 { oid => '3762', descr => 'text search match',
   oprname => '@@', oprleft => 'text', oprright => 'text', oprresult => 'bool',
-  oprcode => 'ts_match_tt', oprrest => 'contsel', oprjoin => 'contjoinsel' },
+  oprcode => 'ts_match_tt', oprrest => 'matchsel', oprjoin => 'matchjoinsel' },
 { oid => '3763', descr => 'text search match',
   oprname => '@@', oprleft => 'text', oprright => 'tsquery',
-  oprresult => 'bool', oprcode => 'ts_match_tq', oprrest => 'contsel',
-  oprjoin => 'contjoinsel' },
+  oprresult => 'bool', oprcode => 'ts_match_tq', oprrest => 'matchsel',
+  oprjoin => 'matchjoinsel' },

 # generic record comparison operators
 { oid => '2988', oid_symbol => 'RECORD_EQ_OP', descr => 'equal',
@@ -3178,7 +3178,7 @@
 { oid => '3897', descr => 'is adjacent to',
   oprname => '-|-', oprleft => 'anyrange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '-|-(anyrange,anyrange)',
-  oprcode => 'range_adjacent', oprrest => 'contsel', oprjoin => 'contjoinsel' },
+  oprcode => 'range_adjacent', oprrest => 'matchsel', oprjoin => 'matchjoinsel' },
 { oid => '3898', descr => 'range union',
   oprname => '+', oprleft => 'anyrange', oprright => 'anyrange',
   oprresult => 'anyrange', oprcom => '+(anyrange,anyrange)',
@@ -3258,22 +3258,22 @@
 { oid => '3246', descr => 'contains',
   oprname => '@>', oprleft => 'jsonb', oprright => 'jsonb', oprresult => 'bool',
   oprcom => '<@(jsonb,jsonb)', oprcode => 'jsonb_contains',
-  oprrest => 'contsel', oprjoin => 'contjoinsel' },
+  oprrest => 'matchsel', oprjoin => 'matchjoinsel' },
 { oid => '3247', descr => 'key exists',
   oprname => '?', oprleft => 'jsonb', oprright => 'text', oprresult => 'bool',
-  oprcode => 'jsonb_exists', oprrest => 'contsel', oprjoin => 'contjoinsel' },
+  oprcode => 'jsonb_exists', oprrest => 'matchsel', oprjoin => 'matchjoinsel' },
 { oid => '3248', descr => 'any key exists',
   oprname => '?|', oprleft => 'jsonb', oprright => '_text', oprresult => 'bool',
-  oprcode => 'jsonb_exists_any', oprrest => 'contsel',
-  oprjoin => 'contjoinsel' },
+  oprcode => 'jsonb_exists_any', oprrest => 'matchsel',
+  oprjoin => 'matchjoinsel' },
 { oid => '3249', descr => 'all keys exist',
   oprname => '?&', oprleft => 'jsonb', oprright => '_text', oprresult => 'bool',
-  oprcode => 'jsonb_exists_all', oprrest => 'contsel',
-  oprjoin => 'contjoinsel' },
+  oprcode => 'jsonb_exists_all', oprrest => 'matchsel',
+  oprjoin => 'matchjoinsel' },
 { oid => '3250', descr => 'is contained by',
   oprname => '<@', oprleft => 'jsonb', oprright => 'jsonb', oprresult => 'bool',
   oprcom => '@>(jsonb,jsonb)', oprcode => 'jsonb_contained',
-  oprrest => 'contsel', oprjoin => 'contjoinsel' },
+  oprrest => 'matchsel', oprjoin => 'matchjoinsel' },
 { oid => '3284', descr => 'concatenate',
   oprname => '||', oprleft => 'jsonb', oprright => 'jsonb',
   oprresult => 'jsonb', oprcode => 'jsonb_concat' },
@@ -3292,10 +3292,10 @@
 { oid => '4012', descr => 'jsonpath exists',
   oprname => '@?', oprleft => 'jsonb', oprright => 'jsonpath',
   oprresult => 'bool', oprcode => 'jsonb_path_exists_opr(jsonb,jsonpath)',
-  oprrest => 'contsel', oprjoin => 'contjoinsel' },
+  oprrest => 'matchsel', oprjoin => 'matchjoinsel' },
 { oid => '4013', descr => 'jsonpath match',
   oprname => '@@', oprleft => 'jsonb', oprright => 'jsonpath',
   oprresult => 'bool', oprcode => 'jsonb_path_match_opr(jsonb,jsonpath)',
-  oprrest => 'contsel', oprjoin => 'contjoinsel' },
+  oprrest => 'matchsel', oprjoin => 'matchjoinsel' },

 ]
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 07a86c7..088d106 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -10574,6 +10574,15 @@
   prosrc => 'shift_jis_2004_to_euc_jis_2004',
   probin => '$libdir/euc2004_sjis2004' },

+{ oid => '8387',
+  descr => 'restriction selectivity for generic matching operators',
+  proname => 'matchsel', provolatile => 's', prorettype => 'float8',
+  proargtypes => 'internal oid internal int4', prosrc => 'matchsel' },
+{ oid => '8388', descr => 'join selectivity for generic matching operators',
+  proname => 'matchjoinsel', provolatile => 's', prorettype => 'float8',
+  proargtypes => 'internal oid internal int2 internal',
+  prosrc => 'matchjoinsel' },
+
 # replication/origin.h
 { oid => '6003', descr => 'create a replication origin',
   proname => 'pg_replication_origin_create', provolatile => 'v',
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
index 1c9570f..faa01cf 100644
--- a/src/include/utils/selfuncs.h
+++ b/src/include/utils/selfuncs.h
@@ -148,6 +148,9 @@ extern double histogram_selectivity(VariableStatData *vardata, FmgrInfo *opproc,
                                     Datum constval, bool varonleft,
                                     int min_hist_size, int n_skip,
                                     int *hist_size);
+extern double generic_restriction_selectivity(PlannerInfo *root, Oid operator,
+                                              List *args, int varRelid,
+                                              double default_selectivity);
 extern double ineq_histogram_selectivity(PlannerInfo *root,
                                          VariableStatData *vardata,
                                          FmgrInfo *opproc, bool isgt, bool iseq,

Re: Less-silly selectivity for JSONB matching operators

From
Tom Lane
Date:
I wrote:
> This patch is not complete, because I didn't look at changing
> the contrib modules, and grep says at least some of them are using
> contsel for non-geometric data types.  But I thought I'd put it up
> for discussion at this stage.

Hearing nothing, I went ahead and hacked on the contrib code.
The attached 0002 updates hstore, ltree, and pg_trgm to get them
out of using contsel/contjoinsel for anything.  (0001 is the same
patch I posted before.)

In ltree, I noted that < <= >= > were using contsel even though
those are part of a btree opclass, meaning they could perfectly
well use scalarltsel and friends.  So now they do.  Everything
else now uses matchsel/matchjoinsel, leaving ltreeparentsel as
an unused backward-compatibility feature.  I didn't think that
the default selectivity in ltreeparentsel was particularly sane,
so having those operators use their own selectivity logic
instead of using matchsel like everything else seemed pointless
(and certainly pairing a custom ltreeparentsel with contjoinsel
isn't something to encourage).

In pg_trgm, the change of default selectivity estimate causes one
plan to change, but I think that's fine; looking at the data hidden
by COSTS OFF shows the new estimate is closer to reality anyway.
(That test is meant to exercise some gist consistent-function logic,
which it still does, so no worries there.)

The cube and seg extensions still make significant use of contsel and
the other geometric estimator stubs.  Although we could in principle
change those operators to use matchsel, I'm hesitant to do so without
closer analysis.  The sort orderings imposed by their default btree
opclasses correlate strongly with cube/seg size, which is related to
overlap/containment outcomes, so I'm not sure that the histogram
entries would provide a plausibly random sample for this purpose.
So those modules are not touched here.

There are a few other random uses of geometric join estimators
paired with non-geometric restriction estimators, including
these in the core core:

 @>(anyrange,anyelement) | range_contains_elem     | rangesel | contjoinsel
 @>(anyrange,anyrange)   | range_contains          | rangesel | contjoinsel
 <@(anyelement,anyrange) | elem_contained_by_range | rangesel | contjoinsel
 <@(anyrange,anyrange)   | range_contained_by      | rangesel | contjoinsel
 &&(anyrange,anyrange)   | range_overlaps          | rangesel | areajoinsel

plus the @@ and ~~ operators in intarray.  While this is ugly,
it's probably not worth changing until somebody creates non-stub
join selectivity code that will work for these cases.

            regards, tom lane

diff --git a/contrib/ltree/ltree_op.c b/contrib/ltree/ltree_op.c
index 070868f..2791037 100644
--- a/contrib/ltree/ltree_op.c
+++ b/contrib/ltree/ltree_op.c
@@ -559,8 +559,6 @@ ltree2text(PG_FUNCTION_ARGS)
 }


-#define DEFAULT_PARENT_SEL 0.001
-
 /*
  *    ltreeparentsel - Selectivity of parent relationship for ltree data types.
  */
@@ -571,101 +569,12 @@ ltreeparentsel(PG_FUNCTION_ARGS)
     Oid            operator = PG_GETARG_OID(1);
     List       *args = (List *) PG_GETARG_POINTER(2);
     int            varRelid = PG_GETARG_INT32(3);
-    VariableStatData vardata;
-    Node       *other;
-    bool        varonleft;
     double        selec;

-    /*
-     * If expression is not variable <@ something or something <@ variable,
-     * then punt and return a default estimate.
-     */
-    if (!get_restriction_variable(root, args, varRelid,
-                                  &vardata, &other, &varonleft))
-        PG_RETURN_FLOAT8(DEFAULT_PARENT_SEL);
-
-    /*
-     * If the something is a NULL constant, assume operator is strict and
-     * return zero, ie, operator will never return TRUE.
-     */
-    if (IsA(other, Const) &&
-        ((Const *) other)->constisnull)
-    {
-        ReleaseVariableStats(vardata);
-        PG_RETURN_FLOAT8(0.0);
-    }
-
-    if (IsA(other, Const))
-    {
-        /* Variable is being compared to a known non-null constant */
-        Datum        constval = ((Const *) other)->constvalue;
-        FmgrInfo    contproc;
-        double        mcvsum;
-        double        mcvsel;
-        double        nullfrac;
-        int            hist_size;
-
-        fmgr_info(get_opcode(operator), &contproc);
-
-        /*
-         * Is the constant "<@" to any of the column's most common values?
-         */
-        mcvsel = mcv_selectivity(&vardata, &contproc, constval, varonleft,
-                                 &mcvsum);
-
-        /*
-         * If the histogram is large enough, see what fraction of it the
-         * constant is "<@" to, and assume that's representative of the
-         * non-MCV population.  Otherwise use the default selectivity for the
-         * non-MCV population.
-         */
-        selec = histogram_selectivity(&vardata, &contproc,
-                                      constval, varonleft,
-                                      10, 1, &hist_size);
-        if (selec < 0)
-        {
-            /* Nope, fall back on default */
-            selec = DEFAULT_PARENT_SEL;
-        }
-        else if (hist_size < 100)
-        {
-            /*
-             * For histogram sizes from 10 to 100, we combine the histogram
-             * and default selectivities, putting increasingly more trust in
-             * the histogram for larger sizes.
-             */
-            double        hist_weight = hist_size / 100.0;
-
-            selec = selec * hist_weight +
-                DEFAULT_PARENT_SEL * (1.0 - hist_weight);
-        }
-
-        /* In any case, don't believe extremely small or large estimates. */
-        if (selec < 0.0001)
-            selec = 0.0001;
-        else if (selec > 0.9999)
-            selec = 0.9999;
-
-        if (HeapTupleIsValid(vardata.statsTuple))
-            nullfrac = ((Form_pg_statistic) GETSTRUCT(vardata.statsTuple))->stanullfrac;
-        else
-            nullfrac = 0.0;
-
-        /*
-         * Now merge the results from the MCV and histogram calculations,
-         * realizing that the histogram covers only the non-null values that
-         * are not listed in MCV.
-         */
-        selec *= 1.0 - nullfrac - mcvsum;
-        selec += mcvsel;
-    }
-    else
-        selec = DEFAULT_PARENT_SEL;
-
-    ReleaseVariableStats(vardata);
-
-    /* result should be in range, but make sure... */
-    CLAMP_PROBABILITY(selec);
+    /* Use generic restriction selectivity logic, with default 0.001. */
+    selec = generic_restriction_selectivity(root, operator,
+                                            args, varRelid,
+                                            0.001);

     PG_RETURN_FLOAT8((float8) selec);
 }
diff --git a/doc/src/sgml/xoper.sgml b/doc/src/sgml/xoper.sgml
index 132056f..610545a 100644
--- a/doc/src/sgml/xoper.sgml
+++ b/doc/src/sgml/xoper.sgml
@@ -283,6 +283,18 @@ column OP constant
    </para>

    <para>
+    Another useful built-in selectivity estimation function
+    is <function>matchsel</function>, which will work for almost any
+    binary operator, if standard MCV and/or histogram statistics are
+    collected for the input data type(s).  Its default estimate is set to
+    twice the default estimate used in <function>eqsel</function>, making
+    it most suitable for comparison operators that are somewhat less
+    strict than equality.  (Or you could call the
+    underlying <function>generic_restriction_selectivity</function>
+    function, providing a different default estimate.)
+   </para>
+
+   <para>
     There are additional selectivity estimation functions designed for geometric
     operators in <filename>src/backend/utils/adt/geo_selfuncs.c</filename>: <function>areasel</function>,
<function>positionsel</function>,
     and <function>contsel</function>.  At this writing these are just stubs, but you might want
@@ -319,6 +331,7 @@ table1.column1 OP table2.column2
       <member><function>scalarlejoinsel</function> for <literal><=</literal></member>
       <member><function>scalargtjoinsel</function> for <literal>></literal></member>
       <member><function>scalargejoinsel</function> for <literal>>=</literal></member>
+      <member><function>matchjoinsel</function> for generic matching operators</member>
       <member><function>areajoinsel</function> for 2D area-based comparisons</member>
       <member><function>positionjoinsel</function> for 2D position-based comparisons</member>
       <member><function>contjoinsel</function> for 2D containment-based comparisons</member>
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 0be26fe..2e68338 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -829,6 +829,132 @@ histogram_selectivity(VariableStatData *vardata, FmgrInfo *opproc,
 }

 /*
+ *    generic_restriction_selectivity        - Selectivity for almost anything
+ *
+ * This function estimates selectivity for operators that we don't have any
+ * special knowledge about, but are on data types that we collect standard
+ * MCV and/or histogram statistics for.  (Additional assumptions are that
+ * the operator is strict and immutable, or at least stable.)
+ *
+ * If we have "VAR OP CONST" or "CONST OP VAR", selectivity is estimated by
+ * applying the operator to each element of the column's MCV and/or histogram
+ * stats, and merging the results using the assumption that the histogram is
+ * a reasonable random sample of the column's non-MCV population.  Note that
+ * if the operator's semantics are related to the histogram ordering, this
+ * might not be such a great assumption; other functions such as
+ * scalarineqsel() are probably a better match in such cases.
+ *
+ * Otherwise, fall back to the default selectivity provided by the caller.
+ */
+double
+generic_restriction_selectivity(PlannerInfo *root, Oid operator,
+                                List *args, int varRelid,
+                                double default_selectivity)
+{
+    double        selec;
+    VariableStatData vardata;
+    Node       *other;
+    bool        varonleft;
+
+    /*
+     * If expression is not variable OP something or something OP variable,
+     * then punt and return the default estimate.
+     */
+    if (!get_restriction_variable(root, args, varRelid,
+                                  &vardata, &other, &varonleft))
+        return default_selectivity;
+
+    /*
+     * If the something is a NULL constant, assume operator is strict and
+     * return zero, ie, operator will never return TRUE.
+     */
+    if (IsA(other, Const) &&
+        ((Const *) other)->constisnull)
+    {
+        ReleaseVariableStats(vardata);
+        return 0.0;
+    }
+
+    if (IsA(other, Const))
+    {
+        /* Variable is being compared to a known non-null constant */
+        Datum        constval = ((Const *) other)->constvalue;
+        FmgrInfo    opproc;
+        double        mcvsum;
+        double        mcvsel;
+        double        nullfrac;
+        int            hist_size;
+
+        fmgr_info(get_opcode(operator), &opproc);
+
+        /*
+         * Calculate the selectivity for the column's most common values.
+         */
+        mcvsel = mcv_selectivity(&vardata, &opproc, constval, varonleft,
+                                 &mcvsum);
+
+        /*
+         * If the histogram is large enough, see what fraction of it matches
+         * the query, and assume that's representative of the non-MCV
+         * population.  Otherwise use the default selectivity for the non-MCV
+         * population.
+         */
+        selec = histogram_selectivity(&vardata, &opproc,
+                                      constval, varonleft,
+                                      10, 1, &hist_size);
+        if (selec < 0)
+        {
+            /* Nope, fall back on default */
+            selec = default_selectivity;
+        }
+        else if (hist_size < 100)
+        {
+            /*
+             * For histogram sizes from 10 to 100, we combine the histogram
+             * and default selectivities, putting increasingly more trust in
+             * the histogram for larger sizes.
+             */
+            double        hist_weight = hist_size / 100.0;
+
+            selec = selec * hist_weight +
+                default_selectivity * (1.0 - hist_weight);
+        }
+
+        /* In any case, don't believe extremely small or large estimates. */
+        if (selec < 0.0001)
+            selec = 0.0001;
+        else if (selec > 0.9999)
+            selec = 0.9999;
+
+        /* Don't forget to account for nulls. */
+        if (HeapTupleIsValid(vardata.statsTuple))
+            nullfrac = ((Form_pg_statistic) GETSTRUCT(vardata.statsTuple))->stanullfrac;
+        else
+            nullfrac = 0.0;
+
+        /*
+         * Now merge the results from the MCV and histogram calculations,
+         * realizing that the histogram covers only the non-null values that
+         * are not listed in MCV.
+         */
+        selec *= 1.0 - nullfrac - mcvsum;
+        selec += mcvsel;
+    }
+    else
+    {
+        /* Comparison value is not constant, so we can't do anything */
+        selec = default_selectivity;
+    }
+
+    ReleaseVariableStats(vardata);
+
+    /* result should be in range, but make sure... */
+    CLAMP_PROBABILITY(selec);
+
+    return selec;
+}
+
+/*
  *    ineq_histogram_selectivity    - Examine the histogram for scalarineqsel
  *
  * Determine the fraction of the variable's histogram population that
@@ -2916,6 +3042,40 @@ fail:


 /*
+ *    matchsel -- generic matching-operator selectivity support
+ *
+ * Use these for any operators that (a) are on data types for which we collect
+ * standard statistics, and (b) have behavior for which the default estimate
+ * of 2*DEFAULT_EQ_SEL is sane.  Typically that is good for match-like
+ * operators.
+ */
+
+Datum
+matchsel(PG_FUNCTION_ARGS)
+{
+    PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+    Oid            operator = PG_GETARG_OID(1);
+    List       *args = (List *) PG_GETARG_POINTER(2);
+    int            varRelid = PG_GETARG_INT32(3);
+    double        selec;
+
+    /* Use generic restriction selectivity logic. */
+    selec = generic_restriction_selectivity(root, operator,
+                                            args, varRelid,
+                                            2 * DEFAULT_EQ_SEL);
+
+    PG_RETURN_FLOAT8((float8) selec);
+}
+
+Datum
+matchjoinsel(PG_FUNCTION_ARGS)
+{
+    /* Just punt, for the moment. */
+    PG_RETURN_FLOAT8(2 * DEFAULT_EQ_SEL);
+}
+
+
+/*
  * Helper routine for estimate_num_groups: add an item to a list of
  * GroupVarInfos, but only if it's not known equal to any of the existing
  * entries.
diff --git a/src/include/catalog/pg_operator.dat b/src/include/catalog/pg_operator.dat
index 7c135da..6c9bbb3 100644
--- a/src/include/catalog/pg_operator.dat
+++ b/src/include/catalog/pg_operator.dat
@@ -3016,18 +3016,18 @@
 { oid => '3693', descr => 'contains',
   oprname => '@>', oprleft => 'tsquery', oprright => 'tsquery',
   oprresult => 'bool', oprcom => '<@(tsquery,tsquery)',
-  oprcode => 'tsq_mcontains', oprrest => 'contsel', oprjoin => 'contjoinsel' },
+  oprcode => 'tsq_mcontains', oprrest => 'matchsel', oprjoin => 'matchjoinsel' },
 { oid => '3694', descr => 'is contained by',
   oprname => '<@', oprleft => 'tsquery', oprright => 'tsquery',
   oprresult => 'bool', oprcom => '@>(tsquery,tsquery)',
-  oprcode => 'tsq_mcontained', oprrest => 'contsel', oprjoin => 'contjoinsel' },
+  oprcode => 'tsq_mcontained', oprrest => 'matchsel', oprjoin => 'matchjoinsel' },
 { oid => '3762', descr => 'text search match',
   oprname => '@@', oprleft => 'text', oprright => 'text', oprresult => 'bool',
-  oprcode => 'ts_match_tt', oprrest => 'contsel', oprjoin => 'contjoinsel' },
+  oprcode => 'ts_match_tt', oprrest => 'matchsel', oprjoin => 'matchjoinsel' },
 { oid => '3763', descr => 'text search match',
   oprname => '@@', oprleft => 'text', oprright => 'tsquery',
-  oprresult => 'bool', oprcode => 'ts_match_tq', oprrest => 'contsel',
-  oprjoin => 'contjoinsel' },
+  oprresult => 'bool', oprcode => 'ts_match_tq', oprrest => 'matchsel',
+  oprjoin => 'matchjoinsel' },

 # generic record comparison operators
 { oid => '2988', oid_symbol => 'RECORD_EQ_OP', descr => 'equal',
@@ -3178,7 +3178,7 @@
 { oid => '3897', descr => 'is adjacent to',
   oprname => '-|-', oprleft => 'anyrange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '-|-(anyrange,anyrange)',
-  oprcode => 'range_adjacent', oprrest => 'contsel', oprjoin => 'contjoinsel' },
+  oprcode => 'range_adjacent', oprrest => 'matchsel', oprjoin => 'matchjoinsel' },
 { oid => '3898', descr => 'range union',
   oprname => '+', oprleft => 'anyrange', oprright => 'anyrange',
   oprresult => 'anyrange', oprcom => '+(anyrange,anyrange)',
@@ -3258,22 +3258,22 @@
 { oid => '3246', descr => 'contains',
   oprname => '@>', oprleft => 'jsonb', oprright => 'jsonb', oprresult => 'bool',
   oprcom => '<@(jsonb,jsonb)', oprcode => 'jsonb_contains',
-  oprrest => 'contsel', oprjoin => 'contjoinsel' },
+  oprrest => 'matchsel', oprjoin => 'matchjoinsel' },
 { oid => '3247', descr => 'key exists',
   oprname => '?', oprleft => 'jsonb', oprright => 'text', oprresult => 'bool',
-  oprcode => 'jsonb_exists', oprrest => 'contsel', oprjoin => 'contjoinsel' },
+  oprcode => 'jsonb_exists', oprrest => 'matchsel', oprjoin => 'matchjoinsel' },
 { oid => '3248', descr => 'any key exists',
   oprname => '?|', oprleft => 'jsonb', oprright => '_text', oprresult => 'bool',
-  oprcode => 'jsonb_exists_any', oprrest => 'contsel',
-  oprjoin => 'contjoinsel' },
+  oprcode => 'jsonb_exists_any', oprrest => 'matchsel',
+  oprjoin => 'matchjoinsel' },
 { oid => '3249', descr => 'all keys exist',
   oprname => '?&', oprleft => 'jsonb', oprright => '_text', oprresult => 'bool',
-  oprcode => 'jsonb_exists_all', oprrest => 'contsel',
-  oprjoin => 'contjoinsel' },
+  oprcode => 'jsonb_exists_all', oprrest => 'matchsel',
+  oprjoin => 'matchjoinsel' },
 { oid => '3250', descr => 'is contained by',
   oprname => '<@', oprleft => 'jsonb', oprright => 'jsonb', oprresult => 'bool',
   oprcom => '@>(jsonb,jsonb)', oprcode => 'jsonb_contained',
-  oprrest => 'contsel', oprjoin => 'contjoinsel' },
+  oprrest => 'matchsel', oprjoin => 'matchjoinsel' },
 { oid => '3284', descr => 'concatenate',
   oprname => '||', oprleft => 'jsonb', oprright => 'jsonb',
   oprresult => 'jsonb', oprcode => 'jsonb_concat' },
@@ -3292,10 +3292,10 @@
 { oid => '4012', descr => 'jsonpath exists',
   oprname => '@?', oprleft => 'jsonb', oprright => 'jsonpath',
   oprresult => 'bool', oprcode => 'jsonb_path_exists_opr(jsonb,jsonpath)',
-  oprrest => 'contsel', oprjoin => 'contjoinsel' },
+  oprrest => 'matchsel', oprjoin => 'matchjoinsel' },
 { oid => '4013', descr => 'jsonpath match',
   oprname => '@@', oprleft => 'jsonb', oprright => 'jsonpath',
   oprresult => 'bool', oprcode => 'jsonb_path_match_opr(jsonb,jsonpath)',
-  oprrest => 'contsel', oprjoin => 'contjoinsel' },
+  oprrest => 'matchsel', oprjoin => 'matchjoinsel' },

 ]
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 07a86c7..088d106 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -10574,6 +10574,15 @@
   prosrc => 'shift_jis_2004_to_euc_jis_2004',
   probin => '$libdir/euc2004_sjis2004' },

+{ oid => '8387',
+  descr => 'restriction selectivity for generic matching operators',
+  proname => 'matchsel', provolatile => 's', prorettype => 'float8',
+  proargtypes => 'internal oid internal int4', prosrc => 'matchsel' },
+{ oid => '8388', descr => 'join selectivity for generic matching operators',
+  proname => 'matchjoinsel', provolatile => 's', prorettype => 'float8',
+  proargtypes => 'internal oid internal int2 internal',
+  prosrc => 'matchjoinsel' },
+
 # replication/origin.h
 { oid => '6003', descr => 'create a replication origin',
   proname => 'pg_replication_origin_create', provolatile => 'v',
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
index 1c9570f..faa01cf 100644
--- a/src/include/utils/selfuncs.h
+++ b/src/include/utils/selfuncs.h
@@ -148,6 +148,9 @@ extern double histogram_selectivity(VariableStatData *vardata, FmgrInfo *opproc,
                                     Datum constval, bool varonleft,
                                     int min_hist_size, int n_skip,
                                     int *hist_size);
+extern double generic_restriction_selectivity(PlannerInfo *root, Oid operator,
+                                              List *args, int varRelid,
+                                              double default_selectivity);
 extern double ineq_histogram_selectivity(PlannerInfo *root,
                                          VariableStatData *vardata,
                                          FmgrInfo *opproc, bool isgt, bool iseq,
diff --git a/contrib/hstore/Makefile b/contrib/hstore/Makefile
index 24a9b02..872ca03 100644
--- a/contrib/hstore/Makefile
+++ b/contrib/hstore/Makefile
@@ -11,6 +11,7 @@ OBJS = \

 EXTENSION = hstore
 DATA = hstore--1.4.sql \
+    hstore--1.6--1.7.sql \
     hstore--1.5--1.6.sql \
     hstore--1.4--1.5.sql \
     hstore--1.3--1.4.sql hstore--1.2--1.3.sql \
diff --git a/contrib/hstore/hstore--1.6--1.7.sql b/contrib/hstore/hstore--1.6--1.7.sql
new file mode 100644
index 0000000..08d7b35
--- /dev/null
+++ b/contrib/hstore/hstore--1.6--1.7.sql
@@ -0,0 +1,19 @@
+/* contrib/hstore/hstore--1.6--1.7.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION hstore UPDATE TO '1.7'" to load this file. \quit
+
+ALTER OPERATOR ? (hstore, text)
+  SET (RESTRICT = matchsel, JOIN = matchjoinsel);
+ALTER OPERATOR ?| (hstore, text[])
+  SET (RESTRICT = matchsel, JOIN = matchjoinsel);
+ALTER OPERATOR ?& (hstore, text[])
+  SET (RESTRICT = matchsel, JOIN = matchjoinsel);
+ALTER OPERATOR @> (hstore, hstore)
+  SET (RESTRICT = matchsel, JOIN = matchjoinsel);
+ALTER OPERATOR <@ (hstore, hstore)
+  SET (RESTRICT = matchsel, JOIN = matchjoinsel);
+ALTER OPERATOR @ (hstore, hstore)
+  SET (RESTRICT = matchsel, JOIN = matchjoinsel);
+ALTER OPERATOR ~ (hstore, hstore)
+  SET (RESTRICT = matchsel, JOIN = matchjoinsel);
diff --git a/contrib/hstore/hstore.control b/contrib/hstore/hstore.control
index e0fbb8b..f0da772 100644
--- a/contrib/hstore/hstore.control
+++ b/contrib/hstore/hstore.control
@@ -1,6 +1,6 @@
 # hstore extension
 comment = 'data type for storing sets of (key, value) pairs'
-default_version = '1.6'
+default_version = '1.7'
 module_pathname = '$libdir/hstore'
 relocatable = true
 trusted = true
diff --git a/contrib/ltree/Makefile b/contrib/ltree/Makefile
index 70c5e37..58a8295 100644
--- a/contrib/ltree/Makefile
+++ b/contrib/ltree/Makefile
@@ -15,7 +15,7 @@ OBJS = \
 PG_CPPFLAGS = -DLOWER_NODE

 EXTENSION = ltree
-DATA = ltree--1.1.sql ltree--1.0--1.1.sql
+DATA = ltree--1.1.sql ltree--1.0--1.1.sql ltree--1.1--1.2.sql
 PGFILEDESC = "ltree - hierarchical label data type"

 HEADERS = ltree.h
diff --git a/contrib/ltree/ltree--1.1--1.2.sql b/contrib/ltree/ltree--1.1--1.2.sql
new file mode 100644
index 0000000..cc70757
--- /dev/null
+++ b/contrib/ltree/ltree--1.1--1.2.sql
@@ -0,0 +1,86 @@
+/* contrib/ltree/ltree--1.1--1.2.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION ltree UPDATE TO '1.2'" to load this file. \quit
+
+ALTER OPERATOR < (ltree, ltree)
+  SET (RESTRICT = scalarltsel, JOIN = scalarltjoinsel);
+ALTER OPERATOR <= (ltree, ltree)
+  SET (RESTRICT = scalarlesel, JOIN = scalarlejoinsel);
+ALTER OPERATOR >= (ltree, ltree)
+  SET (RESTRICT = scalargesel, JOIN = scalargejoinsel);
+ALTER OPERATOR > (ltree, ltree)
+  SET (RESTRICT = scalargtsel, JOIN = scalargtjoinsel);
+
+ALTER OPERATOR @> (ltree, ltree)
+  SET (RESTRICT = matchsel, JOIN = matchjoinsel);
+ALTER OPERATOR ^@> (ltree, ltree)
+  SET (RESTRICT = matchsel, JOIN = matchjoinsel);
+ALTER OPERATOR <@ (ltree, ltree)
+  SET (RESTRICT = matchsel, JOIN = matchjoinsel);
+ALTER OPERATOR ^<@ (ltree, ltree)
+  SET (RESTRICT = matchsel, JOIN = matchjoinsel);
+ALTER OPERATOR ~ (ltree, lquery)
+  SET (RESTRICT = matchsel, JOIN = matchjoinsel);
+ALTER OPERATOR ~ (lquery, ltree)
+  SET (RESTRICT = matchsel, JOIN = matchjoinsel);
+ALTER OPERATOR ^~ (ltree, lquery)
+  SET (RESTRICT = matchsel, JOIN = matchjoinsel);
+ALTER OPERATOR ^~ (lquery, ltree)
+  SET (RESTRICT = matchsel, JOIN = matchjoinsel);
+ALTER OPERATOR ? (ltree, _lquery)
+  SET (RESTRICT = matchsel, JOIN = matchjoinsel);
+ALTER OPERATOR ? (_lquery, ltree)
+  SET (RESTRICT = matchsel, JOIN = matchjoinsel);
+ALTER OPERATOR ^? (ltree, _lquery)
+  SET (RESTRICT = matchsel, JOIN = matchjoinsel);
+ALTER OPERATOR ^? (_lquery, ltree)
+  SET (RESTRICT = matchsel, JOIN = matchjoinsel);
+ALTER OPERATOR @ (ltree, ltxtquery)
+  SET (RESTRICT = matchsel, JOIN = matchjoinsel);
+ALTER OPERATOR @ (ltxtquery, ltree)
+  SET (RESTRICT = matchsel, JOIN = matchjoinsel);
+ALTER OPERATOR ^@ (ltree, ltxtquery)
+  SET (RESTRICT = matchsel, JOIN = matchjoinsel);
+ALTER OPERATOR ^@ (ltxtquery, ltree)
+  SET (RESTRICT = matchsel, JOIN = matchjoinsel);
+ALTER OPERATOR @> (_ltree, ltree)
+  SET (RESTRICT = matchsel, JOIN = matchjoinsel);
+ALTER OPERATOR <@ (ltree, _ltree)
+  SET (RESTRICT = matchsel, JOIN = matchjoinsel);
+ALTER OPERATOR <@ (_ltree, ltree)
+  SET (RESTRICT = matchsel, JOIN = matchjoinsel);
+ALTER OPERATOR @> (ltree, _ltree)
+  SET (RESTRICT = matchsel, JOIN = matchjoinsel);
+ALTER OPERATOR ~ (_ltree, lquery)
+  SET (RESTRICT = matchsel, JOIN = matchjoinsel);
+ALTER OPERATOR ~ (lquery, _ltree)
+  SET (RESTRICT = matchsel, JOIN = matchjoinsel);
+ALTER OPERATOR ? (_ltree, _lquery)
+  SET (RESTRICT = matchsel, JOIN = matchjoinsel);
+ALTER OPERATOR ? (_lquery, _ltree)
+  SET (RESTRICT = matchsel, JOIN = matchjoinsel);
+ALTER OPERATOR @ (_ltree, ltxtquery)
+  SET (RESTRICT = matchsel, JOIN = matchjoinsel);
+ALTER OPERATOR @ (ltxtquery, _ltree)
+  SET (RESTRICT = matchsel, JOIN = matchjoinsel);
+ALTER OPERATOR ^@> (_ltree, ltree)
+  SET (RESTRICT = matchsel, JOIN = matchjoinsel);
+ALTER OPERATOR ^<@ (ltree, _ltree)
+  SET (RESTRICT = matchsel, JOIN = matchjoinsel);
+ALTER OPERATOR ^<@ (_ltree, ltree)
+  SET (RESTRICT = matchsel, JOIN = matchjoinsel);
+ALTER OPERATOR ^@> (ltree, _ltree)
+  SET (RESTRICT = matchsel, JOIN = matchjoinsel);
+ALTER OPERATOR ^~ (_ltree, lquery)
+  SET (RESTRICT = matchsel, JOIN = matchjoinsel);
+ALTER OPERATOR ^~ (lquery, _ltree)
+  SET (RESTRICT = matchsel, JOIN = matchjoinsel);
+ALTER OPERATOR ^? (_ltree, _lquery)
+  SET (RESTRICT = matchsel, JOIN = matchjoinsel);
+ALTER OPERATOR ^? (_lquery, _ltree)
+  SET (RESTRICT = matchsel, JOIN = matchjoinsel);
+ALTER OPERATOR ^@ (_ltree, ltxtquery)
+  SET (RESTRICT = matchsel, JOIN = matchjoinsel);
+ALTER OPERATOR ^@ (ltxtquery, _ltree)
+  SET (RESTRICT = matchsel, JOIN = matchjoinsel);
diff --git a/contrib/ltree/ltree.control b/contrib/ltree/ltree.control
index 3118df6..b408d64 100644
--- a/contrib/ltree/ltree.control
+++ b/contrib/ltree/ltree.control
@@ -1,6 +1,6 @@
 # ltree extension
 comment = 'data type for hierarchical tree-like structures'
-default_version = '1.1'
+default_version = '1.2'
 module_pathname = '$libdir/ltree'
 relocatable = true
 trusted = true
diff --git a/contrib/ltree/ltree_op.c b/contrib/ltree/ltree_op.c
index 2791037..8ad8a1d 100644
--- a/contrib/ltree/ltree_op.c
+++ b/contrib/ltree/ltree_op.c
@@ -561,6 +561,9 @@ ltree2text(PG_FUNCTION_ARGS)

 /*
  *    ltreeparentsel - Selectivity of parent relationship for ltree data types.
+ *
+ * This function is not used anymore, if the ltree extension has been
+ * updated to 1.2 or later.
  */
 Datum
 ltreeparentsel(PG_FUNCTION_ARGS)
diff --git a/contrib/pg_trgm/Makefile b/contrib/pg_trgm/Makefile
index f0a8d9c..d75e9ad 100644
--- a/contrib/pg_trgm/Makefile
+++ b/contrib/pg_trgm/Makefile
@@ -9,7 +9,7 @@ OBJS = \
     trgm_regexp.o

 EXTENSION = pg_trgm
-DATA = pg_trgm--1.3--1.4.sql \
+DATA = pg_trgm--1.4--1.5.sql pg_trgm--1.3--1.4.sql \
     pg_trgm--1.3.sql pg_trgm--1.2--1.3.sql pg_trgm--1.1--1.2.sql \
     pg_trgm--1.0--1.1.sql
 PGFILEDESC = "pg_trgm - trigram matching"
diff --git a/contrib/pg_trgm/expected/pg_trgm.out b/contrib/pg_trgm/expected/pg_trgm.out
index 91596f8..19006f2 100644
--- a/contrib/pg_trgm/expected/pg_trgm.out
+++ b/contrib/pg_trgm/expected/pg_trgm.out
@@ -4013,16 +4013,15 @@ SELECT similarity('Szczecin', 'Warsaw');
 EXPLAIN (COSTS OFF)
 SELECT DISTINCT city, similarity(city, 'Warsaw'), show_limit()
   FROM restaurants WHERE city % 'Warsaw';
-                         QUERY PLAN
--------------------------------------------------------------
- Unique
-   ->  Sort
-         Sort Key: city, (similarity(city, 'Warsaw'::text))
-         ->  Bitmap Heap Scan on restaurants
-               Recheck Cond: (city % 'Warsaw'::text)
-               ->  Bitmap Index Scan on restaurants_city_idx
-                     Index Cond: (city % 'Warsaw'::text)
-(7 rows)
+                            QUERY PLAN
+-------------------------------------------------------------------
+ HashAggregate
+   Group Key: city, similarity(city, 'Warsaw'::text), show_limit()
+   ->  Bitmap Heap Scan on restaurants
+         Recheck Cond: (city % 'Warsaw'::text)
+         ->  Bitmap Index Scan on restaurants_city_idx
+               Index Cond: (city % 'Warsaw'::text)
+(6 rows)

 SELECT set_limit(0.3);
  set_limit
diff --git a/contrib/pg_trgm/pg_trgm--1.4--1.5.sql b/contrib/pg_trgm/pg_trgm--1.4--1.5.sql
new file mode 100644
index 0000000..c0d9f6f
--- /dev/null
+++ b/contrib/pg_trgm/pg_trgm--1.4--1.5.sql
@@ -0,0 +1,15 @@
+/* contrib/pg_trgm/pg_trgm--1.4--1.5.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_trgm UPDATE TO '1.5'" to load this file. \quit
+
+ALTER OPERATOR % (text, text)
+  SET (RESTRICT = matchsel, JOIN = matchjoinsel);
+ALTER OPERATOR <% (text, text)
+  SET (RESTRICT = matchsel, JOIN = matchjoinsel);
+ALTER OPERATOR %> (text, text)
+  SET (RESTRICT = matchsel, JOIN = matchjoinsel);
+ALTER OPERATOR <<% (text, text)
+  SET (RESTRICT = matchsel, JOIN = matchjoinsel);
+ALTER OPERATOR %>> (text, text)
+  SET (RESTRICT = matchsel, JOIN = matchjoinsel);
diff --git a/contrib/pg_trgm/pg_trgm.control b/contrib/pg_trgm/pg_trgm.control
index 831ba23..ed4487e 100644
--- a/contrib/pg_trgm/pg_trgm.control
+++ b/contrib/pg_trgm/pg_trgm.control
@@ -1,6 +1,6 @@
 # pg_trgm extension
 comment = 'text similarity measurement and index searching based on trigrams'
-default_version = '1.4'
+default_version = '1.5'
 module_pathname = '$libdir/pg_trgm'
 relocatable = true
 trusted = true

Re: Less-silly selectivity for JSONB matching operators

From
Alexey Bashtanov
Date:
Hi Tom,

The patches look entirely reasonable to me.
The second one needs to be rebased.

I like the idea of stubbing matchjoinsel for now,
as well as being careful with operators that may correlate with sort 
orderings.

The only little thing I can think of is hardcoding it as 2 * DEFAULT_EQ_SEL.
While I don't have any arguments against the value itself I think it 
should be configurable independently.
Sadly DEFAULT_MATCH_SEL name is already taken for text patterns.
Not sure if it's a reason to rename all the stuff.

Best, Alex



Re: Less-silly selectivity for JSONB matching operators

From
Alexey Bashtanov
Date:
Quickly tested like this:

create table t(a jsonb);
insert into t select jsonb_object( array[(random() * 10)::int::text], 
'{" "}') from generate_series(1, 100000);
insert into t select jsonb_object( array[(random() * 10)::int::text], 
array[(random() * 1000)::int::text]) from generate_series(1, 100000);
explain analyze select * from t where a ? '1';
analyze t;
explain analyze select * from t where a ? '1';

Best, Alex



Re: Less-silly selectivity for JSONB matching operators

From
Tom Lane
Date:
Alexey Bashtanov <bashtanov@imap.cc> writes:
> The only little thing I can think of is hardcoding it as 2 * DEFAULT_EQ_SEL.
> While I don't have any arguments against the value itself I think it 
> should be configurable independently.
> Sadly DEFAULT_MATCH_SEL name is already taken for text patterns.
> Not sure if it's a reason to rename all the stuff.

Yeah, I was going to invent a symbol till I noticed that DEFAULT_MATCH_SEL
was already taken :-(.

There are only about half a dozen uses of that in-core, so maybe we could
get away with renaming that one, but on the whole I'd rather leave it
alone in case some extension is using it.  So that leaves us with needing
to find a better name for this new one.  Any ideas?

            regards, tom lane



Re: Less-silly selectivity for JSONB matching operators

From
Alexey Bashtanov
Date:
> So that leaves us with needing
> to find a better name for this new one.  Any ideas?
I'm thinking of something wide like
opersel, operjoinsel, DEFAULT_OPER_SEL
or maybe even
genericsel, genericjoinsel, DEFAULT_GENERIC_SEL

Best, Alex



Re: Less-silly selectivity for JSONB matching operators

From
Tom Lane
Date:
Alexey Bashtanov <bashtanov@imap.cc> writes:
>> So that leaves us with needing
>> to find a better name for this new one.  Any ideas?

> I'm thinking of something wide like
> opersel, operjoinsel, DEFAULT_OPER_SEL
> or maybe even
> genericsel, genericjoinsel, DEFAULT_GENERIC_SEL

Seems a little *too* generic :-(

I was wondering about DEFAULT_MATCHING_SEL.  The difference in purpose
from DEFAULT_MATCH_SEL wouldn't be too obvious, but then it probably
wouldn't be anyway.

            regards, tom lane



Re: Less-silly selectivity for JSONB matching operators

From
Alexey Bashtanov
Date:
> I was wondering about DEFAULT_MATCHING_SEL.  The difference in purpose
> from DEFAULT_MATCH_SEL wouldn't be too obvious, but then it probably
> wouldn't be anyway.
Fine with me, especially if both new functions are renamed accordingly.

Best, Alex



Re: Less-silly selectivity for JSONB matching operators

From
Tom Lane
Date:
Alexey Bashtanov <bashtanov@imap.cc> writes:
>> I was wondering about DEFAULT_MATCHING_SEL.  The difference in purpose
>> from DEFAULT_MATCH_SEL wouldn't be too obvious, but then it probably
>> wouldn't be anyway.

> Fine with me, especially if both new functions are renamed accordingly.

Yup, that would make sense, will do it like that.

            regards, tom lane



Re: Less-silly selectivity for JSONB matching operators

From
Tom Lane
Date:
Renamed "matchsel" to "matchingsel" etc, added DEFAULT_MATCHING_SEL,
rebased over commit 911e70207.  Since that commit already created
new versions of the relevant contrib modules, I think we can just
redefine what those versions contain, rather than making yet-newer
versions.  (Of course, that assumes we're going to include this in
v13.)

            regards, tom lane

diff --git a/contrib/ltree/ltree_op.c b/contrib/ltree/ltree_op.c
index 34e6e4b..f606761 100644
--- a/contrib/ltree/ltree_op.c
+++ b/contrib/ltree/ltree_op.c
@@ -566,8 +566,6 @@ ltree2text(PG_FUNCTION_ARGS)
 }


-#define DEFAULT_PARENT_SEL 0.001
-
 /*
  *    ltreeparentsel - Selectivity of parent relationship for ltree data types.
  */
@@ -578,101 +576,12 @@ ltreeparentsel(PG_FUNCTION_ARGS)
     Oid            operator = PG_GETARG_OID(1);
     List       *args = (List *) PG_GETARG_POINTER(2);
     int            varRelid = PG_GETARG_INT32(3);
-    VariableStatData vardata;
-    Node       *other;
-    bool        varonleft;
     double        selec;

-    /*
-     * If expression is not variable <@ something or something <@ variable,
-     * then punt and return a default estimate.
-     */
-    if (!get_restriction_variable(root, args, varRelid,
-                                  &vardata, &other, &varonleft))
-        PG_RETURN_FLOAT8(DEFAULT_PARENT_SEL);
-
-    /*
-     * If the something is a NULL constant, assume operator is strict and
-     * return zero, ie, operator will never return TRUE.
-     */
-    if (IsA(other, Const) &&
-        ((Const *) other)->constisnull)
-    {
-        ReleaseVariableStats(vardata);
-        PG_RETURN_FLOAT8(0.0);
-    }
-
-    if (IsA(other, Const))
-    {
-        /* Variable is being compared to a known non-null constant */
-        Datum        constval = ((Const *) other)->constvalue;
-        FmgrInfo    contproc;
-        double        mcvsum;
-        double        mcvsel;
-        double        nullfrac;
-        int            hist_size;
-
-        fmgr_info(get_opcode(operator), &contproc);
-
-        /*
-         * Is the constant "<@" to any of the column's most common values?
-         */
-        mcvsel = mcv_selectivity(&vardata, &contproc, constval, varonleft,
-                                 &mcvsum);
-
-        /*
-         * If the histogram is large enough, see what fraction of it the
-         * constant is "<@" to, and assume that's representative of the
-         * non-MCV population.  Otherwise use the default selectivity for the
-         * non-MCV population.
-         */
-        selec = histogram_selectivity(&vardata, &contproc,
-                                      constval, varonleft,
-                                      10, 1, &hist_size);
-        if (selec < 0)
-        {
-            /* Nope, fall back on default */
-            selec = DEFAULT_PARENT_SEL;
-        }
-        else if (hist_size < 100)
-        {
-            /*
-             * For histogram sizes from 10 to 100, we combine the histogram
-             * and default selectivities, putting increasingly more trust in
-             * the histogram for larger sizes.
-             */
-            double        hist_weight = hist_size / 100.0;
-
-            selec = selec * hist_weight +
-                DEFAULT_PARENT_SEL * (1.0 - hist_weight);
-        }
-
-        /* In any case, don't believe extremely small or large estimates. */
-        if (selec < 0.0001)
-            selec = 0.0001;
-        else if (selec > 0.9999)
-            selec = 0.9999;
-
-        if (HeapTupleIsValid(vardata.statsTuple))
-            nullfrac = ((Form_pg_statistic) GETSTRUCT(vardata.statsTuple))->stanullfrac;
-        else
-            nullfrac = 0.0;
-
-        /*
-         * Now merge the results from the MCV and histogram calculations,
-         * realizing that the histogram covers only the non-null values that
-         * are not listed in MCV.
-         */
-        selec *= 1.0 - nullfrac - mcvsum;
-        selec += mcvsel;
-    }
-    else
-        selec = DEFAULT_PARENT_SEL;
-
-    ReleaseVariableStats(vardata);
-
-    /* result should be in range, but make sure... */
-    CLAMP_PROBABILITY(selec);
+    /* Use generic restriction selectivity logic, with default 0.001. */
+    selec = generic_restriction_selectivity(root, operator,
+                                            args, varRelid,
+                                            0.001);

     PG_RETURN_FLOAT8((float8) selec);
 }
diff --git a/doc/src/sgml/xoper.sgml b/doc/src/sgml/xoper.sgml
index 132056f..56b0849 100644
--- a/doc/src/sgml/xoper.sgml
+++ b/doc/src/sgml/xoper.sgml
@@ -283,6 +283,18 @@ column OP constant
    </para>

    <para>
+    Another useful built-in selectivity estimation function
+    is <function>matchingsel</function>, which will work for almost any
+    binary operator, if standard MCV and/or histogram statistics are
+    collected for the input data type(s).  Its default estimate is set to
+    twice the default estimate used in <function>eqsel</function>, making
+    it most suitable for comparison operators that are somewhat less
+    strict than equality.  (Or you could call the
+    underlying <function>generic_restriction_selectivity</function>
+    function, providing a different default estimate.)
+   </para>
+
+   <para>
     There are additional selectivity estimation functions designed for geometric
     operators in <filename>src/backend/utils/adt/geo_selfuncs.c</filename>: <function>areasel</function>,
<function>positionsel</function>,
     and <function>contsel</function>.  At this writing these are just stubs, but you might want
@@ -319,6 +331,7 @@ table1.column1 OP table2.column2
       <member><function>scalarlejoinsel</function> for <literal><=</literal></member>
       <member><function>scalargtjoinsel</function> for <literal>></literal></member>
       <member><function>scalargejoinsel</function> for <literal>>=</literal></member>
+      <member><function>matchingjoinsel</function> for generic matching operators</member>
       <member><function>areajoinsel</function> for 2D area-based comparisons</member>
       <member><function>positionjoinsel</function> for 2D position-based comparisons</member>
       <member><function>contjoinsel</function> for 2D containment-based comparisons</member>
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index e62b69d..4fdcb07 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -830,6 +830,132 @@ histogram_selectivity(VariableStatData *vardata, FmgrInfo *opproc,
 }

 /*
+ *    generic_restriction_selectivity        - Selectivity for almost anything
+ *
+ * This function estimates selectivity for operators that we don't have any
+ * special knowledge about, but are on data types that we collect standard
+ * MCV and/or histogram statistics for.  (Additional assumptions are that
+ * the operator is strict and immutable, or at least stable.)
+ *
+ * If we have "VAR OP CONST" or "CONST OP VAR", selectivity is estimated by
+ * applying the operator to each element of the column's MCV and/or histogram
+ * stats, and merging the results using the assumption that the histogram is
+ * a reasonable random sample of the column's non-MCV population.  Note that
+ * if the operator's semantics are related to the histogram ordering, this
+ * might not be such a great assumption; other functions such as
+ * scalarineqsel() are probably a better match in such cases.
+ *
+ * Otherwise, fall back to the default selectivity provided by the caller.
+ */
+double
+generic_restriction_selectivity(PlannerInfo *root, Oid operator,
+                                List *args, int varRelid,
+                                double default_selectivity)
+{
+    double        selec;
+    VariableStatData vardata;
+    Node       *other;
+    bool        varonleft;
+
+    /*
+     * If expression is not variable OP something or something OP variable,
+     * then punt and return the default estimate.
+     */
+    if (!get_restriction_variable(root, args, varRelid,
+                                  &vardata, &other, &varonleft))
+        return default_selectivity;
+
+    /*
+     * If the something is a NULL constant, assume operator is strict and
+     * return zero, ie, operator will never return TRUE.
+     */
+    if (IsA(other, Const) &&
+        ((Const *) other)->constisnull)
+    {
+        ReleaseVariableStats(vardata);
+        return 0.0;
+    }
+
+    if (IsA(other, Const))
+    {
+        /* Variable is being compared to a known non-null constant */
+        Datum        constval = ((Const *) other)->constvalue;
+        FmgrInfo    opproc;
+        double        mcvsum;
+        double        mcvsel;
+        double        nullfrac;
+        int            hist_size;
+
+        fmgr_info(get_opcode(operator), &opproc);
+
+        /*
+         * Calculate the selectivity for the column's most common values.
+         */
+        mcvsel = mcv_selectivity(&vardata, &opproc, constval, varonleft,
+                                 &mcvsum);
+
+        /*
+         * If the histogram is large enough, see what fraction of it matches
+         * the query, and assume that's representative of the non-MCV
+         * population.  Otherwise use the default selectivity for the non-MCV
+         * population.
+         */
+        selec = histogram_selectivity(&vardata, &opproc,
+                                      constval, varonleft,
+                                      10, 1, &hist_size);
+        if (selec < 0)
+        {
+            /* Nope, fall back on default */
+            selec = default_selectivity;
+        }
+        else if (hist_size < 100)
+        {
+            /*
+             * For histogram sizes from 10 to 100, we combine the histogram
+             * and default selectivities, putting increasingly more trust in
+             * the histogram for larger sizes.
+             */
+            double        hist_weight = hist_size / 100.0;
+
+            selec = selec * hist_weight +
+                default_selectivity * (1.0 - hist_weight);
+        }
+
+        /* In any case, don't believe extremely small or large estimates. */
+        if (selec < 0.0001)
+            selec = 0.0001;
+        else if (selec > 0.9999)
+            selec = 0.9999;
+
+        /* Don't forget to account for nulls. */
+        if (HeapTupleIsValid(vardata.statsTuple))
+            nullfrac = ((Form_pg_statistic) GETSTRUCT(vardata.statsTuple))->stanullfrac;
+        else
+            nullfrac = 0.0;
+
+        /*
+         * Now merge the results from the MCV and histogram calculations,
+         * realizing that the histogram covers only the non-null values that
+         * are not listed in MCV.
+         */
+        selec *= 1.0 - nullfrac - mcvsum;
+        selec += mcvsel;
+    }
+    else
+    {
+        /* Comparison value is not constant, so we can't do anything */
+        selec = default_selectivity;
+    }
+
+    ReleaseVariableStats(vardata);
+
+    /* result should be in range, but make sure... */
+    CLAMP_PROBABILITY(selec);
+
+    return selec;
+}
+
+/*
  *    ineq_histogram_selectivity    - Examine the histogram for scalarineqsel
  *
  * Determine the fraction of the variable's histogram population that
@@ -2917,6 +3043,40 @@ fail:


 /*
+ *    matchingsel -- generic matching-operator selectivity support
+ *
+ * Use these for any operators that (a) are on data types for which we collect
+ * standard statistics, and (b) have behavior for which the default estimate
+ * (twice DEFAULT_EQ_SEL) is sane.  Typically that is good for match-like
+ * operators.
+ */
+
+Datum
+matchingsel(PG_FUNCTION_ARGS)
+{
+    PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+    Oid            operator = PG_GETARG_OID(1);
+    List       *args = (List *) PG_GETARG_POINTER(2);
+    int            varRelid = PG_GETARG_INT32(3);
+    double        selec;
+
+    /* Use generic restriction selectivity logic. */
+    selec = generic_restriction_selectivity(root, operator,
+                                            args, varRelid,
+                                            DEFAULT_MATCHING_SEL);
+
+    PG_RETURN_FLOAT8((float8) selec);
+}
+
+Datum
+matchingjoinsel(PG_FUNCTION_ARGS)
+{
+    /* Just punt, for the moment. */
+    PG_RETURN_FLOAT8(DEFAULT_MATCHING_SEL);
+}
+
+
+/*
  * Helper routine for estimate_num_groups: add an item to a list of
  * GroupVarInfos, but only if it's not known equal to any of the existing
  * entries.
diff --git a/src/include/catalog/pg_operator.dat b/src/include/catalog/pg_operator.dat
index 7c135da..65c7fed 100644
--- a/src/include/catalog/pg_operator.dat
+++ b/src/include/catalog/pg_operator.dat
@@ -3016,18 +3016,21 @@
 { oid => '3693', descr => 'contains',
   oprname => '@>', oprleft => 'tsquery', oprright => 'tsquery',
   oprresult => 'bool', oprcom => '<@(tsquery,tsquery)',
-  oprcode => 'tsq_mcontains', oprrest => 'contsel', oprjoin => 'contjoinsel' },
+  oprcode => 'tsq_mcontains', oprrest => 'matchingsel',
+  oprjoin => 'matchingjoinsel' },
 { oid => '3694', descr => 'is contained by',
   oprname => '<@', oprleft => 'tsquery', oprright => 'tsquery',
   oprresult => 'bool', oprcom => '@>(tsquery,tsquery)',
-  oprcode => 'tsq_mcontained', oprrest => 'contsel', oprjoin => 'contjoinsel' },
+  oprcode => 'tsq_mcontained', oprrest => 'matchingsel',
+  oprjoin => 'matchingjoinsel' },
 { oid => '3762', descr => 'text search match',
   oprname => '@@', oprleft => 'text', oprright => 'text', oprresult => 'bool',
-  oprcode => 'ts_match_tt', oprrest => 'contsel', oprjoin => 'contjoinsel' },
+  oprcode => 'ts_match_tt', oprrest => 'matchingsel',
+  oprjoin => 'matchingjoinsel' },
 { oid => '3763', descr => 'text search match',
   oprname => '@@', oprleft => 'text', oprright => 'tsquery',
-  oprresult => 'bool', oprcode => 'ts_match_tq', oprrest => 'contsel',
-  oprjoin => 'contjoinsel' },
+  oprresult => 'bool', oprcode => 'ts_match_tq', oprrest => 'matchingsel',
+  oprjoin => 'matchingjoinsel' },

 # generic record comparison operators
 { oid => '2988', oid_symbol => 'RECORD_EQ_OP', descr => 'equal',
@@ -3178,7 +3181,8 @@
 { oid => '3897', descr => 'is adjacent to',
   oprname => '-|-', oprleft => 'anyrange', oprright => 'anyrange',
   oprresult => 'bool', oprcom => '-|-(anyrange,anyrange)',
-  oprcode => 'range_adjacent', oprrest => 'contsel', oprjoin => 'contjoinsel' },
+  oprcode => 'range_adjacent', oprrest => 'matchingsel',
+  oprjoin => 'matchingjoinsel' },
 { oid => '3898', descr => 'range union',
   oprname => '+', oprleft => 'anyrange', oprright => 'anyrange',
   oprresult => 'anyrange', oprcom => '+(anyrange,anyrange)',
@@ -3258,22 +3262,23 @@
 { oid => '3246', descr => 'contains',
   oprname => '@>', oprleft => 'jsonb', oprright => 'jsonb', oprresult => 'bool',
   oprcom => '<@(jsonb,jsonb)', oprcode => 'jsonb_contains',
-  oprrest => 'contsel', oprjoin => 'contjoinsel' },
+  oprrest => 'matchingsel', oprjoin => 'matchingjoinsel' },
 { oid => '3247', descr => 'key exists',
   oprname => '?', oprleft => 'jsonb', oprright => 'text', oprresult => 'bool',
-  oprcode => 'jsonb_exists', oprrest => 'contsel', oprjoin => 'contjoinsel' },
+  oprcode => 'jsonb_exists', oprrest => 'matchingsel',
+  oprjoin => 'matchingjoinsel' },
 { oid => '3248', descr => 'any key exists',
   oprname => '?|', oprleft => 'jsonb', oprright => '_text', oprresult => 'bool',
-  oprcode => 'jsonb_exists_any', oprrest => 'contsel',
-  oprjoin => 'contjoinsel' },
+  oprcode => 'jsonb_exists_any', oprrest => 'matchingsel',
+  oprjoin => 'matchingjoinsel' },
 { oid => '3249', descr => 'all keys exist',
   oprname => '?&', oprleft => 'jsonb', oprright => '_text', oprresult => 'bool',
-  oprcode => 'jsonb_exists_all', oprrest => 'contsel',
-  oprjoin => 'contjoinsel' },
+  oprcode => 'jsonb_exists_all', oprrest => 'matchingsel',
+  oprjoin => 'matchingjoinsel' },
 { oid => '3250', descr => 'is contained by',
   oprname => '<@', oprleft => 'jsonb', oprright => 'jsonb', oprresult => 'bool',
   oprcom => '@>(jsonb,jsonb)', oprcode => 'jsonb_contained',
-  oprrest => 'contsel', oprjoin => 'contjoinsel' },
+  oprrest => 'matchingsel', oprjoin => 'matchingjoinsel' },
 { oid => '3284', descr => 'concatenate',
   oprname => '||', oprleft => 'jsonb', oprright => 'jsonb',
   oprresult => 'jsonb', oprcode => 'jsonb_concat' },
@@ -3292,10 +3297,10 @@
 { oid => '4012', descr => 'jsonpath exists',
   oprname => '@?', oprleft => 'jsonb', oprright => 'jsonpath',
   oprresult => 'bool', oprcode => 'jsonb_path_exists_opr(jsonb,jsonpath)',
-  oprrest => 'contsel', oprjoin => 'contjoinsel' },
+  oprrest => 'matchingsel', oprjoin => 'matchingjoinsel' },
 { oid => '4013', descr => 'jsonpath match',
   oprname => '@@', oprleft => 'jsonb', oprright => 'jsonpath',
   oprresult => 'bool', oprcode => 'jsonb_path_match_opr(jsonb,jsonpath)',
-  oprrest => 'contsel', oprjoin => 'contjoinsel' },
+  oprrest => 'matchingsel', oprjoin => 'matchingjoinsel' },

 ]
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index a6a708c..fe3df44 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -10628,6 +10628,15 @@
   prosrc => 'shift_jis_2004_to_euc_jis_2004',
   probin => '$libdir/euc2004_sjis2004' },

+{ oid => '8387',
+  descr => 'restriction selectivity for generic matching operators',
+  proname => 'matchingsel', provolatile => 's', prorettype => 'float8',
+  proargtypes => 'internal oid internal int4', prosrc => 'matchingsel' },
+{ oid => '8388', descr => 'join selectivity for generic matching operators',
+  proname => 'matchingjoinsel', provolatile => 's', prorettype => 'float8',
+  proargtypes => 'internal oid internal int2 internal',
+  prosrc => 'matchingjoinsel' },
+
 # replication/origin.h
 { oid => '6003', descr => 'create a replication origin',
   proname => 'pg_replication_origin_create', provolatile => 'v',
diff --git a/src/include/utils/selfuncs.h b/src/include/utils/selfuncs.h
index 1c9570f..1dd3ac1 100644
--- a/src/include/utils/selfuncs.h
+++ b/src/include/utils/selfuncs.h
@@ -42,6 +42,9 @@
 /* default selectivity estimate for pattern-match operators such as LIKE */
 #define DEFAULT_MATCH_SEL    0.005

+/* default selectivity estimate for other matching operators */
+#define DEFAULT_MATCHING_SEL    0.010
+
 /* default number of distinct values in a table */
 #define DEFAULT_NUM_DISTINCT  200

@@ -148,6 +151,9 @@ extern double histogram_selectivity(VariableStatData *vardata, FmgrInfo *opproc,
                                     Datum constval, bool varonleft,
                                     int min_hist_size, int n_skip,
                                     int *hist_size);
+extern double generic_restriction_selectivity(PlannerInfo *root, Oid operator,
+                                              List *args, int varRelid,
+                                              double default_selectivity);
 extern double ineq_histogram_selectivity(PlannerInfo *root,
                                          VariableStatData *vardata,
                                          FmgrInfo *opproc, bool isgt, bool iseq,
diff --git a/contrib/hstore/hstore--1.6--1.7.sql b/contrib/hstore/hstore--1.6--1.7.sql
index 0d126ef..3e5cb67 100644
--- a/contrib/hstore/hstore--1.6--1.7.sql
+++ b/contrib/hstore/hstore--1.6--1.7.sql
@@ -10,3 +10,18 @@ LANGUAGE C IMMUTABLE PARALLEL SAFE;

 ALTER OPERATOR FAMILY gist_hstore_ops USING gist
 ADD FUNCTION 10 (hstore) ghstore_options (internal);
+
+ALTER OPERATOR ? (hstore, text)
+  SET (RESTRICT = matchingsel, JOIN = matchingjoinsel);
+ALTER OPERATOR ?| (hstore, text[])
+  SET (RESTRICT = matchingsel, JOIN = matchingjoinsel);
+ALTER OPERATOR ?& (hstore, text[])
+  SET (RESTRICT = matchingsel, JOIN = matchingjoinsel);
+ALTER OPERATOR @> (hstore, hstore)
+  SET (RESTRICT = matchingsel, JOIN = matchingjoinsel);
+ALTER OPERATOR <@ (hstore, hstore)
+  SET (RESTRICT = matchingsel, JOIN = matchingjoinsel);
+ALTER OPERATOR @ (hstore, hstore)
+  SET (RESTRICT = matchingsel, JOIN = matchingjoinsel);
+ALTER OPERATOR ~ (hstore, hstore)
+  SET (RESTRICT = matchingsel, JOIN = matchingjoinsel);
diff --git a/contrib/ltree/ltree--1.1--1.2.sql b/contrib/ltree/ltree--1.1--1.2.sql
index 7b4ea99..186381e 100644
--- a/contrib/ltree/ltree--1.1--1.2.sql
+++ b/contrib/ltree/ltree--1.1--1.2.sql
@@ -19,3 +19,84 @@ ADD FUNCTION 10 (ltree) ltree_gist_options (internal);
 ALTER OPERATOR FAMILY gist__ltree_ops USING gist
 ADD FUNCTION 10 (_ltree) _ltree_gist_options (internal);

+ALTER OPERATOR < (ltree, ltree)
+  SET (RESTRICT = scalarltsel, JOIN = scalarltjoinsel);
+ALTER OPERATOR <= (ltree, ltree)
+  SET (RESTRICT = scalarlesel, JOIN = scalarlejoinsel);
+ALTER OPERATOR >= (ltree, ltree)
+  SET (RESTRICT = scalargesel, JOIN = scalargejoinsel);
+ALTER OPERATOR > (ltree, ltree)
+  SET (RESTRICT = scalargtsel, JOIN = scalargtjoinsel);
+
+ALTER OPERATOR @> (ltree, ltree)
+  SET (RESTRICT = matchingsel, JOIN = matchingjoinsel);
+ALTER OPERATOR ^@> (ltree, ltree)
+  SET (RESTRICT = matchingsel, JOIN = matchingjoinsel);
+ALTER OPERATOR <@ (ltree, ltree)
+  SET (RESTRICT = matchingsel, JOIN = matchingjoinsel);
+ALTER OPERATOR ^<@ (ltree, ltree)
+  SET (RESTRICT = matchingsel, JOIN = matchingjoinsel);
+ALTER OPERATOR ~ (ltree, lquery)
+  SET (RESTRICT = matchingsel, JOIN = matchingjoinsel);
+ALTER OPERATOR ~ (lquery, ltree)
+  SET (RESTRICT = matchingsel, JOIN = matchingjoinsel);
+ALTER OPERATOR ^~ (ltree, lquery)
+  SET (RESTRICT = matchingsel, JOIN = matchingjoinsel);
+ALTER OPERATOR ^~ (lquery, ltree)
+  SET (RESTRICT = matchingsel, JOIN = matchingjoinsel);
+ALTER OPERATOR ? (ltree, _lquery)
+  SET (RESTRICT = matchingsel, JOIN = matchingjoinsel);
+ALTER OPERATOR ? (_lquery, ltree)
+  SET (RESTRICT = matchingsel, JOIN = matchingjoinsel);
+ALTER OPERATOR ^? (ltree, _lquery)
+  SET (RESTRICT = matchingsel, JOIN = matchingjoinsel);
+ALTER OPERATOR ^? (_lquery, ltree)
+  SET (RESTRICT = matchingsel, JOIN = matchingjoinsel);
+ALTER OPERATOR @ (ltree, ltxtquery)
+  SET (RESTRICT = matchingsel, JOIN = matchingjoinsel);
+ALTER OPERATOR @ (ltxtquery, ltree)
+  SET (RESTRICT = matchingsel, JOIN = matchingjoinsel);
+ALTER OPERATOR ^@ (ltree, ltxtquery)
+  SET (RESTRICT = matchingsel, JOIN = matchingjoinsel);
+ALTER OPERATOR ^@ (ltxtquery, ltree)
+  SET (RESTRICT = matchingsel, JOIN = matchingjoinsel);
+ALTER OPERATOR @> (_ltree, ltree)
+  SET (RESTRICT = matchingsel, JOIN = matchingjoinsel);
+ALTER OPERATOR <@ (ltree, _ltree)
+  SET (RESTRICT = matchingsel, JOIN = matchingjoinsel);
+ALTER OPERATOR <@ (_ltree, ltree)
+  SET (RESTRICT = matchingsel, JOIN = matchingjoinsel);
+ALTER OPERATOR @> (ltree, _ltree)
+  SET (RESTRICT = matchingsel, JOIN = matchingjoinsel);
+ALTER OPERATOR ~ (_ltree, lquery)
+  SET (RESTRICT = matchingsel, JOIN = matchingjoinsel);
+ALTER OPERATOR ~ (lquery, _ltree)
+  SET (RESTRICT = matchingsel, JOIN = matchingjoinsel);
+ALTER OPERATOR ? (_ltree, _lquery)
+  SET (RESTRICT = matchingsel, JOIN = matchingjoinsel);
+ALTER OPERATOR ? (_lquery, _ltree)
+  SET (RESTRICT = matchingsel, JOIN = matchingjoinsel);
+ALTER OPERATOR @ (_ltree, ltxtquery)
+  SET (RESTRICT = matchingsel, JOIN = matchingjoinsel);
+ALTER OPERATOR @ (ltxtquery, _ltree)
+  SET (RESTRICT = matchingsel, JOIN = matchingjoinsel);
+ALTER OPERATOR ^@> (_ltree, ltree)
+  SET (RESTRICT = matchingsel, JOIN = matchingjoinsel);
+ALTER OPERATOR ^<@ (ltree, _ltree)
+  SET (RESTRICT = matchingsel, JOIN = matchingjoinsel);
+ALTER OPERATOR ^<@ (_ltree, ltree)
+  SET (RESTRICT = matchingsel, JOIN = matchingjoinsel);
+ALTER OPERATOR ^@> (ltree, _ltree)
+  SET (RESTRICT = matchingsel, JOIN = matchingjoinsel);
+ALTER OPERATOR ^~ (_ltree, lquery)
+  SET (RESTRICT = matchingsel, JOIN = matchingjoinsel);
+ALTER OPERATOR ^~ (lquery, _ltree)
+  SET (RESTRICT = matchingsel, JOIN = matchingjoinsel);
+ALTER OPERATOR ^? (_ltree, _lquery)
+  SET (RESTRICT = matchingsel, JOIN = matchingjoinsel);
+ALTER OPERATOR ^? (_lquery, _ltree)
+  SET (RESTRICT = matchingsel, JOIN = matchingjoinsel);
+ALTER OPERATOR ^@ (_ltree, ltxtquery)
+  SET (RESTRICT = matchingsel, JOIN = matchingjoinsel);
+ALTER OPERATOR ^@ (ltxtquery, _ltree)
+  SET (RESTRICT = matchingsel, JOIN = matchingjoinsel);
diff --git a/contrib/ltree/ltree_op.c b/contrib/ltree/ltree_op.c
index f606761..4ac2ed5 100644
--- a/contrib/ltree/ltree_op.c
+++ b/contrib/ltree/ltree_op.c
@@ -568,6 +568,9 @@ ltree2text(PG_FUNCTION_ARGS)

 /*
  *    ltreeparentsel - Selectivity of parent relationship for ltree data types.
+ *
+ * This function is not used anymore, if the ltree extension has been
+ * updated to 1.2 or later.
  */
 Datum
 ltreeparentsel(PG_FUNCTION_ARGS)
diff --git a/contrib/pg_trgm/expected/pg_trgm.out b/contrib/pg_trgm/expected/pg_trgm.out
index 5746be0..923c326 100644
--- a/contrib/pg_trgm/expected/pg_trgm.out
+++ b/contrib/pg_trgm/expected/pg_trgm.out
@@ -5170,16 +5170,15 @@ SELECT similarity('Szczecin', 'Warsaw');
 EXPLAIN (COSTS OFF)
 SELECT DISTINCT city, similarity(city, 'Warsaw'), show_limit()
   FROM restaurants WHERE city % 'Warsaw';
-                         QUERY PLAN
--------------------------------------------------------------
- Unique
-   ->  Sort
-         Sort Key: city, (similarity(city, 'Warsaw'::text))
-         ->  Bitmap Heap Scan on restaurants
-               Recheck Cond: (city % 'Warsaw'::text)
-               ->  Bitmap Index Scan on restaurants_city_idx
-                     Index Cond: (city % 'Warsaw'::text)
-(7 rows)
+                            QUERY PLAN
+-------------------------------------------------------------------
+ HashAggregate
+   Group Key: city, similarity(city, 'Warsaw'::text), show_limit()
+   ->  Bitmap Heap Scan on restaurants
+         Recheck Cond: (city % 'Warsaw'::text)
+         ->  Bitmap Index Scan on restaurants_city_idx
+               Index Cond: (city % 'Warsaw'::text)
+(6 rows)

 SELECT set_limit(0.3);
  set_limit
diff --git a/contrib/pg_trgm/pg_trgm--1.4--1.5.sql b/contrib/pg_trgm/pg_trgm--1.4--1.5.sql
index 3804c3b..284f88d 100644
--- a/contrib/pg_trgm/pg_trgm--1.4--1.5.sql
+++ b/contrib/pg_trgm/pg_trgm--1.4--1.5.sql
@@ -10,3 +10,14 @@ LANGUAGE C IMMUTABLE PARALLEL SAFE;

 ALTER OPERATOR FAMILY gist_trgm_ops USING gist
 ADD FUNCTION 10 (text) gtrgm_options (internal);
+
+ALTER OPERATOR % (text, text)
+  SET (RESTRICT = matchingsel, JOIN = matchingjoinsel);
+ALTER OPERATOR <% (text, text)
+  SET (RESTRICT = matchingsel, JOIN = matchingjoinsel);
+ALTER OPERATOR %> (text, text)
+  SET (RESTRICT = matchingsel, JOIN = matchingjoinsel);
+ALTER OPERATOR <<% (text, text)
+  SET (RESTRICT = matchingsel, JOIN = matchingjoinsel);
+ALTER OPERATOR %>> (text, text)
+  SET (RESTRICT = matchingsel, JOIN = matchingjoinsel);

Re: Less-silly selectivity for JSONB matching operators

From
Alexey Bashtanov
Date:
On 31/03/2020 18:53, Tom Lane wrote:
> Renamed "matchsel" to "matchingsel" etc, added DEFAULT_MATCHING_SEL,
> rebased over commit 911e70207.  Since that commit already created
> new versions of the relevant contrib modules, I think we can just
> redefine what those versions contain, rather than making yet-newer
> versions.  (Of course, that assumes we're going to include this in
> v13.)

Looks good to me.

Best, Alex



Re: Less-silly selectivity for JSONB matching operators

From
Tom Lane
Date:
Alexey Bashtanov <bashtanov@imap.cc> writes:
> On 31/03/2020 18:53, Tom Lane wrote:
>> Renamed "matchsel" to "matchingsel" etc, added DEFAULT_MATCHING_SEL,
>> rebased over commit 911e70207.  Since that commit already created
>> new versions of the relevant contrib modules, I think we can just
>> redefine what those versions contain, rather than making yet-newer
>> versions.  (Of course, that assumes we're going to include this in
>> v13.)

> Looks good to me.

Pushed, thanks for reviewing!

            regards, tom lane