Re: Poor row estimates from planner, stat `most_common_elems` sometimes missing for a text[] column - Mailing list pgsql-performance
| From | Tom Lane |
|---|---|
| Subject | Re: Poor row estimates from planner, stat `most_common_elems` sometimes missing for a text[] column |
| Date | |
| Msg-id | 987464.1757374621@sss.pgh.pa.us |
| In response to | Re: Poor row estimates from planner, stat `most_common_elems` sometimes missing for a text[] column (Matt Long <matt@mattlong.org>) |
| List | pgsql-performance |
Matt Long <matt@mattlong.org> writes:
> Not to let perfect be the enemy of better, but we're facing a variant of
> this issue that would not be addressed by the proposed patch.
> ...
> In this case, the effects of the proposed patch are not applied since the
> most_common_elems array is not empty. I'm not a statistician, so maybe this
> wouldn't be valid, but it seems like using the highest frequency of an
> element that did not qualify for the mce list instead of the 0.5% default
> frequency could be an elegant, but more invasive solution.

Yeah, I think you are quite right: we can apply this idea not only when
the MCE list is empty, but whenever we didn't have to truncate the MCE
list.  In that case we know there are no additional element values that
exceed the cutoff frequency, and that's what the selectivity functions
want to know.

Nosing around in the code that uses STATISTIC_KIND_MCELEM entries,
I spotted two additional issues that the attached v2 patch addresses:

* ts_typanalyze/ts_selfuncs have code essentially identical to the
array case, and should receive the same treatment.

* The selectivity functions believe that the upper bound on the
frequency of non-MCEs is minfreq / 2, not the stored minfreq.  This
seems like complete brain fade: there could easily be elements with
frequency just less than minfreq, and probably are if the data
distribution follows Zipf's law.  I did not dig into the git history,
but I wonder if the divide-by-two business predates the introduction
of the lossy-counting algorithm, and if so whether it was less insane
with the original collection algorithm.

In any case, this patch removes the divisions by 2, and makes some
nearby cosmetic improvements.

Many thanks for the suggestion!

			regards, tom lane
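A minimal standalone sketch of that minfreq choice (not PostgreSQL source and not part of the attached patch; the `choose_minfreq` helper, the simplified `TrackItem` struct, and the sample frequencies are invented for illustration):

```c
#include <stdio.h>

/* Simplified stand-in for the tracker entries built by lossy counting. */
typedef struct
{
    const char *value;
    int         frequency;      /* occurrence count of this element */
} TrackItem;

/*
 * If the tracked list must be truncated to num_mcelem entries, all we can
 * say is that unstored elements are no more frequent than the smallest
 * frequency we keep.  If nothing is thrown away, we know something
 * stronger: no untracked element can exceed the lossy-counting cutoff,
 * so cutoff_freq + 1 is a tighter (and still valid) ceiling.
 */
static int
choose_minfreq(const TrackItem *sorted_desc, int track_len,
               int num_mcelem, int cutoff_freq)
{
    if (num_mcelem < track_len)
        return sorted_desc[num_mcelem - 1].frequency;
    return cutoff_freq + 1;
}

int
main(void)
{
    TrackItem   items[] = {{"apple", 90}, {"pear", 40}, {"plum", 12}};
    int         cutoff_freq = 9;

    /* stats target large enough to keep everything: use the cutoff-based bound */
    printf("untruncated: minfreq = %d\n", choose_minfreq(items, 3, 10, cutoff_freq));
    /* stats target of 2 forces truncation: fall back to smallest kept frequency */
    printf("truncated:   minfreq = %d\n", choose_minfreq(items, 3, 2, cutoff_freq));
    return 0;
}
```

With a large enough stats target this reports the cutoff-based ceiling (10 here); once truncation kicks in it reports the smallest stored frequency (40), because elements just below that were discarded.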
From b2c31c0d31c1edb931d4613b9706efe7ce04edbf Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Mon, 8 Sep 2025 19:19:00 -0400
Subject: [PATCH v2] Track the maximum possible frequency of non-MCE array
 elements.

The lossy-counting algorithm that ANALYZE uses to identify
most-common array elements has a notion of cutoff frequency: elements
with frequency greater than that are guaranteed to be collected,
elements with smaller frequencies are not.  In cases where we find
fewer MCEs than the stats target would permit us to store, the cutoff
frequency provides valuable additional information, to wit that there
are no non-MCEs with frequency greater than that.  What the
selectivity estimation functions actually use the "minfreq" entry for
is as a ceiling on the possible frequency of non-MCEs, so using the
cutoff rather than the lowest stored MCE frequency provides a tighter
bound and more accurate estimates.

Therefore, instead of redundantly storing the minimum observed MCE
frequency, store the cutoff frequency when there are fewer tracked
values than we want.  (When there are more, then of course we cannot
assert that no non-stored elements are above the cutoff frequency,
since we're throwing away some that are; so we still use the minimum
stored frequency in that case.)

Notably, this works even when none of the values are common enough to
be called MCEs.  In such cases we stored nothing in the
STATISTIC_KIND_MCELEM pg_statistic slot, which resulted in the
selectivity functions falling back to default estimates.  So in that
case we want to construct a STATISTIC_KIND_MCELEM entry that contains
no "values" but does have "numbers", to wit the three extra numbers
that the MCELEM entry type defines.

A small obstacle is that update_attstats() has traditionally stored a
null, not an empty array, when passed zero "values" for a slot.  That
gives rise to an MCELEM entry that get_attstatsslot() will spit up
on.  The least risky solution seems to be to adjust update_attstats()
so that it will emit a non-null (but possibly empty) array when the
passed stavalues array pointer isn't NULL, rather than conditioning
that on numvalues > 0.  In other existing cases I don't believe that
that changes anything.  For consistency, handle the stanumbers array
the same way.

In addition to adjusting the typanalyze routines that collect
STATISTIC_KIND_MCELEM data, adjust the routines that use the data to
assume that minfreq is the ceiling on the frequency of non-MCE
values.  Previously they assumed that the ceiling is minfreq / 2,
which seems obviously wrong.  (Perhaps it was correct with the
original implementation of MCE collection, but surely it is not with
the lossy-counting algorithm.)

Thanks to Matt Long for the suggestion that we could apply this idea
even when there are more than zero MCEs.

Reported-by: Mark Frost <FROSTMAR@uk.ibm.com>
Reported-by: Matt Long <matt@mattlong.org>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/PH3PPF1C905D6E6F24A5C1A1A1D8345B593E16FA@PH3PPF1C905D6E6.namprd15.prod.outlook.com
---
 contrib/intarray/_int_selfuncs.c          |  8 +++----
 src/backend/commands/analyze.c            |  7 +++---
 src/backend/tsearch/ts_selfuncs.c         | 10 ++++-----
 src/backend/tsearch/ts_typanalyze.c       | 27 +++++++++++++++++++-----
 src/backend/utils/adt/array_selfuncs.c    | 14 ++++++------
 src/backend/utils/adt/array_typanalyze.c  | 27 +++++++++++++++++++-----
 src/include/catalog/pg_statistic.h        |  4 ++++
 7 files changed, 67 insertions(+), 30 deletions(-)

diff --git a/contrib/intarray/_int_selfuncs.c b/contrib/intarray/_int_selfuncs.c
index 9bf64486242..c983de998dc 100644
--- a/contrib/intarray/_int_selfuncs.c
+++ b/contrib/intarray/_int_selfuncs.c
@@ -210,8 +210,8 @@ _int_matchsel(PG_FUNCTION_ARGS)
     */
    if (sslot.nnumbers == sslot.nvalues + 3)
    {
-       /* Grab the lowest frequency. */
-       minfreq = sslot.numbers[sslot.nnumbers - (sslot.nnumbers - sslot.nvalues)];
+       /* Grab the minimal MCE frequency. */
+       minfreq = sslot.numbers[sslot.nvalues];

        mcelems = sslot.values;
        mcefreqs = sslot.numbers;
@@ -270,9 +270,9 @@ int_query_opr_selec(ITEM *item, Datum *mcelems, float4 *mcefreqs,
        {
            /*
             * The element is not in MCELEM.  Punt, but assume that the
-            * selectivity cannot be more than minfreq / 2.
+            * selectivity cannot be more than minfreq.
             */
-           selec = Min(DEFAULT_EQ_SEL, minfreq / 2);
+           selec = Min(DEFAULT_EQ_SEL, minfreq);
        }
    }
    else if (item->type == OPR)
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 8ea2913d906..12b4f3fd36e 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1711,10 +1711,9 @@ update_attstats(Oid relid, bool inh, int natts, VacAttrStats **vacattrstats)
    i = Anum_pg_statistic_stanumbers1 - 1;
    for (k = 0; k < STATISTIC_NUM_SLOTS; k++)
    {
-       int         nnum = stats->numnumbers[k];
-
-       if (nnum > 0)
+       if (stats->stanumbers[k] != NULL)
        {
+           int         nnum = stats->numnumbers[k];
            Datum      *numdatums = (Datum *) palloc(nnum * sizeof(Datum));
            ArrayType  *arry;

@@ -1732,7 +1731,7 @@ update_attstats(Oid relid, bool inh, int natts, VacAttrStats **vacattrstats)
    i = Anum_pg_statistic_stavalues1 - 1;
    for (k = 0; k < STATISTIC_NUM_SLOTS; k++)
    {
-       if (stats->numvalues[k] > 0)
+       if (stats->stavalues[k] != NULL)
        {
            ArrayType  *arry;

diff --git a/src/backend/tsearch/ts_selfuncs.c b/src/backend/tsearch/ts_selfuncs.c
index 453a5e5c2ea..f108ab85f27 100644
--- a/src/backend/tsearch/ts_selfuncs.c
+++ b/src/backend/tsearch/ts_selfuncs.c
@@ -239,8 +239,8 @@ mcelem_tsquery_selec(TSQuery query, Datum *mcelem, int nmcelem,
    }

    /*
-    * Grab the lowest frequency.  compute_tsvector_stats() stored it for us in
-    * the one before the last cell of the Numbers array.  See ts_typanalyze.c
+    * Grab the lowest MCE frequency.  compute_tsvector_stats() stored it for
+    * us in the one before the last cell of the Numbers array.
     */
    minfreq = numbers[nnumbers - 2];

@@ -348,7 +348,7 @@ tsquery_opr_selec(QueryItem *item, char *operand,
             * preserves the property that "word:*" should be estimated to
             * match at least as many rows as "word" would be.
             */
-           selec = Max(Min(DEFAULT_TS_MATCH_SEL, minfreq / 2), selec);
+           selec = Max(Min(DEFAULT_TS_MATCH_SEL, minfreq), selec);
        }
        else
        {
@@ -375,9 +375,9 @@ tsquery_opr_selec(QueryItem *item, char *operand,
            {
                /*
                 * The element is not in MCELEM.  Punt, but assume that the
-                * selectivity cannot be more than minfreq / 2.
+                * selectivity cannot be more than minfreq.
                 */
-               selec = Min(DEFAULT_TS_MATCH_SEL, minfreq / 2);
+               selec = Min(DEFAULT_TS_MATCH_SEL, minfreq);
            }
        }
    }
diff --git a/src/backend/tsearch/ts_typanalyze.c b/src/backend/tsearch/ts_typanalyze.c
index c5a71331ce8..c8dbea96c22 100644
--- a/src/backend/tsearch/ts_typanalyze.c
+++ b/src/backend/tsearch/ts_typanalyze.c
@@ -312,7 +312,7 @@ compute_tsvector_stats(VacAttrStats *stats,
    /*
     * Construct an array of the interesting hashtable items, that is,
     * those meeting the cutoff frequency (s - epsilon)*N.  Also identify
-    * the minimum and maximum frequencies among these items.
+    * the maximum frequency among these items.
     *
     * Since epsilon = s/10 and bucket_width = 1/epsilon, the cutoff
     * frequency is 9*N / bucket_width.
@@ -324,14 +324,12 @@ compute_tsvector_stats(VacAttrStats *stats,
    hash_seq_init(&scan_status, lexemes_tab);
    track_len = 0;
-   minfreq = lexeme_no;
    maxfreq = 0;
    while ((item = (TrackItem *) hash_seq_search(&scan_status)) != NULL)
    {
        if (item->frequency > cutoff_freq)
        {
            sort_table[track_len++] = item;
-           minfreq = Min(minfreq, item->frequency);
            maxfreq = Max(maxfreq, item->frequency);
        }
    }
@@ -346,19 +344,38 @@ compute_tsvector_stats(VacAttrStats *stats,
     * If we obtained more lexemes than we really want, get rid of those
     * with least frequencies.  The easiest way is to qsort the array into
     * descending frequency order and truncate the array.
+    *
+    * If we did not find more elements than we want, then it is safe to
+    * assume that the stored MCE array will contain every element with
+    * frequency above the cutoff.  In that case, rather than storing the
+    * smallest frequency we are keeping, we want to store the minimum
+    * frequency that would have been accepted as a valid MCE.  The
+    * selectivity functions can assume that that is an upper bound on the
+    * frequency of elements not present in the array.
+    *
+    * If we found no candidate MCEs at all, we still want to record the
+    * cutoff frequency, since it's still valid to assume that no element
+    * has frequency more than that.
     */
    if (num_mcelem < track_len)
    {
        qsort_interruptible(sort_table, track_len, sizeof(TrackItem *),
                            trackitem_compare_frequencies_desc, NULL);
-       /* reset minfreq to the smallest frequency we're keeping */
+       /* set minfreq to the smallest frequency we're keeping */
        minfreq = sort_table[num_mcelem - 1]->frequency;
    }
    else
+   {
        num_mcelem = track_len;
+       /* set minfreq to the minimum frequency above the cutoff */
+       minfreq = cutoff_freq + 1;
+       /* ensure maxfreq is nonzero, too */
+       if (track_len == 0)
+           maxfreq = minfreq;
+   }

    /* Generate MCELEM slot entry */
-   if (num_mcelem > 0)
+   if (num_mcelem >= 0)
    {
        MemoryContext old_context;
        Datum      *mcelem_values;
diff --git a/src/backend/utils/adt/array_selfuncs.c b/src/backend/utils/adt/array_selfuncs.c
index a69a84c2aee..9a9936e5eda 100644
--- a/src/backend/utils/adt/array_selfuncs.c
+++ b/src/backend/utils/adt/array_selfuncs.c
@@ -544,13 +544,13 @@ mcelem_array_contain_overlap_selec(Datum *mcelem, int nmcelem,

    if (numbers)
    {
-       /* Grab the lowest observed frequency */
+       /* Grab the minimal MCE frequency */
        minfreq = numbers[nmcelem];
    }
    else
    {
        /* Without statistics make some default assumptions */
-       minfreq = 2 * (float4) DEFAULT_CONTAIN_SEL;
+       minfreq = (float4) DEFAULT_CONTAIN_SEL;
    }

    /* Decide whether it is faster to use binary search or not. */
@@ -622,9 +622,9 @@ mcelem_array_contain_overlap_selec(Datum *mcelem, int nmcelem,
        {
            /*
             * The element is not in MCELEM.  Punt, but assume that the
-            * selectivity cannot be more than minfreq / 2.
+            * selectivity cannot be more than minfreq.
             */
-           elem_selec = Min(DEFAULT_CONTAIN_SEL, minfreq / 2);
+           elem_selec = Min(DEFAULT_CONTAIN_SEL, minfreq);
        }

        /*
@@ -728,7 +728,7 @@ mcelem_array_contained_selec(Datum *mcelem, int nmcelem,

    /*
     * Grab some of the summary statistics that compute_array_stats() stores:
-    * lowest frequency, frequency of null elements, and average distinct
+    * lowest MCE frequency, frequency of null elements, and average distinct
     * element count.
     */
    minfreq = numbers[nmcelem];
@@ -803,10 +803,10 @@ mcelem_array_contained_selec(Datum *mcelem, int nmcelem,
        {
            /*
             * The element is not in MCELEM.  Punt, but assume that the
-            * selectivity cannot be more than minfreq / 2.
+            * selectivity cannot be more than minfreq.
             */
            elem_selec[unique_nitems] = Min(DEFAULT_CONTAIN_SEL,
-                                           minfreq / 2);
+                                           minfreq);
        }

        unique_nitems++;
diff --git a/src/backend/utils/adt/array_typanalyze.c b/src/backend/utils/adt/array_typanalyze.c
index 6f61629b977..560b27f3ca7 100644
--- a/src/backend/utils/adt/array_typanalyze.c
+++ b/src/backend/utils/adt/array_typanalyze.c
@@ -461,7 +461,7 @@ compute_array_stats(VacAttrStats *stats, AnalyzeAttrFetchFunc fetchfunc,
    /*
     * Construct an array of the interesting hashtable items, that is,
     * those meeting the cutoff frequency (s - epsilon)*N.  Also identify
-    * the minimum and maximum frequencies among these items.
+    * the maximum frequency among these items.
     *
     * Since epsilon = s/10 and bucket_width = 1/epsilon, the cutoff
     * frequency is 9*N / bucket_width.
@@ -473,14 +473,12 @@ compute_array_stats(VacAttrStats *stats, AnalyzeAttrFetchFunc fetchfunc,
    hash_seq_init(&scan_status, elements_tab);
    track_len = 0;
-   minfreq = element_no;
    maxfreq = 0;
    while ((item = (TrackItem *) hash_seq_search(&scan_status)) != NULL)
    {
        if (item->frequency > cutoff_freq)
        {
            sort_table[track_len++] = item;
-           minfreq = Min(minfreq, item->frequency);
            maxfreq = Max(maxfreq, item->frequency);
        }
    }
@@ -497,19 +495,38 @@ compute_array_stats(VacAttrStats *stats, AnalyzeAttrFetchFunc fetchfunc,
     * If we obtained more elements than we really want, get rid of those
     * with least frequencies.  The easiest way is to qsort the array into
     * descending frequency order and truncate the array.
+    *
+    * If we did not find more elements than we want, then it is safe to
+    * assume that the stored MCE array will contain every element with
+    * frequency above the cutoff.  In that case, rather than storing the
+    * smallest frequency we are keeping, we want to store the minimum
+    * frequency that would have been accepted as a valid MCE.  The
+    * selectivity functions can assume that that is an upper bound on the
+    * frequency of elements not present in the array.
+    *
+    * If we found no candidate MCEs at all, we still want to record the
+    * cutoff frequency, since it's still valid to assume that no element
+    * has frequency more than that.
     */
    if (num_mcelem < track_len)
    {
        qsort_interruptible(sort_table, track_len, sizeof(TrackItem *),
                            trackitem_compare_frequencies_desc, NULL);
-       /* reset minfreq to the smallest frequency we're keeping */
+       /* set minfreq to the smallest frequency we're keeping */
        minfreq = sort_table[num_mcelem - 1]->frequency;
    }
    else
+   {
        num_mcelem = track_len;
+       /* set minfreq to the minimum frequency above the cutoff */
+       minfreq = cutoff_freq + 1;
+       /* ensure maxfreq is nonzero, too */
+       if (track_len == 0)
+           maxfreq = minfreq;
+   }

    /* Generate MCELEM slot entry */
-   if (num_mcelem > 0)
+   if (num_mcelem >= 0)
    {
        MemoryContext old_context;
        Datum      *mcelem_values;
diff --git a/src/include/catalog/pg_statistic.h b/src/include/catalog/pg_statistic.h
index 4216e27a8a4..444dc27dcad 100644
--- a/src/include/catalog/pg_statistic.h
+++ b/src/include/catalog/pg_statistic.h
@@ -240,6 +240,10 @@ DECLARE_FOREIGN_KEY((starelid, staattnum), pg_attribute, (attrelid, attnum));
  * the fraction of non-null rows that contain at least one null element).  If
  * this member is omitted, the column is presumed to contain no null elements.
  *
+ * Starting in v19, the first extra member can be smaller than the smallest
+ * frequency of any stored MCE, indicating that it's known that no element
+ * not present in the MCE array has frequency greater than that value.
+ *
  * Note: in current usage for tsvector columns, the stavalues elements are of
  * type text, even though their representation within tsvector is not
  * exactly text.
-- 
2.43.7
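As a companion standalone sketch (again not PostgreSQL source; the 0.005 constant stands in for DEFAULT_CONTAIN_SEL and the helper names are invented), this shows the effect of dropping the divide-by-two in the selectivity-side clamp, mirroring the array_selfuncs.c hunks above:

```c
#include <stdio.h>

/* Stand-ins for the real constants/macros; the values here are illustrative. */
#define DEFAULT_CONTAIN_SEL 0.005
#define Min(a, b) ((a) < (b) ? (a) : (b))

/* Pre-patch behavior: cap the punt estimate at half the stored minfreq. */
static double
non_mce_selec_old(double minfreq)
{
    return Min(DEFAULT_CONTAIN_SEL, minfreq / 2);
}

/* Patched behavior: the stored minfreq itself is already the ceiling. */
static double
non_mce_selec_new(double minfreq)
{
    return Min(DEFAULT_CONTAIN_SEL, minfreq);
}

int
main(void)
{
    /* ceiling on non-MCE frequency, as a fraction of non-null rows */
    double      minfreq = 0.0004;

    printf("old estimate: %g\n", non_mce_selec_old(minfreq));
    printf("new estimate: %g\n", non_mce_selec_new(minfreq));
    return 0;
}
```

For an element missing from the MCE list, the pre-patch code caps the estimate at 0.0002 here, while the patched code honors the stored ceiling of 0.0004; both remain below the 0.005 cap used in this sketch as the default estimate.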