Thread: extend VacAttrStats to allow stavalues of different types

extend VacAttrStats to allow stavalues of different types

From
Jan Urbański
Date:
Following the conclusion here:
http://archives.postgresql.org/pgsql-hackers/2008-05/msg00273.php
here's a patch that extends VacAttrStats to allow typanalyze functions
to store statistic values of different types than the underlying column.

The XXX comment can be taken into consideration or just dropped as
unimportant.

Cheers,
--
Jan Urbanski
GPG key ID: E583D7D2

ouden estin
*** src/backend/commands/analyze.c
--- src/backend/commands/analyze.c    2008-05-15 20:08:27.000000000 +0200
***************
*** 1319,1350 ****
              {
                  ArrayType  *arry;

!                 /*
!                  * XXX horrible hack - we're creating a pg_statistic tuple for
!                  * a tsvector, but need to store an array of cstrings.
!                  *
!                  * Temporary measures...
!                  */
!                 if (stats->stakind[0] == STATISTIC_KIND_MCL)
!                 {
!                     elog(NOTICE, "severly breaking stuff by brute force hackage");
!                     arry = construct_array(stats->stavalues[k],
!                                            stats->numvalues[k],
!                                            CSTRINGOID,
!                                            -2, /* typlen, -2 for cstring, per
!                                                 * comment from pg_type.h */
!                                            false,
!                                            'c');
!                 }
!                 else
!                 {
!                     arry = construct_array(stats->stavalues[k],
!                                            stats->numvalues[k],
!                                            stats->attr->atttypid,
!                                            stats->attrtype->typlen,
!                                            stats->attrtype->typbyval,
!                                            stats->attrtype->typalign);
!                 }
                  values[i++] = PointerGetDatum(arry);    /* stavaluesN */
              }
              else
--- 1319,1330 ----
              {
                  ArrayType  *arry;

!                 arry = construct_array(stats->stavalues[k],
!                                        stats->numvalues[k],
!                                        stats->statypid[k],
!                                        stats->statyplen[k],
!                                        stats->statypbyval[k],
!                                        stats->statypalign[k]);
                  values[i++] = PointerGetDatum(arry);    /* stavaluesN */
              }
              else
***************
*** 1874,1879 ****
--- 1854,1867 ----
              stats->numnumbers[0] = num_mcv;
              stats->stavalues[0] = mcv_values;
              stats->numvalues[0] = num_mcv;
+             /*
+              * MCV entries have the same element type as the analyzed
+              * attribute.
+              */
+             stats->statypid[0] = stats->attr->atttypid;
+             stats->statyplen[0] = stats->attrtype->typlen;
+             stats->statypbyval[0] = stats->attrtype->typbyval;
+             stats->statypalign[0] = stats->attrtype->typalign;
          }
      }
      else if (null_cnt > 0)
***************
*** 2217,2222 ****
--- 2205,2218 ----
              stats->numnumbers[slot_idx] = num_mcv;
              stats->stavalues[slot_idx] = mcv_values;
              stats->numvalues[slot_idx] = num_mcv;
+             /*
+              * MCV entries have the same element type as the analyzed
+              * attribute.
+              */
+             stats->statypid[slot_idx] = stats->attr->atttypid;
+             stats->statyplen[slot_idx] = stats->attrtype->typlen;
+             stats->statypbyval[slot_idx] = stats->attrtype->typbyval;
+             stats->statypalign[slot_idx] = stats->attrtype->typalign;
              slot_idx++;
          }

***************
*** 2301,2306 ****
--- 2297,2310 ----
              stats->staop[slot_idx] = mystats->ltopr;
              stats->stavalues[slot_idx] = hist_values;
              stats->numvalues[slot_idx] = num_hist;
+             /*
+              * Histogram entries have the same element type as the analyzed
+              * attribute.
+              */
+             stats->statypid[slot_idx] = stats->attr->atttypid;
+             stats->statyplen[slot_idx] = stats->attrtype->typlen;
+             stats->statypbyval[slot_idx] = stats->attrtype->typbyval;
+             stats->statypalign[slot_idx] = stats->attrtype->typalign;
              slot_idx++;
          }

*** src/include/commands/vacuum.h
--- src/include/commands/vacuum.h    2008-05-16 00:11:20.000000000 +0200
***************
*** 94,99 ****
--- 94,119 ----
      Datum       *stavalues[STATISTIC_NUM_SLOTS];

      /*
+      * These fields describe the stavalues[n] element types. They will
+      * typically be the same as the column's that's underlying the slot, but
+      * sometimes a custom typanalyze function might want to store an array of
+      * something other that the analyzed column's elements. This must be filled
+      * in by the compute_stats routine.
+      *
+      * XXX or maybe fall back on attrtype-> stuff when these are NULL? That way
+      * we won't break other people's custom typanalyze functions. Not sure if
+      * any exist, though.
+      *
+      * Another concern is that typlen, typbyval and typalign are reduntant
+      * given the OID. But the caller is in better position to cache this so
+      * maybe we shouldn't worry about it?
+      */
+     Oid            statypid[STATISTIC_NUM_SLOTS];
+     int2        statyplen[STATISTIC_NUM_SLOTS];
+     bool        statypbyval[STATISTIC_NUM_SLOTS];
+     char        statypalign[STATISTIC_NUM_SLOTS];
+
+     /*
       * These fields are private to the main ANALYZE code and should not be
       * looked at by type-specific functions.
       */

Re: extend VacAttrStats to allow stavalues of different types

From
Jan Urbański
Date:
Jan Urbański wrote:
> Following the conclusion here:
> http://archives.postgresql.org/pgsql-hackers/2008-05/msg00273.php
> here's a patch that extends VacAttrStats to allow typanalyze functions
> to store statistic values of different types than the underlying column.
>
> The XXX comment can be taken into consideration or just dropped as
> unimportant.

Doh, this time against HEAD, not my branch ...

--
Jan Urbanski
GPG key ID: E583D7D2

ouden estin
*** src/backend/commands/analyze.c
--- src/backend/commands/analyze.c    2008-05-16 00:47:00.000000000 +0200
***************
*** 1321,1330 ****

                  arry = construct_array(stats->stavalues[k],
                                         stats->numvalues[k],
!                                        stats->attr->atttypid,
!                                        stats->attrtype->typlen,
!                                        stats->attrtype->typbyval,
!                                        stats->attrtype->typalign);
                  values[i++] = PointerGetDatum(arry);    /* stavaluesN */
              }
              else
--- 1321,1330 ----

                  arry = construct_array(stats->stavalues[k],
                                         stats->numvalues[k],
!                                        stats->statypid[k],
!                                        stats->statyplen[k],
!                                        stats->statypbyval[k],
!                                        stats->statypalign[k]);
                  values[i++] = PointerGetDatum(arry);    /* stavaluesN */
              }
              else
***************
*** 1854,1859 ****
--- 1854,1867 ----
              stats->numnumbers[0] = num_mcv;
              stats->stavalues[0] = mcv_values;
              stats->numvalues[0] = num_mcv;
+             /*
+              * MCV entries have the same element type as the analyzed
+              * attribute.
+              */
+             stats->statypid[0] = stats->attr->atttypid;
+             stats->statyplen[0] = stats->attrtype->typlen;
+             stats->statypbyval[0] = stats->attrtype->typbyval;
+             stats->statypalign[0] = stats->attrtype->typalign;
          }
      }
      else if (null_cnt > 0)
***************
*** 2197,2202 ****
--- 2205,2218 ----
              stats->numnumbers[slot_idx] = num_mcv;
              stats->stavalues[slot_idx] = mcv_values;
              stats->numvalues[slot_idx] = num_mcv;
+             /*
+              * MCV entries have the same element type as the analyzed
+              * attribute.
+              */
+             stats->statypid[slot_idx] = stats->attr->atttypid;
+             stats->statyplen[slot_idx] = stats->attrtype->typlen;
+             stats->statypbyval[slot_idx] = stats->attrtype->typbyval;
+             stats->statypalign[slot_idx] = stats->attrtype->typalign;
              slot_idx++;
          }

***************
*** 2281,2286 ****
--- 2297,2310 ----
              stats->staop[slot_idx] = mystats->ltopr;
              stats->stavalues[slot_idx] = hist_values;
              stats->numvalues[slot_idx] = num_hist;
+             /*
+              * Histogram entries have the same element type as the analyzed
+              * attribute.
+              */
+             stats->statypid[slot_idx] = stats->attr->atttypid;
+             stats->statyplen[slot_idx] = stats->attrtype->typlen;
+             stats->statypbyval[slot_idx] = stats->attrtype->typbyval;
+             stats->statypalign[slot_idx] = stats->attrtype->typalign;
              slot_idx++;
          }

*** src/include/commands/vacuum.h
--- src/include/commands/vacuum.h    2008-05-16 00:47:09.000000000 +0200
***************
*** 94,99 ****
--- 94,119 ----
      Datum       *stavalues[STATISTIC_NUM_SLOTS];

      /*
+      * These fields describe the stavalues[n] element types. They will
+      * typically be the same as the column's that's underlying the slot, but
+      * sometimes a custom typanalyze function might want to store an array of
+      * something other that the analyzed column's elements. This must be filled
+      * in by the compute_stats routine.
+      *
+      * XXX or maybe fall back on attrtype-> stuff when these are NULL? That way
+      * we won't break other people's custom typanalyze functions. Not sure if
+      * any exist, though.
+      *
+      * Another concern is that typlen, typbyval and typalign are reduntant
+      * given the OID. But the caller is in better position to cache this so
+      * maybe we shouldn't worry about it?
+      */
+     Oid            statypid[STATISTIC_NUM_SLOTS];
+     int2        statyplen[STATISTIC_NUM_SLOTS];
+     bool        statypbyval[STATISTIC_NUM_SLOTS];
+     char        statypalign[STATISTIC_NUM_SLOTS];
+
+     /*
       * These fields are private to the main ANALYZE code and should not be
       * looked at by type-specific functions.
       */

Re: extend VacAttrStats to allow stavalues of different types

From
"Heikki Linnakangas"
Date:
Jan Urbański wrote:
> Following the conclusion here:
> http://archives.postgresql.org/pgsql-hackers/2008-05/msg00273.php
> here's a patch that extends VacAttrStats to allow typanalyze functions
> to store statistic values of different types than the underlying column.

Looks good to me at first glance.

About this comment:

> +      * XXX or maybe fall back on attrtype-> stuff when these are NULL? That way
> +      * we won't break other people's custom typanalyze functions. Not sure if
> +      * any exist, though.

I tried to google for a user defined data type with a custom typanalyze
function but didn't find anything, so I don't think it's an issue. It's
a bit nasty, though, because if one exists, it will compile and run just
fine, until you run ANALYZE. In general it's better to break an old API
obviously rather than silently, so that the old code doesn't even compile.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: extend VacAttrStats to allow stavalues of different types

From
Jan Urbański
Date:
Heikki Linnakangas wrote:
> I tried to google for a user defined data type with a custom typanalyze
> function but didn't find anything, so I don't think it's an issue. It's
> a bit nasty, though, because if one exists, it will compile and run just
> fine, until you run ANALYZE. In general it's better to break an old API
> obviously rather than silently, so that the old code doesn't even compile.

I can't think of a way to make the compilation fail if your old
typanalyze function fails to set the extra VacAttrStats fields. As I see
it, there are three options:
1. fall back on the old behaviour at the price of code uglification
2. put an Assert after the OidFunctionCall that calls a typanalyze
function if it exists, so people will get a nicer error message when
they test their code
3. put an ereport(ERROR, ...) after the OidFunctionCall, so people will
get a nicer error message if they don't use --enable-cassert and they'll
avoid segfaulting the backend on an ANALYZE call.

I think I like 2. best, but I just thought that you could check if the
custom typanalyze routine did set the fields right after it's been
called, and if it didn't then set them for it and thus get 1. in an
arguably clean way. Only that encourages sloppy programming (not setting
the fields and relying on someone to set them).

--
Jan Urbanski
GPG key ID: E583D7D2

ouden estin

Re: extend VacAttrStats to allow stavalues of different types

From
Tom Lane
Date:
"Heikki Linnakangas" <heikki@enterprisedb.com> writes:
> Jan Urbański wrote:
>> +      * XXX or maybe fall back on attrtype-> stuff when these are NULL? That way
>> +      * we won't break other people's custom typanalyze functions. Not sure if
>> +      * any exist, though.

> I tried to google for a user defined data type with a custom typanalyze
> function but didn't find anything, so I don't think it's an issue.

Oh, it absolutely is an issue.  PostGIS has them, if no one else does.

I think the correct solution is to initialize the fields to match the
column type before calling the typanalyze function.  Then you don't
break compatibility for existing typanalyze functions.  It's also less
code, since the standard typanalyze functions can rely on those preset
values.

            regards, tom lane

Re: extend VacAttrStats to allow stavalues of different types

From
Jan Urbański
Date:
Tom Lane wrote:
> I think the correct solution is to initialize the fields to match the
> column type before calling the typanalyze function.  Then you don't
> break compatibility for existing typanalyze functions.  It's also less
> code, since the standard typanalyze functions can rely on those preset
> values.

Right. Updated patch attached.

--
Jan Urbanski
GPG key ID: E583D7D2

ouden estin
*** src/backend/commands/analyze.c
--- /tmp/.diff_sIxSwv    2008-06-02 17:55:58.000000000 +0200
***************
*** 683,688 ****
--- 683,689 ----
      Form_pg_attribute attr = onerel->rd_att->attrs[attnum - 1];
      HeapTuple    typtuple;
      VacAttrStats *stats;
+     int            i;
      bool        ok;

      /* Never analyze dropped columns */
***************
*** 711,716 ****
--- 712,730 ----
      stats->tupattnum = attnum;

      /*
+      * Initialize fields describing the stats->stavalues[n] element types, so
+      * the typanalyze function will have some defaults in case it neglects to
+      * set them itself.
+      */
+     for (i = 0; i < STATISTIC_NUM_SLOTS; i++)
+     {
+         stats->statypid[i] = stats->attr->atttypid;
+         stats->statyplen[i] = stats->attrtype->typlen;
+         stats->statypbyval[i] = stats->attrtype->typbyval;
+         stats->statypalign[i] = stats->attrtype->typalign;
+     }
+
+     /*
       * Call the type-specific typanalyze function.    If none is specified, use
       * std_typanalyze().
       */
***************
*** 1321,1330 ****

                  arry = construct_array(stats->stavalues[k],
                                         stats->numvalues[k],
!                                        stats->attr->atttypid,
!                                        stats->attrtype->typlen,
!                                        stats->attrtype->typbyval,
!                                        stats->attrtype->typalign);
                  values[i++] = PointerGetDatum(arry);    /* stavaluesN */
              }
              else
--- 1335,1344 ----

                  arry = construct_array(stats->stavalues[k],
                                         stats->numvalues[k],
!                                        stats->statypid[k],
!                                        stats->statyplen[k],
!                                        stats->statypbyval[k],
!                                        stats->statypalign[k]);
                  values[i++] = PointerGetDatum(arry);    /* stavaluesN */
              }
              else
***************
*** 1854,1859 ****
--- 1868,1877 ----
              stats->numnumbers[0] = num_mcv;
              stats->stavalues[0] = mcv_values;
              stats->numvalues[0] = num_mcv;
+             /*
+              * Accept the defaults for stats->statypid and others.
+              * They have been set before we were called (see vacuum.h)
+              */
          }
      }
      else if (null_cnt > 0)
***************
*** 2197,2202 ****
--- 2215,2224 ----
              stats->numnumbers[slot_idx] = num_mcv;
              stats->stavalues[slot_idx] = mcv_values;
              stats->numvalues[slot_idx] = num_mcv;
+             /*
+              * Accept the defaults for stats->statypid and others.
+              * They have been set before we were called (see vacuum.h)
+              */
              slot_idx++;
          }

***************
*** 2281,2286 ****
--- 2303,2312 ----
              stats->staop[slot_idx] = mystats->ltopr;
              stats->stavalues[slot_idx] = hist_values;
              stats->numvalues[slot_idx] = num_hist;
+             /*
+              * Accept the defaults for stats->statypid and others.
+              * They have been set before we were called (see vacuum.h)
+              */
              slot_idx++;
          }

*** src/include/commands/vacuum.h
--- /tmp/.diff_YRgK4O    2008-06-02 17:55:58.000000000 +0200
***************
*** 94,99 ****
--- 94,111 ----
      Datum       *stavalues[STATISTIC_NUM_SLOTS];

      /*
+      * These fields describe the stavalues[n] element types. They will
+      * be initialized to be the same as the column's that's underlying the slot,
+      * but a custom typanalyze function might want to store an array of
+      * something other that the analyzed column's elements. It should then
+      * overwrite these fields.
+      */
+     Oid            statypid[STATISTIC_NUM_SLOTS];
+     int2        statyplen[STATISTIC_NUM_SLOTS];
+     bool        statypbyval[STATISTIC_NUM_SLOTS];
+     char        statypalign[STATISTIC_NUM_SLOTS];
+
+     /*
       * These fields are private to the main ANALYZE code and should not be
       * looked at by type-specific functions.
       */

Re: extend VacAttrStats to allow stavalues of different types

From
"Heikki Linnakangas"
Date:
Jan Urbański wrote:
> Tom Lane wrote:
>> I think the correct solution is to initialize the fields to match the
>> column type before calling the typanalyze function.  Then you don't
>> break compatibility for existing typanalyze functions.  It's also less
>> code, since the standard typanalyze functions can rely on those preset
>> values.
>
> Right. Updated patch attached.

Thanks, committed.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com