Re: PoC/WIP: Extended statistics on expressions - Mailing list pgsql-hackers

From Dean Rasheed
Subject Re: PoC/WIP: Extended statistics on expressions
Date
Msg-id CAEZATCW_7nfeHmLtV6TzSfmFG+i=io_Qo0NL8BiBnmEqq3CQnQ@mail.gmail.com
Whole thread Raw
In response to Re: PoC/WIP: Extended statistics on expressions  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: PoC/WIP: Extended statistics on expressions  (Dean Rasheed <dean.a.rasheed@gmail.com>)
List pgsql-hackers
Looking over the statscmds.c changes, there are a few XXX's and
FIXME's that need resolving, and I had a couple of other minor
comments:

+           /*
+            * An expression using mutable functions is probably wrong,
+            * since if you aren't going to get the same result for the
+            * same data every time, it's not clear what the index entries
+            * mean at all.
+            */
+           if (CheckMutability((Expr *) expr))
+               ereport(ERROR,

That comment is presumably copied from the index code, so needs updating.


+           /*
+            * Disallow data types without a less-than operator
+            *
+            * XXX Maybe allow this, but only for EXPRESSIONS stats and
+            * prevent building e.g. MCV etc.
+            */
+           atttype = exprType(expr);
+           type = lookup_type_cache(atttype, TYPECACHE_LT_OPR);
+           if (type->lt_opr == InvalidOid)
+               ereport(ERROR,
+                       (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                        errmsg("expression cannot be used in
statistics because its type %s has no default btree operator class",
+                               format_type_be(atttype))));

As the comment suggests, it's probably worth skipping this check if
numcols is 1 so that single-column stats can be built for more types
of expressions. (I'm assuming that it's basically no more effort to
make that work, so I think it falls into the might-as-well-do-it
category.)


+   /*
+    * Parse the statistics kinds.  Firstly, check that this is not the
+    * variant building statistics for a single expression, in which case
+    * we don't allow specifying any statistis kinds.  The simple variant
+    * only has one expression, and does not allow statistics kinds.
+    */
+   if ((list_length(stmt->exprs) == 1) && (list_length(stxexprs) == 1))
+   {

Typo: "statistis"
Nit-picking, this test could just be:

+   if ((numcols == 1) && (list_length(stxexprs) == 1))

which IMO is a little more readable, and matches a similar test a
little further down.


+   /*
+    * If there are no simply-referenced columns, give the statistics an
+    * auto dependency on the whole table.  In most cases, this will
+    * be redundant, but it might not be if the statistics expressions
+    * contain no Vars (which might seem strange but possible).
+    *
+    * XXX This is copied from index_create, not sure if it's applicable
+    * to extended statistics too.
+    */

Seems right to me.


+       /*
+        * FIXME use 'expr' for expressions, which have empty column names.
+        * For indexes this is handled in ChooseIndexColumnNames, but we
+        * have no such function for stats.
+        */
+       if (!name)
+           name = "expr";

In theory, this function could be made to duplicate the logic used for
indexes, creating names like "expr1", "expr2", etc. To be honest
though, I don't think it's worth the effort. The code for indexes
isn't really bulletproof anyway -- for example there might be a column
called "expr" that is or isn't included in the index, which would make
the generated name ambiguous. And in any case, a name like
"tbl_cola_expr_colb_expr1_colc_stat" isn't really any more useful than
"tbl_cola_expr_colb_expr_colc_stat". So I'd be tempted to leave that
code as it is.


+
+/*
+ * CheckMutability
+ *     Test whether given expression is mutable
+ *
+ * FIXME copied from indexcmds.c, maybe use some shared function?
+ */
+static bool
+CheckMutability(Expr *expr)
+{

As the comment says, it's quite messy duplicating this code, but I'm
wondering whether it would be OK to just skip this check entirely. I
think someone else suggested that elsewhere, and I think it might not
be a bad idea.

For indexes, it could easily lead to wrong query results, but for
stats the most likely problem is that the stats would get out of date
(which they tend to do all by themselves anyway) and need rebuilding.

If you ignore intentionally crazy examples (which are still possible
even with this check), then there are probably many legitimate cases
where someone might want to use non-immutable functions in stats, and
this check just forces them to create an immutable wrapper function.

Regards,
Dean



pgsql-hackers by date:

Previous
From: "Tang, Haiying"
Date:
Subject: RE: [Patch] Optimize dropping of relation buffers using dlist
Next
From: Stephen Frost
Date:
Subject: Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)