Re: PoC/WIP: Extended statistics on expressions - Mailing list pgsql-hackers

From Justin Pryzby
Subject Re: PoC/WIP: Extended statistics on expressions
Date
Msg-id 20201124162320.GR24052@telsasoft.com
Whole thread Raw
In response to Re: PoC/WIP: Extended statistics on expressions  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: PoC/WIP: Extended statistics on expressions
List pgsql-hackers
On Mon, Nov 23, 2020 at 04:30:26AM +0100, Tomas Vondra wrote:
> 0004 - Seems fine. IMHO not really "silly errors" but OK.

This is one of the same issues you pointed out - shadowing a variable.
Could be backpatched.

On Mon, Nov 23, 2020 at 04:30:26AM +0100, Tomas Vondra wrote:
> > +                                errmsg("statistics expressions and predicates can refer only to the table being
indexed")));
> > +        * partial-index predicates.  Create it in the per-index context to be
> > 
> > I think these are copied and shouldn't mention "indexes" or "predicates".  Or
> > should statistics support predicates, too ?
> > 
> 
> Right. Stupid copy-pasto.

Right, but then I was wondering if CREATE STATS should actually support
predicates, since one use case is to do what indexes do without their overhead.
I haven't thought about it enough yet.

> 0006 - Not sure. I think CreateStatistics can be fixed with less code,
> keeping it more like PG13 (good for backpatching). Not sure why rename
> extended statistics to multi-variate statistics - we use "extended"
> everywhere.

-       if (build_expressions && (list_length(stxexprs) == 0))
+       if (!build_expressions_only && (list_length(stmt->exprs) < 2))
                ereport(ERROR,  
                                (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
-                                errmsg("extended expression statistics require at least one expression")));
+                                errmsg("multi-variate statistics require at least two columns")));

I think all of "CREATE STATISTICS" has been known as "extended stats", so I
think it may be confusing to say that it requires two columns for the general
facility.

> Not sure what's the point of serialize_expr_stats changes,
> that's code is mostly copy-paste from update_attstats.

Right.  I think "i" is poor variable name when it isn't a loop variable and not
of limited scope.

> 0007 - I suspect this makes the pg_stats_ext too complex to work with,
> IMHO we should move this to a separate view.

Right - then unnest() the whole thing and return one row per expression rather
than array, as you've done.  Maybe the docs should say that this returns one
row per expression.

Looking quickly at your new patch: I guess you know there's a bunch of
lingering references to "indexes" and "predicates":

I don't know if you want to go to the effort to prohibit this.
postgres=# CREATE STATISTICS asf ON (tableoid::int+1) FROM t;
CREATE STATISTICS

I think a lot of people will find this confusing:

postgres=# CREATE STATISTICS asf ON i FROM t;
ERROR:  extended statistics require at least 2 columns
postgres=# CREATE STATISTICS asf ON (i) FROM t;
CREATE STATISTICS

postgres=# CREATE STATISTICS asf (expressions) ON i FROM t;
ERROR:  extended expression statistics require at least one expression
postgres=# CREATE STATISTICS asf (expressions) ON (i) FROM t;
CREATE STATISTICS

I haven't looked, but is it possible to make it work without parens ?

-- 
Justin



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: libpq compression
Next
From: Stephen Frost
Date:
Subject: Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)