Re: PoC/WIP: Extended statistics on expressions - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: PoC/WIP: Extended statistics on expressions
Date
Msg-id e5beb5d5-28b4-7aad-3a16-9a6abf57f897@enterprisedb.com
Whole thread Raw
In response to Re: PoC/WIP: Extended statistics on expressions  (Justin Pryzby <pryzby@telsasoft.com>)
Responses Re: PoC/WIP: Extended statistics on expressions  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Re: PoC/WIP: Extended statistics on expressions  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-hackers

On 11/23/20 3:26 AM, Justin Pryzby wrote:
> On Sun, Nov 22, 2020 at 08:03:51PM +0100, Tomas Vondra wrote:
>> attached is a significantly improved version of the patch, allowing
>> defining extended statistics on expressions. This fixes most of the
>> problems in the previous WIP version and AFAICS it does pass all
>> regression tests (including under valgrind). There's a bunch of FIXMEs
>> and a couple loose ends, but overall I think it's ready for reviews.
> 
> I was looking at the previous patch, so now read this one instead, and attach
> some proposed fixes.
> 
> +  * This matters especially for * expensive expressions, of course.
> 

The point this was trying to make is that we evaluate the expressions
only once, and use the results to build all extended statistics. Instead
of leaving it up to every "build" to re-evaluate it.

> +   The expression can refer only to columns of the underlying table, but
> +   it can use all columns, not just the ones the statistics is defined
> +   on.
> 
> I don't know what these are trying to say?
> 

D'oh. That's bogus para, copied from the CREATE INDEX docs (where it
talked about the index predicate, which is irrelevant here).

> +                                errmsg("statistics expressions and predicates can refer only to the table being
indexed")));
> +        * partial-index predicates.  Create it in the per-index context to be
> 
> I think these are copied and shouldn't mention "indexes" or "predicates".  Or
> should statistics support predicates, too ?
> 

Right. Stupid copy-pasto.

> Idea: if a user specifies no stakinds, and there's no expression specified,
> then you automatically build everything except for expressional stats.  But if
> they specify only one statistics "column", it gives an error.  If that's a
> non-simple column reference, should that instead build *only* expressional
> stats (possibly with a NOTICE, since the user might be intending to make MV
> stats).
> 

Right, that was the intention - but I messed up and it works only if you
specify the "expressions" kind explicitly (and I also added the ERROR
message to expected output by mistake). I agree we should handle this
automatically, so that

   CREATE STATISTICS s ON (a+b) FROM t

works and only creates statistics for the expression.


> I think pg_stats_ext should allow inspecting the pg_statistic data in
> pg_statistic_ext_data.stxdexprs.  I guess array_agg() should be ordered by
> something, so maybe it should use ORDINALITY (?)
> 

I agree we should expose the expression statistics, but I'm not
convinced we should do that in the pg_stats_ext view itself. The problem
is that it's a table bested in a table, essentially, with non-trivial
structure, so I was thinking about adding a separate view exposing just
this one part. Something like pg_stats_ext_expressions, with about the
same structure as pg_stats, or something.

> I hacked more on bootstrap.c so included that here.

Thanks. As for the 0004-0007 patches:

0004 - Seems fine. IMHO not really "silly errors" but OK.

0005 - Mostly OK. The docs wording mostly comes from CREATE INDEX docs,
though. The paragraph about "t1" is old, so if we want to reword it then
maybe we should backpatch too.

0006 - Not sure. I think CreateStatistics can be fixed with less code,
keeping it more like PG13 (good for backpatching). Not sure why rename
extended statistics to multi-variate statistics - we use "extended"
everywhere. Not sure what's the point of serialize_expr_stats changes,
that's code is mostly copy-paste from update_attstats.

0007 - I suspect this makes the pg_stats_ext too complex to work with,
IMHO we should move this to a separate view.


Thanks for the review! I'll try to look more closely at those patches
sometime next week, and merge most of it.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: "tsunakawa.takay@fujitsu.com"
Date:
Subject: RE: Disable WAL logging to speed up data loading
Next
From: Greg Nancarrow
Date:
Subject: Re: Parallel plans and "union all" subquery