Re: PoC/WIP: Extended statistics on expressions - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: PoC/WIP: Extended statistics on expressions
Date
Msg-id 28bfe53b-0e75-79c1-3403-ebf10714a94e@enterprisedb.com
Whole thread Raw
In response to Re: PoC/WIP: Extended statistics on expressions  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-hackers

On 11/24/20 5:23 PM, Justin Pryzby wrote:
> On Mon, Nov 23, 2020 at 04:30:26AM +0100, Tomas Vondra wrote:
>> 0004 - Seems fine. IMHO not really "silly errors" but OK.
> 
> This is one of the same issues you pointed out - shadowing a variable.
> Could be backpatched.
> 
> On Mon, Nov 23, 2020 at 04:30:26AM +0100, Tomas Vondra wrote:
>>> +                                errmsg("statistics expressions and predicates can refer only to the table being
indexed")));
>>> +        * partial-index predicates.  Create it in the per-index context to be
>>>
>>> I think these are copied and shouldn't mention "indexes" or "predicates".  Or
>>> should statistics support predicates, too ?
>>>
>>
>> Right. Stupid copy-pasto.
> 
> Right, but then I was wondering if CREATE STATS should actually support
> predicates, since one use case is to do what indexes do without their overhead.
> I haven't thought about it enough yet.
> 

Well, it's not supported now, so the message is bogus. I'm not against
supporting "partial statistics" with predicates in the future, but it's
going to be non-trivial project on it's own. It's not something I can
bolt onto the current patch easily.

>> 0006 - Not sure. I think CreateStatistics can be fixed with less code,
>> keeping it more like PG13 (good for backpatching). Not sure why rename
>> extended statistics to multi-variate statistics - we use "extended"
>> everywhere.
> 
> -       if (build_expressions && (list_length(stxexprs) == 0))
> +       if (!build_expressions_only && (list_length(stmt->exprs) < 2))
>                 ereport(ERROR,  
>                                 (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
> -                                errmsg("extended expression statistics require at least one expression")));
> +                                errmsg("multi-variate statistics require at least two columns")));
> 
> I think all of "CREATE STATISTICS" has been known as "extended stats", so I
> think it may be confusing to say that it requires two columns for the general
> facility.
> 
>> Not sure what's the point of serialize_expr_stats changes,
>> that's code is mostly copy-paste from update_attstats.
> 
> Right.  I think "i" is poor variable name when it isn't a loop variable and not
> of limited scope.
> 

OK, I understand. I'll consider tweaking that.

>> 0007 - I suspect this makes the pg_stats_ext too complex to work with,
>> IMHO we should move this to a separate view.
> 
> Right - then unnest() the whole thing and return one row per expression rather
> than array, as you've done.  Maybe the docs should say that this returns one
> row per expression.
> 
> Looking quickly at your new patch: I guess you know there's a bunch of
> lingering references to "indexes" and "predicates":
> 
> I don't know if you want to go to the effort to prohibit this.
> postgres=# CREATE STATISTICS asf ON (tableoid::int+1) FROM t;
> CREATE STATISTICS
> 

Hmm, we're already rejecting system attributes, I suppose we should do
the same thing for expressions on system attributes.

> I think a lot of people will find this confusing:
> 
> postgres=# CREATE STATISTICS asf ON i FROM t;
> ERROR:  extended statistics require at least 2 columns
> postgres=# CREATE STATISTICS asf ON (i) FROM t;
> CREATE STATISTICS
> 
> postgres=# CREATE STATISTICS asf (expressions) ON i FROM t;
> ERROR:  extended expression statistics require at least one expression
> postgres=# CREATE STATISTICS asf (expressions) ON (i) FROM t;
> CREATE STATISTICS
> 
> I haven't looked, but is it possible to make it work without parens ?
> 

Hmm, you're right that may be surprising. I suppose we could walk the
expressions while creating the statistics, and replace such trivial
expressions with the nested variable, but I haven't tried. I wonder what
the CREATE INDEX behavior would be in these cases.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: [HACKERS] Custom compression methods
Next
From: Tom Lane
Date:
Subject: Re: Strange behavior with polygon and NaN