Re: PoC/WIP: Extended statistics on expressions - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: PoC/WIP: Extended statistics on expressions
Date
Msg-id 329203a4-7cad-a0e3-d5a8-ce70872c8156@enterprisedb.com
Whole thread Raw
In response to Re: PoC/WIP: Extended statistics on expressions  (Dean Rasheed <dean.a.rasheed@gmail.com>)
Responses Re: PoC/WIP: Extended statistics on expressions  (Dean Rasheed <dean.a.rasheed@gmail.com>)
List pgsql-hackers
On 1/4/21 4:34 PM, Dean Rasheed wrote:
>
> ... 
> 
> Some other comments:
> 
> * I'm not sure I understand the need for 0001. Wasn't there an earlier
> version of this patch that just did it by re-populating the type
> array, but which still had it as an array rather than turning it into
> a list? Making it a list falsifies some of the comments and
> function/variable name choices in that file.
> 

That's a bit done to Justin - I think it's fine to use the older version 
repopulating the type array, but that question is somewhat unrelated to 
this patch.

> * There's a comment typo in catalog/Makefile -- "are are reputedly
> other...", should be "there are reputedly other...".
> 
> * Looking at the pg_stats_ext view, I think perhaps expressions stats
> should be omitted entirely from that view, since it doesn't show any
> useful information about them. So it could remove "e" from the "kinds"
> array, and exclude rows whose only kind is "e", since such rows have
> no interesting data in them. Essentially, the new view
> pg_stats_ext_exprs makes having any expression stats in pg_stats_ext
> redundant. Hiding this data in pg_stats_ext would also be consistent
> with making the "expressions" stats kind hidden from the user.
> 

Hmmm, not sure. I'm not sure removing 'e' from the array is a good idea. 
On the one hand it's internal detail, on the other hand most of that 
view is internal detail too. Excluding rows with only 'e' seems 
reasonable, though. I need to think about this.

> * In gram.y, it wasn't quite obvious why you converted the column list
> for CREATE STATISTICS from an expr_list to a stats_params list. I
> figured it out, and it makes sense, but I think it could use a
> comment, perhaps something along the lines of the one for index_elem,
> e.g.:
> 
> /*
>   * Statistics attributes can be either simple column references, or arbitrary
>   * expressions in parens.  For compatibility with index attributes permitted
>   * in CREATE INDEX, we allow an expression that's just a function call to be
>   * written without parens.
>   */
> 

OH, right. I'd have trouble figuring this myself, and I wrote that code 
myself only one or two months ago.

> * In parse_func.c and parse_agg.c, there are a few new error strings
> that use the abbreviation "stats expressions", whereas most other
> errors refer to "statistics expressions". For consistency, I think
> they should all be the latter.
> 

OK, will fix.

> * In generateClonedExtStatsStmt(), I think the "expressions" stats
> kind needs to be explicitly excluded, otherwise CREATE TABLE (LIKE
> ...) fails if the source table has expression stats.
> 

Yeah, will fix. I guess this also means we're missing some tests.

> * CreateStatistics() uses ShareUpdateExclusiveLock, but in
> tcop/utility.c the relation is opened with a ShareLock. ISTM that the
> latter lock mode should be made to match CreateStatistics().
> 

Not sure, will check.

> * Why does the new code in tcop/utility.c not use
> RangeVarGetRelidExtended together with RangeVarCallbackOwnsRelation?
> That seems preferable to doing the ACL check in CreateStatistics().
> For one thing, as it stands, it allows the lock to be taken even if
> the user doesn't own the table. Is it intentional that the current
> code allows extended stats to be created on system catalogs? That
> would be one thing that using RangeVarCallbackOwnsRelation would
> change, but I can't see a reason to allow it.
> 

I think I copied the code from somewhere - probably expression indexes, 
or something like that. Not a proof that it's the right/better way to do 
this, though.

> * In src/bin/psql/describe.c, I think the \d output should also
> exclude the "expressions" stats kind and just list the other kinds (or
> have no kinds list at all, if there are no other kinds), to make it
> consistent with the CREATE STATISTICS syntax.
> 

Not sure I understand. Why would this make it consistent with CREATE 
STATISTICS? Can you elaborate?

> * The pg_dump output for a stats object whose only kind is
> "expressions" is broken -- it includes a spurious "()" for the kinds
> list.
> 

Will fix. Again, this suggests there are TAP tests missing.

> That's it for now. I'll look at the optimiser changes next, and try to
> post more comments later this week.
> 

Thanks!


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: ilmari@ilmari.org (Dagfinn Ilmari Mannsåker)
Date:
Subject: Re: language cleanups in code and docs
Next
From: "Hou, Zhijie"
Date:
Subject: fix typo in ReorderBufferProcessTXN