Re: PoC/WIP: Extended statistics on expressions - Mailing list pgsql-hackers
From | Dean Rasheed |
---|---|
Subject | Re: PoC/WIP: Extended statistics on expressions |
Date | |
Msg-id | CAEZATCU9uPo7JYdx4k0-ufXXZH8t7itodibUwCva+s+AvAKcnw@mail.gmail.com Whole thread Raw |
In response to | Re: PoC/WIP: Extended statistics on expressions (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
Responses |
Re: PoC/WIP: Extended statistics on expressions
|
List | pgsql-hackers |
On Thu, 3 Dec 2020 at 15:23, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > Attached is a patch series rebased on top of 25a9e54d2d. After reading this thread and [1], I think I prefer the name "standard" rather than "expressions", because it is meant to describe the kind of statistics being built rather than what they apply to, but maybe that name doesn't actually need to be exposed to the end user: Looking at the current behaviour, there are a couple of things that seem a little odd, even though they are understandable. For example, the fact that CREATE STATISTICS s (expressions) ON (expr), col FROM tbl; fails, but CREATE STATISTICS s (expressions, mcv) ON (expr), col FROM tbl; succeeds and creates both "expressions" and "mcv" statistics. Also, the syntax CREATE STATISTICS s (expressions) ON (expr1), (expr2) FROM tbl; tends to suggest that it's going to create statistics on the pair of expressions, describing their correlation, when actually it builds 2 independent statistics. Also, this error text isn't entirely accurate: CREATE STATISTICS s ON col FROM tbl; ERROR: extended statistics require at least 2 columns because extended statistics don't always require 2 columns, they can also just have an expression, or multiple expressions and 0 or 1 columns. I think a lot of this stems from treating "expressions" in the same way as the other (multi-column) stats kinds, and it might actually be neater to have separate documented syntaxes for single- and multi-column statistics: CREATE STATISTICS [ IF NOT EXISTS ] statistics_name ON (expression) FROM table_name CREATE STATISTICS [ IF NOT EXISTS ] statistics_name [ ( statistics_kind [, ... ] ) ] ON { column_name | (expression) } , { column_name | (expression) } [, ...] FROM table_name The first syntax would create single-column stats, and wouldn't accept a statistics_kind argument, because there is only one kind of single-column statistic. Maybe that might change in the future, but if so, it's likely that the kinds of single-column stats will be different from the kinds of multi-column stats. In the second syntax, the only accepted kinds would be the current multi-column stats kinds (ndistinct, dependencies, and mcv), and it would always build stats describing the correlations between the columns listed. It would continue to build standard/expression stats on any expressions in the list, but that's more of an implementation detail. It would no longer be possible to do "CREATE STATISTICS s (expressions) ON (expr1), (expr2) FROM tbl". Instead, you'd have to issue 2 separate "CREATE STATISTICS" commands, but that seems more logical, because they're independent stats. The parsing code might not change much, but some of the errors would be different. For example, the errors "building only extended expression statistics on simple columns not allowed" and "extended expression statistics require at least one expression" would go away, and the error "extended statistics require at least 2 columns" might become more specific, depending on the stats kind. Regards, Dean [1] https://www.postgresql.org/message-id/flat/1009.1579038764%40sss.pgh.pa.us#8624792a20ae595683b574f5933dae53
pgsql-hackers by date: