Re: PoC/WIP: Extended statistics on expressions - Mailing list pgsql-hackers
From | Zhihong Yu |
---|---|
Subject | Re: PoC/WIP: Extended statistics on expressions |
Date | |
Msg-id | CALNJ-vRz-hsCFVNQLm7Amxse+J=dLwSuwbJNeYOk-9mwvh_AyA@mail.gmail.com Whole thread Raw |
In response to | Re: PoC/WIP: Extended statistics on expressions (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
Responses |
Re: PoC/WIP: Extended statistics on expressions
|
List | pgsql-hackers |
Hi,
+ * Check that only the base rel is mentioned. (This should be dead code
+ * now that add_missing_from is history.)
+ */
+ if (list_length(pstate->p_rtable) != 1)
+ * now that add_missing_from is history.)
+ */
+ if (list_length(pstate->p_rtable) != 1)
If it is dead code, it can be removed, right ?
For statext_mcv_clauselist_selectivity:
+ // bms_free(list_attnums[listidx]);
The commented line can be removed.
+bool
+examine_clause_args2(List *args, Node **exprp, Const **cstp, bool *expronleftp)
+examine_clause_args2(List *args, Node **exprp, Const **cstp, bool *expronleftp)
Better add some comment for examine_clause_args2 since there is examine_clause_args() already.
+ if (!ok || stats->compute_stats == NULL || stats->minrows <= 0)
When would stats->minrows have negative value ?
For serialize_expr_stats():
+ sd = table_open(StatisticRelationId, RowExclusiveLock);
+
+ /* lookup OID of composite type for pg_statistic */
+ typOid = get_rel_type_id(StatisticRelationId);
+ if (!OidIsValid(typOid))
+ ereport(ERROR,
+
+ /* lookup OID of composite type for pg_statistic */
+ typOid = get_rel_type_id(StatisticRelationId);
+ if (!OidIsValid(typOid))
+ ereport(ERROR,
Looks like the table_open() call can be made after the typOid check.
+ Datum values[Natts_pg_statistic];
+ bool nulls[Natts_pg_statistic];
+ HeapTuple stup;
+
+ if (!stats->stats_valid)
+ bool nulls[Natts_pg_statistic];
+ HeapTuple stup;
+
+ if (!stats->stats_valid)
It seems the local arrays can be declared after the validity check.
+ if (enabled[i] == STATS_EXT_NDISTINCT)
+ ndistinct_enabled = true;
+ if (enabled[i] == STATS_EXT_DEPENDENCIES)
+ dependencies_enabled = true;
+ if (enabled[i] == STATS_EXT_MCV)
+ ndistinct_enabled = true;
+ if (enabled[i] == STATS_EXT_DEPENDENCIES)
+ dependencies_enabled = true;
+ if (enabled[i] == STATS_EXT_MCV)
the second and third if should be preceded with 'else'
+ReleaseDummy(HeapTuple tuple)
+{
+ pfree(tuple);
+{
+ pfree(tuple);
Since ReleaseDummy() is just a single pfree call, maybe you don't need this method - call pfree in its place.
Cheers
On Sat, Jan 16, 2021 at 4:24 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
On 1/17/21 12:22 AM, Justin Pryzby wrote:
> On Sat, Jan 16, 2021 at 05:48:43PM +0100, Tomas Vondra wrote:
>> + <entry role="catalog_table_entry"><para role="column_definition">
>> + <structfield>expr</structfield> <type>text</type>
>> + </para>
>> + <para>
>> + Expression the extended statistics is defined on
>> + </para></entry>
>
> Expression the extended statistics ARE defined on
> Or maybe say "on which the extended statistics are defined"
>
I'm pretty sure "is" is correct because "expression" is singular.
>> + <para>
>> + The <command>CREATE STATISTICS</command> command has two basic forms. The
>> + simple variant allows to build statistics for a single expression, does
>
> .. ALLOWS BUILDING statistics for a single expression, AND does (or BUT does)
>
>> + Expression statistics are per-expression and are similar to creating an
>> + index on the expression, except that they avoid the overhead of the index.
>
> Maybe say "overhead of index maintenance"
>
Yeah, that sounds better.
>> + All functions and operators used in a statistics definition must be
>> + <quote>immutable</quote>, that is, their results must depend only on
>> + their arguments and never on any outside influence (such as
>> + the contents of another table or the current time). This restriction
>
> say "outside factor" or "external factor"
>
In fact, we've removed the immutability restriction, so this paragraph
should have been removed.
>> + results of those expression, and uses default estimates as illustrated
>> + by the first query. The planner also does not realize the value of the
>
> realize THAT
>
OK, changed.
>> + second column fully defines the value of the other column, because date
>> + truncated to day still identifies the month. Then expression and
>> + ndistinct statistics are built on those two columns:
>
> I got an error doing this:
>
> CREATE TABLE t AS SELECT generate_series(1,9) AS i;
> CREATE STATISTICS s ON (i+1) ,(i+1+0) FROM t;
> ANALYZE t;
> SELECT i+1 FROM t GROUP BY 1;
> ERROR: corrupt MVNDistinct entry
>
Thanks. There was a thinko in estimate_multivariate_ndistinct, resulting
in mismatching the ndistinct coefficient items. The attached patch fixes
that, but I've realized the way we pick the "best" statistics may need
some improvements (I added an XXX comment about that).
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
pgsql-hackers by date: