Thread: BUG #16521: n_distinct_inherited does not affect child partitions when set on main partition
BUG #16521: n_distinct_inherited does not affect child partitions when set on main partition
From
PG Bug reporting form
Date:
The following bug has been logged on the website: Bug reference: 16521 Logged by: klaudie willis Email address: klaudie.willis@protonmail.com PostgreSQL version: 13beta1 Operating system: Windows Description: From documentation: "while n_distinct_inherited affects the statistics gathered for the table plus its inheritance children" However, as my own experience has shown me, and as illustrated here: https://www.db-fiddle.com/f/d2ckNd9eyrGAwvdkr8H5i/0 I cannot see that "_n_distinct_inherited" affects child partitions when set on the main partition table. Am I missing something really simple here? best regards Klaudie
Re: BUG #16521: n_distinct_inherited does not affect child partitions when set on main partition
From
David Rowley
Date:
On Wed, 1 Jul 2020 at 08:43, PG Bug reporting form <noreply@postgresql.org> wrote: > Bug reference: 16521 > Logged by: klaudie willis > Email address: klaudie.willis@protonmail.com > PostgreSQL version: 13beta1 > Operating system: Windows > Description: > > From documentation: "while n_distinct_inherited affects the statistics > gathered for the table plus its inheritance children" > > However, as my own experience has shown me, and as illustrated here: > https://www.db-fiddle.com/f/d2ckNd9eyrGAwvdkr8H5i/0 I cannot see that > "_n_distinct_inherited" affects child partitions when set on the main > partition table. Am I missing something really simple here? This is not a bug. n_distinct_inherited was invented for table inheritance to allow the n_distinct value to be overwritten when the inherited tables are also being queried. Consider the difference between: create table t (a int); create table t1 (a int) inherits(t); insert into t select generate_Series(1,2000); alter table t alter column a set (n_distinct_inherited = 1000); alter table t alter column a set (n_distinct = 123); analyze t; explain select distinct a from t; QUERY PLAN ---------------------------------------------------------------------- HashAggregate (cost=98.63..108.63 rows=1000 width=4) Group Key: t.a -> Append (cost=0.00..87.25 rows=4550 width=4) -> Seq Scan on t t_1 (cost=0.00..29.00 rows=2000 width=4) -> Seq Scan on t1 t_2 (cost=0.00..35.50 rows=2550 width=4) (5 rows) explain select distinct a from only t; QUERY PLAN ----------------------------------------------------------- HashAggregate (cost=34.00..35.23 rows=123 width=4) Group Key: a -> Seq Scan on t (cost=0.00..29.00 rows=2000 width=4) (3 rows) Notice the HashAggregate's row estimate in each case. The inherited version applies when the inherited tables are also being queried. The documents [1] mention: " n_distinct affects the statistics for the table itself, while n_distinct_inherited affects the statistics gathered for the table plus its inheritance children.". I think that explains this behaviour. Perhaps you just assumed what it does based on the name of the option? Declarative partitioning reuses much of how table inheritance works, although, much of that has been getting replaced since declarative partitioning was added in PG10. Since there's not much sense in performing an FROM ONLY on a declarative partitioned table, then perhaps there's some room to have these behave differently for partitioned tables, but changing their meaning now might confuse people who are used to what they currently already do. David [1] https://www.postgresql.org/docs/12/sql-altertable.html
Re: BUG #16521: n_distinct_inherited does not affect child partitions when set on main partition
From
Klaudie Willis
Date:
Thanks for you thorough answer! "n_distinct affects the statistics for the table itself, while n_distinct_inherited affects the statistics gathered for the table plus its inheritance children." I see that this text has multiple interpretations. I read it more like: while n_distinct_inherited affects the statisticsgathered [by analyze] for the table plus [affects the statistics of] its inheritance children. The core problem for me was that I had to set n_distinct manually on my individual partitions for the planner to choose asane plan for queries on the declared partition. The data for a column was so skewed that the analyzers n_distinct wasoff by a factor of 1000X, which made all partitions do a table scan instead of index scan. In my mind, setting the declarativepartition variable n_distinct_inherited would solve that; but obviously I am wrong on that. Klaudie
Re: BUG #16521: n_distinct_inherited does not affect child partitions when set on main partition
From
David Rowley
Date:
On Thu, 2 Jul 2020 at 05:16, Klaudie Willis <Klaudie.Willis@protonmail.com> wrote: > "n_distinct affects the statistics for the table itself, while n_distinct_inherited affects the statistics > gathered for the table plus its inheritance children." > > I see that this text has multiple interpretations. I read it more like: while n_distinct_inherited affects the statisticsgathered [by analyze] for the table plus [affects the statistics of] its inheritance children. I see. I didn't read it that way before, but it's fairly obvious now that that's how you interpreted it. I think there's room here to change the documentation so that future readers are not misled. How about? diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml index cbfb4828e5..2495564b07 100644 --- a/doc/src/sgml/ref/alter_table.sgml +++ b/doc/src/sgml/ref/alter_table.sgml @@ -328,8 +328,8 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM number-of-distinct-values estimates made by subsequent <xref linkend="sql-analyze"/> operations. <literal>n_distinct</literal> affects the statistics for the table - itself, while <literal>n_distinct_inherited</literal> affects the statistics - gathered for the table plus its inheritance children. When set to a + itself, while <literal>n_distinct_inherited</literal> affects the table's + inheritance statistics. When set to a positive value, <command>ANALYZE</command> will assume that the column contains exactly the specified number of distinct nonnull values. When set to a negative value, which must be greater David