Thread: BUG #16521: n_distinct_inherited does not affect child partitions when set on main partition

The following bug has been logged on the website:

Bug reference:      16521
Logged by:          klaudie willis
Email address:      klaudie.willis@protonmail.com
PostgreSQL version: 13beta1
Operating system:   Windows
Description:

From documentation: "while n_distinct_inherited affects the statistics
gathered for the table plus its inheritance children"

However, as my own experience has shown me, and as illustrated here:
https://www.db-fiddle.com/f/d2ckNd9eyrGAwvdkr8H5i/0  I cannot see that
"_n_distinct_inherited" affects child partitions when set on the main
partition table.  Am I missing something really simple here?

best regards
Klaudie


On Wed, 1 Jul 2020 at 08:43, PG Bug reporting form
<noreply@postgresql.org> wrote:
> Bug reference:      16521
> Logged by:          klaudie willis
> Email address:      klaudie.willis@protonmail.com
> PostgreSQL version: 13beta1
> Operating system:   Windows
> Description:
>
> From documentation: "while n_distinct_inherited affects the statistics
> gathered for the table plus its inheritance children"
>
> However, as my own experience has shown me, and as illustrated here:
> https://www.db-fiddle.com/f/d2ckNd9eyrGAwvdkr8H5i/0  I cannot see that
> "_n_distinct_inherited" affects child partitions when set on the main
> partition table.  Am I missing something really simple here?

This is not a bug. n_distinct_inherited was invented for table
inheritance to allow the n_distinct value to be overwritten when the
inherited tables are also being queried.

Consider the difference between:

create table t (a int);
create table t1 (a int) inherits(t);
insert into t select generate_Series(1,2000);
alter table t alter column a set (n_distinct_inherited = 1000);
alter table t alter column a set (n_distinct = 123);
analyze t;
explain select distinct a from t;
                              QUERY PLAN
----------------------------------------------------------------------
 HashAggregate  (cost=98.63..108.63 rows=1000 width=4)
   Group Key: t.a
   ->  Append  (cost=0.00..87.25 rows=4550 width=4)
         ->  Seq Scan on t t_1  (cost=0.00..29.00 rows=2000 width=4)
         ->  Seq Scan on t1 t_2  (cost=0.00..35.50 rows=2550 width=4)
(5 rows)

explain select distinct a from only t;
                        QUERY PLAN
-----------------------------------------------------------
 HashAggregate  (cost=34.00..35.23 rows=123 width=4)
   Group Key: a
   ->  Seq Scan on t  (cost=0.00..29.00 rows=2000 width=4)
(3 rows)

Notice the HashAggregate's row estimate in each case. The inherited
version applies when the inherited tables are also being queried.

The documents [1] mention: " n_distinct affects the statistics for the
table itself, while n_distinct_inherited affects the statistics
gathered for the table plus its inheritance children.". I think that
explains this behaviour.  Perhaps you just assumed what it does based
on the name of the option?

Declarative partitioning reuses much of how table inheritance works,
although, much of that has been getting replaced since declarative
partitioning was added in PG10.  Since there's not much sense in
performing an FROM ONLY on a declarative partitioned table, then
perhaps there's some room to have these behave differently for
partitioned tables, but changing their meaning now might confuse
people who are used to what they currently already do.

David

[1] https://www.postgresql.org/docs/12/sql-altertable.html



Thanks for you thorough answer!

 "n_distinct affects the statistics for the table itself, while n_distinct_inherited affects the statistics
 gathered for the table plus its inheritance children."

I see that this text has multiple interpretations. I read it more like: while n_distinct_inherited affects the
statisticsgathered [by analyze] for the table plus [affects the statistics of] its inheritance children. 

The core problem for me was that I had to set n_distinct manually on my individual partitions for the planner to choose
asane plan for queries on the declared partition.  The data for a column was so skewed that the analyzers n_distinct
wasoff by a factor of 1000X, which made all partitions do a table scan instead of index scan.  In my mind, setting the
declarativepartition variable n_distinct_inherited would solve that; but obviously I am wrong on that. 

Klaudie




On Thu, 2 Jul 2020 at 05:16, Klaudie Willis
<Klaudie.Willis@protonmail.com> wrote:
>  "n_distinct affects the statistics for the table itself, while n_distinct_inherited affects the statistics
>  gathered for the table plus its inheritance children."
>
> I see that this text has multiple interpretations. I read it more like: while n_distinct_inherited affects the
statisticsgathered [by analyze] for the table plus [affects the statistics of] its inheritance children.
 

I see. I didn't read it that way before, but it's fairly obvious now
that that's how you interpreted it.  I think there's room here to
change the documentation so that future readers are not misled.

How about?

diff --git a/doc/src/sgml/ref/alter_table.sgml
b/doc/src/sgml/ref/alter_table.sgml
index cbfb4828e5..2495564b07 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -328,8 +328,8 @@ WITH ( MODULUS <replaceable
class="parameter">numeric_literal</replaceable>, REM
       number-of-distinct-values estimates made by subsequent
       <xref linkend="sql-analyze"/>
       operations.  <literal>n_distinct</literal> affects the
statistics for the table
-      itself, while <literal>n_distinct_inherited</literal> affects
the statistics
-      gathered for the table plus its inheritance children.  When set to a
+      itself, while <literal>n_distinct_inherited</literal> affects the table's
+      inheritance statistics.  When set to a
       positive value, <command>ANALYZE</command> will assume that the
column contains
       exactly the specified number of distinct nonnull values.  When set to a
       negative value, which must be greater

David