Re: Improve docs for n_distinct_inherited - Mailing list pgsql-hackers
From | David Rowley |
---|---|
Subject | Re: Improve docs for n_distinct_inherited |
Date | |
Msg-id | CAApHDvrRqVB6OesPq9Z0S_872u1+qq+FLyEFU9FVdiPUET66YA@mail.gmail.com Whole thread Raw |
In response to | Improve docs for n_distinct_inherited (David Rowley <dgrowleyml@gmail.com>) |
Responses |
Re: Improve docs for n_distinct_inherited
|
List | pgsql-hackers |
On Thu, 8 May 2025 at 15:23, David G. Johnston <david.g.johnston@gmail.com> wrote: > Not liking the proposal, not sure it is even correct. Somehow "children of inheritance parent tables" are omitted. I don't see the quoted text anywhere in this area, so I'm not sure I follow what you mean with the omission. > In both cases the only statistics affected are those of the table upon which the option is being read. That can go implied. Leaving "affects the statistics for the table itself, while" is causing most of the harm. So the text, when it says "affects the statistics for the table itself" is talking about stainherit==false statistics and the "the statistics gathered which include the descendant tables of partitioned tables and inheritance parent tables" which I added is trying to describe the stainherits==true statistics. You may have misinterpreted that to think it was talking which table the option was being applied to rather than where the sample rows were being gathered from. > In terms of wording - n_distinct must be used for all tables except partitioned tables and inheritance parent tables. Those two acquire their override value from n_distinct_inherited. So something like: > > diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml > index 24316bb816a..657e4c835ac 100644 > --- a/doc/src/sgml/ref/alter_table.sgml > +++ b/doc/src/sgml/ref/alter_table.sgml > @@ -339,9 +339,13 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM > <literal>n_distinct_inherited</literal>, which override the > number-of-distinct-values estimates made by subsequent > <link linkend="sql-analyze"><command>ANALYZE</command></link> > - operations. <literal>n_distinct</literal> affects the statistics for the table > - itself, while <literal>n_distinct_inherited</literal> affects the statistics > - gathered for the table plus its inheritance children. When set to a > + operations. > + For partitioned tables and inheritance parents <command>ANALYZE</command> > + consults the <literal>n_distinct_inherited</literal> option and, if applicable, > + includes all descendants when calculating the table size for use in the > + number-of-distinct-values estimate. For all other tables, <literal>n_distinct</literal> > + is used and the table size estimate is based on the table itself. > + When set to a > positive value, <command>ANALYZE</command> will assume that the column contains > exactly the specified number of distinct nonnull values. When set to a > negative value, which must be greater > > It has a bit of forward-referencing to the following paragraph, but nothing surprising or novel. The existing wordingis already doing the same by alluding to descendants/children - but its just one data point that is involved so letscall it out. I can't follow your proposed text. It seems to indicate that ANALYZE uses n_distinct_inherits to calculate the table size. ANALYZE does nothing of the sort. I think you might have been confused by the following text which talks about how *the planner* proportions *negative* stadistinct values over the "table size", which really should say estimated row count and not table size. table size has nothing to do with this, but that seems like another issue. As for my proposed patch and the text it changes, what this text is meant to be talking about is which of ANALYZE's sample sets the stadistinct estimate is overridden for. The two options for sample sets are 1) stainherits==true and 2) stainherits==false. With inheritance parent tables, ANALYZE gathers both sets and records both sets in pg_statistic. Which one the planner uses depends on what it's doing and whether you specify ONLY or not in your query. For partitioned tables, since they're empty, we don't bother with stainherits==false stats. I've attached another patch which adjusts the text to detail that there are two sample row sets gathered for some cases, and then mentions the two options that can be used to selectively overwrite each sample's n_distinct estimate. I'd rather find a shorter set of words to better explain this than what I'm proposing in v2, but the current text has presumably confused one person (in the bug report) and the v1 patch still seems to be a source of confusion. David
Attachment
pgsql-hackers by date: