Re: Improve docs for n_distinct_inherited - Mailing list pgsql-hackers

From David G. Johnston
Subject Re: Improve docs for n_distinct_inherited
Date
Msg-id CAKFQuwZ60bW_1qN=H0h4r_qN2eNNekKtEqxssRV=NqLNV_PYUg@mail.gmail.com
Whole thread Raw
In response to Re: Improve docs for n_distinct_inherited  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: Improve docs for n_distinct_inherited
List pgsql-hackers
On Sun, Oct 12, 2025 at 7:42 PM David Rowley <dgrowleyml@gmail.com> wrote:
Just picking this one up again. I forgot to come back to this after PGConf.dev.

On Fri, 9 May 2025 at 02:50, David G. Johnston
<david.g.johnston@gmail.com> wrote:
> I was missing this key piece of knowledge which invalidated my entire attempt.
>
> Here's an attempt at shortening this now that I understand the mechanics better.
>
>       Separate options exist because an inheritance parent table has two
>       different sets of statistics: one considering only itself and one which
>       also includes its children (<literal>n_distinct_inherited</literal>).
>       Partitioned tables, which only have rows in the children, likewise uses
>       the inherited option while everyone else uses <literal>n_distinct</literal>.

I wasn't quite happy with that as the text indicates that
n_distinct_inherited is the statistics.  But, it's not, it's just the
option that allows some modification of the gathered statistics.

I came up with:

      Ordinarily <literal>n_distinct</literal> is used.
      <literal>n_distinct_inherited</literal> exists to allow the distinct
      estimate to be overwritten for the statistics gathered for inheritance
      parent tables and for partitioned tables.

How about:

"n_disinct is used for normal tables while n_distinct_inherited is used for partitioned tables.  Both are usable (selected via the ONLY modifier) for an inheritance parent table since it has both storage and children."

The use of both "Ordinarily" and "overwritten" is bothering me here.  And it implies that n_distinct doesn't work for inheritance parent tables or, conversely, that n_distinct does work for partitioned tables.


I also fixed what I thought was some misleading text about ANALYZE
using this value to calculate things.

values in the column is linear with the estimated number
>      of rows in the table; the exact count is to be computed by multiplying the estimated
>      rows in the table by the absolute value of the given number.

"...is proportional to the estimated number of rows in the table at planning time."

(The final "at planning time" substitutes for the sentence you pondered removing.)

(The rest, including the examples, seem a bit self-explanatory given the definition, though I do get reader inexperience with the terminology.  But proportional implies a linear relationship, and the positive/negative bifurcation seems straight-forward here.)

I'm thinking everything else below is better incorporated into 14.2 which should be linking back to this section.  That way the crux of the usage is defined in syntax while the details about setting a specific value are located in the section covering the overall topic.

 the exact count is to be computed by multiplying the estimated
+      rows in the table by the absolute value of the given number.  For example,
       a value of -1 implies that all values in the column are distinct, while
-      a value of -0.5 implies that each value appears twice on the average.
+      a value of -0.5 implies that each value appears twice on average.
       This can be useful when the size of the table changes over time, since
       the multiplication by the number of rows in the table is not performed
       until query planning time.  

(Leave: Specify a value of 0 to revert to estimating...)

That said, this rework would be OK as-is.

Also, looking at stadistinct, the multiplier stored there accounts for the presence of null.  This attribute-option does not.  Is that difference worth noting?


David J.

pgsql-hackers by date:

Previous
From: Peter Smith
Date:
Subject: Re: [PROPOSAL] Termination of Background Workers for ALTER/DROP DATABASE
Next
From: jian he
Date:
Subject: Re: finish TODOs in to_json_is_immutable, to_jsonb_is_immutable also add tests on it