Re: Improve docs for n_distinct_inherited - Mailing list pgsql-hackers

From Chao Li
Subject Re: Improve docs for n_distinct_inherited
Date
Msg-id E93C6658-A4E8-420B-8AFD-7E4EFFF81EDB@gmail.com
Whole thread Raw
In response to Re: Improve docs for n_distinct_inherited  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: Improve docs for n_distinct_inherited
List pgsql-hackers
Hi David,

I think your revision is good and accurate.

On Oct 13, 2025, at 07:42, David Rowley <dgrowleyml@gmail.com> wrote:

Just picking this one up again. I forgot to come back to this after PGConf.dev.

I came up with:

     Ordinarily <literal>n_distinct</literal> is used.
     <literal>n_distinct_inherited</literal> exists to allow the distinct
     estimate to be overwritten for the statistics gathered for inheritance
     parent tables and for partitioned tables.

This clarifies that n_distinct_inherited applies to both inheritance parents and partitioned tabled, that’s accurate and better than the original wording.


I also fixed what I thought was some misleading text about ANALYZE
using this value to calculate things. That's not true. It's the query
planner that uses this value. ANALYZE just stores whatever this is set
to into pg_statistic. I also adjusted the text that was talking about
"the size of the table", which, as I mentioned earlier isn't correct.
It's all related to the estimated number rows in the table, per
"ntuples = vardata->rel->tuples;" in get_variable_numdistinct().


When I read the diff, I thought “table size” is usually used in PG docs, so maybe “estimated table size”. But as you explicitly explained why you made the change, I am fine with your change.

Also fixed a typo; "twice on the average" shouldn't contain "the".


Correct grammer fix.

I wonder if ", since the multiplication by the number of rows in the
table is not performed until query planning time" should be deleted
since I modified the text earlier to talk about "the query planner”.


Yeah, with your rewrite, that clause now feels a little redundant. I think it can be removed entirely.

The other thing that doesn’t belong to your change but as you are touching here:

“When set to a negative value, which must be greater than or equal to -1"

When I first time read the doc, I was confused. Because no easier sentence indicated “n_distinct” is of float type. I thought “greater than” was a typo. When I read through, the later example (0.5) resolved my confusion. To avoid the same confusion to other readers, maybe change to “when set to a negative value between -1 and 0 (inclusive of -1)” or “when set to a negative value, which must be in the range -1<= value < 0”. 

But as I said, this doesn’t belong to your change. If you don’t want to enhance this place, or you don’t consider this is a problem, I am absolutely fine.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/




pgsql-hackers by date:

Previous
From: jian he
Date:
Subject: Re: finish TODOs in to_json_is_immutable, to_jsonb_is_immutable also add tests on it
Next
From: Tom Lane
Date:
Subject: Re: Failure building libpq v18.0 on old aarch64