On Thu Jun 9, 2022 at 11:57 AM EDT, David G. Johnston wrote:
> Reposting this to its own thread.
>
> https://www.postgresql.org/message-id/flat/CAKFQuwby1aMsJDMeibaBaohgoaZhivAo4WcqHC1%3D9-GDZ3TSng%40mail.gmail.com
>
> doc: make unique non-null join selectivity example match the prose
>
> The description of the computation for the unique, non-null,
> join selectivity describes a division by the maximum of two values,
> while the example shows a multiplication by their reciprocal. While
> equivalent the max phrasing is easier to understand; which seems
> more important here than precisely adhering to the formula used
> in the code (for which either variant is still an approximation).
>
> While both num_distinct and num_rows are equal for a unique column
> both the concept and formula use row count (10,000) and the
> field num_distinct has already been set to mean the specific value
> present in the pg_stats table (i.e., -1), so use num_rows here.

Pointing out that n_distinct = -1 is helpful, but changing "because" to
"and" suggests that the missing MCV info is coincidental or a side
effect. Is there any case in which the stronger "because" wouldn't be
appropriate?

The second parenthetical (num_rows, not shown, but "tenk") took me a
minute to get, since the row counts are only apparent on looking somewhat
closely at the other examples in the chapter. num_rows also isn't a
column in pg_stats, which the "not shown" could be taken to imply; it's
sourced from somewhere else and only given as num_rows in this example.
How's '(as num_rowsN, 10,000 for both "tenk" example tables)'?

By "this value does get scaled in the non-unique case" do you mean it
relies on n_distinct as in the uncorrected algorithm listing? If so, I
think it'd help to specify that.

You didn't take this line on, but "This is, subtract the null
fraction..." omits the step of multiplying the complements of the null
fractions together before dividing.
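
To spell out the steps I mean, here is a rough sketch in Python. The
function and argument names are mine, purely for illustration, and the
zero null fractions are an assumption on my part; only the 10,000 row
count for both "tenk" tables comes from the example.

```python
def unique_join_selectivity(null_frac1, null_frac2, num_rows1, num_rows2):
    """Selectivity for an equijoin on unique columns, as described in prose.

    Hypothetical helper for illustration only, not the planner's code.
    """
    # Step 1: take the complement of each null fraction.
    non_null1 = 1.0 - null_frac1
    non_null2 = 1.0 - null_frac2
    # Step 2: multiply the complements together (the step the prose omits).
    non_null = non_null1 * non_null2
    # Step 3: divide by the larger of the two row counts.
    return non_null / max(num_rows1, num_rows2)


# Assumed inputs: no nulls on either side, 10,000 rows in each table.
sel = unique_join_selectivity(0.0, 0.0, 10_000, 10_000)
print(sel)
```

With those inputs the result is 1/10,000, so whatever wording we settle
on should make step 2 explicit.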

Should n_distinct and num_rows be marked up with <structname> in the
text?