Re: incorrect information in documentation - Mailing list pgsql-docs

From David G. Johnston
Subject Re: incorrect information in documentation
Date
Msg-id CAKFQuwaV87e4sdFdZab4+zcUdza+EpveQi_98=f9wJ1+QLUALw@mail.gmail.com
In response to Re: incorrect information in documentation  (Bruce Momjian <bruce@momjian.us>)
Responses Re: incorrect information in documentation  ("David G. Johnston" <david.g.johnston@gmail.com>)
List pgsql-docs
On Mon, Aug 9, 2021 at 11:05 AM Bruce Momjian <bruce@momjian.us> wrote:

>         selectivity = (1 - null_frac1) * (1 - null_frac2) * min(1/num_distinct1, 1/num_distinct2)
>                     = (1 - 0) * (1 - 0) / max(10000, 10000)
>                     = 0.0001

> Nice, can you provide a patch please?


Change the line:

selectivity = (1 - null_frac1) * (1 - null_frac2) * min(1/num_distinct1, 1/num_distinct2)

to be:

selectivity = (1 - null_frac1) * (1 - null_frac2) / max(num_distinct1, num_distinct2)

The wording already talks about "divide by max".
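
As a quick check that the change is purely notational, here is a minimal Python sketch comparing the two forms with the values from the quoted example. The function names are mine for illustration, not anything from the PostgreSQL source:

```python
import math


def selectivity_min_form(null_frac1, null_frac2, num_distinct1, num_distinct2):
    # Current wording in the docs: multiply by the smaller reciprocal.
    return (1 - null_frac1) * (1 - null_frac2) * min(1 / num_distinct1, 1 / num_distinct2)


def selectivity_max_form(null_frac1, null_frac2, num_distinct1, num_distinct2):
    # Proposed wording: divide by the larger distinct count.
    return (1 - null_frac1) * (1 - null_frac2) / max(num_distinct1, num_distinct2)


# Values from the quoted example: no nulls, 10000 distinct values on each side.
old = selectivity_min_form(0, 0, 10000, 10000)
new = selectivity_max_form(0, 0, 10000, 10000)
print(old, new, math.isclose(old, new))
```

For positive distinct counts, min(1/a, 1/b) and 1/max(a, b) are the same quantity, so the patch only makes the formula line agree with the worked arithmetic and the prose.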

Though:

"so we use an algorithm that relies only on the number of distinct values for both relations together with their null fractions:"

maybe add a parenthetical note:

"so we use an algorithm that relies only on the number of distinct values (the row count estimate for the whole table, not the -1 in the column statistics) for both relations together with their null fractions:"
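
To spell out what the parenthetical is getting at: in pg_stats, a negative n_distinct is the negative of the number of distinct values divided by the row count, so -1 means every row is distinct. A rough sketch of turning that into an absolute count follows. The helper name and the idea of passing the row estimate in as an argument are my assumptions for illustration, not the planner's actual code:

```python
def distinct_count(n_distinct, row_estimate):
    """Convert a pg_stats-style n_distinct into an absolute number of distinct values.

    Hypothetical helper: row_estimate stands in for the table's estimated row count.
    """
    if n_distinct < 0:
        # Negative values are a fraction of the row count, stored negated.
        return -n_distinct * row_estimate
    # Non-negative values are already an absolute count.
    return n_distinct


# A unique column (-1) in a table estimated at 10000 rows.
print(distinct_count(-1, 10000))
```

With -1 and a 10000-row table this yields the 10000 that appears in the max(10000, 10000) step of the example.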

Just note that I haven't tried to absorb that whole page, let alone the implementation, and am not all that familiar with this part of PostgreSQL. It seems right, though, in isolation.

David J.
