Thread: Should 'sum(mvf)' read 'sum(mcv)'...?

Should 'sum(mvf)' read 'sum(mcv)'...?

From
PG Doc comments form
Date:
The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/14/row-estimation-examples.html
Description:

About halfway down this page
https://www.postgresql.org/docs/current/row-estimation-examples.html we see
the following formula for calculating selectivity:

    > selectivity = (1 - sum(mvf))/(num_distinct - num_mcv)

And just below the formula we see the explanatory sentence saying:

   > That is, add up all the frequencies for the MCVs and subtract them from
   > one, ...

It appears the above sentence is referring to the "(1 - sum(mvf))" portion
of the formula, however I am not sure what "mvf" is referring to
there...shouldn't it be "(1 - sum(mcv))" in order to match what the
explanatory sentence is saying?

Many thanks,
Eric Mutta.

Re: Should 'sum(mvf)' read 'sum(mcv)'...?

From
Julien Rouhaud
Date:
Hi,

On Sun, Aug 21, 2022 at 11:02:04PM +0000, PG Doc comments form wrote:
> The following documentation comment has been logged on the website:
> 
> Page: https://www.postgresql.org/docs/14/row-estimation-examples.html
> Description:
> 
> About halfway down this page
> https://www.postgresql.org/docs/current/row-estimation-examples.html we see
> the following formula for calculating selectivity:
> 
>     > selectivity = (1 - sum(mvf))/(num_distinct - num_mcv)
> 
> And just below the formula we see the explanatory sentence saying:
> 
>    > That is, add up all the frequencies for the MCVs and subtract them from
> one, ...
> 
> It appears the above sentence is referring to the "(1 - sum(mvf))" portion
> of the formula, however I am not sure what "mvf" is referring to
> there...shouldn't it be "(1 - sum(mcv))" in order to match what the
> explanatory sentence is saying?

It should be mcf, i.e. Most Common Frequencies.  It looks like a very old typo
that survived until now.



Re: Should 'sum(mvf)' read 'sum(mcv)'...?

From
Daniel Gustafsson
Date:
> On 22 Aug 2022, at 09:48, Julien Rouhaud <rjuju123@gmail.com> wrote:
> On Sun, Aug 21, 2022 at 11:02:04PM +0000, PG Doc comments form wrote:

>> It appears the above sentence is referring to the "(1 - sum(mvf))" portion
>> of the formula, however I am not sure what "mvf" is referring to
>> there...shouldn't it be "(1 - sum(mcv))" in order to match what the
>> explanatory sentence is saying?
>
> It should be mcf, ie. Most Common Frequencies. It looks like a very old typo
> that survived until now.

That seems plausible, but it does seem introduced on purpose in f5678e8e075 so
CC:ing Tom for a trip down memory lane.

Looking at this I noticed that we mark up MCV and MCF as acronyms but they
aren't defined in acronyms.sgml.  ISTM it's a good idea to keep a 1:1 mapping
between markup and content, so we should probably do that as per the attached?

--
Daniel Gustafsson        https://vmware.com/



Re: Should 'sum(mvf)' read 'sum(mcv)'...?

From
Julien Rouhaud
Date:
Hi,

On Mon, Aug 22, 2022 at 11:13:38AM +0200, Daniel Gustafsson wrote:
> > On 22 Aug 2022, at 09:48, Julien Rouhaud <rjuju123@gmail.com> wrote:
> > On Sun, Aug 21, 2022 at 11:02:04PM +0000, PG Doc comments form wrote:
> 
> >> It appears the above sentence is referring to the "(1 - sum(mvf))" portion
> >> of the formula, however I am not sure what "mvf" is referring to
> >> there...shouldn't it be "(1 - sum(mcv))" in order to match what the
> >> explanatory sentence is saying?
> > 
> > It should be mcf, ie. Most Common Frequencies. It looks like a very old typo
> > that survived until now.
> 
> That seems plausible, but it does seem introduced on purpose in f5678e8e075 so
> CC:ing Tom for a trip down memory lane.

That was actually introduced 2 years before in 234d50812c8 by Bruce.

> Looking at this I noticed that we mark up MCV and MCF as acronyms but they
> aren't defined in acronyms.sgml.  ISTM it's a good idea to keep a 1:1 mapping
> between markup and content, so we should probably do that as per the attached?

Agreed, although MCF is only used in planstats.sgml and the acronym is defined
locally.



Re: Should 'sum(mvf)' read 'sum(mcv)'...?

From
Daniel Gustafsson
Date:
> On 22 Aug 2022, at 12:08, Julien Rouhaud <rjuju123@gmail.com> wrote:

> That was actually introduced 2 years before in 234d50812c8 by Bruce.

Yes, I was unclear; I meant that the second use was by Tom (whom I also forgot
to CC as I said I would, so doing that now).

--
Daniel Gustafsson        https://vmware.com/




Re: Should 'sum(mvf)' read 'sum(mcv)'...?

From
Tom Lane
Date:
Julien Rouhaud <rjuju123@gmail.com> writes:
> On Sun, Aug 21, 2022 at 11:02:04PM +0000, PG Doc comments form wrote:
>> It appears the above sentence is referring to the "(1 - sum(mvf))" portion
>> of the formula, however I am not sure what "mvf" is referring to
>> there...shouldn't it be "(1 - sum(mcv))" in order to match what the
>> explanatory sentence is saying?

> It should be mcf, ie. Most Common Frequencies.  It looks like a very old typo
> that survived until now.

I don't think it's a typo exactly, but an odd abbreviation for "Most
common Values' Frequencies".  (Summing the MCVs themselves isn't
sensible; they might not even be numeric.)

I'd vote for replacing mvf in both places with something a bit more
spelled-out, perhaps "mcv_freqs".
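
To make the distinction concrete, here is a minimal Python sketch of the
formula using the proposed "mcv_freqs" spelling.  The function name and the
sample statistics are hypothetical, chosen only for illustration; they are not
taken from the documentation page or from the planner source.

```python
from typing import Sequence


def non_mcv_selectivity(mcv_freqs: Sequence[float], num_distinct: int) -> float:
    """Sketch of: selectivity = (1 - sum(mcv_freqs)) / (num_distinct - num_mcv).

    Estimates the selectivity of "col = const" when const is not one of the
    most common values: the non-MCV share of rows, spread evenly over the
    remaining distinct values.
    """
    num_mcv = len(mcv_freqs)
    if num_distinct <= num_mcv:
        raise ValueError("num_distinct must exceed the number of MCVs")
    return (1.0 - sum(mcv_freqs)) / (num_distinct - num_mcv)


# Hypothetical column statistics (illustrative values only).
most_common_vals = ["EJAAAA", "BBAAAA", "CRAAAA"]   # the MCVs themselves
most_common_freqs = [0.004, 0.003, 0.003]           # their frequencies
num_distinct = 676

# The frequencies are what gets summed; sum(most_common_vals) would raise
# TypeError here because the values are strings, not numbers.
selectivity = non_mcv_selectivity(most_common_freqs, num_distinct)
print(f"{selectivity:.7f}")
```

Passing the frequency list rather than the value list is the whole point: the
MCVs can be of any type, while their frequencies are always fractions of the
table's rows.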

            regards, tom lane



Re: Should 'sum(mvf)' read 'sum(mcv)'...?

From
Daniel Gustafsson
Date:
> On 22 Aug 2022, at 14:58, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Julien Rouhaud <rjuju123@gmail.com> writes:
>> On Sun, Aug 21, 2022 at 11:02:04PM +0000, PG Doc comments form wrote:
>>> It appears the above sentence is referring to the "(1 - sum(mvf))" portion
>>> of the formula, however I am not sure what "mvf" is referring to
>>> there...shouldn't it be "(1 - sum(mcv))" in order to match what the
>>> explanatory sentence is saying?
>
>> It should be mcf, ie. Most Common Frequencies.  It looks like a very old typo
>> that survived until now.
>
> I don't think it's a typo exactly, but an odd abbreviation for "Most
> common Values' Frequencies".  (Summing the MCVs themselves isn't
> sensible; they might not even be numeric.)
>
> I'd vote for replacing mvf in both places with something a bit more
> spelled-out, perhaps "mcv_freqs".

I was inclined to spell it out as mcv_frequencies but we use xxx_freqs
elsewhere on the same page so keeping it consistent seems better.  The attached
does this as well as adding mcf/mcv as acronyms as previously mentioned (since
they are both tagged as <acronym>).

--
Daniel Gustafsson



Re: Should 'sum(mvf)' read 'sum(mcv)'...?

From
Tom Lane
Date:
Daniel Gustafsson <daniel@yesql.se> writes:
> I was inclined to spell it out as mcv_frequencies but we use xxx_freqs
> elsewhere on the same page so keeping it consistent seems better.  The attached
> does this as well as adding mcf/mcv as acronyms as previously mentioned (since
> they are both tagged as <acronym>).

mcv_freqs looks good.  I'd write the glossary entries as singular
(Most Common Frequency, Most Common Value) since our typical usage
is to pluralize them at the point of use ("MCVs").  Also, just
expanding the acronym doesn't seem that helpful.  Maybe more like

    MCF

    Most Common Frequency, that is the frequency associated
    with some Most Common Value

    MCV

    Most Common Value, one of the values appearing most often
    within a particular table column

            regards, tom lane



Re: Should 'sum(mvf)' read 'sum(mcv)'...?

From
Daniel Gustafsson
Date:
> On 12 Apr 2023, at 14:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Daniel Gustafsson <daniel@yesql.se> writes:
>> I was inclined to spell it out as mcv_frequencies but we use xxx_freqs
>> elsewhere on the same page so keeping it consistent seems better.  The attached
>> does this as well as adding mcf/mcv as acronyms as previously mentioned (since
>> they are both tagged as <acronym>).
>
> mcv_freqs looks good.  I'd write the glossary entries as singular
> (Most Common Frequency, Most Common Value) since our typical usage
> is to pluralize them at the point of use ("MCVs").  Also, just
> expanding the acronym doesn't seem that helpful.  Maybe more like

Pushed with your suggested changes, thanks!

--
Daniel Gustafsson