Thread: Should 'sum(mvf)' read 'sum(mcv)'...?
The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/14/row-estimation-examples.html
Description:

About halfway down this page
https://www.postgresql.org/docs/current/row-estimation-examples.html we see
the following formula for calculating selectivity:

> selectivity = (1 - sum(mvf))/(num_distinct - num_mcv)

And just below the formula we see the explanatory sentence saying:

> That is, add up all the frequencies for the MCVs and subtract them from one, ...

It appears the above sentence is referring to the "(1 - sum(mvf))" portion
of the formula, but I am not sure what "mvf" is referring to there.
Shouldn't it be "(1 - sum(mcv))" in order to match what the explanatory
sentence is saying?

Many thanks,
Eric Mutta.
Hi,

On Sun, Aug 21, 2022 at 11:02:04PM +0000, PG Doc comments form wrote:
> It appears the above sentence is referring to the "(1 - sum(mvf))" portion
> of the formula, however I am not sure what "mvf" is referring to
> there...shouldn't it be "(1 - sum(mcv))" in order to match what the
> explanatory sentence is saying?

It should be mcf, ie. Most Common Frequencies. It looks like a very old typo
that survived until now.
> On 22 Aug 2022, at 09:48, Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> It should be mcf, ie. Most Common Frequencies. It looks like a very old typo
> that survived until now.

That seems plausible, but it does seem introduced on purpose in f5678e8e075 so
CC:ing Tom for a trip down memory lane.

Looking at this I noticed that we mark up MCV and MCF as acronyms but they
aren't defined in acronyms.sgml. ISTM it's a good idea to keep a 1:1 mapping
between markup and content, so we should probably do that as per the attached?

--
Daniel Gustafsson https://vmware.com/
Attachment
Hi,

On Mon, Aug 22, 2022 at 11:13:38AM +0200, Daniel Gustafsson wrote:
> That seems plausible, but it does seem introduced on purpose in f5678e8e075 so
> CC:ing Tom for a trip down memory lane.

That was actually introduced 2 years before in 234d50812c8 by Bruce.

> Looking at this I noticed that we mark up MCV and MCF as acronyms but they
> aren't defined in acronyms.sgml. ISTM it's a good idea to keep a 1:1 mapping
> between markup and content, so we should probably do that as per the attached?

Agreed, although MCF is only used in planstats.sgml, where the acronym is
defined locally.
> On 22 Aug 2022, at 12:08, Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> That was actually introduced 2 years before in 234d50812c8 by Bruce.

Yes, I was unclear; I meant that the second use was by Tom (whom I also forgot
to CC as I said I would, so doing that now).

--
Daniel Gustafsson https://vmware.com/
Julien Rouhaud <rjuju123@gmail.com> writes:
> It should be mcf, ie. Most Common Frequencies. It looks like a very old typo
> that survived until now.

I don't think it's a typo exactly, but an odd abbreviation for "Most
common Values' Frequencies". (Summing the MCVs themselves isn't
sensible; they might not even be numeric.)

I'd vote for replacing mvf in both places with something a bit more
spelled-out, perhaps "mcv_freqs".

regards, tom lane
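For illustration, here is a minimal Python sketch of the formula under discussion, written with the proposed mcv_freqs naming. It assumes (hypothetically) that the MCV list is available as a mapping from each most common value to its frequency; the function name is likewise made up. It follows only the documented formula and ignores any refinements the real planner may apply, such as null handling. It also shows the point made above: only the frequencies are summed, because the values themselves may be strings.

```python
from typing import Hashable, Mapping


def non_mcv_eq_selectivity(mcv_list: Mapping[Hashable, float],
                           num_distinct: float) -> float:
    """Estimate selectivity of `col = const` for a const NOT in the MCV list.

    Hypothetical helper: mcv_list maps each most common value to its
    frequency (the fraction of rows holding that value). Only the
    frequencies are summed; the values may be text, dates, etc., so
    summing the MCVs themselves would be meaningless.
    """
    mcv_freqs = list(mcv_list.values())
    num_mcv = len(mcv_freqs)
    if num_distinct <= num_mcv:
        raise ValueError("num_distinct must be larger than the number of MCVs")
    # Fraction of rows not accounted for by any MCV ...
    remaining = 1.0 - sum(mcv_freqs)
    # ... assumed to be spread evenly over the remaining distinct values.
    return remaining / (num_distinct - num_mcv)


if __name__ == "__main__":
    # Made-up statistics for a text column: the MCVs are strings.
    stats = {"red": 0.5, "green": 0.2, "blue": 0.1}
    print(non_mcv_eq_selectivity(stats, num_distinct=7))
```

With these made-up numbers the estimate is (1 - 0.8) / (7 - 3), i.e. roughly 0.05 up to floating-point rounding.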
> On 22 Aug 2022, at 14:58, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> I don't think it's a typo exactly, but an odd abbreviation for "Most
> common Values' Frequencies". (Summing the MCVs themselves isn't
> sensible; they might not even be numeric.)
>
> I'd vote for replacing mvf in both places with something a bit more
> spelled-out, perhaps "mcv_freqs".

I was inclined to spell it out as mcv_frequencies but we use xxx_freqs
elsewhere on the same page so keeping it consistent seems better. The attached
does this as well as adding mcf/mcv as acronyms as previously mentioned (since
they are both tagged as <acronym>).

--
Daniel Gustafsson
Attachment
Daniel Gustafsson <daniel@yesql.se> writes:
> I was inclined to spell it out as mcv_frequencies but we use xxx_freqs
> elsewhere on the same page so keeping it consistent seems better. The attached
> does this as well as adding mcf/mcv as acronyms as previously mentioned (since
> they are both tagged as <acronym>).

mcv_freqs looks good. I'd write the glossary entries as singular
(Most Common Frequency, Most Common Value) since our typical usage
is to pluralize them at the point of use ("MCVs"). Also, just
expanding the acronym doesn't seem that helpful. Maybe more like

    MCF
        Most Common Frequency, that is the frequency associated
        with some Most Common Value

    MCV
        Most Common Value, one of the values appearing most often
        within a particular table column

regards, tom lane
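To make the MCV/MCF distinction in those glossary entries concrete, here is a small Python sketch: given a list of column values, it picks up to k most common values (the MCVs) and reports the fraction of rows each one occupies (the associated MCFs). The function name and the parameter k are hypothetical, and this is not how ANALYZE builds its statistics; it does not attempt to reproduce the row sampling or the rules ANALYZE uses to decide which values qualify for the list.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def mcvs_and_freqs(column: Sequence[Hashable],
                   k: int) -> Tuple[List[Hashable], List[float]]:
    """Return up to k most common values (MCVs) and their frequencies (MCFs).

    Hypothetical helper. Each frequency is the fraction of all rows that
    hold the matching value, so mcv_freqs[i] is the MCF for mcvs[i].
    """
    n = len(column)
    if n == 0:
        return [], []
    # most_common(k) yields (value, count) pairs, highest count first.
    top = Counter(column).most_common(k)
    mcvs = [value for value, _ in top]
    mcv_freqs = [count / n for _, count in top]
    return mcvs, mcv_freqs


if __name__ == "__main__":
    colors = ["red", "red", "red", "green", "green", "blue"]
    print(mcvs_and_freqs(colors, k=2))
```

Here the MCVs are "red" and "green", and their MCFs are 0.5 and roughly 0.333 (3 and 2 rows out of 6).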
> On 12 Apr 2023, at 14:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> mcv_freqs looks good. I'd write the glossary entries as singular
> (Most Common Frequency, Most Common Value) since our typical usage
> is to pluralize them at the point of use ("MCVs"). Also, just
> expanding the acronym doesn't seem that helpful. Maybe more like

Pushed with your suggested changes, thanks!

--
Daniel Gustafsson