Re: [HACKERS] PATCH: multivariate histograms and MCV lists - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Date
Msg-id 66ba9c45-3b07-e5b6-fd11-15e75f6ee8e1@2ndquadrant.com
Whole thread Raw
In response to Re: [HACKERS] PATCH: multivariate histograms and MCV lists  (Dean Rasheed <dean.a.rasheed@gmail.com>)
Responses Re: [HACKERS] PATCH: multivariate histograms and MCV lists  (Dean Rasheed <dean.a.rasheed@gmail.com>)
List pgsql-hackers

On 03/26/2018 09:01 PM, Dean Rasheed wrote:
> On 18 March 2018 at 23:57, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>> Attached is an updated version of the patch series, addressing issues
>> pointed out by Alvaro.
> 
> I've just been reading the new code in
> statext_clauselist_selectivity() and mcv_clauselist_selectivity(), and
> I'm having a hard time convincing myself that it's correct.
> 
> This code in statext_clauselist_selectivity() looks a bit odd:
> 
>     /*
>      * Evaluate the MCV selectivity. See if we got a full match and the
>      * minimal selectivity.
>      */
>     if (stat->kind == STATS_EXT_MCV)
>         s1 = mcv_clauselist_selectivity(root, stat, clauses, varRelid,
>                                         jointype, sjinfo, rel,
>                                         &fullmatch, &mcv_lowsel);
> 
>     /*
>      * If we got a full equality match on the MCV list, we're done (and the
>      * estimate is likely pretty good).
>      */
>     if (fullmatch && (s1 > 0.0))
>         return s1;
> 
>     /*
>      * If it's a full match (equalities on all columns) but we haven't found
>      * it in the MCV, then we limit the selectivity by frequency of the last
>      * MCV item. Otherwise reset it to 1.0.
>      */
>     if (!fullmatch)
>         mcv_lowsel = 1.0;
> 
>     return Min(s1, mcv_lowsel);
> 
> So if fullmatch is true and s1 is greater than 0, it will return s1. 
> If fullmatch is true and s1 equals 0, it will return Min(s1, 
> mcv_lowsel) which will also be s1. If fullmatch is false, mcv_lowsel 
> will be set to 1 and it will return Min(s1, mcv_lowsel) which will 
> also be s1. So it always just returns s1, no? Maybe there's no point 
> in computing fullmatch.
> 

Hmmm, I think you're right. It probably got broken in the last rebase,
when I moved a bunch of code from the histogram part to the MCV one.
I'll take a look.

> Also, wouldn't mcv_lowsel potentially be a significant overestimate
> anyway? Perhaps 1 minus the sum of the MCV frequencies might be
> closer, but even that ought to take into account the number of
> distinct values remaining, although that information may not always be
> available.
> 

That is definitely true. 1 minus the sum of the MCV frequencies, and I
suppose we might even improve that if we had some ndistinct estimate on
those columns to compute an average.

> Also, just above that, in statext_clauselist_selectivity(), it
> computes the list stat_clauses, then doesn't appear to use it
> anywhere. I think that would have been the appropriate thing to pass
> to mcv_clauselist_selectivity(). Otherwise, passing unrelated clauses
> into mcv_clauselist_selectivity() will cause it to fail to find any
> matches and then underestimate.
> 

Will check.

> I've also come across a few incorrect/out-of-date comments:
> 
> /*
>  * mcv_clauselist_selectivity
>  *      Return the estimated selectivity of the given clauses using MCV list
>  *      statistics, or 1.0 if no useful MCV list statistic exists.
>  */
> 
> -- I can't see any code path that returns 1.0 if there are no MCV
> stats. The last part of that comment is probably more appropriate to
> statext_clauselist_selectivity()
> 
> 
> /*
>  * mcv_update_match_bitmap
>  * [snip]
>  * The function returns the number of items currently marked as 'match', and
>  * ...
> 
> -- it doesn't seem to return the number of items marked as 'match'.
> 
> Then inside that function, this comment is wrong (copied from the
> preceding comment):
> 
>                 /* AND clauses assume nothing matches, initially */
>                 memset(bool_matches, STATS_MATCH_FULL, sizeof(char) *
> mcvlist->nitems);
> 
> Still reading...
> 
> Regards,
> Dean
> 

Yeah, sorry about that - I forgot to fix those comments after removing
the match counting to simplify the patches.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Next
From: Tom Lane
Date:
Subject: Re: Why does load_external_function() return PGFunction?