Re: [HACKERS] PATCH: multivariate histograms and MCV lists - Mailing list pgsql-hackers
From | Dean Rasheed |
---|---|
Subject | Re: [HACKERS] PATCH: multivariate histograms and MCV lists |
Date | |
Msg-id | CAEZATCVG+xCQc8NGx_LCY8xr-RcwaxQLfibkMnTT5A60DVturg@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] PATCH: multivariate histograms and MCV lists (Dean Rasheed <dean.a.rasheed@gmail.com>) |
Responses |
Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Re: [HACKERS] PATCH: multivariate histograms and MCV lists |
List | pgsql-hackers |
On Sat, 9 Mar 2019 at 18:33, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > > On Thu, 28 Feb 2019 at 19:56, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > Attached is an updated version of this patch series. > > Here are some random review comments. I'll add more later, but I'm out > of energy for today. > Here are some more comments: 11). In dependency_degree(): - /* sort the items so that we can detect the groups */ - qsort_arg((void *) items, numrows, sizeof(SortItem), - multi_sort_compare, mss); + /* + * build an array of SortItem(s) sorted using the multi-sort support + * + * XXX This relies on all stats entries pointing to the same tuple + * descriptor. Not sure if that might not be the case. + */ + items = build_sorted_items(numrows, rows, stats[0]->tupDesc, + mss, k, attnums_dep); That XXX comment puzzled me for a while. Actually it's OK though, unless/until we try to support stats across multiple relations, which will require a much larger refactoring of this code. For now though, The stats entries all point to the same tuple descriptor from the onerel passed to BuildRelationExtStatistics(), so it's OK to just use the first tuple descriptor in this way. The comment should be updated to explain that. 12). bms_member_index() should surely be in bitmapset.c. It could be more efficient by just traversing the bitmap words and making use of bmw_popcount(). Also, its second argument should be of type 'int' for consistency with other bms_* functions. 13). estimate_ndistinct() has been moved from mvdistinct.c to extended_stats.c and changed from static to extern, but it is only called from mvdistinct.c, so that change is unnecessary (at least as far as this patch is concerned). 14). The attnums Bitmapset passed to statext_is_compatible_clause_internal() is an input/output argument that it updates. That should probably be documented. When it calls itself recursively for AND/OR/NOT clauses, it could just pass the original Bitmapset through to be updated (rather than creating a new one and merging it), as it does for other types of clause. On the other hand, the outer function statext_is_compatible_clause() does need to return a new bitmap, which may or may not be used by its caller, so it would be cleaner to make that a strictly "out" parameter and initialise it to NULL in that function, rather than in its caller. 15). As I said yesterday, I don't think that there is a clean separator of concerns between the functions clauselist_selectivity(), statext_clauselist_selectivity(), dependencies_clauselist_selectivity() and mcv_clauselist_selectivity(), I think things could be re-arranged as follows: statext_clauselist_selectivity() - as the name suggests - should take care of *all* extended stats estimation, not just MCVs and histograms. So make it a fairly small function, calling mcv_clauselist_selectivity() and dependencies_clauselist_selectivity(), and histograms when that gets added. Most of the current code in statext_clauselist_selectivity() is really MCV-specific, so move that to mcv_clauselist_selectivity(). Amongst other things, that would move the call to choose_best_statistics() to mcv_clauselist_selectivity() (just as dependencies_clauselist_selectivity() calls choose_best_statistics() to get the best dependencies statistics). Then, when histograms are added later, you won't have the problem pointed out before where it can't apply both MCV and histogram stats if they're on different STATISTICS objects. Most of the comments for statext_clauselist_selectivity() are also MCV-related. Those too would move to mcv_clauselist_selectivity(). Regards, Dean
pgsql-hackers by date: