Re: [HACKERS] PATCH: multivariate histograms and MCV lists - Mailing list pgsql-hackers
From | Kyotaro HORIGUCHI |
---|---|
Subject | Re: [HACKERS] PATCH: multivariate histograms and MCV lists |
Date | |
Msg-id | 20190313.131951.178195776.horiguchi.kyotaro@lab.ntt.co.jp Whole thread Raw |
In response to | Re: [HACKERS] PATCH: multivariate histograms and MCV lists (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
Responses |
Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Re: [HACKERS] PATCH: multivariate histograms and MCV lists |
List | pgsql-hackers |
Hello. At Wed, 13 Mar 2019 02:25:40 +0100, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in <19f76496-dcf3-ccea-dd82-26fbed57b8f5@2ndquadrant.com> > Hi, > > attached is an updated version of the patch, addressing most of the > issues raised in the recent reviews. There are two main exceptions: > > 1) I haven't reworked the regression tests to use a function to check > cardinality estimates and making them faster. > > 2) Review handling of bitmap in statext_is_compatible_clause_internal > when processing AND/OR/NOT clauses. > > I plan to look into those items next, but I don't want block review of > other parts of the patch unnecessarily. I briefly looked it and have some comments. 0001-multivariate-MCV-lists-20190312.patch +/* + * bms_member_index + * determine 0-based index of the varattno in the bitmap + * + * Returns (-1) when the value is not a member. I think the comment should be more generic. "determine 0-based index of member x among the bitmap members" " Returns -1 when x is not a member." (cont'ed) + if (a == NULL) + return 0; Isn't the case of "not a member"? bms_member_index seems working differently than maybe expected. bms_member_index((2, 4), 0) => 0, (I think) should be -1 bms_member_index((2, 4), 1) => 0, should be -1 bms_member_index((2, 4), 2) => 0, should be 0 bms_member_index((2, 4), 3) => 1, should be -1 bms_member_index((2, 4), 4) => 1, should be 1 bms_member_index((2, 4), 5) => 2, should be -1 bms_member_index((2, 4), 6) => 2, should be -1 ... bms_member_index((2, 4), 63) => 2, should be -1 bms_member_index((2, 4), 64) => -1, correct It works correctly only when x is a member - the way the function is maybe actually used in this patch -, or needs to change the specifiction (or the comment) of the function. + if (rel && rel->rtekind == RTE_RELATION && rel->statlist != NIL) + { + /* + * Estimate selectivity on any clauses applicable by stats tracking + * actual values first, then apply functional dependencies on the + * remaining clauses. The comment doesn't seem needed since it is mentioning the detail of statext_clauselist_selectivity() called just below. + if (statext_is_kind_built(htup, STATS_EXT_MCV)) + { + StatisticExtInfo *info = makeNode(StatisticExtInfo); + + info->statOid = statOid; + info->rel = rel; + info->kind = STATS_EXT_MCV; + info->keys = bms_copy(keys); + + stainfos = lcons(info, stainfos); + } We are to have four kinds of extended statistics, at worst we have a list containing four StatisticExtInfos with the same statOid, rel, keys and only different kind. Couldn't we reverse the structure so that StatisticExtIbfo be something like: > struct StatsticExtInfo > { > NodeTag type; > Oid statOid; > RelOptInfo *rel; ! char kind[8]; /* arbitrary.. */ > Bitmapset *keys; +OBJS = extended_stats.o dependencies.o mcv.o mvdistinct.o The module for MV distinctness is named 'mvdistinct', but mcv doesn't have the prefix. I'm not sure we need to unify the names, though. +Multivariate MCV (most-common values) lists are a straightforward extension of "lists are *a*" is wrong? @@ -223,26 +220,16 @@ dependency_degree(int numrows, HeapTuple *rows, int k, AttrNumber *dependency, I haven't read it in datil, but why MV-MCV patch contains (maybe) improvement of functional dependency code? +int +compare_scalars_simple(const void *a, const void *b, void *arg) Seems to need a comment. "compare_scalars without tupnoLink maininance"? +int +compare_datums_simple(Datum a, Datum b, SortSupport ssup) +{ + return ApplySortComparator(a, false, b, false, ssup); +} This wrapper function doesn't seem to me required. +/* simple counterpart to qsort_arg */ +void * +bsearch_arg(const void *key, const void *base, size_t nmemb, size_t size, We have some functions named *_bsearch. If it is really qsort_arg's bsearch versoin, it might be better to be placed in qsort_arg.c or new file bsearch_arg.c? +int * +build_attnums_array(Bitmapset *attrs) If the attrs is not offset, I'd like that it is named differently, say, attrs_nooffset or something. + int i, + j, + len; I'm not sure but is it following our coding convention? + items[i].values[j] = heap_getattr(rows[i], items is needed by qsort_arg and as return value. It seems to me that using just values[] and isnull[] make the code simpler there. + /* Look inside any binary-compatible relabeling (as in examine_variable) */ + if (IsA(clause, RelabelType)) + clause = (Node *) ((RelabelType *) clause)->arg; This is quite a common locution so it's enough that the comment just mention what it does, like "Remove any relabel decorations". And relabelling can happen recursively so the 'if' should be 'while'? + /* we also better ensure the Var is from the current level */ + if (var->varlevelsup > 0) + return false; I don't get the meaning of the "better". If it cannot/don't accept subquery's output, it would be "we refuse Vars from ...", or if the function is not assumed to receive such Vars, it should be an assertion. + /* see if it actually has the right shape (one Var, one Const) */ + ok = (NumRelids((Node *) expr) == 1) && + (is_pseudo_constant_clause(lsecond(expr->args)) || + (varonleft = false, + is_pseudo_constant_clause(linitial(expr->args)))); I don't think such "expression" with unidentifiable side-effect is a good thing. Counldn't it in more plain code? (Yeah, it is already used in clauselist_selectivity so I don't insist on that.) + * This uses the function for estimating selectivity, not the operator + * directly (a bit awkward, but well ...). Not only it is the right thing but actually the operators for the type path don't have operrst. + * statext_is_compatible_clause + * Determines if the clause is compatible with MCV lists. I think the name should contain the word "mcv". Isn't the name better to be "staext_clause_is_mcv_compatibe"? (Sorry, further comments may come later..) regards. -- Kyotaro Horiguchi NTT Open Source Software Center
pgsql-hackers by date: