Re: multivariate statistics / patch v7 - Mailing list pgsql-hackers
From | Kyotaro HORIGUCHI |
---|---|
Subject | Re: multivariate statistics / patch v7 |
Date | |
Msg-id | 20150716.205157.170797028.horiguchi.kyotaro@lab.ntt.co.jp Whole thread Raw |
In response to | Re: multivariate statistics / patch v7 (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>) |
Responses |
Re: multivariate statistics / patch v7
|
List | pgsql-hackers |
Hi, I'd like to show you the modified constitution of multivariate statistics application logic. Please find the attached. They apply on your v7 patch. The code to find mv-applicable clause is moved out of the main flow of clauselist_selectivity. As I said in the previous mail, the new function transformRestrictInfoForEstimate (too bad name but just for PoC:) scans clauselist and generates RestrictStatsData struct which drives mv-aware selectivity calculation. This struct isolates MV and non-MV estimation. The struct RestrictStatData mainly consists of the following three parts, - clause to be estimated by current logic (MV is not applicable) - clause to be estimated by MV-staistics. - list of childRestrictStatDatas, which are to be run recursively. mvclause_selectivty() is the topmost function where mv stats works. This structure effectively prevents main estimation flow from being broken by modifying mvstats part. Although I haven't measured but I'm positive the code is far reduced from yours. I attached two patches to this message. The first one is to rebase v7 patch to current(maybe) master and the second applies the refactoring. I'm a little anxious about performance but I think this makes the process to apply mv-stats far clearer. Regtests for mvstats succeeded asis except for fdep, which is not implememted in this patch. What do you think about this? regards, > Hi, Thanks for the detailed explaination. I misunderstood the > code (more honest speaking, din't look so close there). Then I > looked it closer. > > > At Wed, 08 Jul 2015 03:03:16 +0200, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in <559C76D4.2030805@2ndquadrant.com> > > FWIW this was a stupid bug in update_match_bitmap_histogram(), which > > initially handled only AND clauses, and thus assumed the "match" of a > > bucket can only decrease. But for OR clauses this is exactly the > > opposite (we assume no buckets match and add buckets matching at least > > one of the clauses). > > > > With this fixed, the estimates look like this: > > > > > IMHO pretty accurate estimates - no issue with OR clauses. > > Ok, I understood the diferrence between what I thought and what > you say. The code is actually concious of OR clause but is looks > somewhat confused. > > Currently choosing mv stats in clauselist_selectivity can be > outlined as following, > > 1. find_stats finds candidate mv stats containing *all* > attributes appeared in the whole clauses regardless of and/or > exprs by walking whole the clause tree. > > Perhaps this is the measure to early bailout. > > 2.1. Within every disjunction elements, collect mv-related > attributes while checking whether the all leaf nodes (binop or > ifnull) are compatible by (eventually) walking whole the > clause tree. > > 2.2. Check if all the collected attribute are contained in > mv-stats columns. > > 3. Finally, clauseset_mv_selectivity_histogram() (and others). > > This funciton applies every ExprOp onto every attribute in > every histogram backes and (tries to) make the boolean > operation of the result bitmaps. > > I have some comments on the implement and I also try to find the > solution for them. > > > 1. The flow above looks doing very similiar thins repeatedly. > > 2. I believe what the current code does can be simplified. > > 3. As you mentioned in comments, some additional infrastructure > needed. > > After all, I think what we should do after this are as follows, > as the first step. > > - Add the means to judge the selectivity operator(?) by other > than oprrest of the op of ExprOp. (You missed neqsel already) > > I suppose one solution for this is adding oprmvstats taking > 'm', 'h' and 'f' and their combinations. Or for the > convenience, it would be a fixed-length string like this. > > oprname | oprmvstats > = | 'mhf' > <> | 'mhf' > < | 'mh-' > > | 'mh-' > >= | 'mh-' > <= | 'mh-' > > This would make the code in clause_is_mv_compatible like this. > > > oprmvstats = get_mvstatsset(expr->opno); /* bitwise representation */ > > if (oprmvstats & types) > > { > > *attnums = bms_add_member(*attnums, var->varattno); > > return true; > > } > > return false; > > - Current design just manage to work but it is too complicated > and hardly have affinity with the existing estimation > framework. I proposed separation of finding stats phase and > calculation phase, but I would like to propose transforming > RestrictInfo(and finding mvstat) phase and running the > transformed RestrictInfo phase after looking close to the > patch. > > I think transforing RestrictInfo makes the situnation > better. Since it nedds different information, maybe it is > better to have new struct, say, RestrictInfoForEstimate > (boo!). Then provide mvstatssel() to use in the new struct. > The rough looking of the code would be like below. > > clauselist_selectivity() > { > ... > RestrictInfoForEstmate *esclause = > transformClauseListForEstimation(root, clauses, varRelid); > ... > > return clause_selectivity(esclause): > } > > clause_selectivity(RestrictInfoForEstmate *esclause) > { > if (IsA(clause, RestrictInfo))... > if (IsA(clause, RestrictInfoForEstimate)) > { > RestrictInfoForEstimate *ecl = (RestrictInfoForEstimate*) clause; > if (ecl->selfunc) > { > sx = ecl->selfunc(root, ecl); > } > } > if (IsA(clause, Var))... > } > > > transformClauseListForEstimation(...) > { > ... > > relid = collect_mvstats_info(root, clause, &attlist); > if (!relid) return; > if (get_mvstats_hook) > mvstats = (*get_mvstats_hoook) (root, relid, attset); > else > mvstats = find_mv_stats(root, relid, attset)) > } > ... > > > I've pushed this to github [1] but I need to do some additional > > fixes. I also had to remove some optimizations while fixing this, and > > will have to reimplement those. > > > > That's not to say that the handling of OR-clauses is perfectly > > correct. After looking at clauselist_selectivity_or(), I believe it's > > a bit broken and will need a bunch of fixes, as explained in the > > FIXMEs I pushed to github. > > > > [1] https://github.com/tvondra/postgres/tree/mvstats > > I don't see whether it is doable or not, and I suppose you're > unwilling to change the big picture, so I will consider the idea > and will show you the result, if it turns out to be possible and > promising. -- Kyotaro Horiguchi NTT Open Source Software Center
pgsql-hackers by date: