Re: multivariate statistics / patch v7 - Mailing list pgsql-hackers
From | Kyotaro HORIGUCHI |
---|---|
Subject | Re: multivariate statistics / patch v7 |
Date | |
Msg-id | 20150703.143034.132752108.horiguchi.kyotaro@lab.ntt.co.jp Whole thread Raw |
In response to | Re: multivariate statistics / patch v7 (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
Responses |
Re: multivariate statistics / patch v7
|
List | pgsql-hackers |
Hello, I started to work on this patch. > attached is v7 of the multivariate stats patch. The main improvement > is major refactoring of the clausesel.c portion - splitting the > awfully long spaghetti-style functions into smaller pieces, making it > much more understandable etc. Thank you, it looks clearer. I have some comment for the brief look at this. This patchset is relatively large so I will comment on "per-notice" basis.. which means I'll send comment before examining the entire of this patchset. Sorry in advance for the desultory comments. ======= General comments: - You included unnecessary stuffs such like regression.diffs in these patches. - Now OID 3307 is used by pg_stat_file. I moved pg_mv_stats_dependencies_info/show to 3311/3312. - Single-variate stats have a mechanism to inject arbitrary values as statistics, that is, get_relation_stats_hook and thesimilar stuffs. I want the similar mechanism for multivariate statistics, too. 0001: - I also don't think it is right thing for expression_tree_walker to recognize RestrictInfo since it is not a part of expression. 0003: - In clauselist_selectivity, find_stats is uselessly called for single clause. This should be called after the clauselistfound to consist more than one clause. - Searching vars to be compared with mv-stat columns which find_stats does should stop at disjunctions. But this patch doesn'tbehave so and it should be an unwanted behavior. The following steps shows that. =====# CREATE TABLE t1 (a int, b int, c int);=# INSERT INTO t1 (SELECT a, a * 2, a * 3 FROM generate_series(0, 9999) a);=#EXPLAIN SELECT * FROM t1 WHERE a = 1 AND b = 2 OR c = 3; Seq Scan on t1 (cost=0.00..230.00 rows=1 width=12)=# ALTERTABLE t1 ADD STATISTICS (HISTOGRAM) ON (a, b, c);=# ANALZYE t1;=# EXPLAIN SELECT * FROM t1 WHERE a = 1 AND b = 2 ORc = 3; Seq Scan on t1 (cost=0.00..230.00 rows=268 width=12) ====Rows changed unwantedly. It seems not so simple thing as your code assumes. > I do assume some of those pieces are unnecessary because there already > is a helper function with the same purpose (but I'm not aware of > that). But IMHO this piece of code begins to look reasonable > (especially when compared to the previous state). Year, such kind of work should be done later:p This patch is not-so-invasive so as to make it undoable. > The other major improvement it review of the comments (including > FIXMEs and TODOs), and removal of the obsolete / misplaced ones. And > there was plenty of those ... > > These changes made this version ~20k smaller than v6. > > The patch also rebases to current master, which I assume shall be > quite stable - so hopefully no more duplicate OIDs for a while. > > There are 6 files attached, but only 0002-0006 are actually part of > the multivariate statistics patch itself. The first part makes it > possible to use pull_varnos() with expression trees containing > RestrictInfo nodes, but maybe this is not the right way to fix this > (there's another thread where this was discussed). As mentioned above, checking if mv stats can be applied would be more complex matter than now you are assuming. I also will consider that. > Also, the regression tests testing plan choice with multivariate stats > (e.g. that a bitmap index scan is chosen instead of index scan) fail > from time to time. I suppose this happens because the invalidation > after ANALYZE is not processed before executing the query, so the > optimizer does not see the stats, or something like that. I saw that occurs, but have no idea how it occurs so far.. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
pgsql-hackers by date: