Re: [HACKERS] multivariate statistics (v24) - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: [HACKERS] multivariate statistics (v24)
Date
Msg-id 20170302.154237.217300143.horiguchi.kyotaro@lab.ntt.co.jp
In response to Re: [HACKERS] multivariate statistics (v24)  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: [HACKERS] multivariate statistics (v25)  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
Hello,

At Thu, 2 Mar 2017 04:05:34 +0100, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in
<a78ffb17-70e8-a55a-c10c-66ab575e88ed@2ndquadrant.com>
> OK,
> 
> attached is v24 of the patch series, addressing most of the reported
> issues and comments (at least I believe so). The main changes are:

Unfortunately, 0002 conflicts with the current master
(4461a9b). Could you rebase the patches, or tell us which commit
they are based on?

I have only read through the patch files, but I have some comments.

> 1) I've mostly abandoned the "multivariate" name in favor of
> "extended", particularly in places referring to stats stored in the
> pg_statistic_ext in general. "Multivariate" is now used only in places
> talking about particular types (e.g. multivariate histograms).
> 
> The "extended" name is more widely used for this type of statistics,
> and the assumption is that we'll also add other (non-multivariate)
> types of statistics - e.g. statistics on custom expressions, or some
> form of join statistics.

In 0005, and 

@@ -184,14 +208,43 @@ clauselist_selectivity(PlannerInfo *root,
      * If there are no such stats or not enough attributes, don't waste time
      * simply skip to estimation using the plain per-column stats.
      */
 
+    if (has_stats(stats, STATS_TYPE_MCV) &&
...
+            /* compute the multivariate stats */
+            s1 *= clauselist_ext_selectivity(root, mvclauses, stat);
====
@@ -1080,10 +1136,71 @@ clauselist_ext_selectivity_deps(PlannerInfo *root, Index relid,
 }
 
 /*
+ * estimate selectivity of clauses using multivariate statistic

Were these comments left unchanged by oversight, or on purpose?
They still say "multivariate", and 0007 adds very similar text.

> 2) Catalog pg_mv_statistic was renamed to pg_statistic_ext (and
> pg_mv_stats view renamed to pg_stats_ext).

FWIW, "extended statistic" would be abbreviated as
"ext_statistic" or "extended_stats". Why did you swap the word
order?

> 3) The structure of pg_statistic_ext was changed as proposed by
> Alvaro, i.e. the boolean flags were removed and instead we have just a
> single "char[]" column with list of enabled statistics.
> 
> 4) I also got rid of the "mv" part in most variable/function/constant
> names, replacing it by "ext" or something similar. Also mvstats.h got
> renamed to stats.h.
> 
> 5) Moved the files from src/backend/utils/mvstats to
> backend/statistics.
> 
> 6) Fixed the n_choose_k() overflow issues by using the algorithm
> proposed by Dean. Also, use the simple formula for num_combinations().
> 
> 7) I've tweaked data types for a few struct members (in stats.h). I've
> kept most of the uint32 fields at the top level though, because int16
> might not be large enough for large statistics and the overhead is
> minimal (compared to the space needed e.g. for histogram buckets).

A formal argument or some boundary-value test cases might be
needed to prevent future trouble. Alternatively, a defined
behavior on overflow of these fields might be enough; I believe
all (or most) of the overflow-prone data already has such
behavior.

> The renames/changes were quite widespread, but I've done my best to
> fix all the comments and various other places.
> 
> regards

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center