Re: multivariate statistics (v25) - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: multivariate statistics (v25)
Date
Msg-id 20170331.171821.165659478.horiguchi.kyotaro@lab.ntt.co.jp
Whole thread Raw
In response to Re: multivariate statistics (v25)  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: multivariate statistics (v25)  (David Rowley <david.rowley@2ndquadrant.com>)
List pgsql-hackers
Hello,

At Fri, 31 Mar 2017 03:03:06 +1300, David Rowley <david.rowley@2ndquadrant.com> wrote in
<CAKJS1f-fqo97jasVF57yfVyG+=T5JLce5ynCi1vvezXxX=wgoA@mail.gmail.com>
> On 25 March 2017 at 07:35, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> 
> > As I said in another thread, I pushed parts 0002,0003,0004.  Tomas said
> > he would try to rebase patches 0001,0005,0006 on top of what was
> > committed.  My intention is to give that one a look as soon as it is
> > available.  So we will have n-distinct and functional dependencies in
> > PG10.  It sounds unlikely that we will get MCVs and histograms in, since
> > they're each a lot of code.
> >
> 
> I've been working on the MV functional dependencies part of the patch to
> polish it up a bit. Tomas has been busy with a few other duties.
> 
> I've made some changes around how clauselist_selectivity() determines if it
> should try to apply any extended stats. The solution I came up with was to
> add two parameters to this function, one for the RelOptInfo in question,
> and one a bool to control if we should try to apply any extended stats.
> For clauselist_selectivity() usage involving join rels we just pass the rel
> as NULL, that way we can skip all the extended stats stuff with very low
> overhead. When we actually have a base relation to pass along we can do so,
> along with a true tryextstats value to have the function attempt to use any
> extended stats to assist with the selectivity estimation.
> 
> When adding these two parameters I had 2nd thoughts that the "tryextstats"
> was required at all. We could just have this controlled by if the rel is a
> base rel of kind RTE_RELATION. I ended up having to pass these parameters
> further, down to clauselist_selectivity's singleton couterpart,
> clause_selectivity(). This was due to clause_selectivity() calling
> clauselist_selectivity() for some clause types. I'm not entirely sure if
> this is actually required, but I can't see any reason for it to cause
> problems.

I understand that the reason for tryextstats is that the two are
perfectly correlating but caluse_selectivity requires the
RelOptInfo anyway. Some comment about that may be reuiqred in the
function comment.

> I've also attempted to simplify some of the logic within
> clauselist_selectivity and some other parts of clausesel.c to remove some
> unneeded code and make it a bit more efficient. For example, we no longer
> count the attributes in the clause list before calling a similar function
> to retrieve the actual attnums. This is now done as a single step.
> 
> I've not yet quite gotten as far as I'd like with this. I'd quite like to
> see clauselist_ext_split() gone, and instead we could build up a bitmapset
> of clause list indexes to ignore when applying the selectivity of clauses
> that couldn't use any extended stats. I'm planning on having a bit more of
> a look at this tomorrow.
> 
> The attached patch should apply to master as
> of f90d23d0c51895e0d7db7910538e85d3d38691f0.

FWIW, I tries this. This cleanly applied on it but make ends with
the following error.

$ make -s
Writing postgres.bki
Writing schemapg.h
Writing postgres.description
Writing postgres.shdescription
Writing fmgroids.h
Writing fmgrprotos.h
Writing fmgrtab.c
make[3]: *** No rule to make target `dependencies.o', needed by `objfiles.txt'.  Stop.
make[2]: *** [statistics-recursive] Error 2
make[1]: *** [all-backend-recurse] Error 2
make: *** [all-src-recurse] Error 2


Some random comments by just looking on the patch:

======
The name of the function "collect_ext_attnums", and
"clause_is_ext_compatible" seems odd since "ext" doesn't seem to
be a part of "extended statistics". Some other names looks the
same, too.

Something like "collect_e(xt)stat_compatible_attnums" and
"clause_is_e(xt)stat_compatible" seem better to me.

======
The following comment seems something wrong.

+ * When applying functional dependencies, we start with the strongest ones
+ * strongest dependencies. That is, we select the dependency that:

======
dependency_is_fully_matched() is not found. Maybe some other
patches are assumed?

======
+        /* see if it actually has the right */
+        ok = (NumRelids((Node *) expr) == 1) &&
+            (is_pseudo_constant_clause(lsecond(expr->args)) ||
+             (varonleft = false,
+              is_pseudo_constant_clause(linitial(expr->args))));
+
+        /* unsupported structure (two variables or so) */
+        if (!ok)
+            return true;

Ok is used only here. I don't think seeming-expressions with side
effect is not good idea here.

======
+        switch (get_oprrest(expr->opno))
+        {
+            case F_EQSEL:
+
+                /* equality conditions are compatible with all statistics */
+                break;
+
+            default:
+
+                /* unknown estimator */
+                return true;
+        }

This seems somewhat stupid..

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




pgsql-hackers by date:

Previous
From: Andreas Karlsson
Date:
Subject: Re: REINDEX CONCURRENTLY 2.0
Next
From: Kyotaro HORIGUCHI
Date:
Subject: Re: Partitioned tables and relfilenode