Re: multivariate statistics (v25) - Mailing list pgsql-hackers

From David Rowley
Subject Re: multivariate statistics (v25)
Date
Msg-id CAKJS1f-fqo97jasVF57yfVyG+=T5JLce5ynCi1vvezXxX=wgoA@mail.gmail.com
In response to Re: multivariate statistics (v25)  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: multivariate statistics (v25)  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
List pgsql-hackers
On 25 March 2017 at 07:35, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> As I said in another thread, I pushed parts 0002,0003,0004.  Tomas said
> he would try to rebase patches 0001,0005,0006 on top of what was
> committed.  My intention is to give that one a look as soon as it is
> available.  So we will have n-distinct and functional dependencies in
> PG10.  It sounds unlikely that we will get MCVs and histograms in, since
> they're each a lot of code.

I've been working on the MV functional dependencies part of the patch to polish it up a bit. Tomas has been busy with a few other duties.

I've made some changes to how clauselist_selectivity() determines whether it should try to apply any extended stats. The solution I came up with was to add two parameters to this function: one for the RelOptInfo in question, and a bool to control whether we should try to apply any extended stats. For clauselist_selectivity() calls involving join rels we just pass the rel as NULL, so that we can skip all the extended stats stuff with very low overhead. When we actually have a base relation to pass along we can do so, along with a true tryextstats value, to have the function attempt to use any extended stats to assist with the selectivity estimation.
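To illustrate the gating described above, here is a minimal stand-alone sketch. It is not code from the patch: SketchRel, its fields, and clauselist_selectivity_sketch() are hypothetical stand-ins for RelOptInfo and clauselist_selectivity(), and the "extended stats" estimate is reduced to a single precomputed number.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for RelOptInfo; not the PostgreSQL definition. */
typedef struct SketchRel
{
    bool   has_ext_stats;   /* assumption: rel has usable extended stats */
    double ext_selectivity; /* assumption: estimate those stats would give */
} SketchRel;

/*
 * Sketch of the two extra parameters: "rel" may be NULL (e.g. for join
 * rels), and "tryextstats" says whether extended stats may be consulted.
 */
double
clauselist_selectivity_sketch(const double *clause_sels, size_t nclauses,
                              const SketchRel *rel, bool tryextstats)
{
    double s = 1.0;

    /* Cheap early test: a NULL rel or a false flag skips extended stats. */
    if (rel != NULL && tryextstats && rel->has_ext_stats)
        return rel->ext_selectivity;

    /* Fallback: treat clauses as independent and multiply selectivities. */
    for (size_t i = 0; i < nclauses; i++)
        s *= clause_sels[i];

    return s;
}
```

Passing NULL for the rel costs only a pointer comparison, which is the "very low overhead" path for join rels in this simplified model.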

When adding these two parameters I had second thoughts about whether "tryextstats" was required at all. We could just have this controlled by whether the rel is a base rel of kind RTE_RELATION. I ended up having to pass these parameters further down, to clauselist_selectivity's singleton counterpart, clause_selectivity(). This was due to clause_selectivity() calling clauselist_selectivity() for some clause types. I'm not entirely sure if this is actually required, but I can't see any reason for it to cause problems.
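The alternative mentioned above, deriving the decision from the rel itself instead of passing a separate bool, could look roughly like the following. This is only a sketch under assumed names: the enums and SketchRelInfo are hypothetical and merely mirror the idea of "a base rel of kind RTE_RELATION".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified kinds; not the PostgreSQL enums. */
typedef enum { SKETCH_BASEREL, SKETCH_JOINREL } SketchRelKind;
typedef enum { SKETCH_RTE_RELATION, SKETCH_RTE_SUBQUERY } SketchRteKind;

typedef struct SketchRelInfo
{
    SketchRelKind relkind;
    SketchRteKind rtekind;
} SketchRelInfo;

/*
 * Decide from the rel alone whether extended stats are worth trying,
 * so that no separate "tryextstats" flag is needed.
 */
bool
rel_can_use_ext_stats(const SketchRelInfo *rel)
{
    return rel != NULL &&
           rel->relkind == SKETCH_BASEREL &&
           rel->rtekind == SKETCH_RTE_RELATION;
}
```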

I've also attempted to simplify some of the logic within clauselist_selectivity and some other parts of clausesel.c to remove some unneeded code and make it a bit more efficient. For example, we no longer count the attributes in the clause list before calling a similar function to retrieve the actual attnums. This is now done as a single step.
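As a rough illustration of collecting and counting in one step, the sketch below gathers attribute numbers into a bitmask and returns the distinct count from the same loop. It is not the patch's code: collect_attnums() is a hypothetical name, each clause is assumed to reference exactly one attribute, and a uint64_t stands in for a Bitmapset.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Single pass over the clauses: set one bit per referenced attribute
 * number and count distinct attributes as we go. Attribute numbers
 * outside 1..63 are skipped in this simplified model.
 */
int
collect_attnums(const int *clause_attnums, size_t nclauses, uint64_t *attnums)
{
    int ndistinct = 0;

    *attnums = 0;
    for (size_t i = 0; i < nclauses; i++)
    {
        int      attnum = clause_attnums[i];
        uint64_t bit;

        if (attnum < 1 || attnum > 63)
            continue;

        bit = UINT64_C(1) << attnum;
        if ((*attnums & bit) == 0)
        {
            *attnums |= bit;    /* first time we see this attribute */
            ndistinct++;
        }
    }
    return ndistinct;
}
```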

I've not yet quite gotten as far as I'd like with this. I'd quite like to see clauselist_ext_split() gone, and instead we could build up a bitmapset of clause list indexes to ignore when applying the selectivity of clauses that couldn't use any extended stats. I'm planning on having a bit more of a look at this tomorrow.
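The "bitmapset of clause list indexes to ignore" idea might be sketched as follows. This is a simplified stand-alone model, not the planned implementation: apply_remaining_selectivity() is a hypothetical name, a uint64_t replaces a real Bitmapset (so only the first 64 clauses can be marked), and the extended-stats estimate is passed in as a plain number.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Start from the selectivity already estimated via extended stats, then
 * multiply in only those clauses whose index is NOT set in "estimated".
 * The clause list itself is never split into two lists.
 */
double
apply_remaining_selectivity(double ext_sel, const double *clause_sels,
                            size_t nclauses, uint64_t estimated)
{
    double s = ext_sel;

    for (size_t i = 0; i < nclauses; i++)
    {
        /* Indexes >= 64 cannot be in the mask, so they are never skipped. */
        bool covered = (i < 64) && ((estimated >> i) & 1);

        if (covered)
            continue;           /* already accounted for by ext_sel */

        s *= clause_sels[i];
    }
    return s;
}
```

Compared with splitting the list, this keeps the original clause order intact and only needs one extra set membership test per clause.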

The attached patch should apply to master as of f90d23d0c51895e0d7db7910538e85d3d38691f0.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
