Re: multivariate statistics v14 - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: multivariate statistics v14
Date
Msg-id 89341a68-4729-ad28-bb39-cef31849aedb@2ndquadrant.com
In response to Re: multivariate statistics v14  (Tatsuo Ishii <ishii@postgresql.org>)
Responses Re: multivariate statistics v14  (Tatsuo Ishii <ishii@postgresql.org>)
List pgsql-hackers
Hello,

On 03/22/2016 09:13 AM, Tatsuo Ishii wrote:
>>> Do you have any other missing parts in this work? I am asking
>>> because I wonder if you want to push this into 9.6 or rather 9.7.
>>
>> I think the first few parts of the patch series, namely:
>>
>>   * shared infrastructure (0002)
>>   * functional dependencies (0003)
>>   * MCV lists (0004)
>>   * histograms (0005)
>>
>> might make it into 9.6. I believe the code for building and storing
>> the different kinds of stats is reasonably solid. What probably needs
>> more thorough review are the changes in clauselist_selectivity(), but
>> the code in these parts is reasonably simple as it only supports using
>> a single multi-variate statistics per relation.
>>
>> The part (0006) that allows using multiple statistics (i.e. selects
>> which of the available stats to use and in what order) is probably the
>> most complex part of the whole patch, and I myself do have some
>> questions about some aspects of it. I don't think this part might get
>> into 9.6 at this point (although it'd be nice if we managed to do
>> that).
>
> Hum. So without 0006 or beyond, there's not much benefit for the
> PostgreSQL users, and you are not too confident about 0006 or
> beyond. Then I would think it is a little bit hard to justify in
> putting 000[2-5] into 9.6. I really like this feature and would like
> to see in PostgreSQL someday, but I'm not sure if we should put the
> patches (0002-0005) into PostgreSQL now. Please let me know if there's
> some reasons we should put the patches into PostgreSQL now.

I don't think so. While being able to combine multiple statistics is
certainly useful, I'm convinced that the initial patches add enough
value on their own, even if the 0006 patch gets committed later.

A lot of queries will be just fine with the "single multivariate
statistics" limitation, either because they use fewer than 8 columns,
or because only 8 columns are actually correlated. (FWIW the 8 column
limit is mostly arbitrary, it may get increased if needed.)

I haven't really mentioned the aspects of 0006 that I think need more
discussion, but it's mostly about the question of whether combining the
statistics by using the overlapping clauses as "conditions" is the right
thing to do (or whether a more expensive approach is needed). None of
that however invalidates the preceding patches.
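
To make the "conditions" idea concrete, here is a minimal sketch in
Python (not code from the patch). It assumes two statistics, one on
columns (a, b) and one on (b, c), so the clause on b is the overlapping
one. The function name and the selectivity values are made up for
illustration, and the formula additionally assumes that the clauses on
a and c are independent once b is fixed.

```python
def combine_with_condition(sel_ab, sel_bc, sel_b):
    """Estimate P(a AND b AND c) as P(a AND b) * P(c | b).

    sel_ab: selectivity of (a AND b), from the hypothetical (a, b) statistics
    sel_bc: selectivity of (b AND c), from the hypothetical (b, c) statistics
    sel_b:  selectivity of the overlapping clause b alone
    """
    if sel_b <= 0.0:
        # Nothing matches b, so nothing can match the conjunction.
        return 0.0
    # P(c | b) = P(b AND c) / P(b); clamp in case the inputs disagree.
    sel_c_given_b = min(sel_bc / sel_b, 1.0)
    return sel_ab * sel_c_given_b


# Illustrative numbers only, not measurements.
conditional_estimate = combine_with_condition(0.1, 0.2, 0.4)

# What plain multiplication of the two per-statistics estimates gives;
# it counts the clause on b twice.
naive_product = 0.1 * 0.2

print(conditional_estimate, naive_product)
```

With these inputs the conditional estimate is larger than the plain
product, because the clause on b is accounted for only once. Whether
this is accurate enough, or a more expensive combination is needed, is
exactly the open question.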

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


