Re: Additional improvements to extended statistics - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Additional improvements to extended statistics
Msg-id 20200309000157.ig5tcrynvaqu4ixd@development
In response to Re: Additional improvements to extended statistics  (Dean Rasheed <dean.a.rasheed@gmail.com>)
Responses Re: Additional improvements to extended statistics
Re: Additional improvements to extended statistics
List pgsql-hackers
On Sun, Mar 08, 2020 at 07:17:10PM +0000, Dean Rasheed wrote:
>On Fri, 6 Mar 2020 at 12:58, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>>
>> Here is a rebased version of this patch series. I've polished the first
>> two parts a bit - estimation of OR clauses and (Var op Var) clauses.
>>
>
>Hi,
>
>I've been looking over the first patch (OR list support). It mostly
>looks reasonable to me, except there's a problem with the way
>statext_mcv_clauselist_selectivity() combines multiple stat_sel values
>into the final result -- in the OR case, it needs to start with sel =
>0, and then apply the OR formula to factor in each new estimate. I.e.,
>this isn't right for an OR list:
>
>        /* Factor the estimate from this MCV to the oveall estimate. */
>        sel *= stat_sel;
>
>(Oh and there's a typo in that comment: s/oveall/overall/).
>
>For example, with the regression test data, this isn't estimated well:
>
>  SELECT * FROM mcv_lists_multi WHERE a = 0 OR b = 0 OR c = 0 OR d = 0;
>
>Similarly, if no extended stats can be applied it needs to return 0
>not 1, for example this query on the test data:
>
>  SELECT * FROM mcv_lists WHERE a = 1 OR a = 2 OR d IS NOT NULL;
>

Ah, right. Thanks for noticing this. Attached is an updated patch series,
with parts 0002 and 0003 adding tests that demonstrate the issue and then
fixing it (both shall be merged into 0001).

>It might also be worth adding a couple more regression test cases like these.

Agreed, 0002 adds a couple of relevant tests.

Incidentally, I've been working on improving test coverage for extended
stats over the past few days (about 80% of lines are currently covered,
which is neither bad nor great). I haven't submitted that to hackers yet,
because it's mostly mechanical and it would interfere with the two
existing threads about extended stats ...

Speaking of which, would you take a look at [1]? I think supporting SAOP
(ScalarArrayOpExpr) clauses is fine, but I wonder if you agree with my
conclusion that we can't really support the inclusion operator @>, as
explained in [2].

[1] https://www.postgresql.org/message-id/flat/13902317.Eha0YfKkKy@pierred-pdoc
[2] https://www.postgresql.org/message-id/20200202184134.swoqkqlqorqolrqv%40development

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
