Home > mailing lists

Re: Use extended statistics to estimate (Var op Var) clauses - Mailing list pgsql-hackers

From	Tomas Vondra
Subject	Re: Use extended statistics to estimate (Var op Var) clauses
Date	August 12, 2021 01:00:12
Msg-id	6b060ee2-9d31-3bd6-8096-81fcc5eddbe3@enterprisedb.com Whole thread Raw
In response to	Re: Use extended statistics to estimate (Var op Var) clauses (Mark Dilger <mark.dilger@enterprisedb.com>)
List	pgsql-hackers

Tree view

On 8/11/21 5:17 PM, Mark Dilger wrote:
> 
> 
>> On Aug 11, 2021, at 7:51 AM, Mark Dilger <mark.dilger@enterprisedb.com> wrote:
>>
>> I'll go test random data designed to have mcv lists of significance....
> 
> Done.  The data for column_i is set to floor(random()^i*20). 
> column_1 therefore is evenly distributed between 0..19, with
> successive columns weighted more towards smaller values.
>
> This still gives (marginally) worse results than the original test I
> posted, but better than the completely random data from the last post.
> After the patch, 72294 estimates got better and 30654 got worse.  The
> biggest losers from this data set are:
> 
> better:0, worse:31:  A >= B or A = A or not A = A
> better:0, worse:31:  A >= B or A = A
> better:0, worse:31:  A >= B or not A <> A
> better:0, worse:31:  A >= A or A = B or not B = A
> better:0, worse:31:  A >= B and not A < A or A = A
> better:0, worse:31:  A = A or not A > B or B <> A
> better:0, worse:31:  A >= B or not A <> A or not A >= A
> better:0, worse:32:  B < A and B > C and not C < B                    <----
> better:1, worse:65:  A <> C and A <= B                                  <----
> better:0, worse:33:  B <> A or B >= B
> better:0, worse:33:  B <> A or B <= B
> better:0, worse:33:  B <= A or B = B or not B > B
> better:0, worse:33:  B <> A or not B >= B or not B < B
> better:0, worse:33:  B = A or not B > B or B = B
> better:0, worse:44:  A = B or not A > A or A = A
> better:0, worse:44:  A <> B or A <= A
> better:0, worse:44:  A <> B or not A >= A or not A < A
> better:0, worse:44:  A <= B or A = A or not A > A
> better:0, worse:44:  A <> B or A >= A
>
> Of which, a few do not contain columns compared against themselves,
> marked with <---- above.
>
> I don't really know what to make of these results.  It doesn't 
> bother me that any particular estimate gets worse after the patch.
> That's just the nature of estimating.  But it does bother me a bit
> that some types of estimates consistently get worse.  We should
> either show that my analysis is wrong about that, or find a way to
> address it to avoid performance regressions.  If I'm right that there
> are whole classes of estimates that are made consistently worse, then
> it stands to reason some users will have those data distributions and
> queries, and could easily notice.

I'm not quite sure that's really a problem. Extended statistics are 
meant for correlated columns, and it's mostly expected the estimates may 
be a bit worse for random / independent data. The idea is mostly that 
statistics will be created only for correlated columns, in which case it 
should improve the estimates. I'd be way more concerned if you observed 
consistently worse estimates on such data set.

Of course, there may be errors - the incorrect handling of (A op A) is 
an example of such issue, probably.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

pgsql-hackers by date:

From: Michael Meskes
Date: 11 August 2021, 23:41:59
Subject: Re: ECPG bug fix: DECALRE STATEMENT and DEALLOCATE, DESCRIBE

From: Melanie Plageman
Date: 12 August 2021, 01:00:40
Subject: Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

Re: Use extended statistics to estimate (Var op Var) clauses - Mailing list pgsql-hackers

Previous

Next