Re: PoC/WIP: Extended statistics on expressions - Mailing list pgsql-hackers

From	Tomas Vondra
Subject	Re: PoC/WIP: Extended statistics on expressions
Date	March 17, 2021 22:07:22
Msg-id	f4dac079-6bc4-ccc0-0fd8-3e2e2da28d92@enterprisedb.com
In response to	Re: PoC/WIP: Extended statistics on expressions (Dean Rasheed <dean.a.rasheed@gmail.com>)
Responses	Re: PoC/WIP: Extended statistics on expressions
List	pgsql-hackers


On 3/17/21 7:54 PM, Dean Rasheed wrote:
> On Wed, 17 Mar 2021 at 17:26, Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>
>> My concern is that the current behavior (where we prefer expression
>> stats over multi-column stats to some extent) works fine as long as the
>> parts are independent, but once there's dependency it's probably more
>> likely to produce underestimates. I think underestimates for grouping
>> estimates were a risk in the past, so let's not make that worse.
>>
> 
> I'm not sure the current behaviour really is preferring expression
> stats over multi-column stats. In this example, where we're grouping
> by (a+b), (c+d) and have stats on [(a+b),c] and (c+d), neither of
> those multi-column stats actually match more than one
> column/expression. If anything, I'd go the other way and say that it
> was wrong to use the [(a+b),c] stats in the first case, where they
> were the only stats available, since those stats aren't really
> applicable to (c+d), which probably ought to be treated as
> independent. IOW, it might have been better to estimate the first case
> as
> 
>      ndistinct((a+b)) * ndistinct(c) * ndistinct(d)
> 
> and the second case as
> 
>      ndistinct((a+b)) * ndistinct((c+d))
> 

OK. I might be missing something, but isn't that what the algorithm
currently does? Or am I just confused about which cases "first" and
"second" refer to?
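
For concreteness, the two estimates quoted above can be sketched as
follows. This is only an illustration under stated assumptions, not the
planner's code: the table contents are made up, exact distinct counts
stand in for the ndistinct statistics, and the helper names (ndistinct,
a_plus_b, c_plus_d) are hypothetical.

```python
def ndistinct(rows, expr):
    """Exact distinct count of expr over rows (stand-in for a statistic)."""
    return len({expr(r) for r in rows})


# Hypothetical table with columns a, b, c, d; the data is invented.
rows = [
    {"a": i % 5, "b": i % 3, "c": i % 4, "d": i % 6}
    for i in range(1000)
]


def a_plus_b(r):
    return r["a"] + r["b"]


def c_plus_d(r):
    return r["c"] + r["d"]


# "First case" from the quoted text: nothing covers (c+d), so it is
# split into c and d, and every factor is treated as independent.
est_first = (
    ndistinct(rows, a_plus_b)
    * ndistinct(rows, lambda r: r["c"])
    * ndistinct(rows, lambda r: r["d"])
)

# "Second case": a statistic on (c+d) exists and is used directly.
est_second = ndistinct(rows, a_plus_b) * ndistinct(rows, c_plus_d)

# True number of groups for GROUP BY (a+b), (c+d) on this data.
actual = ndistinct(rows, lambda r: (a_plus_b(r), c_plus_d(r)))

print(est_first, est_second, actual)
```

With exact counts the second product can never exceed the first, since
ndistinct((c+d)) <= ndistinct(c) * ndistinct(d), and neither product
can fall below the true group count. Underestimates only become
possible once the inputs are sampled estimates rather than exact
counts.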


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
