Re: Use extended statistics to estimate (Var op Var) clauses - Mailing list pgsql-hackers

From	Tomas Vondra
Subject	Re: Use extended statistics to estimate (Var op Var) clauses
Date	August 20, 2021 21:20:50
Msg-id	eb3da9e6-210a-3b4c-b185-690dd487e7df@enterprisedb.com
In response to	Re: Use extended statistics to estimate (Var op Var) clauses (Mark Dilger <mark.dilger@enterprisedb.com>)
Responses	Re: Use extended statistics to estimate (Var op Var) clauses Re: Use extended statistics to estimate (Var op Var) clauses
List	pgsql-hackers

On 8/18/21 3:16 PM, Mark Dilger wrote:
> 
> 
>> On Aug 18, 2021, at 3:43 AM, Tomas Vondra
>> <tomas.vondra@enterprisedb.com> wrote:
>> 
>> I've pushed everything (generator and results) to this github repo
> 
> Thanks for the link.  I took a very brief look.  Perhaps we can
> combine efforts.  I need to make progress on several other patches
> first, but hope to get back to this.
> 

Sure - it'd be great to combine efforts. That's why I posted my scripts 
& results. I understand there's plenty of other work for both of us, so 
take your time - no rush.

After looking at this for a while, it's clear the main issue is handling 
of clauses referencing the same Var twice, like for example (a = a) or 
(a < a). But it's not clear to me if this is something worth fixing, or 
if extended statistics is the right place to do it.

If those clauses are worth the effort, why not handle them better 
even without extended statistics? We can easily evaluate these clauses 
on the per-column MCV list, because they only reference a single Var.
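
To illustrate what evaluating such a clause on a per-column MCV list 
could look like, here is a rough Python sketch. It is not PostgreSQL 
code: the function name, the (value, frequency) representation of the 
MCV list, and the rule for extrapolating to rows outside the MCV list 
are all assumptions made for the example.

```python
import operator


def var_op_var_selectivity(mcv, op, null_frac=0.0):
    """Estimate the selectivity of (a op a) from a per-column MCV list.

    mcv       -- list of (value, frequency) pairs, where each frequency is
                 the fraction of all rows holding that value (hypothetical
                 representation, chosen for this sketch)
    op        -- binary predicate, e.g. operator.eq or operator.lt
    null_frac -- fraction of rows where the column is NULL
    """
    # Fraction of rows covered by the MCV list.
    mcv_total = sum(freq for _, freq in mcv)

    # Fraction of rows covered by MCV items for which (value op value) holds.
    mcv_match = sum(freq for value, freq in mcv if op(value, value))

    # Non-NULL rows that are not represented in the MCV list.
    rest = max(0.0, 1.0 - null_frac - mcv_total)

    # Assumption: rows outside the MCV list match at the same rate as
    # the rows inside it.
    match_rate = mcv_match / mcv_total if mcv_total > 0 else 0.0

    return mcv_match + rest * match_rate


# Made-up MCV list covering 80% of the rows, with 10% NULLs.
mcv = [(1, 0.4), (2, 0.3), (3, 0.1)]
sel_eq = var_op_var_selectivity(mcv, operator.eq, null_frac=0.1)
sel_lt = var_op_var_selectivity(mcv, operator.lt, null_frac=0.1)
```

With these made-up numbers, (a = a) comes out at roughly 0.9, that is 
all non-NULL rows, and (a < a) comes out at 0. The point is only that 
both sides of the operator are fed from the same MCV item, so no 
multi-column statistics are needed.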

It'd be rather strange if for example

     select * from t where (a < a)

is mis-estimated simply because it can't use extended statistics 
(there's just a single Var, so we won't consider extended stats), while

     select * from t where (a < a) and b = 1

suddenly gets much better thanks to extended stats on (a,b), even when 
(a,b) are perfectly independent.

So I think we'd better make eqsel/ineqsel smarter about estimating those 
clauses, assuming we consider them important enough.

I think we have two options. Either we reject the patch, which would 
mean we don't consider (Var op Var) clauses to be common/important 
enough, or we improve the existing selectivity functions (including 
those that work without extended statistics) to handle those clauses in 
a smarter way. Otherwise there'd be strange/surprising inconsistencies.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
