Re: using extended statistics to improve join estimates - Mailing list pgsql-hackers

From Andy Fan
Subject Re: using extended statistics to improve join estimates
Msg-id 87a5khi8as.fsf@163.com
In response to Re: using extended statistics to improve join estimates  (Andrei Lepikhov <a.lepikhov@postgrespro.ru>)
Responses Re: using extended statistics to improve join estimates
List pgsql-hackers
Andrei Lepikhov <a.lepikhov@postgrespro.ru> writes:

> On 20/5/2024 15:52, Andy Fan wrote:
>> Hi Andrei,
>> 
>>> On 4/3/24 01:22, Tomas Vondra wrote:
>>>> Cool! There's obviously no chance to get this into v18, and I have stuff
>>>> to do in this CF. But I'll take a look after that.
>>> I'm looking at your patch now - an excellent start to an eagerly awaited
>>> feature!
>>> A couple of questions:
>>> 1. I didn't find the implementation of strategy 'c' - estimation by the
>>> number of distinct values. Do you forget it?
>> What do you mean the "strategy 'c'"?
> As described in 0001-* patch:
> * c) No extended stats with MCV. If there are multiple join clauses,
> * we can try using ndistinct coefficients and do what eqjoinsel does.

OK, I didn't pay enough attention to this comment before. And yes, I reach
the same conclusion as you: there is no implementation of this.

If so, I think we should remove the comment and do the
implementation in the next patch.
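
For reference, here is a rough sketch of what strategy 'c' might compute,
assuming it follows the no-MCV branch of eqjoinsel and simply substitutes a
multi-column ndistinct estimate for the per-column one. The function name and
its parameters are hypothetical and not taken from the patch:

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of the patch): selectivity of a
 * multi-column equijoin computed from the multivariate ndistinct of the
 * join columns on each side. It mirrors the no-MCV branch of eqjoinsel:
 *     (1 - nullfrac1) * (1 - nullfrac2) / max(nd1, nd2)
 */
static double
ndistinct_join_selectivity(double nd1, double nullfrac1,
                           double nd2, double nullfrac2)
{
    double      maxnd = (nd1 > nd2) ? nd1 : nd2;

    assert(nullfrac1 >= 0.0 && nullfrac1 <= 1.0);
    assert(nullfrac2 >= 0.0 && nullfrac2 <= 1.0);

    /* Guard against a zero or bogus ndistinct estimate. */
    if (maxnd < 1.0)
        maxnd = 1.0;

    return (1.0 - nullfrac1) * (1.0 - nullfrac2) / maxnd;
}
```

With 100 distinct column combinations on one side, 400 on the other, and no
NULLs, this gives 1/400 = 0.0025.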

>>> 2. Can we add a clauselist selectivity hook into the core (something
>>> similar the code in attachment)? It can allow the development and
>>> testing of multicolumn join estimations without patching the core.
>> The idea LGTM. But do you want
>> +    if (clauselist_selectivity_hook)
>> +        s1 = clauselist_selectivity_hook(root, clauses, varRelid, jointype,
>> +
>> rather than
>> +    if (clauselist_selectivity_hook)
>> +        *return* clauselist_selectivity_hook(root, clauses, ..)
> Of course - library may estimate not all the clauses - it is a reason,
> why I added input/output parameter 'estimatedclauses' by analogy with
> statext_clauselist_selectivity.

OK.
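
To make sure we mean the same thing, here is a toy model of the "s1 = hook(...)"
variant. The types are simplified stand-ins (a plain bitmask instead of a
Bitmapset, a made-up Clause struct), and example_hook with its fixed 0.1
estimate is purely illustrative, so please read it as a sketch rather than
planner code:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a clause; both fields are illustrative. */
typedef struct Clause
{
    double      selectivity;        /* per-clause estimate */
    int         hook_can_estimate;  /* would the extension handle it? */
} Clause;

/* Hypothetical hook signature; the real proposal takes more arguments. */
typedef double (*clauselist_selectivity_hook_type) (const Clause *clauses,
                                                    int nclauses,
                                                    unsigned *estimatedclauses);

static clauselist_selectivity_hook_type clauselist_selectivity_hook = NULL;

static double
clauselist_selectivity(const Clause *clauses, int nclauses)
{
    double      s1 = 1.0;
    unsigned    estimatedclauses = 0;   /* bit i set: clause i is done */

    assert(nclauses >= 0 && nclauses <= 32);

    /* The hook estimates what it can and marks those clauses. */
    if (clauselist_selectivity_hook)
        s1 = clauselist_selectivity_hook(clauses, nclauses,
                                         &estimatedclauses);

    /* Core code still handles every clause the hook left alone. */
    for (int i = 0; i < nclauses; i++)
    {
        if (estimatedclauses & (1u << i))
            continue;
        s1 *= clauses[i].selectivity;
    }
    return s1;
}

/* Toy hook: claims the flagged clauses, returns a fixed joint estimate. */
static double
example_hook(const Clause *clauses, int nclauses,
             unsigned *estimatedclauses)
{
    int         claimed = 0;

    for (int i = 0; i < nclauses; i++)
    {
        if (clauses[i].hook_can_estimate)
        {
            *estimatedclauses |= (1u << i);
            claimed++;
        }
    }
    return (claimed > 0) ? 0.1 : 1.0;
}
```

With a plain "return hook(...)" the loop over the remaining clauses would be
skipped, so any clause the extension did not estimate would be silently
ignored.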

Do you think the hook proposal is closely connected with the current
topic? IIUC it seems not. So a dedicated thread that explains the problem
to solve, the proposal, and the following discussion should be helpful
for both topics. I'm just worried that mixing the two in one thread
would make things unnecessarily complex.

-- 
Best Regards
Andy Fan



