Re: planner chooses incremental but not the best one - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: planner chooses incremental but not the best one
Date
Msg-id 575d0bca-d235-4bb1-8901-3866e5f3c250@enterprisedb.com
Whole thread Raw
In response to Re: planner chooses incremental but not the best one  (Andrei Lepikhov <a.lepikhov@postgrespro.ru>)
List pgsql-hackers

On 2/15/24 13:45, Andrei Lepikhov wrote:
> On 15/2/2024 18:10, Tomas Vondra wrote:
>>
>>
>> On 2/15/24 07:50, Andrei Lepikhov wrote:
>>> On 18/12/2023 19:53, Tomas Vondra wrote:
>>>> On 12/18/23 11:40, Richard Guo wrote:
>>>> The challenge is where to get usable information about correlation
>>>> between columns. I only have a couple very rought ideas of what might
>>>> try. For example, if we have multi-column ndistinct statistics, we
>>>> might
>>>> look at ndistinct(b,c) and ndistinct(b,c,d) and deduce something from
>>>>
>>>>       ndistinct(b,c,d) / ndistinct(b,c)
>>>>
>>>> If we know how many distinct values we have for the predicate
>>>> column, we
>>>> could then estimate the number of groups. I mean, we know that for the
>>>> restriction "WHERE b = 3" we only have 1 distinct value, so we could
>>>> estimate the number of groups as
>>>>
>>>>       1 * ndistinct(b,c)
>>> Did you mean here ndistinct(c,d) and the formula:
>>> ndistinct(b,c,d) / ndistinct(c,d) ?
>>
>> Yes, I think that's probably a more correct ... Essentially, the idea is
>> to estimate the change in number of distinct groups after adding a
>> column (or restricting it in some way).
> Thanks, I got it. I just think how to implement such techniques with
> extensions just to test the idea in action. In the case of GROUP-BY we
> can use path hook, of course. But what if to invent a hook on clauselist
> estimation?

Maybe.

I have thought about introducing such hook to alter estimation of
clauses, so I'm not opposed to it. Ofc, it depends on where would the
hook be, what would it be allowed to do etc. And as it doesn't exist
yet, it'd be more a "local" improvement to separate the changes into an
extension.

>>> Do you implicitly bear in mind here the necessity of tracking clauses
>>> that were applied to the data up to the moment of grouping?
>>>
>>
>> I don't recall what exactly I considered two months ago when writing the
>> message, but I don't see why we would need to track that beyond what we
>> already have. Shouldn't it be enough for the grouping to simply inspect
>> the conditions on the lower levels?
> Yes, exactly. I've thought about looking into baserestrictinfos and, if
> group-by references a subquery targetlist, into subqueries too.
> 

True. Something like that.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: index prefetching
Next
From: Bertrand Drouvot
Date:
Subject: Re: Synchronizing slots from primary to standby