Re: ERROR: corrupt MVNDistinct entry - Mailing list pgsql-hackers

From Andrei Lepikhov
Subject Re: ERROR: corrupt MVNDistinct entry
Date
Msg-id 79660add-4238-4616-903e-ba1607f79f6e@gmail.com
Whole thread Raw
In response to Re: ERROR: corrupt MVNDistinct entry  (Richard Guo <guofenglinux@gmail.com>)
List pgsql-hackers
On 25/12/2024 16:36, Richard Guo wrote:
> On Wed, Dec 25, 2024 at 5:14 PM Richard Guo <guofenglinux@gmail.com> wrote:
>> On Wed, Dec 25, 2024 at 11:34 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
>>> 2. It is ok for Vars. But what about expressions? We use equal() in
>>> distinct, MCV and dependencies modules. Do we need to remove nulls
>>> before using extended statistics as a general rule?
>>
>> AFAIU, the expressions in extended statistics are not decorated with
>> any nullingrels bits, are they?
> 
> I've just realized that there may be other places with similar issues,
> not just in estimate_num_groups.  For instance,
I'm pleased to see that you've grasped my initially unclear idea. Yeah, 
it seems that all types of statistics may be lost because of varnullingrels.

> In v16 and later, the nullingrels within the expression "t2.a + t2.b"
> prevent it from being matched to the corresponding expression in
> extended statistics, forcing us to use DEFAULT_UNK_SEL(0.005).
> 
> It seems that we need to strip out the nullingrels bits from
> expressions before matching them to extended statistics or
> expressional index columns in more places.
I think Tomas Vondra may have a decisive opinion in this place: we have 
already discussed some approaches to calculate NULLs generated by RHS of 
Left Join shortly.
Maybe we can commit a quick cure like the one provided in your patch, 
but we should remember this example - it is not apparent to me how to 
estimate a group of clauses in the case when part of Vars has 
varnullingrels and part of them - doesn't.

Also, I think this is a good example that an explain analyse summary 
could have some sort of extended statistics usage report. It can help to 
clearly identify cases when extended statistics don't work, but should. 
- something like already implemented in SQL Server.

-- 
regards, Andrei Lepikhov



pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: adjust_limit_rows_costs algorithm
Next
From: "Vitaly Davydov"
Date:
Subject: Re: An improvement of ProcessTwoPhaseBuffer logic