On 25/12/2024 16:36, Richard Guo wrote:
> On Wed, Dec 25, 2024 at 5:14 PM Richard Guo <guofenglinux@gmail.com> wrote:
>> On Wed, Dec 25, 2024 at 11:34 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
>>> 2. It is ok for Vars. But what about expressions? We use equal() in
>>> distinct, MCV and dependencies modules. Do we need to remove nulls
>>> before using extended statistics as a general rule?
>>
>> AFAIU, the expressions in extended statistics are not decorated with
>> any nullingrels bits, are they?
>
> I've just realized that there may be other places with similar issues,
> not just in estimate_num_groups. For instance,
I'm pleased to see that you've grasped my initially unclear idea. Yeah,
it seems that all types of statistics may be lost because of varnullingrels.
> In v16 and later, the nullingrels within the expression "t2.a + t2.b"
> prevent it from being matched to the corresponding expression in
> extended statistics, forcing us to use DEFAULT_UNK_SEL(0.005).
>
> It seems that we need to strip out the nullingrels bits from
> expressions before matching them to extended statistics or
> expressional index columns in more places.
I think Tomas Vondra may have a decisive opinion in this place: we have
already discussed some approaches to calculate NULLs generated by RHS of
Left Join shortly.
Maybe we can commit a quick cure like the one provided in your patch,
but we should remember this example - it is not apparent to me how to
estimate a group of clauses in the case when part of Vars has
varnullingrels and part of them - doesn't.
Also, I think this is a good example that an explain analyse summary
could have some sort of extended statistics usage report. It can help to
clearly identify cases when extended statistics don't work, but should.
- something like already implemented in SQL Server.
--
regards, Andrei Lepikhov