Thread: ERROR: corrupt MVNDistinct entry

ERROR: corrupt MVNDistinct entry

From
Richard Guo
Date:
I ran into this error in estimate_multivariate_ndistinct, and it can
be reproduced with the query below.

create table t (a int, b int);
insert into t select 1, 1;
create statistics s (ndistinct) on a, b from t;
analyze;

explain select 1 from t t1
    left join (select a c1, coalesce(a) c2 from t t2) s on true
group by s.c1, s.c2;
ERROR:  corrupt MVNDistinct entry

And the first bad commit is:

2489d76c4906f4461a364ca8ad7e0751ead8aa0d is the first bad commit
commit 2489d76c4906f4461a364ca8ad7e0751ead8aa0d
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Mon Jan 30 13:16:20 2023 -0500

    Make Vars be outer-join-aware.

So in this query, there are two grouping expressions: s.c1 is Var t2.a
with nullingrels set to {3}; s.c2 is a PHV with nullingrels also being
{3}, and its contained expression is Var t2.a with empty nullingrels.

This eventually leads to estimate_num_groups creating two separate
GroupVarInfos for Var t2.a: one with nullingrels {3}, and another with
empty nullingrels.  As a result, estimate_multivariate_ndistinct
incorrectly assumes there are two matching expressions.  When it later
fails to find the exact match for the combination, it mistakenly
concludes that there is a corrupt MVNDistinct entry.

It seems to me that when estimating the number of groups, we do not
need to concern ourselves with the outer joins that could null the
Vars/PHVs contained in the grouping expressions, and we should not
count the same Var more than once.

So I wonder if we can fix this issue by removing the nullingrels
within the grouping expressions first in estimate_num_groups. Such as:

--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -121,6 +121,7 @@
 #include "parser/parse_clause.h"
 #include "parser/parse_relation.h"
 #include "parser/parsetree.h"
+#include "rewrite/rewriteManip.h"
 #include "statistics/statistics.h"
 #include "storage/bufmgr.h"
 #include "utils/acl.h"
@@ -3446,6 +3447,10 @@ estimate_num_groups(PlannerInfo *root, List
*groupExprs, double input_rows,
    if (groupExprs == NIL || (pgset && *pgset == NIL))
        return 1.0;

+   groupExprs = (List *) remove_nulling_relids((Node *) groupExprs,
+                                               root->outer_join_rels,
+                                               NULL);
+
    /*
     * Count groups derived from boolean grouping expressions.  For other
     * expressions, find the unique Vars used, treating an expression as a Var

Any thoughts?

Thanks
Richard



Re: ERROR: corrupt MVNDistinct entry

From
Andrei Lepikhov
Date:
On 12/24/24 15:00, Richard Guo wrote:
> Any thoughts?
I have a couple of notes.
1. The nulling_relids provides us sensible information about possible 
nulls inside the input. We are not using it to estimate the number of 
such nulls for now. Does Your idea consist of obtaining 'clear' 
statistics and reusing nulling_relids knowledge somewhere later?
2. It is ok for Vars. But what about expressions? We use equal() in 
distinct, MCV and dependencies modules. Do we need to remove nulls 
before using extended statistics as a general rule?

-- 
regards, Andrei Lepikhov



Re: ERROR: corrupt MVNDistinct entry

From
Richard Guo
Date:
On Wed, Dec 25, 2024 at 11:34 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
> I have a couple of notes.
> 1. The nulling_relids provides us sensible information about possible
> nulls inside the input. We are not using it to estimate the number of
> such nulls for now. Does Your idea consist of obtaining 'clear'
> statistics and reusing nulling_relids knowledge somewhere later?

Are you referring to the nullfrac estimates?  A RelOptInfo's
nulling_relids records all above outer joins that can null this rel.
However, I cannot see how it helps with nullfrac.

> 2. It is ok for Vars. But what about expressions? We use equal() in
> distinct, MCV and dependencies modules. Do we need to remove nulls
> before using extended statistics as a general rule?

AFAIU, the expressions in extended statistics are not decorated with
any nullingrels bits, are they?

Thanks
Richard



Re: ERROR: corrupt MVNDistinct entry

From
Richard Guo
Date:
On Wed, Dec 25, 2024 at 5:14 PM Richard Guo <guofenglinux@gmail.com> wrote:
> On Wed, Dec 25, 2024 at 11:34 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
> > 2. It is ok for Vars. But what about expressions? We use equal() in
> > distinct, MCV and dependencies modules. Do we need to remove nulls
> > before using extended statistics as a general rule?
>
> AFAIU, the expressions in extended statistics are not decorated with
> any nullingrels bits, are they?

I've just realized that there may be other places with similar issues,
not just in estimate_num_groups.  For instance,

-- after v16
explain (costs on)
select * from t t1 left join t t2 on true where (t2.a+t2.b) is null;
                             QUERY PLAN
--------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..15032.50 rows=5000 width=16)
   Filter: ((t2.a + t2.b) IS NULL)
   ->  Seq Scan on t t1  (cost=0.00..15.00 rows=1000 width=8)
   ->  Materialize  (cost=0.00..20.00 rows=1000 width=8)
         ->  Seq Scan on t t2  (cost=0.00..15.00 rows=1000 width=8)
(5 rows)

-- before v16
explain (costs on)
select * from t t1 left join t t2 on true where (t2.a+t2.b) is null;
                             QUERY PLAN
--------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..15032.50 rows=1 width=16)
   Filter: ((t2.a + t2.b) IS NULL)
   ->  Seq Scan on t t1  (cost=0.00..15.00 rows=1000 width=8)
   ->  Materialize  (cost=0.00..20.00 rows=1000 width=8)
         ->  Seq Scan on t t2  (cost=0.00..15.00 rows=1000 width=8)
(5 rows)

In v16 and later, the nullingrels within the expression "t2.a + t2.b"
prevent it from being matched to the corresponding expression in
extended statistics, forcing us to use DEFAULT_UNK_SEL(0.005).

It seems that we need to strip out the nullingrels bits from
expressions before matching them to extended statistics or
expressional index columns in more places.

Thanks
Richard



Re: ERROR: corrupt MVNDistinct entry

From
Andrei Lepikhov
Date:
On 25/12/2024 16:36, Richard Guo wrote:
> On Wed, Dec 25, 2024 at 5:14 PM Richard Guo <guofenglinux@gmail.com> wrote:
>> On Wed, Dec 25, 2024 at 11:34 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
>>> 2. It is ok for Vars. But what about expressions? We use equal() in
>>> distinct, MCV and dependencies modules. Do we need to remove nulls
>>> before using extended statistics as a general rule?
>>
>> AFAIU, the expressions in extended statistics are not decorated with
>> any nullingrels bits, are they?
> 
> I've just realized that there may be other places with similar issues,
> not just in estimate_num_groups.  For instance,
I'm pleased to see that you've grasped my initially unclear idea. Yeah, 
it seems that all types of statistics may be lost because of varnullingrels.

> In v16 and later, the nullingrels within the expression "t2.a + t2.b"
> prevent it from being matched to the corresponding expression in
> extended statistics, forcing us to use DEFAULT_UNK_SEL(0.005).
> 
> It seems that we need to strip out the nullingrels bits from
> expressions before matching them to extended statistics or
> expressional index columns in more places.
I think Tomas Vondra may have a decisive opinion in this place: we have 
already discussed some approaches to calculate NULLs generated by RHS of 
Left Join shortly.
Maybe we can commit a quick cure like the one provided in your patch, 
but we should remember this example - it is not apparent to me how to 
estimate a group of clauses in the case when part of Vars has 
varnullingrels and part of them - doesn't.

Also, I think this is a good example that an explain analyse summary 
could have some sort of extended statistics usage report. It can help to 
clearly identify cases when extended statistics don't work, but should. 
- something like already implemented in SQL Server.

-- 
regards, Andrei Lepikhov