ERROR: corrupt MVNDistinct entry - Mailing list pgsql-hackers

From Richard Guo
Subject ERROR: corrupt MVNDistinct entry
Date
Msg-id CAMbWs4-2Z4k+nFTiZe0Qbu5n8juUWenDAtMzi98bAZQtwHx0-w@mail.gmail.com
Whole thread Raw
Responses Re: ERROR: corrupt MVNDistinct entry
List pgsql-hackers
I ran into this error in estimate_multivariate_ndistinct, and it can
be reproduced with the query below.

create table t (a int, b int);
insert into t select 1, 1;
create statistics s (ndistinct) on a, b from t;
analyze;

explain select 1 from t t1
    left join (select a c1, coalesce(a) c2 from t t2) s on true
group by s.c1, s.c2;
ERROR:  corrupt MVNDistinct entry

And the first bad commit is:

2489d76c4906f4461a364ca8ad7e0751ead8aa0d is the first bad commit
commit 2489d76c4906f4461a364ca8ad7e0751ead8aa0d
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Mon Jan 30 13:16:20 2023 -0500

    Make Vars be outer-join-aware.

So in this query, there are two grouping expressions: s.c1 is Var t2.a
with nullingrels set to {3}; s.c2 is a PHV with nullingrels also being
{3}, and its contained expression is Var t2.a with empty nullingrels.

This eventually leads to estimate_num_groups creating two separate
GroupVarInfos for Var t2.a: one with nullingrels {3}, and another with
empty nullingrels.  As a result, estimate_multivariate_ndistinct
incorrectly assumes there are two matching expressions.  When it later
fails to find the exact match for the combination, it mistakenly
concludes that there is a corrupt MVNDistinct entry.

It seems to me that when estimating the number of groups, we do not
need to concern ourselves with the outer joins that could null the
Vars/PHVs contained in the grouping expressions, and we should not
count the same Var more than once.

So I wonder if we can fix this issue by removing the nullingrels
within the grouping expressions first in estimate_num_groups. Such as:

--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -121,6 +121,7 @@
 #include "parser/parse_clause.h"
 #include "parser/parse_relation.h"
 #include "parser/parsetree.h"
+#include "rewrite/rewriteManip.h"
 #include "statistics/statistics.h"
 #include "storage/bufmgr.h"
 #include "utils/acl.h"
@@ -3446,6 +3447,10 @@ estimate_num_groups(PlannerInfo *root, List
*groupExprs, double input_rows,
    if (groupExprs == NIL || (pgset && *pgset == NIL))
        return 1.0;

+   groupExprs = (List *) remove_nulling_relids((Node *) groupExprs,
+                                               root->outer_join_rels,
+                                               NULL);
+
    /*
     * Count groups derived from boolean grouping expressions.  For other
     * expressions, find the unique Vars used, treating an expression as a Var

Any thoughts?

Thanks
Richard



pgsql-hackers by date:

Previous
From: Yugo Nagata
Date:
Subject: Re: Allow ILIKE forward matching to use btree index
Next
From: wenhui qiu
Date:
Subject: Re: transaction lost when delete clog file after normal shutdown