I ran into this error in estimate_multivariate_ndistinct, and it can
be reproduced with the query below.
create table t (a int, b int);
insert into t select 1, 1;
create statistics s (ndistinct) on a, b from t;
analyze;
explain select 1 from t t1
left join (select a c1, coalesce(a) c2 from t t2) s on true
group by s.c1, s.c2;
ERROR: corrupt MVNDistinct entry
And the first bad commit is:
2489d76c4906f4461a364ca8ad7e0751ead8aa0d is the first bad commit
commit 2489d76c4906f4461a364ca8ad7e0751ead8aa0d
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Mon Jan 30 13:16:20 2023 -0500
Make Vars be outer-join-aware.
So in this query, there are two grouping expressions: s.c1 is Var t2.a
with nullingrels set to {3}; s.c2 is a PHV with nullingrels also being
{3}, and its contained expression is Var t2.a with empty nullingrels.
This eventually leads to estimate_num_groups creating two separate
GroupVarInfos for Var t2.a: one with nullingrels {3}, and another with
empty nullingrels. As a result, estimate_multivariate_ndistinct
incorrectly assumes there are two matching expressions. When it later
fails to find the exact match for the combination, it mistakenly
concludes that there is a corrupt MVNDistinct entry.
It seems to me that when estimating the number of groups, we do not
need to concern ourselves with the outer joins that could null the
Vars/PHVs contained in the grouping expressions, and we should not
count the same Var more than once.
So I wonder if we can fix this issue by removing the nullingrels
within the grouping expressions first in estimate_num_groups. Such as:
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -121,6 +121,7 @@
#include "parser/parse_clause.h"
#include "parser/parse_relation.h"
#include "parser/parsetree.h"
+#include "rewrite/rewriteManip.h"
#include "statistics/statistics.h"
#include "storage/bufmgr.h"
#include "utils/acl.h"
@@ -3446,6 +3447,10 @@ estimate_num_groups(PlannerInfo *root, List
*groupExprs, double input_rows,
if (groupExprs == NIL || (pgset && *pgset == NIL))
return 1.0;
+ groupExprs = (List *) remove_nulling_relids((Node *) groupExprs,
+ root->outer_join_rels,
+ NULL);
+
/*
* Count groups derived from boolean grouping expressions. For other
* expressions, find the unique Vars used, treating an expression as a Var
Any thoughts?
Thanks
Richard