Home > mailing lists

Re: ERROR: corrupt MVNDistinct entry - Mailing list pgsql-hackers

From	Richard Guo
Subject	Re: ERROR: corrupt MVNDistinct entry
Date	December 27, 2024 07:16:07
Msg-id	CAMbWs4_euoGhR+ZEYAFLduVw1U+Ww3LY7LRpCZPqdxKg1AYy3Q@mail.gmail.com Whole thread Raw
In response to	Re: ERROR: corrupt MVNDistinct entry (Richard Guo <guofenglinux@gmail.com>)
List	pgsql-hackers

Tree view

On Wed, Dec 25, 2024 at 6:36 PM Richard Guo <guofenglinux@gmail.com> wrote:
> In v16 and later, the nullingrels within the expression "t2.a + t2.b"
> prevent it from being matched to the corresponding expression in
> extended statistics, forcing us to use DEFAULT_UNK_SEL(0.005).

Furthermore, even without extended statistics or expressional index
columns, the nullingrels can still cause issues with selectivity
estimates.  They may cause examine_variable to fail in identifying a
valid RelOptInfo for an expression, making the relation size not
available, which is required in many cases.  For instance,

create table t (a int, b int);
insert into t select i, i from generate_series(1,10)i;
analyze t;

-- in v16 and later
explain (costs on)
select * from t t1
    left join t t2 on true
    left join t t3 on t1.a = t3.a and t3.a < 8
where t1.a = coalesce(t2.a);
                              QUERY PLAN
-----------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..4.94 rows=1 width=24)
   Join Filter: (t1.a = t3.a)
   ->  Nested Loop Left Join  (cost=0.00..3.73 rows=1 width=16)
         Filter: (t1.a = COALESCE(t2.a))
         ->  Seq Scan on t t1  (cost=0.00..1.10 rows=10 width=8)
         ->  Materialize  (cost=0.00..1.15 rows=10 width=8)
               ->  Seq Scan on t t2  (cost=0.00..1.10 rows=10 width=8)
   ->  Seq Scan on t t3  (cost=0.00..1.12 rows=7 width=8)
         Filter: (a < 8)
(9 rows)

-- in v15 and before
explain (costs on)
select * from t t1
    left join t t2 on true
    left join t t3 on t1.a = t3.a and t3.a < 8
where t1.a = coalesce(t2.a);
                              QUERY PLAN
-----------------------------------------------------------------------
 Hash Left Join  (cost=1.21..5.04 rows=10 width=24)
   Hash Cond: (t1.a = t3.a)
   ->  Nested Loop Left Join  (cost=0.00..3.73 rows=10 width=16)
         Filter: (t1.a = COALESCE(t2.a))
         ->  Seq Scan on t t1  (cost=0.00..1.10 rows=10 width=8)
         ->  Materialize  (cost=0.00..1.15 rows=10 width=8)
               ->  Seq Scan on t t2  (cost=0.00..1.10 rows=10 width=8)
   ->  Hash  (cost=1.12..1.12 rows=7 width=8)
         ->  Seq Scan on t t3  (cost=0.00..1.12 rows=7 width=8)
               Filter: (a < 8)
(10 rows)

In v16 and later, when calculating the join selectivity for "t1.a =
coalesce(t2.a)", eqjoinsel sets nd2 — the number of distinct values of
coalesce(t2.a) — to DEFAULT_NUM_DISTINCT (200) because the
corresponding RelOptInfo is not identifiable.  This results in very
inaccurate join selectivity.

I'm wondering if we also need to strip out the nullingrels from the
expression in examine_variable().  I tried doing so and noticed a plan
diff in regression test join.sql.

@@ -2573,10 +2573,11 @@
                ->  Materialize
                      ->  Seq Scan on int4_tbl t2
                            Filter: (f1 > 1)
-         ->  Seq Scan on int4_tbl t3
+         ->  Materialize
+               ->  Seq Scan on int4_tbl t3
    ->  Materialize
          ->  Seq Scan on int4_tbl t4
-(13 rows)
+(14 rows)

Thanks
Richard

pgsql-hackers by date:

From: vignesh C
Date: 27 December 2024, 07:11:00
Subject: Re: Logical replication timeout

From: Yurii Rashkovskii
Date: 27 December 2024, 08:10:48
Subject: Re: Add Postgres module info

Re: ERROR: corrupt MVNDistinct entry - Mailing list pgsql-hackers

Previous

Next