Re: ERROR: corrupt MVNDistinct entry - Mailing list pgsql-hackers
From | Richard Guo |
---|---|
Subject | Re: ERROR: corrupt MVNDistinct entry |
Date | |
Msg-id | CAMbWs4_euoGhR+ZEYAFLduVw1U+Ww3LY7LRpCZPqdxKg1AYy3Q@mail.gmail.com Whole thread Raw |
In response to | Re: ERROR: corrupt MVNDistinct entry (Richard Guo <guofenglinux@gmail.com>) |
List | pgsql-hackers |
On Wed, Dec 25, 2024 at 6:36 PM Richard Guo <guofenglinux@gmail.com> wrote: > In v16 and later, the nullingrels within the expression "t2.a + t2.b" > prevent it from being matched to the corresponding expression in > extended statistics, forcing us to use DEFAULT_UNK_SEL(0.005). Furthermore, even without extended statistics or expressional index columns, the nullingrels can still cause issues with selectivity estimates. They may cause examine_variable to fail in identifying a valid RelOptInfo for an expression, making the relation size not available, which is required in many cases. For instance, create table t (a int, b int); insert into t select i, i from generate_series(1,10)i; analyze t; -- in v16 and later explain (costs on) select * from t t1 left join t t2 on true left join t t3 on t1.a = t3.a and t3.a < 8 where t1.a = coalesce(t2.a); QUERY PLAN ----------------------------------------------------------------------- Nested Loop Left Join (cost=0.00..4.94 rows=1 width=24) Join Filter: (t1.a = t3.a) -> Nested Loop Left Join (cost=0.00..3.73 rows=1 width=16) Filter: (t1.a = COALESCE(t2.a)) -> Seq Scan on t t1 (cost=0.00..1.10 rows=10 width=8) -> Materialize (cost=0.00..1.15 rows=10 width=8) -> Seq Scan on t t2 (cost=0.00..1.10 rows=10 width=8) -> Seq Scan on t t3 (cost=0.00..1.12 rows=7 width=8) Filter: (a < 8) (9 rows) -- in v15 and before explain (costs on) select * from t t1 left join t t2 on true left join t t3 on t1.a = t3.a and t3.a < 8 where t1.a = coalesce(t2.a); QUERY PLAN ----------------------------------------------------------------------- Hash Left Join (cost=1.21..5.04 rows=10 width=24) Hash Cond: (t1.a = t3.a) -> Nested Loop Left Join (cost=0.00..3.73 rows=10 width=16) Filter: (t1.a = COALESCE(t2.a)) -> Seq Scan on t t1 (cost=0.00..1.10 rows=10 width=8) -> Materialize (cost=0.00..1.15 rows=10 width=8) -> Seq Scan on t t2 (cost=0.00..1.10 rows=10 width=8) -> Hash (cost=1.12..1.12 rows=7 width=8) -> Seq Scan on t t3 (cost=0.00..1.12 rows=7 width=8) Filter: (a < 8) (10 rows) In v16 and later, when calculating the join selectivity for "t1.a = coalesce(t2.a)", eqjoinsel sets nd2 — the number of distinct values of coalesce(t2.a) — to DEFAULT_NUM_DISTINCT (200) because the corresponding RelOptInfo is not identifiable. This results in very inaccurate join selectivity. I'm wondering if we also need to strip out the nullingrels from the expression in examine_variable(). I tried doing so and noticed a plan diff in regression test join.sql. @@ -2573,10 +2573,11 @@ -> Materialize -> Seq Scan on int4_tbl t2 Filter: (f1 > 1) - -> Seq Scan on int4_tbl t3 + -> Materialize + -> Seq Scan on int4_tbl t3 -> Materialize -> Seq Scan on int4_tbl t4 -(13 rows) +(14 rows) Thanks Richard
pgsql-hackers by date: