On Wed, Dec 25, 2024 at 5:14 PM Richard Guo <guofenglinux@gmail.com> wrote:
> On Wed, Dec 25, 2024 at 11:34 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
> > 2. It is ok for Vars. But what about expressions? We use equal() in
> > distinct, MCV and dependencies modules. Do we need to remove nulls
> > before using extended statistics as a general rule?
>
> AFAIU, the expressions in extended statistics are not decorated with
> any nullingrels bits, are they?
I've just realized that there may be other places with similar issues,
not just in estimate_num_groups. For instance,
-- after v16
explain (costs on)
select * from t t1 left join t t2 on true where (t2.a+t2.b) is null;
QUERY PLAN
--------------------------------------------------------------------
Nested Loop Left Join (cost=0.00..15032.50 rows=5000 width=16)
Filter: ((t2.a + t2.b) IS NULL)
-> Seq Scan on t t1 (cost=0.00..15.00 rows=1000 width=8)
-> Materialize (cost=0.00..20.00 rows=1000 width=8)
-> Seq Scan on t t2 (cost=0.00..15.00 rows=1000 width=8)
(5 rows)
-- before v16
explain (costs on)
select * from t t1 left join t t2 on true where (t2.a+t2.b) is null;
QUERY PLAN
--------------------------------------------------------------------
Nested Loop Left Join (cost=0.00..15032.50 rows=1 width=16)
Filter: ((t2.a + t2.b) IS NULL)
-> Seq Scan on t t1 (cost=0.00..15.00 rows=1000 width=8)
-> Materialize (cost=0.00..20.00 rows=1000 width=8)
-> Seq Scan on t t2 (cost=0.00..15.00 rows=1000 width=8)
(5 rows)
In v16 and later, the nullingrels within the expression "t2.a + t2.b"
prevent it from being matched to the corresponding expression in
extended statistics, forcing us to use DEFAULT_UNK_SEL(0.005).
It seems that we need to strip out the nullingrels bits from
expressions before matching them to extended statistics or
expressional index columns in more places.
Thanks
Richard