Re: ERROR: corrupt MVNDistinct entry - Mailing list pgsql-hackers

From Richard Guo
Subject Re: ERROR: corrupt MVNDistinct entry
Date
Msg-id CAMbWs4-0zvk_K-0eZp238fu9GNOjdZWZ6-0f2d21EQRegU_D9A@mail.gmail.com
In response to Re: ERROR: corrupt MVNDistinct entry  (Richard Guo <guofenglinux@gmail.com>)
Responses Re: ERROR: corrupt MVNDistinct entry
List pgsql-hackers
On Wed, Dec 25, 2024 at 5:14 PM Richard Guo <guofenglinux@gmail.com> wrote:
> On Wed, Dec 25, 2024 at 11:34 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
> > 2. It is ok for Vars. But what about expressions? We use equal() in
> > distinct, MCV and dependencies modules. Do we need to remove nulls
> > before using extended statistics as a general rule?
>
> AFAIU, the expressions in extended statistics are not decorated with
> any nullingrels bits, are they?

I've just realized that there may be other places with similar issues,
not just in estimate_num_groups.  For instance,

-- v16 and later
explain (costs on)
select * from t t1 left join t t2 on true where (t2.a+t2.b) is null;
                             QUERY PLAN
--------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..15032.50 rows=5000 width=16)
   Filter: ((t2.a + t2.b) IS NULL)
   ->  Seq Scan on t t1  (cost=0.00..15.00 rows=1000 width=8)
   ->  Materialize  (cost=0.00..20.00 rows=1000 width=8)
         ->  Seq Scan on t t2  (cost=0.00..15.00 rows=1000 width=8)
(5 rows)

-- before v16
explain (costs on)
select * from t t1 left join t t2 on true where (t2.a+t2.b) is null;
                             QUERY PLAN
--------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..15032.50 rows=1 width=16)
   Filter: ((t2.a + t2.b) IS NULL)
   ->  Seq Scan on t t1  (cost=0.00..15.00 rows=1000 width=8)
   ->  Materialize  (cost=0.00..20.00 rows=1000 width=8)
         ->  Seq Scan on t t2  (cost=0.00..15.00 rows=1000 width=8)
(5 rows)

In v16 and later, the nullingrels within the expression "t2.a + t2.b"
prevent it from being matched to the corresponding expression in
extended statistics, forcing us to use DEFAULT_UNK_SEL (0.005).  That
is where rows=5000 comes from: 1000 * 1000 * 0.005.

It seems that we need to strip out the nullingrels bits from
expressions before matching them to extended statistics or
expression index columns in more places than just
estimate_num_groups.

Thanks
Richard
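
Editorial sketch (not part of the original message): the mismatch
described above can be modelled with a toy in plain C.  Nothing here
is PostgreSQL source.  ToyVar, ToyAddExpr, the toy_* functions and
TOY_DEFAULT_UNK_SEL are hypothetical names invented for illustration,
and the stored null fraction passed in is an assumed value.  The only
point is that a strict structural comparison fails once the query-side
Vars carry nullingrels bits, and succeeds again after those bits are
stripped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fallback selectivity quoted in the message (0.005). */
#define TOY_DEFAULT_UNK_SEL 0.005

/* Toy stand-in for a planner Var; NOT PostgreSQL's real struct. */
typedef struct ToyVar
{
    int      varno;        /* which relation the column belongs to */
    int      varattno;     /* which column */
    uint32_t nullingrels;  /* bitmask of outer joins that may null it */
} ToyVar;

/* Toy stand-in for an expression such as "t2.a + t2.b". */
typedef struct ToyAddExpr
{
    ToyVar left;
    ToyVar right;
} ToyAddExpr;

/* Strict comparison: every field, including nullingrels, must match. */
static bool
toy_var_equal(const ToyVar *x, const ToyVar *y)
{
    return x->varno == y->varno &&
           x->varattno == y->varattno &&
           x->nullingrels == y->nullingrels;
}

static bool
toy_expr_equal(const ToyAddExpr *x, const ToyAddExpr *y)
{
    return toy_var_equal(&x->left, &y->left) &&
           toy_var_equal(&x->right, &y->right);
}

/* Return a copy of the expression with all nullingrels bits cleared. */
static ToyAddExpr
toy_strip_nullingrels(ToyAddExpr e)
{
    e.left.nullingrels = 0;
    e.right.nullingrels = 0;
    return e;
}

/*
 * Selectivity of "expr IS NULL": use the stored null fraction when the
 * clause matches the statistics expression, else fall back to the
 * default.  stats_nullfrac is an assumed input, not a real catalog read.
 */
static double
toy_isnull_selectivity(const ToyAddExpr *clause,
                       const ToyAddExpr *stats_expr,
                       double stats_nullfrac)
{
    assert(stats_nullfrac >= 0.0 && stats_nullfrac <= 1.0);

    if (toy_expr_equal(clause, stats_expr))
        return stats_nullfrac;
    return TOY_DEFAULT_UNK_SEL;
}
```

In this toy, a clause whose Vars have a nullingrels bit set falls back
to 0.005 against a statistics expression stored with no bits set.
Passing the clause through toy_strip_nullingrels() first makes the
comparison succeed, so the stored null fraction is used instead.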