Re: Wrapping a where clause to preserve rows with nulls - Mailing list pgsql-general

From Adrian Garcia Badaracco
Subject Re: Wrapping a where clause to preserve rows with nulls
Msg-id CAE8z92FTVnCfbS54F01st0QxeLMsgt1mcafnQAW94h-=6-sZ4g@mail.gmail.com
In response to Re: Wrapping a where clause to preserve rows with nulls  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Wrapping a where clause to preserve rows with nulls
List pgsql-general
Thank you for the great idea, Tom. While it's true that I can't modify the original WHERE clause, I do think I'll be able to introspect it (or get the system generating it to tell me which columns it references) and then add OR x IS NULL OR y IS NULL ...
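A minimal sketch of that plan follows. It assumes the list of referenced columns is already available and that each name is a trusted, valid SQL identifier. The function name wrap_where and the table stats with columns x and y are hypothetical. SQLite (through Python's sqlite3 module) stands in for Postgres here because it applies the same NULL semantics to these comparisons; it says nothing about index usage.

```python
import sqlite3


def wrap_where(original_clause: str, columns: list[str]) -> str:
    """Hypothetical helper: keep rows where any referenced column is NULL.

    Assumption: `columns` lists every column the original clause references,
    and each name is a trusted identifier (no quoting or escaping is done).
    """
    null_checks = " OR ".join(f"{col} IS NULL" for col in columns)
    if not null_checks:
        # Nothing to widen; just parenthesize the clause.
        return f"({original_clause})"
    return f"({original_clause}) OR {null_checks}"


# In-memory demo table with some NULL statistics (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stats (id INTEGER, x INTEGER, y INTEGER)")
conn.executemany(
    "INSERT INTO stats VALUES (?, ?, ?)",
    [(1, 10, 20), (2, 1, 2), (3, None, 20), (4, 10, None)],
)

original = "x > 5 AND y > 5"
plain = [r[0] for r in conn.execute(
    f"SELECT id FROM stats WHERE {original} ORDER BY id")]

wrapped_sql = wrap_where(original, ["x", "y"])
wrapped = [r[0] for r in conn.execute(
    f"SELECT id FROM stats WHERE {wrapped_sql} ORDER BY id")]

print(plain)    # only rows where the clause is TRUE
print(wrapped)  # also rows where x or y is NULL
```

With x > 5 AND y > 5, the plain clause returns only row 1, while the wrapped clause also returns rows 3 and 4, where x or y is NULL. This wrap is broader than "rows where the clause evaluates to NULL": a row with x = 1 and y = NULL makes the original clause FALSE, yet it would still be kept because y IS NULL.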

For context, in case it's interesting: I store Parquet statistics in a Postgres table and run the predicates produced by this DataFusion code against them: https://github.com/apache/datafusion/blob/f92442ea8e8944c78f8e40d6648d049ff8e335ec/datafusion/physical-optimizer/src/pruning.rs#L146-L456
That's why I can't really control the WHERE clause (at least not without reimplementing a lot of finicky, error-prone code).

On Wed, Dec 18, 2024 at 10:38 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Wednesday, December 18, 2024, Adrian Garcia Badaracco <
> adrian@adriangb.com> wrote:
>> Is there any way to include the rows where the predicate evaluates to null
>> while still using an index?

> ... A btree index, which handles =, can’t be told to behave
> differently and so cannot fulfill your desire to produce rows where the
> stored value is null; it can only produce those equal to 5000.

Not in a single scan, no.  But multiple scans are possible:

regression=# create table t (id int unique);
CREATE TABLE
regression=# explain select * from t where id = 5000 or id is null;
                                  QUERY PLAN                                 
------------------------------------------------------------------------------
 Bitmap Heap Scan on t  (cost=8.42..18.98 rows=14 width=4)
   Recheck Cond: ((id IS NULL) OR (id = 5000))
   ->  BitmapOr  (cost=8.42..8.42 rows=14 width=0)
         ->  Bitmap Index Scan on t_id_key  (cost=0.00..4.25 rows=13 width=0)
               Index Cond: (id IS NULL)
         ->  Bitmap Index Scan on t_id_key  (cost=0.00..4.16 rows=1 width=0)
               Index Cond: (id = 5000)
(7 rows)

The OP was quite unclear about what semantics he wants for
multiple-variable WHERE clauses, but maybe something like this
would work:

WHERE (original-clause) OR x IS NULL OR y IS NULL OR ...

where each variable mentioned in original-clause is allowed
to also be NULL.  Or perhaps what is wanted is

WHERE (original-clause) OR (x IS NULL AND y IS NULL AND ...)

??

                        regards, tom lane
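To make the difference between Tom's two variants concrete, here is a small sketch on hypothetical data (a table t with nullable columns x and y), again using Python's sqlite3 as a stand-in for Postgres. It only shows which rows each variant keeps; it does not model index usage, which is what the EXPLAIN output above demonstrates.

```python
import sqlite3

# Hypothetical sample data: two nullable columns referenced by the clause.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, x INTEGER, y INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 10, 20), (2, None, 20), (3, 10, None), (4, None, None), (5, 1, 2)],
)

original = "x > 5 AND y > 5"
variants = {
    "original": original,
    # Tom's first form: any referenced column may be NULL.
    "any_null": f"({original}) OR x IS NULL OR y IS NULL",
    # Tom's second form: all referenced columns must be NULL.
    "all_null": f"({original}) OR (x IS NULL AND y IS NULL)",
}

# Collect the ids each WHERE clause keeps.
results = {
    name: [row[0] for row in conn.execute(
        f"SELECT id FROM t WHERE {where} ORDER BY id")]
    for name, where in variants.items()
}

for name, ids in results.items():
    print(name, ids)
```

The any-null form keeps every row where at least one referenced column is NULL (rows 2, 3 and 4, plus row 1, which matches outright). The all-null form keeps only row 4 besides row 1; rows 2 and 3, for which the original clause evaluates to NULL, are dropped.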
