Regression: partial index with IS NOT NULL predicate not used for min/max optimization on NOT NULL columns - Mailing list pgsql-bugs

From Dmytro Astapov
Subject Regression: partial index with IS NOT NULL predicate not used for min/max optimization on NOT NULL columns
Date
Msg-id CAFQUnFhmAi2hkgY=eEwokriYwZngG+2eeZY=BnBw+m=drDCAVg@mail.gmail.com
Whole thread Raw
Responses Re: Regression: partial index with IS NOT NULL predicate not used for min/max optimization on NOT NULL columns
List pgsql-bugs
Hi!

Merry Christmas, happy New Year, and various seasonal greetings to you all. I come bearing the bug report for a regression, and hopefully a fix as well.

Short summary:

When a column is defined as "NOT NULL" and a partial index exists with a predicate "WHERE column IS NOT NULL", the min/max aggregate optimization fails to use the partial index. This is a regression introduced in PostgreSQL 17.

Environment and PostgreSQL Versions:

Affected versions: 17.7, 18.1
Not affected versions: 16.11, 15.15, 14.20, 13.23

OS: Red Hat 11.5.0-5

Steps to Reproduce

CREATE TABLE test (seqno int NOT NULL);

INSERT INTO test SELECT s FROM generate_series(1, 100000) s;

CREATE UNIQUE INDEX ON test(seqno) WHERE seqno IS NOT NULL;

ANALYZE test;

EXPLAIN SELECT max(seqno) FROM test;

Expected behavior (observed on PostgreSQL 16.11 and earlier)

                                      QUERY PLAN                                       

---------------------------------------------------------------------------------------

Result (cost=0.32..0.33 rows=1 width=4)

  InitPlan 1

    -> Limit (cost=0.29..0.32 rows=1 width=4)

          -> Index Only Scan Backward using test_seqno_idx on test (cost=...)

Actual behavior (observed on PostgreSQL 17.7 and later)

                            QUERY PLAN                            

------------------------------------------------------------------

Aggregate (cost=1693.00..1693.01 rows=1 width=4)

  -> Seq Scan on test (cost=0.00..1443.00 rows=100000 width=4)


I've tried to find the root cause, and this is my (possibly incorrect) analysis

The regression was introduced by commit b262ad440ed ("Add better handling of redundant IS [NOT] NULL quals", 2024-01-23).

This commit added an optimization to remove "IS NOT NULL" restriction clauses when the column is defined as "NOT NULL", since such clauses are always true.
This optimization was added to fix Bug #17540 related to poor index choice in min/max queries.

However, this optimization has an unintended side effect on partial index matching:

1. The min/max optimization (in "planagg.c") rewrites "SELECT max(col)" into a subquery with the clause "WHERE col IS NOT NULL" added to handle NULL values correctly.

2. The new optimization in `add_base_clause_to_rel()` recognizes that the added clause "col IS NOT NULL" is always true (since "col" is "NOT NULL" according to the table schema) and discards the clause before adding it to "baserestrictinfo".

3. Later, "check_index_predicates()" attempts to prove that the partial index predicate ("WHERE seqno IS NOT NULL") is implied by the query's clauses in "baserestrictinfo".

4. Since the "IS NOT NULL" clause was discarded, the partial index predicate cannot be proven, and the index is not marked as usable ("predOK" remains false).

Suggested fix

I think that when clause in the query matches a predicate in the partial index, it should not be discarded.

Then fix could be to modify `add_base_clause_to_rel()` to retain `IS NOT NULL` clauses when they match a partial index predicate on the same column. This preserves the intended fix for Bug #17540 while restoring partial index usability.

I am attaching the patch against REL_18_1 which attempts to do this.

Testing

- The attached patch resolves the reported issue
- All 228 standard regression tests pass


--

Dmytro Astapov
Attachment

pgsql-bugs by date:

Previous
From: Tender Wang
Date:
Subject: Re: BUG #19355: Attempt to insert data unexpectedly during concurrent update
Next
From: Tom Lane
Date:
Subject: Re: Regression: partial index with IS NOT NULL predicate not used for min/max optimization on NOT NULL columns