Home > mailing lists

Query plan for "id IS NULL" on PK - Mailing list pgsql-general

From	Ben Chrobot
Subject	Query plan for "id IS NULL" on PK
Date	February 14, 2023 22:04:51
Msg-id	CALRaRxzGqDbry9r+0KY5WhpY1xB=o3v3ywrQy2BRUYqZeY5fdQ@mail.gmail.com Whole thread
Responses	Re: Query plan for "id IS NULL" on PK Re: Query plan for "id IS NULL" on PK
List	pgsql-general

Tree view

Hello,

Long time listener, first time caller.

We have a large table (~470 million rows) with integer primary key id (not null) on a Postgres 14.5 cluster. A third-party tool is attempting to perform a SELECT-based full table copy in preparation for log-based sync with a query like the following:

SELECT "id", "other_column_a", "other_column_b", "created_at", "updated_at"
FROM "public"."my_large_table"
WHERE (("id" > ? OR "id" IS NULL)) AND (("id" <= ?))
ORDER BY "id" LIMIT 50000;

The lower bound increments by batch size (50k) while the upper bound is always the `max(id)`, in our case around 575,000,000.

The query plan produced is very slow as the index condition does basically nothing:

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.57..21901.46 rows=50000 width=417) (actual time=1708920.675..1709198.995 rows=50000 loops=1)
-> Index Scan using my_large_table_pkey on my_large_table (cost=0.57..135230792.97 rows=308733624 width=417) (actual time=1708920.673..1709195.926 rows=50000 loops=1)
Index Cond: (id <= 575187488)
Filter: ((id > 193208795) OR (id IS NULL))
Rows Removed by Filter: 157784540
Planning Time: 0.186 ms
Execution Time: 1709200.618 ms
(7 rows)
Time: 1709231.721 ms (28:29.232)

We cannot change the query being executed. Is there any way we can make the query planner ignore `OR (id IS NULL)` (as that will never be the case for the PK) and use both `id` clauses in the index condition?

We have provided the vendor with the same query without the `id is null` showing that it's significantly faster (see below). They have informed us that addressing the null check on a not null PK is on their roadmap to address but no timeline.

explain analyze
SELECT "id", "other_column_a", "other_column_b", "created_at", "updated_at"
FROM "public"."my_large_table"
WHERE (("id" > ?)) AND (("id" <= ?))
ORDER BY "id" LIMIT 50000;

QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.57..13930.19 rows=50000 width=416) (actual time=2.118..400.937 rows=50000 loops=1)
-> Index Scan using my_large_table_pkey on my_large_table (cost=0.57..85429173.84 rows=306645829 width=416) (actual time=2.117..398.325 rows=50000 loops=1)
Index Cond: ((id > 193208795) AND (id <= 575187488))
Planning Time: 0.166 ms
Execution Time: 402.376 ms

We have tried leading the planner to water with this view but it did not change the slow query plan:

create view my_fast_large_table as
select *
from my_large_table
where id is not null;

Any other tricks to try here?

Thank you,

Ben Chrobot

pgsql-general by date:

From: Rob Sargent
Date: 14 February 2023, 20:49:00
Subject: pro services list

From: Rob Sargent
Date: 14 February 2023, 22:25:32
Subject: Re: Query plan for "id IS NULL" on PK

Query plan for "id IS NULL" on PK - Mailing list pgsql-general

Previous

Next