Re: Query plan for "id IS NULL" on PK - Mailing list pgsql-general
From | Rob Sargent |
---|---|
Subject | Re: Query plan for "id IS NULL" on PK |
Date | |
Msg-id | 8e622f19-bc95-eac3-188d-b03734f547f6@gmail.com Whole thread Raw |
In response to | Query plan for "id IS NULL" on PK (Ben Chrobot <bchrobot@politicsrewired.com>) |
Responses |
Re: Query plan for "id IS NULL" on PK
|
List | pgsql-general |
On 2/14/23 15:04, Ben Chrobot wrote: > Hello, > > Long time listener, first time caller. > > We have a large table (~470 million rows) with integer primary key id > (not null) on a Postgres 14.5 cluster. A third-party tool is > attempting to perform a SELECT-based full table copy in preparation > for log-based sync with a query like the following: > > SELECT "id", "other_column_a", "other_column_b", "created_at", > "updated_at" > FROM "public"."my_large_table" > WHERE (("id" > ? OR "id" IS NULL)) AND (("id" <= ?)) > ORDER BY "id" LIMIT 50000; > > The lower bound increments by batch size (50k) while the upper bound > is always the `max(id)`, in our case around 575,000,000. > > The query plan produced is very slow as the index condition does > basically nothing: > > QUERY PLAN > -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > Limit (cost=0.57..21901.46 rows=50000 width=417) (actual > time=1708920.675..1709198.995 rows=50000 loops=1) > -> Index Scan using my_large_table_pkey on > my_large_table (cost=0.57..135230792.97 rows=308733624 width=417) > (actual time=1708920.673..1709195.926 rows=50000 loops=1) > Index Cond: (id <= 575187488) > Filter: ((id > 193208795) OR (id IS NULL)) > Rows Removed by Filter: 157784540 > Planning Time: 0.186 ms > Execution Time: 1709200.618 ms > (7 rows) > Time: 1709231.721 ms (28:29.232) > > We cannot change the query being executed. Is there any way we can > make the query planner ignore `OR (id IS NULL)` (as that will never be > the case for the PK) and use both `id` clauses in the index condition? > > We have provided the vendor with the same query without the `id is > null` showing that it's significantly faster (see below). They have > informed us that addressing the null check on a not null PK is on > their roadmap to address but no timeline. > > explain analyze > SELECT "id", "other_column_a", "other_column_b", "created_at", > "updated_at" > FROM "public"."my_large_table" > WHERE (("id" > ?)) AND (("id" <= ?)) > ORDER BY "id" LIMIT 50000; > > QUERY PLAN > --------------------------------------------------------------------------------------------------------------------------------------------------------------------- > Limit (cost=0.57..13930.19 rows=50000 width=416) (actual > time=2.118..400.937 rows=50000 loops=1) > -> Index Scan using my_large_table_pkey on my_large_table > (cost=0.57..85429173.84 rows=306645829 width=416) (actual > time=2.117..398.325 rows=50000 loops=1) > Index Cond: ((id > 193208795) AND (id <= 575187488)) > Planning Time: 0.166 ms > Execution Time: 402.376 ms > > We have tried leading the planner to water with this view but it did > not change the slow query plan: > > create view my_fast_large_table as > select * > from my_large_table > where id is not null; > > Any other tricks to try here? > > Thank you, > Ben Chrobot When will id be null in a primary key?
pgsql-general by date: