Re: Query plan for "id IS NULL" on PK - Mailing list pgsql-general

From Rob Sargent
Subject Re: Query plan for "id IS NULL" on PK
Date
Msg-id 8e622f19-bc95-eac3-188d-b03734f547f6@gmail.com
Whole thread Raw
In response to Query plan for "id IS NULL" on PK  (Ben Chrobot <bchrobot@politicsrewired.com>)
Responses Re: Query plan for "id IS NULL" on PK  ("David G. Johnston" <david.g.johnston@gmail.com>)
List pgsql-general
On 2/14/23 15:04, Ben Chrobot wrote:
> Hello,
>
> Long time listener, first time caller.
>
> We have a large table (~470 million rows) with integer primary key id 
> (not null) on a Postgres 14.5 cluster. A third-party tool is 
> attempting to perform a SELECT-based full table copy in preparation 
> for log-based sync with a query like the following:
>
> SELECT "id", "other_column_a", "other_column_b", "created_at", 
> "updated_at"
> FROM "public"."my_large_table"
> WHERE (("id" > ? OR "id" IS NULL)) AND (("id" <= ?))
> ORDER  BY "id" LIMIT 50000;
>
> The lower bound increments by batch size (50k) while the upper bound 
> is always the `max(id)`, in our case around 575,000,000.
>
> The query plan produced is very slow as the index condition does 
> basically nothing:
>
>                         QUERY PLAN
>
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.57..21901.46 rows=50000 width=417) (actual 
> time=1708920.675..1709198.995 rows=50000 loops=1)
>    ->  Index Scan using my_large_table_pkey on 
> my_large_table  (cost=0.57..135230792.97 rows=308733624 width=417) 
> (actual time=1708920.673..1709195.926 rows=50000 loops=1)
>          Index Cond: (id <= 575187488)
>          Filter: ((id > 193208795) OR (id IS NULL))
>          Rows Removed by Filter: 157784540
>  Planning Time: 0.186 ms
>  Execution Time: 1709200.618 ms
> (7 rows)
> Time: 1709231.721 ms (28:29.232)
>
> We cannot change the query being executed. Is there any way we can 
> make the query planner ignore `OR (id IS NULL)` (as that will never be 
> the case for the PK) and use both `id` clauses in the index condition?
>
> We have provided the vendor with the same query without the `id is 
> null` showing that it's significantly faster (see below). They have 
> informed us that addressing the null check on a not null PK is on 
> their roadmap to address but no timeline.
>
> explain analyze
> SELECT "id", "other_column_a", "other_column_b", "created_at", 
> "updated_at"
> FROM "public"."my_large_table"
> WHERE (("id" > ?)) AND (("id" <= ?))
> ORDER  BY "id" LIMIT 50000;
>
> QUERY PLAN
>
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=0.57..13930.19 rows=50000 width=416) (actual 
> time=2.118..400.937 rows=50000 loops=1)
> -> Index Scan using my_large_table_pkey on my_large_table 
> (cost=0.57..85429173.84 rows=306645829 width=416) (actual 
> time=2.117..398.325 rows=50000 loops=1)
> Index Cond: ((id > 193208795) AND (id <= 575187488))
> Planning Time: 0.166 ms
> Execution Time: 402.376 ms
>
> We have tried leading the planner to water with this view but it did 
> not change the slow query plan:
>
> create view my_fast_large_table as
> select *
> from my_large_table
> where id is not null;
>
> Any other tricks to try here?
>
> Thank you,
> Ben Chrobot

When will id be null in a primary key?





pgsql-general by date:

Previous
From: Ben Chrobot
Date:
Subject: Query plan for "id IS NULL" on PK
Next
From: "David G. Johnston"
Date:
Subject: Re: Query plan for "id IS NULL" on PK