Query planner issue with preferring primary key over a better index when using ORDER BY and LIMIT - Mailing list pgsql-general

From Alan Hodgson
Subject Query planner issue with preferring primary key over a better index when using ORDER BY and LIMIT
Date
Msg-id bc47e9d0c490942eebb01dc1129097012005e1d2.camel@lists.simkin.ca
Whole thread Raw
Responses Re: Query planner issue with preferring primary key over a better index when using ORDER BY and LIMIT  (Michael Lewis <mlewis@entrata.com>)
Re: Query planner issue with preferring primary key over a better index when using ORDER BY and LIMIT  (Rob Sargent <robjsargent@gmail.com>)
Re: Query planner issue with preferring primary key over a better index when using ORDER BY and LIMIT  (Francisco Olarte <folarte@peoplecall.com>)
List pgsql-general
I keep running into problems like these:

Devs are using an ORM. It really likes to produce queries like:

SELECT "shipment_import_records".* FROM "shipment_import_records" WHERE shipment_import_records"."shipment_import_id" = 5090609 ORDER BY "shipment_import_records"."id" ASC LIMIT 1;

I don't know why they do this. Usually it's more like 50 for pagination which make more sense. But for whatever reason this keeps coming up.

The table has nearly 29 million records. 5069 of them match shipment_import_id = 5090609. There is an index on shipment_import_id, which the planner happily uses without the LIMIT specifically. Yet with it the query planner will always do something like:

# explain SELECT "shipment_import_records".* FROM "shipment_import_records" WHERE "shipment_import_records"."shipment_import_id" = 5090609 ORDER BY "shipment_import_records"."id" ASC LIMIT 1;         
                                                        QUERY PLAN                                                           
-----------------------------------------------------------------------------------------------------------------------------
Limit  (cost=0.44..873.35 rows=1 width=243)
  ->  Index Scan using shipment_import_records_pkey on shipment_import_records  (cost=0.44..5122227.70 rows=5868 width=243)
        Filter: (shipment_import_id = 5090609)

.. which takes minutes.

I know I can work around this. Generally I would just drop the index on shipment_import_id and create one on shipment_import_id,id. Or if I can get the devs to wrap their query in an inner select with a fake offset to fool the query planner that works too. But both seem hacky.

Just wondering if there's a knob I can turn to make these more likely to work without constantly implementing workarounds?

Thanks for any help.

pgsql-general by date:

Previous
From: Avi Weinberg
Date:
Subject: Are Foreign Key Disabled During Logical Replication Initial Sync?
Next
From: Michael Lewis
Date:
Subject: Re: Query planner issue with preferring primary key over a better index when using ORDER BY and LIMIT