Home > mailing lists

Re: Query planner issue with preferring primary key over a better index when using ORDER BY and LIMIT - Mailing list pgsql-general

From	Pavel Stehule
Subject	Re: Query planner issue with preferring primary key over a better index when using ORDER BY and LIMIT
Date	December 6, 2021 18:22:15
Msg-id	CAFj8pRDsmF2PVKOQK47EhOfhSPrKuRoUSfb=cwJJSM+NubbUMg@mail.gmail.com Whole thread
In response to	Re: Query planner issue with preferring primary key over a better index when using ORDER BY and LIMIT (Francisco Olarte <folarte@peoplecall.com>)
Responses	Re: Query planner issue with preferring primary key over a better index when using ORDER BY and LIMIT
List	pgsql-general

Tree view

po 6. 12. 2021 v 18:21 odesílatel Francisco Olarte <folarte@peoplecall.com> napsal:

On Mon, 6 Dec 2021 at 18:03, Alan Hodgson <ahodgson@lists.simkin.ca> wrote:
...
> The table has nearly 29 million records. 5069 of them match shipment_import_id = 5090609. There is an index on shipment_import_id, which the planner happily uses without the LIMIT specifically. Yet with it the query planner will always do something like:
>
> # explain SELECT "shipment_import_records".* FROM "shipment_import_records" WHERE "shipment_import_records"."shipment_import_id" = 5090609 ORDER BY "shipment_import_records"."id" ASC LIMIT 1;
> QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------
> Limit (cost=0.44..873.35 rows=1 width=243)
> -> Index Scan using shipment_import_records_pkey on shipment_import_records (cost=0.44..5122227.70 rows=5868 width=243)
> Filter: (shipment_import_id = 5090609)
> .. which takes minutes.

Can you post an explain analyze? To me it seems like the planner
thinks shipment_import_id is randomly distributed and the table is
well correlated with it's PK, so scanning it for the first id should
be fast.

But from the names of the field you may have correlation between
shipment_import_id and id hidden somewhere ( like they are two serial
growing together, you query for the latest shipment ids and it scans
all the table ). An explain analyze should show that ( or three, one
for that shipment import id, one for 1, one for a really big one )

> Just wondering if there's a knob I can turn to make these more likely to work without constantly implementing workarounds?

You may try a composite index.

+1 These issues can be solved by composite indexes. The low limit clause deforms costs and when the data are not really random, then index scan can be too long.

Francisco Olarte.

pgsql-general by date:

From: Adrian Klaver
Date: 06 December 2021, 18:07:57
Subject: Re: Query planner issue with preferring primary key over a better index when using ORDER BY and LIMIT

From: "Phil Endecott"
Date: 06 December 2021, 18:48:47
Subject: Advice on using materialized views

Re: Query planner issue with preferring primary key over a better index when using ORDER BY and LIMIT - Mailing list pgsql-general

Previous

Next