Re: Query planner issue with preferring primary key over a better index when using ORDER BY and LIMIT - Mailing list pgsql-general

From Alan Hodgson
Subject Re: Query planner issue with preferring primary key over a better index when using ORDER BY and LIMIT
Date
Msg-id ac6b2c9214684d7645554b0673d4a7148c0a52e9.camel@lists.simkin.ca
Whole thread Raw
In response to Re: Query planner issue with preferring primary key over a better index when using ORDER BY and LIMIT  (Francisco Olarte <folarte@peoplecall.com>)
Responses Re: Query planner issue with preferring primary key over a better index when using ORDER BY and LIMIT  (Francisco Olarte <folarte@peoplecall.com>)
List pgsql-general
On Mon, 2021-12-06 at 18:20 +0100, Francisco Olarte wrote:

Can you post an explain analyze? To me it seems like the planner
thinks shipment_import_id is randomly distributed and the table is
well correlated with it's PK, so scanning it for the first id should
be fast.

   #explain analyze SELECT "shipment_import_records".* FROM 
   shipment_import_records" WHERE 
   shipment_import_records"."shipment_import_id" = 5090609 ORDER BY
   "shipment_import_records"."id" ASC LIMIT 1;
   ---------------------------------------------------------------------
   ---------------------------------------------------------------------
   -------------------------------------
    Limit  (cost=0.44..873.08 rows=1 width=243) (actual
   time=31689.725..31689.726 rows=1 loops=1)
      ->  Index Scan using shipment_import_records_pkey on
   shipment_import_records  (cost=0.44..5122405.71 rows=5870 width=243)
   (actual time=31689.723..31689.724 rows=1 loops=1)
            Filter: (shipment_import_id = 5090609)
            Rows Removed by Filter: 28710802
    Planning Time: 0.994 ms
    Execution Time: 31689.744 ms
   (6 rows)


      The biggest one (but yes "earlier"):

# explain analyze SELECT "shipment_import_records".* FROM "shipment_import_records" WHERE "shipment_import_records"."shipment_import_id" = 1247888 ORDER BY
"shipment_import_records"."id" ASC LIMIT 1; 
                                                                                  QUERY PLAN                                                                          
        
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------
Limit  (cost=0.44..426.59 rows=1 width=243) (actual time=8007.069..8007.070 rows=1 loops=1)
  ->  Index Scan using shipment_import_records_pkey on shipment_import_records  (cost=0.44..5126628.40 rows=12030 width=243) (actual time=8007.068..8007.068 rows=1 l
oops=1)
        Filter: (shipment_import_id = 1247888)
        Rows Removed by Filter: 10929193
Planning Time: 0.584 ms
Execution Time: 8007.086 ms
(6 rows)


And the smallest/latest, which actually uses the "right" index:

 # explain analyze SELECT "shipment_import_records".* FROM "shipment_import_records" WHERE "shipment_import_records"."shipment_import_id" = 5116174 ORDER BY 
"shipment_import_records"."id" ASC LIMIT 1;      
                                                                                  QUERY PLAN                                                                  
                        
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------
Limit  (cost=145.44..145.44 rows=1 width=243) (actual time=0.018..0.018 rows=1 loops=1)
  ->  Sort  (cost=145.44..145.64 rows=79 width=243) (actual time=0.017..0.018 rows=1 loops=1)
        Sort Key: id
        Sort Method: quicksort  Memory: 26kB
        ->  Index Scan using index_shipment_import_records_on_shipment_import_id on shipment_import_records  (cost=0.44..145.05 rows=79 width=243) (actual time=0.013
..0.014 rows=1 loops=1)
              Index Cond: (shipment_import_id = 5116174)
Planning Time: 0.104 ms
Execution Time: 0.032 ms
(8 rows)


But from the names of the field you may have correlation between
shipment_import_id and id hidden somewhere ( like they are two serial
growing together, you query for the latest shipment ids and it scans
all the table ). An explain analyze should show that ( or three, one
for that shipment import id, one for 1, one for a really big one )


This is definitely the case. And we are generally looking for newer data for most operations.

Thanks for looking at it.

pgsql-general by date:

Previous
From: Alan Hodgson
Date:
Subject: Re: Query planner issue with preferring primary key over a better index when using ORDER BY and LIMIT
Next
From: Alan Hodgson
Date:
Subject: Re: Query planner issue with preferring primary key over a better index when using ORDER BY and LIMIT