On Mon, 2021-12-06 at 10:18 -0700, Michael Lewis wrote:
> What is your default_statistics_target and how accurate is that
> estimate of 5668 rows? What is random_page_cost set to by the way?
>
>
>
default_statistics_target = 1000
random_page_cost = 2.0 (it's on AWS on a 9000 iops gp2 volume)
PostgreSQL 13.5, by the way.
The estimate was reasonably accurate, there were 5069 actual rows
matching.
> More importantly, what is the better plan that you'd like the planner
> to use with your existing indexes?
Well, it takes a few ms to grab all ~5000 rows by shipment_import_id
and then sort/limit them. The plan it chooses instead takes 30
seconds, and that is when the table is mostly cached already; it is
more like 4-5 minutes otherwise.
# explain analyze SELECT "shipment_import_records".*
    FROM "shipment_import_records"
    WHERE "shipment_import_records"."shipment_import_id" = 5090609
    ORDER BY "shipment_import_records"."id" ASC LIMIT 1;
                                QUERY PLAN
------------------------------------------------------------------------
 Limit  (cost=0.44..873.08 rows=1 width=243) (actual time=31689.725..31689.726 rows=1 loops=1)
   ->  Index Scan using shipment_import_records_pkey on shipment_import_records  (cost=0.44..5122405.71 rows=5870 width=243) (actual time=31689.723..31689.724 rows=1 loops=1)
         Filter: (shipment_import_id = 5090609)
         Rows Removed by Filter: 28710802
 Planning Time: 0.994 ms
 Execution Time: 31689.744 ms
(6 rows)
And the same query with a kludge (the OFFSET 0 subquery fence) to
force the better index:
# explain analyze SELECT * FROM
    (SELECT "shipment_import_records".*
     FROM "shipment_import_records"
     WHERE "shipment_import_records"."shipment_import_id" = 5090609
     OFFSET 0) AS x
    ORDER BY "id" ASC LIMIT 1;
                                QUERY PLAN
------------------------------------------------------------------------
 Limit  (cost=10655.34..10655.34 rows=1 width=243) (actual time=4.868..4.869 rows=1 loops=1)
   ->  Sort  (cost=10655.34..10670.02 rows=5870 width=243) (actual time=4.867..4.868 rows=1 loops=1)
         Sort Key: shipment_import_records.id
         Sort Method: top-N heapsort  Memory: 27kB
         ->  Index Scan using index_shipment_import_records_on_shipment_import_id on shipment_import_records  (cost=0.44..10567.29 rows=5870 width=243) (actual time=0.037..3.560 rows=5069 loops=1)
               Index Cond: (shipment_import_id = 5090609)
 Planning Time: 0.135 ms
 Execution Time: 4.885 ms
(8 rows)
>
> Certainly a composite index would be very helpful here. Using explain
> analyze and sharing the output would give more info to go on.
>
Yeah, I am going to just add the composite index for now, but I was
hoping for a more generic option.
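For the record, the composite index I have in mind would look
something like this (the index name is mine, and this assumes plain
b-tree defaults):

```sql
-- A composite index on (shipment_import_id, id) lets the planner
-- satisfy both the WHERE filter and the ORDER BY id LIMIT 1 from one
-- index: it seeks to shipment_import_id = ? and reads entries already
-- ordered by id, stopping after the first match.
CREATE INDEX CONCURRENTLY shipment_import_records_import_id_id_idx
    ON shipment_import_records (shipment_import_id, id);
```

CONCURRENTLY avoids holding a lock that blocks writes during the
build, which matters on a ~28M-row table.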
Thanks for looking at it.