Thread: BUG #18423: suboptimal query plan is used when ordering by an indexed field with limit
BUG #18423: suboptimal query plan is used when ordering by an indexed field with limit
From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      18423
Logged by:          Jiayin Mao
Email address:      maojiayin@gmail.com
PostgreSQL version: 15.0
Operating system:   all
Description:

We found that when we use ORDER BY on an indexed field with a small LIMIT value, the index on the ordered field is always used, leading to long execution times when the ordered field does not have good selectivity. Usually the main purpose of using ORDER BY with LIMIT is to make sure the search result is stable; it is the conditions in the query (the expressions between WHERE and ORDER BY) that application developers want the query planner to use to select the targeted rows as a first step.

For example, our "user" table has an id primary key, an "org_id" column, and a "disabled" column. The table has millions of rows, and for each org_id there are usually only a few hundred rows. We have an index on (org_id, disabled), and that index can quickly select a few hundred rows from the million-row table. We want to find the row with the smallest id for a given org_id, so we use "ORDER BY id LIMIT 1" as the ordering condition. The query plan uses the btree index on the id field since it is the ordered field, causing the execution to first visit far more rows than it needs. In the following query plan, it has to remove 596003 rows, but the org only has a few hundred rows, as can be seen in the second query plan below.

```
explain analyze SELECT * FROM "user" WHERE org_id = 123456 AND disabled = false ORDER BY id LIMIT 1;
                                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..366.83 rows=1 width=232) (actual time=357.319..357.320 rows=1 loops=1)
   ->  Index Scan using user_pkey on user  (cost=0.43..306673.84 rows=837 width=232) (actual time=357.319..357.319 rows=1 loops=1)
         Filter: ((NOT disabled) AND (org_id = 123456))
         Rows Removed by Filter: 596003
 Planning Time: 0.885 ms
 Execution Time: 357.373 ms
```

If the index on (org_id, disabled) is used by tricking the query planner with "ORDER BY id + 0 LIMIT 1", the query time drops from 357 ms to 1.5 ms.

```
EXPLAIN analyze SELECT * FROM "user" WHERE org_id = 123456 and disabled = false order by user.id + 0 limit 1;
                                                                       QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=3145.61..3145.62 rows=1 width=236) (actual time=1.513..1.514 rows=1 loops=1)
   ->  Sort  (cost=3145.61..3147.71 rows=837 width=236) (actual time=1.512..1.513 rows=1 loops=1)
         Sort Key: ((id + 0))
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Index Scan using user_org_disabled_idx on user  (cost=0.43..3141.43 rows=837 width=236) (actual time=0.049..1.407 rows=166 loops=1)
               Index Cond: ((org_id = 123456) AND (disabled = false))
 Planning Time: 0.908 ms
 Execution Time: 1.580 ms
```
Re: BUG #18423: suboptimal query plan is used when ordering by an indexed field with limit
From
Jeff Janes
Date:
On Sat, Apr 6, 2024 at 5:44 PM PG Bug reporting form <noreply@postgresql.org> wrote:
For example, our "user" table has an id primary key, an "org_id" column and
a "disabled" column. The table has millions of rows and for each org_id
there is only usually a few hundred rows.
It would be helpful to know precisely how many millions of rows. We know it actually removed 596003 rows from the ordered index scan, but we don't know how many it thought it would need to remove. I reckon it thought it would remove # in table / 837, but I don't know what that division comes out to, not knowing the numerator.
```
->  Index Scan using user_org_disabled_idx on user  (cost=0.43..3141.43 rows=837 width=236) (actual time=0.049..1.407 rows=166 loops=1)
```
So this estimate is quite wrong, 837/166 = 5. Do you know why? This bad estimate makes this plan look 5 times too expensive, and the competing one look 5 times too cheap, for a ratio of 25. That is more than the current ratio between the two plan cost estimates, so fixing this could drive the difference. (The ratio of actual times is more than 25, so there is more to the problem than just this, but fixing this alone should be enough to drive the correct choice). So why is this estimate that bad? Is the selectivity estimate of `org_id = 123456` alone that bad, or is it only when combined with `disabled=false`?
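One quick way to tell would be to compare the estimate for each condition on its own against the combined estimate, something along these lines (a sketch using the column and table names from your report; the extended-statistics step is only a guess that the two columns are correlated, not something established here):

```
-- Estimate for each condition on its own:
EXPLAIN SELECT * FROM "user" WHERE org_id = 123456;
EXPLAIN SELECT * FROM "user" WHERE disabled = false;

-- Estimate for the combination the query actually uses:
EXPLAIN SELECT * FROM "user" WHERE org_id = 123456 AND disabled = false;

-- If the single-column estimates look fine but the combined one is off,
-- extended statistics on the correlated columns may help:
CREATE STATISTICS user_org_disabled_stats (dependencies)
    ON org_id, disabled FROM "user";
ANALYZE "user";
```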
A more robust solution is to add an index on (org_id, disabled, id). That way it can combine the two strategies, jumping to just the part of the index it needs and then reading it already in order. Not only will this be much faster than either of the two plans you show, it will also be more resilient to estimation errors.
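In SQL that would be something like the following (the index name is just illustrative):

```
-- Covers the WHERE clause and returns the matching rows already ordered by id,
-- so "ORDER BY id LIMIT 1" can stop after the first matching entry:
CREATE INDEX user_org_disabled_id_idx ON "user" (org_id, disabled, id);
```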
Anyway, these just look like well-known estimation difficulties, nothing which seems like an actual bug. Estimation is hard and sometimes there is no way to know the correct value to use until after the query is already underway.
Cheers,
Jeff
Re: BUG #18423: suboptimal query plan is used when ordering by an indexed field with limit
From
Mao Jiayin
Date:
Thanks Jeff. This is really helpful. I agree this is not really a bug; please feel free to close it.
If you still want to know the size of the table, it is about 6 million.
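(If I follow your reckoning, the planner would then have expected to remove only about 6,000,000 / 837, roughly 7,000 rows, before finding the first match, which is far fewer than the 596,003 it actually removed.)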