Thread: BUG #18834: Query planner is choosing a sub-optimal plan when limit is present
BUG #18834: Query planner is choosing a sub-optimal plan when limit is present
From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      18834
Logged by:          Michael Stoeckel
Email address:      michael@joincandidhealth.com
PostgreSQL version: 15.10
Operating system:   x86_64-pc-linux-gnu

Description:

We have a table, `fn_ledger`, containing approximately **80 million records**. The table includes a **primary key (`id`)** of type **UUID** and a **`provenance`** column of type **JSONB**, which is **GIN-indexed**, along with other fields. We observed **poor query performance** when using an **`ORDER BY`** clause combined with **`LIMIT`**, which led us to investigate further. Consider the following simple query:

```
select *
from fn_ledger
where provenance @> '{"locator": "52ca2f03-8184-448f-9b9e-7a0ed99e6922"}'
order by id
```

In the query above, the `WHERE` clause is **highly selective**, meaning it will return at most a handful of records. Through our investigation, we found that:

- **When `LIMIT` is not present**, the planner correctly chooses the **bitmap index scan** on `ix_fn_ledger_provenance`, resulting in efficient query execution.
- **When `LIMIT` is introduced**, the planner **incorrectly opts** for a full **index scan** on `fn_ledger_pkey` while filtering on the `provenance` value, leading to poor performance.
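For reference, a minimal sketch of the schema as described above — only `id`, `provenance`, `fn_ledger_pkey`, and `ix_fn_ledger_provenance` come from this report; everything else (column constraints, index options) is an assumption:

```
-- Hypothetical reconstruction of the relevant parts of the table;
-- the real table contains additional fields not shown here.
create table fn_ledger (
    id         uuid primary key,   -- backed by fn_ledger_pkey
    provenance jsonb not null
    -- ... other fields omitted ...
);

-- GIN index that serves the fast bitmap plan for @> queries
create index ix_fn_ledger_provenance
    on fn_ledger using gin (provenance);
```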
See the query plans below:

```
postgres=> explain analyze select * from fn_ledger where provenance @> '{"locator": "52ca2f03-8184-448f-9b9e-7a0ed99e6922"}' order by id;
                                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=32625.36..32645.41 rows=8020 width=333) (actual time=2.943..2.944 rows=1 loops=1)
   Sort Key: id
   Sort Method: quicksort  Memory: 25kB
   ->  Bitmap Heap Scan on fn_ledger  (cost=1230.16..32105.29 rows=8020 width=333) (actual time=2.939..2.939 rows=1 loops=1)
         Recheck Cond: (provenance @> '{"locator": "52ca2f03-8184-448f-9b9e-7a0ed99e6922"}'::jsonb)
         Heap Blocks: exact=1
         ->  Bitmap Index Scan on ix_fn_ledger_provenance  (cost=0.00..1228.15 rows=8020 width=0) (actual time=2.923..2.924 rows=1 loops=1)
               Index Cond: (provenance @> '{"locator": "52ca2f03-8184-448f-9b9e-7a0ed99e6922"}'::jsonb)
 Planning Time: 0.115 ms
 Execution Time: 2.978 ms
(10 rows)

postgres=> explain analyze select * from fn_ledger where provenance @> '{"locator": "52ca2f03-8184-448f-9b9e-7a0ed99e6922"}' order by id limit 1;
                                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.57..1816.54 rows=1 width=333) (actual time=317508.674..317508.675 rows=1 loops=1)
   ->  Index Scan using fn_ledger_pkey on fn_ledger  (cost=0.57..14564073.97 rows=8020 width=333) (actual time=317508.673..317508.673 rows=1 loops=1)
         Filter: (provenance @> '{"locator": "52ca2f03-8184-448f-9b9e-7a0ed99e6922"}'::jsonb)
         Rows Removed by Filter: 49197422
 Planning Time: 0.148 ms
 Execution Time: 317508.742 ms
(6 rows)
```

Furthermore, since the planner incorrectly chooses to perform an **index scan** on `fn_ledger_pkey`, query performance varies significantly depending on **where the candidate record's `id` appears** within the `fn_ledger_pkey` index.
To illustrate this, I selected the **first and last records** based on the ordering of the `id` column:

```
select id, provenance->'locator'
from fn_ledger
where id in (
    '00000025-fb46-46cd-b7eb-19e111fa1e87',
    'ffffffdb-23f6-4a6e-9385-86e5e32a9309'
)
===================
00000025-fb46-46cd-b7eb-19e111fa1e87   "9cdf7902-ba7d-4c07-9f92-451aeebe6e2d"
ffffffdb-23f6-4a6e-9385-86e5e32a9309   "f2d20507-a543-4c38-b3e3-c977ac407b42"
```

Then I executed the following queries, both with `LIMIT`:

```
postgres=> explain analyze select * from fn_ledger where fn_ledger.provenance @> '{"locator": "9cdf7902-ba7d-4c07-9f92-451aeebe6e2d"}' order by id limit 1;
                                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.57..1817.68 rows=1 width=333) (actual time=0.016..0.017 rows=1 loops=1)
   ->  Index Scan using fn_ledger_pkey on fn_ledger  (cost=0.57..14478744.69 rows=7968 width=333) (actual time=0.015..0.016 rows=1 loops=1)
         Filter: (provenance @> '{"locator": "9cdf7902-ba7d-4c07-9f92-451aeebe6e2d"}'::jsonb)
 Planning Time: 0.184 ms
 Execution Time: 0.031 ms
(5 rows)

postgres=> explain analyze select * from fn_ledger where fn_ledger.provenance @> '{"locator": "f2d20507-a543-4c38-b3e3-c977ac407b42"}' order by id limit 1;
                                                                     QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.57..1817.75 rows=1 width=333) (actual time=57515.333..57515.334 rows=1 loops=1)
   ->  Index Scan using fn_ledger_pkey on fn_ledger  (cost=0.57..14479340.89 rows=7968 width=333) (actual time=57515.331..57515.332 rows=1 loops=1)
         Filter: (provenance @> '{"locator": "f2d20507-a543-4c38-b3e3-c977ac407b42"}'::jsonb)
         Rows Removed by Filter: 30847879
 Planning Time: 0.147 ms
 Execution Time: 57515.418 ms
(6 rows)
```

As you can see, the two queries are **structurally identical**, yet the first is **orders of magnitude faster**
than the second. This difference occurs because the **candidate record** for the first query appears **towards the beginning** of the `fn_ledger_pkey` index, allowing it to be found quickly. In contrast, the candidate record for the second query appears **towards the end** of the index, requiring significantly more scanning.

Beyond this inconsistency, if the **`LIMIT` value exceeds the actual number of records found by the query**, the index scan has to **traverse the entire index**, regardless of the candidate record's position. This further degrades performance.

```
postgres=> explain analyze select * from fn_ledger where fn_ledger.provenance @> '{"locator": "9cdf7902-ba7d-4c07-9f92-451aeebe6e2d"}' order by id limit 1;
                                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.57..1816.41 rows=1 width=333) (actual time=0.013..0.014 rows=1 loops=1)
   ->  Index Scan using fn_ledger_pkey on fn_ledger  (cost=0.57..14564849.49 rows=8021 width=333) (actual time=0.013..0.013 rows=1 loops=1)
         Filter: (provenance @> '{"locator": "9cdf7902-ba7d-4c07-9f92-451aeebe6e2d"}'::jsonb)
 Planning Time: 0.145 ms
 Execution Time: 0.027 ms
(5 rows)

postgres=> explain analyze select * from fn_ledger where fn_ledger.provenance @> '{"locator": "9cdf7902-ba7d-4c07-9f92-451aeebe6e2d"}' order by id limit 2;
                                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.57..3632.25 rows=2 width=333) (actual time=0.017..126786.189 rows=1 loops=1)
   ->  Index Scan using fn_ledger_pkey on fn_ledger  (cost=0.57..14564849.49 rows=8021 width=333) (actual time=0.016..126786.187 rows=1 loops=1)
         Filter: (provenance @> '{"locator": "9cdf7902-ba7d-4c07-9f92-451aeebe6e2d"}'::jsonb)
         Rows Removed by Filter: 80472490
 Planning Time: 0.148 ms
 Execution Time: 126786.246 ms
(6 rows)
```

As you can see
from the above queries, the only difference between them is that the second query's `LIMIT` is 2 instead of 1.

In conclusion, it seems that the query planner is **placing too much emphasis on the `LIMIT` clause**, prioritizing it over the more efficient `ix_fn_ledger_provenance` index. As a result, it **ignores the optimal indexing strategy** and instead opts for a less efficient scan on `fn_ledger_pkey`, leading to inconsistent and suboptimal query performance.
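[Editor's note: a commonly cited workaround, not mentioned in this thread, is to materialize the selective filter before the `ORDER BY`/`LIMIT` is applied, so the `LIMIT` cannot tempt the planner onto a full `fn_ledger_pkey` scan. A sketch, assuming PostgreSQL 12+ for `AS MATERIALIZED`:]

```
-- Workaround sketch: AS MATERIALIZED acts as an optimization fence,
-- forcing the @> filter (and its GIN index) to be evaluated first;
-- the sort and limit then run over the handful of matching rows.
with hits as materialized (
    select *
    from fn_ledger
    where provenance @> '{"locator": "52ca2f03-8184-448f-9b9e-7a0ed99e6922"}'
)
select *
from hits
order by id
limit 1;
```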
Re: BUG #18834: Query planner is choosing a sub-optimal plan when limit is present
From
Tom Lane
Date:
PG Bug reporting form <noreply@postgresql.org> writes:
> We observed **poor query performance** when using an **`ORDER BY`** clause
> combined with **`LIMIT`**, which led us to investigate further.

This is a well-known syndrome. The fundamental problem is the poor selectivity estimate:

> ->  Bitmap Index Scan on ix_fn_ledger_provenance  (cost=0.00..1228.15 rows=8020 width=0) (actual time=2.923..2.924 rows=1 loops=1)
>       Index Cond: (provenance @> '{"locator": "52ca2f03-8184-448f-9b9e-7a0ed99e6922"}'::jsonb)

If the thing thinks there are 8020 matches for the @> condition when there's only one, it's really pure luck if you get an optimal plan. All its cost estimates will be off by a factor of more than 1000, and these decisions are by no means linear. They're particularly not linear in the presence of LIMIT, but query plans can go far astray even without that.

So what you need to do about this is get a better selectivity estimate. You might be able to get somewhere with custom statistics (see CREATE STATISTICS), or just by cranking up the statistics target for the "provenance" column to the maximum. But I'm not sure how much that will help. Fundamentally, putting stuff into unstructured JSON storage and expecting to get efficient searches of it is an antipattern.

A more reliable answer is to change your query. You could, for example, make an expression index on (provenance ->> 'locator') and then write

    WHERE (provenance ->> 'locator') = '52ca2f03-8184-448f-9b9e-7a0ed99e6922'

That will provide both fast searches and reasonably trustworthy stats (once ANALYZE has seen the index).

			regards, tom lane
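[Editor's note: concretely, the two suggestions in the reply above might look like the following sketch. The index name and the statistics target value are illustrative choices, not part of the thread; 10000 is the maximum allowed statistics target.]

```
-- 1) Crank up the statistics target for the provenance column
--    (may or may not help, as noted above), then re-analyze:
alter table fn_ledger alter column provenance set statistics 10000;
analyze fn_ledger;

-- 2) The more reliable fix: an expression index on the extracted key,
--    queried with plain equality so the planner gets sane estimates.
create index ix_fn_ledger_provenance_locator
    on fn_ledger ((provenance ->> 'locator'));
analyze fn_ledger;

select *
from fn_ledger
where (provenance ->> 'locator') = '52ca2f03-8184-448f-9b9e-7a0ed99e6922'
order by id
limit 1;
```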