Thread: Suboptimal plan when IN(...), ORDER BY and LIMIT are used (no JOINs)
Hi.
I'm trying to understand the logic which the planner uses in "WHERE x IN (IDS) ORDER BY y LIMIT N" queries when the correct index exists in the database.
I expected that, if IDS list is small and N is small too, the planner should've done the following: for each element in IDS, query first N items from the index, then union the results (up to IDS*N elements, thus small) and limit it by N items.
Instead, the planner decides to run a bitmap index scan, fetch thousands of rows, and then post-filter most of them. Why? Is it possible to somehow tell the planner to use individual first-N fetches?
(SET STATISTICS to 10000 for both columns doesn't change anything; also I don't see how cardinality of any of these fields can theoretically affect the plan: we still need first N items from each of the index sub-parts.)
Instead, the planner decides to run a bitmap index scan, fetch thousands of rows, and then post-filter most of them. Why? Is it possible to somehow tell the planner to use individual first-N fetches?
(SET STATISTICS to 10000 for both columns doesn't change anything; also I don't see how cardinality of any of these fields can theoretically affect the plan: we still need first N items from each of the index sub-parts.)
CREATE TABLE roles(
id bigint NOT NULL,
id1 bigint,
created_at timestamptz NOT NULL
);
CREATE INDEX ON roles(id1, created_at DESC);
EXPLAIN ANALYZE SELECT * FROM roles
WHERE id1 IN(
'1001361878439251615', '1001349402553202617', '1001329448424677858',
'1001348457743394950', '1001361706624116300', '1001338330225145648',
'1001363186688934748', '1001366841628692013'
)
ORDER BY created_at DESC LIMIT 50
WHERE id1 IN(
'1001361878439251615', '1001349402553202617', '1001329448424677858',
'1001348457743394950', '1001361706624116300', '1001338330225145648',
'1001363186688934748', '1001366841628692013'
)
ORDER BY created_at DESC LIMIT 50
Limit (cost=50171.99..50177.83 rows=50 width=42) (actual time=206.056..208.865 rows=50 loops=1) -> Gather Merge (cost=50171.99..57802.99 rows=65404 width=42) (actual time=206.055..208.857 rows=50 loops=1) Workers Planned: 2 Workers Launched: 2 -> Sort (cost=49171.97..49253.73 rows=32702 width=42) (actual time=198.944..198.948 rows=40 loops=3) Sort Key: created_at DESC Sort Method: top-N heapsort Memory: 31kB Worker 0: Sort Method: top-N heapsort Memory: 30kB Worker 1: Sort Method: top-N heapsort Memory: 32kB -> Parallel Bitmap Heap Scan on roles (cost=4209.08..48085.63 rows=32702 width=42) (actual time=78.119..180.352 rows=60563 loops=3) Recheck Cond: (id1 = ANY ('{1001361878439251615,1001349402553202617,1001329448424677858,1001348457743394950,1001361706624116300,1001338330225145648,1001363186688934748,1001366841628692013}'::bigint[])) Filter: (cred_id = '1001344096118566254'::bigint) Rows Removed by Filter: 526 Heap Blocks: exact=5890 -> Bitmap Index Scan on roles_asset_created_desc (cost=0.00..4189.46 rows=182139 width=0) (actual time=73.761..73.761 rows=183934 loops=1) Index Cond: (id1 = ANY ('{1001361878439251615,1001349402553202617,1001329448424677858,1001348457743394950,1001361706624116300,1001338330225145648,1001363186688934748,1001366841628692013}'::bigint[])) Planning Time: 0.590 ms Execution Time: 208.935 ms
Your query and explain analyze output do not seem to match.
Filter: (cred_id = '1001344096118566254'::bigint)
I don't see anything like that in your query, nor an index that would support accomplishing that without filtering after fetching the 184k rows initially like the planner does.
Yeah, that was a plan for a query before its simplification. But effect is still the same, and also the question is still the same - why a bitmap scan is preferred over a number of individual index scans with fetching first 50 elements from each. (Also, replacing LIMIT 50 to LIMIT 2 doesn't seem to change anything, although having 2 here should logically make it prefer 8 index scans with selecting 2 first elements from each over selecting 186051 rows in one bitmap index scan.)
Here is the plan for the exact query with LIMIT=2:
Here is the plan for the exact query with LIMIT=2:
CREATE TABLE roles(
id bigint NOT NULL,
id1 bigint,
created_at timestamptz NOT NULL
);
CREATE INDEX roles_asset_created_desc ON roles(id1, created_at DESC);
EXPLAIN ANALYZE SELECT * FROM roles
WHERE id1 IN(
'1001361878439251615', '1001349402553202617', '1001329448424677858',
'1001348457743394950', '1001361706624116300', '1001338330225145648',
'1001363186688934748', '1001366841628692013'
)
ORDER BY created_at DESC LIMIT 2;
WHERE id1 IN(
'1001361878439251615', '1001349402553202617', '1001329448424677858',
'1001348457743394950', '1001361706624116300', '1001338330225145648',
'1001363186688934748', '1001366841628692013'
)
ORDER BY created_at DESC LIMIT 2;
Limit (cost=49712.75..49712.99 rows=2 width=42) (actual time=82.611..83.462 rows=2 loops=1)
-> Gather Merge (cost=49712.75..67421.89 rows=151782 width=42) (actual time=82.611..83.459 rows=2 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=48712.73..48902.46 rows=75891 width=42) (actual time=70.404..70.404 rows=2 loops=3)
Sort Key: created_at DESC
Sort Method: top-N heapsort Memory: 25kB
Worker 0: Sort Method: top-N heapsort Memory: 25kB
Worker 1: Sort Method: top-N heapsort Memory: 25kB
-> Parallel Bitmap Heap Scan on roles (cost=4266.99..47953.82 rows=75891 width=42) (actual time=7.854..57.664 rows=61255 loops=3)
Recheck Cond: (id1 = ANY ('{1001361878439251615,1001349402553202617,1001329448424677858,1001348457743394950,1001361706624116300,1001338330225145648,1001363186688934748,1001366841628692013}'::bigint[]))
Heap Blocks: exact=6886
-> Bitmap Index Scan on roles_asset_created_desc (cost=0.00..4221.46 rows=182139 width=0) (actual time=14.031..14.031 rows=186051 loops=1)
Index Cond: (id1 = ANY ('{1001361878439251615,1001349402553202617,1001329448424677858,1001348457743394950,1001361706624116300,1001338330225145648,1001363186688934748,1001366841628692013}'::bigint[]))
Planning Time: 0.074 ms
Execution Time: 83.491 ms
-> Gather Merge (cost=49712.75..67421.89 rows=151782 width=42) (actual time=82.611..83.459 rows=2 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=48712.73..48902.46 rows=75891 width=42) (actual time=70.404..70.404 rows=2 loops=3)
Sort Key: created_at DESC
Sort Method: top-N heapsort Memory: 25kB
Worker 0: Sort Method: top-N heapsort Memory: 25kB
Worker 1: Sort Method: top-N heapsort Memory: 25kB
-> Parallel Bitmap Heap Scan on roles (cost=4266.99..47953.82 rows=75891 width=42) (actual time=7.854..57.664 rows=61255 loops=3)
Recheck Cond: (id1 = ANY ('{1001361878439251615,1001349402553202617,1001329448424677858,1001348457743394950,1001361706624116300,1001338330225145648,1001363186688934748,1001366841628692013}'::bigint[]))
Heap Blocks: exact=6886
-> Bitmap Index Scan on roles_asset_created_desc (cost=0.00..4221.46 rows=182139 width=0) (actual time=14.031..14.031 rows=186051 loops=1)
Index Cond: (id1 = ANY ('{1001361878439251615,1001349402553202617,1001329448424677858,1001348457743394950,1001361706624116300,1001338330225145648,1001363186688934748,1001366841628692013}'::bigint[]))
Planning Time: 0.074 ms
Execution Time: 83.491 ms
On Wed, Apr 14, 2021 at 5:41 PM Michael Lewis <mlewis@entrata.com> wrote:
Your query and explain analyze output do not seem to match.Filter: (cred_id = '1001344096118566254'::bigint)I don't see anything like that in your query, nor an index that would support accomplishing that without filtering after fetching the 184k rows initially like the planner does.