Unexpected planner choice in simple JOIN - Mailing list pgsql-performance
| From | Mark Kirkwood |
|---|---|
| Subject | Unexpected planner choice in simple JOIN |
| Date | |
| Msg-id | 71afdb09-0583-4af2-a3b3-50da4f987b9b@gmail.com Whole thread Raw |
| Responses |
Re: Unexpected planner choice in simple JOIN
|
| List | pgsql-performance |
I have a 2 table parent child setup (tab0 -< tab1) with a fairly small (100 K rows) parent and big child (100 M rows).. Exact setup is included below. If I do a simple range scan on small part of the pk of tab0 the planner chooses an index scan (pretty much as expected): test0=# EXPLAIN ANALYZE SELECT t0.id0 FROM tab0 AS t0 WHERE t0.id0 < 5; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------- Index Only Scan using tab0_pkey on tab0 t0 (cost=0.29..4.38 rows=5 width=4) (actual time=0.006..0.007 rows=5.00 loops=1) Index Cond: (id0 < 5) Heap Fetches: 0 Index Searches: 1 Buffers: shared hit=3 Planning: Buffers: shared hit=65 Planning Time: 0.383 ms Execution Time: 0.038 ms (9 rows) However joining it to tab1 changes this to a parallel seq scan: test0=# EXPLAIN ANALYZE SELECT t0.id0, t1.val FROM tab0 AS t0 JOIN tab1 AS t1 ON (t0.id0 = t1.id0) WHERE t0.id0 < 5; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------- Gather (cost=1097.61..42577.77 rows=5000 width=98) (actual time=0.324..7.486 rows=3500.00 loops=1) Workers Planned: 1 Workers Launched: 1 Buffers: shared hit=1751 -> Nested Loop (cost=97.61..41077.77 rows=2941 width=98) (actual time=1.273..3.723 rows=1750.00 loops=2) Buffers: shared hit=1751 -> Parallel Seq Scan on tab0 t0 (cost=0.00..2375.29 rows=3 width=4) (actual time=1.247..3.232 rows=2.50 loops=2) Filter: (id0 < 5) Rows Removed by Filter: 49998 Buffers: shared hit=1640 -> Bitmap Heap Scan on tab1 t1 (cost=97.61..12867.78 rows=3305 width=98) (actual time=0.031..0.148 rows=700.00 loops=5) Recheck Cond: (t0.id0 = id0) Heap Blocks: exact=97 Buffers: shared hit=111 -> Bitmap Index Scan on tab1_id0_hash (cost=0.00..96.79 rows=3305 width=0) (actual time=0.024..0.024 rows=700.00 loops=5) Index Cond: (id0 = t0.id0) Index Searches: 5 Buffers: shared hit=14 Planning: Buffers: shared hit=160 Planning Time: 0.506 ms Execution Time: 7.658 ms (22 rows) However, disabling seq scan gets back to the index scan again, and what looks to be a lower cost overall plan: test0=# SET enable_seqscan=off; SET Time: 0.133 ms test0=# EXPLAIN ANALYZE SELECT t0.id0, t1.val FROM tab0 AS t0 JOIN tab1 AS t1 ON (t0.id0 = t1.id0) WHERE t0.id0 < 5; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=97.91..64508.51 rows=5000 width=98) (actual time=0.044..1.093 rows=3500.00 loops=1) Buffers: shared hit=113 -> Index Only Scan using tab0_pkey on tab0 t0 (cost=0.29..4.38 rows=5 width=4) (actual time=0.004..0.007 rows=5.00 loops=1) Index Cond: (id0 < 5) Heap Fetches: 0 Index Searches: 1 Buffers: shared hit=3 -> Bitmap Heap Scan on tab1 t1 (cost=97.61..12867.78 rows=3305 width=98) (actual time=0.029..0.153 rows=700.00 loops=5) Recheck Cond: (t0.id0 = id0) Heap Blocks: exact=97 Buffers: shared hit=110 -> Bitmap Index Scan on tab1_id0_hash (cost=0.00..96.79 rows=3305 width=0) (actual time=0.022..0.022 rows=700.00 loops=5) Index Cond: (id0 = t0.id0) Index Searches: 5 Buffers: shared hit=13 Planning: Buffers: shared hit=9 Planning Time: 0.187 ms Execution Time: 1.262 ms (19 rows) This is unexpected (to me anyway). This is 19devel from earlier this week, All parameters default. I note that I'm running on NVMe storage, and should ideally lower random_page_cost - but this does not materially effect the plan for these queries. To be fair, both possible plans are pretty fast, I am intrigued that what *looks* like a higher cost plan is being chosen! The Setup ------------- CREATE TABLE tab0 ( id0 SERIAL PRIMARY KEY, val VARCHAR(100) ); CREATE TABLE tab1 ( id1 BIGSERIAL PRIMARY KEY, val VARCHAR(100), id0 INTEGER REFERENCES tab0(id0) ); -- Make 100,000 rows INSERT INTO tab0 (id0, val) SELECT generate_series(0,99999), 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'; -- Make 10,000,000 rows INSERT INTO tab1 (id0, val) SELECT generate_series(0,9999999)/100, 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'; -- Now make another 90,000,000 rows INSERT INTO tab1 (id0, val) SELECT id0, val FROM tab1 LIMIT 10000000; INSERT INTO tab1 (id0, val) SELECT id0, val FROM tab1 LIMIT 10000000; INSERT INTO tab1 (id0, val) SELECT id0, val FROM tab1 LIMIT 10000000; INSERT INTO tab1 (id0, val) SELECT id0, val FROM tab1 LIMIT 10000000; INSERT INTO tab1 (id0, val) SELECT id0, val FROM tab1 LIMIT 10000000; INSERT INTO tab1 (id0, val) SELECT id0, val FROM tab1 LIMIT 10000000; INSERT INTO tab1 (id0, val) SELECT id0, val FROM tab1 LIMIT 10000000; INSERT INTO tab1 (id0, val) SELECT id0, val FROM tab1 LIMIT 10000000; -- Create index on foreign key INSERT INTO tab1 (id0, val) SELECT id0, val FROM tab1 LIMIT 10000000; CREATE INDEX tab1_id0_hash ON tab1 USING HASH (id0); VACUUM; ANALYZE;
pgsql-performance by date: