Index use with left join - Mailing list pgsql-general
From | Julian Scarfe |
---|---|
Subject | Index use with left join |
Date | |
Msg-id | 17ad01c53c1a$1512ff70$0600a8c0@Wilbur Whole thread Raw |
Responses |
Re: Index use with left join
|
List | pgsql-general |
In a 7.4 database I have a table n of which the relevant columns see to be: Table "public.n" Column | Type | Modifiers ---------------+-----------------------------+----------- code | text | not null ref | text | not null part | integer | not null ... q_node | point | bbox | box | with indexes Indexes: "n_pkey" primary key, btree (code, ref, part) "n_bbox" rtree (bbox) Performance on simple spatial queries seem sensible. 1) Using the index explain analyze select n.ref, n.code from n where bbox && box (point (-0.032, 0.873), point (0.017, 0.908)) QUERY PLAN --------------------------------------------------------------------------------------------------------------- Index Scan using n_bbox on n (cost=0.00..88.33 rows=22 width=20) (actual time=0.090..7.309 rows=396 loops=1) Index Cond: (bbox && '(0.017,0.908),(-0.032,0.873)'::box) Total runtime: 7.713 ms 2) Filtering on a different criterion to force a sequential scan. (These 150 rows are a subset of those from query 1) explain analyze select n.ref, n.code from n where box (q_node, q_node) @ box (point (-0.032, 0.873), point (0.017, 0.908)) QUERY PLAN ------------------------------------------------------------------------------------------------------ Seq Scan on n (cost=0.00..9087.60 rows=5 width=20) (actual time=646.273..1135.774 rows=150 loops=1) Filter: (box(q_node, q_node) @ '(0.017,0.908),(-0.032,0.873)'::box) Total runtime: 1136.919 ms 3) Combining the two, the strategy seems sensible explain analyze select n.ref, n.code from n where bbox && box (point (-0.032, 0.873), point (0.017, 0.908)) and box (q_node, q_node) @ box (point (-0.032, 0.873), point (0.017, 0.908)) QUERY PLAN --------------------------------------------------------------------------------------------------------------- Index Scan using n_bbox on n (cost=0.00..88.44 rows=1 width=20) (actual time=0.360..11.482 rows=150 loops=1) Index Cond: (bbox && '(0.017,0.908),(-0.032,0.873)'::box) Filter: (box(q_node, q_node) @ '(0.017,0.908),(-0.032,0.873)'::box) Total runtime: 11.772 ms So far so good. Now I want to left join it with another table a (again, just the columns that appear relevant) Table "public.a" Column | Type | Modifiers ---------------------+-----------------------+----------- ident | character(4) | name | character varying(30) | ... Indexes: "a_ident" unique, btree (ident) 4) First with a filter for which there's an index, like query 1 explain analyze select n.ref, n.code, a.ident, a.name from n left outer join a on (a.ident = n.code) where bbox && box (point (-0.032, 0.873), point (0.017, 0.908)) QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------- Merge Left Join (cost=1371.99..1449.68 rows=174 width=45) (actual time=182.082..211.555 rows=396 loops=1) Merge Cond: ("outer".code = "inner"."?column3?") -> Sort (cost=88.82..88.87 rows=22 width=20) (actual time=16.120..16.375 rows=396 loops=1) Sort Key: n.code -> Index Scan using n_bbox on n (cost=0.00..88.33 rows=22 width=20) (actual time=0.213..11.893 rows=396 loops=1) Index Cond: (bbox && '(0.017,0.908),(-0.032,0.873)'::box) -> Sort (cost=1283.17..1308.44 rows=10105 width=25) (actual time=164.296..170.774 rows=10054 loops=1) Sort Key: (a.ident)::text -> Seq Scan on a (cost=0.00..611.05 rows=10105 width=25) (actual time=0.066..68.968 rows=10105 loops=1) Total runtime: 214.777 ms 5) Now with a filter that forces a sequential scan explain analyze select n.ref, n.code, a.ident, a.name from n left outer join a on (a.ident = n.code) where box (q_node, q_node) @ box (point (-0.032, 0.873), point (0.017, 0.908)) QUERY PLAN ------------------------------------------------------------------------------------------------------------------- Merge Left Join (cost=10370.84..10447.06 rows=40 width=45) (actual time=912.068..914.158 rows=150 loops=1) Merge Cond: ("outer".code = "inner"."?column3?") -> Sort (cost=9087.66..9087.68 rows=5 width=20) (actual time=728.757..728.847 rows=150 loops=1) Sort Key: n.code -> Seq Scan on n (cost=0.00..9087.60 rows=5 width=20) (actual time=314.466..726.479 rows=150 loops=1) Filter: (box(q_node, q_node) @ '(0.017,0.908),(-0.032,0.873)'::box) -> Sort (cost=1283.17..1308.44 rows=10105 width=25) (actual time=180.078..180.927 rows=1391 loops=1) Sort Key: (a.ident)::text -> Seq Scan on a (cost=0.00..611.05 rows=10105 width=25) (actual time=0.170..83.442 rows=10105 loops=1) Total runtime: 917.066 ms Again, so far, nothing obviously unusual. Now I combine the filters in 4 & 5 (as I did from 1 & 2 to get 3) 6) Now I combine the filters in 4 & 5 (as I did from 1 & 2 to get 3, which performed in a similar time to 1) explain analyze select n.ref, n.code, a.ident, a.name from n left outer join a on (a.ident = n.code) where bbox && box (point (-0.032, 0.873), point (0.017, 0.908)) and box (q_node, q_node) @ box (point (-0.032, 0.873), point (0.017, 0.908)) QUERY PLAN --------------------------------------------------------------------------------------------------------------------- Nested Loop Left Join (cost=0.00..851.06 rows=8 width=45) (actual time=11.662..7919.946 rows=150 loops=1) Join Filter: (("inner".ident)::text = "outer".code) -> Index Scan using n_bbox on n (cost=0.00..88.44 rows=1 width=20) (actual time=0.107..10.256 rows=150 loops=1) Index Cond: (bbox && '(0.017,0.908),(-0.032,0.873)'::box) Filter: (box(q_node, q_node) @ '(0.017,0.908),(-0.032,0.873)'::box) -> Seq Scan on a (cost=0.00..611.05 rows=10105 width=25) (actual time=0.006..18.044 rows=10105 loops=150) Total runtime: 7920.684 ms Whoa! Instead of a performance similar to query 4, it chooses a different strategy, and takes 40 times as long. (Both tables just analyzed.) By brute force: set enable_nestloop to off; explain analyze select n.ref, n.code, a.ident, a.name from n left outer join a on (a.ident = n.code) where bbox && box (point (-0.032, 0.873), point (0.017, 0.908)) and box (q_node, q_node) @ box (point (-0.032, 0.873), point (0.017, 0.908)) QUERY PLAN -------------------------------------------------------------------------------------------------------------------------- Merge Left Join (cost=1371.62..1447.50 rows=8 width=45) (actual time=177.273..179.341 rows=150 loops=1) Merge Cond: ("outer".code = "inner"."?column3?") -> Sort (cost=88.45..88.45 rows=1 width=20) (actual time=8.452..8.538 rows=150 loops=1) Sort Key: n.code -> Index Scan using n_bbox on n (cost=0.00..88.44 rows=1 width=20) (actual time=0.109..7.031 rows=150 loops=1) Index Cond: (bbox && '(0.017,0.908),(-0.032,0.873)'::box) Filter: (box(q_node, q_node) @ '(0.017,0.908),(-0.032,0.873)'::box) -> Sort (cost=1283.17..1308.44 rows=10105 width=25) (actual time=165.520..166.348 rows=1391 loops=1) Sort Key: (a.ident)::text -> Seq Scan on a (cost=0.00..611.05 rows=10105 width=25) (actual time=0.042..69.560 rows=10105 loops=1) Total runtime: 182.275 ms What's happening here, please? How am I misleading the planner? Is it because the index is rtree? Yes, I should consider PostGIS for spatial stuff, but I've got what I've got :-). TIA Julian Scarfe
pgsql-general by date: