Hm, here's a query where the optimizer is choosing the wrong plan by far. I
think it boils down to it guessing wrong on how selective an rtree index is,
which I guess would be hard to predict.
Except if it guesses wrong by assuming it isn't selective it would be maybe
50% slower doing lots of index lookups instead of a more efficient full table
scan and join. If it guesses wrong by assuming it'll be very selective as it
is in this case then it's 1000% slower doing repeated full table scans.
This is the same thing someone else pointed out a while ago. The consequences
of guessing wrong and favouring a full table scan instead of index lookups
with nested loops are simply much more severe than the reverse.
Even if the full table scan was faster with one dataset I might prefer to use
the slower index lookup and live with the lower performance but be able to
guarantee that a sudden shift in data distribution or user behaviour wouldn't
suddenly cause a complete outage. I'm wondering if it doesn't make sense to go
into production with enable_seqscan = off.
slo=> explain analyze
SELECT (select count(distinct xx_id) from xx_thing where thing_id in (select thing_id from thing where
group_id= group.group_id)) as num_xxs
FROM group
WHERE geom2 @ make_box(-79.3885,43.6438,65)
AND earth_dist(geom, -79.3885,43.6438) < 65
;
slo-> slo-> slo-> slo-> slo-> slo->
QUERYPLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using idx_group_geom on group (cost=0.00..130.62 rows=1 width=4) (actual time=69.53..7410.31 rows=125
loops=1)
Index Cond: (geom2 @ '(-78.5801566350508,44.2287532037437),(-80.1968433649492,43.0588467962563)'::box)
Filter: (sqrt((pow((80.4113732090646::double precision * (geom[0] - -79.3885::double precision)), 2::double
precision)+ pow((111.12::double precision * (geom[1] - 43.6438::double precision)), 2::double precision))) < 65::double
precision)
SubPlan
-> Aggregate (cost=127.44..127.44 rows=1 width=4) (actual time=59.11..59.11 rows=1 loops=125)
-> Hash IN Join (cost=19.00..127.42 rows=7 width=4) (actual time=36.63..58.66 rows=11 loops=125)
Hash Cond: ("outer".thing_id = "inner".thing_id)
-> Seq Scan on xx_thing (cost=0.00..81.23 rows=5423 width=8) (actual time=0.03..32.64 rows=5423
loops=125)
-> Hash (cost=18.97..18.97 rows=15 width=4) (actual time=0.59..0.59 rows=0 loops=125)
-> Index Scan using idx_thing_group on thing (cost=0.00..18.97 rows=15 width=4) (actual
time=0.21..0.42rows=8 loops=125)
Index Cond: (group_id = $0)
Total runtime: 7424.11 msec
(12 rows)
slo=> slo=> set enable_seqscan = off;
SET
slo=> set enable_mergejoin = off;
SET
slo=> explain analyze
SELECT (select count(distinct xx_id) from xx_thing where thing_id in (select thing_id from thing where
group_id= group.group_id)) as num_xxs
FROM group
WHERE geom2 @ make_box(-79.3885,43.6438,65)
AND earth_dist(geom, -79.3885,43.6438) < 65
;
slo-> slo-> slo-> slo-> slo-> slo->
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using idx_group_geom on group (cost=0.00..149.93 rows=1 width=4) (actual time=0.74..239.90 rows=125
loops=1)
Index Cond: (geom2 @ '(-78.5801566350508,44.2287532037437),(-80.1968433649492,43.0588467962563)'::box)
Filter: (sqrt((pow((80.4113732090646::double precision * (geom[0] - -79.3885::double precision)), 2::double
precision)+ pow((111.12::double precision * (geom[1] - 43.6438::double precision)), 2::double precision))) < 65::double
precision)
SubPlan
-> Aggregate (cost=146.75..146.75 rows=1 width=4) (actual time=1.81..1.81 rows=1 loops=125)
-> Nested Loop (cost=19.00..146.73 rows=7 width=4) (actual time=0.65..1.60 rows=11 loops=125)
-> HashAggregate (cost=19.00..19.00 rows=15 width=4) (actual time=0.36..0.63 rows=8 loops=125)
-> Index Scan using idx_thing_group on thing (cost=0.00..18.97 rows=15 width=4) (actual
time=0.09..0.22rows=8 loops=125)
Index Cond: (group_id = $0)
-> Index Scan using idx_xx_thing_loc on xx_thing (cost=0.00..8.43 rows=7 width=8) (actual
time=0.05..0.08rows=1 loops=1055)
Index Cond: (xx_thing.thing_id = "outer".thing_id)
Total runtime: 252.60 msec
(12 rows)
slo=> >
--
greg