Optimizer isn't perfect - Mailing list pgsql-general
From | Greg Stark |
---|---|
Subject | Optimizer isn't perfect |
Date | |
Msg-id | 87lltcil77.fsf@stark.dyndns.tv Whole thread Raw |
Responses |
Re: Optimizer isn't perfect
Re: Optimizer isn't perfect |
List | pgsql-general |
Hm, here's a query where the optimizer is choosing the wrong plan by far. I think it boils down to it guessing wrong on how selective an rtree index is, which I guess would be hard to predict. Except if it guesses wrong by assuming it isn't selective it would be maybe 50% slower doing lots of index lookups instead of a more efficient full table scan and join. If it guesses wrong by assuming it'll be very selective as it is in this case then it's 1000% slower doing repeated full table scans. This is the same thing someone else pointed out a while ago. The consequences of guessing wrong and favouring a full table scan instead of index lookups with nested loops are simply much more severe than the reverse. Even if the full table scan was faster with one dataset I might prefer to use the slower index lookup and live with the lower performance but be able to guarantee that a sudden shift in data distribution or user behaviour wouldn't suddenly cause a complete outage. I'm wondering if it doesn't make sense to go into production with enable_seqscan = off. slo=> explain analyze SELECT (select count(distinct xx_id) from xx_thing where thing_id in (select thing_id from thing where group_id= group.group_id)) as num_xxs FROM group WHERE geom2 @ make_box(-79.3885,43.6438,65) AND earth_dist(geom, -79.3885,43.6438) < 65 ; slo-> slo-> slo-> slo-> slo-> slo-> QUERYPLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Index Scan using idx_group_geom on group (cost=0.00..130.62 rows=1 width=4) (actual time=69.53..7410.31 rows=125 loops=1) Index Cond: (geom2 @ '(-78.5801566350508,44.2287532037437),(-80.1968433649492,43.0588467962563)'::box) Filter: (sqrt((pow((80.4113732090646::double precision * (geom[0] - -79.3885::double precision)), 2::double precision)+ pow((111.12::double precision * (geom[1] - 43.6438::double precision)), 2::double precision))) < 65::double precision) SubPlan -> Aggregate (cost=127.44..127.44 rows=1 width=4) (actual time=59.11..59.11 rows=1 loops=125) -> Hash IN Join (cost=19.00..127.42 rows=7 width=4) (actual time=36.63..58.66 rows=11 loops=125) Hash Cond: ("outer".thing_id = "inner".thing_id) -> Seq Scan on xx_thing (cost=0.00..81.23 rows=5423 width=8) (actual time=0.03..32.64 rows=5423 loops=125) -> Hash (cost=18.97..18.97 rows=15 width=4) (actual time=0.59..0.59 rows=0 loops=125) -> Index Scan using idx_thing_group on thing (cost=0.00..18.97 rows=15 width=4) (actual time=0.21..0.42rows=8 loops=125) Index Cond: (group_id = $0) Total runtime: 7424.11 msec (12 rows) slo=> slo=> set enable_seqscan = off; SET slo=> set enable_mergejoin = off; SET slo=> explain analyze SELECT (select count(distinct xx_id) from xx_thing where thing_id in (select thing_id from thing where group_id= group.group_id)) as num_xxs FROM group WHERE geom2 @ make_box(-79.3885,43.6438,65) AND earth_dist(geom, -79.3885,43.6438) < 65 ; slo-> slo-> slo-> slo-> slo-> slo-> QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Index Scan using idx_group_geom on group (cost=0.00..149.93 rows=1 width=4) (actual time=0.74..239.90 rows=125 loops=1) Index Cond: (geom2 @ '(-78.5801566350508,44.2287532037437),(-80.1968433649492,43.0588467962563)'::box) Filter: (sqrt((pow((80.4113732090646::double precision * (geom[0] - -79.3885::double precision)), 2::double precision)+ pow((111.12::double precision * (geom[1] - 43.6438::double precision)), 2::double precision))) < 65::double precision) SubPlan -> Aggregate (cost=146.75..146.75 rows=1 width=4) (actual time=1.81..1.81 rows=1 loops=125) -> Nested Loop (cost=19.00..146.73 rows=7 width=4) (actual time=0.65..1.60 rows=11 loops=125) -> HashAggregate (cost=19.00..19.00 rows=15 width=4) (actual time=0.36..0.63 rows=8 loops=125) -> Index Scan using idx_thing_group on thing (cost=0.00..18.97 rows=15 width=4) (actual time=0.09..0.22rows=8 loops=125) Index Cond: (group_id = $0) -> Index Scan using idx_xx_thing_loc on xx_thing (cost=0.00..8.43 rows=7 width=8) (actual time=0.05..0.08rows=1 loops=1055) Index Cond: (xx_thing.thing_id = "outer".thing_id) Total runtime: 252.60 msec (12 rows) slo=> > -- greg
pgsql-general by date: