Optimizer isn't perfect - Mailing list pgsql-general

From Greg Stark
Subject Optimizer isn't perfect
Date
Msg-id 87lltcil77.fsf@stark.dyndns.tv
Whole thread Raw
Responses Re: Optimizer isn't perfect  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Optimizer isn't perfect  (Alvaro Herrera <alvherre@dcc.uchile.cl>)
List pgsql-general
Hm, here's a query where the optimizer is choosing the wrong plan by far. I
think it boils down to it guessing wrong on how selective an rtree index is,
which I guess would be hard to predict.

Except if it guesses wrong by assuming it isn't selective it would be maybe
50% slower doing lots of index lookups instead of a more efficient full table
scan and join. If it guesses wrong by assuming it'll be very selective as it
is in this case then it's 1000% slower doing repeated full table scans.

This is the same thing someone else pointed out a while ago. The consequences
of guessing wrong and favouring a full table scan instead of index lookups
with nested loops are simply much more severe than the reverse.

Even if the full table scan was faster with one dataset I might prefer to use
the slower index lookup and live with the lower performance but be able to
guarantee that a sudden shift in data distribution or user behaviour wouldn't
suddenly cause a complete outage. I'm wondering if it doesn't make sense to go
into production with enable_seqscan = off.


slo=> explain analyze
            SELECT (select count(distinct xx_id) from xx_thing where thing_id in (select thing_id from thing where
group_id= group.group_id)) as num_xxs 
              FROM group
             WHERE geom2 @ make_box(-79.3885,43.6438,65)
               AND earth_dist(geom, -79.3885,43.6438) < 65

;
slo-> slo-> slo-> slo-> slo-> slo->

QUERYPLAN
      

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_group_geom on group  (cost=0.00..130.62 rows=1 width=4) (actual time=69.53..7410.31 rows=125
loops=1)
   Index Cond: (geom2 @ '(-78.5801566350508,44.2287532037437),(-80.1968433649492,43.0588467962563)'::box)
   Filter: (sqrt((pow((80.4113732090646::double precision * (geom[0] - -79.3885::double precision)), 2::double
precision)+ pow((111.12::double precision * (geom[1] - 43.6438::double precision)), 2::double precision))) < 65::double
precision)
   SubPlan
     ->  Aggregate  (cost=127.44..127.44 rows=1 width=4) (actual time=59.11..59.11 rows=1 loops=125)
           ->  Hash IN Join  (cost=19.00..127.42 rows=7 width=4) (actual time=36.63..58.66 rows=11 loops=125)
                 Hash Cond: ("outer".thing_id = "inner".thing_id)
                 ->  Seq Scan on xx_thing  (cost=0.00..81.23 rows=5423 width=8) (actual time=0.03..32.64 rows=5423
loops=125)
                 ->  Hash  (cost=18.97..18.97 rows=15 width=4) (actual time=0.59..0.59 rows=0 loops=125)
                       ->  Index Scan using idx_thing_group on thing  (cost=0.00..18.97 rows=15 width=4) (actual
time=0.21..0.42rows=8 loops=125) 
                             Index Cond: (group_id = $0)
 Total runtime: 7424.11 msec
(12 rows)

slo=> slo=> set enable_seqscan = off;
SET

slo=> set enable_mergejoin = off;
SET

slo=> explain analyze
            SELECT (select count(distinct xx_id) from xx_thing where thing_id in (select thing_id from thing where
group_id= group.group_id)) as num_xxs 
              FROM group
             WHERE geom2 @ make_box(-79.3885,43.6438,65)
               AND earth_dist(geom, -79.3885,43.6438) < 65

;
slo-> slo-> slo-> slo-> slo-> slo->
                               QUERY PLAN
                                      

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_group_geom on group  (cost=0.00..149.93 rows=1 width=4) (actual time=0.74..239.90 rows=125
loops=1)
   Index Cond: (geom2 @ '(-78.5801566350508,44.2287532037437),(-80.1968433649492,43.0588467962563)'::box)
   Filter: (sqrt((pow((80.4113732090646::double precision * (geom[0] - -79.3885::double precision)), 2::double
precision)+ pow((111.12::double precision * (geom[1] - 43.6438::double precision)), 2::double precision))) < 65::double
precision)
   SubPlan
     ->  Aggregate  (cost=146.75..146.75 rows=1 width=4) (actual time=1.81..1.81 rows=1 loops=125)
           ->  Nested Loop  (cost=19.00..146.73 rows=7 width=4) (actual time=0.65..1.60 rows=11 loops=125)
                 ->  HashAggregate  (cost=19.00..19.00 rows=15 width=4) (actual time=0.36..0.63 rows=8 loops=125)
                       ->  Index Scan using idx_thing_group on thing  (cost=0.00..18.97 rows=15 width=4) (actual
time=0.09..0.22rows=8 loops=125) 
                             Index Cond: (group_id = $0)
                 ->  Index Scan using idx_xx_thing_loc on xx_thing  (cost=0.00..8.43 rows=7 width=8) (actual
time=0.05..0.08rows=1 loops=1055) 
                       Index Cond: (xx_thing.thing_id = "outer".thing_id)
 Total runtime: 252.60 msec
(12 rows)

slo=> >

--
greg

pgsql-general by date:

Previous
From: "Derrick Betts"
Date:
Subject: Errors sending Blob to DB
Next
From: Jonathan Bartlett
Date:
Subject: Re: left outer join terrible slow compared to inner join