Re: Strange query optimization in 7.3.2 - Mailing list pgsql-general

From Alec Mitchell
Subject Re: Strange query optimization in 7.3.2
Date
Msg-id 200304171012.19563.apm13@columbia.edu
Whole thread Raw
In response to Re: Strange query optimization in 7.3.2  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Strange query optimization in 7.3.2  (Alec Mitchell <apm13@columbia.edu>)
List pgsql-general
On Tuesday 15 April 2003 09:40 pm, Tom Lane wrote:
> Sigh ... I do know better than to commit changes without having
> regression-tested 'em.  Honest ;-)
>
> Add this patch atop the last:
>
    It is running once again, thanks.  Unfortunately this doesn't seem to change
the query plan (actual times have changed because this is running on a faster
computer):

 Hash Join  (cost=123.97..7169.64 rows=2890 width=140) (actual
time=134.53..3106.29 rows=19775 loops=1)
   Hash Cond: ("outer".manifest = "inner".manifest)
   ->  Seq Scan on stops s  (cost=0.00..6421.15 rows=117680 width=88) (actual
time=0.19..2241.00 rows=118606 loops=1)
         Filter: ((date >= '01/01/2003'::date) AND (date <=
'01/31/2003'::date))
   ->  Hash  (cost=123.13..123.13 rows=332 width=52) (actual
time=134.09..134.09 rows=0 loops=1)
         ->  Merge Join  (cost=16.79..123.13 rows=332 width=52) (actual
time=132.26..133.88 rows=52 loops=1)
               Merge Cond: ("outer".trailer = "inner".trailer)
               ->  Index Scan using manifests_trailer_idx on manifests m
(cost=0.00..309.32 rows=13526 width=15) (actual time=51.96..83.88 rows=174
loops=1)
               ->  Sort  (cost=16.79..16.94 rows=62 width=37) (actual
time=48.68..48.74 rows=52 loops=1)
                     Sort Key: tr.trailer
                     ->  Hash Join  (cost=1.47..14.94 rows=62 width=37)
(actual time=36.33..48.31 rows=52 loops=1)
                           Hash Cond: ("outer".terminal = "inner".terminal)
                           ->  Index Scan using trucks_group_idx on trucks tr
(cost=0.00..10.31 rows=319 width=21) (actual time=29.79..41.05 rows=319
loops=1)
                                 Index Cond: (group_num = 1)
                           ->  Hash  (cost=1.45..1.45 rows=7 width=16) (actual
time=5.82..5.82 rows=0 loops=1)
                                 ->  Seq Scan on terminals t  (cost=0.00..1.45
rows=7 width=16) (actual time=5.70..5.79 rows=7 loops=1)
                                       Filter: (city_id = 2)
 Total runtime: 3130.95 msec

It seems odd to me that Index Scan on manifests predicts that all rows will be
returned, but I guess that doesn't really influence the join decision.  Also
it turns out the "better" query I was using mistakenly included redundant
join conditions again.  The same slow execution path is choosen without the
redundancy (silly me).  So even a query that gives accurate planner
statistics (estimate of 62 rows vs. 52 actual rows at the join) results in a
hash join which ignores the existing indices rather than a nested loop which
uses them efficiently.  Maybe I should just disable seqscan for this query,
and reenable it afterward (it will be run within a function, so that should
be easy).  If you think that this issue merits further consideration, I'd be
happy to send you the relevant columns of the stops table.  Many thanks for
your help so far.

Alec Mitchell


pgsql-general by date:

Previous
From: "Cris"
Date:
Subject: Table as a column
Next
From: "Jeff MacDonald"
Date:
Subject: Re: ran an update outside of transaction, can i roll it back in anyway ?