Re: Strange query optimization in 7.3.2 - Mailing list pgsql-general

From Alec Mitchell
Subject Re: Strange query optimization in 7.3.2
Date
Msg-id 200304171401.46341.apm13@columbia.edu
Whole thread Raw
In response to Re: Strange query optimization in 7.3.2  (Alec Mitchell <apm13@columbia.edu>)
Responses Re: Strange query optimization in 7.3.2  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
On Thursday 17 April 2003 10:12 am, Alec Mitchell wrote:
> It seems odd to me that Index Scan on manifests predicts that all rows will
> be returned, but I guess that doesn't really influence the join decision.
> Also it turns out the "better" query I was using mistakenly included
> redundant join conditions again.  The same slow execution path is choosen
> without the redundancy (silly me).  So even a query that gives accurate
> planner statistics (estimate of 62 rows vs. 52 actual rows at the join)
> results in a hash join which ignores the existing indices rather than a
> nested loop which uses them efficiently.  Maybe I should just disable
> seqscan for this query, and reenable it afterward (it will be run within a
> function, so that should be easy).  If you think that this issue merits
> further consideration, I'd be happy to send you the relevant columns of the
> stops table.  Many thanks for your help so far.
>
I'll reply to myself to provide some details on why the nested loop isn't
choosen unless enable_seqscan is set to off.  This query skips the indirect
route through the manifests table and joins directly to the stops table on
(s.trailer = tr.trailer) like this:

SELECT * from (trucks tr JOIN terminals t ON (t.terminal = tr.terminal)) JOIN
stops s ON (tr.trailer = s.trailer) WHERE ((tr.group_num = 1) AND (t.city_id
= 2) AND (s.date BETWEEN '1/1/2003' AND '1/31/2003'));

I had been doing the join on manifests because the join on an integer is
faster than a join on varchar, but manifests will grow much faster than
trailers, that combined with the unsolvably poor statistics for the tr/m join
make that a poor choice for this query as the tables grow.

With enable_seqscan set to off, I get the fastest query plan (at least with
the patched version, unpatched I have to turn off both enable_hashjoin and
enable_mergejoin).  It looks like the reason the fast nested loop isn't
choosen by default is that the planner estimates a cost of 477.54 per loop
for 62 loops, whereas the actual cost is 8.34 per loop for 52 loops.  That's
an order of magnitude farther off than the planners other cost estimates.
The EXPLAIN ANALYZE output follows:

 Nested Loop  (cost=6.02..29697.65 rows=7545 width=125) (actual
time=1.43..756.54 rows=19775 loops=1)
   ->  Hash Join  (cost=6.02..19.50 rows=62 width=37) (actual time=1.32..5.22
rows=52 loops=1)
         Hash Cond: ("outer".terminal = "inner".terminal)
         ->  Index Scan using trucks_group_idx on trucks tr  (cost=0.00..10.31
rows=319 width=21) (actual time=0.04..3.12 rows=319 loops=1)
               Index Cond: (group_num = 1)
         ->  Hash  (cost=6.00..6.00 rows=7 width=16) (actual time=0.58..0.58
rows=0 loops=1)
               ->  Index Scan using terminals_pkey on terminals t
(cost=0.00..6.00 rows=7 width=16) (actual time=0.38..0.55 rows=7 loops=1)
                     Filter: (city_id = 2)
   ->  Index Scan using stops_trailer_date_idx on stops s  (cost=0.00..476.91
rows=124 width=88) (actual time=0.05..7.89 rows=380 loops=52)
         Index Cond: (("outer".trailer = s.trailer) AND (s.date >=
'01/01/2003'::date) AND (s.date <= '01/31/2003'::date))
 Total runtime: 785.12 msec


Thanks Again,
Alec Mitchell


pgsql-general by date:

Previous
From: "Andrew J. Kopciuch"
Date:
Subject: Packages question
Next
From: Tom Lane
Date:
Subject: Re: How foreign key info is stored