Re: Odd planner choice? - Mailing list pgsql-performance
From: Gary Doades
Subject: Re: Odd planner choice?
Date:
Msg-id: 416706D5.13084.2A2108B2@localhost
In response to: Re: Odd planner choice? (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-performance
On 8 Oct 2004 at 16:04, Tom Lane wrote:

> "Gary Doades" <gpd@gpdnet.co.uk> writes:
> > If I remove the redundant clauses, the planner now estimates 1000 rows
> > returned from the table, not unreasonable since it has no statistics.
> > But *why* in that case, with *more* estimated rows does it choose to
> > materialize that table (26 rows) 573416 times!!!
>
> It isn't. It's materializing that once and scanning it 573416 times,
> once for each row in the outer relation. And this is not a bad plan
> given the estimates. If it had stuck to what you call the good plan,
> and there *had* been 1000 rows in the temp table, that plan would have
> run 1000 times longer than it did.
>
> As a general rule, if your complaint is that you get a bad plan for an
> unanalyzed table, the response is going to be "so analyze the table".

The problem in this case is that if I *do* analyze the table I *always* get the bad plan, bad meaning the query takes a lot longer. I'm still not sure why it can't choose the better plan by just reading the 26 rows once and index scanning the booking_plan table 26 times (as in the "good" plan).

OK, with a 1000-row estimate I can see that index scanning 1000 times into the booking_plan table would take some time, but even if the planner estimates 5 rows it still produces the same slow query. If I analyze the table, it then knows there are 26 rows and therefore always goes slow. This is why I am not analyzing this table: to fool the planner into thinking there is only one row and so produce a much faster access plan. Not ideal, I know.
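(A possible alternative to leaving the table unanalyzed, sketched here rather than taken from the thread: keep the statistics accurate but discourage the unwanted join method for this one query only, using the standard enable_* planner settings. The table and column names below are the ones from the query later in this post.)

```sql
-- Hypothetical workaround: analyze the table so row estimates are right,
-- then disable hash joins for just this transaction so the planner
-- reconsiders the index-scan nested-loop plan.
BEGIN;
SET LOCAL enable_hashjoin = off;  -- reverts automatically at COMMIT/ROLLBACK
SELECT DISTINCT b.staff_id
FROM staff_booking b, booking_plan bp, t_search_reqt_dates rd
WHERE b.booking_id = bp.booking_id
  AND rd.datetime_from <= bp.datetime_to
  AND rd.datetime_to >= bp.datetime_from
  AND rd.search_id = 13;
COMMIT;
```

This is a blunt instrument (it affects every join in the query), but it is less fragile than depending on a table never being analyzed.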
Just using one redundant clause I now get:

select distinct b.staff_id
from staff_booking b, booking_plan bp, t_search_reqt_dates rd
where b.booking_id = bp.booking_id
and rd.datetime_from <= bp.datetime_to
and rd.datetime_to >= bp.datetime_from
AND bp.booking_date between rd.reqt_date-1 and rd.reqt_date+1
and rd.search_id = 13

QUERY PLAN
Unique  (cost=50885.97..50921.37 rows=462 width=4) (actual time=35231.000..35241.000 rows=110 loops=1)
  ->  Sort  (cost=50885.97..50903.67 rows=35397 width=4) (actual time=35231.000..35241.000 rows=2173 loops=1)
        Sort Key: b.staff_id
        ->  Hash Join  (cost=44951.32..50351.07 rows=35397 width=4) (actual time=34530.000..35231.000 rows=2173 loops=1)
              Hash Cond: ("outer".booking_id = "inner".booking_id)
              ->  Seq Scan on staff_booking b  (cost=0.00..4233.39 rows=197877 width=8) (actual time=0.000..351.000 rows=197877 loops=1)
              ->  Hash  (cost=44933.62..44933.62 rows=35397 width=4) (actual time=34530.000..34530.000 rows=0 loops=1)
                    ->  Nested Loop  (cost=15.50..44933.62 rows=35397 width=4) (actual time=8342.000..34520.000 rows=2173 loops=1)
                          Join Filter: (("inner".datetime_from <= "outer".datetime_to) AND ("inner".datetime_to >= "outer".datetime_from) AND ("outer".booking_date >= ("inner".reqt_date - 1)) AND ("outer".booking_date <= ("inner".reqt_date + 1)))
                          ->  Seq Scan on booking_plan bp  (cost=0.00..7646.08 rows=573416 width=24) (actual time=0.000..1053.000 rows=573416 loops=1)
                          ->  Materialize  (cost=15.50..15.53 rows=5 width=20) (actual time=0.001..0.019 rows=26 loops=573416)
                                ->  Seq Scan on t_search_reqt_dates rd  (cost=0.00..15.50 rows=5 width=20) (actual time=0.000..0.000 rows=26 loops=1)
                                      Filter: (search_id = 13)
Total runtime: 35241.000 ms

If this is the only answer for now, then fair enough, I will just have to do more testing.

Regards,
Gary.
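(A back-of-the-envelope reading of the plan above, my arithmetic rather than anything from the thread: the Materialize node is cheap per pass, at most 0.019 ms, but it is rescanned 573416 times, once per booking_plan row, and the join filter is evaluated against all 26 materialized rows on each pass.)

```sql
-- Rough cost attribution from the actual times in the plan:
SELECT 573416 * 0.019 AS materialize_ms,   -- ~10895 ms of the 35241 ms total
       573416 * 26    AS join_filter_rows; -- ~14.9 million row pairs tested
```

Most of the runtime is the repeated rescanning and filtering, which is why scanning the 26-row side once and probing booking_plan's index 26 times is so much cheaper.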