Re: Odd planner choice? - Mailing list pgsql-performance

From Gary Doades
Subject Re: Odd planner choice?
Date
Msg-id 416706D5.13084.2A2108B2@localhost
Whole thread Raw
In response to Re: Odd planner choice?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
On 8 Oct 2004 at 16:04, Tom Lane wrote:

> "Gary Doades" <gpd@gpdnet.co.uk> writes:
> > If I remove the redundant clauses, the planner now estimates 1000 rows returned from
> > the table, not unreasonable since it has no statistics. But *why* in that case, with *more*
> > estimated rows does it choose to materialize that table (26 rows) 573416 times!!!
>
> It isn't.  It's materializing that once and scanning it 573416 times,
> once for each row in the outer relation.  And this is not a bad plan
> given the estimates.  If it had stuck to what you call the good plan,
> and there *had* been 1000 rows in the temp table, that plan would have
> run 1000 times longer than it did.
>
> As a general rule, if your complaint is that you get a bad plan for an
> unanalyzed table, the response is going to be "so analyze the table".
>

The problem is in this case is that if I *do* analyse the table I *always* get the bad plan.
Bad in this case meaning the query takes a lot longer. I'm still not sure why it can't
choose the better plan by just reading the 26 rows once and index scan the
booking_plan table 26 times (as in the "good" plan).

OK, with 1000 row estimate I can see that index scanning 1000 times into the
booking_plan table would take some time, but the even if planner estimates 5 rows it still
produces the same slow query.

If I analyze the table it then knows there are 26 rows and therefore always goes slow.

This is why I am not analyzing this table, to fool the planner into thinking there is only
one row and produce a much faster access plan. Not ideal I know.

Just using one redundant clause I now get:

select distinct b.staff_id  from staff_booking b, booking_plan bp, t_search_reqt_dates rd
where b.booking_id = bp.booking_id
and rd.datetime_from <= bp.datetime_to and rd.datetime_to >= bp.datetime_from
AND bp.booking_date between rd.reqt_date-1 and rd.reqt_date+1
and rd.search_id = 13

QUERY PLAN
Unique  (cost=50885.97..50921.37 rows=462 width=4) (actual
time=35231.000..35241.000 rows=110 loops=1)
  ->  Sort  (cost=50885.97..50903.67 rows=35397 width=4) (actual
time=35231.000..35241.000 rows=2173 loops=1)
        Sort Key: b.staff_id
        ->  Hash Join  (cost=44951.32..50351.07 rows=35397 width=4) (actual
time=34530.000..35231.000 rows=2173 loops=1)
              Hash Cond: ("outer".booking_id = "inner".booking_id)
              ->  Seq Scan on staff_booking b  (cost=0.00..4233.39 rows=197877 width=8)
(actual time=0.000..351.000 rows=197877 loops=1)
              ->  Hash  (cost=44933.62..44933.62 rows=35397 width=4) (actual
time=34530.000..34530.000 rows=0 loops=1)
                    ->  Nested Loop  (cost=15.50..44933.62 rows=35397 width=4) (actual
time=8342.000..34520.000 rows=2173 loops=1)
                          Join Filter: (("inner".datetime_from <= "outer".datetime_to) AND
("inner".datetime_to >= "outer".datetime_from) AND ("outer".booking_date >=
("inner".reqt_date - 1)) AND ("outer".booking_date <= ("inner".reqt_date + 1)))
                          ->  Seq Scan on booking_plan bp  (cost=0.00..7646.08 rows=573416
width=24) (actual time=0.000..1053.000 rows=573416 loops=1)
                          ->  Materialize  (cost=15.50..15.53 rows=5 width=20) (actual
time=0.001..0.019 rows=26 loops=573416)
                                ->  Seq Scan on t_search_reqt_dates rd  (cost=0.00..15.50 rows=5
width=20) (actual time=0.000..0.000 rows=26 loops=1)
                                      Filter: (search_id = 13)
Total runtime: 35241.000 ms

If this is the only answer for now, then fair enough I will just have to do more testing.

Regards,
Gary.


pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: Odd planner choice?
Next
From: Josh Berkus
Date:
Subject: First set of OSDL Shared Mem scalability results, some wierdness ...