Re: Wildly inaccurate query plan - Mailing list pgsql-performance

From Thom Brown
Subject Re: Wildly inaccurate query plan
Date
Msg-id AANLkTimD7onzq_mLk1bYdGoYW3yaS_iqNH53UqF_Uzyx@mail.gmail.com
Whole thread Raw
In response to Re: Wildly inaccurate query plan  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
On 28 May 2010 19:54, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thom Brown <thombrown@gmail.com> writes:
>> I get this:
>
>> Limit  (cost=0.00..316895.11 rows=400 width=211) (actual
>> time=3.880..1368.936 rows=400 loops=1)
>>    ->  GroupAggregate  (cost=0.00..41843621.95 rows=52817 width=211)
>> (actual time=3.872..1367.048 rows=400 loops=1)
>>          ->  Index Scan using "binaryID_2576_idx" on parts_2576
>> (cost=0.00..41683754.21 rows=10578624 width=211) (actual
>> time=0.284..130.756 rows=19954 loops=1)
>>                Index Cond: (("binaryID")::text >
>> '1082fa89fe499741b8271f9c92136f44'::text)
>>  Total runtime: 1370.140 ms
>
>> The first thing which strikes me is how the GroupAggregate step shows
>> it got the 400 rows which matches the limit, but it estimated 52,817
>> rows.  Shouldn't it have already known it would be 400?
>
> No.  Rowcount estimates are always in terms of what the node would emit
> if allowed to run to completion.  Likewise cost.  In this case both the
> indexscan and the groupagg are terminated early once they satisfy the
> limit.  The planner is expecting this which is why the estimated cost
> for the limit node is way less than those for its inputs.
>
> That looks like a perfectly reasonable plan from here, though it would
> probably not get chosen with a larger limit or no limit at all, since
> the ultimate costs are pretty large.  Essentially this is a fast-start
> plan rather than a lowest-total-cost plan, and that looks like the
> best bet for a small limit value.
>
>                        regards, tom lane

You're absolutely right, it's not chosen when without limit.  I see
what you mean though about terminating once it has enough rows.  It's
a shame I can't optimise it though as the real case that runs is with
a limit of 4000 which takes a long time to complete.

Thanks

Thom

pgsql-performance by date:

Previous
From: Greg Smith
Date:
Subject: Re: shared_buffers advice
Next
From: Scott Marlowe
Date:
Subject: Zeus IOPS