Re: 8.4.7, incorrect estimate - Mailing list pgsql-performance

From Wayne Conrad
Subject Re: 8.4.7, incorrect estimate
Date
Msg-id 4DBED871.4010306@yagni.com
Whole thread Raw
In response to Re: 8.4.7, incorrect estimate  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
On 05/02/11 08:11, Tom Lane wrote:
> Wayne Conrad<wconrad@yagni.com>  writes:
>> On 04/29/11 12:12, Kevin Grittner wrote:
>>> Out of curiosity, what do you get with?:
>>>
>>> explain analyze
>>> select
>>> page_number,
>>> ps_id,
>>> ps_page_id
>>> from ps_page p
>>> where exists
>>> (
>>> select * from documents_ps_page d
>>> where d.ps_page_id = p.ps_page_id
>>> and exists
>>> (select * from temp_document_ids t
>>> where t.document_id = d.document_id)
>>> )
>>> order by ps_page_id
>
>>    Merge Semi Join  (cost=186501.69..107938082.91 rows=29952777 width=12)
>> (actual time=242801.828..244572.318 rows=5 loops=1)
>>      Merge Cond: (p.ps_page_id = d.ps_page_id)
>>      ->   Index Scan using ps_page_pkey on ps_page p
>> (cost=0.00..2995637.47 rows=86141904 width=12) (actual
>> time=0.052..64140.510 rows=85401688 loops=1)
>>      ->   Index Scan using documents_ps_page_ps_page_id_idx on
>> documents_ps_page d  (cost=0.00..104384546.06 rows=37358320 width=4)
>> (actual time=161483.657..163254.131 rows=5 loops=1)
>>            Filter: (alternatives: SubPlan 1 or hashed SubPlan 2)
>>            SubPlan 1
>>              ->   Seq Scan on temp_doc_ids t  (cost=0.00..1.35 rows=1
>> width=0) (never executed)
>>                    Filter: (document_id = $0)
>>            SubPlan 2
>>              ->   Seq Scan on temp_doc_ids t  (cost=0.00..1.34 rows=5
>> width=35) (actual time=0.005..0.007 rows=5 loops=1)
>>    Total runtime: 244572.432 ms
>> (11 rows)
>
> [ pokes at that ... ] I think what you've got here is an oversight in
> the convert-EXISTS-to-semijoin logic: it pulls up the outer EXISTS but
> fails to recurse on it, which would be needed to convert the lower
> EXISTS into a semijoin as well, which is what's needed in order to get
> a non-bogus selectivity estimate for it.
>
> I'll take a look at fixing that, but not sure if it'll be reasonable to
> back-patch or not.  In the meantime, you need to look into restructuring
> the query to avoid nesting the EXISTS probes, if possible.
>
>             regards, tom lane
>

Tom,

Thanks for looking at this.  FYI, the same problem occurs when nesting
"where ... in (...)" (see start of thread, or I can repost it if you
want).  In any case, I can make the problem go away by using another
layer of temporary table to avoid the nesting.  That's what I'll do for now.

I'm not worried about back-patches to fix this in 8.4.  We'll be
upgrading this box to 9 at some point; we'll just pick up any fix when
it hits 9.

Best Regards,
Wayne Conrad

pgsql-performance by date:

Previous
From: "Kevin Grittner"
Date:
Subject: Re: 8.4.7, incorrect estimate
Next
From: Mark
Date:
Subject: Re: Query improvement