Re: Huge overestimation in rows expected results in bad plan - Mailing list pgsql-performance

From Tom Lane
Subject Re: Huge overestimation in rows expected results in bad plan
Date
Msg-id 20765.1289346912@sss.pgh.pa.us
In response to Re: Huge overestimation in rows expected results in bad plan  (bricklen <bricklen@gmail.com>)
Responses Re: Huge overestimation in rows expected results in bad plan  (bricklen <bricklen@gmail.com>)
List pgsql-performance
bricklen <bricklen@gmail.com> writes:
> On Tue, Nov 9, 2010 at 3:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The query doesn't seem to match the plan.  Where is that OR (c.id =
>> 38441828354::bigint) condition coming from?

> Ah sorry, I was testing it with and without that part. Here is the
> corrected query, with that as part of the join condition:

> explain analyze
> select c.id, c.transactionid, c.clickgenerated, c.confirmed,
> c.rejected, cr.rejectedreason
> from conversion c
> inner join conversionrejected cr on cr.idconversion = c.id or c.id = 38441828354
> where date = '2010-11-06'
> and idaction = 12906
> and idaffiliate = 198338
> order by transactionid;

Hm.  Well, the trouble with that query is that if there is any
conversion row with c.id = 38441828354, it will join to *every* row of
conversionrejected.  The planner not unreasonably assumes there will be
at least one such row, so it comes up with a join size estimate that's
at least the size of conversionrejected; and it also tends to favor a
seqscan since it thinks it's going to have to visit every row of
conversionrejected anyway.
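
As a rough illustration of that reasoning, the two counts below can be
compared. This is only a sketch using the table names and the literal
from the quoted query; it has not been run against the poster's schema.
Each conversion row that carries the literal id and also survives the
WHERE filters pairs with every row of conversionrejected, so the first
count is the number of join rows contributed by each such row.

```sql
-- Number of rows a single conversion row with the literal id joins to.
select count(*) as rejected_rows
from conversionrejected;

-- Number of conversion rows that actually carry the literal id.
-- If this is zero, the "or c.id = 38441828354" arm never fires.
select count(*) as matching_conversions
from conversion c
where c.id = 38441828354;
```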

If you have reason to think the c.id = 38441828354 test is usually dead
code, you might see if you can get rid of it, or at least rearrange the
query as a UNION of two independent joins.
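
A minimal sketch of that rearrangement follows, under a few stated
assumptions: it is untested against the poster's schema, it leaves the
unqualified columns (date, idaction, idaffiliate) exactly as in the
quoted query, and it uses UNION ALL with an IS DISTINCT FROM guard,
which is one possible way to return the same rows as the OR join
without double-counting pairs that satisfy both arms. A plain UNION
would also remove duplicates that the original query returns.

```sql
explain analyze
-- Arm 1: the ordinary equijoin, which the planner can estimate normally.
select c.id, c.transactionid, c.clickgenerated, c.confirmed,
       c.rejected, cr.rejectedreason
from conversion c
inner join conversionrejected cr on cr.idconversion = c.id
where date = '2010-11-06'
  and idaction = 12906
  and idaffiliate = 198338
union all
-- Arm 2: the special-case id paired with every conversionrejected row,
-- skipping pairs that arm 1 has already produced.
select c.id, c.transactionid, c.clickgenerated, c.confirmed,
       c.rejected, cr.rejectedreason
from conversion c
cross join conversionrejected cr
where c.id = 38441828354
  and cr.idconversion is distinct from c.id
  and date = '2010-11-06'
  and idaction = 12906
  and idaffiliate = 198338
order by transactionid;
```

With the arms separated, each join is estimated independently, so the
rarely matching special case need not push the main join's row estimate
up to the full size of conversionrejected.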

            regards, tom lane
