Re: New hashed IN code ignores distinctiveness of subquery - Mailing list pgsql-bugs

From Bradley Baetz
Subject Re: New hashed IN code ignores distinctiveness of subquery
Date
Msg-id 20030127032257.GA3391@mango.home
Whole thread Raw
In response to Re: New hashed IN code ignores distinctiveness of subquery  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: New hashed IN code ignores distinctiveness of subquery  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
On Sun, Jan 26, 2003 at 09:43:18PM -0500, Tom Lane wrote:

> We're already checking that as a separate plan alternative.  The
> implementation could be improved a little, though --- we could combine
> the uniq-ification into the Hash node.

Right, or skip it entirely when selecting stuff with unique constraints.

> AFAICS there are two or three different concepts of selectivity being
> tossed about here.

<snip>

Ah, OK. Yeah, I think I was confusing myself there.

>
> It might be that we have to bite the bullet and set up a different
> selectivity estimation procedure for IN.  I'd prefer to avoid that
> but haven't really figured out if it's necessary or not.

I don't think it is. The number of rows is correct if you do product_id
IN (1) vs product_id IN (1,2) vs product_id IN (1,2,3) and so on. The
only difference for a subselect is that the number of values generated
is only known approximately, rather than exactly, but I don't think that
thats relevent.

> In the meantime, I think you are right that it's bogus that
> set_joinrel_size_estimates uses different equations for JOIN_IN and
> JOIN_UNIQUE_INNER.  Whatever the decision is about how to do the
> estimation, these should be done the same way.

Hmm. So, I made that change, and now get an estimate of 50218 rows for
the join result, which is about right (although capping the result to
have a maximum of the number of input rows is valid for an inner join),
but the cost for the hashjoin is still 4281586.50. cost_hashjoin
probably needs to be taught about the short circuiting done for _IN, but
I'm not sure where to do that from a quick glance through the code.

What is the point of JOIN_UNIQUE_{INNER,OUTER}, though? What does it do
that JOIN_IN doesn't? executor/* doesn't appear to use it.

>             regards, tom lane

Bradley

pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: New hashed IN code ignores distinctiveness of subquery
Next
From: Tom Lane
Date:
Subject: Re: New hashed IN code ignores distinctiveness of subquery