On Sun, Jan 26, 2003 at 09:43:18PM -0500, Tom Lane wrote:
> We're already checking that as a separate plan alternative. The
> implementation could be improved a little, though --- we could combine
> the uniq-ification into the Hash node.
Right, or skip it entirely when selecting stuff with unique constraints.
> AFAICS there are two or three different concepts of selectivity being
> tossed about here.
<snip>
Ah, OK. Yeah, I think I was confusing myself there.
>
> It might be that we have to bite the bullet and set up a different
> selectivity estimation procedure for IN. I'd prefer to avoid that
> but haven't really figured out if it's necessary or not.
I don't think it is. The number of rows is estimated correctly if you do
product_id IN (1) vs product_id IN (1,2) vs product_id IN (1,2,3) and so
on. The only difference for a subselect is that the number of values
produced is known only approximately, rather than exactly, but I don't
think that's relevant.
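To make the scaling concrete: a toy model of the estimate, assuming a
uniform distribution over num_distinct values and treating the IN list as
an OR of equality clauses whose selectivities add (the function names here
are made up for illustration, this is not PostgreSQL's actual estimator):

```python
def in_list_selectivity(n_values, num_distinct):
    # Each equality clause matches roughly 1/num_distinct of the rows;
    # for distinct constants the fractions simply add, capped at 1.0.
    return min(1.0, n_values / num_distinct)

def estimated_rows(total_rows, n_values, num_distinct):
    # Row estimate grows linearly with the length of the IN list
    # until the cap kicks in.
    return total_rows * in_list_selectivity(n_values, num_distinct)
```

With 1000 rows and 100 distinct product_ids, IN (1) estimates 10 rows and
IN (1,2,3) estimates 30, which is the linear behaviour described above; a
subselect just makes n_values approximate.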
> In the meantime, I think you are right that it's bogus that
> set_joinrel_size_estimates uses different equations for JOIN_IN and
> JOIN_UNIQUE_INNER. Whatever the decision is about how to do the
> estimation, these should be done the same way.
Hmm. So, I made that change, and now get an estimate of 50218 rows for
the join result, which is about right (although capping the result at the
number of input rows is valid for an inner join), but the cost for the
hashjoin is still 4281586.50. cost_hashjoin probably needs to be taught
about the short-circuiting done for _IN, but from a quick glance through
the code I'm not sure where that should happen.
What is the point of JOIN_UNIQUE_{INNER,OUTER}, though? What does it do
that JOIN_IN doesn't? Nothing in executor/* appears to use it.
> regards, tom lane
Bradley