Re: HashJoin w/option to unique-ify inner rel - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: HashJoin w/option to unique-ify inner rel
Date	April 24, 2009 23:49:32
Msg-id	24756.1240627763@sss.pgh.pa.us
In response to	Re: HashJoin w/option to unique-ify inner rel (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: HashJoin w/option to unique-ify inner rel; Re: HashJoin w/option to unique-ify inner rel
List	pgsql-hackers

Robert Haas <robertmhaas@gmail.com> writes:
> As far as I can tell, the focus on trying to estimate the number of
> tuples per bucket is entirely misguided.  Supposing the relation is
> mostly unique so that the values don't cluster too much, the right
> answer is (of course) NTUP_PER_BUCKET.

But the entire point of that code is to arrive at a sane estimate
when the inner relation *isn't* mostly unique and *does* cluster.
So I think you're being much too hasty to conclude that it's wrong.
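To make the disagreement concrete, here is a toy simulation. It is not PostgreSQL's estimator; it only counts how many bucket entries an average probe walks. It assumes a target load of NTUP_PER_BUCKET = 10, sizes the table as len(inner) // NTUP_PER_BUCKET buckets, relies on Python's built-in hash() mapping a small non-negative int to itself, and assumes the outer side has the same key distribution as the inner side. The function name avg_bucket_entries_per_probe is hypothetical.

```python
from collections import Counter

NTUP_PER_BUCKET = 10  # assumed target load per bucket (illustrative value)

def avg_bucket_entries_per_probe(inner_keys, probe_keys):
    """Average number of inner entries in the bucket each probe lands in."""
    nbuckets = max(1, len(inner_keys) // NTUP_PER_BUCKET)
    # How many inner entries fall into each bucket.
    bucket_sizes = Counter(hash(k) % nbuckets for k in inner_keys)
    # Each probe walks the whole bucket its key hashes to.
    total = sum(bucket_sizes[hash(k) % nbuckets] for k in probe_keys)
    return total / len(probe_keys)

unique = list(range(10_000))
clustered = [0] * 5_000 + list(range(1, 5_001))  # half the rows share key 0

print(avg_bucket_entries_per_probe(unique, unique))        # 10.0
print(avg_bucket_entries_per_probe(clustered, clustered))  # 2507.5
```

With unique keys every probe walks exactly NTUP_PER_BUCKET entries, which is the "mostly unique" case. With half the inner rows sharing one key, the average walk grows by two orders of magnitude, which is the clustering case the estimate is meant to catch.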

> Because the extra tuples that get thrown into the bucket
> generally don't have the same hash value (or if they did, they would
> have been in the bucket either way...) and get rejected with a simple
> integer comparison, which is much cheaper than
> hash_qual_cost.per_tuple.
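The cheap rejection described in the quote can be sketched as a bucket walk that stores each inner key's full hash value next to it and compares hash values before running the join qual. This is a hypothetical Python illustration, not executor code: toy_hash, build_buckets and probe_bucket are made-up names, and toy_hash is deliberately weak so that distinct keys collide.

```python
def toy_hash(key):
    # Deliberately weak 4-bit "hash" so that distinct keys collide.
    return key % 16

def build_buckets(keys, nbuckets):
    # Each entry keeps its full hash value alongside the key.
    buckets = [[] for _ in range(nbuckets)]
    for key in keys:
        h = toy_hash(key)
        buckets[h % nbuckets].append((h, key))
    return buckets

def probe_bucket(buckets, probe_key, qual):
    """Walk one bucket; return (matches, hash_rejects, qual_calls)."""
    h = toy_hash(probe_key)
    matches, hash_rejects, qual_calls = [], 0, 0
    for stored_hash, key in buckets[h % len(buckets)]:
        if stored_hash != h:
            hash_rejects += 1  # cheap integer comparison; qual never runs
            continue
        qual_calls += 1  # equal hash values: true match or a collision
        if qual(key, probe_key):
            matches.append(key)
    return matches, hash_rejects, qual_calls

buckets = build_buckets([1, 5, 9, 17, 33], nbuckets=4)
print(probe_bucket(buckets, 1, lambda a, b: a == b))  # ([1], 2, 3)
```

All five keys land in the same bucket, but only three reach the qual: the one true match plus 17 and 33, whose hash value equals the probe's. Those collisions are exactly the entries a cost estimate cannot assume away.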

Yeah, we are charging more than we ought to for bucket entries that can
be rejected on the basis of hashcode comparisons.  The difficulty is to
arrive at a reasonable guess of what fraction of the bucket entries will
be so rejected, versus those that will incur a comparison-function call.
I'm leery of assuming there are no hash collisions, which is what you
seem to be proposing.
        regards, tom lane
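The trade-off can be written as a per-probe cost in which the collision rate is an explicit parameter. In this toy linear model, collision_frac is a hypothetical name for the fraction of non-matching bucket entries whose hash value equals the probe's. Charging every bucket entry a full qual evaluation corresponds roughly to collision_frac = 1 with a free hash compare; assuming no collisions corresponds to collision_frac = 0. The names and the model are assumptions for illustration, not PostgreSQL's cost code.

```python
def probe_cost(bucket_size, true_matches, collision_frac, qual_cost, hash_cmp_cost):
    """Estimated cost of walking one bucket for one outer tuple (toy model)."""
    non_matches = bucket_size - true_matches
    # True matches always reach the qual; only colliding non-matches do too.
    qual_calls = true_matches + collision_frac * non_matches
    # Every entry pays the cheap hash-value comparison.
    return bucket_size * hash_cmp_cost + qual_calls * qual_cost

# 10 entries in the bucket, 1 true match, qual cost normalized to 1.0.
print(probe_cost(10, 1, 1.0, 1.0, 0.0))  # charge every entry: 10.0
print(probe_cost(10, 1, 0.0, 1.0, 0.0))  # assume no collisions: 1.0
print(probe_cost(10, 1, 0.5, 1.0, 0.0))  # half the non-matches collide: 5.5
```

The estimate moves by a factor of ten depending on collision_frac alone, which is why guessing that fraction, rather than writing the formula, is the hard part.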
