Re: Hash Join cost estimates - Mailing list pgsql-hackers

From: Stephen Frost
Subject: Re: Hash Join cost estimates
Date:
Msg-id: 20130330210702.GA4361@tamriel.snowman.net
In response to: Re: Hash Join cost estimates  (Jeff Davis <pgsql@j-davis.com>)
Responses: Re: Hash Join cost estimates  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
* Jeff Davis (pgsql@j-davis.com) wrote:
> In Stephen's case the table was only 41KB, so something still seems off.
> Maybe we should model the likelihood of a collision based on the
> cardinalities (assuming a reasonably good hash function)?

It's not really 'hash collisions' that we're trying to be wary of, per
se; it's the risk of duplicates.  To push this very far in the other
direction: if you have 41K copies of the *same value* in the small
table, then it's currently faster to build the hash table on the large
table and then seq scan the small table (10s vs. 14s on my laptop,
running on battery and therefore a bit slower than usual).

Now, that's a pretty ridiculous case, but it seems like we're being
pretty dumb here: for every input row from the outer table, we're
looking through all 41K *duplicate* keys in that one hash bucket.  This
goes back to the suggestion I just made: if the hash bucket list
contained only unique values (so that chain entries result only from
actual hash collisions), then we'd do just *one* comparison for each
input row of the outer table that doesn't match, and when we found one
which *did* match, we'd only need to step through the dup list for that
one entry and blast back all of those rows, skipping the rest of the
bucket, which we know can't match.
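
To make that concrete, here's a rough sketch of the sort of structure I
have in mind (purely illustrative: the struct and function names are
made up, and this is not the existing HashJoinTuple layout):

#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical bucket layout: the bucket chain holds one entry per
 * distinct key, and all duplicates of that key hang off the entry in
 * their own list.
 */
typedef struct DupTuple
{
    struct DupTuple *next;      /* next tuple with the same key */
    void       *tuple;          /* the stored inner tuple */
} DupTuple;

typedef struct BucketEntry
{
    struct BucketEntry *next;   /* next distinct key in this bucket */
    uint32_t    hashvalue;      /* hash of this entry's key */
    DupTuple   *dups;           /* every inner tuple with this key */
} BucketEntry;

/*
 * Probing costs at most one key comparison per *distinct* key in the
 * bucket: a non-matching outer row falls off the end of the chain, and
 * a matching one gets back its dup list and ignores the rest.
 */
static DupTuple *
probe_bucket(BucketEntry *bucket, uint32_t hashvalue, void *outerkey,
             bool (*keys_equal) (void *outerkey, void *innertuple))
{
    BucketEntry *e;

    for (e = bucket; e != NULL; e = e->next)
    {
        if (e->hashvalue == hashvalue &&
            keys_equal(outerkey, e->dups->tuple))
            return e->dups;     /* caller emits the whole dup list */
    }
    return NULL;                /* nothing in this bucket can match */
}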

> Also, I think I found an important assumption that seems dubious (in
> comment for estimate_hash_bucketsize()):

I've wondered about that also.  It certainly seems quite bogus that we
can end up with an 'estimated # of entries in a bucket' that's larger
than the number of entries we've found for the MCV in the table, let
alone *double* that.
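
To illustrate how that can happen, here's a toy version of the
adjustment as I read it: start from the average bucket load and scale
it up by mcvfreq/avgfreq.  The numbers and the exact formula are my own
simplification, not a quote of selfuncs.c:

#include <stdio.h>

int
main(void)
{
    double      ntuples = 1000000.0;    /* rows in the inner relation */
    double      ndistinct = 2048.0;     /* distinct join key values */
    double      nbuckets = 1024.0;      /* hash buckets */
    double      mcvfreq = 0.01;         /* frequency of the MCV */

    double      avgfreq = 1.0 / ndistinct;
    double      estfract = (ndistinct > nbuckets) ?
        1.0 / nbuckets : 1.0 / ndistinct;

    /* scale the average bucket load up by the MCV's skew */
    if (mcvfreq > avgfreq)
        estfract *= mcvfreq / avgfreq;

    printf("estimated entries per bucket: %.0f\n", estfract * ntuples); /* 20000 */
    printf("rows holding the MCV itself:  %.0f\n", mcvfreq * ntuples);  /* 10000 */

    /*
     * With ndistinct = 2 * nbuckets, the bucket estimate comes out at
     * exactly twice the MCV's own row count.
     */
    return 0;
}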

> Stephen, do you think this could explain your problem?

As I tried to articulate in my initial email: even if we had a *perfect*
answer to "how many comparisons will we need to do", the current costing
would still cause us to pick the plan that, intuitively and empirically,
takes longer (hashing the 41M-row table), because that comparison cost is
multiplied by the number of outer rows, and the cpu_tuple_cost (charged to
build the hash table) isn't high enough relative to cpu_operator_cost
(charged for the comparisons in the buckets).
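
As a rough illustration of that asymmetry (simplified formulas and a
made-up per-bucket estimate, using the default cpu_tuple_cost and
cpu_operator_cost settings, not the actual costsize.c arithmetic):

#include <stdio.h>

int
main(void)
{
    double      cpu_tuple_cost = 0.01;      /* default GUC value */
    double      cpu_operator_cost = 0.0025; /* default GUC value */

    double      small_rows = 41000.0;       /* the 41K-row table */
    double      big_rows = 41000000.0;      /* the 41M-row table */
    double      entries_per_bucket = 20.0;  /* made-up, same both ways */

    /* hash the small table, probe it with every row of the big one */
    double      hash_small = cpu_tuple_cost * small_rows +
        cpu_operator_cost * big_rows * entries_per_bucket;

    /* hash the big table, probe it with every row of the small one */
    double      hash_big = cpu_tuple_cost * big_rows +
        cpu_operator_cost * small_rows * entries_per_bucket;

    printf("hash small, probe with big: %.0f\n", hash_small);  /* ~2050410 */
    printf("hash big, probe with small: %.0f\n", hash_big);    /* ~412050 */

    /*
     * The probe term scales with the outer row count, so the per-bucket
     * comparison charge swamps the cheap per-row build charge and the
     * big table ends up on the hash side.
     */
    return 0;
}

At these defaults the big-table hash looks cheaper on paper even though
it's clearly slower in practice, which is the mismatch described above.
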
Thanks,
    Stephen
