Re: Hash Join cost estimates - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: Hash Join cost estimates
Date
Msg-id 1364599244.1187.186.camel@sussancws0025
Whole thread Raw
In response to Re: Hash Join cost estimates  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Hash Join cost estimates  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers
On Fri, 2013-03-29 at 16:37 -0400, Tom Lane wrote: 
> Jeff Davis <pgsql@j-davis.com> writes:
> > Yes, I have run into this issue (or something very similar). I don't
> > understand why the bucketsize even matters much -- assuming few hash
> > collisions, we are not actually evaluating the quals any more times than
> > necessary. So why all of the hashjoin-specific logic in determining the
> > number of qual evaluations? The only reason I can think of is to model
> > the cost of comparing the hashes themselves.
> 
> I think the point is that there may *not* be few hash collisions ...

In Stephen's case the table was only 41KB, so something still seems off.
Maybe we should model the likelihood of a collision based on the
cardinalities (assuming a reasonably good hash function)?

Also, I think I found an important assumption that seems dubious (in
comment for estimate_hash_bucketsize()):

"If the other relation in the join has a key distribution similar to
this one's, then the most-loaded buckets are exactly those that will be
probed most often.  Therefore, the "average" bucket size for costing
purposes should really be taken as something close to the "worst case"
bucket size.  We try to estimate this by adjusting the fraction if there
are too few distinct data values, and then scaling up by the ratio of
the most common value's frequency to the average frequency."

But the key distribution is not necessarily similar at all... the large
table might have many more distinct values.

Stephen, do you think this could explain your problem?

Regards,Jeff Davis






pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: [COMMITTERS] pgsql: Add parallel pg_dump option.
Next
From: Amit Kapila
Date:
Subject: Re: Getting to 9.3 beta