Re: HashJoin order, hash the large or small table? Postgres likes to hash the big one, why? - Mailing list pgsql-performance

From Scott Carey
Subject Re: HashJoin order, hash the large or small table? Postgres likes to hash the big one, why?
Date
Msg-id D716AF17-DB9E-49D2-BD27-930E81FBDFF1@richrelevance.com
Whole thread Raw
Responses Re: HashJoin order, hash the large or small table? Postgres likes to hash the big one, why?
List pgsql-performance
Sorry for resurrecting this thread, but this has been in my outbox for months and I think it is important: On Oct 27, 2010, at 12:56 PM, Tom Lane wrote: > Scott Carey writes: > > Why does hashjoin behave poorly when the inner relation is not > > uniformly distributed and the outer is? > Because a poorly distributed inner relation leads to long hash chains. > In the very worst case, all the keys are on the same hash chain and it > degenerates to a nested-loop join. (There is an assumption in the > costing code that the longer hash chains also tend to get searched more > often, which maybe doesn't apply if the outer rel is flat, but it's not > obvious that it's safe to not assume that.) I disagree. Either 1: The estimator is wrong or 2: The hash data structure is flawed. A pathological skew case (all relations with the same key), should be _cheaper_ to probe. There should be only _one_ entry in the hash (for the one key), and that entry will be a list of all relations matching the key. Therefore, hash probes will either instantly fail to match on an empty bucket, fail to match the one key with one compare, or match the one key and join on the matching list. In particular for anti-join, high skew should be the best case scenario. A hash structure that allows multiple entries per key is inappropriate for skewed data, because it is not O(n). One that has one entry per key remains O(n) for all skew. Furthermore, the hash buckets and # of entries is proportional to n_distinct in this case, and smaller and more cache and memory friendly to probe. > Not really. It's still searching a long hash chain; maybe it will find > an exact match early in the chain, or maybe not. It's certainly not > *better* than antijoin with a well-distributed inner rel. There shouldn't be long hash chains. A good hash function + proper bucket count + one entry per key = no long chains. > Although the > point is moot, anyway, since if it's an antijoin there is only one > candidate for which rel to put on the outside. You can put either relation on the outside with an anti-join, but would need a different algorithm and cost estimator if done the other way around. Construct a hash on the join key, that keeps a list of relations per key, iterate over the other relation, and remove the key and corresponding list from the hash when there is a match, when complete the remaining items in the hash are the result of the join (also already grouped by the key). It could be terminated early if all entries are removed. This would be useful if the hash was small, the other side of the hash too large to fit in memory, and alternative was a massive sort on the other relation. Does the hash cost estimator bias towards smaller hashes due to hash probe cost increasing with hash size due to processor caching effects? Its not quite O(n) due to caching effects. > regards, tom lane

pgsql-performance by date:

Previous
From: Scott Carey
Date:
Subject: Re: Linux: more cores = less concurrency.
Next
From: Scott Carey
Date:
Subject: Re: HashJoin order, hash the large or small table? Postgres likes to hash the big one, why?