Re: BUG #6668: hashjoin cost problem - Mailing list pgsql-bugs

From Tom Lane
Subject Re: BUG #6668: hashjoin cost problem
Date
Msg-id 23375.1338504481@sss.pgh.pa.us
Whole thread Raw
In response to Re: BUG #6668: hashjoin cost problem  (Postgres User <postgresuser@yahoo.com>)
List pgsql-bugs
Postgres User <postgresuser@yahoo.com> writes:
> Why is cost_hashjoin estimating 50 billion tuple comparisons for 10K rows of output though?

Well, if it hashes the smaller table, there's 100 million rows on the
outside, and each of them will probe one hash chain in the hash table.
If you're unlucky, each of those probes will hit a populated hash chain
with at least 1000 entries, leading to 100 billion comparisons.  I think
it might derate that worst-case by a factor of 2.  Now if you're lucky,
a lot of the outer tuples hit unpopulated hash chains and so the number
of comparisons is a lot less --- but in non-artificial examples,
that's not a very good bet to make.  The conservative assumption is that
both sides of the join have similar key distributions, so that the more
populated hash chains are also more likely to be probed.  The cost
estimate is therefore designed to discriminate against using an inner
relation with a non-flat distribution.

            regards, tom lane

pgsql-bugs by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: BUG #6666: pg_upgrade 9.2beta1 plpython/plpython2
Next
From: Tom Lane
Date:
Subject: Re: BUG #6671: Killed restore command causes postmaster to exit