Re: HashJoin w/option to unique-ify inner rel - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: HashJoin w/option to unique-ify inner rel |
Date | |
Msg-id | 603c8f070904151954v716a809dm1510da5a8a35154d@mail.gmail.com |
In response to | HashJoin w/option to unique-ify inner rel (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: HashJoin w/option to unique-ify inner rel, Re: HashJoin w/option to unique-ify inner rel |
List | pgsql-hackers |
On Sun, Apr 12, 2009 at 12:00 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Feb 19, 2009 at 5:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Greg Stark <stark@enterprisedb.com> writes:
>>> It's tempting to have Hash cheat and just peek at the node beneath it
>>> to see if it's a HashAggregate, in which case it could call a special
>>> method to request the whole hash. But it would have to know that it's
>>> just a plain uniquify and not implementing a GROUP BY.
>>
>> More to the point, it would have to check if it's unique-ifying on the
>> same columns and with the same operators as the upper hash is using.
>> If we were going to do something like this, making it a real option to
>> the Hash node and teaching the planner about that would be *much*
>> easier, and would also allow saner cost estimation.
>
> I've been thinking about this some more and would like someone else to
> check my work.
>
> The goal here is to handle a semi-join implemented as a hash join by
> unique-ifying the inner rel as we are building the hash table, rather
> than as a separate step. [long discussion of implementation methodology]
>
> Does anyone see a hole in this logic?

Upon further review, it appears that a big part of this problem is that
cost_hashjoin() doesn't understand that it needs to cost semi-joins
differently from inner or left joins.  The bogus logic looks to be right
here:

    /*
     * The number of tuple comparisons needed is the number of outer tuples
     * times the typical number of tuples in a hash bucket, which is the inner
     * relation size times its bucketsize fraction.  At each one, we need to
     * evaluate the hashjoin quals.  But actually, charging the full qual eval
     * cost at each tuple is pessimistic, since we don't evaluate the quals
     * unless the hash values match exactly.  For lack of a better idea, halve
     * the cost estimate to allow for that.
     */
    startup_cost += hash_qual_cost.startup;
    run_cost += hash_qual_cost.per_tuple *
        outer_path_rows * clamp_row_est(inner_path_rows * innerbucketsize) * 0.5;

Of course, when the join type is JOIN_SEMI, we're going to stop looking
after we find the first match, so this estimate is really far off.
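The post doesn't include the table definitions, but the row counts in the
plans below imply that a holds 100000 distinct values while b holds 10280
rows spread over only about 10 distinct values. Purely as a hypothetical
reproduction (the column name and data distribution here are guesses, not
the original setup), something along these lines produces the same shape
of problem:

    -- Hypothetical setup; the original DDL is not shown in the post.
    -- a: 100000 distinct values; b: 10280 rows over 10 distinct values.
    create table a (i integer);
    create table b (i integer);
    insert into a select g from generate_series(1, 100000) g;
    insert into b select (g % 10) + 1 from generate_series(1, 10280) g;
    analyze a;
    analyze b;

With that in place, the two EXPLAIN ANALYZE runs below can be compared
directly.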
rhaas=# explain analyze select * from a where i in (select b.i from b);
                                                       QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=174.72..2095.53 rows=10 width=4) (actual time=33.274..285.627 rows=10 loops=1)
   Hash Cond: (a.i = b.i)
   ->  Seq Scan on a  (cost=0.00..1443.00 rows=100000 width=4) (actual time=0.023..125.307 rows=100000 loops=1)
   ->  Hash  (cost=174.60..174.60 rows=10 width=4) (actual time=33.209..33.209 rows=10 loops=1)
         ->  HashAggregate  (cost=174.50..174.60 rows=10 width=4) (actual time=33.161..33.175 rows=10 loops=1)
               ->  Seq Scan on b  (cost=0.00..148.80 rows=10280 width=4) (actual time=0.015..15.304 rows=10280 loops=1)
 Total runtime: 285.743 ms
(7 rows)

rhaas=# set enable_hashagg to false; set enable_sort to false;
SET
SET
rhaas=# explain analyze select * from a where i in (select b.i from b);
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Hash Semi Join  (cost=277.30..130573.10 rows=10 width=4) (actual time=31.447..292.823 rows=10 loops=1)
   Hash Cond: (a.i = b.i)
   ->  Seq Scan on a  (cost=0.00..1443.00 rows=100000 width=4) (actual time=0.022..125.300 rows=100000 loops=1)
   ->  Hash  (cost=148.80..148.80 rows=10280 width=4) (actual time=31.392..31.392 rows=10280 loops=1)
         ->  Seq Scan on b  (cost=0.00..148.80 rows=10280 width=4) (actual time=0.014..14.958 rows=10280 loops=1)
 Total runtime: 293.154 ms
(6 rows)

The planner costs the semi-join as two orders of magnitude more expensive
than the hash-join-over-hash-aggregate, but in reality the semi-join is
only marginally slower.  The planner thinks we're going to end up wasting
a lot of time walking down long hash chains and it's just not true.  b
contains lots of duplicates but they all hash to the same buckets, and
when an a-value hashes to that bucket it's probably because we've got a
match, and because it's a semi-join, finding one match is a sufficient
excuse to skip the rest of the bucket.

...Robert
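As a back-of-the-envelope check (assuming the default cpu_operator_cost of
0.0025 and an innerbucketsize of roughly 0.1, since b has about 10 distinct
values; these inputs are inferred, not taken from planner internals), the
bucket-walk term quoted from cost_hashjoin() above can be evaluated directly:

    -- hash_qual_cost.per_tuple * outer_rows * (inner_rows * bucketsize) * 0.5,
    -- with assumed values 0.0025, 100000, 10280 and 0.1:
    select 0.0025 * 100000 * (10280 * 0.1) * 0.5 as bucket_walk_charge;
    -- ~128500

Roughly 128,500 of the 130573.10 estimate, and so most of the gap between the
two plans, comes from charging every outer tuple for half a bucket's worth of
comparisons, which is exactly the work a semi-join mostly skips.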