Re: HashJoin w/option to unique-ify inner rel - Mailing list pgsql-hackers

From Robert Haas
Subject Re: HashJoin w/option to unique-ify inner rel
Date
Msg-id 603c8f070904151954v716a809dm1510da5a8a35154d@mail.gmail.com
In response to HashJoin w/option to unique-ify inner rel  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: HashJoin w/option to unique-ify inner rel
Re: HashJoin w/option to unique-ify inner rel
List pgsql-hackers
On Sun, Apr 12, 2009 at 12:00 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Feb 19, 2009 at 5:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Greg Stark <stark@enterprisedb.com> writes:
>>> It's tempting to have Hash cheat and just peek at the node beneath it
>>> to see if it's a HashAggregate, in which case it could call a special
>>> method to request the whole hash. But it would have to know that it's
>>> just a plain uniquify and not implementing a GROUP BY.
>>
>> More to the point, it would have to check if it's unique-ifying on the
>> same columns and with the same operators as the upper hash is using.
>> If we were going to do something like this, making it a real option to
>> the Hash node and teaching the planner about that would be *much*
>> easier, and would also allow saner cost estimation.
>
> I've been thinking about this some more and would like someone else to
> check my work.
>
> The goal here is to handle a semi-join implemented as a hash join by
> unique-ifying the inner rel as we are building the hash table, rather
> than as a separate step. [long discussion of implementation methodology]
>
> Does anyone see a hole in this logic?

Upon further review, it appears that a big part of this problem is
that cost_hashjoin() doesn't understand that it needs to cost
semi-joins differently from inner or left joins.  The bogus logic
looks to be right here:
   /*
    * The number of tuple comparisons needed is the number of outer tuples
    * times the typical number of tuples in a hash bucket, which is the inner
    * relation size times its bucketsize fraction.  At each one, we need to
    * evaluate the hashjoin quals.  But actually, charging the full qual eval
    * cost at each tuple is pessimistic, since we don't evaluate the quals
    * unless the hash values match exactly.  For lack of a better idea, halve
    * the cost estimate to allow for that.
    */
   startup_cost += hash_qual_cost.startup;
   run_cost += hash_qual_cost.per_tuple * outer_path_rows *
       clamp_row_est(inner_path_rows * innerbucketsize) * 0.5;

Of course, when the join type is JOIN_SEMI, we're going to stop
looking after we find the first match, so this estimate is really far
off.

rhaas=# explain analyze select * from a where i in (select b.i from b);
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=174.72..2095.53 rows=10 width=4) (actual time=33.274..285.627 rows=10 loops=1)
   Hash Cond: (a.i = b.i)
   ->  Seq Scan on a  (cost=0.00..1443.00 rows=100000 width=4) (actual time=0.023..125.307 rows=100000 loops=1)
   ->  Hash  (cost=174.60..174.60 rows=10 width=4) (actual time=33.209..33.209 rows=10 loops=1)
         ->  HashAggregate  (cost=174.50..174.60 rows=10 width=4) (actual time=33.161..33.175 rows=10 loops=1)
               ->  Seq Scan on b  (cost=0.00..148.80 rows=10280 width=4) (actual time=0.015..15.304 rows=10280 loops=1)
 Total runtime: 285.743 ms
(7 rows)
rhaas=# set enable_hashagg to false; set enable_sort to false;
SET
SET
rhaas=# explain analyze select * from a where i in (select b.i from b);
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Hash Semi Join  (cost=277.30..130573.10 rows=10 width=4) (actual time=31.447..292.823 rows=10 loops=1)
   Hash Cond: (a.i = b.i)
   ->  Seq Scan on a  (cost=0.00..1443.00 rows=100000 width=4) (actual time=0.022..125.300 rows=100000 loops=1)
   ->  Hash  (cost=148.80..148.80 rows=10280 width=4) (actual time=31.392..31.392 rows=10280 loops=1)
         ->  Seq Scan on b  (cost=0.00..148.80 rows=10280 width=4) (actual time=0.014..14.958 rows=10280 loops=1)
 Total runtime: 293.154 ms
(6 rows)

The planner costs the semi-join as two orders of magnitude more
expensive than the hash-join-over-hash-aggregate, but in reality the
semi-join is only marginally slower.  The planner thinks we're going
to end up wasting a lot of time walking down long hash chains, but
that's just not true: b contains lots of duplicates, but they all hash
to the same buckets, so when an a-value hashes to one of those buckets
it's probably because we've got a match, and because it's a semi-join,
finding one match is a sufficient excuse to skip the rest of the
bucket.

...Robert

