Re: HashJoin order, hash the large or small table? Postgres likes to hash the big one, why? - Mailing list pgsql-performance

From: Scott Carey
Subject: Re: HashJoin order, hash the large or small table? Postgres likes to hash the big one, why?
Msg-id: 56326542-EFD1-493D-BD35-CD67FB004CF4@richrelevance.com
In response to: Re: HashJoin order, hash the large or small table? Postgres likes to hash the big one, why?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: HashJoin order, hash the large or small table? Postgres likes to hash the big one, why?  (Mladen Gogala <mladen.gogala@vmsinfo.com>)
List: pgsql-performance
On Oct 18, 2010, at 8:43 PM, Tom Lane wrote:

> Scott Carey <scott@richrelevance.com> writes:
>> I consistently see HashJoin plans that hash the large table, and scan
>> the small table.
>
> Could we see a self-contained test case?  And what cost parameters are
> you using, especially work_mem?

I'll see if I can make a test case.

Tough to do, since I catch these on a production machine when a query is taking a long time, and some of the tables are transient.
work_mem is 800MB; I tried 128K and it didn't matter.  It will switch to merge join at some point, but not to a smaller hash, and the merge join is definitely slower.

>
>> This is especially puzzling in some cases where I have 30M rows in the big table and ~100 in the small... shouldn't
>> it hash the small table and scan the big one?
>
> Well, size of the table isn't the only factor; in particular, a highly
> nonuniform distribution of the key value will inflate the cost estimate
> for using a table on the inner side of the hash.  But the example you
> show here seems a bit extreme.
>

In the case today, the index scan returns unique values, and the large table has only a little skew on the join key.

Another case I ran into a few weeks ago is odd.
(8.4.3 this time)

rr=> explain INSERT INTO pav2 (p_id, a_id, values, last_updated, s_id)
     SELECT av.p_id, av.a_id, av.values, av.last_updated, a.s_id
     FROM pav av, attr a WHERE av.a_id = a.id;
                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------
 Hash Join  (cost=2946093.92..631471410.73 rows=1342587125 width=69)
   Hash Cond: (a.id = av.a_id)
   ->  Seq Scan on attr a  (cost=0.00..275.21 rows=20241 width=8)
   ->  Hash  (cost=1200493.44..1200493.44 rows=70707864 width=65)
         ->  Seq Scan on pav av  (cost=0.00..1200493.44 rows=70707864 width=65)

If the cost to hash is 1200493, and it needs to probe the hash 20241 times, why would the total cost be 631471410?  The
cost to probe can't be that big!  A cost of 500 to probe and join?
Why favor hashing the large table and probing with the small values rather than the other way around?
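For anyone following along, the build/probe asymmetry can be sketched in a few lines of Python (a toy in-memory illustration, not PostgreSQL's actual executor; the table and column names are made up to echo the plan above):

```python
# Toy hash join sketch (illustration only, not PostgreSQL's implementation).
# The "build" side is loaded into a hash table keyed on the join column;
# the "probe" side is scanned once, looking up each key.
from collections import defaultdict

def hash_join(build_rows, probe_rows, build_key, probe_key):
    # Build phase: memory use is proportional to the build side's size,
    # which is why hashing the *small* table is usually preferable.
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)
    # Probe phase: one lookup per probe-side row; each match is emitted.
    out = []
    for row in probe_rows:
        for match in table.get(row[probe_key], []):
            out.append({**match, **row})
    return out

# Hypothetical tables echoing the shapes in the plan above (names invented):
attr = [{"id": i, "s_id": i * 10} for i in range(3)]    # small side
pav = [{"a_id": p % 3, "p_id": p} for p in range(9)]    # large side

# Hash the small table, probe with the large one:
rows = hash_join(attr, pav, "id", "a_id")
```

Either join order produces the same rows; what differs is which side has to fit in a hash table in work_mem.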

In this case, I turned enable_mergejoin off in a test because it was deciding to sort the 70M rows instead of hashing
the 20k rows and scanning the 70M, and then got this 'backwards' hash.  The merge join is much slower, but its cost
estimate is much lower, and no combination of cost parameters will make it switch (both estimates are affected up and
down similarly by the cost parameters).

Both tables analyzed, etc.  One of them (the big one) is a bulk operation staging table with no indexes, but it is
analyzed.  The (av.p_id, av.a_id) pair is unique in it, and a.id is unique (primary key).  The plan above thinks it is
going to match 20 times on average (but it actually matches only 1 -- it's a PK join).  av.a_id is somewhat skewed, but
that is irrelevant if it always matches one.  Even if it did match 20 on average, is it worse to probe a hash table 70M
times and retrieve 20 matches each time than to probe 20k times and retrieve 70000 matches each time?  It's the same
number of hash function calls and comparisons, but a different memory profile.
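A back-of-envelope check of that symmetry claim (row counts and widths taken from the plan above; the 20-matches-per-probe figure is the planner's estimate quoted earlier, and "70000 matches" is the approximate figure for the reverse order):

```python
# Either join order emits roughly the same number of rows, but the
# hash-table memory footprint differs by orders of magnitude.
big_rows, big_width = 70_707_864, 65    # pav: rows and width from the plan
small_rows, small_width = 20_241, 8     # attr: rows and width from the plan

emitted_big_probe = big_rows * 20            # probe big side: ~20 matches/probe
emitted_small_probe = small_rows * 70_000    # probe small side: ~70k matches/probe

# Hash-table row payload if we hash the big table vs the small one:
mem_hash_big = big_rows * big_width      # roughly 4.6 GB of row data
mem_hash_small = small_rows * small_width  # roughly 160 kB of row data
```

So the comparison work is essentially identical either way; only the memory side of the ledger changes, and it changes by a factor of tens of thousands.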




>             regards, tom lane

