HashJoin order, hash the large or small table? Postgres likes to hash the big one, why? - Mailing list pgsql-performance

From Scott Carey
Subject HashJoin order, hash the large or small table? Postgres likes to hash the big one, why?
Date
Msg-id A82128A6-4E3B-43BD-858D-21B129F7BEEB@richrelevance.com
Whole thread Raw
Responses Re: HashJoin order, hash the large or small table? Postgres likes to hash the big one, why?  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: HashJoin order, hash the large or small table? Postgres likes to hash the big one, why?  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-performance
8.4.5

I consistently see HashJoin plans that hash the large table, and scan the small table.  This is especially puzzling in
somecases where I have 30M rows in the big table and ~ 100 in the small... shouldn't it hash the small table and scan
thebig one? 

Here is one case I saw just recently

               Hash Cond: ((a.e_id)::text = (ta.name)::text)
               ->  Index Scan using c_a_s_e_id on a  (cost=0.00..8.21 rows=14 width=27)
                     Index Cond: (id = 12)
               ->  Hash  (cost=89126.79..89126.79 rows=4825695 width=74)
                     ->  Seq Scan on p_a_1287446030 tmp  (cost=0.00..89126.79 rows=4825695 width=74)
                           Filter: (id = 12)

Does this ever make sense?  Isn't it always better to hash the smaller side of the join, or at least predominantly so?
Maybeif  you want the order of elements returning from the join to coincide with the order of the outer part of the
joinfor a join higher up the plan tree.  in this specific case, I want the order to be based on the larger table for
thejoin higher up (not shown) in the plan so that its index scan is in the order that tmp already is. 

Certainly, for very small hash tables (< 1000 entries) the cache effects strongly favor small tables -- the lookup
shouldbe very cheap.  Building a very large hash is not cheap, and wastes lots of memory.  I suppose at very large
sizessomething else might come into play that favors hashing the bigger table, but I can't think of what that would be
forthe general case. 

Any ideas?  I've seen this with dozens of queries, some simple, some with 5 or 6 tables and joins.  I even tried making
work_memvery small in a 30M row to 500 row join, and it STILL hashed the big table.  At first I thought that I was
readingthe plan wrong, but google suggests its doing what it looks like its doing.  Perhaps this is a bug? 

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: unexpected query failure: ERROR: GIN indexes do not support whole-index scans
Next
From: Jon Nelson
Date:
Subject: Re: unexpected query failure: ERROR: GIN indexes do not support whole-index scans