Hi,<br /><br />Please see the following plan:<br /><br />postgres=# explain select * from small_table left outer join
big_tableusing (id);<br /> QUERY PLAN <br
/>----------------------------------------------------------------------------<br/> Hash Left Join
(cost=126408.00..142436.98rows=371 width=12)<br /> Hash Cond: (<a href="http://small_table.id">small_table.id</a> =
<ahref="http://big_table.id">big_table.id</a>)<br /> -> Seq Scan on small_table (cost=0.00..1.09 rows=9
width=8)<br/> -> Hash (cost=59142.00..59142.00 rows=4100000 width=8)<br /> -> Seq Scan on
big_table (cost=0.00..59142.00 rows=4100000 width=8)<br />(5 rows)<br /><br />Here I have a puzzle, why not choose the
smalltable to build hash table? It can avoid multiple batches thus save significant I/O cost, isn't it? <br /><br />We
canperform this query in two phases: <br />1) inner join, using the small table to build hash table.<br />2) check
whethereach tuple in the hash table has matches before, which can be done with another flag bit<br /><br /> The only
compromiseis the output order, due to the two separate phases. Not sure whether the SQL standard requires it.<br /><br
/>Thanks,<br/>Li Jie<br /><br /><br />