Re: small table left outer join big table - Mailing list pgsql-hackers

From Gurjeet Singh
Subject Re: small table left outer join big table
Date
Msg-id AANLkTi=_==09etYitq0Aiysa2BPScWZcq4WSq=mwm-3t@mail.gmail.com
Whole thread Raw
In response to small table left outer join big table  (Jie Li <jay23jack@gmail.com>)
List pgsql-hackers
On Tue, Dec 28, 2010 at 5:13 AM, Jie Li <jay23jack@gmail.com> wrote:
Hi,

Please see the following plan:

postgres=# explain select * from small_table left outer join big_table using (id);
                                 QUERY PLAN                                
----------------------------------------------------------------------------
 Hash Left Join  (cost=126408.00..142436.98 rows=371 width=12)
   Hash Cond: (small_table.id = big_table.id)
   ->  Seq Scan on small_table  (cost=0.00..1.09 rows=9 width=8)
   ->  Hash  (cost=59142.00..59142.00 rows=4100000 width=8)
         ->  Seq Scan on big_table  (cost=0.00..59142.00 rows=4100000 width=8)
(5 rows)

Here I have a puzzle, why not choose the small table to build hash table? It can avoid multiple batches thus save significant I/O cost, isn't it?

We can perform this query in two phases:
1) inner join, using the small table to build hash table.
2) check whether each tuple in the hash table has matches before, which can be done with another flag bit

The only compromise is the output order, due to the two separate phases. Not sure whether the SQL standard requires it.


SQL standard does not require the result to be in any particular order unless an ORDER BY is used.

Regards,
--
gurjeet.singh
@ EnterpriseDB - The Enterprise Postgres Company
http://www.EnterpriseDB.com

singh.gurjeet@{ gmail | yahoo }.com
Twitter/Skype: singh_gurjeet

Mail sent from my BlackLaptop device

pgsql-hackers by date:

Previous
From: Andy Colson
Date:
Subject: page compression
Next
From: Tom Lane
Date:
Subject: Re: Libpq PGRES_COPY_BOTH - version compatibility