Thread: small table left outer join big table

small table left outer join big table

From

Jie Li

Date:

28 December 2010, 10:34:03

Hi,<br /><br />Please see the following plan:<br /><br />postgres=# explain select * from small_table left outer join
big_tableusing (id);<br />                                 QUERY PLAN                                 <br
/>----------------------------------------------------------------------------<br/>  Hash Left Join 
(cost=126408.00..142436.98rows=371 width=12)<br />   Hash Cond: (<a href="http://small_table.id">small_table.id</a> =
<ahref="http://big_table.id">big_table.id</a>)<br />   ->  Seq Scan on small_table  (cost=0.00..1.09 rows=9
width=8)<br/>    ->  Hash  (cost=59142.00..59142.00 rows=4100000 width=8)<br />         ->  Seq Scan on
big_table (cost=0.00..59142.00 rows=4100000 width=8)<br />(5 rows)<br /><br />Here I have a puzzle, why not choose the
smalltable to build hash table? It can avoid multiple batches thus save significant I/O cost, isn't it? <br /><br />We
canperform this query in two phases: <br />1) inner join, using the small table to build hash table.<br />2) check
whethereach tuple in the hash table has matches before, which can be done with another flag bit<br /><br /> The only
compromiseis the output order, due to the two separate phases. Not sure whether the SQL standard requires it.<br /><br
/>Thanks,<br/>Li Jie<br /><br /><br />

Re: small table left outer join big table

From

Gurjeet Singh

Date:

28 December 2010, 11:10:19

On Tue, Dec 28, 2010 at 5:13 AM, Jie Li <jay23jack@gmail.com> wrote:

Hi,

Please see the following plan:

postgres=# explain select * from small_table left outer join big_table using (id);
                                 QUERY PLAN
----------------------------------------------------------------------------
Hash Left Join (cost=126408.00..142436.98 rows=371 width=12)
   Hash Cond: (small_table.id = big_table.id)
   -> Seq Scan on small_table (cost=0.00..1.09 rows=9 width=8)
   -> Hash (cost=59142.00..59142.00 rows=4100000 width=8)
         -> Seq Scan on big_table (cost=0.00..59142.00 rows=4100000 width=8)
(5 rows)

Here I have a puzzle, why not choose the small table to build hash table? It can avoid multiple batches thus save significant I/O cost, isn't it?

We can perform this query in two phases:
1) inner join, using the small table to build hash table.
2) check whether each tuple in the hash table has matches before, which can be done with another flag bit

The only compromise is the output order, due to the two separate phases. Not sure whether the SQL standard requires it.

SQL standard does not require the result to be in any particular order unless an ORDER BY is used.

Regards,

--
gurjeet.singh
@ EnterpriseDB - The Enterprise Postgres Company
http://www.EnterpriseDB.com

singh.gurjeet@{ gmail | yahoo }.com
Twitter/Skype: singh_gurjeet

Mail sent from my BlackLaptop device

Re: small table left outer join big table

From

Robert Haas

Date:

29 December 2010, 08:17:35

On Tue, Dec 28, 2010 at 5:13 AM, Jie Li <jay23jack@gmail.com> wrote:
> Hi,
>
> Please see the following plan:
>
> postgres=# explain select * from small_table left outer join big_table using
> (id);
>                                  QUERY PLAN
> ----------------------------------------------------------------------------
>  Hash Left Join  (cost=126408.00..142436.98 rows=371 width=12)
>    Hash Cond: (small_table.id = big_table.id)
>    ->  Seq Scan on small_table  (cost=0.00..1.09 rows=9 width=8)
>    ->  Hash  (cost=59142.00..59142.00 rows=4100000 width=8)
>          ->  Seq Scan on big_table  (cost=0.00..59142.00 rows=4100000
> width=8)
> (5 rows)
>
> Here I have a puzzle, why not choose the small table to build hash table? It
> can avoid multiple batches thus save significant I/O cost, isn't it?

Yeah, you'd think.  Can you post a full reproducible test case?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: small table left outer join big table

From

Simon Riggs

Date:

29 December 2010, 08:34:39

On Wed, 2010-12-29 at 07:17 -0500, Robert Haas wrote:
> >
> > Here I have a puzzle, why not choose the small table to build hash table? It
> > can avoid multiple batches thus save significant I/O cost, isn't it?
> 
> Yeah, you'd think.  Can you post a full reproducible test case?

It's not a bug, that's the way it currently works. We don't need a test
case for that.

I agree that the optimisation would be a useful one.

It allows you to ask the query "Show me sales for each of my stores"
efficiently, rather than being forced to request the inner join query
"Show me the sales for each of my stores for which there have been
sales", which is a much less useful query.

-- Simon Riggs           http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services

Re: small table left outer join big table

From

Alvaro Herrera

Date:

29 December 2010, 08:40:00

Excerpts from Robert Haas's message of mié dic 29 09:17:17 -0300 2010:
> On Tue, Dec 28, 2010 at 5:13 AM, Jie Li <jay23jack@gmail.com> wrote:
> > Hi,
> >
> > Please see the following plan:
> >
> > postgres=# explain select * from small_table left outer join big_table using
> > (id);
> >                                  QUERY PLAN
> > ----------------------------------------------------------------------------
> >  Hash Left Join  (cost=126408.00..142436.98 rows=371 width=12)
> >    Hash Cond: (small_table.id = big_table.id)
> >    ->  Seq Scan on small_table  (cost=0.00..1.09 rows=9 width=8)
> >    ->  Hash  (cost=59142.00..59142.00 rows=4100000 width=8)
> >          ->  Seq Scan on big_table  (cost=0.00..59142.00 rows=4100000
> > width=8)
> > (5 rows)
> >
> > Here I have a puzzle, why not choose the small table to build hash table? It
> > can avoid multiple batches thus save significant I/O cost, isn't it?
> 
> Yeah, you'd think.  Can you post a full reproducible test case?

Also, what version is this?

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: small table left outer join big table

From

Robert Haas

Date:

29 December 2010, 08:59:28

On Wed, Dec 29, 2010 at 7:34 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Wed, 2010-12-29 at 07:17 -0500, Robert Haas wrote:
>> >
>> > Here I have a puzzle, why not choose the small table to build hash table? It
>> > can avoid multiple batches thus save significant I/O cost, isn't it?
>>
>> Yeah, you'd think.  Can you post a full reproducible test case?
>
> It's not a bug, that's the way it currently works. We don't need a test
> case for that.
>
> I agree that the optimisation would be a useful one.
>
> It allows you to ask the query "Show me sales for each of my stores"
> efficiently, rather than being forced to request the inner join query
> "Show me the sales for each of my stores for which there have been
> sales", which is a much less useful query.

Oh, you're right.  I missed the fact that it's a left join.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: small table left outer join big table

From

Tom Lane

Date:

29 December 2010, 10:59:44

Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Dec 29, 2010 at 7:34 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> It's not a bug, that's the way it currently works. We don't need a test
>> case for that.

> Oh, you're right.  I missed the fact that it's a left join.

The only thing that struck me as curious about it was that the OP didn't
get a nestloop-with-inner-indexscan plan.  That would be explainable if
there was no index on the large table's "id" column ... but columns
named like that usually have indexes.

I can't get all *that* excited about complicating hash joins as
proposed.  The query is still fundamentally going to be slow because
you won't get out of having to seqscan the large table.  The only way
to make it really fast is to not read all of the large table, and
nestloop-with-inner-indexscan is the only plan type with a hope of
doing that.
        regards, tom lane

Re: small table left outer join big table

From

"Li Jie"

Date:

29 December 2010, 12:46:25

----- Original Message ----- 
From: "Alvaro Herrera" <alvherre@commandprompt.com>
To: "Robert Haas" <robertmhaas@gmail.com>
Cc: "Jie Li" <jay23jack@gmail.com>; "pgsql-hackers" <pgsql-hackers@postgresql.org>
Sent: Wednesday, December 29, 2010 8:39 PM
Subject: Re: [HACKERS] small table left outer join big table


> Excerpts from Robert Haas's message of mié dic 29 09:17:17 -0300 2010:
>> On Tue, Dec 28, 2010 at 5:13 AM, Jie Li <jay23jack@gmail.com> wrote:
>> > Hi,
>> >
>> > Please see the following plan:
>> >
>> > postgres=# explain select * from small_table left outer join big_table using
>> > (id);
>> > QUERY PLAN
>> > ----------------------------------------------------------------------------
>> > Hash Left Join (cost=126408.00..142436.98 rows=371 width=12)
>> > Hash Cond: (small_table.id = big_table.id)
>> > -> Seq Scan on small_table (cost=0.00..1.09 rows=9 width=8)
>> > -> Hash (cost=59142.00..59142.00 rows=4100000 width=8)
>> > -> Seq Scan on big_table (cost=0.00..59142.00 rows=4100000
>> > width=8)
>> > (5 rows)
>> >
>> > Here I have a puzzle, why not choose the small table to build hash table? It
>> > can avoid multiple batches thus save significant I/O cost, isn't it?
>> 
>> Yeah, you'd think.  Can you post a full reproducible test case?
> 
> Also, what version is this?
> 
> -- 
> Álvaro Herrera <alvherre@commandprompt.com>
> The PostgreSQL Company - Command Prompt, Inc.
> PostgreSQL Replication, Consulting, Custom Development, 24x7 support

The version is 9.0.1.  I believe the latest version works in the same way.

Thanks,
Li Jie

Re: small table left outer join big table

From

"Li Jie"

Date:

29 December 2010, 12:46:27

Thank you for all your comments.

I think the condition of this optimization is whether the small table can fit into memory. If not, then it doesn't work
sincetwo tables still need to be written to disk. But if yes, we can save all I/O costs in the hash join process. 
 

Thanks,
Li Jie

----- Original Message ----- 
From: "Robert Haas" <robertmhaas@gmail.com>
To: "Simon Riggs" <simon@2ndquadrant.com>
Cc: "Jie Li" <jay23jack@gmail.com>; "pgsql-hackers" <pgsql-hackers@postgresql.org>
Sent: Wednesday, December 29, 2010 8:59 PM
Subject: Re: [HACKERS] small table left outer join big table


On Wed, Dec 29, 2010 at 7:34 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Wed, 2010-12-29 at 07:17 -0500, Robert Haas wrote:
>> >
>> > Here I have a puzzle, why not choose the small table to build hash table? It
>> > can avoid multiple batches thus save significant I/O cost, isn't it?
>>
>> Yeah, you'd think. Can you post a full reproducible test case?
>
> It's not a bug, that's the way it currently works. We don't need a test
> case for that.
>
> I agree that the optimisation would be a useful one.
>
> It allows you to ask the query "Show me sales for each of my stores"
> efficiently, rather than being forced to request the inner join query
> "Show me the sales for each of my stores for which there have been
> sales", which is a much less useful query.

Oh, you're right.  I missed the fact that it's a left join.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: small table left outer join big table

From

"Li Jie"

Date:

29 December 2010, 12:46:32

----- Original Message ----- 
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: "Robert Haas" <robertmhaas@gmail.com>
Cc: "Simon Riggs" <simon@2ndquadrant.com>; "Jie Li" <jay23jack@gmail.com>; "pgsql-hackers"
<pgsql-hackers@postgresql.org>
Sent: Wednesday, December 29, 2010 10:59 PM
Subject: Re: [HACKERS] small table left outer join big table 


> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, Dec 29, 2010 at 7:34 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> It's not a bug, that's the way it currently works. We don't need a test
>>> case for that.
> 
>> Oh, you're right.  I missed the fact that it's a left join.
> 
> The only thing that struck me as curious about it was that the OP didn't
> get a nestloop-with-inner-indexscan plan.  That would be explainable if
> there was no index on the large table's "id" column ... but columns
> named like that usually have indexes.
> 
> I can't get all *that* excited about complicating hash joins as
> proposed.  The query is still fundamentally going to be slow because
> you won't get out of having to seqscan the large table.  The only way
> to make it really fast is to not read all of the large table, and
> nestloop-with-inner-indexscan is the only plan type with a hope of
> doing that.
> 
> regards, tom lane

Yes there is no index on the joined column, otherwise nestloop-with-inner-indexscan should be preferred.

But why can't outer join be as clever as inner join? Anyway, if we unfortunately don't have available index, we have no
choicebut rely on hash join, right?
 

Thanks,
Li Jie

Re: small table left outer join big table

From

Simon Riggs

Date:

29 December 2010, 16:58:31

On Wed, 2010-12-29 at 09:59 -0500, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Wed, Dec 29, 2010 at 7:34 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> >> It's not a bug, that's the way it currently works. We don't need a test
> >> case for that.
> 
> > Oh, you're right.  I missed the fact that it's a left join.
> 
> The only thing that struck me as curious about it was that the OP didn't
> get a nestloop-with-inner-indexscan plan.  That would be explainable if
> there was no index on the large table's "id" column ... but columns
> named like that usually have indexes.
> 
> I can't get all *that* excited about complicating hash joins as
> proposed.  The query is still fundamentally going to be slow because
> you won't get out of having to seqscan the large table.  The only way
> to make it really fast is to not read all of the large table, and
> nestloop-with-inner-indexscan is the only plan type with a hope of
> doing that.

Seq scanning the big table isn't bad... we've gone to a lot of trouble
to make it easy to do this, especially with many users.

Maintaining many large indexes is definitely bad, all that random I/O is
going to suck badly.

Seems like an interesting and relatively optimisation to me. Not sure if
this is a request for feature, or a proposal to write the optimisation.
I hope its the latter.

-- Simon Riggs           http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services

Re: small table left outer join big table

From

Jie Li

Date:

29 December 2010, 23:13:26

On Wed, Dec 29, 2010 at 3:58 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

On Wed, 2010-12-29 at 09:59 -0500, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Wed, Dec 29, 2010 at 7:34 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> >> It's not a bug, that's the way it currently works. We don't need a test
> >> case for that.
>
> > Oh, you're right. I missed the fact that it's a left join.
>
> The only thing that struck me as curious about it was that the OP didn't
> get a nestloop-with-inner-indexscan plan. That would be explainable if
> there was no index on the large table's "id" column ... but columns
> named like that usually have indexes.
>
> I can't get all *that* excited about complicating hash joins as
> proposed. The query is still fundamentally going to be slow because
> you won't get out of having to seqscan the large table. The only way
> to make it really fast is to not read all of the large table, and
> nestloop-with-inner-indexscan is the only plan type with a hope of
> doing that.

Seq scanning the big table isn't bad... we've gone to a lot of trouble
to make it easy to do this, especially with many users.

Maintaining many large indexes is definitely bad, all that random I/O is
going to suck badly.

Seems like an interesting and relatively optimisation to me. Not sure if
this is a request for feature, or a proposal to write the optimisation.
I hope its the latter.

Thanks for your comments. Yeah I'm excited to write code for PostgreSQL, but I'm new here
and not familiar with the code routine or patch submission. I will try to learn in near future. So
for the moment, it is a request for feature, and I'm looking forward to any pgsql-hackers working
on this.

Thanks,
Li Jie

RIGHT/FULL OUTER hash joins (was Re: small table left outer join big table)

From

Tom Lane

Date:

30 December 2010, 11:46:09

I had an epiphany about this topic, or actually two of them.

1. Whether or not you think there's a significant performance reason
to support hash right joins, there's a functionality reason.  The
infrastructure for right join could just as easily do full joins.
And AFAICS, a hash full join would only require one hashable join
clause --- the other FULL JOIN ON conditions could be anything at all.
This is unlike the situation for merge join, where all the JOIN ON
conditions have to be mergeable or it doesn't work right.  So we could
greatly reduce the scope of the dreaded "FULL JOIN is only supported
with merge-joinable join conditions" error.  (Well, okay, it's not
*that* dreaded, but people complain about it occasionally.)

2. The obvious way to implement this would involve adding an extra bool
field to struct HashJoinTupleData.  The difficulty with that, and the
reason I'd been resistant to the whole idea, is that it'd eat up a full
word per hashtable entry because of alignment considerations.  (On
64-bit machines it'd be free because of alignment considerations, but
that's cold comfort when 32-bit machines are the ones pressed for
address space.)  But we only need one bit, so what about commandeering
an infomask bit in the tuple itself?  For the initial implementation
I'd be inclined to take one of the free bits in t_infomask2.  We could
actually get away with overlaying the flag bit with one of the tuple
visibility bits, since it will only be used in tuples that are in the
in-memory hash table, which don't need visibility info anymore.  But
that seems like a kluge that could wait until we really need the flag
space.

Comments?
        regards, tom lane

Re: RIGHT/FULL OUTER hash joins (was Re: small table left outer join big table)

From

Robert Haas

Date:

30 December 2010, 11:50:24

On Thu, Dec 30, 2010 at 10:45 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I had an epiphany about this topic, or actually two of them.
>
> 1. Whether or not you think there's a significant performance reason
> to support hash right joins, there's a functionality reason.  The
> infrastructure for right join could just as easily do full joins.
> And AFAICS, a hash full join would only require one hashable join
> clause --- the other FULL JOIN ON conditions could be anything at all.
> This is unlike the situation for merge join, where all the JOIN ON
> conditions have to be mergeable or it doesn't work right.  So we could
> greatly reduce the scope of the dreaded "FULL JOIN is only supported
> with merge-joinable join conditions" error.  (Well, okay, it's not
> *that* dreaded, but people complain about it occasionally.)

Yeah, that would be neat.  It might be a lot faster in some cases, too.

> 2. The obvious way to implement this would involve adding an extra bool
> field to struct HashJoinTupleData.  The difficulty with that, and the
> reason I'd been resistant to the whole idea, is that it'd eat up a full
> word per hashtable entry because of alignment considerations.  (On
> 64-bit machines it'd be free because of alignment considerations, but
> that's cold comfort when 32-bit machines are the ones pressed for
> address space.)  But we only need one bit, so what about commandeering
> an infomask bit in the tuple itself?  For the initial implementation
> I'd be inclined to take one of the free bits in t_infomask2.  We could
> actually get away with overlaying the flag bit with one of the tuple
> visibility bits, since it will only be used in tuples that are in the
> in-memory hash table, which don't need visibility info anymore.  But
> that seems like a kluge that could wait until we really need the flag
> space.

I think that's a reasonable approach, although I might be inclined to
do the overlay sooner rather than later if it doesn't add too much
complexity.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: RIGHT/FULL OUTER hash joins (was Re: small table left outer join big table)

From

Tom Lane

Date:

30 December 2010, 12:35:17

Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Dec 30, 2010 at 10:45 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> ... But we only need one bit, so what about commandeering
>> an infomask bit in the tuple itself? �For the initial implementation
>> I'd be inclined to take one of the free bits in t_infomask2. �We could
>> actually get away with overlaying the flag bit with one of the tuple
>> visibility bits, since it will only be used in tuples that are in the
>> in-memory hash table, which don't need visibility info anymore. �But
>> that seems like a kluge that could wait until we really need the flag
>> space.

> I think that's a reasonable approach, although I might be inclined to
> do the overlay sooner rather than later if it doesn't add too much
> complexity.

Well, there's no "complexity" involved, it's just which bit do we define
the macro as.  Any complexity is conceptual.
        regards, tom lane

Re: RIGHT/FULL OUTER hash joins (was Re: small table left outer join big table)

From

Jie Li

Date:

30 December 2010, 12:51:21

On Thu, Dec 30, 2010 at 11:50 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Dec 30, 2010 at 10:45 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I had an epiphany about this topic, or actually two of them.
>
> 1. Whether or not you think there's a significant performance reason
> to support hash right joins, there's a functionality reason. The
> infrastructure for right join could just as easily do full joins.
> And AFAICS, a hash full join would only require one hashable join
> clause --- the other FULL JOIN ON conditions could be anything at all.
> This is unlike the situation for merge join, where all the JOIN ON
> conditions have to be mergeable or it doesn't work right. So we could
> greatly reduce the scope of the dreaded "FULL JOIN is only supported
> with merge-joinable join conditions" error. (Well, okay, it's not
> *that* dreaded, but people complain about it occasionally.)

Yeah, that would be neat. It might be a lot faster in some cases, too.

Yeah, PostgreSQL should have this great feature.

Actually Oracle 10g already has the right hash join, http://dbcrusade.blogspot.com/2008/01/oracle-hash-join-right-outer.html

And Oracle 11g has the full hash join.

http://www.dba-oracle.com/oracle11g/oracle_11g_full_hash_join.htm

Haven't checked whether other DBMS have this feature.

Thanks,

Li Jie

Re: small table left outer join big table

From

Dimitri Fontaine

Date:

30 December 2010, 17:05:26

Tom Lane <tgl@sss.pgh.pa.us> writes:
> I can't get all *that* excited about complicating hash joins as
> proposed.  The query is still fundamentally going to be slow because
> you won't get out of having to seqscan the large table.  The only way
> to make it really fast is to not read all of the large table, and
> nestloop-with-inner-indexscan is the only plan type with a hope of
> doing that.

That sounds somewhat like Loose Indexscan as described in the following
wiki page, right?
 http://wiki.postgresql.org/wiki/Loose_indexscan

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support

Re: RIGHT/FULL OUTER hash joins (was Re: small table left outer join big table)

From

Simon Riggs

Date:

01 January 2011, 07:18:57

On Thu, 2010-12-30 at 10:45 -0500, Tom Lane wrote:

> Comments?

Thanks for working on this. I love the reuse of tuple flags; I can't
help feeling that opens up doors, just not sure how yet...

-- Simon Riggs           http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services