Thread: small table left outer join big table
Hi,<br /><br />Please see the following plan:<br /><br />postgres=# explain select * from small_table left outer join big_tableusing (id);<br /> QUERY PLAN <br />----------------------------------------------------------------------------<br/> Hash Left Join (cost=126408.00..142436.98rows=371 width=12)<br /> Hash Cond: (<a href="http://small_table.id">small_table.id</a> = <ahref="http://big_table.id">big_table.id</a>)<br /> -> Seq Scan on small_table (cost=0.00..1.09 rows=9 width=8)<br/> -> Hash (cost=59142.00..59142.00 rows=4100000 width=8)<br /> -> Seq Scan on big_table (cost=0.00..59142.00 rows=4100000 width=8)<br />(5 rows)<br /><br />Here I have a puzzle, why not choose the smalltable to build hash table? It can avoid multiple batches thus save significant I/O cost, isn't it? <br /><br />We canperform this query in two phases: <br />1) inner join, using the small table to build hash table.<br />2) check whethereach tuple in the hash table has matches before, which can be done with another flag bit<br /><br /> The only compromiseis the output order, due to the two separate phases. Not sure whether the SQL standard requires it.<br /><br />Thanks,<br/>Li Jie<br /><br /><br />
On Tue, Dec 28, 2010 at 5:13 AM, Jie Li <jay23jack@gmail.com> wrote:
SQL standard does not require the result to be in any particular order unless an ORDER BY is used.
Regards,
-- Hi,
Please see the following plan:
postgres=# explain select * from small_table left outer join big_table using (id);
QUERY PLAN
----------------------------------------------------------------------------
Hash Left Join (cost=126408.00..142436.98 rows=371 width=12)
Hash Cond: (small_table.id = big_table.id)
-> Seq Scan on small_table (cost=0.00..1.09 rows=9 width=8)
-> Hash (cost=59142.00..59142.00 rows=4100000 width=8)
-> Seq Scan on big_table (cost=0.00..59142.00 rows=4100000 width=8)
(5 rows)
Here I have a puzzle, why not choose the small table to build hash table? It can avoid multiple batches thus save significant I/O cost, isn't it?
We can perform this query in two phases:
1) inner join, using the small table to build hash table.
2) check whether each tuple in the hash table has matches before, which can be done with another flag bit
The only compromise is the output order, due to the two separate phases. Not sure whether the SQL standard requires it.
SQL standard does not require the result to be in any particular order unless an ORDER BY is used.
Regards,
gurjeet.singh
@ EnterpriseDB - The Enterprise Postgres Company
http://www.EnterpriseDB.com
singh.gurjeet@{ gmail | yahoo }.com
Twitter/Skype: singh_gurjeet
Mail sent from my BlackLaptop device
On Tue, Dec 28, 2010 at 5:13 AM, Jie Li <jay23jack@gmail.com> wrote: > Hi, > > Please see the following plan: > > postgres=# explain select * from small_table left outer join big_table using > (id); > QUERY PLAN > ---------------------------------------------------------------------------- > Hash Left Join (cost=126408.00..142436.98 rows=371 width=12) > Hash Cond: (small_table.id = big_table.id) > -> Seq Scan on small_table (cost=0.00..1.09 rows=9 width=8) > -> Hash (cost=59142.00..59142.00 rows=4100000 width=8) > -> Seq Scan on big_table (cost=0.00..59142.00 rows=4100000 > width=8) > (5 rows) > > Here I have a puzzle, why not choose the small table to build hash table? It > can avoid multiple batches thus save significant I/O cost, isn't it? Yeah, you'd think. Can you post a full reproducible test case? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, 2010-12-29 at 07:17 -0500, Robert Haas wrote: > > > > Here I have a puzzle, why not choose the small table to build hash table? It > > can avoid multiple batches thus save significant I/O cost, isn't it? > > Yeah, you'd think. Can you post a full reproducible test case? It's not a bug, that's the way it currently works. We don't need a test case for that. I agree that the optimisation would be a useful one. It allows you to ask the query "Show me sales for each of my stores" efficiently, rather than being forced to request the inner join query "Show me the sales for each of my stores for which there have been sales", which is a much less useful query. -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
Excerpts from Robert Haas's message of mié dic 29 09:17:17 -0300 2010: > On Tue, Dec 28, 2010 at 5:13 AM, Jie Li <jay23jack@gmail.com> wrote: > > Hi, > > > > Please see the following plan: > > > > postgres=# explain select * from small_table left outer join big_table using > > (id); > > QUERY PLAN > > ---------------------------------------------------------------------------- > > Hash Left Join (cost=126408.00..142436.98 rows=371 width=12) > > Hash Cond: (small_table.id = big_table.id) > > -> Seq Scan on small_table (cost=0.00..1.09 rows=9 width=8) > > -> Hash (cost=59142.00..59142.00 rows=4100000 width=8) > > -> Seq Scan on big_table (cost=0.00..59142.00 rows=4100000 > > width=8) > > (5 rows) > > > > Here I have a puzzle, why not choose the small table to build hash table? It > > can avoid multiple batches thus save significant I/O cost, isn't it? > > Yeah, you'd think. Can you post a full reproducible test case? Also, what version is this? -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Wed, Dec 29, 2010 at 7:34 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Wed, 2010-12-29 at 07:17 -0500, Robert Haas wrote: >> > >> > Here I have a puzzle, why not choose the small table to build hash table? It >> > can avoid multiple batches thus save significant I/O cost, isn't it? >> >> Yeah, you'd think. Can you post a full reproducible test case? > > It's not a bug, that's the way it currently works. We don't need a test > case for that. > > I agree that the optimisation would be a useful one. > > It allows you to ask the query "Show me sales for each of my stores" > efficiently, rather than being forced to request the inner join query > "Show me the sales for each of my stores for which there have been > sales", which is a much less useful query. Oh, you're right. I missed the fact that it's a left join. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Wed, Dec 29, 2010 at 7:34 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> It's not a bug, that's the way it currently works. We don't need a test >> case for that. > Oh, you're right. I missed the fact that it's a left join. The only thing that struck me as curious about it was that the OP didn't get a nestloop-with-inner-indexscan plan. That would be explainable if there was no index on the large table's "id" column ... but columns named like that usually have indexes. I can't get all *that* excited about complicating hash joins as proposed. The query is still fundamentally going to be slow because you won't get out of having to seqscan the large table. The only way to make it really fast is to not read all of the large table, and nestloop-with-inner-indexscan is the only plan type with a hope of doing that. regards, tom lane
----- Original Message ----- From: "Alvaro Herrera" <alvherre@commandprompt.com> To: "Robert Haas" <robertmhaas@gmail.com> Cc: "Jie Li" <jay23jack@gmail.com>; "pgsql-hackers" <pgsql-hackers@postgresql.org> Sent: Wednesday, December 29, 2010 8:39 PM Subject: Re: [HACKERS] small table left outer join big table > Excerpts from Robert Haas's message of mié dic 29 09:17:17 -0300 2010: >> On Tue, Dec 28, 2010 at 5:13 AM, Jie Li <jay23jack@gmail.com> wrote: >> > Hi, >> > >> > Please see the following plan: >> > >> > postgres=# explain select * from small_table left outer join big_table using >> > (id); >> > QUERY PLAN >> > ---------------------------------------------------------------------------- >> > Hash Left Join (cost=126408.00..142436.98 rows=371 width=12) >> > Hash Cond: (small_table.id = big_table.id) >> > -> Seq Scan on small_table (cost=0.00..1.09 rows=9 width=8) >> > -> Hash (cost=59142.00..59142.00 rows=4100000 width=8) >> > -> Seq Scan on big_table (cost=0.00..59142.00 rows=4100000 >> > width=8) >> > (5 rows) >> > >> > Here I have a puzzle, why not choose the small table to build hash table? It >> > can avoid multiple batches thus save significant I/O cost, isn't it? >> >> Yeah, you'd think. Can you post a full reproducible test case? > > Also, what version is this? > > -- > Álvaro Herrera <alvherre@commandprompt.com> > The PostgreSQL Company - Command Prompt, Inc. > PostgreSQL Replication, Consulting, Custom Development, 24x7 support The version is 9.0.1. I believe the latest version works in the same way. Thanks, Li Jie
Thank you for all your comments. I think the condition of this optimization is whether the small table can fit into memory. If not, then it doesn't work sincetwo tables still need to be written to disk. But if yes, we can save all I/O costs in the hash join process. Thanks, Li Jie ----- Original Message ----- From: "Robert Haas" <robertmhaas@gmail.com> To: "Simon Riggs" <simon@2ndquadrant.com> Cc: "Jie Li" <jay23jack@gmail.com>; "pgsql-hackers" <pgsql-hackers@postgresql.org> Sent: Wednesday, December 29, 2010 8:59 PM Subject: Re: [HACKERS] small table left outer join big table On Wed, Dec 29, 2010 at 7:34 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Wed, 2010-12-29 at 07:17 -0500, Robert Haas wrote: >> > >> > Here I have a puzzle, why not choose the small table to build hash table? It >> > can avoid multiple batches thus save significant I/O cost, isn't it? >> >> Yeah, you'd think. Can you post a full reproducible test case? > > It's not a bug, that's the way it currently works. We don't need a test > case for that. > > I agree that the optimisation would be a useful one. > > It allows you to ask the query "Show me sales for each of my stores" > efficiently, rather than being forced to request the inner join query > "Show me the sales for each of my stores for which there have been > sales", which is a much less useful query. Oh, you're right. I missed the fact that it's a left join. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
----- Original Message ----- From: "Tom Lane" <tgl@sss.pgh.pa.us> To: "Robert Haas" <robertmhaas@gmail.com> Cc: "Simon Riggs" <simon@2ndquadrant.com>; "Jie Li" <jay23jack@gmail.com>; "pgsql-hackers" <pgsql-hackers@postgresql.org> Sent: Wednesday, December 29, 2010 10:59 PM Subject: Re: [HACKERS] small table left outer join big table > Robert Haas <robertmhaas@gmail.com> writes: >> On Wed, Dec 29, 2010 at 7:34 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >>> It's not a bug, that's the way it currently works. We don't need a test >>> case for that. > >> Oh, you're right. I missed the fact that it's a left join. > > The only thing that struck me as curious about it was that the OP didn't > get a nestloop-with-inner-indexscan plan. That would be explainable if > there was no index on the large table's "id" column ... but columns > named like that usually have indexes. > > I can't get all *that* excited about complicating hash joins as > proposed. The query is still fundamentally going to be slow because > you won't get out of having to seqscan the large table. The only way > to make it really fast is to not read all of the large table, and > nestloop-with-inner-indexscan is the only plan type with a hope of > doing that. > > regards, tom lane Yes there is no index on the joined column, otherwise nestloop-with-inner-indexscan should be preferred. But why can't outer join be as clever as inner join? Anyway, if we unfortunately don't have available index, we have no choicebut rely on hash join, right? Thanks, Li Jie
On Wed, 2010-12-29 at 09:59 -0500, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > On Wed, Dec 29, 2010 at 7:34 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > >> It's not a bug, that's the way it currently works. We don't need a test > >> case for that. > > > Oh, you're right. I missed the fact that it's a left join. > > The only thing that struck me as curious about it was that the OP didn't > get a nestloop-with-inner-indexscan plan. That would be explainable if > there was no index on the large table's "id" column ... but columns > named like that usually have indexes. > > I can't get all *that* excited about complicating hash joins as > proposed. The query is still fundamentally going to be slow because > you won't get out of having to seqscan the large table. The only way > to make it really fast is to not read all of the large table, and > nestloop-with-inner-indexscan is the only plan type with a hope of > doing that. Seq scanning the big table isn't bad... we've gone to a lot of trouble to make it easy to do this, especially with many users. Maintaining many large indexes is definitely bad, all that random I/O is going to suck badly. Seems like an interesting and relatively optimisation to me. Not sure if this is a request for feature, or a proposal to write the optimisation. I hope its the latter. -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
On Wed, Dec 29, 2010 at 3:58 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
Seq scanning the big table isn't bad... we've gone to a lot of troubleOn Wed, 2010-12-29 at 09:59 -0500, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Wed, Dec 29, 2010 at 7:34 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> >> It's not a bug, that's the way it currently works. We don't need a test
> >> case for that.
>
> > Oh, you're right. I missed the fact that it's a left join.
>
> The only thing that struck me as curious about it was that the OP didn't
> get a nestloop-with-inner-indexscan plan. That would be explainable if
> there was no index on the large table's "id" column ... but columns
> named like that usually have indexes.
>
> I can't get all *that* excited about complicating hash joins as
> proposed. The query is still fundamentally going to be slow because
> you won't get out of having to seqscan the large table. The only way
> to make it really fast is to not read all of the large table, and
> nestloop-with-inner-indexscan is the only plan type with a hope of
> doing that.
to make it easy to do this, especially with many users.
Maintaining many large indexes is definitely bad, all that random I/O is
going to suck badly.
Seems like an interesting and relatively optimisation to me. Not sure if
this is a request for feature, or a proposal to write the optimisation.
I hope its the latter.
Thanks for your comments. Yeah I'm excited to write code for PostgreSQL, but I'm new here
and not familiar with the code routine or patch submission. I will try to learn in near future. So
for the moment, it is a request for feature, and I'm looking forward to any pgsql-hackers working
on this.
Thanks,
Li Jie
I had an epiphany about this topic, or actually two of them. 1. Whether or not you think there's a significant performance reason to support hash right joins, there's a functionality reason. The infrastructure for right join could just as easily do full joins. And AFAICS, a hash full join would only require one hashable join clause --- the other FULL JOIN ON conditions could be anything at all. This is unlike the situation for merge join, where all the JOIN ON conditions have to be mergeable or it doesn't work right. So we could greatly reduce the scope of the dreaded "FULL JOIN is only supported with merge-joinable join conditions" error. (Well, okay, it's not *that* dreaded, but people complain about it occasionally.) 2. The obvious way to implement this would involve adding an extra bool field to struct HashJoinTupleData. The difficulty with that, and the reason I'd been resistant to the whole idea, is that it'd eat up a full word per hashtable entry because of alignment considerations. (On 64-bit machines it'd be free because of alignment considerations, but that's cold comfort when 32-bit machines are the ones pressed for address space.) But we only need one bit, so what about commandeering an infomask bit in the tuple itself? For the initial implementation I'd be inclined to take one of the free bits in t_infomask2. We could actually get away with overlaying the flag bit with one of the tuple visibility bits, since it will only be used in tuples that are in the in-memory hash table, which don't need visibility info anymore. But that seems like a kluge that could wait until we really need the flag space. Comments? regards, tom lane
Re: RIGHT/FULL OUTER hash joins (was Re: small table left outer join big table)
From
Robert Haas
Date:
On Thu, Dec 30, 2010 at 10:45 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I had an epiphany about this topic, or actually two of them. > > 1. Whether or not you think there's a significant performance reason > to support hash right joins, there's a functionality reason. The > infrastructure for right join could just as easily do full joins. > And AFAICS, a hash full join would only require one hashable join > clause --- the other FULL JOIN ON conditions could be anything at all. > This is unlike the situation for merge join, where all the JOIN ON > conditions have to be mergeable or it doesn't work right. So we could > greatly reduce the scope of the dreaded "FULL JOIN is only supported > with merge-joinable join conditions" error. (Well, okay, it's not > *that* dreaded, but people complain about it occasionally.) Yeah, that would be neat. It might be a lot faster in some cases, too. > 2. The obvious way to implement this would involve adding an extra bool > field to struct HashJoinTupleData. The difficulty with that, and the > reason I'd been resistant to the whole idea, is that it'd eat up a full > word per hashtable entry because of alignment considerations. (On > 64-bit machines it'd be free because of alignment considerations, but > that's cold comfort when 32-bit machines are the ones pressed for > address space.) But we only need one bit, so what about commandeering > an infomask bit in the tuple itself? For the initial implementation > I'd be inclined to take one of the free bits in t_infomask2. We could > actually get away with overlaying the flag bit with one of the tuple > visibility bits, since it will only be used in tuples that are in the > in-memory hash table, which don't need visibility info anymore. But > that seems like a kluge that could wait until we really need the flag > space. I think that's a reasonable approach, although I might be inclined to do the overlay sooner rather than later if it doesn't add too much complexity. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Dec 30, 2010 at 10:45 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> ... But we only need one bit, so what about commandeering >> an infomask bit in the tuple itself? �For the initial implementation >> I'd be inclined to take one of the free bits in t_infomask2. �We could >> actually get away with overlaying the flag bit with one of the tuple >> visibility bits, since it will only be used in tuples that are in the >> in-memory hash table, which don't need visibility info anymore. �But >> that seems like a kluge that could wait until we really need the flag >> space. > I think that's a reasonable approach, although I might be inclined to > do the overlay sooner rather than later if it doesn't add too much > complexity. Well, there's no "complexity" involved, it's just which bit do we define the macro as. Any complexity is conceptual. regards, tom lane
On Thu, Dec 30, 2010 at 11:50 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Dec 30, 2010 at 10:45 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:Yeah, that would be neat. It might be a lot faster in some cases, too.
> I had an epiphany about this topic, or actually two of them.
>
> 1. Whether or not you think there's a significant performance reason
> to support hash right joins, there's a functionality reason. The
> infrastructure for right join could just as easily do full joins.
> And AFAICS, a hash full join would only require one hashable join
> clause --- the other FULL JOIN ON conditions could be anything at all.
> This is unlike the situation for merge join, where all the JOIN ON
> conditions have to be mergeable or it doesn't work right. So we could
> greatly reduce the scope of the dreaded "FULL JOIN is only supported
> with merge-joinable join conditions" error. (Well, okay, it's not
> *that* dreaded, but people complain about it occasionally.)
Yeah, PostgreSQL should have this great feature.
Actually Oracle 10g already has the right hash join, http://dbcrusade.blogspot.com/2008/01/oracle-hash-join-right-outer.html
And Oracle 11g has the full hash join.
Haven't checked whether other DBMS have this feature.
Thanks,
Li Jie
Tom Lane <tgl@sss.pgh.pa.us> writes: > I can't get all *that* excited about complicating hash joins as > proposed. The query is still fundamentally going to be slow because > you won't get out of having to seqscan the large table. The only way > to make it really fast is to not read all of the large table, and > nestloop-with-inner-indexscan is the only plan type with a hope of > doing that. That sounds somewhat like Loose Indexscan as described in the following wiki page, right? http://wiki.postgresql.org/wiki/Loose_indexscan Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: RIGHT/FULL OUTER hash joins (was Re: small table left outer join big table)
From
Simon Riggs
Date:
On Thu, 2010-12-30 at 10:45 -0500, Tom Lane wrote: > Comments? Thanks for working on this. I love the reuse of tuple flags; I can't help feeling that opens up doors, just not sure how yet... -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services