Thread: star join optimization
Hello, the hybrid hash join algorithm implemented in the current version of PostgreSQL has any kind of optimization for star join queries for Data Warehouse model? Regards.
On 14/11/11 12:20, Robert Haas wrote: > On Mon, Nov 14, 2011 at 8:25 AM, Rudyar<rudyar.cortes@gmail.com> wrote: >> the hybrid hash join algorithm implemented in the current version of >> PostgreSQL has any kind of optimization >> for star join queries for Data Warehouse model? > Not really. As much as possible, we try to make the query optimizer a > general-purpose tool that can handle any query you happen to throw at > it, rather than putting in special-purpose hacks to cater to specific > types of queries. I'm not aware of anything in particular that we > could do to better optimize the star-join case than what we do for any > other query. > > Now, one thing that was discussed a year or two ago was the > possibility of considering join algorithms that can handle more than > two tables at a time. Currently, we don't do that, so a four-way join > will be implemented either by joining two tables, then the other two > tables, and then the results of those; or more commonly by joining two > tables, joining the results to a third table, and then joining those > results to the final table. Due to the pipelined nature of our > executor, this works pretty well, but it's possible that there are > better algorithms out there. > Thanks Robert, I'm a new programmer in postgreSQL source code and I working in my tesis project about that optimizations to HHJ algorithm. I think so is very useful that optimizer recognize one star join and apply this optimizations.. For example, SQL Server and Oracle databases implements star join query optimizations for OLAP queries in DW. How can contribute with my tesis project to postreSQL source code? Regards. -- Rudyar Cortés. Estudiante de Ingeniería Civil Informática Universidad Técnica Federico Santa María.
On Mon, Nov 14, 2011 at 10:36 AM, Rudyar <rudyar.cortes@gmail.com> wrote: > On 14/11/11 12:20, Robert Haas wrote: >> >> On Mon, Nov 14, 2011 at 8:25 AM, Rudyar<rudyar.cortes@gmail.com> wrote: >>> >>> the hybrid hash join algorithm implemented in the current version of >>> PostgreSQL has any kind of optimization >>> for star join queries for Data Warehouse model? >> >> Not really. As much as possible, we try to make the query optimizer a >> general-purpose tool that can handle any query you happen to throw at >> it, rather than putting in special-purpose hacks to cater to specific >> types of queries. I'm not aware of anything in particular that we >> could do to better optimize the star-join case than what we do for any >> other query. >> >> Now, one thing that was discussed a year or two ago was the >> possibility of considering join algorithms that can handle more than >> two tables at a time. Currently, we don't do that, so a four-way join >> will be implemented either by joining two tables, then the other two >> tables, and then the results of those; or more commonly by joining two >> tables, joining the results to a third table, and then joining those >> results to the final table. Due to the pipelined nature of our >> executor, this works pretty well, but it's possible that there are >> better algorithms out there. >> > Thanks Robert, > > I'm a new programmer in postgreSQL source code and I working in my tesis > project about that optimizations to HHJ algorithm. > I think so is very useful that optimizer recognize one star join and apply > this optimizations.. > For example, SQL Server and Oracle databases implements star join query > optimizations for OLAP queries in DW. > > How can contribute with my tesis project to postreSQL source code? A good example might be to show us some of the specific cases that you think can be improved. Perhaps with a script to set up the test data, and EXPLAIN ANALYZE output from the queries involved, and a description of where you see an opportunity for improvement. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 14/11/11 12:37, Robert Haas wrote: > On Mon, Nov 14, 2011 at 10:36 AM, Rudyar<rudyar.cortes@gmail.com> wrote: >> On 14/11/11 12:20, Robert Haas wrote: >>> On Mon, Nov 14, 2011 at 8:25 AM, Rudyar<rudyar.cortes@gmail.com> wrote: >>>> the hybrid hash join algorithm implemented in the current version of >>>> PostgreSQL has any kind of optimization >>>> for star join queries for Data Warehouse model? >>> Not really. As much as possible, we try to make the query optimizer a >>> general-purpose tool that can handle any query you happen to throw at >>> it, rather than putting in special-purpose hacks to cater to specific >>> types of queries. I'm not aware of anything in particular that we >>> could do to better optimize the star-join case than what we do for any >>> other query. >>> >>> Now, one thing that was discussed a year or two ago was the >>> possibility of considering join algorithms that can handle more than >>> two tables at a time. Currently, we don't do that, so a four-way join >>> will be implemented either by joining two tables, then the other two >>> tables, and then the results of those; or more commonly by joining two >>> tables, joining the results to a third table, and then joining those >>> results to the final table. Due to the pipelined nature of our >>> executor, this works pretty well, but it's possible that there are >>> better algorithms out there. >>> >> Thanks Robert, >> >> I'm a new programmer in postgreSQL source code and I working in my tesis >> project about that optimizations to HHJ algorithm. >> I think so is very useful that optimizer recognize one star join and apply >> this optimizations.. >> For example, SQL Server and Oracle databases implements star join query >> optimizations for OLAP queries in DW. >> >> How can contribute with my tesis project to postreSQL source code? > A good example might be to show us some of the specific cases that you > think can be improved. Perhaps with a script to set up the test data, > and EXPLAIN ANALYZE output from the queries involved, and a > description of where you see an opportunity for improvement. > Ok, I'm working in that project. I will send you my results and comparision with SQL server HHJ optimization in one or two months. Regards -- Rudyar Cortés. Estudiante de Ingeniería Civil Informática Universidad Técnica Federico Santa María.
On 11/14/2011 10:44 AM, Rudyar wrote: > Ok, I'm working in that project. I will send you my results > and comparision with SQL server HHJ optimization in one or two months. Please be careful not to share here details of how features like this are built in any commercial databases you evaluate. Some of those implementations use patented design approaches that should be avoided in an open source project. Oracle, Microsoft, and DB2 are all aggressive about patenting the innovative parts of their database server code. In addition to not wanting to accidentally incorporate such a design, it's better for the PostgreSQL project to not be aware of what patents in this area exist too. We don't even want a survey of patents in this area published here because there are increased penalties for willful patent infringement. See http://en.wikipedia.org/wiki/Treble_damages for example. What this project likes best are innovative approaches from recent academic research that haven't been incorporated in any commercial products yet. A good example is how the Serializable Snapshot Isolation technique developed by Cahill and others was added to PostgreSQL 9.1: http://wiki.postgresql.org/wiki/Serializable There was less concern over accidentally duplicating a patented approach because that technique wasn't in any of the commercial databases yet. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
On Mon, Nov 14, 2011 at 8:25 AM, Rudyar <rudyar.cortes@gmail.com> wrote: > the hybrid hash join algorithm implemented in the current version of > PostgreSQL has any kind of optimization > for star join queries for Data Warehouse model? Not really. As much as possible, we try to make the query optimizer a general-purpose tool that can handle any query you happen to throw at it, rather than putting in special-purpose hacks to cater to specific types of queries. I'm not aware of anything in particular that we could do to better optimize the star-join case than what we do for any other query. Now, one thing that was discussed a year or two ago was the possibility of considering join algorithms that can handle more than two tables at a time. Currently, we don't do that, so a four-way join will be implemented either by joining two tables, then the other two tables, and then the results of those; or more commonly by joining two tables, joining the results to a third table, and then joining those results to the final table. Due to the pipelined nature of our executor, this works pretty well, but it's possible that there are better algorithms out there. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 14/11/11 13:09, Greg Smith wrote: > On 11/14/2011 10:44 AM, Rudyar wrote: >> Ok, I'm working in that project. I will send you my results >> and comparision with SQL server HHJ optimization in one or two months. > > Please be careful not to share here details of how features like this > are built in any commercial databases you evaluate. Some of those > implementations use patented design approaches that should be avoided > in an open source project. Oracle, Microsoft, and DB2 are all > aggressive about patenting the innovative parts of their database > server code. > > In addition to not wanting to accidentally incorporate such a design, > it's better for the PostgreSQL project to not be aware of what patents > in this area exist too. We don't even want a survey of patents in > this area published here because there are increased penalties for > willful patent infringement. See > http://en.wikipedia.org/wiki/Treble_damages for example. > > What this project likes best are innovative approaches from recent > academic research that haven't been incorporated in any commercial > products yet. A good example is how the Serializable Snapshot > Isolation technique developed by Cahill and others was added to > PostgreSQL 9.1: http://wiki.postgresql.org/wiki/Serializable There > was less concern over accidentally duplicating a patented approach > because that technique wasn't in any of the commercial databases yet. > Greg, Ok. I will consider your recommendations. Best Regards. -- Rudyar Cortés. Estudiante de Ingeniería Civil Informática Universidad Técnica Federico Santa María.