Thread: star join optimization

star join optimization

From
Rudyar
Date:
Hello,

the hybrid hash join algorithm implemented in the current version of 
PostgreSQL has any kind of optimization
for star join queries for Data Warehouse model?

Regards.



Re: star join optimization

From
Rudyar
Date:
On 14/11/11 12:20, Robert Haas wrote:
> On Mon, Nov 14, 2011 at 8:25 AM, Rudyar<rudyar.cortes@gmail.com>  wrote:
>> the hybrid hash join algorithm implemented in the current version of
>> PostgreSQL has any kind of optimization
>> for star join queries for Data Warehouse model?
> Not really.  As much as possible, we try to make the query optimizer a
> general-purpose tool that can handle any query you happen to throw at
> it, rather than putting in special-purpose hacks to cater to specific
> types of queries.  I'm not aware of anything in particular that we
> could do to better optimize the star-join case than what we do for any
> other query.
>
> Now, one thing that was discussed a year or two ago was the
> possibility of considering join algorithms that can handle more than
> two tables at a time.  Currently, we don't do that, so a four-way join
> will be implemented either by joining two tables, then the other two
> tables, and then the results of those; or more commonly by joining two
> tables, joining the results to a third table, and then joining those
> results to the final table.  Due to the pipelined nature of our
> executor, this works pretty well, but it's possible that there are
> better algorithms out there.
>
Thanks Robert,

I'm a new programmer in postgreSQL source code and I working in my tesis 
project about that optimizations to HHJ algorithm.
I think so is very useful that optimizer recognize one star join and 
apply this optimizations..
For example, SQL Server and Oracle databases implements star join query 
optimizations for OLAP queries in DW.

How can contribute with my tesis project to postreSQL source code?

Regards.

-- 

Rudyar Cortés.
Estudiante de Ingeniería Civil Informática
Universidad Técnica Federico Santa María.



Re: star join optimization

From
Robert Haas
Date:
On Mon, Nov 14, 2011 at 10:36 AM, Rudyar <rudyar.cortes@gmail.com> wrote:
> On 14/11/11 12:20, Robert Haas wrote:
>>
>> On Mon, Nov 14, 2011 at 8:25 AM, Rudyar<rudyar.cortes@gmail.com>  wrote:
>>>
>>> the hybrid hash join algorithm implemented in the current version of
>>> PostgreSQL has any kind of optimization
>>> for star join queries for Data Warehouse model?
>>
>> Not really.  As much as possible, we try to make the query optimizer a
>> general-purpose tool that can handle any query you happen to throw at
>> it, rather than putting in special-purpose hacks to cater to specific
>> types of queries.  I'm not aware of anything in particular that we
>> could do to better optimize the star-join case than what we do for any
>> other query.
>>
>> Now, one thing that was discussed a year or two ago was the
>> possibility of considering join algorithms that can handle more than
>> two tables at a time.  Currently, we don't do that, so a four-way join
>> will be implemented either by joining two tables, then the other two
>> tables, and then the results of those; or more commonly by joining two
>> tables, joining the results to a third table, and then joining those
>> results to the final table.  Due to the pipelined nature of our
>> executor, this works pretty well, but it's possible that there are
>> better algorithms out there.
>>
> Thanks Robert,
>
> I'm a new programmer in postgreSQL source code and I working in my tesis
> project about that optimizations to HHJ algorithm.
> I think so is very useful that optimizer recognize one star join and apply
> this optimizations..
> For example, SQL Server and Oracle databases implements star join query
> optimizations for OLAP queries in DW.
>
> How can contribute with my tesis project to postreSQL source code?

A good example might be to show us some of the specific cases that you
think can be improved.  Perhaps with a script to set up the test data,
and EXPLAIN ANALYZE output from the queries involved, and a
description of where you see an opportunity for improvement.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: star join optimization

From
Rudyar
Date:
On 14/11/11 12:37, Robert Haas wrote:
> On Mon, Nov 14, 2011 at 10:36 AM, Rudyar<rudyar.cortes@gmail.com>  wrote:
>> On 14/11/11 12:20, Robert Haas wrote:
>>> On Mon, Nov 14, 2011 at 8:25 AM, Rudyar<rudyar.cortes@gmail.com>    wrote:
>>>> the hybrid hash join algorithm implemented in the current version of
>>>> PostgreSQL has any kind of optimization
>>>> for star join queries for Data Warehouse model?
>>> Not really.  As much as possible, we try to make the query optimizer a
>>> general-purpose tool that can handle any query you happen to throw at
>>> it, rather than putting in special-purpose hacks to cater to specific
>>> types of queries.  I'm not aware of anything in particular that we
>>> could do to better optimize the star-join case than what we do for any
>>> other query.
>>>
>>> Now, one thing that was discussed a year or two ago was the
>>> possibility of considering join algorithms that can handle more than
>>> two tables at a time.  Currently, we don't do that, so a four-way join
>>> will be implemented either by joining two tables, then the other two
>>> tables, and then the results of those; or more commonly by joining two
>>> tables, joining the results to a third table, and then joining those
>>> results to the final table.  Due to the pipelined nature of our
>>> executor, this works pretty well, but it's possible that there are
>>> better algorithms out there.
>>>
>> Thanks Robert,
>>
>> I'm a new programmer in postgreSQL source code and I working in my tesis
>> project about that optimizations to HHJ algorithm.
>> I think so is very useful that optimizer recognize one star join and apply
>> this optimizations..
>> For example, SQL Server and Oracle databases implements star join query
>> optimizations for OLAP queries in DW.
>>
>> How can contribute with my tesis project to postreSQL source code?
> A good example might be to show us some of the specific cases that you
> think can be improved.  Perhaps with a script to set up the test data,
> and EXPLAIN ANALYZE output from the queries involved, and a
> description of where you see an opportunity for improvement.
>
Ok, I'm working in that project. I will send you my results
and comparision with SQL server HHJ optimization in one or two months.

Regards

-- 
Rudyar Cortés.
Estudiante de Ingeniería Civil Informática
Universidad Técnica Federico Santa María.



Re: star join optimization

From
Greg Smith
Date:
On 11/14/2011 10:44 AM, Rudyar wrote:
> Ok, I'm working in that project. I will send you my results
> and comparision with SQL server HHJ optimization in one or two months.

Please be careful not to share here details of how features like this 
are built in any commercial databases you evaluate.  Some of those 
implementations use patented design approaches that should be avoided in 
an open source project.  Oracle, Microsoft, and DB2 are all aggressive 
about patenting the innovative parts of their database server code.

In addition to not wanting to accidentally incorporate such a design, 
it's better for the PostgreSQL project to not be aware of what patents 
in this area exist too.  We don't even want a survey of patents in this 
area published here because there are increased penalties for willful 
patent infringement.  See http://en.wikipedia.org/wiki/Treble_damages 
for example.

What this project likes best are innovative approaches from recent 
academic research that haven't been incorporated in any commercial 
products yet.  A good example is how the Serializable Snapshot Isolation 
technique developed by Cahill and others was added to PostgreSQL 9.1:  
http://wiki.postgresql.org/wiki/Serializable  There was less concern 
over accidentally duplicating a patented approach because that technique 
wasn't in any of the commercial databases yet.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



Re: star join optimization

From
Robert Haas
Date:
On Mon, Nov 14, 2011 at 8:25 AM, Rudyar <rudyar.cortes@gmail.com> wrote:
> the hybrid hash join algorithm implemented in the current version of
> PostgreSQL has any kind of optimization
> for star join queries for Data Warehouse model?

Not really.  As much as possible, we try to make the query optimizer a
general-purpose tool that can handle any query you happen to throw at
it, rather than putting in special-purpose hacks to cater to specific
types of queries.  I'm not aware of anything in particular that we
could do to better optimize the star-join case than what we do for any
other query.

Now, one thing that was discussed a year or two ago was the
possibility of considering join algorithms that can handle more than
two tables at a time.  Currently, we don't do that, so a four-way join
will be implemented either by joining two tables, then the other two
tables, and then the results of those; or more commonly by joining two
tables, joining the results to a third table, and then joining those
results to the final table.  Due to the pipelined nature of our
executor, this works pretty well, but it's possible that there are
better algorithms out there.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: star join optimization

From
Rudyar
Date:
On 14/11/11 13:09, Greg Smith wrote:
> On 11/14/2011 10:44 AM, Rudyar wrote:
>> Ok, I'm working in that project. I will send you my results
>> and comparision with SQL server HHJ optimization in one or two months.
>
> Please be careful not to share here details of how features like this 
> are built in any commercial databases you evaluate.  Some of those 
> implementations use patented design approaches that should be avoided 
> in an open source project.  Oracle, Microsoft, and DB2 are all 
> aggressive about patenting the innovative parts of their database 
> server code.
>
> In addition to not wanting to accidentally incorporate such a design, 
> it's better for the PostgreSQL project to not be aware of what patents 
> in this area exist too.  We don't even want a survey of patents in 
> this area published here because there are increased penalties for 
> willful patent infringement.  See 
> http://en.wikipedia.org/wiki/Treble_damages for example.
>
> What this project likes best are innovative approaches from recent 
> academic research that haven't been incorporated in any commercial 
> products yet.  A good example is how the Serializable Snapshot 
> Isolation technique developed by Cahill and others was added to 
> PostgreSQL 9.1:  http://wiki.postgresql.org/wiki/Serializable  There 
> was less concern over accidentally duplicating a patented approach 
> because that technique wasn't in any of the commercial databases yet.
>
Greg,

Ok. I will consider your recommendations.

Best Regards.

-- 
Rudyar Cortés.
Estudiante de Ingeniería Civil Informática
Universidad Técnica Federico Santa María.