Thread: Hash Join Optimization

Hash Join Optimization

From
"Gokulakannan Somasundaram"
Date:
Hi,
   I had a chance to go through the Hash join code of PostgreSQL and had the following thoughts.

- Currently postgres takes the heap tuple from the slot, creates a minimal_tuple and copies it into the temp file.

I think the creation of the minimal_tuple in the middle is an overhead which can be avoided by creating a mem-map and directly creating the minimal_tuple in the mem-map. Since Hash join is used mainly to join huge tables, this might benefit those warehouse customers of postgres.

Am I missing something?

Thanks,
Gokul.
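A minimal POSIX sketch of the proposal, for illustration only. It assumes a hypothetical record format (a 4-byte length followed by payload bytes) rather than PostgreSQL's real MinimalTuple layout, and the names SpillMap, spill_open, spill_put and spill_close are hypothetical, not PostgreSQL code. The temp file is sized up front and mapped with MAP_SHARED, so each record is formed directly in the file's pages instead of in a separate buffer that is later handed to write().

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical spill record: a 4-byte length followed by the payload.
 * This stands in for "a tuple formed in its final location"; it is not
 * PostgreSQL's MinimalTuple layout. */
typedef struct SpillMap {
    char   *base;   /* start of the shared mapping */
    size_t  size;   /* mapped size, equal to the file size */
    size_t  used;   /* bytes filled so far */
} SpillMap;

/* Size the temp file up front and map it MAP_SHARED, so stores into the
 * mapping go to the file's pages without a later write() call. */
static int spill_open(SpillMap *sm, int fd, size_t size)
{
    if (ftruncate(fd, (off_t) size) != 0)
        return -1;
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return -1;
    sm->base = p;
    sm->size = size;
    sm->used = 0;
    return 0;
}

/* Build one record directly inside the mapping. The payload copy is the
 * only copy: there is no intermediate tuple buffer. */
static int spill_put(SpillMap *sm, const void *payload, uint32_t len)
{
    assert(sm->base != NULL);
    size_t need = sizeof(uint32_t) + len;
    if (sm->size - sm->used < need)
        return -1;  /* fixed-size mapping: it cannot grow in place */
    char *dst = sm->base + sm->used;
    memcpy(dst, &len, sizeof len);
    memcpy(dst + sizeof len, payload, len);
    sm->used += need;
    return 0;
}

/* Drop the mapping; the records remain in the file. */
static int spill_close(SpillMap *sm)
{
    int rc = munmap(sm->base, sm->size);
    sm->base = NULL;
    return rc;
}
```

The fixed size is a deliberate simplification here: whether and how such a mapping can be extended is exactly the open question for this approach.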

Re: Hash Join Optimization

From
ITAGAKI Takahiro
Date:
"Gokulakannan Somasundaram" <gokul007@gmail.com> wrote:

> I think the creation of minimal_tuple in the middle is an overhead which can
> be avoided by creating a mem-map and directly creating the minimal_tuple in
> the mem-map.

Many mmap implementations do not allow a mapped region to be extended.
Do you have a solution for extending the mmap-ed region?

> Since Hash join is used mainly to join huge tables, this might
> benefit those warehouse customers of postgres.

If we use mmap, we will be restricted by the size of the virtual address space.
That means we would need to drop support for huge temp space on 32-bit platforms, no?


Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center




Re: Hash Join Optimization

From
"Gokulakannan Somasundaram"
Date:


On Fri, Mar 28, 2008 at 2:04 PM, ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> wrote:

> "Gokulakannan Somasundaram" <gokul007@gmail.com> wrote:
>
> > I think the creation of minimal_tuple in the middle is an overhead which can
> > be avoided by creating a mem-map and directly creating the minimal_tuple in
> > the mem-map.
>
> Many mmap implementations do not allow a mapped region to be extended.
> Do you have a solution for extending the mmap-ed region?

No. I think the solution would be to unmap and remap it. But since the mapping
is local to the backend, this should not be a problem.
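The unmap-and-remap idea could look like the following minimal POSIX sketch; spill_grow is a hypothetical helper, not PostgreSQL code. On Linux, mremap() with MREMAP_MAYMOVE can do the same job without an explicit unmap, but it is not portable, which is the limitation ITAGAKI raises.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: grow a MAP_SHARED spill mapping portably by
 * extending the file, unmapping, and mapping it again. Data already
 * stored survives because it lives in the file, not in the old address
 * range. The new base may differ from the old one, so callers should
 * keep offsets, not raw pointers, into the spill area.
 * Returns NULL on failure; the caller should treat that as fatal, since
 * the old mapping may already be gone. */
static char *spill_grow(int fd, char *base, size_t old_size, size_t new_size)
{
    assert(new_size >= old_size);
    if (ftruncate(fd, (off_t) new_size) != 0)
        return NULL;
    if (munmap(base, old_size) != 0)
        return NULL;
    void *p = mmap(NULL, new_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    return p == MAP_FAILED ? NULL : (char *) p;
}
```

Because the base address can move on every remap, a hash table holding pointers into the spill area would need to store offsets instead; that is the main cost of this approach on top of the extra system calls.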



> > Since Hash join is used mainly to join huge tables, this might
> > benefit those warehouse customers of postgres.
>
> If we use mmap, we will be restricted by the size of the virtual address space.
> That means we would need to drop support for huge temp space on 32-bit platforms, no?

Yes, you are right. I was thinking of 64-bit platforms. On 32-bit platforms this might need more work: selectively mapping and unmapping portions of the file as needed.
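Selective mapping could be sketched as a window over the spill file, so only the part currently in use occupies address space. This is a minimal POSIX sketch with hypothetical names (SpillWindow, window_map, window_unmap), assuming the file is already at least off + len bytes long, since touching mapped pages beyond end of file raises SIGBUS. The mmap() file offset must be page-aligned, so the requested offset is rounded down and the data pointer adjusted. On a 32-bit glibc build, _FILE_OFFSET_BITS=64 is assumed so that off_t can address files larger than 2 GB.

```c
#define _DEFAULT_SOURCE
#define _FILE_OFFSET_BITS 64  /* 64-bit off_t even on 32-bit glibc builds */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical window over a large spill file: only the requested range
 * (plus alignment slack) is mapped, so address-space use stays bounded. */
typedef struct SpillWindow {
    char   *map;     /* start of the page-aligned mapping */
    size_t  maplen;  /* length actually mapped */
    char   *data;    /* points at the byte for the requested offset */
} SpillWindow;

/* mmap() needs a page-aligned file offset, so round the offset down and
 * widen the mapped length by the same amount. */
static int window_map(SpillWindow *w, int fd, off_t off, size_t len)
{
    assert(len > 0);
    off_t pg = (off_t) sysconf(_SC_PAGESIZE);
    off_t aligned = off - off % pg;
    size_t delta = (size_t) (off - aligned);
    void *p = mmap(NULL, len + delta, PROT_READ, MAP_SHARED, fd, aligned);
    if (p == MAP_FAILED)
        return -1;
    w->map = p;
    w->maplen = len + delta;
    w->data = (char *) p + delta;
    return 0;
}

/* Release the window so the next portion of the file can be mapped. */
static int window_unmap(SpillWindow *w)
{
    int rc = munmap(w->map, w->maplen);
    w->map = w->data = NULL;
    w->maplen = 0;
    return rc;
}
```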


But my aim here is to avoid two copies: HeapTuple -> MinimalTuple and MinimalTuple -> file. Suggestions are welcome.


Thanks,
Gokul.