Thread: Hash Join Optimization
Hi,

I had a chance to go through the hash join code of PostgreSQL and had the following thoughts.

- Currently Postgres takes the heap tuple from the slot, creates a minimal_tuple from it, and copies that into the temp file.

I think the creation of the minimal_tuple in the middle is an overhead which can be avoided by creating a mem-map and directly creating the minimal_tuple in the mem-map. Since hash join is used mainly to join huge tables, this might benefit the warehouse customers of Postgres.

Am I missing something?

Thanks,
Gokul.

"Gokulakannan Somasundaram" <gokul007@gmail.com> wrote:

> I think the creation of minimal_tuple in the middle is an overhead which can
> be avoided by creating a mem-map and directly creating the minimal_tuple in
> the mem-map.

Many implementations of mmap do not allow a mapping to be extended.
Do you have a solution for extending the mmap-ed region?

> Since Hash join is used mainly to join huge tables, this might
> benefit those warehouse customers of postgres.

If we use mmap, we will be restricted by the virtual memory size.
Doesn't that mean we need to drop support for huge temp space on 32-bit platforms?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
On Fri, Mar 28, 2008 at 2:04 PM, ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> wrote:
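> Many implementations of mmap do not allow a mapping to be extended.
> Do you have a solution for extending the mmap-ed region?

No. I think the solution would be to unmap and remap it. Since the mmap
is local to the backend, this should not be a problem.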
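
The unmap-and-remap idea can be sketched roughly as below. This is only an illustration using POSIX calls (mkstemp, ftruncate, mmap, munmap), not PostgreSQL code; GrowableMap, gm_open, gm_grow, gm_close and demo_grow are hypothetical names invented for the example. It assumes the mapping is private to one process, as suggested above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical handle for a backend-private, file-backed mapping. */
typedef struct {
    int    fd;    /* temp file backing the mapping */
    char  *base;  /* current mapping address; may change on every grow */
    size_t len;   /* current mapped length in bytes */
} GrowableMap;

/* Create an unlinked temp file of initial_len bytes and map all of it. */
int gm_open(GrowableMap *m, size_t initial_len)
{
    char  path[] = "/tmp/gmdemoXXXXXX";
    void *p;

    m->base = NULL;
    m->len = 0;
    m->fd = mkstemp(path);
    if (m->fd < 0)
        return -1;
    unlink(path);                       /* file disappears once fd is closed */
    if (ftruncate(m->fd, (off_t) initial_len) != 0) {
        close(m->fd);
        return -1;
    }
    p = mmap(NULL, initial_len, PROT_READ | PROT_WRITE, MAP_SHARED, m->fd, 0);
    if (p == MAP_FAILED) {
        close(m->fd);
        return -1;
    }
    m->base = p;
    m->len = initial_len;
    return 0;
}

/* Extend the file, drop the old mapping, and map the larger range. */
int gm_grow(GrowableMap *m, size_t new_len)
{
    void *p;

    assert(new_len > m->len);
    if (ftruncate(m->fd, (off_t) new_len) != 0)
        return -1;                      /* old mapping is still intact here */
    if (munmap(m->base, m->len) != 0)
        return -1;
    m->base = NULL;
    m->len = 0;
    p = mmap(NULL, new_len, PROT_READ | PROT_WRITE, MAP_SHARED, m->fd, 0);
    if (p == MAP_FAILED)
        return -1;
    m->base = p;                        /* address may differ from before */
    m->len = new_len;
    return 0;
}

void gm_close(GrowableMap *m)
{
    if (m->base != NULL)
        munmap(m->base, m->len);
    close(m->fd);
}

/* Write, grow, and check that earlier data survives the remap. */
int demo_grow(void)
{
    GrowableMap m;
    size_t page = (size_t) sysconf(_SC_PAGESIZE);
    int    ok;

    if (gm_open(&m, page) != 0)
        return -1;
    memcpy(m.base, "first", 5);
    if (gm_grow(&m, 4 * page) != 0) {
        gm_close(&m);
        return -1;
    }
    memcpy(m.base + 3 * page, "later", 5);
    ok = memcmp(m.base, "first", 5) == 0 &&
         memcmp(m.base + 3 * page, "later", 5) == 0;
    gm_close(&m);
    return ok ? 0 : -1;
}
```

Because the base address can change on every remap, callers of such a scheme would have to keep file offsets rather than raw pointers into the mapping. Linux also offers mremap(), but that call is not portable.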
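> If we use mmap, we will be restricted by the virtual memory size.
> Doesn't that mean we need to drop support for huge temp space on 32-bit platforms?

Yes, you are right here. I was thinking of 64-bit platforms. On 32-bit platforms this would need more work: selectively mapping and unmapping portions of the file as needed.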
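
Selective mapping could look something like the sketch below: only a fixed-size window of the temp file is mapped at a time, and the window is moved when a requested position falls outside it. This is an assumption about how it might be done, using only POSIX calls; FileWindow, fw_at and demo_window are hypothetical names, and the caller is assumed to have sized the file (for example with ftruncate) before touching it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical sliding window over a file larger than we want to map. */
typedef struct {
    int    fd;
    char  *base;     /* NULL when nothing is mapped */
    off_t  win_off;  /* file offset of the window start (page-aligned) */
    size_t win_len;  /* window size in bytes */
} FileWindow;

/*
 * Return a pointer to `need` bytes at file position `pos`, remapping the
 * window if the range is not currently covered. Returns NULL on failure.
 */
char *fw_at(FileWindow *w, off_t pos, size_t need)
{
    off_t page = (off_t) sysconf(_SC_PAGESIZE);
    off_t start;
    void *p;

    assert(pos >= 0);
    if (w->base != NULL && pos >= w->win_off &&
        (size_t) (pos - w->win_off) + need <= w->win_len)
        return w->base + (pos - w->win_off);    /* already covered */

    if (w->base != NULL) {
        munmap(w->base, w->win_len);
        w->base = NULL;
    }
    start = pos - (pos % page);                 /* mmap offsets must be page-aligned */
    if ((size_t) (pos - start) + need > w->win_len)
        return NULL;                            /* request does not fit in one window */
    p = mmap(NULL, w->win_len, PROT_READ | PROT_WRITE, MAP_SHARED, w->fd, start);
    if (p == MAP_FAILED)
        return NULL;
    w->base = p;
    w->win_off = start;
    return w->base + (pos - start);
}

/* Write at three far-apart positions through a 4-page window, then verify. */
int demo_window(void)
{
    size_t     page = (size_t) sysconf(_SC_PAGESIZE);
    FileWindow w = { -1, NULL, 0, 4 * page };
    off_t      positions[3];
    char       path[] = "/tmp/fwdemoXXXXXX";
    char       buf[8];
    int        i, ok = 1;

    positions[0] = 0;
    positions[1] = (off_t) (10 * page + 100);
    positions[2] = (off_t) (60 * page + 7);

    w.fd = mkstemp(path);
    if (w.fd < 0)
        return -1;
    unlink(path);
    if (ftruncate(w.fd, (off_t) (64 * page)) != 0) {
        close(w.fd);
        return -1;
    }
    for (i = 0; i < 3; i++) {
        char *p = fw_at(&w, positions[i], 5);
        if (p == NULL) {
            ok = 0;
            break;
        }
        memcpy(p, "tuple", 5);
    }
    if (w.base != NULL)
        munmap(w.base, w.win_len);
    for (i = 0; ok && i < 3; i++) {
        if (pread(w.fd, buf, 5, positions[i]) != 5 ||
            memcmp(buf, "tuple", 5) != 0)
            ok = 0;
    }
    close(w.fd);
    return ok ? 0 : -1;
}
```

With a window like this the address space used stays bounded by win_len no matter how large the file grows, at the cost of an munmap/mmap pair whenever access moves outside the window.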
But my aim here is to avoid two copies: HeapTuple -> MinimalTuple and MinimalTuple -> file. Suggestions are welcome.
Thanks,
Gokul.
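
To make the two-copy versus one-copy distinction concrete, here is a minimal standalone sketch. It does not use PostgreSQL's tuple or file APIs; DemoTuple, write_via_intermediate, write_direct and demo_compare_paths are hypothetical names, and the "compact form" is simply a 4-byte length followed by the payload. The first path builds the compact form in memory and then copies it into a file; the second forms it directly inside an mmap-ed region of the file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for a tuple sitting in an executor slot. */
typedef struct {
    const char *data;
    uint32_t    len;
} DemoTuple;

/* Path A: build a compact copy in memory, then copy that into the file. */
int write_via_intermediate(FILE *f, const DemoTuple *t)
{
    size_t total = sizeof(uint32_t) + t->len;
    char  *tmp = malloc(total);
    size_t n;

    if (tmp == NULL)
        return -1;
    memcpy(tmp, &t->len, sizeof(uint32_t));             /* copy 1: compact form */
    memcpy(tmp + sizeof(uint32_t), t->data, t->len);
    n = fwrite(tmp, 1, total, f);                       /* copy 2: into the file buffer */
    free(tmp);
    return n == total ? 0 : -1;
}

/* Path B: form the compact tuple directly in the mapped file region. */
size_t write_direct(char *map, size_t off, size_t map_len, const DemoTuple *t)
{
    size_t total = sizeof(uint32_t) + t->len;

    assert(off + total <= map_len);
    memcpy(map + off, &t->len, sizeof(uint32_t));
    memcpy(map + off + sizeof(uint32_t), t->data, t->len);
    return off + total;                                 /* next free offset */
}

/* Write the same tuples both ways and check the bytes are identical. */
int demo_compare_paths(void)
{
    DemoTuple tuples[2] = { { "alpha", 5 }, { "bravo!", 6 } };
    size_t    expected = 2 * sizeof(uint32_t) + 5 + 6;
    size_t    map_len = (size_t) sysconf(_SC_PAGESIZE);
    char      a[64];
    char      path[] = "/tmp/hjdemoXXXXXX";
    size_t    na, off = 0;
    char     *map;
    FILE     *f;
    int       fd, i, same;

    f = tmpfile();
    if (f == NULL)
        return -1;
    for (i = 0; i < 2; i++) {
        if (write_via_intermediate(f, &tuples[i]) != 0) {
            fclose(f);
            return -1;
        }
    }
    rewind(f);
    na = fread(a, 1, sizeof(a), f);
    fclose(f);

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    if (ftruncate(fd, (off_t) map_len) != 0) {
        close(fd);
        return -1;
    }
    map = mmap(NULL, map_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }
    for (i = 0; i < 2; i++)
        off = write_direct(map, off, map_len, &tuples[i]);

    same = (na == expected && off == expected && memcmp(a, map, expected) == 0);
    munmap(map, map_len);
    close(fd);
    return same ? 0 : -1;
}
```

Both paths produce the same bytes on disk; the difference is only that the second skips the intermediate allocation and copy. Whether that saving is measurable in a real hash join would need benchmarking.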