Re: hash join hashtable size and work_mem - Mailing list pgsql-hackers

From Timothy J. Kordas
Subject Re: hash join hashtable size and work_mem
Date
Msg-id 45F830AC.9030900@greenplum.com
In response to Re: hash join hashtable size and work_mem  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: hash join hashtable size and work_mem  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: hash join hashtable size and work_mem  ("Simon Riggs" <simon@2ndquadrant.com>)
List pgsql-hackers
Tom Lane wrote:
> If the planner has correctly predicted the number of rows, the table
> loading should be about NTUP_PER_BUCKET in either regime.  Are you
> sure you aren't just wishing that NTUP_PER_BUCKET were smaller?

Maybe I wish NTUP_PER_BUCKET were smaller. But I don't think that's the whole
story.

The planner estimates definitely play a role in my concern here. For
mis-estimated inner relations, the current calculation may over-subscribe
the hash table even when more work_mem is available (that is, there are too
many hash collisions *and* memory isn't being used to the fullest extent
allowed).

I've been tracking the number of tuples which land in each bucket, and I'd 
like to see that number go down as I increase work_mem.

For the same data, I would expect a hash join with a work_mem of 256MB to run
faster than one with 32MB, even if the inner relation is only 30MB.

The implementation I've been experimenting with actually takes the average
of the current calculation (ntuples / 10) and the spill version
(work_mem / (tupsize * 10)).


-Tim


