Re: Hash Join cost estimates

From: ktm@rice.edu
Subject: Re: Hash Join cost estimates
Date:
Msg-id: 20130404211113.GN32580@aart.rice.edu
In response to: Re: Hash Join cost estimates (Stephen Frost <sfrost@snowman.net>)
List: pgsql-hackers
On Thu, Apr 04, 2013 at 04:16:12PM -0400, Stephen Frost wrote:
> * Stephen Frost (sfrost@snowman.net) wrote:
> > It does look like reducing bucket depth, as I outlined before through
> > the use of a 2-level hashing system, might help speed up
> > ExecScanHashBucket, as it would hopefully have very few (eg: 1-2)
> > entries to consider instead of more.  Along those same lines, I really
> > wonder if we're being too generous wrt the bucket-depth goal of '10'
> > instead of, say, '1', especially when we've got plenty of work_mem
> > available.
> 
> Rerunning using a minimally configured build (only --enable-openssl
> and --enable-debug passed to configure) with NTUP_PER_BUCKET set to '1'
> results in a couple of interesting things-
> 
> First, the planner actually picks the plan to hash the small table and
> seqscan the big one.  That also, finally, turns out to be *faster* for
> this test case.
> 
> ...
> 
> I'm certainly curious about those, but I'm also very interested in the
> possibility of making NTUP_PER_BUCKET much smaller, or perhaps variable
> depending on the work_mem setting.  It's only used in
> ExecChooseHashTableSize, so while making it variable or depending on
> work_mem could slow planning down a bit, it's not a per-tuple cost item.
> 
+1 for adjusting this based on the work_mem value.
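
For concreteness, here is a toy C sketch of the idea. This is not the
actual ExecChooseHashTableSize() code from nodeHash.c; the function
choose_nbuckets(), its memory test, and the fallback rule are all
invented for illustration. It just shows one way the tuples-per-bucket
target could scale with work_mem instead of staying pinned at the
current NTUP_PER_BUCKET constant of 10:

/*
 * Hypothetical sketch, not PostgreSQL source: scale the
 * tuples-per-bucket target with available work_mem.
 */
#include <math.h>
#include <stdio.h>

#define NTUP_PER_BUCKET 10      /* the fixed target under discussion */

/* Bucket counts are conventionally powers of 2. */
static long
next_pow2(long v)
{
    long p = 1;

    while (p < v)
        p <<= 1;
    return p;
}

/*
 * Assumed rule: aim for ~1 tuple per bucket when tuples plus the
 * bucket pointer array fit comfortably in work_mem, and fall back
 * to the old target of 10 when memory is tight.
 */
static long
choose_nbuckets(double ntuples, long work_mem_kb, int tuple_width)
{
    double  hash_mem = work_mem_kb * 1024.0;
    double  data_mem = ntuples * tuple_width;
    int     ntup_target;

    if (data_mem + ntuples * sizeof(void *) <= hash_mem)
        ntup_target = 1;            /* plenty of room */
    else
        ntup_target = NTUP_PER_BUCKET;  /* accept deeper buckets */

    return next_pow2((long) ceil(ntuples / ntup_target));
}

int
main(void)
{
    /* 1M inner tuples of ~64 bytes, small vs. large work_mem */
    printf("work_mem=4MB:   nbuckets=%ld\n",
           choose_nbuckets(1e6, 4 * 1024, 64));
    printf("work_mem=256MB: nbuckets=%ld\n",
           choose_nbuckets(1e6, 256 * 1024, 64));
    return 0;
}

With 1M 64-byte inner tuples this picks ~128K buckets at work_mem=4MB
but ~1M buckets at 256MB, i.e. roughly one tuple per bucket whenever
memory allows, which is the behavior Stephen's NTUP_PER_BUCKET=1 test
showed to be faster. Since this only runs once per hash join during
planning, the extra arithmetic is not a per-tuple cost.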

Ken


