Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets - Mailing list pgsql-hackers

From Joshua Tolley
Subject Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
Msg-id 20081223182818.GA5867@uber
In response to Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets  ("Robert Haas" <robertmhaas@gmail.com>)
List pgsql-hackers
On Tue, Dec 23, 2008 at 10:14:29AM -0500, Robert Haas wrote:
> > It's equivalent to our assumption that distributions of values in
> > columns in the same table are independent. Making that assumption in
> > this case would probably result in occasional dramatic speed
> > improvements similar to the ones we've seen in less complex joins,
> > offset by just-as-occasional dramatic slowdowns of similar magnitude. In
> > other words, it will increase the variance of our results.
>
> Under what circumstances do you think that it would produce a dramatic
> slowdown?  I'm confused.  I thought the penalty for picking a bad set
> of values for the in-memory hash table was pretty small.
>
> ...Robert

I take that back :) I agree with what others have already said, that it
shouldn't cause dramatic slowdowns when we get it wrong.

- Josh
