Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets - Mailing list pgsql-hackers

From Bryce Cutt
Subject Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
Date
Msg-id 1924d1180811051606w19aaf30du589e8ea10ea5534d@mail.gmail.com
In response to Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets  ("Joshua Tolley" <eggyknap@gmail.com>)
Responses Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets  (Joshua Tolley <eggyknap@gmail.com>)
Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets  ("Joshua Tolley" <eggyknap@gmail.com>)
Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets  (Joshua Tolley <eggyknap@gmail.com>)
List pgsql-hackers
The error was caused by my asserting against the wrong variable.  I
never noticed this because I apparently did not have assertions enabled
on my development machine.  That is fixed now, and with the new patch
version I have attached, all assertions pass with your query and
my test queries.  I added another assertion to that section of the
code so that it is a bit more rigorous in confirming the hash table
partition is correct.  It does not change the operation of the code.

There are two partition counts.  One holds the maximum number of
buckets in the hash table and the other counts the number of actual
buckets created for hash values.  I was incorrectly testing against
the second one because that was valid before I started using a hash
table to store the buckets.
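
To illustrate the distinction between the two counts, here is a minimal,
self-contained sketch.  It is not code from the patch: every name in it
(SkewTable, nslots, nbuckets, skew_get_slot) is hypothetical, and the
open-addressing layout is only an assumption about how such a table
might look.  The point is that a slot index has to be checked against
the table's capacity, not against the number of buckets created so far.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; not the identifiers used in the patch. */
#define SKEW_SLOTS 8            /* table capacity, must be a power of two */

typedef struct SkewBucket
{
    uint32_t    hashvalue;      /* hash value this bucket was created for */
    int         ntuples;        /* tuples stored in the bucket */
} SkewBucket;

typedef struct SkewTable
{
    int         nslots;         /* maximum number of buckets in the table */
    int         nbuckets;       /* number of buckets actually created */
    SkewBucket *slots[SKEW_SLOTS];
    SkewBucket  storage[SKEW_SLOTS];
} SkewTable;

static void
skew_init(SkewTable *t)
{
    t->nslots = SKEW_SLOTS;
    t->nbuckets = 0;
    for (int i = 0; i < SKEW_SLOTS; i++)
        t->slots[i] = NULL;
}

/*
 * Return the slot holding the bucket for hashvalue, creating the bucket
 * if it does not exist yet.  Returns -1 if the table is full.
 */
static int
skew_get_slot(SkewTable *t, uint32_t hashvalue)
{
    int         slot = (int) (hashvalue & (uint32_t) (t->nslots - 1));

    for (int probes = 0; probes < t->nslots; probes++)
    {
        if (t->slots[slot] == NULL)
        {
            /* Empty slot: create the bucket here. */
            SkewBucket *b = &t->storage[t->nbuckets++];

            b->hashvalue = hashvalue;
            b->ntuples = 0;
            t->slots[slot] = b;
        }

        if (t->slots[slot]->hashvalue == hashvalue)
        {
            /*
             * Correct check: the index is bounded by the capacity.
             * Asserting slot < t->nbuckets would only hold if buckets
             * were indexed in creation order; with hashing, the first
             * bucket created can land in any slot.
             */
            assert(slot >= 0 && slot < t->nslots);
            /* Extra check: the slot really holds this hash value. */
            assert(t->slots[slot] != NULL &&
                   t->slots[slot]->hashvalue == hashvalue);
            return slot;
        }

        /* Collision: linear probe to the next slot. */
        slot = (slot + 1) & (t->nslots - 1);
    }
    return -1;
}
```

In this sketch, looking up hash value 5 in an empty table returns slot 5
while nbuckets is only 1, so an assertion of the form slot < nbuckets
would fire even though the table is in a valid state.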

The enable_hashjoin_usestatmcvs flag was valuable for my own research
and tests, and is likely useful for your review, but Tom is correct that
it can be removed in the final version.

- Bryce Cutt


On Wed, Nov 5, 2008 at 7:22 AM, Joshua Tolley <eggyknap@gmail.com> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On Wed, Nov 5, 2008 at 8:20 AM, Tom Lane  wrote:
>> Joshua Tolley  writes:
>>> On Mon, Oct 20, 2008 at 03:42:49PM -0700, Lawrence, Ramon wrote:
>>>> We propose a patch that improves hybrid hash join's performance for large
>>>> multi-batch joins where the probe relation has skew.
>>
>>> I also recommend modifying docs/src/sgml/config.sgml to include the
>>> enable_hashjoin_usestatmcvs option.
>>
>> If the patch is actually a win, why would we bother with such a GUC
>> at all?
>>
>>                        regards, tom lane
>
> Good point. Leaving it in place for patch review purposes is useful,
> but we can probably lose it in the end.
>
> - - Josh / eggyknap
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.9 (GNU/Linux)
> Comment: http://getfiregpg.org
>
> iEYEARECAAYFAkkRujsACgkQRiRfCGf1UMNSTACfbpDSQn0HGSVr3jI30GJApcRD
> YbQAn2VZdI/aIalGBrbn1hlRWPEvbgV5
> =LKZ3
> -----END PGP SIGNATURE-----
>

Attachment
