Re: [PATCH] Resolve Parallel Hash Join Performance Issue - Mailing list pgsql-hackers

From	Thomas Munro
Subject	Re: [PATCH] Resolve Parallel Hash Join Performance Issue
Date	January 9, 2020 10:04:27
Msg-id	CA+hUKGJbcoiMX15U0Gpv000yycFMqO2Qw-Z01ZKe5SMbWj5JBw@mail.gmail.com
In response to	[PATCH] Resolve Parallel Hash Join Performance Issue ("Deng, Gang" <gang.deng@intel.com>)
Responses	Re: [PATCH] Resolve Parallel Hash Join Performance Issue RE: [PATCH] Resolve Parallel Hash Join Performance Issue
List	pgsql-hackers

On Thu, Jan 9, 2020 at 10:04 PM Deng, Gang <gang.deng@intel.com> wrote:
> Attached is a patch to resolve parallel hash join performance issue. This is my first time to contribute patch to
> PostgreSQL community, I referred one of previous thread as template to report the issue and patch. Please let me know if
> need more information of the problem and patch.

Thank you very much for investigating this and for your report.

>         HeapTupleHeaderSetMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple));
>
>     changed to:
>
>         if (!HeapTupleHeaderHasMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple)))
>         {
>             HeapTupleHeaderSetMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple));
>         }
>
>     Compared with original code, modified code can avoid unnecessary write to memory/cache.

Right, I see.  The funny thing is that the match bit is not even used
in this query (it's used for right and full hash join, and those
aren't supported for parallel joins yet).  Hmm.  So, instead of the
test you proposed, an alternative would be to use if (!parallel).
That's a value that will be constant-folded, so that there will be no
branch in the generated code (see the pg_attribute_always_inline
trick).  If, in a future release, we need the match bit for parallel
hash join because we add parallel right/full hash join support, we
could do it the way you showed, but only if it's one of those join
types, using another constant parameter.
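
To make the constant-folding idea concrete, here is a minimal standalone
sketch. It is not the actual nodeHashjoin.c code: DemoTuple, DEMO_MATCH_BIT,
demo_always_inline and the demo_probe_* functions are all hypothetical
names invented for illustration, and the macro only approximates
pg_attribute_always_inline on GCC/Clang.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for pg_attribute_always_inline (assumption). */
#if defined(__GNUC__)
#define demo_always_inline __attribute__((always_inline)) inline
#else
#define demo_always_inline inline
#endif

/* Hypothetical match flag and tuple header, for illustration only. */
#define DEMO_MATCH_BIT 0x01u

typedef struct DemoTuple
{
    uint32_t flags;
} DemoTuple;

/*
 * Shared body.  Every caller passes a compile-time constant for
 * "parallel", so once this is inlined the compiler can fold the test
 * away and each wrapper gets its own specialized, branch-free copy.
 */
static demo_always_inline void
demo_probe_impl(DemoTuple *tuple, bool parallel)
{
    if (!parallel)
        tuple->flags |= DEMO_MATCH_BIT;   /* write only in the serial case */
}

/* Serial variant: the store to the match bit remains. */
static void
demo_probe_serial(DemoTuple *tuple)
{
    demo_probe_impl(tuple, false);
}

/* Parallel variant: the store is expected to be compiled out entirely. */
static void
demo_probe_parallel(DemoTuple *tuple)
{
    demo_probe_impl(tuple, true);
}
```

In this sketch the parallel wrapper never touches the tuple, so it cannot
dirty a cache line shared between workers, while the serial wrapper
behaves exactly as before.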

> D. Result
>
> With the modified code, performance of hash join operation can scale better with number of threads. Here is result of
> query02 after patch. For example, performance improved ~2.5x when run 28 threads.
>
> number of thread:      1      4      8     16     28
> time used(sec):    465.1  193.1   97.9   55.9     41

Wow.  That is a very nice improvement.
