Re: [HACKERS] WIP: [[Parallel] Shared] Hash - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: [HACKERS] WIP: [[Parallel] Shared] Hash
Date
Msg-id CAEepm=2fE0UBOXzaBvvW4HsQZDQG4MpHBFai_T0iou0oA_VBPw@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] WIP: [[Parallel] Shared] Hash  (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses Re: [HACKERS] WIP: [[Parallel] Shared] Hash  (Thomas Munro <thomas.munro@enterprisedb.com>)
List pgsql-hackers
On Tue, Mar 14, 2017 at 8:03 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Mon, Mar 13, 2017 at 8:40 PM, Rafia Sabih
> <rafia.sabih@enterprisedb.com> wrote:
>> In an attempt to test v7 of this patch on TPC-H 20 scale factor I found a
>> few regressions,
>> Q21: 52 secs on HEAD and  400 secs with this patch
>
> Thanks Rafia.  Robert just pointed out off-list that there is a bogus
> 0 row estimate in here:
>
> ->  Parallel Hash Semi Join  (cost=1006599.34..1719227.30 rows=0
> width=24) (actual time=38716.488..100933.250 rows=7315896 loops=5)
>
> Will investigate, thanks.

There are two problems here.

1.  There is a pre-existing cardinality estimate problem for
semi-joins with <> filters.  The big Q21 regression reported by Rafia
is caused by that phenomenon, probably exacerbated by another bug that
allowed 0 cardinality estimates to percolate inside the planner.
Estimates have been clamped at or above 1.0 since her report by commit
1ea60ad6.

I started a new thread to discuss that because it's unrelated to this
patch, except insofar as it confuses the planner about Q21 (with or
without parallelism).  Using one possible selectivity tweak suggested
by Tom Lane, I was able to measure significant speedups on otherwise
unpatched master:

https://www.postgresql.org/message-id/CAEepm%3D11BiYUkgXZNzMtYhXh4S3a9DwUP8O%2BF2_ZPeGzzJFPbw%40mail.gmail.com

2.  If you compare master tweaked as above against the latest version
of my patch series with the tweak, then the patched version always
runs faster with 4 or more workers, but with only 1 or 2 workers Q21
is a bit slower... but not always.  I realised that there was a
bi-modal distribution of execution times.  It looks like my 'early
exit' protocol, designed to make tuple-queue deadlock impossible, is
often causing us to lose a worker.  I am working on that now.

I have code changes for Peter G's and Andres's feedback queued up and
will send a v8 series shortly, hopefully with a fix for problem 2
above.

-- 
Thomas Munro
http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables
Next
From: Erik Rijkers
Date:
Subject: [HACKERS] more on comments of snapbuild.c