Re: pgsql: Add parallel-aware hash joins. - Mailing list pgsql-committers

From Tom Lane
Subject Re: pgsql: Add parallel-aware hash joins.
Msg-id 30655.1514673257@sss.pgh.pa.us
In response to Re: pgsql: Add parallel-aware hash joins.  (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses Re: pgsql: Add parallel-aware hash joins.  (Thomas Munro <thomas.munro@enterprisedb.com>)
List pgsql-committers
Thomas Munro <thomas.munro@enterprisedb.com> writes:
>> This is explained by the early exit case in
>> ExecParallelHashEnsureBatchAccessors().  With just the right timing,
>> it finishes up not reporting the true nbatch number, and never calling
>> ExecParallelHashUpdateSpacePeak().

> Hi Tom,

> You mentioned that prairiedog sees the problem about one time in
> thirty.  Would you mind checking if it goes away with this patch
> applied?

I've run 55 cycles of "make installcheck" without seeing a failure
with this patch installed.  That's not enough to be totally sure
of course, but I think this probably fixes it.
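
(For anyone reproducing this, a repeat-until-failure loop of roughly this shape does the job; this is just a sketch, where `run_cycles` is a hypothetical helper and "make installcheck" assumes a configured PostgreSQL source tree:)

```shell
#!/bin/sh
# Sketch: rerun a test command N times and stop at the first failure.
# The cycle count of 55 and "make installcheck" come from the report above;
# the helper itself is illustrative.
run_cycles() {
    cycles=$1
    shift
    i=1
    while [ "$i" -le "$cycles" ]; do
        if ! "$@" >/dev/null 2>&1; then
            echo "failure on cycle $i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "all $cycles cycles passed"
}

# e.g. run_cycles 55 make installcheck
```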

However ... I noticed that my other dinosaur gaur shows the other failure
mode we see in the buildfarm, the "increased_batches = t" diff, and
I can report that this patch does *not* help that.  The underlying
EXPLAIN output goes from something like

!  Finalize Aggregate  (cost=823.85..823.86 rows=1 width=8) (actual time=1378.102..1378.105 rows=1 loops=1)
!    ->  Gather  (cost=823.63..823.84 rows=2 width=8) (actual time=1377.909..1378.006 rows=3 loops=1)
!          Workers Planned: 2
!          Workers Launched: 2
!          ->  Partial Aggregate  (cost=823.63..823.64 rows=1 width=8) (actual time=1280.298..1280.302 rows=1 loops=3)
!                ->  Parallel Hash Join  (cost=387.50..802.80 rows=8333 width=0) (actual time=1070.179..1249.142 rows=6667 loops=3)
!                      Hash Cond: (r.id = s.id)
!                      ->  Parallel Seq Scan on simple r  (cost=0.00..250.33 rows=8333 width=4) (actual time=0.173..62.063 rows=6667 loops=3)
!                      ->  Parallel Hash  (cost=250.33..250.33 rows=8333 width=4) (actual time=454.305..454.305 rows=6667 loops=3)
!                            Buckets: 4096  Batches: 8  Memory Usage: 208kB
!                            ->  Parallel Seq Scan on simple s  (cost=0.00..250.33 rows=8333 width=4) (actual time=0.178..67.115 rows=6667 loops=3)
!  Planning time: 1.861 ms
!  Execution time: 1687.311 ms

to something like

!  Finalize Aggregate  (cost=823.85..823.86 rows=1 width=8) (actual time=1588.733..1588.737 rows=1 loops=1)
!    ->  Gather  (cost=823.63..823.84 rows=2 width=8) (actual time=1588.529..1588.634 rows=3 loops=1)
!          Workers Planned: 2
!          Workers Launched: 2
!          ->  Partial Aggregate  (cost=823.63..823.64 rows=1 width=8) (actual time=1492.631..1492.635 rows=1 loops=3)
!                ->  Parallel Hash Join  (cost=387.50..802.80 rows=8333 width=0) (actual time=1270.309..1451.501 rows=6667 loops=3)
!                      Hash Cond: (r.id = s.id)
!                      ->  Parallel Seq Scan on simple r  (cost=0.00..250.33 rows=8333 width=4) (actual time=0.219..158.144 rows=6667 loops=3)
!                      ->  Parallel Hash  (cost=250.33..250.33 rows=8333 width=4) (actual time=634.614..634.614 rows=6667 loops=3)
!                            Buckets: 4096 (originally 4096)  Batches: 16 (originally 8)  Memory Usage: 176kB
!                            ->  Parallel Seq Scan on simple s  (cost=0.00..250.33 rows=8333 width=4) (actual time=0.182..120.074 rows=6667 loops=3)
!  Planning time: 1.931 ms
!  Execution time: 2219.417 ms

so again we have a case where the plan didn't change but the execution
behavior did.  This isn't quite 100% reproducible on gaur/pademelon,
but it seems to fail more often than not, so I can poke into it
if you can say what info would be helpful.

            regards, tom lane

