potential deadlock in parallel hashjoin grow-buckets-barrier and blocking nodes? - Mailing list pgsql-hackers

From Luc Vlaming
Subject potential deadlock in parallel hashjoin grow-buckets-barrier and blocking nodes?
Msg-id 3ddf4eab-460d-3cb7-9577-8a4e8f30954d@swarm64.com
List pgsql-hackers
Hi,

While debugging a deadlock in a TPC-DS query, I noticed something in
the parallel hash join implementation that could potentially cause
deadlocks (if my analysis is right).

While the inner hash table is being built (the PHJ_BUILD_HASHING_INNER
phase), the grow barriers stay attached the whole time.
Usually this is not a problem. However, if one of the processes blocks
somewhere further down in the plan while trying to fill the inner hash
table, and the others are trying to e.g. extend the number of buckets
using ExecParallelHashIncreaseNumBuckets, they would all wait until the
blocked process comes back to the hashjoin node and joins the effort.
Wouldn't this create potential deadlock situations? Or why would a worker
that is hashing the inner side be required to come back and join the
effort of growing the hash buckets?

With very skewed workloads (one node providing all data) I was at least
able to get e.g. 3 out of 4 workers waiting in
ExecParallelHashIncreaseNumBuckets, while one was in
ExecProcNode(outerNode). I tried to detach and reattach the barrier, but
this proved to be a bad idea :)
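
To illustrate the waiting behaviour described above, here is a minimal toy
model in C of a barrier with dynamic attach/detach. It is only a sketch and
not PostgreSQL's Barrier code: ToyBarrier and the toy_barrier_* functions are
hypothetical names made up for this example, and it assumes the simplified
rule that a phase advances only once every attached participant has arrived.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a dynamic barrier; NOT PostgreSQL's Barrier implementation. */
typedef struct ToyBarrier
{
    int participants;   /* processes currently attached */
    int arrived;        /* attached processes waiting at the barrier */
    int phase;          /* advances each time everyone has arrived */
} ToyBarrier;

/* Advance the phase if every attached participant has arrived. */
static bool
toy_barrier_try_release(ToyBarrier *b)
{
    if (b->participants > 0 && b->arrived == b->participants)
    {
        b->arrived = 0;
        b->phase++;
        return true;
    }
    return false;
}

/* A process attaches and from now on must be waited for. */
static void
toy_barrier_attach(ToyBarrier *b)
{
    b->participants++;
}

/* A participant arrives; returns true if this arrival released the barrier. */
static bool
toy_barrier_arrive(ToyBarrier *b)
{
    assert(b->arrived < b->participants);
    b->arrived++;
    return toy_barrier_try_release(b);
}

/* A participant that has not arrived detaches; this may release the waiters. */
static bool
toy_barrier_detach(ToyBarrier *b)
{
    assert(b->participants > b->arrived);
    b->participants--;
    return toy_barrier_try_release(b);
}
```

With four participants attached, the phase stays at 0 after three arrivals
and advances only when the fourth either arrives or detaches. In this model,
a worker that stays attached but is blocked elsewhere in the plan keeps the
other three waiting for as long as it is blocked.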

Regards,
Luc


