Re: lockup in parallel hash join on dikkop (freebsd 14.0-current) - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)
Date
Msg-id eae2793c-280f-15b8-885f-d05a7cc314ae@enterprisedb.com
In response to Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)
List pgsql-hackers

On 2/7/23 01:09, Thomas Munro wrote:
> On Tue, Feb 7, 2023 at 1:06 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>> On 2/7/23 00:48, Thomas Munro wrote:
>>> On Tue, Feb 7, 2023 at 12:46 PM Tomas Vondra
>>> <tomas.vondra@enterprisedb.com> wrote:
>>>> No, I left the workload as it was for the first lockup, so `make check`
>>>> runs everything as is up until the "join" test suite.
>>>
>>> Wait, shouldn't that be join_hash?
>>
>> No, because join_hash does not exist on 11 (it was added in 12). Also,
>> it actually locked up like this - that's the lockup I reported on 28/1.
> 
> Oh, good.  I had been trying to repro with 12 here and forgot that you
> were looking at 11...

FYI it happened again, on a regular run of regression tests (I gave up
on trying to reproduce this - after some initial hits I didn't hit it in
a couple thousand tries so I just added the machine back to buildfarm).

Anyway, same symptoms - lockup in a parallel hash join on PG11, leader waiting on
WaitLatch and both workers waiting on BarrierArriveAndWait. I forgot that
running gdb on the second worker gets it unstuck, so I haven't been
able to collect more info.

What else do you think would be useful to collect next time?

It's hard to draw conclusions due to the low probability of the issue,
but it's pretty weird this only ever happened on 11 so far.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company