On 2/7/23 01:09, Thomas Munro wrote:
> On Tue, Feb 7, 2023 at 1:06 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>> On 2/7/23 00:48, Thomas Munro wrote:
>>> On Tue, Feb 7, 2023 at 12:46 PM Tomas Vondra
>>> <tomas.vondra@enterprisedb.com> wrote:
>>>> No, I left the workload as it was for the first lockup, so `make check`
>>>> runs everything as is up until the "join" test suite.
>>>
>>> Wait, shouldn't that be join_hash?
>>
>> No, because join_hash does not exist on 11 (it was added in 12). Also,
>> it actually locked up like this - that's the lockup I reported on 28/1.
>
> Oh, good. I had been trying to repro with 12 here and forgot that you
> were looking at 11...

FYI it happened again, on a regular run of regression tests (I gave up
on trying to reproduce this - after some initial hits I didn't hit it in
a couple thousand tries, so I just added the machine back to the
buildfarm).

Anyway, same symptoms - lockup in join_hash on PG11, leader waiting on
WaitLatch and both workers waiting on BarrierArriveAndWait. I forgot
that running gdb on the second worker gets it unstuck, so I haven't been
able to collect more info.

What else do you think would be useful to collect next time?

It's hard to draw conclusions due to the low probability of the issue,
but it's pretty weird this only ever happened on 11 so far.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company