lockup in parallel hash join on dikkop (freebsd 14.0-current) - Mailing list pgsql-hackers

From	Tomas Vondra
Subject	lockup in parallel hash join on dikkop (freebsd 14.0-current)
Date	January 26, 2023 23:36:06
Msg-id	b2bc5c16-899e-ca99-26ed-e623b4259ec7@enterprisedb.com
Responses	Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)
List	pgsql-hackers

Hi,

I received an alert that dikkop (my rpi4 buildfarm animal running FreeBSD 14)
had not reported any results for a couple of days, and it seems it got into
an infinite loop in REL_11_STABLE while building the hash table in a parallel
hash join, or something like that.

It seems to be progressing now, probably because I attached gdb to the
workers to get backtraces, which involves sending signals etc.

Anyway, in 'ps ax' I saw this:

94545  -  Ss       0:03.39 postgres: buildfarm regression [local] SELECT
94627  -  Is       0:00.03 postgres: parallel worker for PID 94545
94628  -  Is       0:00.02 postgres: parallel worker for PID 94545

and the backend was stuck waiting on this query:

    select final > 1 as multibatch
          from hash_join_batches(
        $$
          select count(*) from join_foo
            left join (select b1.id, b1.t from join_bar b1 join join_bar b2 using (id)) ss
            on join_foo.id < ss.id + 1 and join_foo.id > ss.id - 1;
        $$);

This started on 2023-01-20 23:23:18.125, and the next log entry (after I
did the gdb stuff) is from 2023-01-26 20:05:16.751. Quite a bit of time.

It seems all three processes are in WaitEventSetWait, either through
a ConditionVariable or through WaitLatch. But I don't have any good idea
of what might have broken, and since it got "unstuck" I can't investigate
further. The backtraces do show nodeHash and parallelism, though, and I
recall there are a lot of gotchas in how the backends cooperate when
building the hash table, etc. Thomas, any idea what might be wrong?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment
