Re: Fix a race condition in ConditionVariableTimedSleep() - Mailing list pgsql-hackers

From Yura Sokolov
Subject Re: Fix a race condition in ConditionVariableTimedSleep()
Date
Msg-id 2143012f-ac50-4fd2-9697-63b41484713a@postgrespro.ru
In response to Fix a race condition in ConditionVariableTimedSleep()  (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
Responses Re: Fix a race condition in ConditionVariableTimedSleep()
List pgsql-hackers
05.05.2025 10:52, Bertrand Drouvot wrote:
> Hi hackers,
> 
> While working on wait event related stuff I observed a failed assertion:
> 
> "
> TRAP: failed Assert("node->next == 0 && node->prev == 0"), File: "../../../../src/include/storage/proclist.h", Line: 91
> "
> 
> during pg_regress/database.
> 
> To reproduce, add an ereport(LOG,..) or CHECK_FOR_INTERRUPTS() or whatever
> would trigger ConditionVariableBroadcast() in pgstat_report_wait_end():


Interestingly, a colleague of ours ran into the same problem recently [1]. It
happened because he tried to write an overly complex timeout (SIGALRM) handler.

But his solution was a bit different [2].

[1] https://postgr.es/m/076eb7bd-52e6-4a51-ba00-c744d027b15c@postgrespro.ru
[2] https://postgr.es/m/attachment/175030/0001-CV-correctly-handle-cv_sleep_target-change.patch

And I believe his solution is more elegant, isn't it?

But first of all, I doubt there should be anything that cancels the condition
variable during WaitLatch. Most probably you are doing something wrong.

We convinced our colleague to rework the code so that it does not trigger the
issue in the first place.
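
To make the hazard concrete, here is a toy model of it. It is only a sketch
and is not taken from either patch: ToyCV, sleep_target, the toy_* functions
and the during_wait hook are all hypothetical stand-ins, and the real
condition variable code keeps a proclist of waiters rather than a flag. The
idea it shows is the one suggested by the name of the patch in [2]: after the
wait returns, check whether the sleep target still points at our condition
variable before touching its wait list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a condition variable: one flag instead of a wait list. */
typedef struct ToyCV
{
    bool on_wait_list;      /* is "this process" queued on the CV? */
} ToyCV;

/* Plays the role of cv_sleep_target: the CV we are prepared to sleep on. */
static ToyCV *sleep_target = NULL;

/* Leave whatever wait list we are on and forget the target. */
static void toy_cancel_sleep(void)
{
    if (sleep_target != NULL)
    {
        sleep_target->on_wait_list = false;
        sleep_target = NULL;
    }
}

/* Queue ourselves on cv, dropping any previous target first. */
static void toy_prepare_to_sleep(ToyCV *cv)
{
    assert(cv != NULL);
    if (sleep_target != NULL)
        toy_cancel_sleep();
    cv->on_wait_list = true;
    sleep_target = cv;
}

/* Hypothetical hook: code that happens to run while we are waiting. */
typedef void (*toy_wait_hook) (void);

/*
 * One round of sleeping. Returns true if the sleep state survived the wait,
 * false if nested code changed it and we had to prepare again.
 */
static bool toy_sleep_once(ToyCV *cv, toy_wait_hook during_wait)
{
    if (sleep_target != cv)
        toy_prepare_to_sleep(cv);

    if (during_wait != NULL)
        during_wait();      /* stands in for WaitLatch() */

    /* Re-check: the target may have been cancelled during the wait. */
    if (sleep_target != cv)
    {
        toy_prepare_to_sleep(cv);
        return false;
    }
    return true;
}
```

Passing toy_cancel_sleep as the during_wait hook simulates the problematic
case: the call returns false and the caller is queued on the CV again, instead
of carrying on as if it had never left the wait list.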

-- 
regards
Yura Sokolov aka funny-falcon


