Hello Thomas,
31.08.2023 14:15, Thomas Munro wrote:
> We have a signal that is pending and not blocked, so I don't
> immediately know why poll() hasn't returned control.
When I worked at Postgres Pro, we observed a similar lockup under rather
specific conditions: an Elbrus CPU with the Elbrus-specific compiler (lcc),
which is based on EDG.
I managed to reproduce that lockup, and Anton Voloshin investigated it.
The issue was caused by a compiler optimization in WaitEventSetWait():
    waiting = true;
    ...
    while (returned_events == 0)
    {
        ...
        if (set->latch && set->latch->is_set)
        {
            ...
            break;
        }
In that case, the compiler decided that it was allowed to move the read of
"set->latch->is_set" ahead of the write "waiting = true".
(Placing "pg_compiler_barrier();" right after "waiting = true;" fixed the
issue for us.)
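To illustrate the workaround, here is a minimal sketch, not the actual latch.c code. DemoLatch, demo_announce_then_check() and compiler_barrier() are hypothetical stand-ins, and the barrier macro assumes a GCC-compatible compiler (an empty asm statement with a "memory" clobber, which I believe is roughly what pg_compiler_barrier() expands to there).

```c
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <assert.h>

/* Assumption: GCC-compatible compiler. Stand-in for pg_compiler_barrier(). */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical, simplified latch; the real struct has more fields. */
typedef struct DemoLatch
{
    sig_atomic_t is_set;
} DemoLatch;

/* Flag that a signal handler would consult to decide whether to wake us. */
static volatile sig_atomic_t waiting = false;

/*
 * Announce that we are about to wait, then check the latch.
 *
 * Without the barrier, the compiler may hoist the non-volatile read of
 * latch->is_set above the volatile write to "waiting". The barrier forbids
 * the compiler from moving memory accesses across it, so the read is
 * emitted after the write.
 */
static bool
demo_announce_then_check(DemoLatch *latch)
{
    bool latch_set;

    waiting = true;
    compiler_barrier();

    latch_set = (latch != NULL && latch->is_set);

    waiting = false;
    return latch_set;
}
```

Note that this only constrains the compiler; it says nothing about reordering done by the CPU itself, which would need a real memory barrier.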
I can't provide more details for now, but maybe you could look at the binary
code generated on the target platform to confirm or reject my guess.
Best regards,
Alexander