Checkpointer sync queue fills up / loops around pg_usleep() are bad - Mailing list pgsql-hackers

From Andres Freund
Subject Checkpointer sync queue fills up / loops around pg_usleep() are bad
Msg-id 20220226213942.nb7uvb2pamyu26dj@alap3.anarazel.de
Responses Re: Checkpointer sync queue fills up / loops around pg_usleep() are bad  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
Hi,

In two recent investigations into occasional test failures
(019_replslot_limit.pl failures, AIO rebase), the problems are somehow tied to
the checkpointer.

I don't yet know whether this is causally related to precisely those failures,
but when running e.g. 027_stream_regress.pl, I see phases in which many
backends are looping in RegisterSyncRequest(), sleeping with
pg_usleep(10000L) (10ms) on each iteration.


Without adding instrumentation, this is completely invisible at any log
level. There are no log messages, no wait events, nothing.

ISTM we should not have any loops around pg_usleep() at all. In the shorter
term, we shouldn't have any loops around pg_usleep() that don't emit log
messages or set wait events. Therefore I propose that we "prohibit" such loops
unless they emit at least a DEBUG2 elog() or similar. Otherwise they're just
too hard to debug.
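
A minimal sketch of what such a "visible" retry loop could look like, written as standalone C rather than against PostgreSQL's headers. The callback type, retry_visibly() and the message text are hypothetical; the fprintf(stderr, ...) line merely stands in for an elog(DEBUG2, ...) call or a wait event.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical callback: returns true once the request has been accepted. */
typedef bool (*try_request_fn)(void *arg);

/*
 * Retry a request until it succeeds, sleeping between attempts, but report
 * every retry so that the waiting is visible. In PostgreSQL the report would
 * be an elog(DEBUG2, ...) or a wait event; here it is a line on stderr.
 * Returns the number of retries that were needed.
 */
static int retry_visibly(try_request_fn try_request, void *arg,
                         long sleep_us, const char *what)
{
    int retries = 0;
    struct timespec ts = { .tv_sec = sleep_us / 1000000L,
                           .tv_nsec = (sleep_us % 1000000L) * 1000L };

    while (!try_request(arg)) {
        retries++;
        fprintf(stderr, "DEBUG2: %s: queue full, retry %d after %ld us\n",
                what, retries, sleep_us);
        nanosleep(&ts, NULL);   /* sketch only: an EINTR just ends this nap early */
    }
    return retries;
}
```

For example, a request callback that fails three times and then succeeds makes retry_visibly() return 3 and print three DEBUG2 lines. This only makes each sleep leave a trace; it does not make the waiting itself any shorter.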


The reason for the sync queue filling up in 027_stream_regress.pl is actually
fairly simple:

1) The test runs with shared_buffers = 1MB, leading to a small sync queue of
   128 entries.
2) CheckpointWriteDelay() sleeps with pg_usleep(100000L), i.e. 100ms.

ForwardSyncRequest() wakes up the checkpointer using SetLatch() if the sync
queue is more than half full.
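
The sizes involved can be checked with a bit of arithmetic. The sketch below assumes the default 8kB block size and one queue slot per shared buffer (which matches the 128 entries mentioned above), and it assumes the wakeup condition is num_requests >= max_requests / 2. The function names are hypothetical.

```c
/* Default PostgreSQL block size; an assumption, since BLCKSZ is a build option. */
#define ASSUMED_BLCKSZ 8192L

/* Assumed: the checkpointer's sync request queue has one slot per shared buffer. */
static long sync_queue_entries(long shared_buffers_bytes)
{
    return shared_buffers_bytes / ASSUMED_BLCKSZ;
}

/* Assumed: a backend nudges the checkpointer once num_requests >= max_requests / 2. */
static long wakeup_threshold(long max_requests)
{
    return max_requests / 2;
}

/*
 * How many sleep-and-retry iterations a backend stuck in the retry loop can
 * make while the checkpointer sits in a single CheckpointWriteDelay() nap.
 */
static long retries_per_checkpointer_nap(long nap_us, long retry_us)
{
    return nap_us / retry_us;
}
```

With shared_buffers = 1MB this gives 128 entries and a wakeup threshold of 64, and a backend retrying every 10ms can go through about 10 iterations of its loop while the checkpointer finishes a single 100ms nap.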

But at least on Linux and FreeBSD, that no longer interrupts pg_usleep()
(because latch wakeups are now received via signalfd / kqueue rather than a
signal handler). And on all platforms, the signal might arrive just before the
pg_usleep() rather than during it, in which case the sleep isn't interrupted
either.
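
To make the two failure modes concrete, here is a toy model (not PostgreSQL code) of how long a nominal 100ms nap actually lasts, given when the SetLatch() wakeup is sent relative to the start of the nap. The enum and function are hypothetical, and the model only considers the latch wakeup, not any other signals.

```c
/* How the nap is implemented (hypothetical labels for the cases above). */
typedef enum {
    NAP_USLEEP_SIGNAL_HANDLER,  /* pg_usleep(); wakeup delivered via a signal handler */
    NAP_USLEEP_SIGNALFD_KQUEUE, /* pg_usleep(); wakeup consumed via signalfd / kqueue */
    NAP_LATCH                   /* WaitLatch(); the wakeup is remembered in the latch */
} NapKind;

/*
 * Effective nap length in ms. wakeup_ms is the time the wakeup is sent,
 * relative to the start of the nap; a negative value means it was sent
 * just before the nap began.
 */
static long effective_nap_ms(NapKind kind, long duration_ms, long wakeup_ms)
{
    switch (kind) {
    case NAP_USLEEP_SIGNAL_HANDLER:
        /* Only a signal arriving during the sleep cuts it short. */
        return (wakeup_ms >= 0 && wakeup_ms < duration_ms) ? wakeup_ms : duration_ms;
    case NAP_USLEEP_SIGNALFD_KQUEUE:
        /* The wakeup never interrupts the sleep, so it always runs to completion. */
        return duration_ms;
    case NAP_LATCH:
        /* A latch set before the wait makes the wait return immediately. */
        if (wakeup_ms < 0)
            return 0;
        return wakeup_ms < duration_ms ? wakeup_ms : duration_ms;
    }
    return duration_ms;
}
```

A wakeup sent 30ms into the nap shortens it to 30ms with a signal handler or a latch, but not with signalfd / kqueue; a wakeup sent 5ms before the nap only helps in the latch case.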

If I shorten the sleep in CheckpointWriteDelay(), the problem goes away. This
actually reduces the time for a single run of 027_stream_regress.pl on my
workstation noticeably: with the default sleep time it takes ~32s, with the
shortened one ~27s.

I suspect we need to do something about this concrete problem for 14 and
master, because it's certainly worse than before on Linux / FreeBSD.

I suspect the easiest fix is to just convert that pg_usleep() to a
WaitLatch(). That would require adding a new enum value to WaitEventTimeout in
14, which is probably fine?
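
In PostgreSQL the change would presumably be a WaitLatch(MyLatch, WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, 100, ...) call carrying the new WaitEventTimeout value, followed by ResetLatch(). Since that cannot run outside the server, the self-contained sketch below instead models a latch as a flag plus a self-pipe, and uses poll() with a timeout for the nap. MiniLatch, the mini_latch_* functions and checkpoint_nap() are hypothetical, single-process stand-ins, not PostgreSQL's latch implementation (which also needs memory barriers and handles postmaster death).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical minimal latch: a flag plus a self-pipe (single process only). */
typedef struct {
    volatile bool is_set;
    int fds[2];             /* fds[0]: read end, fds[1]: write end */
} MiniLatch;

static int mini_latch_init(MiniLatch *latch)
{
    latch->is_set = false;
    if (pipe(latch->fds) != 0)
        return -1;
    for (int i = 0; i < 2; i++) {
        /* Non-blocking, so that setting and draining never block. */
        int flags = fcntl(latch->fds[i], F_GETFL);
        if (flags < 0 || fcntl(latch->fds[i], F_SETFL, flags | O_NONBLOCK) != 0)
            return -1;
    }
    return 0;
}

/* Analogue of SetLatch(): remember the wakeup and make the pipe readable. */
static void mini_latch_set(MiniLatch *latch)
{
    if (latch->is_set)
        return;
    latch->is_set = true;
    char byte = 0;
    ssize_t rc = write(latch->fds[1], &byte, 1);  /* EAGAIN (pipe full) is harmless */
    (void) rc;
}

/* Analogue of ResetLatch(): clear the flag and drain the pipe. */
static void mini_latch_reset(MiniLatch *latch)
{
    char buf[64];
    latch->is_set = false;
    while (read(latch->fds[0], buf, sizeof buf) > 0)
        ;
}

/*
 * Analogue of the CheckpointWriteDelay() nap: wait up to timeout_ms, but
 * return true early if the latch is, or becomes, set. Unlike pg_usleep(),
 * a wakeup that happened before the wait started is not lost.
 */
static bool checkpoint_nap(MiniLatch *latch, int timeout_ms)
{
    if (latch->is_set)
        return true;
    struct pollfd pfd = { .fd = latch->fds[0], .events = POLLIN };
    int rc;
    do {
        rc = poll(&pfd, 1, timeout_ms);  /* sketch: restarts the full timeout on EINTR */
    } while (rc < 0 && errno == EINTR);
    return rc > 0 || latch->is_set;
}
```

Setting the latch before calling checkpoint_nap(&latch, 100) makes it return true immediately instead of sleeping the full 100ms, which is exactly the case pg_usleep() cannot handle. As with real latches, the caller should reset the latch and then recheck its condition (here, whether the sync queue needs absorbing).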



Greetings,

Andres Freund


