Unfortunately, there is no pg_stat_activity data available, as we are unaware of the issue until it has already happened.
The version we are on is 12.11.
I don't think it is due to locks, as none appear in the logs. Vacuums are also logged, and none occur before or after these events. Checkpoint timeout is set to 1 hour, and the stalls do not coincide with checkpoints.
We've started to observe instances of one of our databases stalling for a few seconds.
We see a spike in WAL write locks, then nothing for a few seconds. After that we see a latency spike as the processes that were waiting to get to the db are finally able to do so.
There is nothing in the postgres logs that gives us any clue as to what could be happening: no locks, no unusually large or long-running transactions, just a pause and resume.
Could anyone give me any advice as to what to look for when it comes to checking the underlying disk that the db is on?
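One possible starting point, assuming the host is Linux and exposes /proc/diskstats, would be to sample the write counters once a second and log any interval where the average write latency jumps. This is only a rough sketch: the function names, the device name "sda", and the 50 ms threshold are placeholders for illustration, not values taken from our setup. The latency figure is the same ratio that iostat reports as w_await (milliseconds spent writing divided by writes completed over the interval).

```python
import time


def parse_diskstats(text):
    """Map device name -> (writes_completed, ms_spent_writing)."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip short or malformed lines
        # Layout: major minor name, then per-device counters;
        # index 7 = writes completed, index 10 = ms spent writing.
        stats[fields[2]] = (int(fields[7]), int(fields[10]))
    return stats


def write_await_ms(before, after, device):
    """Average ms per completed write between two samples."""
    writes = after[device][0] - before[device][0]
    millis = after[device][1] - before[device][1]
    return millis / writes if writes else 0.0


def watch(device, interval=1.0, threshold_ms=50.0):
    """Print a timestamped line whenever write latency crosses the threshold."""
    with open("/proc/diskstats") as f:
        prev = parse_diskstats(f.read())
    while True:
        time.sleep(interval)
        with open("/proc/diskstats") as f:
            cur = parse_diskstats(f.read())
        latency = write_await_ms(prev, cur, device)
        if latency >= threshold_ms:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(f"{stamp} {device} write await {latency:.1f} ms")
        prev = cur
```

Running watch("sda") in the background (with the device that actually holds the WAL directory substituted in) would leave timestamps that can be lined up against the stalls seen on the database side. An interval with zero completed writes reports 0.0, so a disk that stops completing writes entirely would show up as a gap followed by a large value rather than as a spike.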