recovery_min_apply_delay in archive recovery causes assertion failure in latch - Mailing list pgsql-hackers

From Fujii Masao
Subject recovery_min_apply_delay in archive recovery causes assertion failure in latch
Date
Msg-id CAHGQGwEyD6HdZLfdWc+95g=VQFPR4zQL4n+yHxQgGEGjaSVheQ@mail.gmail.com
Responses Re: recovery_min_apply_delay in archive recovery causes assertion failure in latch
Re: recovery_min_apply_delay in archive recovery causes assertion failure in latch
List pgsql-hackers
Hi,

I got the following assertion failure when I enabled recovery_min_apply_delay
and started archive recovery (i.e., I created only recovery.signal, not
standby.signal).

TRAP: FailedAssertion("latch->owner_pid == MyProcPid", File: "latch.c", Line: 522)

Here are the steps to reproduce the issue:

----------------------------
initdb -D data
pg_ctl -D data start
psql -c "alter system set recovery_min_apply_delay to '60s'"
psql -c "alter system set archive_mode to on"
psql -c "alter system set archive_command to 'cp %p ../arch/%f'"
psql -c "alter system set restore_command to 'cp ../arch/%f %p'"
mkdir arch
pg_basebackup -D bkp -c fast
pgbench -i
pgbench -t 1000
pg_ctl -D data -m i stop
rm -rf bkp/pg_wal
mv data/pg_wal bkp
rm -rf data
mv bkp data
touch data/recovery.signal
pg_ctl -D data -W start
----------------------------

The latch that triggers this assertion failure is recoveryWakeupLatch.
Ownership of this latch is taken only when standby mode is requested.
However, when archive recovery is started with recovery_min_apply_delay
set, the latch is still waited on even though it is unowned, so the
assertion fails.
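
To make the ownership problem concrete, here is a minimal standalone sketch.
It is a toy model, not PostgreSQL code: ToyLatch and every toy_* function are
hypothetical names invented for illustration, and the only thing borrowed from
the report is the rule that a process may wait on a latch only if it owns it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a latch; owner_pid == 0 means "unowned". */
typedef struct ToyLatch
{
    int owner_pid;
} ToyLatch;

/* Record the calling process as the owner of the latch. */
void toy_own_latch(ToyLatch *latch, int my_pid)
{
    latch->owner_pid = my_pid;
}

/* Non-aborting form of the ownership check, so callers can test it. */
bool toy_can_wait(const ToyLatch *latch, int my_pid)
{
    return latch->owner_pid == my_pid;
}

/* Mirrors the shape of the failing assertion: waiting requires ownership. */
void toy_wait_latch(const ToyLatch *latch, int my_pid)
{
    assert(latch->owner_pid == my_pid);
    /* a real implementation would block here until the latch is set */
}

/*
 * Models the reported behaviour: ownership is taken only when standby mode
 * is requested, but the delay code waits on the latch in either case.
 * Returns true if that wait would pass the ownership check.
 */
bool toy_delay_wait_is_safe(bool standby_mode_requested, int my_pid)
{
    ToyLatch latch = {0};

    if (standby_mode_requested)
        toy_own_latch(&latch, my_pid);

    return toy_can_wait(&latch, my_pid);
}
```

In this model, toy_delay_wait_is_safe(false, pid) returns false for any
nonzero pid, which corresponds to the archive recovery case above.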

The attached patch fixes this issue by making archive recovery always ignore
recovery_min_apply_delay. I think this change is OK because
recovery_min_apply_delay was introduced for standby mode.
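
The idea of the fix can be sketched roughly as follows. This is not the
attached patch and does not use PostgreSQL's actual variables; ToyRecoveryConfig
and toy_effective_apply_delay_ms are hypothetical names, and the sketch only
assumes that the delay should take effect when standby mode is requested.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bundle of the two settings that matter for this sketch. */
typedef struct ToyRecoveryConfig
{
    bool standby_mode_requested;   /* true if standby.signal was present */
    int  min_apply_delay_ms;       /* configured delay in milliseconds */
} ToyRecoveryConfig;

/*
 * Return the delay that should actually be applied before replaying a
 * record. In plain archive recovery the setting is ignored, so the code
 * never reaches the point where it would wait on an unowned latch.
 */
int toy_effective_apply_delay_ms(const ToyRecoveryConfig *cfg)
{
    assert(cfg);

    /* archive recovery without standby mode: ignore the delay entirely */
    if (!cfg->standby_mode_requested)
        return 0;

    /* a zero or negative setting means no delay is configured */
    if (cfg->min_apply_delay_ms <= 0)
        return 0;

    return cfg->min_apply_delay_ms;
}
```

With the 60s setting from the reproduction steps, this sketch yields 0 when
only recovery.signal is present and 60000 when standby mode is requested.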

This issue is not new in v12; I was able to reproduce it in v11 as well,
so the fix needs to be back-patched.

Regards,

-- 
Fujii Masao

Attachment
