Failed assertion on standby while shutdown - Mailing list pgsql-hackers
From | Maxim Orlov |
---|---|
Subject | Failed assertion on standby while shutdown |
Date | |
Msg-id | ad4ce692cc1d89a093b471ab1d969b0b@postgrespro.ru |
Responses | Re: Failed assertion on standby while shutdown |
List | pgsql-hackers |
Hi, hackers!

Recently, I was doing some experiments with the interaction between primary and standby instances. Under certain conditions I got a crash on a failed assertion, and I was able to reproduce it.

The scenario is the following:

1. start the primary server
2. start the standby server by
       pg_basebackup -P -R -X stream -c fast -p5432 -D data
3. apply some load to the primary server by
       pgbench -p5432 -i -s 150 postgres
4. kill the primary server (with kill -9) and keep it down
5. stop the standby server with pg_ctl
6. start the standby server again

After that, any termination of the standby server results in a failed assertion.

The log with a backtrace is the following:

    2021-03-19 18:54:25.352 MSK [3508443] LOG: received fast shutdown request
    2021-03-19 18:54:25.379 MSK [3508443] LOG: aborting any active transactions
    TRAP: FailedAssertion("SHMQueueEmpty(&(MyProc->myProcLocks[i]))", File: "/home/ziva/projects/pgpro/build-secondary/../postgrespro/src/backend/storage/lmgr/proc.c", Line: 592, PID: 3508452)
    postgres: walreceiver (ExceptionalCondition+0xd0)[0x555555d0526f]
    postgres: walreceiver (InitAuxiliaryProcess+0x31c)[0x555555b43e31]
    postgres: walreceiver (AuxiliaryProcessMain+0x54f)[0x55555574ae32]
    postgres: walreceiver (+0x530bff)[0x555555a84bff]
    postgres: walreceiver (+0x531044)[0x555555a85044]
    postgres: walreceiver (+0x530959)[0x555555a84959]
    /lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0)[0x7ffff7a303c0]
    /lib/x86_64-linux-gnu/libc.so.6(__select+0x1a)[0x7ffff72a40da]
    postgres: walreceiver (+0x52bea4)[0x555555a7fea4]
    postgres: walreceiver (PostmasterMain+0x129f)[0x555555a7f7c1]
    postgres: walreceiver (+0x41ff1f)[0x555555973f1f]
    /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7ffff71b30b3]
    postgres: walreceiver (_start+0x2e)[0x55555561abfe]

After a brief investigation I found out that I can trigger this assertion with 100% probability if I insert a sleep of about 5 seconds into InitAuxiliaryProcess(void) in src/backend/storage/lmgr/proc.c:

    diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
    index 897045ee272..b5f365f426d 100644
    --- a/src/backend/storage/lmgr/proc.c
    +++ b/src/backend/storage/lmgr/proc.c
    @@ -525,7 +525,7 @@ InitAuxiliaryProcess(void)
     	if (MyProc != NULL)
     		elog(ERROR, "you already exist");
    -
    +	pg_usleep(5000000L);
     	/*
     	 * We use the ProcStructLock to protect assignment and releasing of
     	 * AuxiliaryProcs entries.

Presumably the same behavior could appear without the sleep if the machine hosting the instances is under significant load from other processes, which would delay the startup of the database processes.

The configuration of the primary server is the default one with "wal_level = logical".
The configuration of the standby server is the default one with "wal_level = logical" and "primary_conninfo = 'port=5432'".

I'm puzzled by this behavior. I'm pretty sure it is not what should happen. Any ideas how this can be fixed?

---
Best regards,
Maxim Orlov.
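
A note on the failed check: the assertion text in the log, SHMQueueEmpty(&(MyProc->myProcLocks[i])), is raised from InitAuxiliaryProcess() and requires that every lock queue of the process slot being claimed is empty. The sketch below is a simplified, self-contained model of that kind of check and is not PostgreSQL's code. QueueLink, ProcSlot, NUM_PARTITIONS and all function names are made up for illustration, and the value 16 is only an assumed stand-in for the real number of lock partitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: stand-in for the real number of lock partitions. */
#define NUM_PARTITIONS 16

/* Hypothetical model of a circular doubly linked queue header. */
typedef struct QueueLink
{
    struct QueueLink *prev;
    struct QueueLink *next;
} QueueLink;

/* Hypothetical model of a process slot with one lock queue per partition. */
typedef struct ProcSlot
{
    QueueLink lock_queues[NUM_PARTITIONS];
} ProcSlot;

/* An empty queue is a header whose links point back to itself. */
void queue_init(QueueLink *q)
{
    q->prev = q;
    q->next = q;
}

bool queue_is_empty(const QueueLink *q)
{
    return q->next == q;
}

/* Link elem into the list directly after q. */
void queue_insert_after(QueueLink *q, QueueLink *elem)
{
    assert(q != NULL && elem != NULL);
    elem->prev = q;
    elem->next = q->next;
    q->next->prev = elem;
    q->next = elem;
}

/* Unlink elem from whatever list it is currently on. */
void queue_delete(QueueLink *elem)
{
    assert(elem != NULL);
    elem->prev->next = elem->next;
    elem->next->prev = elem->prev;
    elem->prev = NULL;
    elem->next = NULL;
}

/* Put every lock queue of the slot into the empty state. */
void slot_init(ProcSlot *slot)
{
    for (int i = 0; i < NUM_PARTITIONS; i++)
        queue_init(&slot->lock_queues[i]);
}

/*
 * Return the index of the first non-empty lock queue, or -1 if the slot
 * is clean. A non-negative result models the condition under which the
 * assertion from the log would fire.
 */
int slot_first_dirty_queue(const ProcSlot *slot)
{
    for (int i = 0; i < NUM_PARTITIONS; i++)
    {
        if (!queue_is_empty(&slot->lock_queues[i]))
            return i;
    }
    return -1;
}
```

In this model, slot_first_dirty_queue() returns -1 right after slot_init(). Linking one entry into lock_queues[3] with queue_insert_after() makes it return 3 until queue_delete() unlinks that entry again. The model only shows what state the assertion rejects; it says nothing about how the real slot ends up in that state.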