BUG #15567: Wal receiver process restart failed when a damaged walrecord arrived at standby. - Mailing list pgsql-bugs
From | PG Bug reporting form |
---|---|
Subject | BUG #15567: Wal receiver process restart failed when a damaged walrecord arrived at standby. |
Date | |
Msg-id | 15567-a9ff747c5e7c031c@postgresql.org Whole thread Raw |
Responses |
Re: BUG #15567: Wal receiver process restart failed when a damagedwal record arrived at standby.
|
List | pgsql-bugs |
The following bug has been logged on the website: Bug reference: 15567 Logged by: movead Email address: lichuancheng@highgo.com PostgreSQL version: 10.5 Operating system: centos7 Description: [Problem description] We use pg9.5.7 with replica and i find that if some damaged wal record arrived at standby then the startup process on standby try to restart walreceiver process but never sucess and log below Infinite cycle on standby.If i try to restart database now,it sucess. (I try it on pg10.5,the same result.) ----------------------------------------------------------------------------------------------------------------- 2018-12-15 10:38:00.302 CST,,,30928,,5c1422d5.78d0,2,,2018-12-15 05:38:29 CST,,0,FATAL,57P01,"terminating walreceiver process due to administrator command",,,,,,,,,"" 2018-12-15 10:38:00.302 CST,,,5546,,5c1276c1.15aa,8,,2018-12-13 23:12:01 CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at 5A9/11E0A90",,,,,,,,,"" 2018-12-15 10:38:00.302 CST,,,5546,,5c1276c1.15aa,9,,2018-12-13 23:12:01 CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at 5A9/11E0A90",,,,,,,,,"" 2018-12-15 10:38:00.302 CST,,,5546,,5c1276c1.15aa,10,,2018-12-13 23:12:01 CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at 5A9/11E0A90",,,,,,,,,"" 2018-12-15 10:38:05.308 CST,,,5546,,5c1276c1.15aa,11,,2018-12-13 23:12:01 CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at 5A9/11E0A90",,,,,,,,,"" 2018-12-15 10:38:10.314 CST,,,5546,,5c1276c1.15aa,12,,2018-12-13 23:12:01 CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at 5A9/11E0A90",,,,,,,,,"" 2018-12-15 10:38:15.319 CST,,,5546,,5c1276c1.15aa,13,,2018-12-13 23:12:01 CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at 5A9/11E0A90",,,,,,,,,"" 2018-12-15 10:38:20.325 CST,,,5546,,5c1276c1.15aa,14,,2018-12-13 23:12:01 CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at 5A9/11E0A90",,,,,,,,,"" 2018-12-15 10:38:25.331 CST,,,5546,,5c1276c1.15aa,15,,2018-12-13 23:12:01 CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at 5A9/11E0A90",,,,,,,,,"" 2018-12-15 10:38:30.336 CST,,,5546,,5c1276c1.15aa,16,,2018-12-13 23:12:01 CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at 5A9/11E0A90",,,,,,,,,"" 2018-12-15 10:38:35.342 CST,,,5546,,5c1276c1.15aa,17,,2018-12-13 23:12:01 CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at 5A9/11E0A90",,,,,,,,,"" 2018-12-15 10:38:40.348 CST,,,5546,,5c1276c1.15aa,18,,2018-12-13 23:12:01 CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at 5A9/11E0A90",,,,,,,,,"" 2018-12-15 10:38:45.353 CST,,,5546,,5c1276c1.15aa,19,,2018-12-13 23:12:01 CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at 5A9/11E0A90",,,,,,,,,"" 2018-12-15 10:38:50.359 CST,,,5546,,5c1276c1.15aa,20,,2018-12-13 23:12:01 CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at 5A9/11E0A90",,,,,,,,,"" 2018-12-15 10:38:55.364 CST,,,5546,,5c1276c1.15aa,21,,2018-12-13 23:12:01 CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at 5A9/11E0A90",,,,,,,,,"" 2018-12-15 10:39:00.370 CST,,,5546,,5c1276c1.15aa,22,,2018-12-13 23:12:01 CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at 5A9/11E0A90",,,,,,,,,"" ----------------------------------------------------------------------------------------------------------------- [Code review] I read the code about startup process and walreciver process. When a damaged wal record arrived: 1.The startup process shutdown the walreciver process use function ShutdownWalRcv(). 2.The startup process start the walreciver process use funtion RequestXLogStreaming() by signal PMSIGNAL_START_WALRECEIVER. 3.The startup process try to read wal record. And now reciver process does not startup completely,so the startup process read the damaged wal record another times. And startup process set walrcv->walRcvState = WALRCV_STOPPED use function ShutdownWalRcv(). 4.go to 2. That's the infinite cycle. [Question] I wonder that if it is a bug or you just don't want restart walreciver process.
pgsql-bugs by date: