Home > mailing lists

BUG #15567: Wal receiver process restart failed when a damaged walrecord arrived at standby. - Mailing list pgsql-bugs

From	PG Bug reporting form
Subject	BUG #15567: Wal receiver process restart failed when a damaged walrecord arrived at standby.
Date	December 28, 2018 02:01:23
Msg-id	15567-a9ff747c5e7c031c@postgresql.org Whole thread Raw
Responses	Re: BUG #15567: Wal receiver process restart failed when a damagedwal record arrived at standby.
List	pgsql-bugs

Tree view

The following bug has been logged on the website:

Bug reference:      15567
Logged by:          movead
Email address:      lichuancheng@highgo.com
PostgreSQL version: 10.5
Operating system:   centos7
Description:

[Problem description]
We use pg9.5.7 with replica and i find that if some damaged wal record
arrived at standby then the startup process on standby try to restart
walreceiver process but never sucess and log below Infinite cycle on
standby.If i try to restart database now,it sucess.
(I try it on pg10.5,the same result.)
-----------------------------------------------------------------------------------------------------------------
2018-12-15 10:38:00.302 CST,,,30928,,5c1422d5.78d0,2,,2018-12-15 05:38:29
CST,,0,FATAL,57P01,"terminating walreceiver process due to administrator
command",,,,,,,,,""
2018-12-15 10:38:00.302 CST,,,5546,,5c1276c1.15aa,8,,2018-12-13 23:12:01
CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at
5A9/11E0A90",,,,,,,,,""
2018-12-15 10:38:00.302 CST,,,5546,,5c1276c1.15aa,9,,2018-12-13 23:12:01
CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at
5A9/11E0A90",,,,,,,,,""
2018-12-15 10:38:00.302 CST,,,5546,,5c1276c1.15aa,10,,2018-12-13 23:12:01
CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at
5A9/11E0A90",,,,,,,,,""
2018-12-15 10:38:05.308 CST,,,5546,,5c1276c1.15aa,11,,2018-12-13 23:12:01
CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at
5A9/11E0A90",,,,,,,,,""
2018-12-15 10:38:10.314 CST,,,5546,,5c1276c1.15aa,12,,2018-12-13 23:12:01
CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at
5A9/11E0A90",,,,,,,,,""
2018-12-15 10:38:15.319 CST,,,5546,,5c1276c1.15aa,13,,2018-12-13 23:12:01
CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at
5A9/11E0A90",,,,,,,,,""
2018-12-15 10:38:20.325 CST,,,5546,,5c1276c1.15aa,14,,2018-12-13 23:12:01
CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at
5A9/11E0A90",,,,,,,,,""
2018-12-15 10:38:25.331 CST,,,5546,,5c1276c1.15aa,15,,2018-12-13 23:12:01
CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at
5A9/11E0A90",,,,,,,,,""
2018-12-15 10:38:30.336 CST,,,5546,,5c1276c1.15aa,16,,2018-12-13 23:12:01
CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at
5A9/11E0A90",,,,,,,,,""
2018-12-15 10:38:35.342 CST,,,5546,,5c1276c1.15aa,17,,2018-12-13 23:12:01
CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at
5A9/11E0A90",,,,,,,,,""
2018-12-15 10:38:40.348 CST,,,5546,,5c1276c1.15aa,18,,2018-12-13 23:12:01
CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at
5A9/11E0A90",,,,,,,,,""
2018-12-15 10:38:45.353 CST,,,5546,,5c1276c1.15aa,19,,2018-12-13 23:12:01
CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at
5A9/11E0A90",,,,,,,,,""
2018-12-15 10:38:50.359 CST,,,5546,,5c1276c1.15aa,20,,2018-12-13 23:12:01
CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at
5A9/11E0A90",,,,,,,,,""
2018-12-15 10:38:55.364 CST,,,5546,,5c1276c1.15aa,21,,2018-12-13 23:12:01
CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at
5A9/11E0A90",,,,,,,,,""
2018-12-15 10:39:00.370 CST,,,5546,,5c1276c1.15aa,22,,2018-12-13 23:12:01
CST,1/0,0,LOG,00000,"incorrect resource manager data checksum in record at
5A9/11E0A90",,,,,,,,,""
-----------------------------------------------------------------------------------------------------------------


[Code review]
I read the code about startup process and walreciver process.

When a damaged wal record arrived:
1.The startup process shutdown the walreciver process use function
ShutdownWalRcv().
2.The startup process start the walreciver process use funtion
RequestXLogStreaming() by signal PMSIGNAL_START_WALRECEIVER.
3.The startup process try to read wal record. 
    And now reciver process does not startup completely,so the startup process
read the damaged wal record another times.
    And startup process set walrcv->walRcvState = WALRCV_STOPPED use function
ShutdownWalRcv().
4.go to 2.
That's the infinite cycle.

[Question]
I wonder that if it is a bug or you just don't want restart walreciver
process.

pgsql-bugs by date:

From: Hugh Ranalli
Date: 27 December 2018, 16:21:52
Subject: Re: BUG #15548: Unaccent does not remove combining diacritical characters

From: PG Bug reporting form
Date: 28 December 2018, 10:26:04
Subject: BUG #15568: pg_dump error

BUG #15567: Wal receiver process restart failed when a damaged walrecord arrived at standby. - Mailing list pgsql-bugs

Previous

Next