Hot strandby fails to restart after pg_subtrans corruption (?) - Mailing list pgsql-general
From | Paul Guyot |
---|---|
Subject | Hot strandby fails to restart after pg_subtrans corruption (?) |
Date | |
Msg-id | FCCF0B65-62EA-4E4C-9E99-10C47FC339A4@ieee.org Whole thread Raw |
List | pgsql-general |
Hello, After a hardware reboot, a hot standby server fails to start. The log mentions pg_subtrans files. While the machine got a hard reboot, underlying filesystem is ZFS which did not detector report any corruption. For the first restart after the reboot, the server (then 9.1.0) complained that a file did not exist. May 10 14:16:49 citrinus postgres[1914]: [6-1] LOG: file "pg_subtrans/B7E5" doesn't exist, reading as zeroes Considering this may be related to this thread : http://archives.postgresql.org/pgsql-general/2011-12/msg00291.php postgres was cleanly restarted during the recovery : May 10 14:22:30 citrinus postgres[1489]: [1-1] LOG: received fast shutdown request May 10 14:22:30 citrinus postgres[1916]: [1-1] LOG: shutting down May 10 14:22:31 citrinus postgres[1916]: [2-1] LOG: database system is shut down and then updated to 9.1.3. Now, the server just fails later on the same file. ---- May 10 15:48:05 citrinus postgres[39203]: [1-1] LOG: database system was interrupted while in recovery at log time 2012-04-2507:53:33 UTC May 10 15:48:05 citrinus postgres[39203]: [1-2] HINT: If this has occurred more than once some data might be corrupted andyou might need to choose an earlier recovery target. May 10 15:48:05 citrinus postgres[39203]: [2-1] LOG: entering standby mode May 10 15:48:07 citrinus postgres[39203]: [3-1] LOG: restored log file "00000001000006D600000017" from archive May 10 15:48:10 citrinus postgres[39203]: [4-1] LOG: restored log file "00000001000006D500000068" from archive May 10 15:48:10 citrinus postgres[39203]: [5-1] LOG: redo starts at 6D5/684B13E8 May 10 15:49:47 citrinus postgres[39203]: [6-1] LOG: restored log file "00000001000006D500000069" from archive May 10 15:50:37 citrinus postgres[39203]: [7-1] LOG: restored log file "00000001000006D50000006A" from archive May 10 15:51:19 citrinus postgres[39203]: [8-1] LOG: restored log file "00000001000006D50000006B" from archive May 10 15:52:05 citrinus postgres[39203]: [9-1] LOG: restored log file "00000001000006D50000006C" from archive May 10 15:53:06 citrinus postgres[39203]: [10-1] LOG: restored log file "00000001000006D50000006D" from archive May 10 15:53:24 citrinus postgres[39203]: [11-1] FATAL: could not access status of transaction 3085299721 May 10 15:53:24 citrinus postgres[39203]: [11-2] DETAIL: Could not read from file "pg_subtrans/B7E5" at offset 245760: Noerror: 0. May 10 15:53:24 citrinus postgres[39203]: [11-3] CONTEXT: xlog redo xid assignment xtop 3085293107: subxacts: 30852994493085299451 3085299462 3085299465 3085299469 3085299473 3085299478 3085299485 3085299487 3085299489 3085299491 30852994943085299497 3085299498 3085299543 3085299545 3085299547 3085299549 3085299551 3085299553 3085299555 3085299557 30852995583085299559 3085299560 3085299561 3085299562 3085299563 3085299564 3085299567 3085299568 3085299569 3085299570 30852995713085299573 3085299574 3085299576 3085299578 3085299580 3085299582 3085299594 3085299596 3085299607 3085299608 30852996113085299635 3085299646 3085299648 3085299650 3085299652 3085299654 3085299656 3085299658 3085299660 3085299663 30852996653085299667 3085299669 3085299671 3085299673 3085299681 3085299683 3085299687 3085299721 May 10 15:53:24 citrinus postgres[39202]: [1-1] LOG: startup process (PID 39203) exited with exit code 1 May 10 15:53:24 citrinus postgres[39202]: [2-1] LOG: terminating any other active server processes ---- File B7E5 has exactly 245760 bytes. If I remove the B7E5 file as well as file B7F6 which was created during recovery, and retry, it fails later on like this: ---- May 10 16:15:53 citrinus postgres[41142]: [75-1] LOG: file "pg_subtrans/B7E6" doesn't exist, reading as zeroes May 10 16:15:53 citrinus postgres[41142]: [75-2] CONTEXT: xlog redo xid assignment xtop 3085346439: subxacts: 30853643073085364352 3085364355 3085364358 3085364363 3085364404 3085364407 3085364409 3085364496 3085364518 3085364532 30853645683085364571 3085364611 3085364625 3085364660 3085364663 3085364668 3085364671 3085364715 3085364719 3085364727 30853647383085364740 3085364768 3085364770 3085364773 3085364792 3085364800 3085364803 3085364806 3085364817 3085364819 30853648293085364830 3085364832 3085364854 3085364874 3085364877 3085364885 3085364886 3085364889 3085364890 3085364891 30853649113085364914 3085364918 3085364920 3085364964 3085364988 3085364998 3085365001 3085365013 3085365014 3085365015 30853650163085365068 3085365069 3085365071 3085365106 3085365183 3085365284 3085365287 3085365288 May 10 16:16:10 citrinus postgres[41142]: [76-1] LOG: restored log file "00000001000006D500000077" from archive May 10 16:16:13 citrinus postgres[41142]: [77-1] FATAL: could not access status of transaction 3085367299 May 10 16:16:13 citrinus postgres[41142]: [77-2] DETAIL: Could not read from file "pg_subtrans/B7E6" at offset 253952: Noerror: 0. May 10 16:16:13 citrinus postgres[41142]: [77-3] CONTEXT: xlog redo xid assignment xtop 3085346439: subxacts: 30853668203085366821 3085366822 3085366823 3085366824 3085366825 3085366826 3085366827 3085366828 3085366829 3085366830 30853668313085366832 3085366833 3085366834 3085366835 3085366836 3085366837 3085366838 3085366860 3085366905 3085366919 30853669203085366949 3085366965 3085367040 3085367057 3085367059 3085367062 3085367076 3085367168 3085367170 3085367174 30853671773085367180 3085367182 3085367183 3085367184 3085367185 3085367202 3085367205 3085367208 3085367212 3085367215 30853672643085367266 3085367269 3085367272 3085367274 3085367277 3085367299 3085367301 3085367305 3085367309 3085367311 30853673313085367334 3085367337 3085367338 3085367339 3085367341 3085367343 3085367345 3085367351 May 10 16:16:13 citrinus postgres[41141]: [1-1] LOG: startup process (PID 41142) exited with exit code 1 May 10 16:16:13 citrinus postgres[41141]: [2-1] LOG: terminating any other active server processes ---- At this point, file B7E6 was created and is 8KB long. -rw------- 1 pgsql wheel 8192 May 10 16:16 B7E6 What are the steps to recover the standby? Paul -- Semiocast http://semiocast.com/ Semioboard http://semioboard.com/ +33.183627948 - 20 rue Lacaze, 75014 Paris
Attachment
pgsql-general by date: