point-in-time-recovery issue when applying WAL files, "invalid magic number 0000 in log file" error - Mailing list pgsql-admin

From Radovan Jablonovsky
Subject point-in-time-recovery issue when applying WAL files, "invalid magic number 0000 in log file" error
Date
Msg-id CAJYcdTv9uLPejVdiL4hUmmAUL50jL2MkkMkp0qKAoc1_nfCWzg@mail.gmail.com
Whole thread Raw
List pgsql-admin
Hello,

We have issue with point-in-time recovery. When applying WAL files on restored postgresql server (we used pg_basebackup to create full cluster backup) we got "invalid magic number 0000 in log file" error. The recovery process applied WAL files from archive location and it was set to specific point in time.

In recovery.conf file were parameters set to: 
restore_command = 'cp /db/archive/transaction_logs/%f %p'
recovery_target_time = '2014-06-11 16:45:00'

The recovery stopped at 
2014-06-16 15:40:30.514 MDT [32021]: [16580-1] LOG:  restored log file "0000001A00001172000000D1" from archive
2014-06-16 15:40:35.957 MDT [32021]: [16581-1] LOG:  restored log file "0000001A00001172000000D2" from archive
2014-06-16 15:40:46.457 MDT [32021]: [16582-1] LOG:  invalid magic number 0000 in log file 4466, segment 210, offset 6594560
2014-06-16 15:40:46.457 MDT [32021]: [16583-1] LOG:  redo done at 1172/D2649068
2014-06-16 15:40:46.457 MDT [32021]: [16584-1] LOG:  last completed transaction was at log time 2014-06-11 12:34:13.496717-06
2014-06-16 15:40:46.481 MDT [32021]: [16585-1] LOG:  restored log file "0000001A00001172000000D2" from archive
cp: cannot stat `/db/archive/transaction_logs/0000001B.history': No such file or directory
2014-06-16 15:40:46.499 MDT [32021]: [16586-1] LOG:  selected new timeline ID: 27
cp: cannot stat `/db/archive/transaction_logs/0000001A.history': No such file or directory
2014-06-16 15:40:49.766 MDT [32021]: [16587-1] LOG:  archive recovery complete
2014-06-16 15:55:16.171 MDT [6791]: [1-1] LOG:  autovacuum launcher started
2014-06-16 15:55:16.172 MDT [32019]: [1-1] LOG:  database system is ready to accept connections

What is interesting, the postgresql server streaming replica/slave, which is set to asynchronous replication had no issue. When we checked the log from replica/slave there was no issue with applying the WAL stream on slave.

Streaming replica/slave log:
2014-06-11 12:20:40.832 MDT [11787] - [] - [] - []: [26862-1] LOG:  restartpoint complete: wrote 77860 buffers (7.4%); 0 transaction log file(s) added, 0 removed, 23 recycled; write=719.761 s, sync=0.983 s, total=720.748 s; sync files=5119, longest=0.261 s, average=0.000 s
2014-06-11 12:20:40.832 MDT [11787] - [] - [] - []: [26863-1] LOG:  recovery restart point at 1172/6C678380
2014-06-11 12:20:40.832 MDT [11787] - [] - [] - []: [26864-1] DETAIL:  last completed transaction was at log time 2014-06-11 12:20:45.928542-06
2014-06-11 12:23:40.170 MDT [11787] - [] - [] - []: [26865-1] LOG:  restartpoint starting: time
2014-06-11 12:35:43.936 MDT [11787] - [] - [] - []: [26866-1] LOG:  restartpoint complete: wrote 110827 buffers (10.6%); 0 transaction log file(s) added, 0 removed, 30 recycled; write=719.931 s, sync=3.832 s, total=723.766 s; sync files=6957, longest=1.349 s, average=0.000 s
2014-06-11 12:35:43.936 MDT [11787] - [] - [] - []: [26867-1] LOG:  recovery restart point at 1172/88B75B48
2014-06-11 12:35:43.936 MDT [11787] - [] - [] - []: [26868-1] DETAIL:  last completed transaction was at log time 2014-06-11 12:35:49.145232-06
2014-06-11 12:35:44.095 MDT [11787] - [] - [] - []: [26869-1] LOG:  restartpoint starting: xlog
2014-06-11 12:47:05.041 MDT [11787] - [] - [] - []: [26870-1] LOG:  restartpoint complete: wrote 176758 buffers (16.9%); 0 transaction log file(s) added, 0 removed, 28 recycled; write=670.320 s, sync=10.588 s, total=680.945 s; sync files=7398, longest=3.899 s, average=0.001 s
2014-06-11 12:47:05.041 MDT [11787] - [] - [] - []: [26871-1] LOG:  recovery restart point at 1172/BE5F4928
2014-06-11 12:47:05.041 MDT [11787] - [] - [] - []: [26872-1] DETAIL:  last completed transaction was at log time 2014-06-11 12:47:10.216659-06
2014-06-11 12:47:05.224 MDT [11787] - [] - [] - []: [26873-1] LOG:  restartpoint starting: xlog
2014-06-11 12:58:27.067 MDT [11787] - [] - [] - []: [26874-1] LOG:  restartpoint complete: wrote 495857 buffers (47.3%); 0 transaction log file(s) added, 0 removed, 54 recycled; write=670.682 s, sync=11.148 s, total=681.843 s; sync files=5627, longest=4.773 s, average=0.001 s

What could cause this issue? I suspect one possibility could be the WAL file itself was corrupted when it was archived. 
Is it possible to recover/fix WAL file?

--

Radovan Jablonovsky | SaaS DBA | Phone 1-403-262-6519 (ext. 256) | Fax 1-403-233-8046


Replicon | Hassle-Free Time & Expense Management Software - 7,300 Customers - 70 Countries
www.replicon.com | facebook | twitter | blog | contact us

We are hiring! | search jobs

pgsql-admin by date:

Previous
From: Oliver
Date:
Subject: Re: Best backup strategy for production systems
Next
From: Oliver
Date:
Subject: Postgresql not getting assigned memory