The following bug has been logged on the website:
Bug reference: 13876
Logged by: Greg Clough
Email address: postgresql.org@gclough.com
PostgreSQL version: 9.3.10
Operating system: CentOS v7.2
Description:
When pg_xlogdump encounters a timeline switch in WAL, it marks it as an
"error in WAL record". I'm just guessing that there is an end-of-timeline
marker in the WAL that pg_xlogdump hasn't been programmed to recognize, so
it marks it as an error.
Here is a dump of a WAL file that has a timeline switch as the first thing
after forcing a pg_switch_xlog():
-bash-4.2$ pwd
/opt/postgres/9.3/db2/pg_xlog
-bash-4.2$ ls -al
total 114704
drwx------. 3 postgres postgres 4096 Jan 19 09:15 .
drwx------. 16 postgres postgres 4096 Jan 19 09:33 ..
-rw-------. 1 postgres postgres 16777216 Jan 19 08:50 000000010000000000000002
-rw-------. 1 postgres postgres 16777216 Jan 19 08:53 000000010000000000000003
-rw-------. 1 postgres postgres 16777216 Jan 19 09:07 000000020000000000000003
-rw-------. 1 postgres postgres 16777216 Jan 19 09:07 000000020000000000000004
-rw-------. 1 postgres postgres 16777216 Jan 19 09:10 000000020000000000000005
-rw-------. 1 postgres postgres 16777216 Jan 19 09:15 000000020000000000000006
-rw-------. 1 postgres postgres 41 Jan 19 08:55 00000002.history
-rw-------. 1 postgres postgres 16777216 Jan 19 09:33 000000030000000000000006
-rw-------. 1 postgres postgres 83 Jan 19 09:15 00000003.history
drwx------. 2 postgres postgres 78 Jan 19 09:15 archive_status
-bash-4.2$ pg_xlogdump 000000020000000000000006
rmgr: XLOG len (rec/tot): 72/ 104, tx: 0, lsn: 0/06000028, prev 0/056370D0, bkp: 0000, desc: checkpoint: redo 0/6000028; tli 2; prev tli 2; fpw true; xid 0/1894; oid 24576; multi 1; offset 0; oldest xid 1879 in DB 1; oldest multi 1 in DB 1; oldest running xid 0; shutdown
rmgr: XLOG len (rec/tot): 72/ 104, tx: 0, lsn: 0/06000090, prev 0/06000028, bkp: 0000, desc: checkpoint: redo 0/6000090; tli 2; prev tli 2; fpw true; xid 0/1894; oid 24576; multi 1; offset 0; oldest xid 1879 in DB 1; oldest multi 1 in DB 1; oldest running xid 0; shutdown
pg_xlogdump: FATAL: error in WAL record at 0/6000090: record with zero length at 0/60000F8
That final message was misinterpreted as corruption in the WAL, but without
knowing the internals of either the WAL or pg_xlogdump I can't say for sure
that it's not corruption. It's repeatable, so I doubt it's a problem in
PostgreSQL core.
Could someone confirm my understanding, and if so, then could we please get
pg_xlogdump updated to recognize an end-of-timeline?
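
For reference, the segment file names in the listing above can be decoded
into a timeline ID and a starting LSN. Below is a minimal Python sketch,
assuming the default 16 MB segment size (consistent with the 16777216-byte
files in the listing) and the usual 24-hex-digit segment naming scheme. The
helper names parse_wal_segment_name and format_lsn are hypothetical and are
not part of any PostgreSQL tool. It shows that 000000020000000000000006 and
000000030000000000000006 cover the same LSN range on timelines 2 and 3, which
is consistent with the switch occurring inside that segment.

```python
# Assumption: default WAL segment size of 16 MB (matches 16777216 in the listing).
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_wal_segment_name(name):
    """Split a 24-hex-digit WAL segment name into (timeline, start_lsn).

    Layout assumed: 8 hex digits of timeline ID, 8 of the high 32 bits of
    the LSN, 8 of the segment number within that 4 GB range.
    """
    if len(name) != 24:
        raise ValueError("expected a 24-character WAL segment name")
    timeline = int(name[0:8], 16)
    log_id = int(name[8:16], 16)
    segment = int(name[16:24], 16)
    start_lsn = (log_id << 32) + segment * WAL_SEGMENT_SIZE
    return timeline, start_lsn


def format_lsn(lsn):
    """Render an LSN in the same hi/lo hex style that pg_xlogdump prints."""
    return "%X/%08X" % (lsn >> 32, lsn & 0xFFFFFFFF)


if __name__ == "__main__":
    for fname in ("000000020000000000000006", "000000030000000000000006"):
        tli, lsn = parse_wal_segment_name(fname)
        print(fname, "timeline", tli, "starts at", format_lsn(lsn))
```

Both names decode to the same starting LSN, so the records dumped above at
0/06000028 and 0/06000090 sit at the start of the timeline 2 copy of that
segment.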