BUG #13876: pg_xlogdump give an error on timeline switch - Mailing list pgsql-bugs

From postgresql.org@gclough.com
Subject BUG #13876: pg_xlogdump give an error on timeline switch
Date
Msg-id 20160119153448.2961.55971@wrigleys.postgresql.org
Responses Re: BUG #13876: pg_xlogdump give an error on timeline switch
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      13876
Logged by:          Greg Clough
Email address:      postgresql.org@gclough.com
PostgreSQL version: 9.3.10
Operating system:   CentOS v7.2
Description:

When pg_xlogdump encounters a timeline switch in the WAL, it reports it as an
"error in WAL record".  I'm just guessing that there is an end-of-timeline
marker in the WAL that pg_xlogdump hasn't been programmed to recognize, so
it reports it as an error.

Here is a dump of a WAL file that had a timeline switch as the first thing
after forcing a pg_switch_xlog():

-bash-4.2$ pwd
/opt/postgres/9.3/db2/pg_xlog

-bash-4.2$ ls -al
total 114704
drwx------.  3 postgres postgres     4096 Jan 19 09:15 .
drwx------. 16 postgres postgres     4096 Jan 19 09:33 ..
-rw-------.  1 postgres postgres 16777216 Jan 19 08:50 000000010000000000000002
-rw-------.  1 postgres postgres 16777216 Jan 19 08:53 000000010000000000000003
-rw-------.  1 postgres postgres 16777216 Jan 19 09:07 000000020000000000000003
-rw-------.  1 postgres postgres 16777216 Jan 19 09:07 000000020000000000000004
-rw-------.  1 postgres postgres 16777216 Jan 19 09:10 000000020000000000000005
-rw-------.  1 postgres postgres 16777216 Jan 19 09:15 000000020000000000000006
-rw-------.  1 postgres postgres       41 Jan 19 08:55 00000002.history
-rw-------.  1 postgres postgres 16777216 Jan 19 09:33 000000030000000000000006
-rw-------.  1 postgres postgres       83 Jan 19 09:15 00000003.history
drwx------.  2 postgres postgres       78 Jan 19 09:15 archive_status
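For anyone reading the listing: each 24-hex-digit WAL segment name is three 8-hex-digit fields, namely the timeline ID, the log ID (the high 32 bits of the WAL position), and the segment number within that log ID. The bash sketch below splits the names from the listing; the function name wal_name_parts is hypothetical and not a PostgreSQL tool. It makes visible that segment 6 exists on both timeline 2 and timeline 3.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of PostgreSQL): split a WAL segment file name
# into timeline ID, log ID (high 32 bits of the LSN) and segment number.
wal_name_parts() {
    local name=$1
    # Only plain 24-hex-digit segment names qualify; *.history files do not.
    if [[ ! $name =~ ^[0-9A-F]{24}$ ]]; then
        echo "not a WAL segment name: $name" >&2
        return 1
    fi
    # $((16#...)) converts a hex string to decimal; leading zeros are harmless.
    printf 'timeline=%d log=%d seg=%d\n' \
        "$((16#${name:0:8}))" "$((16#${name:8:8}))" "$((16#${name:16:8}))"
}

# Two names copied from the pg_xlog listing.
for f in 000000020000000000000006 000000030000000000000006; do
    echo "$f -> $(wal_name_parts "$f")"
done
```

This prints timeline=2 and timeline=3 respectively, both with log=0 seg=6.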

-bash-4.2$ pg_xlogdump 000000020000000000000006
rmgr: XLOG        len (rec/tot):     72/   104, tx:          0, lsn: 0/06000028, prev 0/056370D0, bkp: 0000, desc: checkpoint: redo 0/6000028; tli 2; prev tli 2; fpw true; xid 0/1894; oid 24576; multi 1; offset 0; oldest xid 1879 in DB 1; oldest multi 1 in DB 1; oldest running xid 0; shutdown
rmgr: XLOG        len (rec/tot):     72/   104, tx:          0, lsn: 0/06000090, prev 0/06000028, bkp: 0000, desc: checkpoint: redo 0/6000090; tli 2; prev tli 2; fpw true; xid 0/1894; oid 24576; multi 1; offset 0; oldest xid 1879 in DB 1; oldest multi 1 in DB 1; oldest running xid 0; shutdown
pg_xlogdump: FATAL:  error in WAL record at 0/6000090: record with zero length at 0/60000F8

That final message was misinterpreted as corruption in the WAL, but without
knowing the internals of either the WAL or pg_xlogdump I can't say for sure
that it's not corruption.  It's repeatable, so I doubt it's a problem in the
PostgreSQL core.
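The positions in the dump can be checked with plain arithmetic. Each checkpoint record is 104 bytes in total, so the first record at 0/06000028 ends at 0/06000090 where the second begins, and the second ends at 0x06000090 + 104 = 0x060000F8, which is exactly the position of the reported zero-length record. Assuming the default 16 MB (0x1000000-byte) segment size, LSN 0/60000F8 lies 0xF8 bytes into segment 6, i.e. inside the 000000020000000000000006 file that was dumped. The bash sketch below reproduces this; the helper names lsn_to_pos and lsn_location are hypothetical.

```bash
#!/usr/bin/env bash
# Assumption: default 16 MB WAL segments (16 * 1024 * 1024 = 0x1000000 bytes).
SEG_SIZE=$((16#1000000))

# Hypothetical helper: convert an "X/Y" LSN (both parts hex) to a byte position.
lsn_to_pos() {
    local hi=${1%%/*} lo=${1#*/}
    echo $(( (16#$hi << 32) | 16#$lo ))
}

# Hypothetical helper: print the log ID, segment number and in-segment offset
# that an LSN falls at.
lsn_location() {
    local pos
    pos=$(lsn_to_pos "$1")
    printf 'log=%d seg=%d offset=0x%X\n' \
        $(( pos >> 32 )) $(( (pos & 0xFFFFFFFF) / SEG_SIZE )) $(( pos % SEG_SIZE ))
}

# Second checkpoint record: its start LSN plus its 104-byte total length.
end=$(( $(lsn_to_pos 0/06000090) + 104 ))
printf 'second record ends at 0/%X\n' "$end"
lsn_location 0/60000F8
```

This prints "second record ends at 0/60000F8" followed by "log=0 seg=6 offset=0xF8", so the error points at the byte immediately after the second shutdown checkpoint.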

Could someone confirm my understanding, and if so, then could we please get
pg_xlogdump updated to recognize an end-of-timeline?
