BUG #8043: 9.2.4 doesn't open WAL files from archive, only looks in pg_xlog - Mailing list pgsql-bugs
| From | bohmer@visionlink.org |
|---|---|
| Subject | BUG #8043: 9.2.4 doesn't open WAL files from archive, only looks in pg_xlog |
| Date | |
| Msg-id | E1UOCIL-0004ll-Qt@wrigleys.postgresql.org Whole thread Raw |
| Responses |
Re: BUG #8043: 9.2.4 doesn't open WAL files from archive, only
looks in pg_xlog
|
| List | pgsql-bugs |
The following bug has been logged on the website:
Bug reference: 8043
Logged by: Jeff Bohmer
Email address: bohmer@visionlink.org
PostgreSQL version: 9.2.4
Operating system: CentOS 5.9 x86_64 kernel 2.6.18-348.3.1.el5
Description: =
Hi pgsql-bugs,
PG version: 9.2.4, from yum.postgresql.org
OS: CentOS 5.9 x86_64 kernel 2.6.18-348.3.1.el5
Upgrading from 9.2.3 to 9.2.4 has broken archive recovery for me. Probably
related to this 9.2.4 change:
Ensure we do crash recovery before entering archive recovery,
if the database was not stopped cleanly and a recovery.conf
file is present (Heikki Linnakangas, Kyotaro Horiguchi,
Mitsumasa Kondo)
When starting the PostgreSQL 9.2.4 service on a base backup, I get this:
2013-04-05 12:49:04 MDT [10302]: [1-1] user=3D,db=3D LOG: database system =
was
interrupted; last known up at 2013-04-05 10:02:01 MDT
2013-04-05 12:49:04 MDT [10302]: [2-1] user=3D,db=3D LOG: starting archive
recovery
2013-04-05 12:49:04 MDT [10302]: [3-1] user=3D,db=3D LOG: could not open f=
ile
"pg_xlog/000000010000002F0000002D" (log file 47, segment 45): No such file
or directory
2013-04-05 12:49:04 MDT [10302]: [4-1] user=3D,db=3D LOG: invalid primary
checkpoint record
2013-04-05 12:49:04 MDT [10302]: [5-1] user=3D,db=3D LOG: could not open f=
ile
"pg_xlog/000000010000002F0000002C" (log file 47, segment 44): No such file
or directory
2013-04-05 12:49:04 MDT [10302]: [6-1] user=3D,db=3D LOG: invalid secondary
checkpoint record
2013-04-05 12:49:04 MDT [10302]: [7-1] user=3D,db=3D PANIC: could not loca=
te a
valid checkpoint record
2013-04-05 12:49:04 MDT [10297]: [2-1] user=3D,db=3D LOG: startup process =
(PID
10302) was terminated by signal 6: Aborted
2013-04-05 12:49:04 MDT [10297]: [3-1] user=3D,db=3D LOG: aborting startup=
due
to startup process failure
The WAL file 000000010000002F0000002D does exist in my WAL archive, but not
in pg_xlog. I exclude pg_xlog files when taking the base backup, per the
instructions from
<http://www.postgresql.org/docs/9.2/static/continuous-archiving.html>:
You can, however, omit from the backup dump the files within
the cluster's pg_xlog/ subdirectory. This slight adjustment
is worthwhile because it reduces the risk of mistakes when
restoring.
I use a custom base backup script to call pg_start/stop_backup() and make
the backup with rsync.
The restore_command in recovery.conf is never called by PG 9.2.4 during
startup. I confirmed this by adding a "touch /tmp/restore_command.`date
+%H:%M:%S`" line at the beginning of the shell script I use for my
restore_command. No such files are created when starting PG 9.2.4.
After downgrading back to 9.2.3, archive recovery works using the very same
base backup, recovery.conf file, and restore_command. The log indicates that
PG 9.2.3 begins recovery by pulling WAL files from the archive instead of
pg_xlog:
2013-04-05 13:01:14 MDT [16981]: [1-1] user=3D,db=3D LOG: database system =
was
interrupted; last known up at 2013-04-05 10:02:01 MDT
2013-04-05 13:01:14 MDT [16981]: [2-1] user=3D,db=3D LOG: starting archive
recovery
2013-04-05 13:01:14 MDT [16981]: [3-1] user=3D,db=3D LOG: restored log file
"000000010000002F0000002D" from archive
2013-04-05 13:01:14 MDT [16981]: [4-1] user=3D,db=3D LOG: consistent recov=
ery
state reached at 2F/2D000080
2013-04-05 13:01:14 MDT [16981]: [5-1] user=3D,db=3D LOG: redo starts at
2F/2D000080
2013-04-05 13:01:15 MDT [16981]: [6-1] user=3D,db=3D LOG: restored log file
"000000010000002F0000002E" from archive
2013-04-05 13:01:15 MDT [16981]: [7-1] user=3D,db=3D LOG: restored log file
"000000010000002F0000002F" from archive
<snip>
2013-04-05 13:01:17 MDT [16981]: [25-1] user=3D,db=3D LOG: redo done at
2F/3F07B4D0
2013-04-05 13:01:17 MDT [16981]: [26-1] user=3D,db=3D LOG: last completed
transaction was at log time 2013-04-05 12:53:01.069826-06
2013-04-05 13:01:17 MDT [16981]: [27-1] user=3D,db=3D LOG: restored log fi=
le
"000000010000002F0000003F" from archive
2013-04-05 13:01:17 MDT [16981]: [28-1] user=3D,db=3D LOG: selected new
timeline ID: 2
2013-04-05 13:01:17 MDT [16981]: [29-1] user=3D,db=3D LOG: archive recovery
complete
2013-04-05 13:01:17 MDT [16991]: [1-1] user=3D,db=3D LOG: checkpoint start=
ing:
end-of-recovery immediate wait
2013-04-05 13:01:17 MDT [16991]: [2-1] user=3D,db=3D LOG: checkpoint compl=
ete:
wrote 327 buffers (0.1%); 0 transaction log file(s) ad
ded, 0 removed, 0 recycled; write=3D0.015 s, sync=3D0.000 s, total=3D0.035 =
s; sync
files=3D0, longest=3D0.000 s, average=3D0.000 s
2013-04-05 13:01:17 MDT [16978]: [2-1] user=3D,db=3D LOG: database system =
is
ready to accept connections
So, the behavior has definitely changed between 9.2.3 and 9.2.4. Is this a
bug in 9.2.4?
Or, must I now include pg_xlog files when taking base backups with 9.2.4,
contrary to the documentation?
Or, is there a way to make PG 9.2.4 use restore_command for recovery, as
9.2.3 does?
Thank you,
- Jeff
pgsql-bugs by date: