Re: File system level backup of shut down standby does not work? - Mailing list pgsql-general
From | Antman, Jason (CMG-Atlanta) |
---|---|
Subject | Re: File system level backup of shut down standby does not work? |
Date | |
Msg-id | 53040588.9020803@coxinc.com |
In response to | File system level backup of shut down standby does not work? (Jürgen Fuchsberger <juergen.fuchsberger@uni-graz.at>) |
Responses | Re: File system level backup of shut down standby does not work?; Re: File system level backup of shut down standby does not work? |
List | pgsql-general |
Juergen,

I've seen this quite a lot in the past, as we do this multiple times a day. Here's the procedure we use to prevent it:

1) read the PID from postmaster.pid in the data directory
2) issue "service postgresql-9.0 stop" (this does a fast shutdown with -t 600)
3) loop until the PID is no longer running, or a timeout is exceeded (in which case we error out)
4) the IMPORTANT part: `pg_controldata /path/to/data | grep "Database cluster state: *shut down"` - if pg_controldata output doesn't include "shut down" or "shut down in recovery", then something's amiss and the backup won't be clean (error in shutdown, etc.)
5) `sync`
6) now take the backup

-Jason

On 02/17/2014 08:32 AM, Jürgen Fuchsberger wrote:
> Hi all,
>
> I have a master-slave configuration running the master with WAL
> archiving enabled and the slave in recovery mode reading back the WAL
> files from the master ("Log-shipping standby" as described in
> http://www.postgresql.org/docs/9.1/static/warm-standby.html)
>
> I take frequent backups of the standby server:
>
> 1) Stop standby server (fast shutdown).
> 2) Rsync to another fileserver
> 3) Start standby server.
>
> I just tried to recover one of these backups which *failed* with the
> following errors:
>
> 2014-02-17 14:27:28 CET LOG: incomplete startup packet
> 2014-02-17 14:27:28 CET LOG: database system was shut down in recovery at 2013-12-25 18:00:03 CET
> 2014-02-17 14:27:28 CET LOG: could not open file "pg_xlog/00000001000001E300000061" (log file 483, segment 97): No such file or directory
> 2014-02-17 14:27:28 CET LOG: invalid primary checkpoint record
> 2014-02-17 14:27:28 CET LOG: could not open file "pg_xlog/00000001000001E300000060" (log file 483, segment 96): No such file or directory
> 2014-02-17 14:27:28 CET LOG: invalid secondary checkpoint record
> 2014-02-17 14:27:28 CET PANIC: could not locate a valid checkpoint record
> 2014-02-17 14:27:29 CET FATAL: the database system is starting up
> 2014-02-17 14:27:29 CET FATAL: the database system is starting up
> 2014-02-17 14:27:30 CET FATAL: the database system is starting up
> 2014-02-17 14:27:30 CET FATAL: the database system is starting up
> 2014-02-17 14:27:31 CET FATAL: the database system is starting up
> 2014-02-17 14:27:31 CET FATAL: the database system is starting up
> 2014-02-17 14:27:32 CET FATAL: the database system is starting up
> 2014-02-17 14:27:33 CET FATAL: the database system is starting up
> 2014-02-17 14:27:33 CET FATAL: the database system is starting up
> 2014-02-17 14:27:33 CET LOG: startup process (PID 26186) was terminated by signal 6: Aborted
> 2014-02-17 14:27:33 CET LOG: aborting startup due to startup process failure
>
> So it seems the server is missing some WAL files which are not
> in the backup? Or is it simply not possible to take a backup of a
> standby server in recovery?
>
> Best,
> Juergen

-- 
Jason Antman | Systems Engineer | CMGdigital
jason.antman@coxinc.com | p: 678-645-4155
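
For reference, the six-step procedure in the reply could be sketched roughly as the shell functions below. This is only an illustration, not the script actually used: the function names (wait_for_exit, is_clean_shutdown, backup_standby), the one-second polling interval, and the LC_ALL=C setting are assumptions, and the backup command itself is left as a placeholder.

```shell
#!/bin/sh
# Sketch only: function names and polling interval are illustrative.

# Step 3: wait until PID $1 is gone; fail after $2 seconds (default 600).
# Assumes the caller may signal the postmaster (root or the postgres user),
# otherwise kill -0 fails and the process looks as if it had already exited.
wait_for_exit() {
    pid=$1
    timeout=${2:-600}
    waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Step 4: read pg_controldata output on stdin and succeed only if the
# cluster state is "shut down" or "shut down in recovery".
is_clean_shutdown() {
    grep -Eq '^Database cluster state: +shut down( in recovery)?[[:space:]]*$'
}

# Hypothetical driver tying the steps together for data directory $1.
backup_standby() {
    datadir=$1
    # Step 1: the first line of postmaster.pid is the postmaster PID.
    pid=$(head -n 1 "$datadir/postmaster.pid") || return 1
    # Step 2: stop the server (service name taken from the reply).
    service postgresql-9.0 stop || return 1
    wait_for_exit "$pid" 600 || {
        echo "postmaster $pid still running after timeout" >&2
        return 1
    }
    # LC_ALL=C is an assumption, so the output is not localized.
    LC_ALL=C pg_controldata "$datadir" | is_clean_shutdown || {
        echo "cluster was not cleanly shut down; not taking backup" >&2
        return 1
    }
    # Step 5: flush filesystem buffers.
    sync
    # Step 6: take the backup here (for example, the rsync from the
    # original post), then start the standby again.
}
```

Calling `backup_standby /path/to/data` would run the checks in order and stop at the first failure, so the backup is only attempted once pg_controldata reports a clean shutdown state.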