Re: File system level backup of shut down standby does not work? - Mailing list pgsql-general
From | Jürgen Fuchsberger |
---|---|
Subject | Re: File system level backup of shut down standby does not work? |
Date | |
Msg-id | 530484F4.90202@uni-graz.at |
In response to | Re: File system level backup of shut down standby does not work? ("Antman, Jason (CMG-Atlanta)" <Jason.Antman@coxinc.com>) |
Responses | Re: File system level backup of shut down standby does not work?; Re: File system level backup of shut down standby does not work? |
List | pgsql-general |
All,

One very important thing I just noted when shutting down and restarting my standby server:

My standby server *always needs the last WAL-file* from the archive directory, even when the shut down was "smart". Without this the consistent recovery state will not be reached.

2014-02-19 11:10:20 CET LOG: received smart shutdown request
2014-02-19 11:10:20 CET LOG: shutting down
2014-02-19 11:10:20 CET LOG: database system is shut down
2014-02-19 11:11:00 CET LOG: database system was shut down in recovery at 2014-02-19 11:10:20 CET
2014-02-19 11:11:00 CET LOG: entering standby mode
2014-02-19 11:11:00 CET LOG: incomplete startup packet
2014-02-19 11:11:01 CET FATAL: the database system is starting up
*2014-02-19 11:11:01 CET LOG: restored log file "00000001000002DE000000BF" from archive*
2014-02-19 11:11:01 CET LOG: redo starts at 2DE/BF036FA4
2014-02-19 11:11:01 CET FATAL: the database system is starting up
2014-02-19 11:11:01 CET LOG: consistent recovery state reached at 2DE/BFFFE53C
2014-02-19 11:11:01 CET LOG: database system is ready to accept read only connections

So my question is, could there be something wrong with my configuration or is this normal?

Juergen

On 02/19/2014 02:14 AM, Antman, Jason (CMG-Atlanta) wrote:
> Juergen,
>
> I've seen this quite a lot in the past, as we do this multiple times a day.
>
> Here's the procedure we use to prevent it:
> 1) read the PID from postmaster.pid in the data directory
> 2) Issue "service postgresql-9.0 stop" (this does a fast shutdown with -t 600)
> 3) loop until the PID is no longer running, or a timeout is exceeded (in which case we error out)
> 4) the IMPORTANT part: `pg_controldata /path/to/data | grep "Database cluster state: *shut down"` - if pg_controldata output doesn't include "shut down" or "shut down in recovery", then something's amiss and the backup won't be clean (error in shutdown, etc.)
> 5) `sync`
> 6) now take the backup
>
> -Jason
>
> On 02/17/2014 08:32 AM, Jürgen Fuchsberger wrote:
>> Hi all,
>>
>> I have a master-slave configuration running the master with WAL
>> archiving enabled and the slave in recovery mode reading back the WAL
>> files from the master ("Log-shipping standby" as described in
>> http://www.postgresql.org/docs/9.1/static/warm-standby.html)
>>
>> I take frequent backups of the standby server:
>>
>> 1) Stop standby server (fast shutdown).
>> 2) Rsync to another fileserver
>> 3) Start standby server.
>>
>> I just tried to recover one of these backups which *failed* with the
>> following errors:
>>
>> 2014-02-17 14:27:28 CET LOG: incomplete startup packet
>> 2014-02-17 14:27:28 CET LOG: database system was shut down in recovery at 2013-12-25 18:00:03 CET
>> 2014-02-17 14:27:28 CET LOG: could not open file "pg_xlog/00000001000001E300000061" (log file 483, segment 97): No such file or directory
>> 2014-02-17 14:27:28 CET LOG: invalid primary checkpoint record
>> 2014-02-17 14:27:28 CET LOG: could not open file "pg_xlog/00000001000001E300000060" (log file 483, segment 96): No such file or directory
>> 2014-02-17 14:27:28 CET LOG: invalid secondary checkpoint record
>> 2014-02-17 14:27:28 CET PANIC: could not locate a valid checkpoint record
>> 2014-02-17 14:27:29 CET FATAL: the database system is starting up
>> 2014-02-17 14:27:29 CET FATAL: the database system is starting up
>> 2014-02-17 14:27:30 CET FATAL: the database system is starting up
>> 2014-02-17 14:27:30 CET FATAL: the database system is starting up
>> 2014-02-17 14:27:31 CET FATAL: the database system is starting up
>> 2014-02-17 14:27:31 CET FATAL: the database system is starting up
>> 2014-02-17 14:27:32 CET FATAL: the database system is starting up
>> 2014-02-17 14:27:33 CET FATAL: the database system is starting up
>> 2014-02-17 14:27:33 CET FATAL: the database system is starting up
>> 2014-02-17 14:27:33 CET LOG: startup process (PID 26186) was terminated by signal 6: Aborted
>> 2014-02-17 14:27:33 CET LOG: aborting startup due to startup process failure
>>
>>
>> So it seems the server is missing some WAL files which are not
>> in the backup? Or is it simply not possible to take a backup of a
>> standby server in recovery?
>>
>> Best,
>> Juergen
>>
>>
>>
>
>

--
| Jürgen Fuchsberger, M.Sc.
| Wegener Center for Climate and Global Change
| University of Graz
| Brandhofgasse 5, A-8010 Graz, Austria
| phone: +43-316-380-8438
| web: www.wegcenter.at/wegenernet
| www.wegenernet.org
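
For reference, steps 3 and 4 of Jason's quoted procedure might be scripted roughly as below. This is only a sketch for a POSIX shell: the function names `wait_for_exit` and `is_cleanly_shut_down`, the 600-second default timeout, and the use of `$PGDATA` for the data directory in the usage comment are illustrative assumptions, not taken from the thread.

```shell
#!/bin/sh
# Sketch of the pre-backup checks (steps 3 and 4 of the quoted procedure).
# Function names and defaults are hypothetical.

# Step 3: poll until the given PID no longer exists.
# Returns 1 if it is still running after the timeout (in seconds).
# Note: kill -0 needs permission to signal the process, so run this
# as root or as the user that owns the postmaster.
wait_for_exit() {
    wfe_pid=$1
    wfe_timeout=${2:-600}
    wfe_waited=0
    while kill -0 "$wfe_pid" 2>/dev/null; do
        if [ "$wfe_waited" -ge "$wfe_timeout" ]; then
            return 1
        fi
        sleep 1
        wfe_waited=$((wfe_waited + 1))
    done
    return 0
}

# Step 4: read pg_controldata output on stdin and succeed only if the
# cluster state is exactly "shut down" or "shut down in recovery".
is_cleanly_shut_down() {
    grep -Eq '^Database cluster state: +shut down( in recovery)?[[:space:]]*$'
}

# Possible usage (assumes $PGDATA points at the data directory):
#   pid=$(head -n 1 "$PGDATA/postmaster.pid")
#   service postgresql-9.0 stop
#   wait_for_exit "$pid" 600 || { echo "shutdown timed out" >&2; exit 1; }
#   pg_controldata "$PGDATA" | is_cleanly_shut_down || { echo "unclean shutdown" >&2; exit 1; }
#   sync
#   # now take the backup
```

The pattern is anchored more tightly than the quoted `grep` so that a state such as "shutting down" cannot match by accident. A passing check only confirms a clean shutdown; as the log at the top of this message shows, the standby may still ask for the last WAL file from the archive when it is restarted.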