Re: File system level backup of shut down standby does not work? - Mailing list pgsql-general

From Jürgen Fuchsberger
Subject Re: File system level backup of shut down standby does not work?
Msg-id 530484F4.90202@uni-graz.at
In response to Re: File system level backup of shut down standby does not work?  ("Antman, Jason (CMG-Atlanta)" <Jason.Antman@coxinc.com>)
Responses Re: File system level backup of shut down standby does not work?  (Albe Laurenz <laurenz.albe@wien.gv.at>)
Re: File system level backup of shut down standby does not work?  ("Antman, Jason (CMG-Atlanta)" <Jason.Antman@coxinc.com>)
List pgsql-general
All,

One important thing I just noticed when shutting down and restarting
my standby server:

The standby *always needs the last WAL file* from the archive
directory on restart, even after a "smart" shutdown. Without that file,
the consistent recovery state is not reached.

2014-02-19 11:10:20 CET LOG:  received smart shutdown request
2014-02-19 11:10:20 CET LOG:  shutting down
2014-02-19 11:10:20 CET LOG:  database system is shut down
2014-02-19 11:11:00 CET LOG:  database system was shut down in recovery
at 2014-02-19 11:10:20 CET
2014-02-19 11:11:00 CET LOG:  entering standby mode
2014-02-19 11:11:00 CET LOG:  incomplete startup packet
2014-02-19 11:11:01 CET FATAL:  the database system is starting up

*2014-02-19 11:11:01 CET LOG:  restored log file*
*"00000001000002DE000000BF" from archive*

2014-02-19 11:11:01 CET LOG:  redo starts at 2DE/BF036FA4
2014-02-19 11:11:01 CET FATAL:  the database system is starting up
2014-02-19 11:11:01 CET LOG:  consistent recovery state reached at
2DE/BFFFE53C
2014-02-19 11:11:01 CET LOG:  database system is ready to accept read
only connections

So my question is, could there be something wrong with my configuration
or is this normal?
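
For what it's worth, the segment name in the log above seems to follow
directly from the redo location. A minimal sketch, assuming the default
16 MB WAL segment size and timeline 1 (wal_segment_name is just a helper
name I made up, not a PostgreSQL tool):

```shell
# wal_segment_name TIMELINE LSN
# TIMELINE is a decimal timeline ID, LSN has the form "2DE/BF036FA4".
# Assumption: default 16 MB segments, so the segment number is the top
# 8 bits of the 32-bit offset after the slash.
wal_segment_name() {
    tli=$1
    log_hex=${2%%/*}    # part before the slash: log file number (hex)
    off_hex=${2##*/}    # part after the slash: byte offset (hex)
    printf '%08X%08X%08X\n' "$tli" "$((0x$log_hex))" "$((0x$off_hex >> 24))"
}

wal_segment_name 1 2DE/BF036FA4   # prints 00000001000002DE000000BF
```

Both the redo start (2DE/BF036FA4) and the consistency point
(2DE/BFFFE53C) map to 00000001000002DE000000BF, which would explain why
exactly that one file is requested from the archive.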

Juergen

On 02/19/2014 02:14 AM, Antman, Jason (CMG-Atlanta) wrote:
> Juergen,
>
> I've seen this quite a lot in the past, as we do this multiple times a day.
>
> Here's the procedure we use to prevent it:
> 1) read the PID from postmaster.pid in the data directory
> 2) Issue "service postgresql-9.0 stop" (this does a fast shutdown with
> -t 600)
> 3) loop until the PID is no longer running, or a timeout is exceeded (in
> which case we error out)
> 4) the IMPORTANT part: `pg_controldata /path/to/data | grep "Database
> cluster state: *shut down"` - if pg_controldata output doesn't include
> "shut down" or "shut down in recovery", then something's amiss and the
> backup won't be clean (error in shutdown, etc.)
> 5) `sync`
> 6) now take the backup
>
> -Jason
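
The steps above could be scripted roughly like this. It is only a
sketch: the function names, the argument handling and the rsync call for
step 6 are my assumptions. The service name is taken from Jason's
description, and the 600 second wait simply mirrors the "-t 600" he
mentions.

```shell
#!/bin/sh
# Sketch of the quoted stop / verify / sync / backup steps.
# Function names and arguments are hypothetical, not from any tool.

# Succeeds only if pg_controldata output (read from stdin) reports
# "shut down" or "shut down in recovery". "shutting down" does not match.
cluster_is_shut_down() {
    grep -Eq '^Database cluster state: +shut down( in recovery)?$'
}

# Waits until process $1 is gone; fails after roughly $2 seconds.
# Run as root or as the postgres user, otherwise kill -0 cannot see the
# process and it would wrongly be treated as already gone.
wait_for_exit() {
    pid=$1
    remaining=$2
    while kill -0 "$pid" 2>/dev/null; do
        [ "$remaining" -le 0 ] && return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# Steps 1-6 in order. Nothing runs until this function is called.
backup_stopped_standby() {
    datadir=$1
    dest=$2
    pid=$(head -n 1 "$datadir/postmaster.pid") || return 1   # 1) read PID
    service postgresql-9.0 stop || return 1                  # 2) stop
    wait_for_exit "$pid" 600 || return 1                     # 3) wait
    # 4) LC_ALL=C keeps the pg_controldata labels in English
    LC_ALL=C pg_controldata "$datadir" | cluster_is_shut_down || return 1
    sync                                                     # 5) flush
    rsync -a "$datadir/" "$dest/"                            # 6) backup
}
```

It would be called as, for example,
backup_stopped_standby /path/to/data /path/to/backup
and returns non-zero as soon as any step fails, so no copy is taken from
a cluster that did not shut down cleanly.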
>
> On 02/17/2014 08:32 AM, Jürgen Fuchsberger wrote:
>> Hi all,
>>
>> I have a master-slave configuration running the master with WAL
>> archiving enabled and the slave in recovery mode reading back the WAL
>> files from the master ("Log-shipping standby" as described in
>> http://www.postgresql.org/docs/9.1/static/warm-standby.html)
>>
>> I take frequent backups of the standby server:
>>
>> 1) Stop standby server (fast shutdown).
>> 2) Rsync to another fileserver
>> 3) Start standby server.
>>
>> I just tried to recover one of these backups which *failed* with the
>> following errors:
>>
>> 2014-02-17 14:27:28 CET LOG:  incomplete startup packet
>> 2014-02-17 14:27:28 CET LOG:  database system was shut down in recovery
>> at 2013-12-25 18:00:03 CET
>> 2014-02-17 14:27:28 CET LOG:  could not open file
>> "pg_xlog/00000001000001E300000061" (log file 483, segment 97): No such
>> file or directory
>> 2014-02-17 14:27:28 CET LOG:  invalid primary checkpoint record
>> 2014-02-17 14:27:28 CET LOG:  could not open file
>> "pg_xlog/00000001000001E300000060" (log file 483, segment 96): No such
>> file or directory
>> 2014-02-17 14:27:28 CET LOG:  invalid secondary checkpoint record
>> 2014-02-17 14:27:28 CET PANIC:  could not locate a valid checkpoint record
>> 2014-02-17 14:27:29 CET FATAL:  the database system is starting up
>> 2014-02-17 14:27:29 CET FATAL:  the database system is starting up
>> 2014-02-17 14:27:30 CET FATAL:  the database system is starting up
>> 2014-02-17 14:27:30 CET FATAL:  the database system is starting up
>> 2014-02-17 14:27:31 CET FATAL:  the database system is starting up
>> 2014-02-17 14:27:31 CET FATAL:  the database system is starting up
>> 2014-02-17 14:27:32 CET FATAL:  the database system is starting up
>> 2014-02-17 14:27:33 CET FATAL:  the database system is starting up
>> 2014-02-17 14:27:33 CET FATAL:  the database system is starting up
>> 2014-02-17 14:27:33 CET LOG:  startup process (PID 26186) was terminated
>> by signal 6: Aborted
>> 2014-02-17 14:27:33 CET LOG:  aborting startup due to startup process
>> failure
>>
>>
>> So it seems the server is missing some WAL files which are not
>> in the backup? Or is it simply not possible to take a backup of a
>> standby server in recovery?
>>
>> Best,
>> Juergen
>>
>>
>>
>
>

--
| Jürgen Fuchsberger, M.Sc.
| Wegener Center for Climate and Global Change
| University of Graz
| Brandhofgasse 5, A-8010 Graz, Austria
| phone: +43-316-380-8438
|   web: www.wegcenter.at/wegenernet
|        www.wegenernet.org

