Thread: File system level backup of shut down standby does not work?

File system level backup of shut down standby does not work?

From
Jürgen Fuchsberger
Date:
Hi all,

I have a master-slave configuration running the master with WAL
archiving enabled and the slave in recovery mode reading back the WAL
files from the master ("Log-shipping standby" as described in
http://www.postgresql.org/docs/9.1/static/warm-standby.html)

I take frequent backups of the standby server:

1) Stop standby server (fast shutdown).
2) Rsync to another fileserver
3) Start standby server.

I just tried to recover one of these backups which *failed* with the
following errors:

2014-02-17 14:27:28 CET LOG:  incomplete startup packet
2014-02-17 14:27:28 CET LOG:  database system was shut down in recovery
at 2013-12-25 18:00:03 CET
2014-02-17 14:27:28 CET LOG:  could not open file
"pg_xlog/00000001000001E300000061" (log file 483, segment 97): No such
file or directory
2014-02-17 14:27:28 CET LOG:  invalid primary checkpoint record
2014-02-17 14:27:28 CET LOG:  could not open file
"pg_xlog/00000001000001E300000060" (log file 483, segment 96): No such
file or directory
2014-02-17 14:27:28 CET LOG:  invalid secondary checkpoint record
2014-02-17 14:27:28 CET PANIC:  could not locate a valid checkpoint record
2014-02-17 14:27:29 CET FATAL:  the database system is starting up
2014-02-17 14:27:29 CET FATAL:  the database system is starting up
2014-02-17 14:27:30 CET FATAL:  the database system is starting up
2014-02-17 14:27:30 CET FATAL:  the database system is starting up
2014-02-17 14:27:31 CET FATAL:  the database system is starting up
2014-02-17 14:27:31 CET FATAL:  the database system is starting up
2014-02-17 14:27:32 CET FATAL:  the database system is starting up
2014-02-17 14:27:33 CET FATAL:  the database system is starting up
2014-02-17 14:27:33 CET FATAL:  the database system is starting up
2014-02-17 14:27:33 CET LOG:  startup process (PID 26186) was terminated
by signal 6: Aborted
2014-02-17 14:27:33 CET LOG:  aborting startup due to startup process
failure


So it seems the server is missing some WAL files which are not
in the backup? Or is it simply not possible to take a backup of a
standby server in recovery?

Best,
Juergen




Attachment

Re: File system level backup of shut down standby does not work?

From
Haribabu Kommi
Date:
On Mon, Feb 17, 2014 at 7:02 PM, Jürgen Fuchsberger <juergen.fuchsberger@uni-graz.at> wrote:
Hi all,

I have a master-slave configuration running the master with WAL
archiving enabled and the slave in recovery mode reading back the WAL
files from the master ("Log-shipping standby" as described in
http://www.postgresql.org/docs/9.1/static/warm-standby.html)

I take frequent backups of the standby server:

1) Stop standby server (fast shutdown).
2) Rsync to another fileserver
3) Start standby server.

I just tried to recover one of these backups which *failed* with the
following errors:

2014-02-17 14:27:28 CET LOG:  incomplete startup packet
2014-02-17 14:27:28 CET LOG:  database system was shut down in recovery
at 2013-12-25 18:00:03 CET
2014-02-17 14:27:28 CET LOG:  could not open file
"pg_xlog/00000001000001E300000061" (log file 483, segment 97): No such
file or directory
2014-02-17 14:27:28 CET LOG:  invalid primary checkpoint record
2014-02-17 14:27:28 CET LOG:  could not open file
"pg_xlog/00000001000001E300000060" (log file 483, segment 96): No such
file or directory
2014-02-17 14:27:28 CET LOG:  invalid secondary checkpoint record
2014-02-17 14:27:28 CET PANIC:  could not locate a valid checkpoint record
2014-02-17 14:27:29 CET FATAL:  the database system is starting up
2014-02-17 14:27:29 CET FATAL:  the database system is starting up
2014-02-17 14:27:30 CET FATAL:  the database system is starting up
2014-02-17 14:27:30 CET FATAL:  the database system is starting up
2014-02-17 14:27:31 CET FATAL:  the database system is starting up
2014-02-17 14:27:31 CET FATAL:  the database system is starting up
2014-02-17 14:27:32 CET FATAL:  the database system is starting up
2014-02-17 14:27:33 CET FATAL:  the database system is starting up
2014-02-17 14:27:33 CET FATAL:  the database system is starting up
2014-02-17 14:27:33 CET LOG:  startup process (PID 26186) was terminated
by signal 6: Aborted
2014-02-17 14:27:33 CET LOG:  aborting startup due to startup process
failure


So it seems the server is missing some WAL files which are not
in the backup? Or is it simply not possible to take a backup of a
standby server in recovery?

From version 9.2, you can take backups from standby also using pg_basebackup utility.

Is the WAL file is present in archive folder? if yes, Did you provided the archive command in 
recovery.conf file?

I am not sure what happened? During fast shutdown of standby it should create a restart point
for further replay of WAL. Can you please enable log_checkpoints guc and check whether any restart point is
getting created or not during fast shutdown.
 
Regards,
Hari Babu
Fujitsu Australia

Re: File system level backup of shut down standby does not work?

From
"Antman, Jason (CMG-Atlanta)"
Date:
Juergen,

I've seen this quite a lot in the past, as we do this multiple times a day.

Here's the procedure we use to prevent it:
1) read the PID from postmaster.pid in the data directory
2) Issue "service postgresql-9.0 stop" (this does a fast shutdown with
-t 600)
3) loop until the PID is no longer running, or a timeout is exceeded (in
which case we error out)
4) the IMPORTANT part: `pg_controldata /path/to/data | grep "Database
cluster state: *shut down"` - if pg_controldata output doesn't include
"shut down" or "shut down in recovery", then something's amiss and the
backup won't be clean (error in shutdown, etc.)
5) `sync`
6) now take the backup

-Jason

On 02/17/2014 08:32 AM, Jürgen Fuchsberger wrote:
> Hi all,
>
> I have a master-slave configuration running the master with WAL
> archiving enabled and the slave in recovery mode reading back the WAL
> files from the master ("Log-shipping standby" as described in
> http://www.postgresql.org/docs/9.1/static/warm-standby.html)
>
> I take frequent backups of the standby server:
>
> 1) Stop standby server (fast shutdown).
> 2) Rsync to another fileserver
> 3) Start standby server.
>
> I just tried to recover one of these backups which *failed* with the
> following errors:
>
> 2014-02-17 14:27:28 CET LOG:  incomplete startup packet
> 2014-02-17 14:27:28 CET LOG:  database system was shut down in recovery
> at 2013-12-25 18:00:03 CET
> 2014-02-17 14:27:28 CET LOG:  could not open file
> "pg_xlog/00000001000001E300000061" (log file 483, segment 97): No such
> file or directory
> 2014-02-17 14:27:28 CET LOG:  invalid primary checkpoint record
> 2014-02-17 14:27:28 CET LOG:  could not open file
> "pg_xlog/00000001000001E300000060" (log file 483, segment 96): No such
> file or directory
> 2014-02-17 14:27:28 CET LOG:  invalid secondary checkpoint record
> 2014-02-17 14:27:28 CET PANIC:  could not locate a valid checkpoint record
> 2014-02-17 14:27:29 CET FATAL:  the database system is starting up
> 2014-02-17 14:27:29 CET FATAL:  the database system is starting up
> 2014-02-17 14:27:30 CET FATAL:  the database system is starting up
> 2014-02-17 14:27:30 CET FATAL:  the database system is starting up
> 2014-02-17 14:27:31 CET FATAL:  the database system is starting up
> 2014-02-17 14:27:31 CET FATAL:  the database system is starting up
> 2014-02-17 14:27:32 CET FATAL:  the database system is starting up
> 2014-02-17 14:27:33 CET FATAL:  the database system is starting up
> 2014-02-17 14:27:33 CET FATAL:  the database system is starting up
> 2014-02-17 14:27:33 CET LOG:  startup process (PID 26186) was terminated
> by signal 6: Aborted
> 2014-02-17 14:27:33 CET LOG:  aborting startup due to startup process
> failure
>
>
> So it seems the server is missing some WAL files which are not
> in the backup? Or is it simply not possible to take a backup of a
> standby server in recovery?
>
> Best,
> Juergen
>
>
>


--

Jason Antman | Systems Engineer | CMGdigital
jason.antman@coxinc.com | p: 678-645-4155


Re: File system level backup of shut down standby does not work?

From
Jürgen Fuchsberger
Date:

On 02/19/2014 02:14 AM, Antman, Jason (CMG-Atlanta) wrote:
> Juergen,
>
> I've seen this quite a lot in the past, as we do this multiple times a day.
>
> Here's the procedure we use to prevent it:
> 1) read the PID from postmaster.pid in the data directory
> 2) Issue "service postgresql-9.0 stop" (this does a fast shutdown with
> -t 600)
> 3) loop until the PID is no longer running, or a timeout is exceeded (in
> which case we error out)
> 4) the IMPORTANT part: `pg_controldata /path/to/data | grep "Database
> cluster state: *shut down"` - if pg_controldata output doesn't include
> "shut down" or "shut down in recovery", then something's amiss and the
> backup won't be clean (error in shutdown, etc.)
> 5) `sync`
> 6) now take the backup
>
> -Jason
Jason,

Thanks a lot! That's exactly what I need.

Juergen.

>
> On 02/17/2014 08:32 AM, Jürgen Fuchsberger wrote:
>> Hi all,
>>
>> I have a master-slave configuration running the master with WAL
>> archiving enabled and the slave in recovery mode reading back the WAL
>> files from the master ("Log-shipping standby" as described in
>> http://www.postgresql.org/docs/9.1/static/warm-standby.html)
>>
>> I take frequent backups of the standby server:
>>
>> 1) Stop standby server (fast shutdown).
>> 2) Rsync to another fileserver
>> 3) Start standby server.
>>
>> I just tried to recover one of these backups which *failed* with the
>> following errors:
>>
>> 2014-02-17 14:27:28 CET LOG:  incomplete startup packet
>> 2014-02-17 14:27:28 CET LOG:  database system was shut down in recovery
>> at 2013-12-25 18:00:03 CET
>> 2014-02-17 14:27:28 CET LOG:  could not open file
>> "pg_xlog/00000001000001E300000061" (log file 483, segment 97): No such
>> file or directory
>> 2014-02-17 14:27:28 CET LOG:  invalid primary checkpoint record
>> 2014-02-17 14:27:28 CET LOG:  could not open file
>> "pg_xlog/00000001000001E300000060" (log file 483, segment 96): No such
>> file or directory
>> 2014-02-17 14:27:28 CET LOG:  invalid secondary checkpoint record
>> 2014-02-17 14:27:28 CET PANIC:  could not locate a valid checkpoint record
>> 2014-02-17 14:27:29 CET FATAL:  the database system is starting up
>> 2014-02-17 14:27:29 CET FATAL:  the database system is starting up
>> 2014-02-17 14:27:30 CET FATAL:  the database system is starting up
>> 2014-02-17 14:27:30 CET FATAL:  the database system is starting up
>> 2014-02-17 14:27:31 CET FATAL:  the database system is starting up
>> 2014-02-17 14:27:31 CET FATAL:  the database system is starting up
>> 2014-02-17 14:27:32 CET FATAL:  the database system is starting up
>> 2014-02-17 14:27:33 CET FATAL:  the database system is starting up
>> 2014-02-17 14:27:33 CET FATAL:  the database system is starting up
>> 2014-02-17 14:27:33 CET LOG:  startup process (PID 26186) was terminated
>> by signal 6: Aborted
>> 2014-02-17 14:27:33 CET LOG:  aborting startup due to startup process
>> failure
>>
>>
>> So it seems the server is missing some WAL files which are not
>> in the backup? Or is it simply not possible to take a backup of a
>> standby server in recovery?
>>
>> Best,
>> Juergen
>>


Attachment

Re: File system level backup of shut down standby does not work?

From
Jürgen Fuchsberger
Date:
All,

One very important thing I just noted when shutting down and restarting
my standby server:

My standby server *always needs the last WAL-file* from the archive
directory, even when the shut down was "smart". Without this the
consistent recovery state will not be reached.

2014-02-19 11:10:20 CET LOG:  received smart shutdown request
2014-02-19 11:10:20 CET LOG:  shutting down
2014-02-19 11:10:20 CET LOG:  database system is shut down
2014-02-19 11:11:00 CET LOG:  database system was shut down in recovery
at 2014-
02-19 11:10:20 CET
2014-02-19 11:11:00 CET LOG:  entering standby mode
2014-02-19 11:11:00 CET LOG:  incomplete startup packet
2014-02-19 11:11:01 CET FATAL:  the database system is starting up

*2014-02-19 11:11:01 CET LOG:  restored log file*
*"00000001000002DE000000BF" from archive*

2014-02-19 11:11:01 CET LOG:  redo starts at 2DE/BF036FA4
2014-02-19 11:11:01 CET FATAL:  the database system is starting up
2014-02-19 11:11:01 CET LOG:  consistent recovery state reached at
2DE/BFFFE53C
2014-02-19 11:11:01 CET LOG:  database system is ready to accept read
only connections

So my question is, could there be something wrong with my configuration
or is this normal?

Juergen

On 02/19/2014 02:14 AM, Antman, Jason (CMG-Atlanta) wrote:
> Juergen,
>
> I've seen this quite a lot in the past, as we do this multiple times a day.
>
> Here's the procedure we use to prevent it:
> 1) read the PID from postmaster.pid in the data directory
> 2) Issue "service postgresql-9.0 stop" (this does a fast shutdown with
> -t 600)
> 3) loop until the PID is no longer running, or a timeout is exceeded (in
> which case we error out)
> 4) the IMPORTANT part: `pg_controldata /path/to/data | grep "Database
> cluster state: *shut down"` - if pg_controldata output doesn't include
> "shut down" or "shut down in recovery", then something's amiss and the
> backup won't be clean (error in shutdown, etc.)
> 5) `sync`
> 6) now take the backup
>
> -Jason
>
> On 02/17/2014 08:32 AM, Jürgen Fuchsberger wrote:
>> Hi all,
>>
>> I have a master-slave configuration running the master with WAL
>> archiving enabled and the slave in recovery mode reading back the WAL
>> files from the master ("Log-shipping standby" as described in
>> http://www.postgresql.org/docs/9.1/static/warm-standby.html)
>>
>> I take frequent backups of the standby server:
>>
>> 1) Stop standby server (fast shutdown).
>> 2) Rsync to another fileserver
>> 3) Start standby server.
>>
>> I just tried to recover one of these backups which *failed* with the
>> following errors:
>>
>> 2014-02-17 14:27:28 CET LOG:  incomplete startup packet
>> 2014-02-17 14:27:28 CET LOG:  database system was shut down in recovery
>> at 2013-12-25 18:00:03 CET
>> 2014-02-17 14:27:28 CET LOG:  could not open file
>> "pg_xlog/00000001000001E300000061" (log file 483, segment 97): No such
>> file or directory
>> 2014-02-17 14:27:28 CET LOG:  invalid primary checkpoint record
>> 2014-02-17 14:27:28 CET LOG:  could not open file
>> "pg_xlog/00000001000001E300000060" (log file 483, segment 96): No such
>> file or directory
>> 2014-02-17 14:27:28 CET LOG:  invalid secondary checkpoint record
>> 2014-02-17 14:27:28 CET PANIC:  could not locate a valid checkpoint record
>> 2014-02-17 14:27:29 CET FATAL:  the database system is starting up
>> 2014-02-17 14:27:29 CET FATAL:  the database system is starting up
>> 2014-02-17 14:27:30 CET FATAL:  the database system is starting up
>> 2014-02-17 14:27:30 CET FATAL:  the database system is starting up
>> 2014-02-17 14:27:31 CET FATAL:  the database system is starting up
>> 2014-02-17 14:27:31 CET FATAL:  the database system is starting up
>> 2014-02-17 14:27:32 CET FATAL:  the database system is starting up
>> 2014-02-17 14:27:33 CET FATAL:  the database system is starting up
>> 2014-02-17 14:27:33 CET FATAL:  the database system is starting up
>> 2014-02-17 14:27:33 CET LOG:  startup process (PID 26186) was terminated
>> by signal 6: Aborted
>> 2014-02-17 14:27:33 CET LOG:  aborting startup due to startup process
>> failure
>>
>>
>> So it seems the server is missing some WAL files which are not
>> in the backup? Or is it simply not possible to take a backup of a
>> standby server in recovery?
>>
>> Best,
>> Juergen
>>
>>
>>
>
>

--
| Jürgen Fuchsberger, M.Sc.
| Wegener Center for Climate and Global Change
| University of Graz
| Brandhofgasse 5, A-8010 Graz, Austria
| phone: +43-316-380-8438
|   web: www.wegcenter.at/wegenernet
|        www.wegenernet.org


Attachment

Re: File system level backup of shut down standby does not work?

From
Albe Laurenz
Date:
Jürgen Fuchsberger wrote:
> One very important thing I just noted when shutting down and restarting
> my standby server:
> 
> My standby server *always needs the last WAL-file* from the archive
> directory, even when the shut down was "smart". Without this the
> consistent recovery state will not be reached.
> 
> 2014-02-19 11:10:20 CET LOG:  received smart shutdown request
> 2014-02-19 11:10:20 CET LOG:  shutting down
> 2014-02-19 11:10:20 CET LOG:  database system is shut down
> 2014-02-19 11:11:00 CET LOG:  database system was shut down in recovery
> at 2014-
> 02-19 11:10:20 CET
> 2014-02-19 11:11:00 CET LOG:  entering standby mode
> 2014-02-19 11:11:00 CET LOG:  incomplete startup packet
> 2014-02-19 11:11:01 CET FATAL:  the database system is starting up
> 
> *2014-02-19 11:11:01 CET LOG:  restored log file*
> *"00000001000002DE000000BF" from archive*
> 
> 2014-02-19 11:11:01 CET LOG:  redo starts at 2DE/BF036FA4
> 2014-02-19 11:11:01 CET FATAL:  the database system is starting up
> 2014-02-19 11:11:01 CET LOG:  consistent recovery state reached at
> 2DE/BFFFE53C
> 2014-02-19 11:11:01 CET LOG:  database system is ready to accept read
> only connections
> 
> So my question is, could there be something wrong with my configuration
> or is this normal?

As far as I can tell, that is as expected:

Upon restart, the recovery process will continue from the last restart point.
So if the restart point happened after replay of WAL record X in WAL
file Y, the recovery process will need WAL file Y again to replay
X+1 and the following records.

Yours,
Laurenz Albe

Re: File system level backup of shut down standby does not work?

From
"Antman, Jason (CMG-Atlanta)"
Date:
That is correct. When the source server is shut down, it may or may not
be caught up for replication (most likely not). You'll need all of the
WAL files that haven't been applied yet; if for some reason you've got a
large amount of replication lag, you might need more than just the first.

I'm not sure the best way to achieve this - we write WALs to a shared
NFS mount, one directory per slave, so that gets mounted on the
destination/standby and it has access. I'm not sure if this is practical
without specialized hardware - our NFS is served from a NetApp storage
appliance, so we have microsecond write latency.

If you want to have a "standalone" backup that doesn't need WAL files,
it's a bit more complex. This is the best that I can figure out from our
current setup. Please be cautious with it, I'm not 100% positive this
includes everything, so I'd only try it on an isolated server that can
risk having problems. This is also a bit of a hack...

Assuming PGDATA is /var/lib/pgsql/9.0/data, as the postgres user:
1) `touch /var/lib/pgsql/9.0/data/stop.replication`
2) make sure your recovery.conf includes:
trigger_file='/var/lib/pgsql/9.0/data/stop.replication'
recovery_end_command='/var/lib/pgsql/9.0/data/stop.replication'
standby_mode='on'
3) Start the database

When recovery finishes, it will attempt to execute recovery_end_command.
Since this isn't actually executable, it raises an error, and stops
postgres. This should result in a stopped database instance, in a
consistent recovery state with all WAL files applied. You can confirm
this by checking that `pg_controldata /var/lib/pgsql/9.0/data` output
includes "Database cluster state: *shut down".

Be aware that we use an intermediate host for this. i.e. we stop a
slave, copy PGDATA *and* ALL WAL files over to another machine, start
the slave back up, and then do the recovery to a consistent state there,
to end up with a standalone PGDATA directory that can be copied wherever
we need it without any WAL files.

-Jason


On 02/19/2014 05:18 AM, Jürgen Fuchsberger wrote:
> All,
>
> One very important thing I just noted when shutting down and restarting
> my standby server:
>
> My standby server *always needs the last WAL-file* from the archive
> directory, even when the shut down was "smart". Without this the
> consistent recovery state will not be reached.
>
> 2014-02-19 11:10:20 CET LOG:  received smart shutdown request
> 2014-02-19 11:10:20 CET LOG:  shutting down
> 2014-02-19 11:10:20 CET LOG:  database system is shut down
> 2014-02-19 11:11:00 CET LOG:  database system was shut down in recovery
> at 2014-
> 02-19 11:10:20 CET
> 2014-02-19 11:11:00 CET LOG:  entering standby mode
> 2014-02-19 11:11:00 CET LOG:  incomplete startup packet
> 2014-02-19 11:11:01 CET FATAL:  the database system is starting up
>
> *2014-02-19 11:11:01 CET LOG:  restored log file*
> *"00000001000002DE000000BF" from archive*
>
> 2014-02-19 11:11:01 CET LOG:  redo starts at 2DE/BF036FA4
> 2014-02-19 11:11:01 CET FATAL:  the database system is starting up
> 2014-02-19 11:11:01 CET LOG:  consistent recovery state reached at
> 2DE/BFFFE53C
> 2014-02-19 11:11:01 CET LOG:  database system is ready to accept read
> only connections
>
> So my question is, could there be something wrong with my configuration
> or is this normal?
>
> Juergen
>
> On 02/19/2014 02:14 AM, Antman, Jason (CMG-Atlanta) wrote:
>> Juergen,
>>
>> I've seen this quite a lot in the past, as we do this multiple times a day.
>>
>> Here's the procedure we use to prevent it:
>> 1) read the PID from postmaster.pid in the data directory
>> 2) Issue "service postgresql-9.0 stop" (this does a fast shutdown with
>> -t 600)
>> 3) loop until the PID is no longer running, or a timeout is exceeded (in
>> which case we error out)
>> 4) the IMPORTANT part: `pg_controldata /path/to/data | grep "Database
>> cluster state: *shut down"` - if pg_controldata output doesn't include
>> "shut down" or "shut down in recovery", then something's amiss and the
>> backup won't be clean (error in shutdown, etc.)
>> 5) `sync`
>> 6) now take the backup
>>
>> -Jason
>>
>> On 02/17/2014 08:32 AM, Jürgen Fuchsberger wrote:
>>> Hi all,
>>>
>>> I have a master-slave configuration running the master with WAL
>>> archiving enabled and the slave in recovery mode reading back the WAL
>>> files from the master ("Log-shipping standby" as described in
>>> http://www.postgresql.org/docs/9.1/static/warm-standby.html)
>>>
>>> I take frequent backups of the standby server:
>>>
>>> 1) Stop standby server (fast shutdown).
>>> 2) Rsync to another fileserver
>>> 3) Start standby server.
>>>
>>> I just tried to recover one of these backups which *failed* with the
>>> following errors:
>>>
>>> 2014-02-17 14:27:28 CET LOG:  incomplete startup packet
>>> 2014-02-17 14:27:28 CET LOG:  database system was shut down in recovery
>>> at 2013-12-25 18:00:03 CET
>>> 2014-02-17 14:27:28 CET LOG:  could not open file
>>> "pg_xlog/00000001000001E300000061" (log file 483, segment 97): No such
>>> file or directory
>>> 2014-02-17 14:27:28 CET LOG:  invalid primary checkpoint record
>>> 2014-02-17 14:27:28 CET LOG:  could not open file
>>> "pg_xlog/00000001000001E300000060" (log file 483, segment 96): No such
>>> file or directory
>>> 2014-02-17 14:27:28 CET LOG:  invalid secondary checkpoint record
>>> 2014-02-17 14:27:28 CET PANIC:  could not locate a valid checkpoint record
>>> 2014-02-17 14:27:29 CET FATAL:  the database system is starting up
>>> 2014-02-17 14:27:29 CET FATAL:  the database system is starting up
>>> 2014-02-17 14:27:30 CET FATAL:  the database system is starting up
>>> 2014-02-17 14:27:30 CET FATAL:  the database system is starting up
>>> 2014-02-17 14:27:31 CET FATAL:  the database system is starting up
>>> 2014-02-17 14:27:31 CET FATAL:  the database system is starting up
>>> 2014-02-17 14:27:32 CET FATAL:  the database system is starting up
>>> 2014-02-17 14:27:33 CET FATAL:  the database system is starting up
>>> 2014-02-17 14:27:33 CET FATAL:  the database system is starting up
>>> 2014-02-17 14:27:33 CET LOG:  startup process (PID 26186) was terminated
>>> by signal 6: Aborted
>>> 2014-02-17 14:27:33 CET LOG:  aborting startup due to startup process
>>> failure
>>>
>>>
>>> So it seems the server is missing some WAL files which are not
>>> in the backup? Or is it simply not possible to take a backup of a
>>> standby server in recovery?
>>>
>>> Best,
>>> Juergen
>>>
>>>
>>>
>>


--

Jason Antman | Systems Engineer | CMGdigital
jason.antman@coxinc.com | p: 678-645-4155


Re: File system level backup of shut down standby does not work?

From
Jürgen Fuchsberger
Date:

On 02/19/2014 11:43 AM, Albe Laurenz wrote:
> Jürgen Fuchsberger wrote:
>> One very important thing I just noted when shutting down and restarting
>> my standby server:
>>
>> My standby server *always needs the last WAL-file* from the archive
>> directory, even when the shut down was "smart". Without this the
>> consistent recovery state will not be reached.
>>
>> 2014-02-19 11:10:20 CET LOG:  received smart shutdown request
>> 2014-02-19 11:10:20 CET LOG:  shutting down
>> 2014-02-19 11:10:20 CET LOG:  database system is shut down
>> 2014-02-19 11:11:00 CET LOG:  database system was shut down in recovery
>> at 2014-
>> 02-19 11:10:20 CET
>> 2014-02-19 11:11:00 CET LOG:  entering standby mode
>> 2014-02-19 11:11:00 CET LOG:  incomplete startup packet
>> 2014-02-19 11:11:01 CET FATAL:  the database system is starting up
>>
>> *2014-02-19 11:11:01 CET LOG:  restored log file*
>> *"00000001000002DE000000BF" from archive*
>>
>> 2014-02-19 11:11:01 CET LOG:  redo starts at 2DE/BF036FA4
>> 2014-02-19 11:11:01 CET FATAL:  the database system is starting up
>> 2014-02-19 11:11:01 CET LOG:  consistent recovery state reached at
>> 2DE/BFFFE53C
>> 2014-02-19 11:11:01 CET LOG:  database system is ready to accept read
>> only connections
>>
>> So my question is, could there be something wrong with my configuration
>> or is this normal?
>
> As far as I can tell, that is as expected:
>
> Upon restart, the recovery process will continue from the last restart point.
> So if the restart point happened after replay of WAL record X in WAL
> file Y, the recovery process will need WAL file Y again to replay
> X+1 and the following records.
>
> Yours,
> Laurenz Albe
>
OK, so it is important to back up the WAL archive dir too to have a
consistent backup. This was not that clear to me and
https://wiki.postgresql.org/wiki/Incrementally_Updated_Backups
does not mention the WAL archive dir neither. I might update the Wiki entry.

Best,
Juergen



Attachment