Re: archiving wal files after PITR - Mailing list pgsql-admin

From Jason Mathis
Subject Re: archiving wal files after PITR
Date
Msg-id etPan.5331ff9d.1f16e9e8.fb4f@palos
In response to archiving wal files after PITR  (Jason Mathis <jmathis@redzonesoftware.com>)
List pgsql-admin
I was really hoping someone would chime in on what's going on here. It's not too late, let me know ;)

Anyway, after some more testing today I came to the conclusion that if you simply restart the server after the recovery process completes, this problem fixes itself. I like that slightly better than deleting the .ready file. I suppose if you changed your pg_hba.conf file before the restore, you would need to restart anyway to allow connections back in. I really wish this were documented somewhere, though.

If anyone comes up with some info please contribute to this thread.

thanks!


On March 24, 2014 at 3:59:32 PM, Jason Mathis (jmathis@redzonesoftware.com) wrote:
Hi All,

This is a difficult question to frame since I am currently new to postgres and trying to implement a continuous archiving strategy at this new company. 

Postgresql 9.2.7
Centos 6.5 

I am using pg_basebackup to create a backup. I am not backing up the xlog dir. I am using a script for the archiving process which is using this command:

/usr/bin/test ! -f $PGARCHIVE_DIR/$2 && /usr/bin/sudo cp $1 $PGARCHIVE_DIR/$2
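
For readers following along, here is a minimal sketch of how a command like the one above could be wrapped in a shell function. The function name archive_wal and its three arguments are hypothetical, not taken from the actual script, and the sudo call is left out on the assumption that the postgres user can write to the archive directory. The point it illustrates is the exit status: a non-zero return tells PostgreSQL that archiving failed, so the server keeps retrying the same segment.

```shell
#!/bin/sh
# Hypothetical helper (not the original script).
# Usage: archive_wal SRC_PATH SEGMENT_NAME ARCHIVE_DIR
# A script used as archive_command could call it as:
#   archive_wal "$1" "$2" "$PGARCHIVE_DIR"
archive_wal() {
    src=$1
    name=$2
    dest_dir=$3

    # Refuse to overwrite a segment that is already in the archive.
    # Returning non-zero reports the attempt as failed.
    if [ -f "$dest_dir/$name" ]; then
        echo "archive_wal: $name already exists in $dest_dir" >&2
        return 1
    fi

    # Copy under a temporary name, then rename, so a partially
    # copied file never appears under the final segment name.
    cp "$src" "$dest_dir/$name.tmp" && mv "$dest_dir/$name.tmp" "$dest_dir/$name"
}
```

With a guard like this, a segment that was restored from the archive and then queued for archiving again will fail every time, which matches the errors described further down.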

Pretty straightforward at this point. I also have a restore script that works well: it gets the base backup and restores the needed WAL files. The restore completes successfully and starts a new timeline. All good! But……

One strange problem exists. I now start getting errors saying that the last restored WAL file cannot be archived because it is already archived, which sounds right to me; after all, postgres knows it got the file from the archive. In the last restore test I did, I restored two WAL files, but only the last one cannot be archived (it always seems to be the last one). If I look in pg_xlog/archive_status I see something like:

-rw------- 1 postgres postgres    0 Mar 24 20:52 00000006000000010000002B.done
-rw------- 1 postgres postgres    0 Mar 24 20:54 00000006000000010000002C.done
-rw------- 1 postgres postgres    0 Mar 24 20:54 00000006000000010000002C.ready
-rw------- 1 postgres postgres    0 Mar 24 20:49 00000006.history.done
-rw------- 1 postgres postgres    0 Mar 24 20:54 00000007.history.read

See it? What the heck. The 00000006* files came from the restore and are all marked *.done, except 00000006000000010000002C.*, which has two entries: one “done” and one “ready.”
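
To spot this situation without reading the listing by eye, something like the following sketch could be used. The function name list_conflicts is hypothetical; it only assumes the archive_status layout shown above, where each file is named after a segment plus a .ready or .done suffix, and it only reports names without deleting anything.

```shell
#!/bin/sh
# Hypothetical helper: print every name that has BOTH a .ready and a
# .done marker in the given archive_status directory.
# Usage: list_conflicts /path/to/pg_xlog/archive_status
list_conflicts() {
    status_dir=$1
    for ready in "$status_dir"/*.ready; do
        # If the glob matched nothing, the pattern is left as-is; skip it.
        [ -e "$ready" ] || continue
        base=${ready%.ready}
        if [ -e "$base.done" ]; then
            basename "$base"
        fi
    done
}
```

Run against a directory with the contents listed above, it should print only 00000006000000010000002C.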

It's perfectly possible I am doing something wrong. I know I could manually fix the issue by deleting the .ready file, but I need this to be right and to understand what is happening. Can anyone explain this behavior?

Thanks!

