Thread: Prevent new timeline during archive recovery

Prevent new timeline during archive recovery

From

"Jonatan Evald Buus"

Date:

17 December 2008, 13:25:11

Greetings,
We're currently developing an alternative to pg_standby, internally named pg_ha.
pg_ha uses Record Based Log Shipping fetched by the warm standby server at millisecond intervals and will automatically restore completed WAL files on the standby server to minimize the time required when restoring it to production status after a failure on the primary server.
Once completed we'd be happy to donate the code to the PostGreSQL community to assist in promoting the database server as a High Availability option.

During each restore cycle however we notice that PostGreSQL spawns a new timeline for the recovered archive as described in section 24.3.4 at http://www.postgresql.org/docs/8.3/static/continuous-archiving.html#BACKUP-BASE-BACKUP.
Is it possible to specify that no new timeline should be spawned during the archiver recovery process?

The only workaround otherwise appears to be to detect the current timeline number using pg_controldata and rename copied WAL files accordingly prior to initiating the archive recovery process.

Any input would be most appreciated.

Cheers
Jona

Re: Prevent new timeline during archive recovery

From

Simon Riggs

Date:

17 December 2008, 19:04:25

On Wed, 2008-12-17 at 17:54 +0100, Jonatan Evald Buus wrote:
> Greetings,
> We're currently developing an alternative to pg_standby, internally
> named pg_ha.
> pg_ha uses Record Based Log Shipping fetched by the warm standby
> server at millisecond intervals and will automatically restore
> completed WAL files on the standby server to minimize the time
> required when restoring it to production status after a failure on the
> primary server.
> Once completed we'd be happy to donate the code to the PostGreSQL
> community to assist in promoting the database server as a High
> Availability option.

We're reworking replication now for 8.4 and it sounds like we're already
some way ahead of you on that development.

Have a look at Skytools' walmgr if you want a stream using 8.3. It uses
production APIs built into the server in 8.2.

> During each restore cycle however we notice that PostGreSQL spawns a
> new timeline for the recovered archive as described in section 24.3.4
> at
> http://www.postgresql.org/docs/8.3/static/continuous-archiving.html#BACKUP-BASE-BACKUP.
> Is it possible to specify that no new timeline should be spawned
> during the archiver recovery process?
>
> The only workaround otherwise appears to be to detect the current
> timeline number using pg_controldata and rename copied WAL files
> accordingly prior to initiating the archive recovery process.

That sounds like a bad idea, for minor but annoying technical reasons.
Please check the archives if you're interested further.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

Re: Prevent new timeline during archive recovery

From

"Jonatan Evald Buus"

Date:

18 December 2008, 05:18:18

Cheers for the input Simon, please find a couple of thoughts / questions below

/Jona

On Thu, Dec 18, 2008 at 12:04 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

On Wed, 2008-12-17 at 17:54 +0100, Jonatan Evald Buus wrote:
> Greetings,
> We're currently developing an alternative to pg_standby, internally
> named pg_ha.
> pg_ha uses Record Based Log Shipping fetched by the warm standby
> server at millisecond intervals and will automatically restore
> completed WAL files on the standby server to minimize the time
> required when restoring it to production status after a failure on the
> primary server.
> Once completed we'd be happy to donate the code to the PostGreSQL
> community to assist in promoting the database server as a High
> Availability option.

We're reworking replication now for 8.4 and it sounds like we're already
some way ahead of you on that development.

We're quite looking forward to this work being completed, a simple replication option that would transfer changes from a master node to one or more slave nodes every couple of seconds would go a long way to solve most HA scenarios.
I suspect we'll abandon our own solution and use the built-in replication features of 8.4 once it's released.

Have a look at Skytools' walmgr if you want a stream using 8.3. It uses
production APIs built into the server in 8.2.

We did have a look at walmgr originally but it didn't seem to meet our needs as it doesn't appear to be doing automatic restore of completed WAL files on a continual basis. I wager that replaying a year's worth of WAL files from a busy server would take quite a while thus decreasing the value of having a Warm Standby Server.
It does however have a restore option but upon a closer look of that specific part of the code it doesn't appear to address the automatically increasing timeline during archive recovery.
Essentially it takes the same approach to restoring WAL archives as we have opted for:
1) Create recovery.conf file in PGDATA, specifying restore_command = 'cp [ARCHIVE DIR] /%f %p' (walmgr specifies itself as the restore command but essentially simply does the copying)
2) Start the Postmaster process using pc_ctl start

The recovery.conf file used by walmgr is as follows:
restore_command = '%s %s %s'\n" % (self.script, cf_file, 'xrestore %f "%p"')
# recovery_target_time=''
# recovery_target_xid=''
# recovery_target_inclusive=true
# recovery_target_timeline=''

In comparison we specify:
restore_command = 'cp [ARCHIVE DIR]/%f %p'
recovery_target_inclusive = 'true'

Unless I'm missing something, none of of these recovery.conf files will address the incrementing timeline during archive recovery?

> During each restore cycle however we notice that PostGreSQL spawns a
> new timeline for the recovered archive as described in section 24.3.4
> at
> http://www.postgresql.org/docs/8.3/static/continuous-archiving.html#BACKUP-BASE-BACKUP.
> Is it possible to specify that no new timeline should be spawned
> during the archiver recovery process?
>
> The only workaround otherwise appears to be to detect the current
> timeline number using pg_controldata and rename copied WAL files
> accordingly prior to initiating the archive recovery process.

That sounds like a bad idea, for minor but annoying technical reasons.
Please check the archives if you're interested further.

Agreed, it's a pretty appauling approach but it does appear to work.
An alternative would be to call "pg_resetxlog" after each archive restore using the -l switch to reset the timeline.
i.e. pg_resetxlog -l 0x01,0x[CURRENT FILE ID],0x[NEXT LOG SEGMENT] [DATA DIR]
This also seems like a bad idea which could have unforseen consequences due to the internal cleanups performed.

Doing automatic restore of completed WAL files on the standby server would greatly increase the timeline id as an archive restore might be executed every couple of minutes depending on the load on the primary server.
Thus the best approach by far would be an optiong to specifying that the timeline shouldn't be changed during archive recovery.
I haven't been able to find anything in relation to this topic in the archives, seems that the timeline feature is discussed very seldom but I might simply be searching for the wrong keywords.
If you could point me in the right direction I'd greatly appreciate it.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support