Thread: standby with a fixed lag behind the master

standby with a fixed lag behind the master

From
Alexey Klyukin
Date:
Hello,

I've recently come across the task of setting up a PostgreSQL 9.1 standby server that is N hours behind the master,
i.e.only transactions that finished N hours in the past or older should be replayed on a standby. The goal is to have a
known good state server to perform backups from and possibly revert to in case of unwanted changes on primary. It seems
thatthere is no mechanism in PostgreSQL to just ask the standby to keep a fixed distance (in terms of either WAL
segmentsor time) between the primary, so these are possible solutions: 

1. Use restore command on standby to fetch the current WAL segment only if it has been created not less than N hours in
thepast (according to ctime). 
2. Pause the restore process on standby if the lag * is less than N hours (with pg_xlog_replay_pause()) and resume if
itis more than that. 
3. Set recovery_target_time to current - 6 hours and pause_at_recovery_target to true, periodically check whether the
recoveryis paused, reset the recovery target time to a new value (and restart the standby) if it is. 

* - the lag would be calculated as now() - pg_last_xact_replay_timestamp() on standby.

Both 2 and 3 requires external cron job to pause/resume the recovery, and 1, while being the easiest of all, doesn't
workwith SR (unless it's combined with WAL shipping). I wonder if there are other well established approaches at
solvingthis problem and if there is an interest for adding such feature to the -core? 

Thank you,
--
Alexey Klyukin        http://www.commandprompt.com
The PostgreSQL Company – Command Prompt, Inc.





Re: standby with a fixed lag behind the master

From
Jerry Sievers
Date:
Alexey Klyukin <alexk@commandprompt.com> writes:

> Hello,
>
> I've recently come across the task of setting up a PostgreSQL 9.1
> standby server that is N hours behind the master, i.e. only
> transactions that finished N hours in the past or older should be
> replayed on a standby. The goal is to have a known good state server
> to perform backups from and possibly revert to in case of unwanted
> changes on primary. It seems that there is no mechanism in
> PostgreSQL to just ask the standby to keep a fixed distance (in
> terms of either WAL segments or time) between the primary, so these
> are possible solutions:

A not unreasonable problem to want options for solving.

> 1. Use restore command on standby to fetch the current WAL segment
> only if it has been created not less than N hours in the past
> (according to ctime).

This is the approach I implemented a few years ago by writing a custom
version of pg_standby that implements all the standard logic plus
checking for a configurable age of WAL segment before applying same.
I chose to do the program in Python rather than trying to hack the
feature into the C language native version but of course this was due
to personal preference.

I have seen this feature requested a few times before but not sure
ever added to the community release.

> 2. Pause the restore process on standby if the lag * is less than N
> hours (with pg_xlog_replay_pause()) and resume if it is more than
> that.

> 3. Set recovery_target_time to current - 6 hours and
> pause_at_recovery_target to true, periodically check whether the
> recovery is paused, reset the recovery target time to a new value
> (and restart the standby) if it is.
>
> * - the lag would be calculated as now() - pg_last_xact_replay_timestamp() on standby.
>
> Both 2 and 3 requires external cron job to pause/resume the
> recovery, and 1, while being the easiest of all, doesn't work with
> SR (unless it's combined with WAL shipping). I wonder if there are
> other well established approaches at solving this problem and if
> there is an interest for adding such feature to the -core?

Right and I"m not sure such a strategy would play nice with
hot-standby either.

One strategy if you have enough storage, is to have multiple standby
instances perhaps all running on the same standby host.  One of them
is a streamer and the other a lagger.  You would go live on whichever
instance was necessary to recover from whatever disaster you had.

In practice, we've never had to use any of our lagged standbys for the
intended purpose, always unlagging them and bringing online  at latest
transaction.

But...  If you are ever faced with a case of like junior DBA drops
important table in production, deployment code mangles DB  or
whatever, you may be glad you had a lagged standby to migrate to.

In the case of guarding against risk of disaster from  a deployment
gone bad, and given just one or more regular standbys in the typical
near real-time mode, it's much easier to just halt one of them after
apps are down and immediatly prior to the deployment.  If things go
badly, online that standby and start running on it.

HTH

> Thank you,
> --
> Alexey Klyukin        http://www.commandprompt.com
> The PostgreSQL Company – Command Prompt, Inc.
>
>
>
>
>
> --
> Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-admin
>

--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net
p: 732.216.7255

Re: standby with a fixed lag behind the master

From
Steve Crawford
Date:
On 07/26/2012 02:24 AM, Alexey Klyukin wrote:
> Hello,
>
> I've recently come across the task of setting up a PostgreSQL 9.1 standby server that is N hours behind the master,
i.e.only transactions that finished N hours in the past or older should be replayed on a standby. The goal is to have a
known good state server to perform backups from and possibly revert to in case of unwanted changes on primary. It seems
thatthere is no mechanism in PostgreSQL to just ask the standby to keep a fixed distance (in terms of either WAL
segmentsor time) between the primary, so these are possible solutions: 
>
> 1. Use restore command on standby to fetch the current WAL segment only if it has been created not less than N hours
inthe past (according to ctime). 
> 2. Pause the restore process on standby if the lag * is less than N hours (with pg_xlog_replay_pause()) and resume if
itis more than that. 
> 3. Set recovery_target_time to current - 6 hours and pause_at_recovery_target to true, periodically check whether the
recoveryis paused, reset the recovery target time to a new value (and restart the standby) if it is. 
>
> * - the lag would be calculated as now() - pg_last_xact_replay_timestamp() on standby.
>
> Both 2 and 3 requires external cron job to pause/resume the recovery, and 1, while being the easiest of all, doesn't
workwith SR (unless it's combined with WAL shipping). I wonder if there are other well established approaches at
solvingthis problem and if there is an interest for adding such feature to the -core? 
>
This sounds like a weird way to approach this problem. If you want a
snapshot n-hours ago why not just schedule a dump or base-backup to run
at the desired times? Or continuously archive your WAL so you can
recover up to any point you want rather than just a pre-defined n-hours?

I'm currently looking at using the recently announced pgbarman to
simplify the overall process. (http://www.pgbarman.org/)

Cheers,
Steve


Re: standby with a fixed lag behind the master

From
Amador Alvarez
Date:
Hi there,

And why not shipping older WAL files to the target on a regular basis ?.
On the master you can control with a crontab job to ship the wanted WAL
files (n hours older than current time and clean the shipped up, check
rsync options up) in a regular basis.

A.A.



On 07/26/2012 02:24 AM, Alexey Klyukin wrote:
> Hello,
>
> I've recently come across the task of setting up a PostgreSQL 9.1 standby server that is N hours behind the master,
i.e.only transactions that finished N hours in the past or older should be replayed on a standby. The goal is to have a
known good state server to perform backups from and possibly revert to in case of unwanted changes on primary. It seems
thatthere is no mechanism in PostgreSQL to just ask the standby to keep a fixed distance (in terms of either WAL
segmentsor time) between the primary, so these are possible solutions: 
>
> 1. Use restore command on standby to fetch the current WAL segment only if it has been created not less than N hours
inthe past (according to ctime). 
> 2. Pause the restore process on standby if the lag * is less than N hours (with pg_xlog_replay_pause()) and resume if
itis more than that. 
> 3. Set recovery_target_time to current - 6 hours and pause_at_recovery_target to true, periodically check whether the
recoveryis paused, reset the recovery target time to a new value (and restart the standby) if it is. 
>
> * - the lag would be calculated as now() - pg_last_xact_replay_timestamp() on standby.
>
> Both 2 and 3 requires external cron job to pause/resume the recovery, and 1, while being the easiest of all, doesn't
workwith SR (unless it's combined with WAL shipping). I wonder if there are other well established approaches at
solvingthis problem and if there is an interest for adding such feature to the -core? 
>
> Thank you,
> --
> Alexey Klyukin        http://www.commandprompt.com
> The PostgreSQL Company – Command Prompt, Inc.
>
>
>
>
>