Re: archive_command - Mailing list pgsql-admin

From Simon Riggs
Subject Re: archive_command
Date
Msg-id 1128420570.8603.279.camel@localhost.localdomain
Whole thread Raw
In response to Re: archive_command  (Jeff Frost <jeff@frostconsultingllc.com>)
Responses Re: archive_command
List pgsql-admin
On Mon, 2005-10-03 at 20:00 -0700, Jeff Frost wrote:
> On Sun, 2 Oct 2005, Simon Riggs wrote:
>
> > Probably the best idea is to backup the last WAL file for each timeline
> > seen. Keep track of that, so when the current file changes you'll know
> > which timeline you're on and stick to that from then on. Or more simply,
> > put some notes with your program saying "if you ever use a complex
> > recovery situation, make sure to clear all archive_status files for
> > higher timeline ids before using this program".
>
> Tell me if you think this is the most reasonable way to determine the in use
> WAL file:
>
> ls -rt $PGDATA/pg_xlog/ | grep -v "backup\|archive\|RECOVERY" | tail -1

If you trust the times your filesystem hands you. PostgreSQL doesn't use
the file times it uses the sequential number naming, so for paranoid
accuracy, this should too so perhaps take the t off.

> > This would not be necessary had I completed my logswitch patch in time
> > for 8.1 freeze, but I regret that I was unable to do that.
>
> What's the logswitch patch going to accomplish?

Nothing until I finish it. :-)

> Also, I have an interesting and unrelated question...this past weekend, we had
> a failure in which we had to restore from backup.  I have created a script
> which makes a base backup every two weeks and we keep the last two.  Also, we
> use PITR for replication, making a base backup between the primary and
> secondary server every 8 hours, and running my rsynclastlog script once/minute
> to keep as up to date as possible.

Sounds like a good setup.

> Now here's the problem...the servers
> switched roles on Sep 21.  We switched them back a little while after that.
> When I went to replay the WAL files using the Sep 15 base backup, it happily
> played back the WAL files to Sep 21 and stopped.  I presume that this means
> whenever the servers switch roles I need to create a fresh base backup?

Yes, but more generally if I lost one node I would always be inclined to
fully backup the remaining one just in case.

> Is there any possible way to replay the other WAL files after the Sep 21
> switchover/switchback?  I'm going to guess this is similar to having another
> timeline.

That may be your exact case, it depends upon how you did recovery. Both
systems think they are the same one, so you need to be careful.

Look for the last log file of the went-down node. That file should exist
twice, once as finally written by the went-down node and once as
continued to be written to by the stayed-up node after switchover. It
sounds like the wrong file was used to recover with and as a result
stopped recovery on Sep 21. But if you're running with the system now,
you'll need to do an intermediate rebuild and recover the data manually.

(I recommend testing recovery procedures before you go live and then at
least every 6 months, to ensure that they actually still work when you
need 'em. Trying to do a recovery with any confidence is not easy at 4am
on a Sunday morning under maximum stress, from experience.)

> We restored from a nightly pg_dump which we have kept doing, but that means we
> lost about 20 hrs of data. :-(

But sounds recoverable.

Best Regards, Simon Riggs


pgsql-admin by date:

Previous
From: Jeff Frost
Date:
Subject: Re: archive_command
Next
From:
Date:
Subject: Altering WAL Segment File Size