Re: WARNINGs after starting backup server created with PITR - Mailing list pgsql-general
From | Erik Jones |
---|---|
Subject | Re: WARNINGs after starting backup server created with PITR |
Date | |
Msg-id | 8175BC3D-D9FD-4DA5-87B7-CC9A3E5E0005@myemma.com Whole thread Raw |
In response to | Re: WARNINGs after starting backup server created with PITR (Simon Riggs <simon@2ndquadrant.com>) |
Responses |
Re: WARNINGs after starting backup server created with
PITR
Re: WARNINGs after starting backup server created with PITR |
List | pgsql-general |
On Jan 19, 2008, at 1:24 PM, Simon Riggs wrote: > So we definitely have missing data. I think the multi-phase rsync is > definitely suspect and should be avoided until we get to the bottom of > this. By this, do you mean the way we ran the rsync multiple times before performing the actual base backup? Easy enough. > Can you examine the server logs for that 11 hour period and see if > there > are any other messages that might be relevant on both primary and > standby servers? On the primary, all relevant log entries are just entries for successfully logged WALs. > Can you save the WAL files covering that period also. We may want to > inspect them later to confirm whether the data was actually in them or > not. Can you save the very first WAL files after the recovery started? > We can use those to examine the first data blocks touched, which would > help confirm when the gun was fired. Sure, I still have all of the WALs that were shipped from just before the beginning of the base back up until about 8 or 9 hours after the missing WAL file. I'm not sure if I'll be able to get to it today as I've got lots of household chores to get to, but either tomorrow of first thing Monday I'll zip them and post them here. > Has rsync produced any messages of note? Might your script be ignoring > errors? What versions of rsync are you using? Version 2.6.9 on all three servers (1 primary, 2 standbys). Here's the python script I was using to ship to both servers, right now I'm back to a direct rsync call for my archive_command to sb1. What's really weird, is that for the two WALs that disappeared, or didn't make it, even though the primary's log said they were successfully archived, they weren't on either server. ##### #!/usr/bin/env python from commands import getstatusoutput import sys source_path = sys.argv[1] file_name = sys.argv[2] # db2 res, out = getstatusoutput("ssh db2 'test -f /mnt/xfer/wal_archive/% s'" % file_name) if res != 0: res, out = getstatusoutput("rsync -a %s db2:/mnt/xfer/wal_archive/" % source_path) if res != 0: sys.exit(res) # sb1 res, out = getstatusoutput("test -f /mnt/xfer/wal_archive/%s" % file_name) if res != 0: res, out = getstatusoutput("rsync -a %s /mnt/xfer/wal_archive", source_path) sys.exit(res) ##### > Is the History table Insert only? We might use that fact to examine > the > LSNs of the equivalent blocks on the Primary. If the LSNs are prior to > the start of the recovery, as noted in the backup label file of the > original base backup, then we can confirm rsync or postgres as the > cause. Do you still have the base backup label file or base backup? I still have the backup label but not the base backup. The history tables are about 90-95% insert only, on average, but they do see the occasional update if a message's delivery status is a bounce. If you think this would still be an option, I'd be happy to follow whatever instructions, or supply any files, needed. Erik Jones DBA | Emma® erik@myemma.com 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. Visit us online at http://www.myemma.com
pgsql-general by date: