Re: streaming replication, "frozen snapshot backup on it" and missing relfile (postgres 9.2.3 on xfs + LVM) - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: streaming replication, "frozen snapshot backup on it" and missing relfile (postgres 9.2.3 on xfs + LVM)
Date
Msg-id 5193D8EF.3060000@vmware.com
In response to Re: streaming replication, "frozen snapshot backup on it" and missing relfile (postgres 9.2.3 on xfs + LVM)  (David Powers <dpowers@janestreet.com>)
Responses Re: streaming replication, "frozen snapshot backup on it" and missing relfile (postgres 9.2.3 on xfs + LVM)  (Benedikt Grundmann <bgrundmann@janestreet.com>)
List pgsql-hackers
On 15.05.2013 15:42, David Powers wrote:
> First, thanks for the replies.  This sort of thing is frustrating and hard
> to diagnose at a distance, and any help is appreciated.
>
> Here is some more background:
>
> We have 3 9.2.4 databases using the following setup:

The subject says 9.2.3. Are you sure you're running 9.2.4 on all the 
servers? There was a fix to a bug related to starting a standby server 
from a filesystem snapshot. I don't think it was quite the case you 
have, but pretty close.

> - A primary box
> - A standby box running as a hot streaming replica from the primary
> - A testing box restored nightly from a static backup
>
> As noted, the static backup is taken off of the standby by taking an LVM
> snapshot of the database filesystem and rsyncing.  I don't think it's a
> likely problem but the rsync leverages the previous backup (using
> --link-dest) to make the rsync faster and the resulting backup smaller.

I guess that should work, as long as you make a full copy when you 
restore the backup, and never start a postgres server directly against a 
backup that contains the hardlinks.

> We've been using the same system for quite some time, but previously (~ 1
> month ago) had been taking the backup off of the primary (still using the
> LVM snapshot).  The replication is a recent addition, and a very helpful
> one.  LVM snapshots aren't lightweight in the face of writes and in some
> circumstances a long running rsync would spike the IO load on the
> production box.
>
> Results of some additional tests:
>
> After the user noticed that the test restore showed the original problem we
> ran `vacuum analyze` on all three testing databases thinking that it had a
> good chance of quickly touching most of the underlying files.  That gave us
> errors on two of the testing restores similar to:
>
> ERROR:  invalid page header in block 5427 of relation base/16417/199732075

Huh, that's surprising. You have to be quite unlucky to end up with a 
corrupt page header, even if there's something wrong with the underlying 
storage or backup routines. Could you grab a copy of that block? 
Something like

dd skip=5427 bs=8192 count=1 if=base/16417/199732075 of=corrupt-block

I'm especially curious what the first 20 bytes or so look like. I'm 
guessing that it's all zeros, while some later part of the page contains 
data. I.e. the page was torn at some point in the process.
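To eyeball those bytes, something like this on the file saved by the dd
command should do (filename as above; 24 bytes is the size of the fixed
page header in 9.2: pd_lsn, pd_tli, pd_flags, pd_lower, pd_upper,
pd_special, pd_pagesize_version, pd_prune_xid):

```shell
# Hex-dump the page header region of the saved block.
od -A d -t x1 -N 24 corrupt-block

# Count non-zero bytes after the header. A non-zero count here,
# combined with an all-zero header above, would point at a torn page.
tail -c +25 corrupt-block | tr -d '\0' | wc -c
```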

> I'm also running the vacuum analyze on the production machines to double
> check that the base databases don't have a subtle corruption that simply
> hasn't been noticed.  They run with normal autovacuum settings, so I
> suspect that they are fine/this won't show anything because we should have
> seen this from the autovacuum daemon before.

Vacuum analyze doesn't always scan the whole table with default 
settings, so that's no proof that there are no missing pages. "set 
vacuum_freeze_table_age=0; vacuum <table>" will force a full scan of the 
table.

> I'm happy to share the scripts we use for the backup/restore process if the
> information above isn't enough, as well as the logs - though the postgres
> logs don't seem to contain much of interest (the database system doesn't
> really get involved).

Yeah, please share the scripts. What you've described seems correct to 
me, but maybe there's something I'm missing.

> I also have the rsyncs of the failed snapshots available and could restore
> them for testing purposes.  It's also easy to look in them (they are just
> saved as normal directories on a big SAN) if I know what to look for.

Great. Not sure what to look at right now, but good that you have kept 
the evidence.

- Heikki


