Re: streaming replication, "frozen snapshot backup on it" and missing relfile (postgres 9.2.3 on xfs + LVM) - Mailing list pgsql-hackers

From Benedikt Grundmann
Subject Re: streaming replication, "frozen snapshot backup on it" and missing relfile (postgres 9.2.3 on xfs + LVM)
Date
Msg-id CADbMkNMurWJMUmXAKqtFi1p40=G0ncVvOKXjnixhX5Bjb4-8BQ@mail.gmail.com
In response to Re: streaming replication, "frozen snapshot backup on it" and missing relfile (postgres 9.2.3 on xfs + LVM)  (David Powers <dpowers@janestreet.com>)
Responses Re: streaming replication, "frozen snapshot backup on it" and missing relfile (postgres 9.2.3 on xfs + LVM)  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
We are now seeing these errors on a regular basis on the testing box.  We have even changed the backup script to
shut down the hot standby, take an LVM snapshot, restart the hot standby, and then rsync the LVM snapshot.  It still happens.
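
For readers following along, the sequence described above (stop the standby, snapshot, restart, rsync) could be sketched roughly as below. This is an illustration only, not the script used here: the data directory, volume names, snapshot size, mount point, and rsync destination are all hypothetical placeholders, and the `nouuid` mount option is an assumption based on the filesystem being XFS.

```python
import shlex
from typing import List


def backup_commands(
    pgdata: str = "/var/lib/pgsql/9.2/data",  # hypothetical standby data directory
    origin_lv: str = "/dev/vg0/pgdata",       # hypothetical LVM volume holding pgdata
    snap_name: str = "pgsnap",                # hypothetical snapshot name
    snap_size: str = "10G",                   # hypothetical copy-on-write space
    mount_point: str = "/mnt/pgsnap",         # hypothetical mount point
    rsync_dest: str = "backuphost:/backups/pg/",  # hypothetical destination
) -> List[List[str]]:
    """Return the commands for a 'cold' snapshot backup of a hot standby, in order."""
    volume_group = origin_lv.rsplit("/", 1)[0]
    snap_dev = f"{volume_group}/{snap_name}"
    return [
        # 1. Shut the standby down cleanly so the data directory is quiescent.
        ["pg_ctl", "stop", "-D", pgdata, "-m", "fast"],
        # 2. Take the LVM snapshot while the server is down.
        ["lvcreate", "--snapshot", "--size", snap_size, "--name", snap_name, origin_lv],
        # 3. Restart the standby and wait until startup completes.
        ["pg_ctl", "start", "-D", pgdata, "-w"],
        # 4. Mount the snapshot read-only; XFS needs nouuid because the
        #    snapshot carries the same filesystem UUID as the origin.
        ["mount", "-o", "ro,nouuid", snap_dev, mount_point],
        # 5. Copy the frozen snapshot to the backup location.
        ["rsync", "-a", mount_point + "/", rsync_dest],
        # 6. Clean up the snapshot.
        ["umount", mount_point],
        ["lvremove", "-f", snap_dev],
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in backup_commands():
        print(shlex.join(cmd))
```

The function only builds and prints the commands, so it is safe to run anywhere; executing them would need root privileges and real volume names.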

We never saw this before we introduced the hot standby, so we will now revert to taking the backups from LVM snapshots on the production database.  If you have ideas about what else we should try, or what information we can give you to debug this, let us know and we will try to do so.

Until then we will sadly operate on the assumption that the combination of hot standby and "frozen snapshot" backup of it is not production ready.

Thanks,

Bene




On Thu, May 16, 2013 at 8:10 AM, David Powers <dpowers@janestreet.com> wrote:
I'll try to get the primary upgraded over the weekend when we can afford a restart.

In the meantime I have a single test showing that a shutdown, snapshot, restart sequence produces a backup that passes the vacuum analyze test.  I'm going to run a full vacuum today.

-David


On Wed, May 15, 2013 at 3:53 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
On 15.05.2013 22:50, Benedikt Grundmann wrote:
On Wed, May 15, 2013 at 2:50 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
The subject says 9.2.3. Are you sure you're running 9.2.4 on all the servers? There was a fix to a bug related to starting a standby server from a filesystem snapshot. I don't think it was quite the case you have, but pretty close.

So this is delightfully embarrassing: I just went back to double-check, and

- primary box is 9.2.3
- standby is 9.2.4
- testing is 9.2.4

I guess that alone could possibly explain it?
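
A version comparison like the one above can be done mechanically. The following is a minimal sketch, assuming the version strings have already been collected from each box (for example with `SHOW server_version;`); the function names are hypothetical. It relies on the pre-10 numbering scheme, in which the first two components (9.2) identify the major release and the third is the minor release.

```python
from typing import Dict, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '9.2.4' into (9, 2, 4)."""
    return tuple(int(part) for part in version.split("."))


def lagging_servers(versions: Dict[str, str]) -> Dict[str, str]:
    """Return the servers running something older than the newest version seen."""
    newest = max(parse_version(v) for v in versions.values())
    return {host: v for host, v in versions.items() if parse_version(v) < newest}


def same_major_release(versions: Dict[str, str]) -> bool:
    """True if all servers share the same major release (first two components)."""
    return len({parse_version(v)[:2] for v in versions.values()}) == 1


if __name__ == "__main__":
    # Versions as reported in this message.
    reported = {"primary": "9.2.3", "standby": "9.2.4", "testing": "9.2.4"}
    print("behind the newest minor release:", lagging_servers(reported))
    print("all on the same major release:", same_major_release(reported))
```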

Hmm, no, it should still work. There haven't been any changes in the WAL format. I do recommend upgrading the primary, of course, but I don't really see how that would explain what you're seeing.

- Heikki

