On Thu, May 24, 2012 at 10:13 AM, Joachim Wieland <joe@mcknight.de> wrote:
> On Tue, May 22, 2012 at 9:50 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Hmm. I think that if you do it this way, the minimum recovery point
>> won't be respected, which could leave you with a corrupted database.
>> Now, if all the WAL files that you need are present in pg_xlog anyway,
>> then they ought to get replayed anyway, but I think that if you are
>> using restore_command (as opposed to streaming replication) we restore
>> WAL segments under a different file name, which might cause this
>> problem.
>
> Uhm, maybe I should add some more details, so you get a better idea of
> what I did: The idea was to promote the standby to be the new master.
> Streaming replication was active, but at some point I had to take the
> master down. IIRC, I saw from the log that after the master went down,
> the standby continued recovering from a bunch of archived log files
> (via restore_command). I suspected that either the standby was
> lagging behind a bit or that the master archived them during shutdown.
> When the standby didn't have anything else left to recover from
> (saying both "xlog file foo doesn't exist" and "cannot connect to
> master"), I deleted recovery.conf on the standby and restarted it.
>
> I wouldn't have assumed any corruption was possible given that I did
> clean shutdowns on both sides...

The thing that's worrying me is that there's not really any such thing
as a "clean" shutdown on a standby. When you shut down the master, it
checkpoints. When you shut down the standby, it can't checkpoint, so
I think it's still going to enter recovery at startup. It'd be
interesting to know where that recovery began and ended as compared
with the minimum recovery point just before the shutdown.
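
To make that comparison concrete, here is a rough Python sketch. It assumes you have saved the output of pg_controldata from just before the shutdown (it prints a "Minimum recovery ending location" line) and that you have the location from the "redo done at" line in the standby's log. The function names and the sample values are made up for illustration; this is not anything the server does itself.

```python
import re

def parse_lsn(text):
    """Turn an 'X/Y' WAL location (two hex fields) into a comparable tuple."""
    hi, lo = text.strip().split("/")
    return (int(hi, 16), int(lo, 16))

def min_recovery_point(controldata_output):
    """Extract the minimum recovery point from saved pg_controldata output."""
    m = re.search(
        r"^Minimum recovery ending location:\s*([0-9A-Fa-f]+/[0-9A-Fa-f]+)",
        controldata_output,
        re.MULTILINE,
    )
    if m is None:
        raise ValueError("no minimum recovery point found in the given text")
    return parse_lsn(m.group(1))

def recovery_reached_min_point(redo_done_at, controldata_output):
    """True if replay ended at or beyond the recorded minimum recovery point."""
    return parse_lsn(redo_done_at) >= min_recovery_point(controldata_output)

if __name__ == "__main__":
    # Hypothetical values, not taken from any real cluster.
    saved = "Minimum recovery ending location:     0/3000000\n"
    print(recovery_reached_min_point("0/3000020", saved))
```

If that comes back false for the restart that followed removing recovery.conf, then replay stopped short of the minimum recovery point, which is the case I'm worried about.
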
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company