Re: BUG #7500: hot-standby replica crash after an initial rsync - Mailing list pgsql-bugs
From | Stuart Bishop |
---|---|
Subject | Re: BUG #7500: hot-standby replica crash after an initial rsync |
Date | |
Msg-id | CADmi=6P6VT=6sW0XjW6cn35bW_uW=365TU8_Ssbd8Oepn-Cacw@mail.gmail.com |
In response to | Re: BUG #7500: hot-standby replica crash after an initial rsync (Andres Freund <andres@2ndquadrant.com>) |
List | pgsql-bugs |
On Wed, Aug 29, 2012 at 10:59 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On Wednesday, August 29, 2012 05:32:31 PM Stuart Bishop wrote:
>> I believe I just hit this same issue, but with PG 9.1.3:
>>
>> <@:32407> 2012-08-29 10:02:09 UTC LOG: shutting down
>> <@:32407> 2012-08-29 10:02:09 UTC LOG: database system is shut down
>> <[unknown]@[unknown]:31687> 2012-08-29 13:34:03 UTC LOG: connection received: host=[local]
>> <[unknown]@[unknown]:31687> 2012-08-29 13:34:03 UTC LOG: incomplete startup packet
>> <@:31686> 2012-08-29 13:34:03 UTC LOG: database system was interrupted; last known up at 2012-08-29 13:14:47 UTC
>> <@:31686> 2012-08-29 13:34:03 UTC LOG: entering standby mode
>> <@:31686> 2012-08-29 13:34:03 UTC LOG: redo starts at A92/5F000020
>> <@:31686> 2012-08-29 13:34:03 UTC FATAL: could not access status of transaction 208177034
>> <@:31686> 2012-08-29 13:34:03 UTC DETAIL: Could not read from file "pg_multixact/offsets/0C68" at offset 131072: Success.
>> <@:31686> 2012-08-29 13:34:03 UTC CONTEXT: xlog redo create multixact 208177034 offset 1028958730: 1593544329 1593544330
>> <@:31681> 2012-08-29 13:34:03 UTC LOG: startup process (PID 31686) exited with exit code 1
>> <@:31681> 2012-08-29 13:34:03 UTC LOG: terminating any other active server processes
>>
>> This was attempting to rebuild a hot standby after switching my master
>> to a new server. In between the shutdown and the attempt to restart:
>>
>> - The master was put into backup mode.
>> - The datadir was rsynced over, using rsync -ahhP --delete-before
>>   --exclude=postmaster.pid --exclude=pg_xlog
>> - The master was taken out of backup mode.
>> - The pg_xlog directory was emptied
>> - The pg_xlog directory was rsynced across from the master. This
>>   included all the WAL files from before the promotion, throughout
>>   backup mode, and a few from after backup mode was left.
>
> That's not valid, you cannot easily guarantee that you've not copied files that
> were in the progress of being written to. Use a recovery_command if you do not
> want all files to be transferred via the replication connection. But do that
> only for files that have been archived via an archive_command beforehand.

Ok. I had assumed this was fine, as the docs explicitly tell me to
copy across any unarchived WAL files when doing failover. I think my
confusion is because the docs for building a standby refer to the
section on recovering from a backup, but I have a live server.

I'll just let the WAL files get sucked over the replication connection
if that works - this seems much simpler. I don't think I saw this
mentioned in the docs. I had been assuming enough WAL needed to be
available to bring the DB up to a consistent state before streaming
replication would start.

> Did you have a backup label in the rsync'ed datadir? In Maxim's case I could
> detect that he had not via line numbers, but I do not see them here...

Yes, the backup_label was copied across (confirmed in scrollback from the rsync).

-- 
Stuart Bishop <stuart@stuartbishop.net>
http://www.stuartbishop.net/
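
As an illustration of the approach discussed in this message (letting the standby pull WAL over the replication connection, optionally falling back to segments that were already archived), a minimal `recovery.conf` sketch for a 9.1 standby might look like the following. The host name, user, and archive path are placeholders and do not come from the thread. Note that the PostgreSQL setting for fetching archived WAL is named `restore_command`.

```
# recovery.conf on the standby (PostgreSQL 9.1). All values are illustrative.
standby_mode = 'on'

# Hypothetical connection string for the new master.
primary_conninfo = 'host=master.example.com port=5432 user=replication'

# Optional: fetch only segments that the master's archive_command has
# already archived. /path/to/archive is a placeholder.
restore_command = 'cp /path/to/archive/%f %p'
```

With a setup along these lines, pg_xlog would not be copied by hand. The standby still needs every WAL segment from the start of the base backup onward, so this presumably only works if the master has retained those segments (for example via `wal_keep_segments`) or they are available from the archive.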