Re: BUG #7753: Cannot promote out of hot standby - Mailing list pgsql-bugs
From: Heikki Linnakangas
Subject: Re: BUG #7753: Cannot promote out of hot standby
Date:
Msg-id: 515181AB.8090302@vmware.com
In response to: BUG #7753: Cannot promote out of hot standby (daniel@heroku.com)
List: pgsql-bugs

(cleaning up my inbox..) Did you ever figure out this one?

On 12.12.2012 23:36, daniel@heroku.com wrote:
> The following bug has been logged on the website:
>
> Bug reference:      7753
> Logged by:          Daniel Farina
> Email address:      daniel@heroku.com
> PostgreSQL version: 9.1.6
> Operating system:   Ubuntu 10.04
> Description:
>
> Touching a trigger file will not cause promotion out of hot standby.
> Basically, an apparently normally-working hot-standby database will
> not leave hot standby. The database emitting WAL is version 9.1.4.
>
> Everything appears normal in the log (downloads and restoring of
> archived segments), and the server seems to take no notice of the
> trigger file.
>
> To force the issue, I introduced an error into the configuration of
> the restoration program to cause it to exit. Normally that's no
> problem; postgres would just keep on trying to restore a segment over
> and over until the error is fixed.
>
> Instead, the server crashes:
>
> [413-1] [COPPER] LOG: restored log file "000000010000034D00000050" from archive
> wal_e.worker.s3_worker INFO MSG: completed download and decompression#012
>   DETAIL: Downloaded and decompressed "s3://archive-root/wal_005/000000010000034D00000051.lzo" to "pg_xlog/RECOVERYXLOG"
> [414-1] [COPPER] LOG: restored log file "000000010000034D00000051" from archive
> wal_e.worker.s3_worker INFO MSG: completed download and decompression#012
>   DETAIL: Downloaded and decompressed "s3://archive-root/wal_005/000000010000034D00000052.lzo" to "pg_xlog/RECOVERYXLOG"
> [415-1] [COPPER] LOG: restored log file "000000010000034D00000052" from archive
>
> # I introduce the failure here
>
> wal_e.main ERROR MSG: no AWS_SECRET_ACCESS_KEY defined#012
>   HINT: Define the environment variable AWS_SECRET_ACCESS_KEY.
> LOG: trigger file found: /etc/postgresql/wal-e.d/pull-env/STANDBY_OFF
> LOG: redo done at 34D/52248590
> LOG: last completed transaction was at log time 2012-12-10
> wal_e.main ERROR MSG: no AWS_SECRET_ACCESS_KEY defined#012
>   HINT: Define the environment variable AWS_SECRET_ACCESS_KEY.
> PANIC: could not open file "pg_xlog/000000010000034D00000052" (log file 845, segment 82): No such file or directory
> LOG: startup process (PID 7) was terminated by signal 6: Aborted
> LOG: terminating any other active server processes
> WARNING: terminating connection because of crash of another server process
> WARNING: terminating connection because of crash of another server process
> DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
> DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
>
> I can fix the configuration and restart the server, and everything is
> as fine as before. Next, I try removing recovery.conf and restarting
> the server as an alternative way of promoting...but, no avail;
> however, a slightly different error message:
>
> # Server begins starting
> LOG: loaded library "auto_explain"
> LOG: loaded library "pg_stat_statements"
> LOG: database system was interrupted while in recovery at log time 2012-12-10 15:20:03 UTC
> HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
> LOG: could not open file "pg_xlog/000000010000034E0000001A" (log file 846, segment 26): No such file or directory
> LOG: invalid primary checkpoint record
> LOG: could not open file "pg_xlog/000000010000034D000000F2" (log file 845, segment 242): No such file or directory
> LOG: invalid secondary checkpoint record
> PANIC: could not locate a valid checkpoint record
> LOG: startup process (PID 7) was terminated by signal 6: Aborted
> LOG: aborting startup due to startup process failure
> main process (24284) terminated with status 1
>
> pg_control looks like this around the same time, for reference:
>
> pg_control version number:            903
> Catalog version number:               201105231
> Database cluster state:               in archive recovery
> pg_control last modified:             Wed 12 Dec 2012 09:22:30 PM UTC
> Latest checkpoint location:           351/1FE194C0
> Prior checkpoint location:            351/FD64A78
> Latest checkpoint's REDO location:    351/131848C8
> Latest checkpoint's TimeLineID:       1
> Latest checkpoint's NextXID:          0/652342033
> Latest checkpoint's NextOID:          103224
> Latest checkpoint's NextMultiXactId:  1
> Latest checkpoint's NextMultiOffset:  0
> Latest checkpoint's oldestXID:        455900714
> Latest checkpoint's oldestXID's DB:   16385
> Latest checkpoint's oldestActiveXID:  652311442
> Time of latest checkpoint:            Mon 10 Dec 2012 07:19:23 PM UTC
> Minimum recovery ending location:     351/4BFFFE20
> Backup start location:                0/0
> Current wal_level setting:            hot_standby
> Current max_connections setting:      500
> Current max_prepared_xacts setting:   500
> Current max_locks_per_xact setting:   64
> Maximum data alignment:               8
> Database block size:                  8192
> Blocks per segment of large relation: 131072
> WAL block size:                       8192
> Bytes per WAL segment:                16777216
> Maximum length of identifiers:        64
> Maximum columns in an index:          32
> Maximum size of a TOAST chunk:        1996
> Date/time type storage:               64-bit integers
> Float4 argument passing:              by value
> Float8 argument passing:              by value
>
> In the course of all this messing around, it has never been a problem to go
> back to archive recovery.

--
- Heikki
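
For reference, trigger-file promotion of a 9.1 standby is configured through recovery.conf. A minimal sketch of the kind of setup the log above implies follows; the trigger path is taken from the "trigger file found" line, while the exact wal-e restore_command invocation is an assumption for illustration, not something stated in the report:

    # recovery.conf (sketch; only the trigger_file path appears in the log above,
    # the restore_command line is assumed, not taken from the report)
    standby_mode = 'on'
    restore_command = 'envdir /etc/postgresql/wal-e.d/pull-env wal-e wal-fetch "%f" "%p"'
    trigger_file = '/etc/postgresql/wal-e.d/pull-env/STANDBY_OFF'

With such a setup, creating the trigger file (for example, touch /etc/postgresql/wal-e.d/pull-env/STANDBY_OFF) is what asks the startup process to end recovery and promote; in the log above, the "trigger file found" message only appears after restore_command starts failing.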