warm standby resume and take online problems - Mailing list pgsql-general
From | Michal Bicz |
---|---|
Subject | warm standby resume and take online problems |
Date | |
Msg-id | E934F7423F81FC4FB06E112B21B491ABC7FA2CC8AD@EXVMBX003-5.exch003intermedia.net |
Responses |
Re: warm standby resume and take online problems
|
List | pgsql-general |
Hi,

I have a chain of warm standby servers. One, let's say db-01, is pushing updates to db-02, and they are then fetched by db-03.

I decided to bring db-04 online, so I stopped db-03 (a warm standby) with

    pg_ctl stop -m fast -D $PG_DATA

and copied the data over from db-03 to db-04. So now I have a backup ("data + binaries") that was taken from a warm standby while it was shut down.

I created recovery.conf with restore_command, created recovery.sh (for the restore command), and adjusted postgresql.conf with the appropriate port and IP. recovery.sh is just a blind 'while' loop that waits for a trigger file and then exits.

Then I started. I removed everything from pg_xlog on the copy that is going to go live.

pg_controldata output:

    pg_control version number:            822
    Catalog version number:               200611241
    Database system identifier:           5309237009736268543
    Database cluster state:               in archive recovery
    pg_control last modified:             Thu Oct 29 11:30:04 2009
    Current log file ID:                  389
    Next log file segment:                225
    Latest checkpoint location:           2FA/BBA6B710
    Prior checkpoint location:            2FA/AE916D60
    Latest checkpoint's REDO location:    2FA/BBA38478
    Latest checkpoint's UNDO location:    0/0
    Latest checkpoint's TimeLineID:       1
    Latest checkpoint's NextXID:          3/824035978
    Latest checkpoint's NextOID:          59442871
    Latest checkpoint's NextMultiXactId:  510637
    Latest checkpoint's NextMultiOffset:  2076981
    Time of latest checkpoint:            Thu Oct 29 09:02:31 2009
    Minimum recovery ending location:     186/80DCC48
    Maximum data alignment:               8
    Database block size:                  8192
    Blocks per segment of large relation: 131072
    WAL block size:                       8192
    Bytes per WAL segment:                16777216
    Maximum length of identifiers:        64
    Maximum columns in an index:          32
    Date/time type storage:               floating-point numbers
    Maximum length of locale name:        128
    LC_COLLATE:                           en_US.UTF-8
    LC_CTYPE:                             en_US.UTF-8

First start (no WAL files in the wal_recovery directory):

    2009-11-01 16:09:10 PST : LOG: could not open file "pg_xlog/00000001000002FA000000BB" (log file 762, segment 187): No such file or directory
    2009-11-01 16:09:10 PST : LOG: invalid primary checkpoint record
    2009-11-01 16:09:10 PST : LOG: could not open file "pg_xlog/00000001000002FA000000AE" (log file 762, segment 174): No such file or directory
    2009-11-01 16:09:10 PST : LOG: invalid secondary checkpoint record
    2009-11-01 16:09:10 PST : PANIC: could not locate a valid checkpoint record
    2009-11-01 16:09:10 PST : LOG: startup process (PID 1651) was terminated by signal 6
    2009-11-01 16:09:10 PST : LOG: aborting startup due to startup process failure
    2009-11-01 16:09:10 PST : LOG: logger shutting down

I then supplied it with every WAL file from AE through BB in wal_recovery. It started in recovery mode, asking for more WAL files. I started applying WAL files and everything was OK: recovery in progress.

After I had fed it files up to ..2FB.08 (around the time the original data directory was copied from the warm standby server) and created the trigger file, it came up online. I can connect and select from some tables, but a select on logging.agentpagehit (35 GB+) failed. It printed this on the console:

    saturn=# select count(*) from logging.agentpagehit;
    ERROR: xlog flush request 2FB/45E1B8D0 is not satisfied --- flushed only to 2FB/8FFEA60
    CONTEXT: writing block 874822 of relation 1663/20863/21548

Now it keeps writing this to the log:

    2009-11-04 04:57:39 PST : ERROR: XX000: xlog flush request 2FB/28CE63A8 is not satisfied --- flushed only to 2FB/8FFEA60
    2009-11-04 04:57:39 PST : CONTEXT: writing block 874937 of relation 1663/20863/21548
    2009-11-04 04:57:39 PST : LOCATION: XLogFlush, xlog.c:1865
    2009-11-04 04:57:39 PST : WARNING: 58030: could not write block 874937 of 1663/20863/21548
    2009-11-04 04:57:39 PST : DETAIL: Multiple failures --- write error may be permanent.
    2009-11-04 04:57:39 PST : LOCATION: AbortBufferIO, bufmgr.c:2129

What am I missing?

- Should I supply more WAL files from the past or the future (and if from the future, up to when)?
- Did the first start without WAL files break it?
- Did starting without the pg_xlog files break it?
- According to some post on the Web, "Minimum recovery ending location: 186/80DCC48" means I should supply WAL files since 188..80. Is this correct?

I haven't checked yet which file it asks for first (%f) when started without any WAL files in wal_recovery. I will know in a few hours, as I am now copying the data over once again.

Any thoughts?

Michal
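For reference when reading the WAL locations quoted in the message, here is a minimal sketch of how a location such as 2FA/BBA6B710 maps to a segment file name. It assumes the file naming used by PostgreSQL releases of this era (timeline, log file ID, and segment number, each as 8 hex digits), the 16777216-byte segment size and TimeLineID 1 reported by pg_controldata above. The function name wal_segment_name is hypothetical and not part of PostgreSQL.

```python
def wal_segment_name(location: str, timeline: int = 1,
                     segment_size: int = 16 * 1024 * 1024) -> str:
    """Return the WAL segment file name that contains a given location.

    `location` is written as 'LOGID/OFFSET' in hex, as printed by
    pg_controldata and in the server log. The defaults (timeline 1,
    16 MB segments) are taken from the pg_controldata output in the
    message and are assumptions for any other cluster.
    """
    log_id_hex, offset_hex = location.split("/")
    log_id = int(log_id_hex, 16)
    # The segment number is the byte offset divided by the segment size.
    segment = int(offset_hex, 16) // segment_size
    return f"{timeline:08X}{log_id:08X}{segment:08X}"


# Locations copied from the message.
for label, loc in [
    ("latest checkpoint", "2FA/BBA6B710"),
    ("prior checkpoint", "2FA/AE916D60"),
    ("flushed up to", "2FB/8FFEA60"),
    ("flush requested", "2FB/45E1B8D0"),
]:
    print(f"{label:18s} {loc:14s} -> {wal_segment_name(loc)}")
```

The first two results match the files the server reported missing on the first start (00000001000002FA000000BB and 00000001000002FA000000AE). The last two show that the failing flush request points into segment 45 of log 2FB, while WAL had only been applied up to segment 08 of that log.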