Thread: Resetting undo/redo logs
I have this issue on Greenplum, which is an MPP hybrid built from Postgres 8.2, and the issue I am seeing comes 100% from pg code.

One of the Greenplum segments went down and it cannot recover because of "PANIC XX000 invalid redo/undo record in shutdown checkpoint (xlog.c:6576)".

I am posting this question here because most casual users of Postgres/Greenplum are telling me that the database is hosed, but I think that with pg_resetxlog (http://www.postgresql.org/docs/8.2/static/app-pgresetxlog.html) and some data loss I could at least "hack" the database to come back up.

What I am asking for here is help calculating the reset values: where to find the most recent valid ones and how to *specifically* calculate the reset ones.

Please advise,
Edmon

2012-06-12 13:16:18.614912 EDT p14611 th802662304 0 seg-1 LOG 0 mirror transition, primary address(port) 'boxgp10a(41001)' mirror address(port) 'boxgp02a(51001)' mirroring role 'primary role' mirroring state 'change tracking' segment state 'not initialized' process name(pid) 'filerep main process(14611)' filerep state 'not initialized' 0 cdbfilerep.c 3371
2012-06-12 13:16:18.617047 EDT p14612 th802662304 0 seg-1 LOG 0 CHANGETRACKING: ChangeTracking_RetrieveIsTransitionToInsync() found insync_transition_completed:'false' full resync:'false' 0 cdbresynchronizechangetracking.c 2522
2012-06-12 13:16:18.617113 EDT p14612 th802662304 0 seg-1 LOG 0 CHANGETRACKING: ChangeTracking_RetrieveIsTransitionToResync() found resync_transition_completed:'false' full resync:'false' 0 cdbresynchronizechangetracking.c 2559
2012-06-12 13:16:18.746870 EDT p14612 th802662304 0 seg-1 LOG 0 searching for last checkpoint location for creating the initial resynchronize changetracking 0 xlog.c 10836
2012-06-12 13:16:18.747318 EDT p14612 th802662304 0 seg-1 LOG 0 record with zero length at 14/48000070 0 xlog.c 4182
2012-06-12 13:16:18.747491 EDT p14612 th802662304 0 seg-1 LOG 0 scanned through 1 initial xlog records since last checkpoint for writing into the resynchronize change log 0 cdbresynchronizechangetracking.c 206
2012-06-12 13:16:18.750830 EDT p14624 th802662304 0 seg-1 LOG 0 database system was shut down at 2012-06-12 11:00:13 EDT 0 xlog.c 6326
2012-06-12 13:16:18.750987 EDT p14624 th802662304 0 seg-1 LOG 0 checkpoint record is at 14/48000020 0 xlog.c 6425
2012-06-12 13:16:18.751016 EDT p14624 th802662304 0 seg-1 LOG 0 redo record is at 14/48000020; undo record is at 14/42AC2118; shutdown TRUE 0 xlog.c 6534
2012-06-12 13:16:18.751041 EDT p14624 th802662304 0 seg-1 LOG 0 next transaction ID: 0/4553423; next OID: 241771 0 xlog.c 6538
2012-06-12 13:16:18.751065 EDT p14624 th802662304 0 seg-1 LOG 0 next MultiXactId: 271; next MultiXactOffset: 549 0 xlog.c 6541
2012-06-12 13:16:18.796637 EDT p14624 th802662304 0 seg-1 PANIC XX000 invalid redo/undo record in shutdown checkpoint (xlog.c:6576) 0 xlog.c 6576 "Stack trace:
1 0xa59f75 postgres errstart + 0x595
2 0x50f7ac postgres StartupXLOG + 0x1b8c
3 0x51778d postgres StartupProcessMain + 0x2fd
4 0x590746 postgres AuxiliaryProcessMain + 0x796
5 0x85fe54 postgres <symbol not found> + 0x85fe54
6 0x86003a postgres StartMasterOrPrimaryPostmasterProcesses + 0x3a
7 0x86ffaf postgres doRequestedPrimaryMirrorModeTransitions + 0xd9f
8 0x86bc4a postgres PostmasterMain + 0x1f8a
9 0x772bda postgres main + 0x4da
10 0x2af72ebc7994 libc.so.6 __libc_start_main + 0xf4
11 0x47bf49 postgres <symbol not found> + 0x47bf49
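The WAL locations in the log above are written as two hexadecimal numbers separated by a slash. As a minimal sketch for reading them, the Python below parses the three locations from the log and compares them. The helper name parse_lsn is made up for illustration, and the idea that the lower undo location is what trips the startup check is only a guess; nothing in the thread confirms it.

```python
def parse_lsn(text):
    """Split an 'X/Y' WAL location into (log_id, byte_offset) integers.

    Both halves are hexadecimal, as printed in the server log.
    """
    hi, lo = text.split("/")
    return int(hi, 16), int(lo, 16)


# Values copied from the log lines in the original post.
checkpoint = parse_lsn("14/48000020")
redo = parse_lsn("14/48000020")
undo = parse_lsn("14/42AC2118")

# Tuples compare element by element, so this orders locations correctly.
print("redo == checkpoint:", redo == checkpoint)
print("undo < checkpoint: ", undo < checkpoint)

# Both locations share log id 0x14, so the offsets can be subtracted directly.
print("bytes between undo and checkpoint: 0x%X" % (checkpoint[1] - undo[1]))
```

The comparison only shows that the undo location sits behind the checkpoint record while the redo location matches it. It does not say which value is wrong or how to repair it.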
Edmon Begoli <ebegoli@gmail.com> writes:
> One of the Greenplum segments went down and it cannot recover because
> "PANIC XX000 invalid redo/undo record in shutdown checkpoint
> (xlog.c:6576)"

> I am posting this question here because most casual users of
> Postgres/Greenplum are telling me that database is hosed, but I think
> that with pg_resetxlog and some
> (http://www.postgresql.org/docs/8.2/static/app-pgresetxlog.html) data
> loss I could at least "hack" database to come back up.

> What I am asking for help here is to help me calculate the reset
> values - where to find the most recent valid one and how to
> *specifically* calculate the reset ones.

pg_controldata should give you useful starting points. I don't think we can offer any more help than what is on the pg_resetxlog reference page as to what to do with them. (Though you might try reading the more recent releases' versions of that page to see if anything's been clarified.)

regards, tom lane
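For reference, the arithmetic described on the pg_resetxlog reference page can be sketched in a few lines of Python. This is a sketch under assumptions, not a recovery procedure: the multipliers (1048576 for -x, 65536 for -m and -O) and the rule for -l are taken from my reading of the 8.x version of that page, later releases changed some of them, and a Greenplum build may differ, so check every number against the page for the exact version in use. The function names are hypothetical, and the file names in the usage lines are illustrative values, not ones from this cluster.

```python
import string

HEX = set(string.hexdigits)


def hex_names(names, width=None):
    """Keep only pure-hexadecimal file names (skips e.g. 'archive_status')."""
    return [n for n in names
            if n and set(n) <= HEX and (width is None or len(n) == width)]


def largest(names):
    """Numerically largest hexadecimal file name in a directory listing."""
    return max(int(n, 16) for n in hex_names(names))


def next_xid(clog_names):
    # -x: largest file name in pg_clog, plus one, times 1048576 (assumed rule).
    return (largest(clog_names) + 1) * 1048576


def next_multixact_id(offsets_names):
    # -m: largest file name in pg_multixact/offsets, plus one, times 65536.
    return (largest(offsets_names) + 1) * 65536


def next_multixact_offset(members_names):
    # -O: largest file name in pg_multixact/members, plus one, times 65536.
    return (largest(members_names) + 1) * 65536


def wal_start(xlog_names):
    # -l: must be larger than any WAL file name in pg_xlog. Names have three
    # 8-digit hex parts: timeline, log id, segment. The segment part must not
    # exceed 0xFF, so roll over into the log id instead.
    parts = [(int(n[0:8], 16), int(n[8:16], 16), int(n[16:24], 16))
             for n in hex_names(xlog_names, 24)]
    tli, log, seg = max(parts)
    if seg >= 0xFF:
        log, seg = log + 1, 0
    else:
        seg += 1
    return "0x%X,0x%X,0x%X" % (tli, log, seg)


# Illustrative listings; in practice these would come from os.listdir()
# on the corresponding directories under the data directory.
print("-x 0x%X" % next_xid(["0000", "0011"]))
print("-l " + wal_start(["00000001000000320000004A", "archive_status"]))
```

The functions only read file names and print candidate values. Comparing the results with what pg_controldata reports is a reasonable sanity check before passing anything to pg_resetxlog.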
Thanks. I was going down this route, so just your confirmation that this is the right path is helpful.

Edmon

On Thu, Jun 21, 2012 at 11:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Edmon Begoli <ebegoli@gmail.com> writes:
>> One of the Greenplum segments went down and it cannot recover because
>> "PANIC XX000 invalid redo/undo record in shutdown checkpoint
>> (xlog.c:6576)"
>
>> I am posting this question here because most casual users of
>> Postgres/Greenplum are telling me that database is hosed, but I think
>> that with pg_resetxlog and some
>> (http://www.postgresql.org/docs/8.2/static/app-pgresetxlog.html) data
>> loss I could at least "hack" database to come back up.
>
>> What I am asking for help here is to help me calculate the reset
>> values - where to find the most recent valid one and how to
>> *specifically* calculate the reset ones.
>
> pg_controldata should give you useful starting points. I don't think we
> can offer any more help than what is on the pg_resetxlog reference page
> as to what to do with them. (Though you might try reading the more
> recent releases' versions of that page to see if anything's been
> clarified.)
>
> regards, tom lane