Thread: Resetting undo/redo logs

Resetting undo/redo logs

From: Edmon Begoli

I have an issue on Greenplum, which is an MPP hybrid built from
Postgres 8.2, and the issue I am seeing comes 100% from the pg code.

One of the Greenplum segments went down and it cannot recover because
"PANIC    XX000 invalid redo/undo record in shutdown checkpoint
(xlog.c:6576)"

I am posting this question here because most casual users of
Postgres/Greenplum are telling me that the database is hosed, but I
think that with pg_resetxlog
(http://www.postgresql.org/docs/8.2/static/app-pgresetxlog.html) and
some data loss I could at least "hack" the database to come back up.

What I am asking for here is help calculating the reset values: where
to find the most recent valid ones and how, *specifically*, to
calculate the reset ones.

Please advise,
Edmon
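
Editor's sketch: the pg_resetxlog reference page linked above describes how to derive safe values for -x and -l from file names in pg_clog and pg_xlog. The Python below is a minimal illustration of that arithmetic, assuming stock PostgreSQL 8.2 file naming (hexadecimal pg_clog names, 24-hex-digit WAL segment names). The function names are hypothetical, and whether Greenplum follows the same layout is not confirmed in this thread.

```python
import re


def safe_next_xid(clog_names):
    """Largest pg_clog file name (hexadecimal), plus one, times 1048576."""
    largest = max(int(name, 16) for name in clog_names)
    return (largest + 1) * 0x100000


def next_wal_start(xlog_names):
    """Return (timeline, fileid, seg) just past the largest WAL segment name.

    Names that are not 24 hex digits (for example archive_status) are skipped.
    """
    segments = [n for n in xlog_names if re.fullmatch(r"[0-9A-Fa-f]{24}", n)]
    largest = max(segments, key=lambda n: int(n, 16))
    tli, fileid, seg = (int(largest[i:i + 8], 16) for i in (0, 8, 16))
    if seg >= 0xFF:
        # Third part must not exceed 0xFF: bump the second part instead.
        return tli, fileid + 1, 0
    return tli, fileid, seg + 1


# File names here are the examples used on the reference page, not real data.
print("-x 0x%X" % safe_next_xid(["0000", "0011"]))
print("-l 0x%X,0x%X,0x%X" % next_wal_start(["00000001000000320000004A"]))
```

The printed values are only candidates; running pg_resetxlog with -n first shows what it would do without changing anything.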


2012-06-12 13:16:18.614912 EDT            p14611    th802662304                0            seg-1                    LOG    0    mirror transition, primary address(port) 'boxgp10a(41001)' mirror address(port) 'boxgp02a(51001)'                    mirroring role 'primary role' mirroring state 'change tracking' segment state 'not initialized' process name(pid) 'filerep main process(14611)' filerep state 'not initialized'    0        cdbfilerep.c    3371
2012-06-12 13:16:18.617047 EDT            p14612    th802662304                0            seg-1                    LOG    0    CHANGETRACKING: ChangeTracking_RetrieveIsTransitionToInsync() found insync_transition_completed:'false' full resync:'false'                            0        cdbresynchronizechangetracking.c    2522
2012-06-12 13:16:18.617113 EDT            p14612    th802662304                0            seg-1                    LOG    0    CHANGETRACKING: ChangeTracking_RetrieveIsTransitionToResync() found resync_transition_completed:'false' full resync:'false'                            0        cdbresynchronizechangetracking.c    2559
2012-06-12 13:16:18.746870 EDT            p14612    th802662304                0            seg-1                    LOG    0    searching for last checkpoint location for creating the initial resynchronize changetracking                            0        xlog.c    10836
2012-06-12 13:16:18.747318 EDT            p14612    th802662304                0            seg-1                    LOG    0    record with zero length at 14/48000070                            0        xlog.c    4182
2012-06-12 13:16:18.747491 EDT            p14612    th802662304                0            seg-1                    LOG    0    scanned through 1 initial xlog records since last checkpoint for writing into the resynchronize change log                            0        cdbresynchronizechangetracking.c    206
2012-06-12 13:16:18.750830 EDT            p14624    th802662304                0            seg-1                    LOG    0    database system was shut down at 2012-06-12 11:00:13 EDT                            0        xlog.c    6326
2012-06-12 13:16:18.750987 EDT            p14624    th802662304                0            seg-1                    LOG    0    checkpoint record is at 14/48000020                            0        xlog.c    6425
2012-06-12 13:16:18.751016 EDT            p14624    th802662304                0            seg-1                    LOG    0    redo record is at 14/48000020; undo record is at 14/42AC2118; shutdown TRUE                            0        xlog.c    6534
2012-06-12 13:16:18.751041 EDT            p14624    th802662304                0            seg-1                    LOG    0    next transaction ID: 0/4553423; next OID: 241771                            0        xlog.c    6538
2012-06-12 13:16:18.751065 EDT            p14624    th802662304                0            seg-1                    LOG    0    next MultiXactId: 271; next MultiXactOffset: 549                            0        xlog.c    6541
2012-06-12 13:16:18.796637 EDT            p14624    th802662304                0            seg-1                    PANIC    XX000    invalid redo/undo record in shutdown checkpoint (xlog.c:6576)                            0        xlog.c    6576    "Stack trace:
1    0xa59f75 postgres errstart + 0x595
2    0x50f7ac postgres StartupXLOG + 0x1b8c
3    0x51778d postgres StartupProcessMain + 0x2fd
4    0x590746 postgres AuxiliaryProcessMain + 0x796
5    0x85fe54 postgres <symbol not found> + 0x85fe54
6    0x86003a postgres StartMasterOrPrimaryPostmasterProcesses + 0x3a
7    0x86ffaf postgres doRequestedPrimaryMirrorModeTransitions + 0xd9f
8    0x86bc4a postgres PostmasterMain + 0x1f8a
9    0x772bda postgres main + 0x4da
10   0x2af72ebc7994 libc.so.6 __libc_start_main + 0xf4
11   0x47bf49 postgres <symbol not found> + 0x47bf49
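
Editor's sketch: the log reports the checkpoint and redo records at 14/48000020 but the undo record at 14/42AC2118, with shutdown TRUE. As far as I recall, stock PostgreSQL 8.2 raises this PANIC at startup when a shutdown checkpoint's redo or undo pointer lies before the checkpoint record itself. That reading is an assumption, and Greenplum's modified xlog.c may differ. The Python below only illustrates the comparison on the values from the log; the function names are hypothetical.

```python
def parse_lsn(text):
    """Split an 'xlogid/xrecoff' WAL location (both hex) into an int pair."""
    xlogid, xrecoff = text.split("/")
    return int(xlogid, 16), int(xrecoff, 16)


def shutdown_checkpoint_consistent(checkpoint, redo, undo):
    """True when neither redo nor undo points before the checkpoint record.

    Assumed to mirror the stock 8.2 startup check; not verified for Greenplum.
    """
    ckpt = parse_lsn(checkpoint)
    return parse_lsn(redo) >= ckpt and parse_lsn(undo) >= ckpt


# Values copied from the startup log in this message.
print(shutdown_checkpoint_consistent("14/48000020", "14/48000020", "14/42AC2118"))
```

If that reading is right, the undo pointer is the value that fails the check here, since the redo pointer equals the checkpoint location.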


Re: Resetting undo/redo logs

From: Tom Lane

Edmon Begoli <ebegoli@gmail.com> writes:
> One of the Greenplum segments went down and it cannot recover because
> "PANIC    XX000 invalid redo/undo record in shutdown checkpoint
> (xlog.c:6576)"

> I am posting this question here because most casual users of
> Postgres/Greenplum are telling me that database is hosed, but I think
> that with pg_resetxlog and some
> (http://www.postgresql.org/docs/8.2/static/app-pgresetxlog.html) data
> loss I could at least "hack" database to come back up.

> What I am asking for help here is to help me calculate the reset
> values - where to find the most recent valid one and how to
> *specifically* calculate the reset ones.

pg_controldata should give you useful starting points.  I don't think we
can offer any more help than what is on the pg_resetxlog reference page
as to what to do with them.  (Though you might try reading the more
recent releases' versions of that page to see if anything's been
clarified.)
        regards, tom lane
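
Editor's sketch: one way to collect the starting points from pg_controldata is to parse its "label: value" output and map the counters onto the corresponding pg_resetxlog switches (-e, -x, -o, -m, -O). The Python below is a minimal illustration. The SAMPLE text is not real pg_controldata output: it is assembled from the values in the startup log earlier in the thread, and the field labels are what I recall 8.2 printing, so check them against the actual output. The function names are hypothetical.

```python
# Illustrative only: values taken from the startup log, labels assumed.
SAMPLE = """\
Latest checkpoint location:           14/48000020
Latest checkpoint's REDO location:    14/48000020
Latest checkpoint's UNDO location:    14/42AC2118
Latest checkpoint's NextXID:          0/4553423
Latest checkpoint's NextOID:          241771
Latest checkpoint's NextMultiXactId:  271
Latest checkpoint's NextMultiOffset:  549
"""


def parse_controldata(text):
    """Turn 'label: value' lines into a dict, splitting at the first colon."""
    fields = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        if sep:
            fields[label.strip()] = value.strip()
    return fields


def starting_points(fields):
    """Map checkpoint counters to the pg_resetxlog switches that set them."""
    epoch, xid = fields["Latest checkpoint's NextXID"].split("/")
    return {
        "-e": int(epoch),
        "-x": int(xid),
        "-o": int(fields["Latest checkpoint's NextOID"]),
        "-m": int(fields["Latest checkpoint's NextMultiXactId"]),
        "-O": int(fields["Latest checkpoint's NextMultiOffset"]),
    }


for switch, value in starting_points(parse_controldata(SAMPLE)).items():
    print(switch, value)
```

These are starting points only. The reference page asks for values larger than anything already in use, so they would normally be rounded up and previewed with pg_resetxlog -n before anything is written.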


Re: Resetting undo/redo logs

From: Edmon Begoli

Thanks. I was going down this route, so just your confirmation that
this is the right path is helpful.

Edmon

On Thu, Jun 21, 2012 at 11:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Edmon Begoli <ebegoli@gmail.com> writes:
>> One of the Greenplum segments went down and it cannot recover because
>> "PANIC        XX000 invalid redo/undo record in shutdown checkpoint
>> (xlog.c:6576)"
>
>> I am posting this question here because most casual users of
>> Postgres/Greenplum are telling me that database is hosed, but I think
>> that with pg_resetxlog and some
>> (http://www.postgresql.org/docs/8.2/static/app-pgresetxlog.html) data
>> loss I could at least "hack" database to come back up.
>
>> What I am asking for help here is to help me calculate the reset
>> values - where to find the most recent valid one and how to
>> *specifically* calculate the reset ones.
>
> pg_controldata should give you useful starting points.  I don't think we
> can offer any more help than what is on the pg_resetxlog reference page
> as to what to do with them.  (Though you might try reading the more
> recent releases' versions of that page to see if anything's been
> clarified.)
>
>                        regards, tom lane