Re: after using pg_resetxlog, db lost - Mailing list pgsql-performance

From Shea,Dan [CIS]
Subject Re: after using pg_resetxlog, db lost
Date
Msg-id 644D07D3D59D8F408CD01AC2F833D8C62B9217@cisxa.cmc.int.ec.gc.ca
Whole thread Raw
In response to after using pg_resetxlog, db lost  ("Shea,Dan [CIS]" <Dan.Shea@ec.gc.ca>)
Responses Re: after using pg_resetxlog, db lost
List pgsql-performance
I determined the largest was 12,293,162 and set it to
pg_resetxlog -x 15000000 /var/lib/pgsql/data

I am now able to see all the data.

I actually checked the log for the previous successfull startup before it the pg_control file was reset and it reported

Jun 22 11:55:44 pascal postgres[24993]: [5-1] LOG:  next transaction ID: 14820367; next OID: 727013114

So I entered
pg_resetxlog -o 750000000 /var/lib/pgsql/data  Setting oid value

I couldn't set 10000/0, so tried below
pg_resetxlog -l 10000,0 /var/lib/pgsql/data

This seems to be wrong because the databse is complaining and shutting down
Jun 24 15:02:05 murphy postgres[28061]: [6-1] LOG:  checkpoint record is at 2710/1000050
Jun 24 15:02:05 murphy postgres[28061]: [7-1] LOG:  redo record is at 2710/1000050; undo record is at 0/0; shutdown
TRUE
Jun 24 15:02:05 murphy postgres[28061]: [8-1] LOG:  next transaction ID: 15000010; next OID: 750000000
Jun 24 15:02:05 murphy postgres[28061]: [9-1] LOG:  database system was not properly shut down; automatic recovery in
progress
Jun 24 15:02:05 murphy postgres[28062]: [5-1] FATAL:  the database system is starting up
Jun 24 15:02:05 murphy postgres[28063]: [5-1] FATAL:  the database system is starting up
Jun 24 15:02:05 murphy postgres[28061]: [10-1] LOG:  redo starts at 2710/1000090
Jun 24 15:02:05 murphy postgres[28061]: [11-1] PANIC:  could not access status of transaction 15000030
Jun 24 15:02:05 murphy postgres[28061]: [11-2] DETAIL:  could not read from file "/var/lib/pgsql/data/pg_clog/000E" at
offset73728: Success 
Jun 24 15:02:05 murphy postgres[24771]: [5-1] LOG:  startup process (PID 28061) was terminated by signal 6
Jun 24 15:02:05 murphy postgres[24771]: [6-1] LOG:  aborting startup due to startup process failure
Jun 24 15:50:51 murphy sshd(pam_unix)[690]: session opened for user root by (uid=0)
Jun 24 15:54:47 murphy su(pam_unix)[1541]: session opened for user postgres by root(uid=0)
Jun 24 16:03:47 murphy su(pam_unix)[2911]: session opened for user postgres by root(uid=0)
Jun 24 16:03:48 murphy su(pam_unix)[2911]: session closed for user postgres
Jun 24 16:03:48 murphy postgres[3182]: [1-1] LOG:  could not create IPv6 socket: Address family not supported by
protocol
Jun 24 16:03:48 murphy postgres[3188]: [2-1] LOG:  database system was interrupted while in recovery at 2004-06-24
15:02:05GMT 
Jun 24 16:03:48 murphy postgres[3188]: [2-2] HINT:  This probably means that some data is corrupted and you will have
touse the last backup for recovery. 
Jun 24 16:03:48 murphy postgres[3188]: [3-1] LOG:  checkpoint record is at 2710/1000050
Jun 24 16:03:48 murphy postgres[3188]: [4-1] LOG:  redo record is at 2710/1000050; undo record is at 0/0; shutdown TRUE
Jun 24 16:03:48 murphy postgres[3188]: [5-1] LOG:  next transaction ID: 15000010; next OID: 750000000
Jun 24 16:03:48 murphy postgres[3188]: [6-1] LOG:  database system was not properly shut down; automatic recovery in
progress
Jun 24 16:03:48 murphy postgres[3188]: [7-1] LOG:  redo starts at 2710/1000090
Jun 24 16:03:48 murphy postgres[3188]: [8-1] PANIC:  could not access status of transaction 15000030
Jun 24 16:03:48 murphy postgres[3188]: [8-2] DETAIL:  could not read from file "/var/lib/pgsql/data/pg_clog/000E" at
offset73728: Success 
Jun 24 16:03:48 murphy postgres[3182]: [2-1] LOG:  startup process (PID 3188) was terminated by signal 6
Jun 24 16:03:48 murphy postgres[3182]: [3-1] LOG:  aborting startup due to startup process failure

How do I set the xlog properly, or rather to 10000/0?
Dan.
-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Wednesday, June 23, 2004 11:41 PM
To: Shea,Dan [CIS]
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] after using pg_resetxlog, db lost


"Shea,Dan [CIS]" <Dan.Shea@ec.gc.ca> writes:
> Tom I see you from past emails that you reference using -i -f with
> pg_filedump.  I have tried this, but do not know what I am looking at.

What you want to look at is valid XMIN and XMAX values.  In this
example:

>  Item   1 -- Length:  196  Offset: 4292 (0x10c4)  Flags: USED
>   XID: min (2)  CMIN|XMAX: 211  CMAX|XVAC: 469
>   Block Id: 0  linp Index: 1   Attributes: 24   Size: 28
>  infomask: 0x2912 (HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED)

the infomask shows XMIN_COMMITTED, so xmin (here 2) is valid, but it also
shows XMAX_INVALID, so the putative XMAX (211) should be ignored.

In general the xmin field should be valid, but xmax shares storage with
cmin and so you have to look at the infomask bits to know whether to
believe that the cmin/xmax field represents a transaction ID.

The cmax/xvac field could also hold a transaction ID.  If I had only
the above data to go on, I'd guess that the current transaction counter
is at least 469.

Under normal circumstances, command counter values (cmin or cmax) are
unlikely to exceed a few hundred, while the transaction IDs you are
looking for are likely to be much larger.  So you could get away with
just computing the max of *all* the numbers you see in xmin, cmin/xmax,
or cmax/cvac, and then using something a million or so bigger for safety
factor.

            regards, tom lane

pgsql-performance by date:

Previous
From: Laurent Martelli
Date:
Subject: Re: Pas la samedi
Next
From: Tom Lane
Date:
Subject: Re: after using pg_resetxlog, db lost