Thread: after using pg_resetxlog, db lost

after using pg_resetxlog, db lost

From

"Shea,Dan [CIS]"

Date:

22 June 2004, 15:43:15

The pg_resetxlog was run as root. It caused ownership problems of
pg_control and xlog files.
Now we have no access to the data now through psql.  The data is still
there under /var/lib/pgsql/data/base/17347  (PWFPM_DEV DB name).  But
there is no reference to 36 of our tables in pg_class.  Also the 18
other tables that are reported in this database have no data in them.
Is there anyway to have the database resync or make it aware of the data
under /var/lib/pgsql/data/base/17347?
How can this problem be resolved?

There is actually 346 db files adding up to 134 GB in this database.


Below are error messages of when the database trying to be started.  I
am not sure of the when pg_resetxlog was run.  I suspect it was run to
get rid ot the "invalid primary checkpoint record".

The postgresql DB had an error trying to be started up.
The error was
Jun 22 13:17:53 murphy postgres[27430]: [4-1] LOG:  invalid primary
checkpoint record
Jun 22 13:17:53 murphy postgres[27430]: [5-1] LOG:  could not open file
"/var/lib/pgsql/data/pg_xlog/0000000000000000" (log file 0, segment 0):
No such file or directory
Jun 22 13:18:49 murphy postgres[28778]: [6-1] LOG:  invalid secondary
checkpoint record
Jun 22 13:18:49 murphy postgres[28778]: [7-1] PANIC:  could not locate a
valid checkpoint record


Jun 22 13:26:01 murphy postgres[30770]: [6-1] LOG:  database system is
ready
Jun 22 13:26:02 murphy postgresql: Starting postgresql service:
succeeded
Jun 22 13:26:20 murphy postgres[30789]: [2-1] PANIC:  could not access
status of transaction 553
Jun 22 13:26:20 murphy postgres[30789]: [2-2] DETAIL:  could not open
file "/var/lib/pgsql/data/pg_clog/0000": No such file or directory
Jun 22 13:26:20 murphy postgres[30789]: [2-3] STATEMENT:  COMMIT

and
Jun 22 13:26:20 murphy postgres[30791]: [10-1] LOG:  redo starts at
0/2000050
Jun 22 13:26:20 murphy postgres[30791]: [11-1] LOG:  file
"/var/lib/pgsql/data/pg_clog/0000" doesn't exist, reading as zeroes
Jun 22 13:26:20 murphy postgres[30791]: [12-1] LOG:  record with zero
length at 0/2000E84
Jun 22 13:26:20 murphy postgres[30791]: [13-1] LOG:  redo done at
0/2000E60
Jun 22 13:26:20 murphy postgres[30791]: [14-1] WARNING:  xlog flush
request 213/7363F354 is not satisfied --- flushed only to 0/2000E84
Jun 22 13:26:20 murphy postgres[30791]: [14-2] CONTEXT:  writing block
840074 of relation 17347/356768772
Jun 22 13:26:20 murphy postgres[30791]: [15-1] WARNING:  xlog flush
request 213/58426648 is not satisfied --- flushed only to 0/2000E84

and
Jun 22 13:38:23 murphy postgres[1460]: [2-1] ERROR:  xlog flush request
210/E757F150 is not satisfied --- flushed only to 0/2074CA0
Jun 22 13:38:23 murphy postgres[1460]: [2-2] CONTEXT:  writing block
824605 of relation 17347/356768772

We are using a san for our storage device.

Re: after using pg_resetxlog, db lost

From

Tom Lane

Date:

22 June 2004, 16:36:12

"Shea,Dan [CIS]" <Dan.Shea@ec.gc.ca> writes:
> The pg_resetxlog was run as root. It caused ownership problems of
> pg_control and xlog files.
> Now we have no access to the data now through psql.  The data is still
> there under /var/lib/pgsql/data/base/17347  (PWFPM_DEV DB name).  But
> there is no reference to 36 of our tables in pg_class.  Also the 18
> other tables that are reported in this database have no data in them.
> Is there anyway to have the database resync or make it aware of the data
> under /var/lib/pgsql/data/base/17347?
> How can this problem be resolved?

What this sounds like is that you reset the transaction counter along
with the xlog, so that those tables appear to have been created by
transactions "in the future".  This could be repaired by doing
pg_resetxlog with a more appropriate initial transaction ID, but
figuring out what that value should be is not easy :-(

What I'd suggest is grabbing pg_filedump from
http://sources.redhat.com/rhdb/
and using it to look through pg_class (which will be file
$PGDATA/base/yourdbnumber/1259) to see the highest transaction ID
mentioned in any row of pg_class.  Then pg_resetxlog with a value
a bit larger than that.  Now you should be able to see all the rows
in pg_class ... but this doesn't get you out of the woods yet, unless
there are very-recently-created tables shown in pg_class.  I'd suggest
next looking through whichever tables you know to be recently modified
to find the highest transaction ID mentioned in them, and finally doing
another pg_resetxlog with a value a few million greater than that.  Then
you should be okay.

The reason you need to do this in two steps is that you'll need to look
at pg_class.relfilenode to get the file names of your recently-modified
tables.  Do NOT modify the database in any way while you are running
with the intermediate transaction ID setting.

> Jun 22 13:38:23 murphy postgres[1460]: [2-1] ERROR:  xlog flush request
> 210/E757F150 is not satisfied --- flushed only to 0/2074CA0

Looks like you also need a larger initial WAL offset in your
pg_resetxlog command.  Unlike the case with transaction IDs, there's
no need to try to be somewhat accurate in the setting --- I'd just
use a number WAY beyond what you had, maybe like 10000/0.

Finally, the fact that all this happened suggests that you lost the
contents of pg_control (else pg_resetxlog would have picked up the right
values from it).  Be very sure that you run pg_resetxlog under the same
locale settings (LC_COLLATE,LC_CTYPE) that you initially initdb'd with.
Otherwise you're likely to have nasty index-corruption problems later.

Good luck.  Next time, don't let amateurs fool with pg_resetxlog (and
anyone who'd run it as root definitely doesn't know what they're doing).
It is a wizard's tool.  Get knowledgeable advice from the PG lists
before you use it rather than after.

            regards, tom lane

Re: after using pg_resetxlog, db lost

From

Richard Huxton

Date:

22 June 2004, 17:05:23

Tom Lane wrote:
> "Shea,Dan [CIS]" <Dan.Shea@ec.gc.ca> writes:
>
>>The pg_resetxlog was run as root. It caused ownership problems of
>>pg_control and xlog files.
>>Now we have no access to the data now through psql.  The data is still
>>there under /var/lib/pgsql/data/base/17347  (PWFPM_DEV DB name).  But
>>there is no reference to 36 of our tables in pg_class.  Also the 18
>>other tables that are reported in this database have no data in them.
>>Is there anyway to have the database resync or make it aware of the data
>>under /var/lib/pgsql/data/base/17347?
>>How can this problem be resolved?
>
>
> What this sounds like is that you reset the transaction counter along
> with the xlog, so that those tables appear to have been created by
> transactions "in the future".  This could be repaired by doing
> pg_resetxlog with a more appropriate initial transaction ID, but
> figuring out what that value should be is not easy :-(

Tom - would there be any value in adding this to a pg_dump? I'm assuming
the numbers attached to tables etc are their OIDs anyway, so it might be
a useful reference in cases like this.

--
   Richard Huxton
   Archonet Ltd

Re: after using pg_resetxlog, db lost

From

Tom Lane

Date:

22 June 2004, 17:25:30

Richard Huxton <dev@archonet.com> writes:
> Tom Lane wrote:
>> This could be repaired by doing
>> pg_resetxlog with a more appropriate initial transaction ID, but
>> figuring out what that value should be is not easy :-(

> Tom - would there be any value in adding this to a pg_dump?

Possibly.  CVS tip pg_dump has been changed to not output OIDs by
default, as a result of gripes from people who wanted to be able to
"diff" dumps from different servers and not have the diff cluttered
by irrelevant OID differences.  But a single header line showing
current XID and OID values doesn't seem like it would be a big problem.
We could put current timestamp there too, which was another recent topic
of discussion.

Bring it up on pghackers and see if anyone has an objection...

            regards, tom lane

Re: after using pg_resetxlog, db lost

From

"Shea,Dan [CIS]"

Date:

23 June 2004, 23:47:21

Tom I see you from past emails that you reference using -i -f with pg_filedump.  I have tried this, but do not know
whatI am looking at.  What would be the the transaction id? What parameter am I supposed to pass to find it? 

*******************************************************************
* PostgreSQL File/Block Formatted Dump Utility - Version 3.0
*
* File: /npmu_base/data/base/17347/1259
* Options used: -i -f
*
* Dump created on: Thu Jun 24 02:44:59 2004
*******************************************************************

Block    0 ********************************************************
<Header> -----
 Block Offset: 0x00000000         Offsets: Lower     232 (0x00e8)
 Block: Size 8192  Version    1            Upper     268 (0x010c)
 LSN:  logid      0 recoff 0x00632c08      Special  8192 (0x2000)
 Items:   53                   Free Space:   36
 Length (including item array): 236

  0000: 00000000 082c6300 0b000000 e8000c01  .....,c.........
  0010: 00200120 c4908801 00908801 3c8f8801  . . ........<...
  0020: 788e8801 b48d8801 f08c8801 2c8c8801  x...........,...
  0030: 689f3001 688b8801 a48a8801 e0898801  h.0.h...........
  0040: 1c898801 58888801 94878801 d0868801  ....X...........
  0050: 3c862801 a8852801 e4848801 50842801  <.(...(.....P.(.
  0060: bc832801 f8828801 64822801 d0812801  ..(.....d.(...(.
  0070: 0c818801 6c110000 d8100000 44100000  ....l.......D...
  0080: b00f0000 1c0f0000 d49e2801 409e2801  ..........(.@.(.
  0090: ac9d2801 189d2801 849c2801 f09b2801  ..(...(...(...(.
  00a0: 5c9b2801 c89a2801 349a2801 a0992801  \.(...(.4.(...(.
  00b0: 0c992801 78982801 e4972801 50972801  ..(.x.(...(.P.(.
  00c0: bc962801 28962801 94952801 00952801  ..(.(.(...(...(.
  00d0: 6c942801 d8932801 44932801 b0922801  l.(...(.D.(...(.
  00e0: 1c922801 88912801 00000000           ..(...(.....

<Data> ------
 Item   1 -- Length:  196  Offset: 4292 (0x10c4)  Flags: USED
  XID: min (2)  CMIN|XMAX: 211  CMAX|XVAC: 469
  Block Id: 0  linp Index: 1   Attributes: 24   Size: 28
 infomask: 0x2912 (HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED)

  10c4: 02000000 d3000000 d5010000 00000000  ................
  10d4: 01001800 12291c00 cc420000 7461626c  .....)...B..tabl
  10e4: 655f636f 6e737472 61696e74 73000000  e_constraints...
  10f4: 00000000 00000000 00000000 00000000  ................
  1104: 00000000 00000000 00000000 00000000  ................
  1114: 00000000 00000000 00000000 51420000  ............QB..
  1124: cd420000 01000000 00000000 cc420000  .B...........B..
  1134: 00000000 00000000 00000000 00000000  ................
  1144: 00007600 09000000 00000000 00000000  ..v.............
  1154: 00000100 30000000 01000000 00000000  ....0...........
  1164: 09040000 02000000 00000000 01000000  ................
  1174: 01000000 7f803f40 00000000 01000000  ......?@........
  1184: 02000000                             ....

 Item   2 -- Length:  196  Offset: 4096 (0x1000)  Flags: USED
  XID: min (2)  CMIN|XMAX: 215  CMAX|XVAC: 469
  Block Id: 0  linp Index: 2   Attributes: 24   Size: 28
  infomask: 0x2912 (HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED)

  1000: 02000000 d7000000 d5010000 00000000  ................
  1010: 02001800 12291c00 d0420000 7461626c  .....)...B..tabl
  1020: 655f7072 6976696c 65676573 00000000  e_privileges....
  1030: 00000000 00000000 00000000 00000000  ................
  1040: 00000000 00000000 00000000 00000000  ................
  1050: 00000000 00000000 00000000 51420000  ............QB..
  1060: d1420000 01000000 00000000 d0420000  .B...........B..
  1070: 00000000 00000000 00000000 00000000  ................
  1080: 00007600 08000000 00000000 00000000  ..v.............
  1090: 00000100 30000000 01000000 00000000  ....0...........
  10a0: 09040000 02000000 00000000 01000000  ................
  10b0: 01000000 7f803f40 00000000 01000000  ......?@........
  10c0: 02000000                             ....

 Item   3 -- Length:  196  Offset: 3900 (0x0f3c)  Flags: USED
  XID: min (2)  CMIN|XMAX: 219  CMAX|XVAC: 469
  Block Id: 0  linp Index: 3   Attributes: 24   Size: 28

Dan.
-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Tuesday, June 22, 2004 3:36 PM
To: Shea,Dan [CIS]
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] after using pg_resetxlog, db lost

"Shea,Dan [CIS]" <Dan.Shea@ec.gc.ca> writes:
> The pg_resetxlog was run as root. It caused ownership problems of
> pg_control and xlog files.
> Now we have no access to the data now through psql.  The data is still
> there under /var/lib/pgsql/data/base/17347  (PWFPM_DEV DB name).  But
> there is no reference to 36 of our tables in pg_class.  Also the 18
> other tables that are reported in this database have no data in them.
> Is there anyway to have the database resync or make it aware of the data
> under /var/lib/pgsql/data/base/17347?
> How can this problem be resolved?

What this sounds like is that you reset the transaction counter along
with the xlog, so that those tables appear to have been created by
transactions "in the future".  This could be repaired by doing
pg_resetxlog with a more appropriate initial transaction ID, but
figuring out what that value should be is not easy :-(

What I'd suggest is grabbing pg_filedump from
http://sources.redhat.com/rhdb/
and using it to look through pg_class (which will be file
$PGDATA/base/yourdbnumber/1259) to see the highest transaction ID
mentioned in any row of pg_class.  Then pg_resetxlog with a value
a bit larger than that.  Now you should be able to see all the rows
in pg_class ... but this doesn't get you out of the woods yet, unless
there are very-recently-created tables shown in pg_class.  I'd suggest
next looking through whichever tables you know to be recently modified
to find the highest transaction ID mentioned in them, and finally doing
another pg_resetxlog with a value a few million greater than that.  Then
you should be okay.

The reason you need to do this in two steps is that you'll need to look
at pg_class.relfilenode to get the file names of your recently-modified
tables.  Do NOT modify the database in any way while you are running
with the intermediate transaction ID setting.

> Jun 22 13:38:23 murphy postgres[1460]: [2-1] ERROR:  xlog flush request
> 210/E757F150 is not satisfied --- flushed only to 0/2074CA0

Looks like you also need a larger initial WAL offset in your
pg_resetxlog command.  Unlike the case with transaction IDs, there's
no need to try to be somewhat accurate in the setting --- I'd just
use a number WAY beyond what you had, maybe like 10000/0.

Finally, the fact that all this happened suggests that you lost the
contents of pg_control (else pg_resetxlog would have picked up the right
values from it).  Be very sure that you run pg_resetxlog under the same
locale settings (LC_COLLATE,LC_CTYPE) that you initially initdb'd with.
Otherwise you're likely to have nasty index-corruption problems later.

Good luck.  Next time, don't let amateurs fool with pg_resetxlog (and
anyone who'd run it as root definitely doesn't know what they're doing).
It is a wizard's tool.  Get knowledgeable advice from the PG lists
before you use it rather than after.

            regards, tom lane

Re: after using pg_resetxlog, db lost

From

Tom Lane

Date:

24 June 2004, 00:41:48

"Shea,Dan [CIS]" <Dan.Shea@ec.gc.ca> writes:
> Tom I see you from past emails that you reference using -i -f with
> pg_filedump.  I have tried this, but do not know what I am looking at.

What you want to look at is valid XMIN and XMAX values.  In this
example:

>  Item   1 -- Length:  196  Offset: 4292 (0x10c4)  Flags: USED
>   XID: min (2)  CMIN|XMAX: 211  CMAX|XVAC: 469
>   Block Id: 0  linp Index: 1   Attributes: 24   Size: 28
>  infomask: 0x2912 (HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED)

the infomask shows XMIN_COMMITTED, so xmin (here 2) is valid, but it also
shows XMAX_INVALID, so the putative XMAX (211) should be ignored.

In general the xmin field should be valid, but xmax shares storage with
cmin and so you have to look at the infomask bits to know whether to
believe that the cmin/xmax field represents a transaction ID.

The cmax/xvac field could also hold a transaction ID.  If I had only
the above data to go on, I'd guess that the current transaction counter
is at least 469.

Under normal circumstances, command counter values (cmin or cmax) are
unlikely to exceed a few hundred, while the transaction IDs you are
looking for are likely to be much larger.  So you could get away with
just computing the max of *all* the numbers you see in xmin, cmin/xmax,
or cmax/cvac, and then using something a million or so bigger for safety
factor.

            regards, tom lane

Re: after using pg_resetxlog, db lost

From

"Shea,Dan [CIS]"

Date:

24 June 2004, 13:15:22

I determined the largest was 12,293,162 and set it to
pg_resetxlog -x 15000000 /var/lib/pgsql/data

I am now able to see all the data.

I actually checked the log for the previous successfull startup before it the pg_control file was reset and it reported

Jun 22 11:55:44 pascal postgres[24993]: [5-1] LOG:  next transaction ID: 14820367; next OID: 727013114

So I entered
pg_resetxlog -o 750000000 /var/lib/pgsql/data  Setting oid value

I couldn't set 10000/0, so tried below
pg_resetxlog -l 10000,0 /var/lib/pgsql/data

This seems to be wrong because the databse is complaining and shutting down
Jun 24 15:02:05 murphy postgres[28061]: [6-1] LOG:  checkpoint record is at 2710/1000050
Jun 24 15:02:05 murphy postgres[28061]: [7-1] LOG:  redo record is at 2710/1000050; undo record is at 0/0; shutdown
TRUE
Jun 24 15:02:05 murphy postgres[28061]: [8-1] LOG:  next transaction ID: 15000010; next OID: 750000000
Jun 24 15:02:05 murphy postgres[28061]: [9-1] LOG:  database system was not properly shut down; automatic recovery in
progress
Jun 24 15:02:05 murphy postgres[28062]: [5-1] FATAL:  the database system is starting up
Jun 24 15:02:05 murphy postgres[28063]: [5-1] FATAL:  the database system is starting up
Jun 24 15:02:05 murphy postgres[28061]: [10-1] LOG:  redo starts at 2710/1000090
Jun 24 15:02:05 murphy postgres[28061]: [11-1] PANIC:  could not access status of transaction 15000030
Jun 24 15:02:05 murphy postgres[28061]: [11-2] DETAIL:  could not read from file "/var/lib/pgsql/data/pg_clog/000E" at
offset73728: Success 
Jun 24 15:02:05 murphy postgres[24771]: [5-1] LOG:  startup process (PID 28061) was terminated by signal 6
Jun 24 15:02:05 murphy postgres[24771]: [6-1] LOG:  aborting startup due to startup process failure
Jun 24 15:50:51 murphy sshd(pam_unix)[690]: session opened for user root by (uid=0)
Jun 24 15:54:47 murphy su(pam_unix)[1541]: session opened for user postgres by root(uid=0)
Jun 24 16:03:47 murphy su(pam_unix)[2911]: session opened for user postgres by root(uid=0)
Jun 24 16:03:48 murphy su(pam_unix)[2911]: session closed for user postgres
Jun 24 16:03:48 murphy postgres[3182]: [1-1] LOG:  could not create IPv6 socket: Address family not supported by
protocol
Jun 24 16:03:48 murphy postgres[3188]: [2-1] LOG:  database system was interrupted while in recovery at 2004-06-24
15:02:05GMT 
Jun 24 16:03:48 murphy postgres[3188]: [2-2] HINT:  This probably means that some data is corrupted and you will have
touse the last backup for recovery. 
Jun 24 16:03:48 murphy postgres[3188]: [3-1] LOG:  checkpoint record is at 2710/1000050
Jun 24 16:03:48 murphy postgres[3188]: [4-1] LOG:  redo record is at 2710/1000050; undo record is at 0/0; shutdown TRUE
Jun 24 16:03:48 murphy postgres[3188]: [5-1] LOG:  next transaction ID: 15000010; next OID: 750000000
Jun 24 16:03:48 murphy postgres[3188]: [6-1] LOG:  database system was not properly shut down; automatic recovery in
progress
Jun 24 16:03:48 murphy postgres[3188]: [7-1] LOG:  redo starts at 2710/1000090
Jun 24 16:03:48 murphy postgres[3188]: [8-1] PANIC:  could not access status of transaction 15000030
Jun 24 16:03:48 murphy postgres[3188]: [8-2] DETAIL:  could not read from file "/var/lib/pgsql/data/pg_clog/000E" at
offset73728: Success 
Jun 24 16:03:48 murphy postgres[3182]: [2-1] LOG:  startup process (PID 3188) was terminated by signal 6
Jun 24 16:03:48 murphy postgres[3182]: [3-1] LOG:  aborting startup due to startup process failure

How do I set the xlog properly, or rather to 10000/0?
Dan.
-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Wednesday, June 23, 2004 11:41 PM
To: Shea,Dan [CIS]
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] after using pg_resetxlog, db lost

"Shea,Dan [CIS]" <Dan.Shea@ec.gc.ca> writes:
> Tom I see you from past emails that you reference using -i -f with
> pg_filedump.  I have tried this, but do not know what I am looking at.

What you want to look at is valid XMIN and XMAX values.  In this
example:

>  Item   1 -- Length:  196  Offset: 4292 (0x10c4)  Flags: USED
>   XID: min (2)  CMIN|XMAX: 211  CMAX|XVAC: 469
>   Block Id: 0  linp Index: 1   Attributes: 24   Size: 28
>  infomask: 0x2912 (HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED)

the infomask shows XMIN_COMMITTED, so xmin (here 2) is valid, but it also
shows XMAX_INVALID, so the putative XMAX (211) should be ignored.

In general the xmin field should be valid, but xmax shares storage with
cmin and so you have to look at the infomask bits to know whether to
believe that the cmin/xmax field represents a transaction ID.

The cmax/xvac field could also hold a transaction ID.  If I had only
the above data to go on, I'd guess that the current transaction counter
is at least 469.

Under normal circumstances, command counter values (cmin or cmax) are
unlikely to exceed a few hundred, while the transaction IDs you are
looking for are likely to be much larger.  So you could get away with
just computing the max of *all* the numbers you see in xmin, cmin/xmax,
or cmax/cvac, and then using something a million or so bigger for safety
factor.

            regards, tom lane

Re: after using pg_resetxlog, db lost

From

Tom Lane

Date:

24 June 2004, 13:34:18

"Shea,Dan [CIS]" <Dan.Shea@ec.gc.ca> writes:
> I determined the largest was 12,293,162 and set it to
> pg_resetxlog -x 15000000 /var/lib/pgsql/data

Okay, but it looks like you will also need to adjust pg_clog to cover
that transaction ID range.  (I had thought pg_resetxlog would handle
this for you, but it looks like not.)

> Jun 24 15:02:05 murphy postgres[28061]: [11-1] PANIC:  could not access status of transaction 15000030
> Jun 24 15:02:05 murphy postgres[28061]: [11-2] DETAIL:  could not read from file "/var/lib/pgsql/data/pg_clog/000E"
atoffset 73728: Success 

You need to append zeroes (8K at a time) to
/var/lib/pgsql/data/pg_clog/000E until it's longer than 73728 bytes.
I'd use something like
    dd bs=8k count=1 </dev/zero >>/var/lib/pgsql/data/pg_clog/000E
assuming that your system has /dev/zero.

            regards, tom lane

Re: after using pg_resetxlog, db lost

From

"Shea,Dan [CIS]"

Date:

24 June 2004, 14:17:31

Tom, thank you for your help.
I increased 000E to 81920 and the databse is working now.
We are using RHAS 3.0 and it does have /dev/zero.

Dan.

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Thursday, June 24, 2004 12:34 PM
To: Shea,Dan [CIS]
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] after using pg_resetxlog, db lost


"Shea,Dan [CIS]" <Dan.Shea@ec.gc.ca> writes:
> I determined the largest was 12,293,162 and set it to
> pg_resetxlog -x 15000000 /var/lib/pgsql/data

Okay, but it looks like you will also need to adjust pg_clog to cover
that transaction ID range.  (I had thought pg_resetxlog would handle
this for you, but it looks like not.)

> Jun 24 15:02:05 murphy postgres[28061]: [11-1] PANIC:  could not access status of transaction 15000030
> Jun 24 15:02:05 murphy postgres[28061]: [11-2] DETAIL:  could not read from file "/var/lib/pgsql/data/pg_clog/000E"
atoffset 73728: Success 

You need to append zeroes (8K at a time) to
/var/lib/pgsql/data/pg_clog/000E until it's longer than 73728 bytes.
I'd use something like
    dd bs=8k count=1 </dev/zero >>/var/lib/pgsql/data/pg_clog/000E
assuming that your system has /dev/zero.

            regards, tom lane