Thread: how to use pg_resetxlog

how to use pg_resetxlog

From
"Johnson, Shaunn"
Date:

Howdy

Running PostgreSQL 7.2.1 on RedHat Linux

This past weekend, the RAID array took a hit and crashed
the database.  I get a few errors like:

[snip]
DEBUG:  invalid secondary checkpoint record
FATAL 2:  unable to locate a valid checkpoint record
DEBUG:  proc_exit(2)
DEBUG:  shmem_exit(2)
DEBUG:  exit(2)
DEBUG:  reaping dead processes
DEBUG:  startup process (pid 23543) exited with exit code 2
DEBUG:  aborting startup due to startup process failure
[/snip]

I'm reading around and I see a bit of information about
using pg_resetxlog, but I don't know much about it.

1) How do you use  pg_resetxlog?
2) Am I at risk of losing all of my data
   if I use this?
3) Some info at groups.google.com suggest that I will
   have to initdb - I figured that would be good anyway, but
   how does pg_resetxlog play into this?

TIA!

-X

Re: how to use pg_resetxlog

From
Tom Lane
Date:
"Johnson, Shaunn" <SJohnson6@bcbsm.com> writes:
> 1) How do you use  pg_resetxlog?

Read the README file for it.

> 2) Am I at risk of losing all of my data
>    if I use this?

You're at risk of having corrupted data, but I think that's true already
because of the disk failure.

> 3) Some info at groups.google.com suggest that I will
>    have to initdb - I figured that would be good anyway, but
>    how does pg_resetxlog play into this?

After pg_resetxlog you should be able to dump your database.  I'd
recommend doing that, then initdb, reload the dump, look for
inconsistencies ...

            regards, tom lane

Re: how to use pg_resetxlog

From
Tom Lane
Date:
"Johnson, Shaunn" <SJohnson6@bcbsm.com> writes:
> Running PostgreSQL 7.2.1 on RedHat Linux

PS: if you are still using 7.2.* then you should *definitely* be on
7.2.4.  Since your DB is down anyway, this seems like a good time for an
upgrade ...

            regards, tom lane

Re: how to use pg_resetxlog

From
"Johnson, Shaunn"
Date:

--Thanks to all that replied

--I did the pg_resetxlog -f $PGDATA and
--it generated a new log, but when I try to
--restart the database, it fails.  I look in the
--logs and I see:

[snip]
Oct 21 01:32:13 localsvr postgres[2598]: [1] FATAL 2: 
control file context is broken
Oct 21 01:32:14 localsvr postgresql: Starting postgresql
service:  failed
[/snip]

--There doesn't seem to be much info in groups.google.com
--about this message.  What does this mean and how can
--I fix the DB (so I can dump it into a new version)?

--Thanks!

[snip from past email]
This past weekend, the RAID array took a hit and crashed
the database.  I get a few errors like:
[snip]
DEBUG:  invalid secondary checkpoint record
FATAL 2:  unable to locate a valid checkpoint record
DEBUG:  proc_exit(2)
DEBUG:  shmem_exit(2)
DEBUG:  exit(2)
DEBUG:  reaping dead processes
DEBUG:  startup process (pid 23543) exited with exit code 2
DEBUG:  aborting startup due to startup process failure
[/snip]

[/snip from past email]

-X

Re: how to use pg_resetxlog

From
"Johnson, Shaunn"
Date:

--silly question:

--is it possible to build a new DB and point it to
--the existing location  / files of the old DB
--and mount data that way?

--i know ...  i'm grasping for straws ... this is
--just about a retarded / rhetorical question ...

-X

Re: how to use pg_resetxlog

From
Tom Lane
Date:
"Johnson, Shaunn" <SJohnson6@bcbsm.com> writes:
> --I did the pg_resetxlog -f $PGDATA and
> --it generated a new log, but when I try to
> --restart the database, it fails.  I look in the
> --logs and I see:

> [snip]
> Oct 21 01:32:13 localsvr postgres[2598]: [1] FATAL 2:
> control file context is broken

Are you sure you used the right version of pg_resetxlog?  This seems to
indicate that pg_resetxlog produced a bogus pg_control file.
pg_controldata might help to identify the problem.

            regards, tom lane

Re: how to use pg_resetxlog

From
"Johnson, Shaunn"
Date:

 
--you're right about the contrib version - it's wrong.
--can i d/l an old version and recompile it?

-X
-----Original Message-----
From: Tom Lane
To: Johnson, Shaunn
Cc: pgsql-general@postgresql.org
Sent: 10/21/03 10:17 AM
Subject: Re: [GENERAL] how to use pg_resetxlog

"Johnson, Shaunn" <SJohnson6@bcbsm.com> writes:
> --I did the pg_resetxlog -f $PGDATA and
> --it generated a new log, but when I try to
> --restart the database, it fails.  I look in the
> --logs and I see:

> [snip]
> Oct 21 01:32:13 localsvr postgres[2598]: [1] FATAL 2: 
> control file context is broken

Are you sure you used the right version of pg_resetxlog?  This seems to
indicate that pg_resetxlog produced a bogus pg_control file.
pg_controldata might help to identify the problem.

                        regards, tom lane

Re: how to use pg_resetxlog

From
Tom Lane
Date:
"Johnson, Shaunn" <SJohnson6@bcbsm.com> writes:
> --you're right about the contrib version - it's wrong.
> --can i d/l an old version and recompile it?

Should work.  You'll need a complete source tarball for 7.2.*.

            regards, tom lane

Re: how to use pg_resetxlog

From
"Johnson, Shaunn"
Date:

 --howdy:

--installed the 7.2.1 tarball and configured / gmake / install
--the pg_resetxlog.  when i restarted postmaster, it says 'ok',
--but when i tried to get in via the command line, i got this:
[snip]
psql: FATAL 1:  cannot open pg_attribute: No such file or directory
[/snip]

--did i miss a step?

--thanks again for all of your help!

-X

-----Original Message-----
From: Tom Lane
To: Johnson, Shaunn
Cc: 'pgsql-general@postgresql.org '
Sent: 10/21/03 11:15 AM
Subject: Re: [GENERAL] how to use pg_resetxlog

"Johnson, Shaunn" <SJohnson6@bcbsm.com> writes:
> --you're right about the contrib version - it's wrong.
> --can i d/l an old version and recompile it?

Should work.  You'll need a complete source tarball for 7.2.*.

                        regards, tom lane

Re: how to use pg_resetxlog

From
Tom Lane
Date:
"Johnson, Shaunn" <SJohnson6@bcbsm.com> writes:
> --installed the 7.2.1 tarball and configured / gmake / install
> --the pg_resetxlog.  when i restarted postmaster, it says 'ok',
> --but when i tried to get in via the command line, i got this:
> [snip]
> psql: FATAL 1:  cannot open pg_attribute: No such file or directory
> [/snip]

> --did i miss a step?

You probably used too small a value for the initial XID setting.  You
need to redo pg_resetxlog with a -x value larger than the number of
transactions elapsed in your database.  You can determine a suitable
number to use by looking in $PGDATA/pg_clog to see the largest file
number there (beware, they are in hex).  Add one, then multiply by 1024*1024.

            regards, tom lane

Re: how to use pg_resetxlog

From
"Johnson, Shaunn"
Date:

 --i'm still not sure what's going on.

--i did the following
[snip]
bash-2.05$ export PGDATA=/var/lib/pgsql/data/
bash-2.05$ echo $PGDATA
/var/lib/pgsql/data/
bash-2.05$ ./pg_resetxlog -x 11534336 $PGDATA
XLOG reset.

---

bash-2.05$ ./pg_resetxlog -n /var/lib/pgsql/data/
pg_control values:

pg_control version number:            71
Catalog version number:               200201121
Current log file id:                  1014
Next log file segment:                57
Latest checkpoint's StartUpID:        25
Latest checkpoint's NextXID:          11534337
Latest checkpoint's NextOID:          2528235005
Database block size:                  8192
Blocks per segment of large relation: 131072
LC_COLLATE:                           C
LC_CTYPE:                             C

[/snip]

--and even though the db says it's okay, i still
--get this :

[snip]
psql: FATAL 1:  cannot open pg_attribute: No such file or directory
[/snip]

--i realize i must be getting rather boring right about
--now, but i really appreciate the info / guidance.
--thanks!

-X

-----Original Message-----
From: Tom Lane
To: Johnson, Shaunn
Cc: ''pgsql-general@postgresql.org ' '
Sent: 10/21/03 12:03 PM
Subject: Re: [GENERAL] how to use pg_resetxlog

"Johnson, Shaunn" <SJohnson6@bcbsm.com> writes:
> --installed the 7.2.1 tarball and configured / gmake / install
> --the pg_resetxlog.  when i restarted postmaster, it says 'ok',
> --but when i tried to get in via the command line, i got this:
> [snip]
> psql: FATAL 1:  cannot open pg_attribute: No such file or directory
> [/snip]

> --did i miss a step?

You probably used too small a value for the initial XID setting.  You
need to redo pg_resetxlog with a -x value larger than the number of
transactions elapsed in your database.  You can determine a suitable
number to use by looking in $PGDATA/pg_clog to see the largest file
number there (beware, they are in hex).  Add one, then multiply by
1024*1024.

                        regards, tom lane

Re: how to use pg_resetxlog

From
Tom Lane
Date:
"Johnson, Shaunn" <SJohnson6@bcbsm.com> writes:
> --i realize i must be getting rather boring right about
> --now, but i really appreciate the info / guidance.

Somewhere along here you may need to give up and revert to whatever your
most recent backup is :-(.

Taking a second look at the error message:

>> psql: FATAL 1:  cannot open pg_attribute: No such file or directory

I realized that this probably isn't an XID issue as I first thought.
Either the pg_attribute file is actually missing (is there a file named
'1249' in the database directory?) or the relfilenode value for it is
clobbered in pg_class.  In the latter case you could maybe find and fix
the erroneous data with a hex editor, but one would have to wonder what
else has been clobbered during the disk crash.

Thinking about it, it could be that the clobber is only in the database's
pg_internal.init cache file --- you could try renaming that out of the
way to see if the system can successfully make a new one.

Also, this failure should only affect one database in the cluster ---
can you connect to any other ones?

            regards, tom lane

Re: how to use pg_resetxlog

From
Dennis Gearon
Date:
Tom Lane wrote:

>I realized that this probably isn't an XID issue as I first thought.
>Either the pg_attribute file is actually missing (is there a file named
>'1249' in the database directory?) or the relfilenode value for it is
>clobbered in pg_class.  In the latter case you could maybe find and fix
>the erroneous data with a hex editor, but one would have to wonder what
>else has been clobbered during the disk crash.
>
>Thinking about it, it could be that the clobber is only in the database's
>pg_internal.init cache file --- you could try renaming that out of the
>way to see if the system can successfully make a new one.
>
>Also, this failure should only affect one database in the cluster ---
>can you connect to any other ones?
>
>            regards, tom lane
>
>
Is there in any graphical picture of the system files, utilities, and
how the interact with each other?

Or in how to set up Postgres?

I would be willing to draw it up if someone wanted to send me pencil
drawings and go throught the editing iterations with me.