Thread: A problem

A problem

From
Sam Barnett-Cormack
Date:
I have a problem partly caused by stupidity, which I desperately need to
fix if at all possible. Me and a colleague have tried fixing it and
may've made it worse...

I have a reasonably big db (21GB on disk, fully vaccuumed). I upgraded
from 7.3.2 to 7.3.3, forgetting to shut it down cleanly first. It then
got killed, for reasons best forgotten, with signal 15 to th main
postmaster.

On attempting to restart it, an error with the initscript, which came in
teh source RPM, led to it partially stomping some bits of the DB,
including pg_control in global. I have the old xlogs, clogs, everything.
I removed the 'new' data subdirectory. The init script was pointed at
/db/data when it should've been just /db, I still have the old
/data/base dir, all that has been altered in there (recent mtime) is
base/1175459/pg_internal.init

Attemtping to use the pg_resetxlog command (having backed up pg_control
and all the xlogs), we managed to start the server again - however, on
connecting to it with psql, it still had the right DB in teh cluster
('stats') but it claimed this db was completely empty.

I have kept everything I've tried reversible as far as my and my
colleagues knowledge allowed.

I hope someone can suggest something - I'm desperate! It'll take weeks
to re-input that data automatically.

Thanks

--

Sam Barnett-Cormack
Software Developer                           |  Student of Physics & Maths
UK Mirror Service (http://www.mirror.ac.uk)  |  Lancaster University

Re: A problem

From
Tom Lane
Date:
Sam Barnett-Cormack <s.barnett-cormack@lancaster.ac.uk> writes:
> [ accidentally clobbered pg_control ]

It sounds like the main problem is too small a value in the XID
(transaction ID) counter, so that all your data tuples appear to be in
the future.  pg_resetxlog has an option to force an initial XID but
you need to use it.  You can guess a suitable XID by looking at the
names of the old clog files --- you want something that will be past
the end of the newest clog file, keeping in mind that there are 1M
(1024^2) XIDs per clog file and that the clog file names are in hex.
(I am not sure, but you might also need to provide a zero-filled clog
file spanning the point you start XID at.)

You will also need to be sure that the WAL file position
is greater than the end of the old WAL --- again, look at the existing
WAL files to see the largest number, and pick something bigger.

BTW, make real sure that pg_control is set up with the same locale
values as you were using before, or you'll be in a world of hurt
with corrupt indexes.  Use pg_controldata to look at what you have put
into pg_control.  (If you're not sure what was there before, get an
old copy of pg_control off your last backup tape ... you do have a
filesystem backup someplace, I trust ...)

> I still have the old
> /data/base dir, all that has been altered in there (recent mtime) is
> base/1175459/pg_internal.init

You can and probably should blow away pg_internal.init --- it's just a
cache file and can be rebuilt.  I'd not trust it at this point.

            regards, tom lane

Re: A problem

From
Sam Barnett-Cormack
Date:
Late as it is, I'll try this monday. I hope it works, other suggestions
still welcome of course, thanks very much.

On Fri, 27 Jun 2003, Tom Lane wrote:

> Sam Barnett-Cormack <s.barnett-cormack@lancaster.ac.uk> writes:
> > [ accidentally clobbered pg_control ]
>
> It sounds like the main problem is too small a value in the XID
> (transaction ID) counter, so that all your data tuples appear to be in
> the future.  pg_resetxlog has an option to force an initial XID but
> you need to use it.  You can guess a suitable XID by looking at the
> names of the old clog files --- you want something that will be past
> the end of the newest clog file, keeping in mind that there are 1M
> (1024^2) XIDs per clog file and that the clog file names are in hex.
> (I am not sure, but you might also need to provide a zero-filled clog
> file spanning the point you start XID at.)
>
> You will also need to be sure that the WAL file position
> is greater than the end of the old WAL --- again, look at the existing
> WAL files to see the largest number, and pick something bigger.
>
> BTW, make real sure that pg_control is set up with the same locale
> values as you were using before, or you'll be in a world of hurt
> with corrupt indexes.  Use pg_controldata to look at what you have put
> into pg_control.  (If you're not sure what was there before, get an
> old copy of pg_control off your last backup tape ... you do have a
> filesystem backup someplace, I trust ...)
>
> > I still have the old
> > /data/base dir, all that has been altered in there (recent mtime) is
> > base/1175459/pg_internal.init
>
> You can and probably should blow away pg_internal.init --- it's just a
> cache file and can be rebuilt.  I'd not trust it at this point.
>
>             regards, tom lane
>

--

Sam Barnett-Cormack
Software Developer                           |  Student of Physics & Maths
UK Mirror Service (http://www.mirror.ac.uk)  |  Lancaster University