Re: Hosed PostGreSQL Installation - Mailing list pgsql-hackers

From Pete St. Onge
Subject Re: Hosed PostGreSQL Installation
Date
Msg-id 20020924002936.H31893@moria.seul.org
Whole thread Raw
In response to Re: Hosed PostGreSQL Installation  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Just following up on Tom Lane's email - 

A couple of things that I hadn't mentioned: After bringing up the
machine, the first thing I did before mucking about with PostGreSQL was
to tarball $PGDATA so that I'd have a second chance if I messed up. I
then ran pg_resetlog -f the first time, as Tom surmised, with the
unwanted results. 

That done, I sent out the email, and followed Tom's instructions (yay
backups!) and did it properly.

On Sat, Sep 21, 2002 at 11:13:44AM -0400, Tom Lane wrote:
> "Pete St. Onge" <pete@seul.org> writes:
> 
> That should not have been a catastrophic mistake in any version >= 7.1.
> I suspect you had disk problems or other problems.    We did, but these were on a different disk according to the
logs,
AFAIK. 

> These numbers are suspiciously small for an installation that's been
> in production awhile.  I suspect you have not told us the whole story;
> in particular I suspect you already tried "pg_resetxlog -f", which was
> probably not a good idea.    *raises hand* Yep.

Here's the contents of the pg_xlog directory. PGSQL has only been used
here for approximately 4 months of fairly light use, so perhaps the
numbers aren't as strange as they could be (this is from the backup).

-rw-------    1 postgres postgres 16777216 Sep 19 22:09 000000020000007E


> Yeah, your xlog positions should be a great deal higher than they are,
> if segment 2/7E was previously in use.
> 
> It is likely that you can recover (with some uncertainty about integrity
> of recent transactions) if you proceed as follows:
> 
> 1. Get contrib/pg_resetxlog/pg_resetxlog.c from the 7.2.2 release ...
[Chomp]

The compile worked without a hitch after doing ./configure in the
top-level directory. I just downloaded the src for both trees, made the
changes manually, copied the file into the 7.1.3 tree and compiled it
there. 

> 2. Run the hacked-up pg_resetxlog like this:
> 
>     pg_resetxlog -l 2 127 -x 1000000000 $PGDATA
> 
> (the -l position is next beyond what we see in pg_xlog, the 1-billion
> XID is just a guess at something past where you were.  Actually, can
> you give us the size of pg_log, ie, $PGDATA/global/1269?  That would
> allow computing a correct next-XID to use.  Figure 4 XIDs per byte,
> thus if pg_log is 1 million bytes you need -x at least 4 million.)
    -rw-------    1 postgres postgres 11870208 Sep 19 17:00 1269
    This gives a min WAL starting location of 47480832. I used
47500000.


> 3. The postmaster should start now.    I had to use pg_resetxlog's force option, but yeah, it worked like
you said it would.

> 4. *Immediately* attempt to do a pg_dumpall.  Do not pass GO, do not
> collect $200, do not let in any interactive clients until you've done
> it. (I'd suggest tweaking pg_hba.conf to disable all logins but your
> own.)    I did not pass go, I did not collect $200. I *did* do a pg_dumpall
right there and then, and was able to dump everything I needed. One
of the projects uses large objects - image files and html files (don't
ask, I've already tried to dissuade the Powers-That-Be) - and these
didn't come out. However, since this stuff is entered via script, the
project leader was fine with re-running the scripts tomorrow.


> 5. If pg_dumpall succeeds and produces sane-looking output, then you've
> survived.  initdb, reload the dump file, re-open for business, go have
> a beer.  (Recommended: install 7.2.2 and reload into that, not 7.1.*.)
> You will probably still need to check for partially-applied recent
> transactions, but for the most part you should be OK.    rpm -Uvh'ed the 7.2.2 RPMs, initdb'd and reloaded data into
thenew
 
installation. Pretty painless. I've just sent out an email to folks here
to let them know the situation, and we should know in the next day or so
what is up.


> 6. If pg_dumpall fails then let us know what the symptoms are, and we'll
> see if we can figure out a workaround for whatever the corruption is.    I've kept the tarball with the corrupted
data.I'll hold onto it
 
for a bit, in case, but will likely expunge it in the next week or so.
If this can have a use for the project (whatever it may be), let me know
and I can burn it to DVD.
    Of course, without your help, Tom, there would be a lot of Very
Unhappy People here, me only being one of them. Many thanks for your
help and advice!
   Cheers,
   Pete 


-- 
Pete St. Onge
Research Associate, Computational Biologist, UNIX Admin
Banting and Best Institute of Medical Research
Program in Bioinformatics and Proteomics
University of Toronto
http://www.utoronto.ca/emililab/


pgsql-hackers by date:

Previous
From: Gavin Sherry
Date:
Subject: Web site
Next
From: "Marc G. Fournier"
Date:
Subject: Re: Web site