Re: Corrupt index - Mailing list pgsql-general

From Amir Becher
Subject Re: Corrupt index
Msg-id 20030410203247.63208.qmail@web13902.mail.yahoo.com
In response to Re: Corrupt index  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Corrupt index  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
The source of the problem was the VERITAS Backup Exec.
I was able to replicate the index corruption in under
a minute by running a backup and updating the database
at the same time. I had never been able to replicate
the problem before because I was testing during the
day, when the backup was not running.

As far as backups are concerned, we will no longer
back up the data directory itself - that was clearly a
dumb thing to do in the first place. We actually have
been backing up the data using pg_dumpall as well (so
there is still hope for us).
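
For anyone scripting the pg_dumpall route mentioned above, here is a
minimal sketch in Python, not the script we actually run. The helper
names, the default "postgres" user, and the ".partial" temp-file suffix
are assumptions made for illustration; only the pg_dumpall options
-U, -h and -p are taken from the tool itself.

```python
import subprocess
from pathlib import Path


def build_dump_command(user="postgres", host=None, port=None):
    """Return the pg_dumpall argument list (hypothetical helper)."""
    cmd = ["pg_dumpall", "-U", user]
    if host is not None:
        cmd += ["-h", host]
    if port is not None:
        cmd += ["-p", str(port)]
    return cmd


def run_logical_backup(out_path, **conn):
    """Write a pg_dumpall SQL dump to out_path; raise if the dump fails."""
    out_path = Path(out_path)
    # Dump to a temporary name first so a failed run never leaves a
    # truncated file under the final name.
    tmp_path = out_path.with_name(out_path.name + ".partial")
    with open(tmp_path, "wb") as out:
        # pg_dumpall writes the SQL script to stdout; check=True raises
        # CalledProcessError on a non-zero exit status.
        subprocess.run(build_dump_command(**conn), stdout=out, check=True)
    tmp_path.replace(out_path)
    return out_path
```

The dump is taken through a normal database connection, so it stays
consistent even while the database is being updated, and the resulting
file (rather than the live data directory) is what the tape backup
should pick up.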

Thanks for all the help - I greatly appreciate it.


--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Amir Becher <abecher@yahoo.com> writes:
> > I don't know if this may have something to do with it,
> > but we do backup the data every night using VERITAS
> > Backup Exec. We are not restoring anything, though
> > (the data is backed up to tape). The VERITAS software
> > runs on Windows, but there is an agent that runs on
> > our Linux box where the PostgreSQL data is stored. I
> > should also mention that the backup is running while
> > the database is being modified (we modify the database
> > 24/7).
>
> You're wasting your time making such a backup --- if you ever have to
> use it, it'll be corrupt, because the individual files in the database
> won't be in sync.  But that's not the immediate problem.
>
> > There is another unexpected behavior that I noticed
> > for the first time this morning (so I am not sure if
> > it's recurring, related or relevant). The database
> > "blinked" in the sense that all database connections
> > were lost - but new connections could be obtained
> > immediately after the "blink". The error message that
> > I got said something about possible "corrupted shared
> > memory" and I guess the shutting down of the
> > connections was a precautionary measure.
>
> That sounds like a backend crash, all right.  Given that, I'm thinking
> that you have more extensive problems than just this one symptom.  The
> odds are good that it's a hardware issue, because we haven't heard any
> reports of comparable misbehavior from anyone else.
>
> I'd recommend running some hardware diagnostics --- memtest86 and
> badblocks seem to be the most widely used, although they aren't always
> able to find problems.
>
> It would also be a good idea to start taking some *real* backups, using
> pg_dump or pg_dumpall.  You will be lucky if you don't find any more
> serious corruption in the database, if I'm right that there's hardware
> flakiness involved.  You may find yourself forced to initdb and restore
> from a backup, so you'd better have one.
>
>             regards, tom lane

