Thread: Corrupt index

Corrupt index

From

Amir Becher

Date:

10 April 2003, 15:22:06

I am experiencing frequent index corruptions (almost
daily). The tables have about a million records. I
solve the problem by using the REINDEX TABLE command
and rerunning my programs. Does anyone know why I
might be experiencing these problems? I am using batch
mode to insert a few thousand records at a time in
order to improve performance - could the index
corruption be caused by a bug in the batch mode? Also,
is there a way to determine whether an index is
corrupt (other than seeing inconsistencies in the
data)?

I am running PostgreSQL 7.3.1 on Red Hat 8.0. I access
and modify the database through Java (using the JDBC driver).

__________________________________________________
Do you Yahoo!?
Yahoo! Tax Center - File online, calculators, forms, and more
http://tax.yahoo.com

Re: Corrupt index

From

Doug McNaught

Date:

10 April 2003, 15:39:30

Amir Becher <abecher@yahoo.com> writes:

> I am experiencing frequent index corruptions (almost
> daily).

It's very likely to be a hardware problem, but try upgrading to the
current 7.3 release (7.3.2 I think) to make sure it's not an
already-fixed bug.

-Doug

Re: Corrupt index

From

Tom Lane

Date:

10 April 2003, 15:39:50

Amir Becher <abecher@yahoo.com> writes:
> I am experiencing frequent index corruptions (almost
> daily).

Please define your problem: what misbehavior are you actually seeing?
Give us facts, not interpretations.

            regards, tom lane

Re: Corrupt index

From

Amir Becher

Date:

10 April 2003, 16:21:24

Here is an example of the misbehavior that I am
seeing:

A simple table with 3 colums (and approximately a
million records):
column_a INTEGER PRIMARY KEY
column_b DATE NOT NULL
column_c DOUBLE NOT NULL

When the index (primary key) gets corrupted, I get the
following behavior when I do a simple SELECT (using
the psql program):

SELECT * FROM example_table WHERE (column_a = 12345);

The result looks like:

column_a | column_b   | column_c
--------------------------------
67890    | 2001-01-01 | 100

The main point is that "column_a" is the wrong number.

When I use the EXPLAIN command, it tells me that it's
doing an index scan using the primary key index.
Issuing a "REINDEX TABLE example_table" command fixes
the problem.

I usually detect the problem when an INSERT fails
(batch insert fails due to duplicate key). I also
delete a few thousand records before the inserts. All
of the inserts and deletes are done throught the JDBC3
driver.

Please let me know what additional information you
would like me to provide.

--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Amir Becher <abecher@yahoo.com> writes:
> > I am experiencing frequent index corruptions
> (almost
> > daily).
>
> Please define your problem: what misbehavior are you
> actually seeing?
> Give us facts, not interpretations.
>
>             regards, tom lane

__________________________________________________
Do you Yahoo!?
Yahoo! Tax Center - File online, calculators, forms, and more
http://tax.yahoo.com

Re: Corrupt index

From

Dennis Gearon

Date:

10 April 2003, 16:51:27

exactly what are your indexes on this, as well as any triggers?

Amir Becher wrote:
> Here is an example of the misbehavior that I am
> seeing:
>
> A simple table with 3 colums (and approximately a
> million records):
> column_a INTEGER PRIMARY KEY
> column_b DATE NOT NULL
> column_c DOUBLE NOT NULL
>
> When the index (primary key) gets corrupted, I get the
> following behavior when I do a simple SELECT (using
> the psql program):
>
> SELECT * FROM example_table WHERE (column_a = 12345);
>
> The result looks like:
>
> column_a | column_b   | column_c
> --------------------------------
> 67890    | 2001-01-01 | 100
>
> The main point is that "column_a" is the wrong number.
>
> When I use the EXPLAIN command, it tells me that it's
> doing an index scan using the primary key index.
> Issuing a "REINDEX TABLE example_table" command fixes
> the problem.
>
> I usually detect the problem when an INSERT fails
> (batch insert fails due to duplicate key). I also
> delete a few thousand records before the inserts. All
> of the inserts and deletes are done throught the JDBC3
> driver.
>
> Please let me know what additional information you
> would like me to provide.
>
> --- Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>>Amir Becher <abecher@yahoo.com> writes:
>>
>>>I am experiencing frequent index corruptions
>>
>>(almost
>>
>>>daily).
>>
>>Please define your problem: what misbehavior are you
>>actually seeing?
>>Give us facts, not interpretations.
>>
>>            regards, tom lane
>
>
>
> __________________________________________________
> Do you Yahoo!?
> Yahoo! Tax Center - File online, calculators, forms, and more
> http://tax.yahoo.com
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>

Re: Corrupt index

From

Tom Lane

Date:

10 April 2003, 17:04:05

Amir Becher <abecher@yahoo.com> writes:
> Here is an example of the misbehavior that I am
> seeing:
> [snip]
> The main point is that "column_a" is the wrong number.

Hmm.  This looks like you must have an index entry pointing at the wrong
heap tuple.  Which is not something we hear of regularly, and certainly
not an error that I'd expect to see appearing in isolation.

You didn't say in so many words, but should I take it that (a) this
pattern of misbehavior (wrong row returned) occurs repeatedly for you,
and (b) there are no other signs of problems (backend unexpectedly
quits, strange error messages, that sort of thing)?

I can't really think of any plausible mechanism for this to happen,
except for the index and table files getting out of sync somehow.
Are you doing anything that operates directly on the database files
(for example, doing backup/restore with cp or tar or some such)?

            regards, tom lane

Re: Corrupt index

From

Amir Becher

Date:

10 April 2003, 17:55:21

This pattern of misbehavior does occur repeatedly for
me.

I don't know if this may have something to do with it,
but we do backup the data every night using VERITAS
Backup Exec. We are not restoring anything, though
(the data is backed up to tape). The VERITAS software
runs on Windows, but there is an agent that runs on
our Linux box where the PostgreSQL data is stored. I
should also mention that the backup is running while
the database is being modified (we modify the database
24/7).

There is another unexpected behavior that I noticed
for the first time this morning (so I am not sure if
it's recurring, related or relevant). The database
"blinked" in the sense that all database connections
were lost - but new connections could be obtained
immediately after the "blink". The error message that
I got said something about possible "corrupted shared
memory" and I guess the shutting down of the
connections was a precautionary measure.

I hope this additional information helps.

--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Amir Becher <abecher@yahoo.com> writes:
> > Here is an example of the misbehavior that I am
> > seeing:
> > [snip]
> > The main point is that "column_a" is the wrong
> number.
>
> Hmm.  This looks like you must have an index entry
> pointing at the wrong
> heap tuple.  Which is not something we hear of
> regularly, and certainly
> not an error that I'd expect to see appearing in
> isolation.
>
> You didn't say in so many words, but should I take
> it that (a) this
> pattern of misbehavior (wrong row returned) occurs
> repeatedly for you,
> and (b) there are no other signs of problems
> (backend unexpectedly
> quits, strange error messages, that sort of thing)?
>
> I can't really think of any plausible mechanism for
> this to happen,
> except for the index and table files getting out of
> sync somehow.
> Are you doing anything that operates directly on the
> database files
> (for example, doing backup/restore with cp or tar or
> some such)?
>
>             regards, tom lane

__________________________________________________
Do you Yahoo!?
Yahoo! Tax Center - File online, calculators, forms, and more
http://tax.yahoo.com

Re: Corrupt index

From

Tom Lane

Date:

10 April 2003, 18:50:42

Amir Becher <abecher@yahoo.com> writes:
> I don't know if this may have something to do with it,
> but we do backup the data every night using VERITAS
> Backup Exec. We are not restoring anything, though
> (the data is backed up to tape). The VERITAS software
> runs on Windows, but there is an agent that runs on
> our Linux box where the PostgreSQL data is stored. I
> should also mention that the backup is running while
> the database is being modified (we modify the database
> 24/7).

You're wasting your time making such a backup --- if you ever have to
use it, it'll be corrupt, because the individual files in the database
won't be in sync.  But that's not the immediate problem.

> There is another unexpected behavior that I noticed
> for the first time this morning (so I am not sure if
> it's recurring, related or relevant). The database
> "blinked" in the sense that all database connections
> were lost - but new connections could be obtained
> immediately after the "blink". The error message that
> I got said something about possible "corrupted shared
> memory" and I guess the shutting down of the
> connections was a precautionary measure.

That sounds like a backend crash, all right.  Given that, I'm thinking
that you have more extensive problems than just this one symptom.  The
odds are good that it's a hardware issue, because we haven't heard any
reports of comparable misbehavior from anyone else.

I'd recommend running some hardware diagnostics --- memtest86 and
badblocks seem to be the most widely used, although they aren't always
able to find problems.

It would also be a good idea to start taking some *real* backups, using
pg_dump or pg_dumpall.  You will be lucky if you don't find any more
serious corruption in the database, if I'm right that there's hardware
flakiness involved.  You may find yourself forced to initdb and restore
from a backup, so you'd better have one.

            regards, tom lane

Re: Corrupt index

From

Amir Becher

Date:

10 April 2003, 18:52:12

I have noticed this misbehavior on three tables that
are somewhat different than the example I gave. So
here are the actual tables with trigger and index
information:

TABLE #1
--------
CREATE TABLE example1_table (
id INTEGER NOT NULL,
price FLOAT NOT NULL,
date DATE NOT NULL,
source TEXT);

CREATE UNIQUE INDEX example1_index ON example1_table
(id, date);

TABLE #2
--------
CREATE TABLE example2_table (
entry_number INTEGER PRIMARY KEY,
s_id INTEGER NOT NULL REFERENCES example4_table ON
UPDATE CASCADE ON DELETE NO ACTION,
a_number INTEGER NOT NULL REFERENCES example5_table ON
UPDATE CASCADE ON DELETE NO ACTION,
type TEXT NOT NULL CHECK (type IN ('D','C')),
a_id INTEGER NOT NULL,
price FLOAT NOT NULL,
quantity FLOAT NOT NULL,
currency CHAR(3) NOT NULL,
date DATE NOT NULL,
date_created DATE NOT NULL,
is_derived BOOLEAN NOT NULL,
description TEXT NOT NULL);

CREATE INDEX example2_index1 ON example2_table
(is_derived, date_created);
CREATE INDEX example2_index2 ON example2_table (s_id,
date);
CREATE INDEX example2_index3 ON example2_table (s_id,
a_id);
CREATE INDEX example2_index4 ON example2_table
(description);

TABLE #3
--------
CREATE TABLE example3_table (
a_id INTEGER NOT NULL,
s_id INTEGER NOT NULL REFERENCES example4_table ON
UPDATE CASCADE ON DELETE NO ACTION,
a_number INTEGER NOT NULL REFERENCES example5_table ON
UPDATE CASCADE ON DELETE NO ACTION,
date DATE NOT NULL,
is_alone BOOLEAN NOT NULL,
price FLOAT NOT NULL,
rate FLOAT NOT NULL,
currency CHAR(3) NOT NULL,
d_quantity FLOAT NOT NULL,
d_price FLOAT NOT NULL,
d_rate FLOAT NOT NULL,
c_quantity FLOAT NOT NULL,
c_price FLOAT NOT NULL,
c_rate FLOAT NOT NULL,

PRIMARY KEY (a_id, s_id, a_number, date, is_alone));

CREATE INDEX example3_table_index ON example3_table
(s_id, date, is_alone);




--- Dennis Gearon <gearond@cvc.net> wrote:
> exactly what are your indexes on this, as well as
> any triggers?
>
> Amir Becher wrote:
> > Here is an example of the misbehavior that I am
> > seeing:
> >
> > A simple table with 3 colums (and approximately a
> > million records):
> > column_a INTEGER PRIMARY KEY
> > column_b DATE NOT NULL
> > column_c DOUBLE NOT NULL

__________________________________________________
Do you Yahoo!?
Yahoo! Tax Center - File online, calculators, forms, and more
http://tax.yahoo.com

Re: Corrupt index

From

"scott.marlowe"

Date:

10 April 2003, 19:13:16

On Thu, 10 Apr 2003, Tom Lane wrote:

> Amir Becher <abecher@yahoo.com> writes:
> > I don't know if this may have something to do with it,
> > but we do backup the data every night using VERITAS
> > Backup Exec. We are not restoring anything, though
> > (the data is backed up to tape). The VERITAS software
> > runs on Windows, but there is an agent that runs on
> > our Linux box where the PostgreSQL data is stored. I
> > should also mention that the backup is running while
> > the database is being modified (we modify the database
> > 24/7).
>
> You're wasting your time making such a backup --- if you ever have to
> use it, it'll be corrupt, because the individual files in the database
> won't be in sync.  But that's not the immediate problem.

Is there any chance that the veritas backup agent on linux is somehow
locking the files as root against read or some other such insanity?  If
so, I'd imagine there being problems of some kind.  But I have no idea if
the veritas backup agent can or does do something like that.

Either way, I'd turn off file system backups for that directory, as they
aren't gonna do any good anyway.

I'm thinking with my rusty old windows NT sysadmin hat here, by the way.

Re: Corrupt index

From

Amir Becher

Date:

10 April 2003, 19:32:49

The source of the problem was the VERITAS Backup Exec.
I was able to replicate the index corruption in under
a minute by running a backup and updating the database
at the same time. I have never been able to replicate
the problem before because I was testing during the
day, when the backup was not running.

As far as backups are concerned, we will no longer
backup the data directory itself - that was clearly a
dumb thing to do in the first place. We actually have
been backing up the data using pg_dumpall as well (so
there is still hope for us).

Thanks for all the help - I greatly appreciate it.


--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Amir Becher <abecher@yahoo.com> writes:
> > I don't know if this may have something to do with
> it,
> > but we do backup the data every night using
> VERITAS
> > Backup Exec. We are not restoring anything, though
> > (the data is backed up to tape). The VERITAS
> software
> > runs on Windows, but there is an agent that runs
> on
> > our Linux box where the PostgreSQL data is stored.
> I
> > should also mention that the backup is running
> while
> > the database is being modified (we modify the
> database
> > 24/7).
>
> You're wasting your time making such a backup --- if
> you ever have to
> use it, it'll be corrupt, because the individual
> files in the database
> won't be in sync.  But that's not the immediate
> problem.
>
> > There is another unexpected behavior that I
> noticed
> > for the first time this morning (so I am not sure
> if
> > it's recurring, related or relevant). The database
> > "blinked" in the sense that all database
> connections
> > were lost - but new connections could be obtained
> > immediately after the "blink". The error message
> that
> > I got said something about possible "corrupted
> shared
> > memory" and I guess the shutting down of the
> > connections was a precautionary measure.
>
> That sounds like a backend crash, all right.  Given
> that, I'm thinking
> that you have more extensive problems than just this
> one symptom.  The
> odds are good that it's a hardware issue, because we
> haven't heard any
> reports of comparable misbehavior from anyone else.
>
> I'd recommend running some hardware diagnostics ---
> memtest86 and
> badblocks seem to be the most widely used, although
> they aren't always
> able to find problems.
>
> It would also be a good idea to start taking some
> *real* backups, using
> pg_dump or pg_dumpall.  You will be lucky if you
> don't find any more
> serious corruption in the database, if I'm right
> that there's hardware
> flakiness involved.  You may find yourself forced to
> initdb and restore
> from a backup, so you'd better have one.
>
>             regards, tom lane


__________________________________________________
Do you Yahoo!?
Yahoo! Tax Center - File online, calculators, forms, and more
http://tax.yahoo.com

Re: Corrupt index

From

Tom Lane

Date:

10 April 2003, 19:55:52

Amir Becher <abecher@yahoo.com> writes:
> The source of the problem was the VERITAS Backup Exec.
> I was able to replicate the index corruption in under
> a minute by running a backup and updating the database
> at the same time.

Fascinating.  It would be interesting to know exactly what Veritas is
doing that causes the problem.  Do any error messages show up in the
postmaster log while you are doing the above?

I'm still quite concerned about the health of your database, btw.
I can't see any reason to think that whatever interference is going
on would corrupt only indexes.  There may be problems accumulating
in the table files as well :-(

            regards, tom lane

Re: Corrupt index

From

Amir Becher

Date:

10 April 2003, 20:44:13

Unfortuntely, I haven't enabled the postmaster logs,
so I can't shed any more light on this problem.

Thanks for the advice on the health of my database. I
plan to do a pg_dumpall and rebuild it from scratch
this weekend (since it is in production). Hopefully
nothing else is damaged, and at least going forward
the database will be healthy.

--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Amir Becher <abecher@yahoo.com> writes:
> > The source of the problem was the VERITAS Backup
> Exec.
> > I was able to replicate the index corruption in
> under
> > a minute by running a backup and updating the
> database
> > at the same time.
>
> Fascinating.  It would be interesting to know
> exactly what Veritas is
> doing that causes the problem.  Do any error
> messages show up in the
> postmaster log while you are doing the above?
>
> I'm still quite concerned about the health of your
> database, btw.
> I can't see any reason to think that whatever
> interference is going
> on would corrupt only indexes.  There may be
> problems accumulating
> in the table files as well :-(
>
>             regards, tom lane

__________________________________________________
Do you Yahoo!?
Yahoo! Tax Center - File online, calculators, forms, and more
http://tax.yahoo.com