Thread: Rare corruption of pg_class index

Rare corruption of pg_class index

From

"Greg Sabino Mullane"

Date:

20 December 2006, 14:04:12

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


I'm encountering some disconcerting problems on a 8.1.3 database.
Very occasionally, I get a "could not open relation with OID xxx".
This always occurs inside of a plpgsql function, and always refers
to a normal, stable table that has not been dropped. The first
time this occured, I reindexed the system indexes, but it has occured
again, and on more than one database, which seems to lessen the chance
of a hardware issue or a temporary index corruption issue. The functions
in question are called many times, but the error only happens once in
a blue moon. The last time it happened, the function ran with no problem
one minute before the error, and again four minutes afterwards. The
table name is hard-coded into the functions. This happens on average
five times a day or so, on a very busy database (> 10M statements/day,
the functions fire tens of thousands of times)

Any clues on what could be causing this, or ways to go forward on
debugging? Thanks.

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200612201252
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iD8DBQFFiXmzvJuQZxSWSsgRAkr4AKDhTKTILjrVbitWTu4wzktejXD2egCfSk/H
h+vBn6lJegKeRD+tsBFSYng=
=svv7
-----END PGP SIGNATURE-----

Re: Rare corruption of pg_class index

From

Jeff Davis

Date:

20 December 2006, 14:22:16

On Wed, 2006-12-20 at 18:06 +0000, Greg Sabino Mullane wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> 
> I'm encountering some disconcerting problems on a 8.1.3 database.
> Very occasionally, I get a "could not open relation with OID xxx".
> This always occurs inside of a plpgsql function, and always refers
> to a normal, stable table that has not been dropped. The first
> time this occured, I reindexed the system indexes, but it has occured
> again, and on more than one database, which seems to lessen the chance
> of a hardware issue or a temporary index corruption issue. The functions
> in question are called many times, but the error only happens once in
> a blue moon. The last time it happened, the function ran with no problem
> one minute before the error, and again four minutes afterwards. The
> table name is hard-coded into the functions. This happens on average
> five times a day or so, on a very busy database (> 10M statements/day,
> the functions fire tens of thousands of times)
> 

It has to do with PL/pgSQL caching the plans of the statements inside.
If there is some kind of change that causes that plan to become invalid,
that could cause that error.

You can almost certainly fix it by using EXECUTE in the function, which
will not cache the plan, or telling all clients to reconnect when you
make a change that could invalidate the plan.

Regards,Jeff Davis

Re: Rare corruption of pg_class index

From

Alvaro Herrera

Date:

20 December 2006, 14:27:39

Greg Sabino Mullane wrote:

> I'm encountering some disconcerting problems on a 8.1.3 database.
> Very occasionally, I get a "could not open relation with OID xxx".
> This always occurs inside of a plpgsql function, and always refers
> to a normal, stable table that has not been dropped. The first
> time this occured, I reindexed the system indexes, but it has occured
> again, and on more than one database, which seems to lessen the chance
> of a hardware issue or a temporary index corruption issue. The functions
> in question are called many times, but the error only happens once in
> a blue moon. The last time it happened, the function ran with no problem
> one minute before the error, and again four minutes afterwards. The
> table name is hard-coded into the functions. This happens on average
> five times a day or so, on a very busy database (> 10M statements/day,
> the functions fire tens of thousands of times)

Well, if the error "fixed by itself" four minutes after the failure,
then it's not likely to be a corrupted index.  My first guess would be
that there's some kind of race condition on the relcache or the sinval
mechanism.

It would help if you could get a stack trace at the moment of the
problem, but I'm not sure how to do that.  Maybe cause the backend to
send a SIGSTOP to itself and then have an external program to quickly
launch GDB on it and get a backtrace, then send a SIGCONT.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: Rare corruption of pg_class index

From

Tom Lane

Date:

20 December 2006, 14:56:02

Alvaro Herrera <alvherre@commandprompt.com> writes:
> Greg Sabino Mullane wrote:
>> I'm encountering some disconcerting problems on a 8.1.3 database.
>> Very occasionally, I get a "could not open relation with OID xxx".

Does the mentioned OID actually correspond to the OID of the table it's
supposed to be opening, or is it wrong?  Is anything being done to
the table schema in parallel?

If the table is occasionally dropped and recreated, there's a known race
condition that could cause it: we lookup the table name in pg_class to
get the OID, then lock the relation by OID, then try to finish opening
the relation.  By the time we obtain lock the original rel could be gone
and the name now refers to some other OID, but we'll fail because the
old OID is no longer anywhere to be found.  I think this is fixed in 8.2.

If the table is perfectly static then another explanation is needed ...

> It would help if you could get a stack trace at the moment of the
> problem, but I'm not sure how to do that.

Perhaps insert an abort() call right before the elog(ERROR)
that's reporting this.  (I think there are three possibilities,
but they're all in heapam.c so you might as well just hack them all.)
        regards, tom lane

Re: Rare corruption of pg_class index

From

"Greg Sabino Mullane"

Date:

20 December 2006, 15:15:54

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

(Jeff Davis: I've not changed the function, so it's not the plan cache)

Tom Lane wrote:
> Does the mentioned OID actually correspond to the OID of the table it's
> supposed to be opening, or is it wrong?  Is anything being done to
> the table schema in parallel?

Yes, it is the correct OID. No, nothing done to the schema in parallel,
although there is a process that disables/re-enables triggers and rules
on that table via pg_class tweaking (inside a txn, of course).

> If the table is occasionally dropped and recreated...

Yeah, that's the first thing I thought of, but it's definitely not
being dropped and recreated.

Alvaro Herrera wrote:
>> It would help if you could get a stack trace at the moment of the
>> problem, but I'm not sure how to do that.
>
> Perhaps insert an abort() call right before the elog(ERROR)
> that's reporting this.  (I think there are three possibilities,
> but they're all in heapam.c so you might as well just hack them all.)

Argh, that will have to be a last resort measure, as this is a
production system. Have not been able to duplicate yet on a dev
box, but will look into adding the abort() for when/if I can
duplicate it there. Thanks everyone.

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200612201412
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8

-----BEGIN PGP SIGNATURE-----

iD8DBQFFiYtwvJuQZxSWSsgRArEbAJ9udj9/Kh2Vi45A8ej3YCC2RKESwgCcCkT8
5ZnN0+yCU0rQ6+PuHtJRtnw=
=C1gB
-----END PGP SIGNATURE-----

Re: Rare corruption of pg_class index

From

Alvaro Herrera

Date:

20 December 2006, 15:19:42

Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Greg Sabino Mullane wrote:

> > It would help if you could get a stack trace at the moment of the
> > problem, but I'm not sure how to do that.
> 
> Perhaps insert an abort() call right before the elog(ERROR)
> that's reporting this.

Yeah, but doing this in a production environment is not likely to go
very far ...

> (I think there are three possibilities,
> but they're all in heapam.c so you might as well just hack them all.)

Maybe it would be good to run under log_error_verbosity='verbose' for a
while to discover whether it's always the same one.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Rare corruption of pg_class index

From

Tom Lane

Date:

26 December 2006, 19:57:17

"Greg Sabino Mullane" <greg@turnstep.com> writes:
> Tom Lane wrote:
>> Does the mentioned OID actually correspond to the OID of the table it's
>> supposed to be opening, or is it wrong?  Is anything being done to
>> the table schema in parallel?

> Yes, it is the correct OID. No, nothing done to the schema in parallel,
> although there is a process that disables/re-enables triggers and rules
> on that table via pg_class tweaking (inside a txn, of course).

Oh!  Duh, that's your issue right there, I'll bet.  The problem is that
relcache-open tries to read the pg_class row under SnapshotNow rules,
and if there is another xact concurrently modifying the row, it is
entirely possible for none of the row versions to be committed good at
the instant they are visited.  (The new row version either isn't seen at
all or isn't committed good yet when it's visited, and later when the
old row version is visited, it has become committed dead.)  This results
in ScanPgRelation failing (returning NULL) which leads to exactly the 
"could not open relation with OID xxx" symptom --- and in fact I see no
other code path that yields that failure.

As of 8.2 we have this problem fixed for system-initiated changes to the
pg_class row, but you're still going to be at risk if you are doing
manual "UPDATE pg_class" operations.  Can you get away from needing to
do that?  ALTER TABLE DISABLE TRIGGER might help, but we haven't got
anything like ALTER TABLE DISABLE RULE.  In any case the important point
is that you have to take AccessExclusive lock on a relation whose
pg_class row you would like to change, and you need to be on 8.2 because
prior releases weren't careful about obtaining lock *before* reading the
row.
        regards, tom lane

Re: Rare corruption of pg_class index

From

"Greg Sabino Mullane"

Date:

02 January 2007, 14:33:46

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Tom Lane wrote:
> Oh!  Duh, that's your issue right there, I'll bet.  The problem is that
> relcache-open tries to read the pg_class row under SnapshotNow rules,
> and if there is another xact concurrently modifying the row, it is
> entirely possible for none of the row versions to be committed good at
> the instant they are visited.  (The new row version either isn't seen at
> all or isn't committed good yet when it's visited, and later when the
> old row version is visited, it has become committed dead.)  This results
> in ScanPgRelation failing (returning NULL) which leads to exactly the
> "could not open relation with OID xxx" symptom --- and in fact I see no
> other code path that yields that failure.

Doesn't this violate ACID, or am I misunderstanding something? (FWIW, I'm
using a serializable isolation level for the process that changes pg_class)

> As of 8.2 we have this problem fixed for system-initiated changes to the
> pg_class row, but you're still going to be at risk if you are doing
> manual "UPDATE pg_class" operations.  Can you get away from needing to
> do that?  ALTER TABLE DISABLE TRIGGER might help, but we haven't got
> anything like ALTER TABLE DISABLE RULE.

Yeah, triggers alone won't do it. Sounds like a TODO item - Bruce?

> In any case the important point is that you have to take AccessExclusive
> lock on a relation whose pg_class row you would like to change, and you
> need to be on 8.2 because prior releases weren't careful about obtaining
> lock *before* reading the row.

Obtaining an AccessExclusive lock is a last resort, as the tables in question
are very busy. That's another reason why DISABLE TRIGGER might not work out
either.

- --
Greg Sabino Mullane greg@turnstep.com
End Point Corporation
PGP Key: 0x14964AC8 200701021325
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8





-----BEGIN PGP SIGNATURE-----

iD8DBQFFmqU9vJuQZxSWSsgRAgZ4AJ4wBUI6APz658zaE8bFQ5xmILFiugCgxfsW
GI2zgdF6l/tmxWpnO4J9dms=
=NtIn
-----END PGP SIGNATURE-----

Re: Rare corruption of pg_class index

From

Tom Lane

Date:

02 January 2007, 15:05:55

"Greg Sabino Mullane" <greg@turnstep.com> writes:
> Tom Lane wrote:
>> Oh!  Duh, that's your issue right there, I'll bet.  The problem is that
>> relcache-open tries to read the pg_class row under SnapshotNow rules,
>> and if there is another xact concurrently modifying the row, it is
>> entirely possible for none of the row versions to be committed good at
>> the instant they are visited.

> Doesn't this violate ACID, or am I misunderstanding something?

No, it's not a violation of ACID.  In this case what you are doing is
altering a table's schema without a sufficiently strong lock on the
table, and that's a no-no, whether you would like it to be or not.

>> In any case the important point is that you have to take AccessExclusive
>> lock on a relation whose pg_class row you would like to change, and you
>> need to be on 8.2 because prior releases weren't careful about obtaining
>> lock *before* reading the row.

> Obtaining an AccessExclusive lock is a last resort, as the tables in question
> are very busy. That's another reason why DISABLE TRIGGER might not work out
> either.

Well, ENABLE/DISABLE TRIGGER can't conceivably operate correctly without
locking out writes, because it wouldn't be clear whether any particular
write operation should fire the trigger or not.  A hypothetical
ENABLE/DISABLE RULE would be worse: AFAICS it'd have to lock out reads
too, else it wouldn't be clear whether SELECTs should notice an ON
SELECT rule.  You can't get around those restrictions by trying to
implement the enable/disable yourself via UPDATE pg_class; as you've
found out, it just doesn't work.
        regards, tom lane

Re: Rare corruption of pg_class index

From

"Greg Sabino Mullane"

Date:

03 January 2007, 11:40:38

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Tom Lane wrote:
> No, it's not a violation of ACID.  In this case what you are doing is
> altering a table's schema without a sufficiently strong lock on the
> table, and that's a no-no, whether you would like it to be or not.

So as a general rule, the system tables should be considered a special
case as far as transactional activity? To be more precise, you are saying
that a system table must be locked in access exclusive mode before any
change is made to guarantee no problems occur?

> Well, ENABLE/DISABLE TRIGGER can't conceivably operate correctly without
> locking out writes, because it wouldn't be clear whether any particular
> write operation should fire the trigger or not.  A hypothetical
> ENABLE/DISABLE RULE would be worse: AFAICS it'd have to lock out reads
> too, else it wouldn't be clear whether SELECTs should notice an ON
> SELECT rule.  You can't get around those restrictions by trying to
> implement the enable/disable yourself via UPDATE pg_class; as you've
> found out, it just doesn't work.

So the oft-given advice of "UPDATE pg_class SET relhasrules = false"
is actually completely unsafe unless the entire referenced table is
completely locked, and unless you are using at least 8.2?

- --
Greg Sabino Mullane greg@turnstep.com
End Point Corporation
PGP Key: 0x14964AC8 200701031037
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iD8DBQFFm83kvJuQZxSWSsgRAvhiAJ0TEgzg3mkZ6BKvlR7jHKY96JVmmACgxrYb
pFY851yrZd5bh7GHaR+k0Q4=
=FgL+
-----END PGP SIGNATURE-----

Re: Rare corruption of pg_class index

From

Tom Lane

Date:

03 January 2007, 11:50:29

"Greg Sabino Mullane" <greg@turnstep.com> writes:
> So as a general rule, the system tables should be considered a special
> case as far as transactional activity? To be more precise, you are saying
> that a system table must be locked in access exclusive mode before any
> change is made to guarantee no problems occur?

No, I didn't say that --- I said that you need to lock the table whose
schema you're trying to modify, to ensure that no one else is in the
midst of accessing it using the old schema info.

> So the oft-given advice of "UPDATE pg_class SET relhasrules = false"
> is actually completely unsafe unless the entire referenced table is
> completely locked, and unless you are using at least 8.2?

I don't recall having ever given *that* advice to anyone.  But yes,
it's unsafe if there might be concurrent access to that table.  The
only context I've ever seen people use this sort of thing in is
pg_restore --disable-triggers, and in that situation I think there's
an implicit assumption that no one else is busy modifying the table
you're restoring into.
        regards, tom lane

Re: Rare corruption of pg_class index

From

"Greg Sabino Mullane"

Date:

04 January 2007, 18:09:12

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Tom Lane replied:
>> So as a general rule, the system tables should be considered a special
>> case as far as transactional activity? To be more precise, you are saying
>> that a system table must be locked in access exclusive mode before any
>> change is made to guarantee no problems occur?

> No, I didn't say that --- I said that you need to lock the table whose
> schema you're trying to modify, to ensure that no one else is in the
> midst of accessing it using the old schema info.

Sorry, I did mean the target table.

>> So the oft-given advice of "UPDATE pg_class SET relhasrules = false"
>> is actually completely unsafe unless the entire referenced table is
>> completely locked, and unless you are using at least 8.2?

> I don't recall having ever given *that* advice to anyone.  But yes,
> it's unsafe if there might be concurrent access to that table.  The
> only context I've ever seen people use this sort of thing in is
> pg_restore --disable-triggers, and in that situation I think there's
> an implicit assumption that no one else is busy modifying the table
> you're restoring into.

Not, not your advice, and perhaps not as common as SET reltriggers, but
still invaluable for things like bulk loading. Thanks for the responses,
I think I've finally got my head around the problem. At the very least,
I've discovered another good reason to push production sites to use
8.2.

- --
Greg Sabino Mullane greg@turnstep.com
End Point Corporation
PGP Key: 0x14964AC8 200701041708
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iD8DBQFFnXrivJuQZxSWSsgRAlSUAJ9xSg6NBO984pBT0Ea2fAnhFdfX/QCcDXKj
2j4m8MdNCnLX5iVXz4D8AAI=
=NkGM
-----END PGP SIGNATURE-----