Thread: Rare corruption of pg_class index
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 I'm encountering some disconcerting problems on a 8.1.3 database. Very occasionally, I get a "could not open relation with OID xxx". This always occurs inside of a plpgsql function, and always refers to a normal, stable table that has not been dropped. The first time this occured, I reindexed the system indexes, but it has occured again, and on more than one database, which seems to lessen the chance of a hardware issue or a temporary index corruption issue. The functions in question are called many times, but the error only happens once in a blue moon. The last time it happened, the function ran with no problem one minute before the error, and again four minutes afterwards. The table name is hard-coded into the functions. This happens on average five times a day or so, on a very busy database (> 10M statements/day, the functions fire tens of thousands of times) Any clues on what could be causing this, or ways to go forward on debugging? Thanks. - -- Greg Sabino Mullane greg@turnstep.com PGP Key: 0x14964AC8 200612201252 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -----BEGIN PGP SIGNATURE----- iD8DBQFFiXmzvJuQZxSWSsgRAkr4AKDhTKTILjrVbitWTu4wzktejXD2egCfSk/H h+vBn6lJegKeRD+tsBFSYng= =svv7 -----END PGP SIGNATURE-----
On Wed, 2006-12-20 at 18:06 +0000, Greg Sabino Mullane wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > > I'm encountering some disconcerting problems on a 8.1.3 database. > Very occasionally, I get a "could not open relation with OID xxx". > This always occurs inside of a plpgsql function, and always refers > to a normal, stable table that has not been dropped. The first > time this occured, I reindexed the system indexes, but it has occured > again, and on more than one database, which seems to lessen the chance > of a hardware issue or a temporary index corruption issue. The functions > in question are called many times, but the error only happens once in > a blue moon. The last time it happened, the function ran with no problem > one minute before the error, and again four minutes afterwards. The > table name is hard-coded into the functions. This happens on average > five times a day or so, on a very busy database (> 10M statements/day, > the functions fire tens of thousands of times) > It has to do with PL/pgSQL caching the plans of the statements inside. If there is some kind of change that causes that plan to become invalid, that could cause that error. You can almost certainly fix it by using EXECUTE in the function, which will not cache the plan, or telling all clients to reconnect when you make a change that could invalidate the plan. Regards,Jeff Davis
Greg Sabino Mullane wrote: > I'm encountering some disconcerting problems on a 8.1.3 database. > Very occasionally, I get a "could not open relation with OID xxx". > This always occurs inside of a plpgsql function, and always refers > to a normal, stable table that has not been dropped. The first > time this occured, I reindexed the system indexes, but it has occured > again, and on more than one database, which seems to lessen the chance > of a hardware issue or a temporary index corruption issue. The functions > in question are called many times, but the error only happens once in > a blue moon. The last time it happened, the function ran with no problem > one minute before the error, and again four minutes afterwards. The > table name is hard-coded into the functions. This happens on average > five times a day or so, on a very busy database (> 10M statements/day, > the functions fire tens of thousands of times) Well, if the error "fixed by itself" four minutes after the failure, then it's not likely to be a corrupted index. My first guess would be that there's some kind of race condition on the relcache or the sinval mechanism. It would help if you could get a stack trace at the moment of the problem, but I'm not sure how to do that. Maybe cause the backend to send a SIGSTOP to itself and then have an external program to quickly launch GDB on it and get a backtrace, then send a SIGCONT. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre@commandprompt.com> writes: > Greg Sabino Mullane wrote: >> I'm encountering some disconcerting problems on a 8.1.3 database. >> Very occasionally, I get a "could not open relation with OID xxx". Does the mentioned OID actually correspond to the OID of the table it's supposed to be opening, or is it wrong? Is anything being done to the table schema in parallel? If the table is occasionally dropped and recreated, there's a known race condition that could cause it: we lookup the table name in pg_class to get the OID, then lock the relation by OID, then try to finish opening the relation. By the time we obtain lock the original rel could be gone and the name now refers to some other OID, but we'll fail because the old OID is no longer anywhere to be found. I think this is fixed in 8.2. If the table is perfectly static then another explanation is needed ... > It would help if you could get a stack trace at the moment of the > problem, but I'm not sure how to do that. Perhaps insert an abort() call right before the elog(ERROR) that's reporting this. (I think there are three possibilities, but they're all in heapam.c so you might as well just hack them all.) regards, tom lane
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 (Jeff Davis: I've not changed the function, so it's not the plan cache) Tom Lane wrote: > Does the mentioned OID actually correspond to the OID of the table it's > supposed to be opening, or is it wrong? Is anything being done to > the table schema in parallel? Yes, it is the correct OID. No, nothing done to the schema in parallel, although there is a process that disables/re-enables triggers and rules on that table via pg_class tweaking (inside a txn, of course). > If the table is occasionally dropped and recreated... Yeah, that's the first thing I thought of, but it's definitely not being dropped and recreated. Alvaro Herrera wrote: >> It would help if you could get a stack trace at the moment of the >> problem, but I'm not sure how to do that. > > Perhaps insert an abort() call right before the elog(ERROR) > that's reporting this. (I think there are three possibilities, > but they're all in heapam.c so you might as well just hack them all.) Argh, that will have to be a last resort measure, as this is a production system. Have not been able to duplicate yet on a dev box, but will look into adding the abort() for when/if I can duplicate it there. Thanks everyone. - -- Greg Sabino Mullane greg@turnstep.com PGP Key: 0x14964AC8 200612201412 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -----BEGIN PGP SIGNATURE----- iD8DBQFFiYtwvJuQZxSWSsgRArEbAJ9udj9/Kh2Vi45A8ej3YCC2RKESwgCcCkT8 5ZnN0+yCU0rQ6+PuHtJRtnw= =C1gB -----END PGP SIGNATURE-----
Tom Lane wrote: > Alvaro Herrera <alvherre@commandprompt.com> writes: > > Greg Sabino Mullane wrote: > > It would help if you could get a stack trace at the moment of the > > problem, but I'm not sure how to do that. > > Perhaps insert an abort() call right before the elog(ERROR) > that's reporting this. Yeah, but doing this in a production environment is not likely to go very far ... > (I think there are three possibilities, > but they're all in heapam.c so you might as well just hack them all.) Maybe it would be good to run under log_error_verbosity='verbose' for a while to discover whether it's always the same one. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
"Greg Sabino Mullane" <greg@turnstep.com> writes: > Tom Lane wrote: >> Does the mentioned OID actually correspond to the OID of the table it's >> supposed to be opening, or is it wrong? Is anything being done to >> the table schema in parallel? > Yes, it is the correct OID. No, nothing done to the schema in parallel, > although there is a process that disables/re-enables triggers and rules > on that table via pg_class tweaking (inside a txn, of course). Oh! Duh, that's your issue right there, I'll bet. The problem is that relcache-open tries to read the pg_class row under SnapshotNow rules, and if there is another xact concurrently modifying the row, it is entirely possible for none of the row versions to be committed good at the instant they are visited. (The new row version either isn't seen at all or isn't committed good yet when it's visited, and later when the old row version is visited, it has become committed dead.) This results in ScanPgRelation failing (returning NULL) which leads to exactly the "could not open relation with OID xxx" symptom --- and in fact I see no other code path that yields that failure. As of 8.2 we have this problem fixed for system-initiated changes to the pg_class row, but you're still going to be at risk if you are doing manual "UPDATE pg_class" operations. Can you get away from needing to do that? ALTER TABLE DISABLE TRIGGER might help, but we haven't got anything like ALTER TABLE DISABLE RULE. In any case the important point is that you have to take AccessExclusive lock on a relation whose pg_class row you would like to change, and you need to be on 8.2 because prior releases weren't careful about obtaining lock *before* reading the row. regards, tom lane
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Tom Lane wrote: > Oh! Duh, that's your issue right there, I'll bet. The problem is that > relcache-open tries to read the pg_class row under SnapshotNow rules, > and if there is another xact concurrently modifying the row, it is > entirely possible for none of the row versions to be committed good at > the instant they are visited. (The new row version either isn't seen at > all or isn't committed good yet when it's visited, and later when the > old row version is visited, it has become committed dead.) This results > in ScanPgRelation failing (returning NULL) which leads to exactly the > "could not open relation with OID xxx" symptom --- and in fact I see no > other code path that yields that failure. Doesn't this violate ACID, or am I misunderstanding something? (FWIW, I'm using a serializable isolation level for the process that changes pg_class) > As of 8.2 we have this problem fixed for system-initiated changes to the > pg_class row, but you're still going to be at risk if you are doing > manual "UPDATE pg_class" operations. Can you get away from needing to > do that? ALTER TABLE DISABLE TRIGGER might help, but we haven't got > anything like ALTER TABLE DISABLE RULE. Yeah, triggers alone won't do it. Sounds like a TODO item - Bruce? > In any case the important point is that you have to take AccessExclusive > lock on a relation whose pg_class row you would like to change, and you > need to be on 8.2 because prior releases weren't careful about obtaining > lock *before* reading the row. Obtaining an AccessExclusive lock is a last resort, as the tables in question are very busy. That's another reason why DISABLE TRIGGER might not work out either. - -- Greg Sabino Mullane greg@turnstep.com End Point Corporation PGP Key: 0x14964AC8 200701021325 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -----BEGIN PGP SIGNATURE----- iD8DBQFFmqU9vJuQZxSWSsgRAgZ4AJ4wBUI6APz658zaE8bFQ5xmILFiugCgxfsW GI2zgdF6l/tmxWpnO4J9dms= =NtIn -----END PGP SIGNATURE-----
"Greg Sabino Mullane" <greg@turnstep.com> writes: > Tom Lane wrote: >> Oh! Duh, that's your issue right there, I'll bet. The problem is that >> relcache-open tries to read the pg_class row under SnapshotNow rules, >> and if there is another xact concurrently modifying the row, it is >> entirely possible for none of the row versions to be committed good at >> the instant they are visited. > Doesn't this violate ACID, or am I misunderstanding something? No, it's not a violation of ACID. In this case what you are doing is altering a table's schema without a sufficiently strong lock on the table, and that's a no-no, whether you would like it to be or not. >> In any case the important point is that you have to take AccessExclusive >> lock on a relation whose pg_class row you would like to change, and you >> need to be on 8.2 because prior releases weren't careful about obtaining >> lock *before* reading the row. > Obtaining an AccessExclusive lock is a last resort, as the tables in question > are very busy. That's another reason why DISABLE TRIGGER might not work out > either. Well, ENABLE/DISABLE TRIGGER can't conceivably operate correctly without locking out writes, because it wouldn't be clear whether any particular write operation should fire the trigger or not. A hypothetical ENABLE/DISABLE RULE would be worse: AFAICS it'd have to lock out reads too, else it wouldn't be clear whether SELECTs should notice an ON SELECT rule. You can't get around those restrictions by trying to implement the enable/disable yourself via UPDATE pg_class; as you've found out, it just doesn't work. regards, tom lane
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Tom Lane wrote: > No, it's not a violation of ACID. In this case what you are doing is > altering a table's schema without a sufficiently strong lock on the > table, and that's a no-no, whether you would like it to be or not. So as a general rule, the system tables should be considered a special case as far as transactional activity? To be more precise, you are saying that a system table must be locked in access exclusive mode before any change is made to guarantee no problems occur? > Well, ENABLE/DISABLE TRIGGER can't conceivably operate correctly without > locking out writes, because it wouldn't be clear whether any particular > write operation should fire the trigger or not. A hypothetical > ENABLE/DISABLE RULE would be worse: AFAICS it'd have to lock out reads > too, else it wouldn't be clear whether SELECTs should notice an ON > SELECT rule. You can't get around those restrictions by trying to > implement the enable/disable yourself via UPDATE pg_class; as you've > found out, it just doesn't work. So the oft-given advice of "UPDATE pg_class SET relhasrules = false" is actually completely unsafe unless the entire referenced table is completely locked, and unless you are using at least 8.2? - -- Greg Sabino Mullane greg@turnstep.com End Point Corporation PGP Key: 0x14964AC8 200701031037 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -----BEGIN PGP SIGNATURE----- iD8DBQFFm83kvJuQZxSWSsgRAvhiAJ0TEgzg3mkZ6BKvlR7jHKY96JVmmACgxrYb pFY851yrZd5bh7GHaR+k0Q4= =FgL+ -----END PGP SIGNATURE-----
"Greg Sabino Mullane" <greg@turnstep.com> writes: > So as a general rule, the system tables should be considered a special > case as far as transactional activity? To be more precise, you are saying > that a system table must be locked in access exclusive mode before any > change is made to guarantee no problems occur? No, I didn't say that --- I said that you need to lock the table whose schema you're trying to modify, to ensure that no one else is in the midst of accessing it using the old schema info. > So the oft-given advice of "UPDATE pg_class SET relhasrules = false" > is actually completely unsafe unless the entire referenced table is > completely locked, and unless you are using at least 8.2? I don't recall having ever given *that* advice to anyone. But yes, it's unsafe if there might be concurrent access to that table. The only context I've ever seen people use this sort of thing in is pg_restore --disable-triggers, and in that situation I think there's an implicit assumption that no one else is busy modifying the table you're restoring into. regards, tom lane
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Tom Lane replied: >> So as a general rule, the system tables should be considered a special >> case as far as transactional activity? To be more precise, you are saying >> that a system table must be locked in access exclusive mode before any >> change is made to guarantee no problems occur? > No, I didn't say that --- I said that you need to lock the table whose > schema you're trying to modify, to ensure that no one else is in the > midst of accessing it using the old schema info. Sorry, I did mean the target table. >> So the oft-given advice of "UPDATE pg_class SET relhasrules = false" >> is actually completely unsafe unless the entire referenced table is >> completely locked, and unless you are using at least 8.2? > I don't recall having ever given *that* advice to anyone. But yes, > it's unsafe if there might be concurrent access to that table. The > only context I've ever seen people use this sort of thing in is > pg_restore --disable-triggers, and in that situation I think there's > an implicit assumption that no one else is busy modifying the table > you're restoring into. Not, not your advice, and perhaps not as common as SET reltriggers, but still invaluable for things like bulk loading. Thanks for the responses, I think I've finally got my head around the problem. At the very least, I've discovered another good reason to push production sites to use 8.2. - -- Greg Sabino Mullane greg@turnstep.com End Point Corporation PGP Key: 0x14964AC8 200701041708 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -----BEGIN PGP SIGNATURE----- iD8DBQFFnXrivJuQZxSWSsgRAlSUAJ9xSg6NBO984pBT0Ea2fAnhFdfX/QCcDXKj 2j4m8MdNCnLX5iVXz4D8AAI= =NkGM -----END PGP SIGNATURE-----