Re: error: could not find pg_class tuple for index 2662 - Mailing list pgsql-hackers
From: Tom Lane
Subject: Re: error: could not find pg_class tuple for index 2662
Date:
Msg-id: 10842.1313096945@sss.pgh.pa.us
In response to: Re: error: could not find pg_class tuple for index 2662 (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: error: could not find pg_class tuple for index 2662
List: pgsql-hackers
I wrote:
> I still haven't reproduced the behavior here, but I think I see what
> must be happening: we are getting an sinval reset while attempting to
> open pg_class_oid_index.

After a number of false starts, I've managed to reproduce this behavior
locally.  The above theory turns out to be wrong, or at least
incomplete: in order to be opening pg_class_oid_index, we must already
have opened and locked pg_class, and we would have absorbed the
relmapping update for pg_class when we did that, so an sinval reset
during a subsequent relation open is too late to provoke the bug.

Rather, the key to the problem is that the sinval reset has to happen
*instead of* seeing the plain relcache inval on pg_class that was
emitted by the "VACUUM FULL pg_class" command.  In this case, we enter
RelationCacheInvalidate with a stale value for pg_class's relmapping,
and since that routine is not sufficiently careful about the order in
which it revalidates the nailed-relation cache entries, we may try to
read something from pg_class before we've updated pg_class's
relmapping.

So the reason Dave is seeing the problem a lot must be that he has very
high sinval traffic, leading to lots of resets, which increases the
probability of a reset replacing just the wrong sinval message.

I can reproduce the problem fairly conveniently with this crude hack,
which forces every occurrence of an incoming sinval message to be
treated as a reset:

diff --git a/src/backend/storage/ipc/sinval.c b/src/backend/storage/ipc/sinval.c
index 8499615..5ad2aee 100644
*** a/src/backend/storage/ipc/sinval.c
--- b/src/backend/storage/ipc/sinval.c
*************** ReceiveSharedInvalidMessages(
*** 106,112 ****
  		/* Try to get some more messages */
  		getResult = SIGetDataEntries(messages, MAXINVALMSGS);

! 		if (getResult < 0)
  		{
  			/* got a reset message */
  			elog(DEBUG4, "cache state reset");
--- 106,112 ----
  		/* Try to get some more messages */
  		getResult = SIGetDataEntries(messages, MAXINVALMSGS);

! 		if (getResult != 0)
  		{
  			/* got a reset message */
  			elog(DEBUG4, "cache state reset");

The serial regression tests still pass with that, but they fall over
almost immediately if you run something like this in parallel:

	while psql -c "vacuum full pg_class" regression; do usleep 100000; done

and the parallel regression tests tend to fall over without any outside
help, because of the one occurrence of "vacuum full pg_class" in them.

I'm inclined to think that a robust solution requires both of the ideas
I proposed last week: we should update all the relmapping entries during
pass 1 of RelationCacheInvalidate, *and* make pass 2 do things in a more
robust order.  I think pg_class, then pg_class_oid_index, then the other
nailed relations, and then everything else, ought to do it.

Anyway, that's easily fixed now that we have a reproducible case.
What's bothering me at the moment is that the CLOBBER_CACHE_ALWAYS
hack, which was meant to expose exactly this sort of problem, failed to
do so --- buildfarm member jaguar has been running with that flag for
ages and never showed this problem.  I'm thinking that we should take
the hack out of AcceptInvalidationMessages and instead put in #ifdef'd
code that causes ReceiveSharedInvalidMessages to forcibly call the
reset function every time.  Any thoughts about that?

			regards, tom lane