Re: error: could not find pg_class tuple for index 2662 - Mailing list pgsql-hackers
| From | Tom Lane | 
|---|---|
| Subject | Re: error: could not find pg_class tuple for index 2662 | 
| Date | |
| Msg-id | 10842.1313096945@sss.pgh.pa.us | 
| In response to | Re: error: could not find pg_class tuple for index 2662 (Tom Lane <tgl@sss.pgh.pa.us>) | 
| Responses | Re: error: could not find pg_class tuple for index 2662 | 
| List | pgsql-hackers | 
I wrote:
> I still haven't reproduced the behavior here, but I think I see what
> must be happening: we are getting an sinval reset while attempting to
> open pg_class_oid_index.
After a number of false starts, I've managed to reproduce this behavior
locally.  The above theory turns out to be wrong, or at least
incomplete.  In order to be opening pg_class_oid_index, we must already
have opened and locked pg_class, and we would have absorbed the
relmapping update for pg_class when we did that, so an sinval reset
during a subsequent relation open is too late to provoke the bug.
Rather, the key to the problem is that the sinval reset has to happen
*instead of* seeing the plain relcache inval on pg_class that was
emitted by the "VACUUM FULL pg_class" command.  In this case, we enter
RelationCacheInvalidate with a stale value for pg_class's relmapping,
and since that routine is not sufficiently careful about the order in
which it revalidates the nailed-relation cache entries, we may try to
read something from pg_class before we've updated pg_class's relmapping.
So the reason Dave is seeing the problem a lot must be that he has very
high sinval traffic, leading to lots of resets, and that increases the
probability of a reset replacing just the wrong sinval message.
I can reproduce the problem fairly conveniently with this crude hack:
diff --git a/src/backend/storage/ipc/sinval.c b/src/backend/storage/ipc/sinval.c
index 8499615..5ad2aee 100644
*** a/src/backend/storage/ipc/sinval.c
--- b/src/backend/storage/ipc/sinval.c
*************** ReceiveSharedInvalidMessages(
*** 106,112 ****
        /* Try to get some more messages */
        getResult = SIGetDataEntries(messages, MAXINVALMSGS);
  
!       if (getResult < 0)
        {
            /* got a reset message */
            elog(DEBUG4, "cache state reset");
--- 106,112 ----
        /* Try to get some more messages */
        getResult = SIGetDataEntries(messages, MAXINVALMSGS);
  
!       if (getResult != 0)
        {
            /* got a reset message */
            elog(DEBUG4, "cache state reset");
which forces every occurrence of an incoming sinval message to be treated
as a reset.  The serial regression tests still work with that, but they
fall over almost immediately if you run something like this in parallel:
while psql -c "vacuum full pg_class" regression; do usleep 100000; done
and the parallel regression tests tend to fall over without any outside
help because of the one occurrence of "vacuum full pg_class" in them.
I'm inclined to think that a robust solution requires both of the ideas
I proposed last week: we should update all the relmapping entries during
pass 1 of RelationCacheInvalidate, *and* make pass 2 do things in a more
robust order.  I think pg_class, pg_class_oid_index, other nailed
relations, and then everything else ought to do it.
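A toy sketch of that pass-2 ordering, assuming a simplified stand-in struct rather than the real relcache entry. ToyRel, rebuild_rank, and sort_for_rebuild are made-up names for illustration; only the OIDs (1259 for pg_class, 2662 for pg_class_oid_index) come from the catalogs. This is not the actual RelationCacheInvalidate code, just the ordering idea:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Catalog OIDs of pg_class and its OID index. */
#define PG_CLASS_OID            1259
#define PG_CLASS_OID_INDEX_OID  2662

/* Hypothetical stand-in for a relcache entry; not the real RelationData. */
typedef struct ToyRel
{
    unsigned int oid;
    bool         nailed;    /* entry must always stay valid in the cache */
} ToyRel;

/* Lower rank means "rebuild earlier" during pass 2. */
static int
rebuild_rank(const ToyRel *rel)
{
    if (rel->oid == PG_CLASS_OID)
        return 0;               /* pg_class itself comes first */
    if (rel->oid == PG_CLASS_OID_INDEX_OID)
        return 1;               /* then the index used to look up pg_class rows */
    if (rel->nailed)
        return 2;               /* then the other nailed relations */
    return 3;                   /* then everything else */
}

/* qsort comparator over ToyRel elements, ordering by rebuild rank. */
static int
compare_rebuild_order(const void *a, const void *b)
{
    return rebuild_rank((const ToyRel *) a) - rebuild_rank((const ToyRel *) b);
}

/* Sort the entries into the proposed pass-2 rebuild order. */
void
sort_for_rebuild(ToyRel *rels, size_t nrels)
{
    assert(rels != NULL);
    qsort(rels, nrels, sizeof(ToyRel), compare_rebuild_order);
}
```

Calling sort_for_rebuild on an array holding a user table, pg_attribute, pg_class_oid_index, and pg_class leaves pg_class first and the user table last, so nothing is read through pg_class before its own entry has been rebuilt.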
Anyway, that's easily fixed now that we have a reproducible case.
What's bothering me at the moment is that the CLOBBER_CACHE_ALWAYS hack,
which was meant to expose exactly this sort of problem, failed to do so
--- buildfarm member jaguar has been running with that flag for ages and
never showed this problem.  I'm thinking that we should take out the
hack in AcceptInvalidationMessages and instead put in #ifdeffed code
that causes ReceiveSharedInvalidMessages to forcibly always call the
reset function.  Any thoughts about that?
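For illustration only, the compile-time switch could look something like the sketch below. FORCE_SINVAL_RESET, FORCE_RESET_DEFAULT, treat_as_reset, and reset_needed are hypothetical names, not existing PostgreSQL symbols. The sketch only models the test on the result of SIGetDataEntries (negative for a reset, zero for nothing pending, positive for a message count) and mirrors the crude hack above rather than any committed code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical debug flag: define FORCE_SINVAL_RESET at build time to enable. */
#ifdef FORCE_SINVAL_RESET
#define FORCE_RESET_DEFAULT true
#else
#define FORCE_RESET_DEFAULT false
#endif

/*
 * Decide whether an incoming batch should be handled as a cache reset.
 * get_result follows the SIGetDataEntries convention: -1 means a reset was
 * signalled, 0 means no messages, and n > 0 is the number of messages read.
 */
bool
treat_as_reset(int get_result, bool force_reset)
{
    assert(get_result >= -1);
    if (force_reset)
        return get_result != 0;     /* any incoming traffic counts as a reset */
    return get_result < 0;          /* normal behavior: only a real reset does */
}

/* Same decision, using the build-time default for the debug flag. */
bool
reset_needed(int get_result)
{
    return treat_as_reset(get_result, FORCE_RESET_DEFAULT);
}
```

Keeping the flag as a parameter of treat_as_reset makes both behaviors checkable in one build; the #ifdef only chooses the default.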
        regards, tom lane
		