Re: BUG #5412: test case produced, possible race condition. - Mailing list pgsql-bugs

From Tom Lane
Subject Re: BUG #5412: test case produced, possible race condition.
Date
Msg-id 25124.1271259136@sss.pgh.pa.us
In response to Re: BUG #5412: test case produced, possible race condition.  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
I wrote:
> Why would this patch fix anything?  It doesn't change the lock status.

I have not been able to reproduce the crash using Rusty's script on my
own machine, but after contemplating his stack trace for a while I have a
theory about what is happening.  I think that while we are building a
new relation entry (the RelationBuildDesc call in RelationClearRelation)
for a locally-created relation, we receive an sinval reset event caused
by sinval queue overflow.  (That could only happen with a lot of
concurrent catalog update activity, which is why a significant number
of concurrent "job1" clients is needed to provoke the problem.)
The sinval reset will be serviced by RelationCacheInvalidate, which will
blow away any relcache entries with refcount zero, including the one
that the outer instance of RelationClearRelation is trying to rebuild.
So when control returns, the next thing that happens is that we try to
do the equalTupleDescs() comparison against a trashed pointer, as seen
in the stack trace.

This behavior is new in 8.4.3; before that RelationClearRelation
temporarily unhooked the target rel from the relcache hash table,
so it wouldn't be found by RelationCacheInvalidate.  So that explains
why Rusty's app worked before.
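
To make the sequence of events concrete, here is a toy model of the theory
above.  It is only a sketch, not PostgreSQL source: the class and method
names (Entry, RelCache, build_desc, clear_relation, invalidate_all) and the
"freed" flag standing in for released memory are all hypothetical, and the
only things it assumes are the behaviors described in this message.

```python
class Entry:
    """Toy stand-in for a relcache entry (hypothetical, not PostgreSQL code)."""
    def __init__(self, relid):
        self.relid = relid
        self.refcount = 0
        self.freed = False  # models "this memory has been released"


class RelCache:
    def __init__(self, unhook_during_rebuild):
        self.table = {}  # models the relcache hash table
        self.unhook_during_rebuild = unhook_during_rebuild
        self.reset_pending = False  # models a queued sinval reset

    def add(self, relid):
        entry = Entry(relid)
        self.table[relid] = entry
        return entry

    def invalidate_all(self):
        # Models the reset handler: drop every entry that is still in
        # the hash table and has refcount zero.
        for relid, entry in list(self.table.items()):
            if entry.refcount == 0:
                entry.freed = True
                del self.table[relid]

    def build_desc(self, relid):
        # Building a new entry is the point at which a pending reset
        # gets serviced in this model.
        if self.reset_pending:
            self.reset_pending = False
            self.invalidate_all()
        return Entry(relid)

    def clear_relation(self, old):
        if self.unhook_during_rebuild:
            # Older behavior as described above: hide the entry from
            # the hash table while it is being rebuilt.
            del self.table[old.relid]
        new = self.build_desc(old.relid)
        if old.freed:
            # The real code would now compare against freed memory.
            return "dangling"
        assert new.relid == old.relid  # stands in for the comparison step
        self.table[old.relid] = old
        return "ok"


def run(unhook_during_rebuild):
    cache = RelCache(unhook_during_rebuild)
    rel = cache.add(1)         # refcount stays zero, as in the theory
    cache.reset_pending = True  # a reset arrives before the rebuild
    return cache.clear_relation(rel)
```

In this model, run(False) leaves the entry findable during the rebuild, so
the reset frees it; run(True) unhooks it first, so the reset never sees it.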

In short, then, Heikki's fix is good, although it desperately needs
some comment updates: there's effectively an API change happening here,
because RelationClearRelation's contract with its caller is not the
same as before.  I'll clean it up a bit and apply.  It will need to
go into all the branches this patch went into:
http://archives.postgresql.org/pgsql-committers/2010-01/msg00186.php

            regards, tom lane
