Re: [BUGS] BUG #5412: test case produced, possible race condition. - Mailing list pgsql-hackers

From Rusty Conover
Subject Re: [BUGS] BUG #5412: test case produced, possible race condition.
Date
Msg-id C864BEAF-B22B-4D09-A1F4-2642CB26E45B@infogears.com
Whole thread Raw
In response to Re: [BUGS] BUG #5412: test case produced, possible race condition.  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [BUGS] BUG #5412: test case produced, possible race condition.  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers

On Apr 14, 2010, at 3:30 PM, Tom Lane wrote:

Rusty Conover <rconover@infogears.com> writes:
On Apr 14, 2010, at 2:31 PM, Tom Lane wrote:
There is another slightly odd thing here, which is that the stack trace
Rusty provided clearly shows the crash occurring during processing of a
local relcache invalidation message for the truncated relation.

The test case program was the smallest thing I could write to reproduce the problem consistently on my machine, but I couldn't reproduce it consistently on other machines and architectures.  I'm glad Heikki was able to also see the crash on his hardware.  I can take Heikki's patch back out and get a new stack trace from the test program if that would be useful to you.

Yeah, I'm curious to know if the stack trace is the same for crashes in
the watered-down test case.  AFAICS, relcache invals triggered by the
TRUNCATE itself ought to be serviced before we get out of the TRUNCATE;
and after that, the xact is holding exclusive lock on the table so
there's no way that we should get a remote inval on the specific relation
(not to mention that the trace is clearly about a local inval not an
incoming remote inval).  So I don't understand why the trace says that
it's happening during a subsequent INSERT.

Mind you, Heikki's patch looks like a good idea in any case; but I'm
not sure that there isn't something else going on here.

regards, tom lane

Hi Tom,

Here is a backtrace from the test case script (using 8.4.3 as released):

#0  equalTupleDescs (tupdesc1=0x7f7f7f7f, tupdesc2=0x966c3b80) at tupdesc.c:311
311 if (tupdesc1->natts != tupdesc2->natts)
(gdb) bt
#0  equalTupleDescs (tupdesc1=0x7f7f7f7f, tupdesc2=0x966c3b80) at tupdesc.c:311
#1  0x0832451b in RelationClearRelation (relation=0x966c7998, rebuild=<value optimized out>) at relcache.c:1901
#2  0x08324b9f in RelationFlushRelation () at relcache.c:1991
#3  RelationCacheInvalidateEntry (relationId=7217555) at relcache.c:2042
#4  0x0831dd89 in LocalExecuteInvalidationMessage (msg=0x89e2b50) at inval.c:510
#5  0x0831d341 in ProcessInvalidationMessages (hdr=0x89d20f4, func=0x831dc50 <LocalExecuteInvalidationMessage>)
    at inval.c:397
#6  0x0831d3dc in CommandEndInvalidationMessages () at inval.c:1006
#7  0x080c9035 in AtCommit_LocalCache () at xact.c:1009
#8  CommandCounterIncrement () at xact.c:634
#9  0x0826dcc9 in finish_xact_command () at postgres.c:2363
#10 0x0826ed4d in exec_simple_query (
    query_string=0x89b9904 "insert into foo_dest (id, name, size) select id, name, size from foobar_source")
    at postgres.c:1022
#11 0x0827042c in PostgresMain (argc=4, argv=0x894d5c8, username=0x894d590 "postgres") at postgres.c:3614
#12 0x0823a4d3 in BackendRun () at postmaster.c:3449
#13 BackendStartup () at postmaster.c:3063
#14 ServerLoop () at postmaster.c:1387
#15 0x0823b503 in PostmasterMain (argc=4, argv=0x894a678) at postmaster.c:1040
#16 0x081dc0a6 in main (argc=4, argv=0x894a678) at main.c:188
(gdb) 

Hope that helps,

Rusty

pgsql-hackers by date:

Previous
From: "Kevin Grittner"
Date:
Subject: gist README typo
Next
From: Greg Smith
Date:
Subject: Re: shared_buffers documentation