Re: [HACKERS] md.c is feeling much better now, thank you - Mailing list pgsql-hackers

From: Tom Lane
Subject: Re: [HACKERS] md.c is feeling much better now, thank you
Msg-id: 9042.936281135@sss.pgh.pa.us
In response to: Re: [HACKERS] md.c is feeling much better now, thank you  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers

I wrote:
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
>> StartTransaction() and CommandCounterIncrement() trigger
>> relation cache invalidation. Unfortunately those are insufficient 
>> to prevent backends from inserting into invalid relations.

> If that's true, then we have problems far worse than whether mdtruncate
> has tried to unlink the segment.

I poked at this a little bit and found that for the VACUUM case,
RelationFlushRelation in the other backend (the one waiting to
insert/update) occurs here:

#0  RelationFlushRelation (relationPtr=0x7b034824,
    onlyFlushReferenceCountZero=1 '\001') at relcache.c:1259
#1  0x158a60 in RelationIdInvalidateRelationCacheByRelationId (
    relationId=272146) at relcache.c:1368
#2  0x156ba0 in CacheIdInvalidate (cacheId=1259, hashIndex=272146,
    pointer=0x0) at inval.c:323
#3  0x11673c in SIReadEntryData (segP=0x80da1000, backendId=-2133183664,
    invalFunction=0x4000c692 <SSNAN+9066>,
    resetFunction=0x4000c69a <SSNAN+9074>) at sinvaladt.c:649
#4  0x115e6c in InvalidateSharedInvalid (invalFunction=0x80da1000,
    resetFunction=0x4000c69a <SSNAN+9074>) at sinval.c:164
#5  0x156e54 in DiscardInvalid () at inval.c:518
#6  0x94354 in AtStart_Cache () at xact.c:548
#7  0x94314 in CommandCounterIncrement () at xact.c:514
#8  0x121218 in pg_exec_query_dest (
    query_string=0x40079580 "insert into tenk1 values(19999,1234);",
    dest=Remote, aclOverride=0 '\000') at postgres.c:726

which looks good ... except that the CommandCounterIncrement()
occurs *after* the insert has executed.  So we've got a problem
here.

In the DROP TABLE scenario, things seem to be broken independently
of md.c.  I tried this:

BACKEND #1: begin; lock table tenk1;
BACKEND #2: insert into tenk1 values(29999,1234);
            -- backend #2 hangs waiting for lock
BACKEND #1: drop table tenk1; end;

Backend #2 now suffers an assert failure:

#6  0x15b8c4 in ExceptionalCondition (
    conditionName=0x28898 "!((((PageHeader) ((PageHeader) pageHeader))->pd_upper ==0))",
    exceptionP=0x40009a58, detail=0x0, fileName=0x7ae4 "\003",
    lineNumber=136) at assert.c:72
#7  0x7c470 in RelationPutHeapTupleAtEnd (relation=0x400e8a40,
    tuple=0x401127a0) at hio.c:136
#8  0x7aa48 in heap_insert (relation=0x400e8a40, tup=0x401127a0)
    at heapam.c:1086
#9  0xb87e4 in ExecAppend (slot=0x4010a078, tupleid=0x200,
    estate=0x40109e98) at execMain.c:1190
#10 0xb8630 in ExecutePlan (estate=0x40109e98, plan=0x40109860,
    operation=CMD_INSERT, offsetTuples=0, numberTuples=0,
    direction=ForwardScanDirection, destfunc=0x40112730)
    at execMain.c:1064
#11 0xb7b6c in ExecutorRun (queryDesc=0x40109e80, estate=0x40109e98,
    feature=3, limoffset=0x0, limcount=0x0) at execMain.c:329
#12 0x12294c in ProcessQueryDesc (queryDesc=0x40109e80, limoffset=0x0,
    limcount=0x0) at pquery.c:315
#13 0x1229f4 in ProcessQuery (parsetree=0x400e42d0, plan=0x40109860,
    dest=Local) at pquery.c:358
#14 0x1211dc in pg_exec_query_dest (
    query_string=0x40079580 "insert into tenk1 values(29999,1234);",
    dest=Remote, aclOverride=2 '\002') at postgres.c:710

which hardly looks like it can be blamed on md.c either.

My guess is that we ought to be checking for relcache invalidation
immediately after gaining any lock on the relation.  I don't know where
that should be done, however.
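
The ordering proposed above (take the lock first, then process pending
invalidations, and only then trust the cached entry) can be sketched as
a small toy model. This is not PostgreSQL code: every type and function
name below is hypothetical, the "lock" is only simulated by a callback
that stands for whatever the lock holder did while we were blocked, and
the relation id 272146 is borrowed from the first backtrace purely for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names are real PostgreSQL APIs. */
#define TOY_MAX_MSGS 16

typedef struct {
    unsigned relid;   /* which relation this cache entry describes */
    bool     valid;   /* false once an invalidation has been applied */
} ToyRelCacheEntry;

typedef struct {
    unsigned relids[TOY_MAX_MSGS];  /* pending invalidation messages */
    size_t   count;
} ToyInvalQueue;

/* Work done by the lock holder while the waiter is blocked. */
typedef void (*ToyHolderWork)(ToyInvalQueue *q);

/* Queue an invalidation for relid; returns false if the queue is full. */
bool toy_queue_inval(ToyInvalQueue *q, unsigned relid)
{
    assert(q != NULL);
    if (q->count >= TOY_MAX_MSGS)
        return false;
    q->relids[q->count++] = relid;
    return true;
}

/* Drain the queue, marking the entry invalid if any message names it. */
void toy_accept_invals(ToyInvalQueue *q, ToyRelCacheEntry *e)
{
    assert(q != NULL && e != NULL);
    for (size_t i = 0; i < q->count; i++)
        if (q->relids[i] == e->relid)
            e->valid = false;
    q->count = 0;
}

/*
 * Simulated "acquire lock, then check for invalidation".  Returns true
 * only if the cached entry is still usable after the lock is granted.
 */
bool toy_lock_then_check(ToyRelCacheEntry *e, ToyInvalQueue *q,
                         ToyHolderWork holder_work)
{
    if (holder_work != NULL)
        holder_work(q);        /* happens while we wait for the lock */
    /* ... lock is granted at this point ... */
    toy_accept_invals(q, e);   /* the check that runs too late today */
    return e->valid;
}

/* Example holder: drops the table and queues the matching invalidation. */
void toy_holder_drops_table(ToyInvalQueue *q)
{
    (void) toy_queue_inval(q, 272146u);
}
```

With an entry for relation 272146, calling toy_lock_then_check() with
toy_holder_drops_table as the holder returns false, so the waiter learns
the relation is gone before it tries to insert. Where the equivalent
check belongs in the real lock path is left open, as in the text above.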

Perhaps we also ought to make RelationFlushRelation do smgrclose()
unconditionally, regardless of the reference-count test.  If the
relation is still in use, that should be OK --- md.c will reopen
the files automatically on the next access.


BTW, it appears that DROP TABLE physically deletes the relation
*immediately*, which means that aborting a transaction that contains
a DROP TABLE does not work.  But we knew that, didn't we?
        regards, tom lane

