AbortOutOfAnyTransaction is a few bricks shy of a load - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | AbortOutOfAnyTransaction is a few bricks shy of a load |
Date | |
Msg-id | 22227.1338260224@sss.pgh.pa.us Whole thread Raw |
List | pgsql-hackers |
While poking at the slow-relcache-rebuild issue recently reported by Jeff Frost and Greg Mullane, I had set up a test case that used a small function to create and drop enough temp tables to provoke an sinval reset. I had also modified sinval.c so that it reported sinval reset processing as elog(LOG) not elog(DEBUG4). The test script just ran until killed, and I happened to notice that killing the client resulted in something bizarre in the postmaster log: 2012-05-28 22:06:23.943 EDT 19701 LOG: statement: select maketemps(30); 2012-05-28 22:06:24.018 EDT 19701 LOG: sinval reset received 2012-05-28 22:06:24.018 EDT 19701 LOG: statement: select maketemps(30); 2012-05-28 22:06:24.091 EDT 19701 LOG: sinval reset received 2012-05-28 22:06:24.091 EDT 19701 LOG: statement: select maketemps(30); 2012-05-28 22:06:24.162 EDT 19701 LOG: sinval reset received 2012-05-28 22:06:24.163 EDT 19701 LOG: statement: select maketemps(30); 2012-05-28 22:06:24.235 EDT 19701 LOG: could not send data to client: Broken pipe 2012-05-28 22:06:24.235 EDT 19701 LOG: sinval reset received 2012-05-28 22:06:24.235 EDT 19701 FATAL: connection to client lost 2012-05-28 22:06:24.235 EDT 19701 WARNING: StartTransaction while in START state 2012-05-28 22:06:24.235 EDT 19701 WARNING: GUC nest level = 1 at transaction start 2012-05-28 22:06:24.235 EDT 19701 LOG: disconnection: session time: 0:00:12.148 user=tgl database=foo host=[local] Most of that is expected, but not the WARNINGs. Digging into this, I found the following stack trace leading to the first warning: #3 0x0000000000708431 in elog_finish (elevel=<optimized out>, fmt=0x73bfa8 "StartTransaction while in %s state") at elog.c:1212 #4 0x000000000048b17d in StartTransaction () at xact.c:1689 #5 StartTransactionCommand () at xact.c:2440 #6 0x00000000004cac05 in RemoveTempRelationsCallback (code=<optimized out>, arg=<optimized out>) at namespace.c:3753 #7 RemoveTempRelationsCallback (code=<optimized out>, arg=<optimized out>) at namespace.c:3747 #8 0x0000000000628cdd in shmem_exit (code=1) at ipc.c:221 #9 0x0000000000628d55 in proc_exit_prepare (code=1) at ipc.c:181 #10 0x0000000000628dd8 in proc_exit (code=1) at ipc.c:96 #11 0x0000000000705ef3 in errfinish (dummy=<optimized out>) at elog.c:518-- this is ereport(FATAL, errmsg("connection toclient lost"))); #12 0x0000000000641149 in ProcessInterrupts () at postgres.c:2875 #13 ProcessInterrupts () at postgres.c:2824 #14 0x0000000000705eb4 in errfinish (dummy=<optimized out>) at elog.c:543 #15 0x0000000000708431 in elog_finish (elevel=<optimized out>, fmt=0x85387b "sinval reset received") at elog.c:1212 #16 0x000000000062d08a in ReceiveSharedInvalidMessages ( invalFunction=0x6f6800 <LocalExecuteInvalidationMessage>, resetFunction=0x6f6140<InvalidateSystemCaches>) at sinval.c:112 #17 0x000000000048b0f8 in AtStart_Cache () at xact.c:792 #18 StartTransaction () at xact.c:1791 #19 StartTransactionCommand () at xact.c:2440 #20 0x000000000062cefd in ProcessCatchupEvent () at sinval.c:331 #21 0x000000000062d185 in EnableCatchupInterrupt () at sinval.c:267 #22 0x0000000000641665 in prepare_for_client_read () at postgres.c:515 #23 0x000000000059cfc1 in secure_read (port=0x2a49060, ptr=0xb23260, len=8192) at be-secure.c:302 #24 0x00000000005a43ef in pq_recvbuf () at pqcomm.c:816 #25 0x00000000005a4e65 in pq_getbyte () at pqcomm.c:857 #26 0x0000000000642ba1 in SocketBackend (inBuf=0x7fff060ba630) at postgres.c:342 #27 ReadCommand (inBuf=0x7fff060ba630) at postgres.c:490 #28 PostgresMain (argc=<optimized out>, argv=<optimized out>, username=<optimized out>) at postgres.c:3929 That is, we're trying to do an elog(FATAL) exit after having realized the client is dead, and within that we have to run a transaction to get rid of the temp tables we made, and StartTransactionCommand is unhappy with the transaction state. Now, RemoveTempRelationsCallback calls AbortOutOfAnyTransaction before it does StartTransactionCommand, so why are we getting this whining? The answer appears to be that AbortOutOfAnyTransaction is satisfied if it sees blockState == TBLOCK_DEFAULT. In contrast, AbortCurrentTransaction knows that it should poke a little deeper and check for state == TRANS_DEFAULT; otherwise, we failed partway through transaction start and still need to do cleanup. I added the latter code back in 2005, in commit 60b2444cc3ba037630c9b940c3c9ef01b954b87b, which is otherwise entirely unrelated. I suppose testing of that patch had shown me that AbortCurrentTransaction needed to handle this case, but I overlooked that AbortOutOfAnyTransaction might need to do likewise. I intend to add the identical code to AbortOutOfAnyTransaction, as per attached patch against HEAD; this eliminates the WARNING messages in my test case. I think this had probably better get back-patched, too. In my test the warnings seem only cosmetic, but I'm afraid there could be more severe consequences in some cases. regards, tom lane diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c index c71a10e..5ae46de 100644 *** a/src/backend/access/transam/xact.c --- b/src/backend/access/transam/xact.c *************** AbortOutOfAnyTransaction(void) *** 3861,3867 **** switch (s->blockState) { case TBLOCK_DEFAULT: ! /* Not in a transaction, do nothing */ break; case TBLOCK_STARTED: case TBLOCK_BEGIN: --- 3861,3884 ---- switch (s->blockState) { case TBLOCK_DEFAULT: ! if (s->state == TRANS_DEFAULT) ! { ! /* Not in a transaction, do nothing */ ! } ! else ! { ! /* ! * We can get here after an error during transaction start ! * (state will be TRANS_START). Need to clean up the ! * incompletely started transaction. First, adjust the ! * low-level state to suppress warning message from ! * AbortTransaction. ! */ ! if (s->state == TRANS_START) ! s->state = TRANS_INPROGRESS; ! AbortTransaction(); ! CleanupTransaction(); ! } break; case TBLOCK_STARTED: case TBLOCK_BEGIN: regards, tom lane
pgsql-hackers by date: