AbortOutOfAnyTransaction is a few bricks shy of a load - Mailing list pgsql-hackers

From Tom Lane
Subject AbortOutOfAnyTransaction is a few bricks shy of a load
Date
Msg-id 22227.1338260224@sss.pgh.pa.us
Whole thread Raw
List pgsql-hackers
While poking at the slow-relcache-rebuild issue recently reported by
Jeff Frost and Greg Mullane, I had set up a test case that used a
small function to create and drop enough temp tables to provoke an
sinval reset.  I had also modified sinval.c so that it reported sinval
reset processing as elog(LOG) not elog(DEBUG4).  The test script just
ran until killed, and I happened to notice that killing the client
resulted in something bizarre in the postmaster log:

2012-05-28 22:06:23.943 EDT 19701 LOG:  statement: select maketemps(30);
2012-05-28 22:06:24.018 EDT 19701 LOG:  sinval reset received
2012-05-28 22:06:24.018 EDT 19701 LOG:  statement: select maketemps(30);
2012-05-28 22:06:24.091 EDT 19701 LOG:  sinval reset received
2012-05-28 22:06:24.091 EDT 19701 LOG:  statement: select maketemps(30);
2012-05-28 22:06:24.162 EDT 19701 LOG:  sinval reset received
2012-05-28 22:06:24.163 EDT 19701 LOG:  statement: select maketemps(30);
2012-05-28 22:06:24.235 EDT 19701 LOG:  could not send data to client: Broken pipe
2012-05-28 22:06:24.235 EDT 19701 LOG:  sinval reset received
2012-05-28 22:06:24.235 EDT 19701 FATAL:  connection to client lost
2012-05-28 22:06:24.235 EDT 19701 WARNING:  StartTransaction while in START state
2012-05-28 22:06:24.235 EDT 19701 WARNING:  GUC nest level = 1 at transaction start
2012-05-28 22:06:24.235 EDT 19701 LOG:  disconnection: session time: 0:00:12.148 user=tgl database=foo host=[local]

Most of that is expected, but not the WARNINGs.  Digging into this,
I found the following stack trace leading to the first warning:

#3  0x0000000000708431 in elog_finish (elevel=<optimized out>,    fmt=0x73bfa8 "StartTransaction while in %s state") at
elog.c:1212
#4  0x000000000048b17d in StartTransaction () at xact.c:1689
#5  StartTransactionCommand () at xact.c:2440
#6  0x00000000004cac05 in RemoveTempRelationsCallback (code=<optimized out>, arg=<optimized out>)   at
namespace.c:3753
#7  RemoveTempRelationsCallback (code=<optimized out>, arg=<optimized out>) at namespace.c:3747
#8  0x0000000000628cdd in shmem_exit (code=1) at ipc.c:221
#9  0x0000000000628d55 in proc_exit_prepare (code=1) at ipc.c:181
#10 0x0000000000628dd8 in proc_exit (code=1) at ipc.c:96
#11 0x0000000000705ef3 in errfinish (dummy=<optimized out>) at elog.c:518-- this is ereport(FATAL, errmsg("connection
toclient lost")));
 
#12 0x0000000000641149 in ProcessInterrupts () at postgres.c:2875
#13 ProcessInterrupts () at postgres.c:2824
#14 0x0000000000705eb4 in errfinish (dummy=<optimized out>) at elog.c:543
#15 0x0000000000708431 in elog_finish (elevel=<optimized out>,    fmt=0x85387b "sinval reset received") at elog.c:1212
#16 0x000000000062d08a in ReceiveSharedInvalidMessages (   invalFunction=0x6f6800 <LocalExecuteInvalidationMessage>,
resetFunction=0x6f6140<InvalidateSystemCaches>) at sinval.c:112
 
#17 0x000000000048b0f8 in AtStart_Cache () at xact.c:792
#18 StartTransaction () at xact.c:1791
#19 StartTransactionCommand () at xact.c:2440
#20 0x000000000062cefd in ProcessCatchupEvent () at sinval.c:331
#21 0x000000000062d185 in EnableCatchupInterrupt () at sinval.c:267
#22 0x0000000000641665 in prepare_for_client_read () at postgres.c:515
#23 0x000000000059cfc1 in secure_read (port=0x2a49060, ptr=0xb23260, len=8192) at be-secure.c:302
#24 0x00000000005a43ef in pq_recvbuf () at pqcomm.c:816
#25 0x00000000005a4e65 in pq_getbyte () at pqcomm.c:857
#26 0x0000000000642ba1 in SocketBackend (inBuf=0x7fff060ba630) at postgres.c:342
#27 ReadCommand (inBuf=0x7fff060ba630) at postgres.c:490
#28 PostgresMain (argc=<optimized out>, argv=<optimized out>, username=<optimized out>)   at postgres.c:3929

That is, we're trying to do an elog(FATAL) exit after having realized
the client is dead, and within that we have to run a transaction to get
rid of the temp tables we made, and StartTransactionCommand is unhappy
with the transaction state.

Now, RemoveTempRelationsCallback calls AbortOutOfAnyTransaction before
it does StartTransactionCommand, so why are we getting this whining?
The answer appears to be that AbortOutOfAnyTransaction is satisfied if
it sees blockState == TBLOCK_DEFAULT.  In contrast,
AbortCurrentTransaction knows that it should poke a little deeper and
check for state == TRANS_DEFAULT; otherwise, we failed partway through
transaction start and still need to do cleanup.  I added the latter code
back in 2005, in commit 60b2444cc3ba037630c9b940c3c9ef01b954b87b, which
is otherwise entirely unrelated.  I suppose testing of that patch had
shown me that AbortCurrentTransaction needed to handle this case, but
I overlooked that AbortOutOfAnyTransaction might need to do likewise.

I intend to add the identical code to AbortOutOfAnyTransaction, as per
attached patch against HEAD; this eliminates the WARNING messages in my
test case.  I think this had probably better get back-patched, too.
In my test the warnings seem only cosmetic, but I'm afraid there could
be more severe consequences in some cases.
        regards, tom lane

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index c71a10e..5ae46de 100644
*** a/src/backend/access/transam/xact.c
--- b/src/backend/access/transam/xact.c
*************** AbortOutOfAnyTransaction(void)
*** 3861,3867 ****         switch (s->blockState)         {             case TBLOCK_DEFAULT:
!                 /* Not in a transaction, do nothing */                 break;             case TBLOCK_STARTED:
    case TBLOCK_BEGIN:
 
--- 3861,3884 ----         switch (s->blockState)         {             case TBLOCK_DEFAULT:
!                 if (s->state == TRANS_DEFAULT)
!                 {
!                     /* Not in a transaction, do nothing */
!                 }
!                 else
!                 {
!                     /*
!                      * We can get here after an error during transaction start
!                      * (state will be TRANS_START).  Need to clean up the
!                      * incompletely started transaction.  First, adjust the
!                      * low-level state to suppress warning message from
!                      * AbortTransaction.
!                      */
!                     if (s->state == TRANS_START)
!                         s->state = TRANS_INPROGRESS;
!                     AbortTransaction();
!                     CleanupTransaction();
!                 }                 break;             case TBLOCK_STARTED:             case TBLOCK_BEGIN:

        regards, tom lane


pgsql-hackers by date:

Previous
From: Tatsuo Ishii
Date:
Subject: Re: Patch: add conversion from pg_wchar to multibyte
Next
From: Alex
Date:
Subject: Re: libpq URL syntax vs SQLAlchemy