Thread: [BUGS] BUG #14821: idle_in_transaction_session_timeout sometimes getsignored when statement timeout is pending
[BUGS] BUG #14821: idle_in_transaction_session_timeout sometimes getsignored when statement timeout is pending
From
lukas@fittl.com
Date:
The following bug has been logged on the website: Bug reference: 14821 Logged by: Lukas Fittl Email address: lukas@fittl.com PostgreSQL version: 9.6.4 Operating system: Amazon Linux RHEL 6 Description: Hi all, I've been observing a problem on 9.6.4, but in my understanding this still applies on master too. Relevant config: idle_in_transaction_session_timeout = 30s statement_timeout = 30s Backend thats idle in transaction and should have been cancelled: => SELECT pid, state, now() - backend_start AS backend_age, now() - xact_start AS xact_age, now() - state_change AS state_age, query FROM pg_stat_activity WHERE state = 'idle in transaction' ORDER BY backend_start;pid | state | backend_age | xact_age | state_age | query ------+---------------------+-----------------+-----------------+-----------------+------------------4005 | idle in transaction| 04:29:51.844063 | 01:19:10.984972 | 01:19:10.984947 | BEGIN READ WRITE (1 row) Backtrace from that backend: (gdb) bt #0 0x00007f9dffb298c3 in __epoll_wait_nocancel () at ../sysdeps/unix/syscall-template.S:81 #1 0x00000000006a06d6 in WaitEventSetWaitBlock (nevents=1, occurred_events=0x7fff0e0f3cf0, cur_timeout=-1, set=0x29ea1a8) at latch.c:1053 #2 WaitEventSetWait (set=0x29ea1a8, timeout=timeout@entry=-1, occurred_events=occurred_events@entry=0x7fff0e0f3cf0, nevents=nevents@entry=1) at latch.c:1007 #3 0x00000000005e4985 in secure_read (port=0x2a42f30, ptr=0xc09260 <PqRecvBuffer>, len=8192) at be-secure.c:149 #4 0x00000000005ed898 in pq_recvbuf () at pqcomm.c:940 #5 0x00000000005ee3a5 in pq_getbyte () at pqcomm.c:983 #6 0x00000000006bf012 in SocketBackend (inBuf=0x7fff0e0f3e80) at postgres.c:326 #7 ReadCommand (inBuf=0x7fff0e0f3e80) at postgres.c:499 #8 PostgresMain (argc=<optimized out>, argv=argv@entry=0x2a4c4f8, dbname=0x2a4c468 "mydb", username=<optimized out>) at postgres.c:4019 #9 0x000000000046ed2b in BackendRun (port=0x2a42f30) at postmaster.c:4294 #10 BackendStartup (port=0x2a42f30) at postmaster.c:3968 #11 ServerLoop () at postmaster.c:1719 #12 0x0000000000666946 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x29e86b0) at postmaster.c:1327 #13 0x000000000046f8a2 in main (argc=3, argv=0x29e86b0) at main.c:228 So as you can see its waiting on line 326 of tcop/postgres.c for data from pq_getbyte(). And the reason that ProcessInterrupts() can't move forward is due to the HOLD_CANCEL_INTERRUPTS() just before that in combination with a pending query cancellation. Relevant variables: (gdb) p InterruptHoldoffCount $13 = 0 (gdb) p CritSectionCount $14 = 0 (gdb) p QueryCancelHoldoffCount $16 = 1 (gdb) p QueryCancelPending $17 = 1 '\001' (gdb) p IdleInTransactionSessionTimeout $1 = 30000 (gdb) p IdleInTransactionSessionTimeoutPending $2 = 1 '\001' (gdb) p all_timeouts[IDLE_IN_TRANSACTION_SESSION_TIMEOUT] $3 = {index = IDLE_IN_TRANSACTION_SESSION_TIMEOUT, indicator = 1 '\001', timeout_handler = 0x7baee0 <IdleInTransactionSessionTimeoutHandler>, start_time = 559266384031137, fin_time = 559266414031137} It seems to me that ProcessInterrupts() should check for the idle in transaction timeout first, since it short-circuits and returns early if QueryCancelPending && QueryCancelHoldoffCount != 0 before even getting there. Best, Lukas -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #14821: idle_in_transaction_session_timeout sometimesgets ignored when statement timeout is pending
From
Andres Freund
Date:
Hi, On 2017-09-21 01:09:56 +0000, lukas@fittl.com wrote: > It seems to me that ProcessInterrupts() should check for the idle in > transaction timeout first, since it short-circuits and returns early if > QueryCancelPending && QueryCancelHoldoffCount != 0 before even getting > there. Indeed - although I wonder if the correct fix isn't to move things around, but to instead avoid the order dependency changing the short circuit logic so that there's no early return. Like e.g. if (QueryCancelPending && QueryCancelHoldoffCount != 0){ /* rearm */ } else if (QueryCancelPending) { /* handle interrupt */ } there's really no good reason for the return right now, and it's bound to create more bugs in the future. Greetings, Andres Freund -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #14821: idle_in_transaction_session_timeout sometimesgets ignored when statement timeout is pending
From
Lukas Fittl
Date:
Hi,
On Wed, Sep 20, 2017 at 6:29 PM, Andres Freund <andres@anarazel.de> wrote:
-- Hi,
On 2017-09-21 01:09:56 +0000, lukas@fittl.com wrote:
> It seems to me that ProcessInterrupts() should check for the idle in
> transaction timeout first, since it short-circuits and returns early if
> QueryCancelPending && QueryCancelHoldoffCount != 0 before even getting
> there.
Indeed - although I wonder if the correct fix isn't to move things
around, but to instead avoid the order dependency changing the short
circuit logic so that there's no early return. Like e.g.
if (QueryCancelPending && QueryCancelHoldoffCount != 0)
{
/* rearm */
}
else if (QueryCancelPending)
{
/* handle interrupt */
}
there's really no good reason for the return right now, and it's bound
to create more bugs in the future.
That seems like a reasonable approach and a good idea.
I'll take a stab at making a patch for this tonight and send it over to hackers then.
Best,
Lukas
Lukas Fittl
Skype: lfittl
Phone: +1 415 321 0630
Skype: lfittl
Phone: +1 415 321 0630