The following bug has been logged on the website:
Bug reference: 14821
Logged by: Lukas Fittl
Email address: lukas@fittl.com
PostgreSQL version: 9.6.4
Operating system: Amazon Linux RHEL 6
Description:
Hi all,
I've been observing a problem on 9.6.4, but in my understanding this still
applies on master too.
Relevant config:
idle_in_transaction_session_timeout = 30s
statement_timeout = 30s
Backend thats idle in transaction and should have been cancelled:
=> SELECT pid, state, now() - backend_start AS backend_age, now() -
xact_start AS xact_age, now() - state_change AS state_age, query FROM
pg_stat_activity WHERE state = 'idle in transaction' ORDER BY
backend_start;pid | state | backend_age | xact_age |
state_age | query
------+---------------------+-----------------+-----------------+-----------------+------------------4005 | idle in
transaction| 04:29:51.844063 | 01:19:10.984972 |
01:19:10.984947 | BEGIN READ WRITE
(1 row)
Backtrace from that backend:
(gdb) bt
#0 0x00007f9dffb298c3 in __epoll_wait_nocancel () at
../sysdeps/unix/syscall-template.S:81
#1 0x00000000006a06d6 in WaitEventSetWaitBlock (nevents=1,
occurred_events=0x7fff0e0f3cf0, cur_timeout=-1, set=0x29ea1a8) at
latch.c:1053
#2 WaitEventSetWait (set=0x29ea1a8, timeout=timeout@entry=-1,
occurred_events=occurred_events@entry=0x7fff0e0f3cf0,
nevents=nevents@entry=1) at latch.c:1007
#3 0x00000000005e4985 in secure_read (port=0x2a42f30, ptr=0xc09260
<PqRecvBuffer>, len=8192) at be-secure.c:149
#4 0x00000000005ed898 in pq_recvbuf () at pqcomm.c:940
#5 0x00000000005ee3a5 in pq_getbyte () at pqcomm.c:983
#6 0x00000000006bf012 in SocketBackend (inBuf=0x7fff0e0f3e80) at
postgres.c:326
#7 ReadCommand (inBuf=0x7fff0e0f3e80) at postgres.c:499
#8 PostgresMain (argc=<optimized out>, argv=argv@entry=0x2a4c4f8,
dbname=0x2a4c468 "mydb", username=<optimized out>) at postgres.c:4019
#9 0x000000000046ed2b in BackendRun (port=0x2a42f30) at postmaster.c:4294
#10 BackendStartup (port=0x2a42f30) at postmaster.c:3968
#11 ServerLoop () at postmaster.c:1719
#12 0x0000000000666946 in PostmasterMain (argc=argc@entry=3,
argv=argv@entry=0x29e86b0) at postmaster.c:1327
#13 0x000000000046f8a2 in main (argc=3, argv=0x29e86b0) at main.c:228
So as you can see its waiting on line 326 of tcop/postgres.c for data from
pq_getbyte(). And the reason that ProcessInterrupts() can't move forward is
due to the HOLD_CANCEL_INTERRUPTS() just before that in combination with a
pending query cancellation.
Relevant variables:
(gdb) p InterruptHoldoffCount
$13 = 0
(gdb) p CritSectionCount
$14 = 0
(gdb) p QueryCancelHoldoffCount
$16 = 1
(gdb) p QueryCancelPending
$17 = 1 '\001'
(gdb) p IdleInTransactionSessionTimeout
$1 = 30000
(gdb) p IdleInTransactionSessionTimeoutPending
$2 = 1 '\001'
(gdb) p all_timeouts[IDLE_IN_TRANSACTION_SESSION_TIMEOUT]
$3 = {index = IDLE_IN_TRANSACTION_SESSION_TIMEOUT, indicator = 1 '\001',
timeout_handler = 0x7baee0 <IdleInTransactionSessionTimeoutHandler>,
start_time = 559266384031137, fin_time = 559266414031137}
It seems to me that ProcessInterrupts() should check for the idle in
transaction timeout first, since it short-circuits and returns early if
QueryCancelPending && QueryCancelHoldoffCount != 0 before even getting
there.
Best,
Lukas
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs