Re: Active sessions does not terminated due to statement_timeout - Mailing list pgsql-general

From: Ц
Subject: Re: Active sessions does not terminated due to statement_timeout
Msg-id: 1711553962.427100737@f479.i.mail.ru
In response to: Active sessions does not terminated due to statement_timeout (Ц <pfunk@mail.ru>)
List: pgsql-general
I was able to attach to this process using
 
strace -T -p `pgrep -n -u ks-postgres -f "ocb.*FETCH"` -o strace_of_active_session -ff
(the -T flag appends to each line the time spent in that system call).
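 
As an aside, the PID can also be taken from the database itself instead of pgrep. A sketch, assuming the stuck session is still visible in pg_stat_activity (the filter on the query text is my assumption):
 
select pid, query_start, wait_event, query
  from pg_stat_activity
 where state = 'active'
   and wait_event = 'ClientWrite'
   and query ilike 'fetch%';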
 
If the session had already been hanging in the active state for some time, strace showed only one unfinished line:
epoll_wait(4,
 
 
But if I caught it a bit earlier, the output looked like this:
 
***something before the process was caught***
 
epoll_wait(4, [{EPOLLOUT, {u32=1938706152, u64=93859859025640}}], 1, -1) = 1 <0.003515>
sendto(10, "1.211 \320\224\320\232Z200.21121002 \320\240\320\232211)"..., 1968, 0, NULL, 0) = 1968 <0.000068>
sendto(10, " \320\277\320\276\320\273 \321\204\320\265\320\262\321\200\320\260\320\273\321\214 \320\235\320\224\320\241 \320\275"..., 8192, 0, NULL, 0) = 8192 <0.000051>
pread64(407, "y\221\275\0\0\0\0\0\0\342\275[\0\325\320\244\320\232\320\243 \320\220\320\264\320\274\320\270\320\275\320"..., 8192, 10354688) = 8192 <0.000065>
sendto(10, "\377\377\0\0\0\270(907.0702.1340380180.111.2"..., 8192, 0, NULL, 0) = 8192 <0.000066>
pread64(407, "00057476 \320\276\321\202 31.01.2024 \320\262 \321\202.\321"..., 8192, 10362880) = 8192 <0.000044>
sendto(10, "7233430 \320\225\320\264\320\270\320\275\321\213\320\271 \320\275\320\260\320\273\320\276\320\263\320"..., 8192, 0, NULL, 0) = 8192 <0.000073>
 
***a lot of pread64, sendto***
 
sendto(10, "010006300\377\377\377\377\0\0\0X\320\241\320\274\320\276\320\273\320\265\320\275\321\201\320"..., 8192, 0, NULL, 0) = 8192 <0.000023>
pread64(407, "901001\10\2\0\0\30\0\3\0 y\221\235\0\0\0\0\0\0\34\202[\0\20\2\0\0"..., 8192, 16949248) = 8192 <0.000020>
sendto(10, "\321\200\320\265\320\265\321\201\321\202\321\200\321\203 \342\204\226 24119 \320\276\321\202 19"..., 8192, 0, NULL, 0) = 8192 <0.000024>
sendto(10, ") \320\273/\321\20120907234120  \320\221\320\23690741220"..., 8192, 0, NULL, 0) = 8192 <0.000022>
pread64(407, "\320\275\321\201\320\272\320\260 (\320\243\320\277\321\200\320\260\320\262\320\273\320\265\320\275\320\270\320\265 \320"..., 8192, 16957440) = 8192 <0.000019>
sendto(10, "632\0\0\0\n7707083893\0\0\0\0\0\0\0\t6730020"..., 8192, 0, NULL, 0) = 8192 <0.000022>
pread64(407, "\1\0\0\30\0\3\0 y\221\275\0\0\0\0\0\0\356\367[\0\375\320\244\320\232\320\243 \320\220\320"..., 8192, 16965632) = 8192 <0.000020>
sendto(10, "\0\0\0\02403234643667010006300\377\377\377\377\0\0\0d"..., 8192, 0, NULL, 0) = 6632 <0.000026>
sendto(10, "\320\260\321\206\320\270\320\270 \320\263\320\276\321\200\320\276\320\264\320\260 \320\241\320\274\320\276\320\273\320\265"..., 1560, 0, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.000019>
epoll_wait(4, 0x555d738e4f30, 1, -1)    = -1 EINTR (Interrupted system call) <2.855325>
--- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=2292607, si_uid=2001} ---
kill(2292604, SIGURG)                   = 0 <0.000023>
rt_sigreturn({mask=[URG]})              = -1 EINTR (Interrupted system call) <0.000052>
sendto(10, "\320\260\321\206\320\270\320\270 \320\263\320\276\321\200\320\276\320\264\320\260 \320\241\320\274\320\276\320\273\320\265"..., 1560, 0, NULL, 0) = 1560 <0.000033>
pread64(407, "\320\265\321\202.\23\1\202\4\0J\6\210\23\025044525411\0276732208"..., 8192, 16973824) = 8192 <0.000026>
sendto(10, "2.24,4085 \320\276\321\202 21.02.24 \320\272-\321\202 01"..., 8192, 0, NULL, 0) = 8192 <0.000025>
pread64(407, "\276\320\263. 59003491 \320\276\321\202 16.11.2007 \320\235"..., 8192, 16982016) = 8192 <0.000020>
 
***a lot of pread64, sendto***
 
pread64(407, "\260 (\320\234\320\221\320\236\320\243 \"\320\241\320\250 \342\204\226 11\")+40702"..., 8192, 17375232) = 8192 <0.000021>
sendto(10, "\274\320\276\320\273\320\265\320\275\321\201\320\272\320\276\320\265 \320\276\321\202\320\264\320\265\320\273\320\265\320\275"..., 8192, 0, NULL, 0) = 4000 <0.000025>
sendto(10, "\n6731030957\0\0\0\0\0\0\0\t673001001\377\377\377\377"..., 4192, 0, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.000019>
epoll_wait(4, [{EPOLLIN, {u32=1938706176, u64=93859859025664}}], 1, -1) = 1 <0.000018>
read(11, "\27\0\0\0\0\0\0\0\0\0\0\0|\373\"\0\321\7\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 128 <0.000020>
epoll_wait(4, 0x555d738e4f30, 1, -1)    = -1 EINTR (Interrupted system call) <4.214523>
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
kill(2292604, SIGURG)                   = 0 <0.000028>
setitimer(ITIMER_REAL, {it_interval={tv_sec=0, tv_usec=0}, it_value={tv_sec=1790, tv_usec=905739}}, NULL) = 0 <0.000020>
rt_sigreturn({mask=[URG]})              = -1 EINTR (Interrupted system call) <0.000018>
sendto(10, "\n6731030957\0\0\0\0\0\0\0\t673001001\377\377\377\377"..., 4192, 0, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.000023>
epoll_wait(4, [{EPOLLIN, {u32=1938706176, u64=93859859025664}}], 1, -1) = 1 <0.000017>
read(11, "\27\0\0\0\0\0\0\0\0\0\0\0|\373\"\0\321\7\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 128 <0.000021>
 
***First hang in the strace output, lasting <1790.905479> seconds***
***At the time, it looked like the last line in the file:***
***epoll_wait(4,***
 
epoll_wait(4, 0x555d738e4f30, 1, -1)    = -1 EINTR (Interrupted system call) <1790.905479>
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
kill(2292604, SIGURG)                   = 0 <0.000035>
kill(-2292604, SIGINT)                  = 0 <0.000020>
--- SIGINT {si_signo=SIGINT, si_code=SI_USER, si_pid=2292604, si_uid=2001} ---
rt_sigreturn({mask=[ALRM URG]})         = 0 <0.000018>
kill(2292604, SIGINT)                   = 0 <0.000017>
--- SIGINT {si_signo=SIGINT, si_code=SI_USER, si_pid=2292604, si_uid=2001} ---
rt_sigreturn({mask=[ALRM URG]})         = 0 <0.000016>
rt_sigreturn({mask=[URG]})              = -1 EINTR (Interrupted system call) <0.000016>
sendto(10, "\n6731030957\0\0\0\0\0\0\0\t673001001\377\377\377\377"..., 4192, 0, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.000025>
epoll_wait(4, [{EPOLLIN, {u32=1938706176, u64=93859859025664}}], 1, -1) = 1 <0.000020>
read(11, "\27\0\0\0\0\0\0\0\0\0\0\0|\373\"\0\321\7\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 128 <0.000018>
epoll_wait(4,
 
***the last line looks incomplete, and it hangs like this forever***
 
 
 
When I first noticed that the file had stopped growing, the last line was the same incomplete one,
epoll_wait(4,
and I thought nothing would change.
 
But later I noticed that the file had changed after all.
The line had been completed:
epoll_wait(4, 0x555d738e4f30, 1, -1) = -1 EINTR (Interrupted system call) <1790.905479>
telling us that epoll_wait had been hanging for 30 minutes. As far as I can tell, the statement timeout did fire here: after 1790.9 seconds (the remainder of the 1800-second statement_timeout armed by the setitimer call above) SIGALRM arrived and the handler sent SIGINT to the backend (the kill calls above), but the resulting query cancel never took effect, because the backend immediately went back to the blocked sendto/epoll_wait on the client socket.
 
And 13 more lines had been added, again ending with the same incomplete epoll_wait(4,
...but that was already the final state.
 
Here are the values of the parameters that might be involved:
 
                name                 | setting | unit
-------------------------------------+---------+------
 deadlock_timeout                    | 1000    | ms
 idle_in_transaction_session_timeout | 30000   | ms
 idle_session_timeout                | 60000   | ms
 statement_timeout                   | 1800000 | ms
 tcp_keepalives_count                | 4       |
 tcp_keepalives_idle                 | 30      | s
 tcp_keepalives_interval             | 5       | s
 tcp_user_timeout                    | 0       | ms
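 
The only reliable cleanup I can think of for now is manual: pg_terminate_backend() sends SIGTERM, and a die interrupt, unlike the query cancel issued by statement_timeout, should be able to break out of a blocked client write. A sketch only; the one-hour threshold is an arbitrary value of mine:
 
select pid,
       now() - query_start as stuck_for,
       pg_terminate_backend(pid)   -- SIGTERM, not the SIGINT cancel
  from pg_stat_activity
 where state = 'active'
   and wait_event = 'ClientWrite'
   and now() - query_start > interval '1 hour';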
 
 
PS: I found an issue in node-postgres with similar symptoms, but I can't figure out how their solution could be applied in my case:
https://github.com/brianc/node-postgres/issues/1952
 

Tuesday, March 26, 2024, 17:19 +03:00, from Ц <pfunk@mail.ru>:
 
Greetings!
I've run into strange behavior: I see a lot of active sessions that started hours ago, even though statement_timeout = '30min'.
All of them are fetching from cursors.
 
A typical session looks like this:
backend_start   | 2024-03-26 14:34:20.552594+03
xact_start      | 2024-03-26 14:34:54.974628+03
query_start     | 2024-03-26 14:35:02.024133+03
state_change    | 2024-03-26 14:35:02.024134+03
wait_event_type | Client
wait_event      | ClientWrite
state           | active
backend_xid     | 23240392
backend_xmin    | 23226474
query           | fetch all from "<unnamed portal 20>"
backend_type    | client backend
 
 
They accumulate into the tens by the end of the day, with all the attendant negative impact on performance.
Initially I thought the clients had already died but that, due to network issues, the database still considered them alive, so I set the tcp_keepalives_* GUCs to nonzero values. Without success.
Then I checked the connections from the app server side and found them in the ESTABLISHED state.
It's certainly an application fault, and it should not hold a cursor open forever... but:
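 
(A query along these lines gives the client socket of each stuck backend to match against what the app server shows; a sketch, the column list is my choice:)
 
select pid, client_addr, client_port, backend_start
  from pg_stat_activity
 where state = 'active'
   and wait_event = 'ClientWrite';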
 
Are there any GUC parameters to fight such «clients»?
 
 
 
 
 
 
