Thread: Terminate the idle sessions
Hi, hackers When some clients connect to database in idle state, postgres do not close the idle sessions, here i add a new GUC idle_session_timeout to let postgres close the idle sessions, it samilar to idle_in_transaction_session_timeout. Best, regards. Japin Li
Attachment
Hi, hackers
When some clients connect to database in idle state, postgres do not close the idle sessions,
here i add a new GUC idle_session_timeout to let postgres close the idle sessions, it samilar
to idle_in_transaction_session_timeout
On Jun 9, 2020, at 10:35 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:I’m curious as to the use case because I cannot imagine using this. Idle connections are normal. Seems better to monitor them and conditionally execute the disconnect backend function from the monitoring layer than indiscriminately disconnect based upon time.
On Wed, Jun 10, 2020 at 05:20:36AM +0000, Li Japin wrote: > I agree with you. But we can also give the user to control the idle > sessions lifetime. Idle sessions staying around can be a problem in the long run as they impact snapshot building. You could for example use a background worker to do this work, like that: https://github.com/michaelpq/pg_plugins/tree/master/kill_idle -- Michael
Attachment
On Jun 10, 2020, at 4:25 PM, Michael Paquier <michael@paquier.xyz> wrote:Idle sessions staying around can be a problem in the long run as they
impact snapshot building. You could for example use a background
worker to do this work, like that:
https://github.com/michaelpq/pg_plugins/tree/master/kill_idle
On Jun 10, 2020, at 10:27 PM, Adam Brusselback <adambrusselback@gmail.com> wrote:My use case is, I have a primary application that connects to the DB, most users work through that (setting is useless for this scenario, app manages it's connections well enough). I also have a number of internal users who deal with data ingestion and connect to the DB directly to work, and those users sometimes leave query windows open for days accidentally. Generally not an issue, but would be nice to be able to time those connections out.
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation: tested, passed I applied this patch to the PG13 branch and generally this feature works as described. The new "idle_session_timeout" thatcontrols the idle session disconnection is not in the default postgresql.conf and I think it should be included therewith default value 0, which means disabled. There is currently no enforced minimum value for "idle_session_timeout" (except for value 0 for disabling the feature), souser can put any value larger than 0 and it could be very small like 500 or even 50 millisecond, this would make any psqlconnection to disconnect shortly after it has connected, which may not be ideal. Many systems I have worked with have30 minutes inactivity timeout by default, and I think it would be better and safer to enforce a reasonable minimum timeoutvalue
There is currently no enforced minimum value for "idle_session_timeout" (except for value 0 for disabling the feature), so user can put any value larger than 0 and it could be very small like 500 or even 50 millisecond, this would make any psql connection to disconnect shortly after it has connected, which may not be ideal. Many systems I have worked with have 30 minutes inactivity timeout by default, and I think it would be better and safer to enforce a reasonable minimum timeout value
On Aug 11, 2020, at 5:42 AM, Cary Huang <cary.huang@highgo.ca> wrote:I applied this patch to the PG13 branch and generally this feature works as described. The new "idle_session_timeout" that controls the idle session disconnection is not in the default postgresql.conf and I think it should be included there with default value 0, which means disabled.
On Mon, Aug 10, 2020 at 2:43 PM Cary Huang <cary.huang@highgo.ca> wrote:There is currently no enforced minimum value for "idle_session_timeout" (except for value 0 for disabling the feature), so user can put any value larger than 0 and it could be very small like 500 or even 50 millisecond, this would make any psql connection to disconnect shortly after it has connected, which may not be ideal. Many systems I have worked with have 30 minutes inactivity timeout by default, and I think it would be better and safer to enforce a reasonable minimum timeout valueI'd accept a value of say 1,000 being minimum in order to reinforce the fact that a unit-less input, while possible, is taken to be milliseconds and such small values most likely mean the user has made a mistake. I would not choose a minimum allowed value solely based on our concept of "reasonable". I don't imagine a value of say 10 seconds, while seemingly unreasonable, is going to be unsafe.
Attachment
On Tue, Aug 11, 2020 at 8:45 AM Li Japin <japinli@hotmail.com> wrote: > > I’ve attached a new version that add “idle_session_timeout” in the default postgresql.conf. > Hi, I would like to just mention a use case I thought of while discussing [1]: In postgres_fdw: assuming we use idle_in_session_timeout on remote backends, the remote sessions will be closed after timeout, but the locally cached connection cache entries still exist and become stale. The subsequent queries that may use the cached connections will fail, of course these subsequent queries can retry the connections only at the beginning of a remote txn but not in the middle of a remote txn, as being discussed in [2]. For instance, in a long running local txn, let say we used a remote connection at the beginning of the local txn(note that it will open a remote session and it's entry is cached in local connection cache), only we use the cached connection later at some point in the local txn, by then let say the idle_in_session_timeout has happened on the remote backend and the remote session would have been closed, the long running local txn will fail instead of succeeding. I think, since the idle_session_timeout is by default disabled, we have no problem. My thought is what if a user enables the feature(knowingly or unknowingly) on the remote backend? If the user knows about the above scenario, that may be fine. On the other hand, either we can always the feature on the remote backend(at the beginning of the remote txn, like we set for some other configuration settings see - configure_remote_session() in connection.c) or how about mentioning the above scenario in this feature documentation? [1] - https://www.postgresql.org/message-id/CALj2ACU1NBQo9mihA15dFf6udkOi7m0u2_s5QJ6dzk%3DZQyVbwQ%40mail.gmail.com [2] - https://www.postgresql.org/message-id/flat/CALj2ACUAi23vf1WiHNar_LksM9EDOWXcbHCo-fD4Mbr1d%3D78YQ%40mail.gmail.com With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
On Aug 14, 2020, at 2:15 PM, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:I think, since the idle_session_timeout is by default disabled, we
have no problem. My thought is what if a user enables the
feature(knowingly or unknowingly) on the remote backend? If the user
knows about the above scenario, that may be fine. On the other hand,
either we can always the feature on the remote backend(at the
beginning of the remote txn, like we set for some other configuration
settings see - configure_remote_session() in connection.c) or how
about mentioning the above scenario in this feature documentation?
>
> On Aug 14, 2020, at 2:15 PM, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
>
> I think, since the idle_session_timeout is by default disabled, we
> have no problem. My thought is what if a user enables the
> feature(knowingly or unknowingly) on the remote backend? If the user
> knows about the above scenario, that may be fine. On the other hand,
> either we can always the feature on the remote backend(at the
> beginning of the remote txn, like we set for some other configuration
> settings see - configure_remote_session() in connection.c) or how
> about mentioning the above scenario in this feature documentation?
>
> Though we can disable the idle_session_timeout when using postgres_fdw,
> there still has locally cached connection cache entries when the remote sessions
> terminated by accident. AFAIK, you have provided a patch to solve this
> problem, and it is in current CF [1].
>
> [1] - https://commitfest.postgresql.org/29/2651/
>
Yes, that solution can retry the cached connections at only the beginning of the remote txn and not at the middle of the remote txn and that makes sense as we can not retry connecting to a different remote backend in the middle of a remote txn.
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Hello. At Mon, 17 Aug 2020 19:28:10 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in > On Fri, Aug 14, 2020 at 1:32 PM Li Japin <japinli@hotmail.com> wrote: > > > > On Aug 14, 2020, at 2:15 PM, Bharath Rupireddy < > bharath.rupireddyforpostgres@gmail.com> wrote: > > > > I think, since the idle_session_timeout is by default disabled, we > > have no problem. My thought is what if a user enables the > > feature(knowingly or unknowingly) on the remote backend? If the user > > knows about the above scenario, that may be fine. On the other hand, > > either we can always the feature on the remote backend(at the > > beginning of the remote txn, like we set for some other configuration > > settings see - configure_remote_session() in connection.c) or how > > about mentioning the above scenario in this feature documentation? > > > > Though we can disable the idle_session_timeout when using postgres_fdw, > > there still has locally cached connection cache entries when the remote > sessions > > terminated by accident. AFAIK, you have provided a patch to solve this > > problem, and it is in current CF [1]. > > > > [1] - https://commitfest.postgresql.org/29/2651/ > > > > Yes, that solution can retry the cached connections at only the beginning > of the remote txn and not at the middle of the remote txn and that makes > sense as we can not retry connecting to a different remote backend in the > middle of a remote txn. > > +1 for disabling the idle_session_timeout feature in case of postgres_fdw. > This can avoid the remote backends to timeout during postgres_fdw usages. The same already happens for idle_in_transaction_session_timeout and we can use "ALTER ROLE/DATABASE SET" to dislable or loosen them, it's a bit cumbersome, though. I don't think we should (at least implicitly) disable those timeouts ad-hockerly for postgres_fdw. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Aug 18, 2020, at 9:19 AM, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:The same already happens for idle_in_transaction_session_timeout and
we can use "ALTER ROLE/DATABASE SET" to dislable or loosen them, it's
a bit cumbersome, though. I don't think we should (at least
implicitly) disable those timeouts ad-hockerly for postgres_fdw.
On Tue, Aug 18, 2020 at 2:13 PM Li Japin <japinli@hotmail.com> wrote: > On Aug 18, 2020, at 9:19 AM, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > The same already happens for idle_in_transaction_session_timeout and > we can use "ALTER ROLE/DATABASE SET" to dislable or loosen them, it's > a bit cumbersome, though. I don't think we should (at least > implicitly) disable those timeouts ad-hockerly for postgres_fdw. > > +1. This seems like a reasonable feature to me. The delivery of the error message explaining what happened is probably not reliable, so to some clients and on some operating systems this will be indistinguishable from a dropped network connection or other error, but that's OK and we already have that problem with the existing timeout-based disconnection feature. The main problem I have with it is the high frequency setitimer() calls. If you enable both statement_timeout and idle_session_timeout, then we get up to huge number of system calls, like the following strace -c output for a few seconds of one backend under pgbench -S workload shows: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 39.45 0.118685 0 250523 setitimer 29.98 0.090200 0 125275 sendto 24.30 0.073107 0 126235 973 recvfrom 6.01 0.018068 0 20950 pread64 0.26 0.000779 0 973 epoll_wait ------ ----------- ----------- --------- --------- ---------------- 100.00 0.300839 523956 973 total There's a small but measurable performance drop from this, as also discussed in another thread about another kind of timeout[1]. Maybe we should try to fix that with something like the attached? [1] https://www.postgresql.org/message-id/flat/77def86b27e41f0efcba411460e929ae%40postgrespro.ru
Attachment
On Aug 31, 2020, at 8:51 AM, Thomas Munro <thomas.munro@gmail.com> wrote:The main problem I have with it is the high frequency setitimer()
calls. If you enable both statement_timeout and idle_session_timeout,
then we get up to huge number of system calls, like the following
strace -c output for a few seconds of one backend under pgbench -S
workload shows:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
39.45 0.118685 0 250523 setitimer
29.98 0.090200 0 125275 sendto
24.30 0.073107 0 126235 973 recvfrom
6.01 0.018068 0 20950 pread64
0.26 0.000779 0 973 epoll_wait
------ ----------- ----------- --------- --------- ----------------
100.00 0.300839 523956 973 total
On Mon, Aug 31, 2020 at 2:40 PM Li Japin <japinli@hotmail.com> wrote: > Could you give the more details about the test instructions? Hi Japin, Sure. Because I wasn't trying to get reliable TPS number or anything, I just used a simple short read-only test with one connection, like this: pgbench -i -s10 postgres pgbench -T60 -Mprepared -S postgres Then I looked for the active backend and ran strace -c -p XXX for a few seconds and hit ^C to get the counters. I doubt the times are very accurate, but the number of calls is informative. If you do that on a server running with -c statement_timeout=10s, you see one setitimer() per transaction. If you also use -c idle_session_timeout=10s at the same time, you see two.
At Mon, 31 Aug 2020 12:51:20 +1200, Thomas Munro <thomas.munro@gmail.com> wrote in > On Tue, Aug 18, 2020 at 2:13 PM Li Japin <japinli@hotmail.com> wrote: > > On Aug 18, 2020, at 9:19 AM, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > The same already happens for idle_in_transaction_session_timeout and > > we can use "ALTER ROLE/DATABASE SET" to dislable or loosen them, it's > > a bit cumbersome, though. I don't think we should (at least > > implicitly) disable those timeouts ad-hockerly for postgres_fdw. > > > > +1. > > This seems like a reasonable feature to me. > > The delivery of the error message explaining what happened is probably > not reliable, so to some clients and on some operating systems this > will be indistinguishable from a dropped network connection or other > error, but that's OK and we already have that problem with the > existing timeout-based disconnection feature. > > The main problem I have with it is the high frequency setitimer() > calls. If you enable both statement_timeout and idle_session_timeout, > then we get up to huge number of system calls, like the following > strace -c output for a few seconds of one backend under pgbench -S > workload shows: > > % time seconds usecs/call calls errors syscall > ------ ----------- ----------- --------- --------- ---------------- > 39.45 0.118685 0 250523 setitimer > 29.98 0.090200 0 125275 sendto > 24.30 0.073107 0 126235 973 recvfrom > 6.01 0.018068 0 20950 pread64 > 0.26 0.000779 0 973 epoll_wait > ------ ----------- ----------- --------- --------- ---------------- > 100.00 0.300839 523956 973 total > > There's a small but measurable performance drop from this, as also > discussed in another thread about another kind of timeout[1]. Maybe > we should try to fix that with something like the attached? > > [1] https://www.postgresql.org/message-id/flat/77def86b27e41f0efcba411460e929ae%40postgrespro.ru I think it's worth doing. Maybe we can get rid of doing anything other than removing an entry in the case where we disable a non-nearest timeout. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
> On Aug 31, 2020, at 11:43 AM, Thomas Munro <thomas.munro@gmail.com> wrote: > > On Mon, Aug 31, 2020 at 2:40 PM Li Japin <japinli@hotmail.com> wrote: >> Could you give the more details about the test instructions? > > Hi Japin, > > Sure. Because I wasn't trying to get reliable TPS number or anything, > I just used a simple short read-only test with one connection, like > this: > > pgbench -i -s10 postgres > pgbench -T60 -Mprepared -S postgres > > Then I looked for the active backend and ran strace -c -p XXX for a > few seconds and hit ^C to get the counters. I doubt the times are > very accurate, but the number of calls is informative. > > If you do that on a server running with -c statement_timeout=10s, you > see one setitimer() per transaction. If you also use -c > idle_session_timeout=10s at the same time, you see two. Hi, Thomas, Thanks for your point out this problem, here is the comparison. Without Optimize settimer usage: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 41.22 1.444851 1 1317033 setitimer 28.41 0.995936 2 658622 sendto 24.63 0.863316 1 659116 599 recvfrom 5.71 0.200275 2 111055 pread64 0.03 0.001152 2 599 epoll_wait 0.00 0.000000 0 1 epoll_ctl ------ ----------- ----------- --------- --------- ---------------- 100.00 3.505530 2746426 599 total With Optimize settimer usage: % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 49.89 1.464332 1 1091429 sendto 40.83 1.198389 1 1091539 219 recvfrom 9.26 0.271890 1 183321 pread64 0.02 0.000482 2 214 epoll_wait 0.00 0.000013 3 5 setitimer 0.00 0.000010 2 5 rt_sigreturn 0.00 0.000000 0 1 epoll_ctl ------ ----------- ----------- --------- --------- ---------------- 100.00 2.935116 2366514 219 total Here’s a modified version of Thomas’s patch.
Attachment
Dear Li, I read your patch, and I think the documentation is too simple to avoid all problems. (I think if some connection pooling is used, the same problem will occur.) Could you add some explanations in the doc file? I made an example: ``` Note that this values should be set to zero if you use postgres_fdw or some Connection-pooling software, because connections might be closed unexpectedly. ``` I will send other comments if I find something. Hayato Kuroda FUJITSU LIMITED
On Nov 13, 2020, at 6:27 PM, kuroda.hayato@fujitsu.com wrote:
I read your patch, and I think the documentation is too simple to avoid all problems.
(I think if some connection pooling is used, the same problem will occur.)
Could you add some explanations in the doc file? I made an example:
```
Note that this values should be set to zero if you use postgres_fdw or some
Connection-pooling software, because connections might be closed unexpectedly.
```
Attachment
Dear Li, > Thanks for your advice! Attached v4. I confirmed it. OK. > @@ -30,6 +30,7 @@ typedef enum TimeoutId > STANDBY_DEADLOCK_TIMEOUT, > STANDBY_TIMEOUT, > STANDBY_LOCK_TIMEOUT, > + IDLE_SESSION_TIMEOUT, > IDLE_IN_TRANSACTION_SESSION_TIMEOUT, > /* First user-definable timeout reason */ > USER_TIMEOUT, I'm not familiar with timeout, but I can see that the priority of idle-session is set lower than transaction-timeout. Could you explain the reason? In my image this timeout locates at the lowest layer, so it might have the lowest priority. Other codes are still checked :-(. Hayato Kuroda FUJITSU LIMITED
On Nov 16, 2020, at 1:22 PM, kuroda.hayato@fujitsu.com wrote:
@@ -30,6 +30,7 @@ typedef enum TimeoutId
STANDBY_DEADLOCK_TIMEOUT,
STANDBY_TIMEOUT,
STANDBY_LOCK_TIMEOUT,
+ IDLE_SESSION_TIMEOUT,
IDLE_IN_TRANSACTION_SESSION_TIMEOUT,
/* First user-definable timeout reason */
USER_TIMEOUT,
I'm not familiar with timeout, but I can see that the priority of idle-session is set lower than transaction-timeout.
Could you explain the reason? In my image this timeout locates at the lowest layer, so it might have the lowest
priority.
Attachment
Thanks for your review! Attached.
On Nov 17, 2020, at 7:59 AM, David G. Johnston <david.g.johnston@gmail.com> wrote:On Mon, Nov 16, 2020 at 5:41 AM Li Japin <japinli@hotmail.com> wrote:Thanks for your review! Attached.Reading the doc changes:I'd rather not name postgres_fdw explicitly, or at least not solely, as a reason for setting this to zero. Additionally, using postgres_fdw within the server doesn't cause issues, its using postgres_fdw and the remote server having this setting set to zero that causes a problem.<note>Consider setting this for specific users instead of as a server default. Client connections managed by connection poolers, or initiated indirectly like those by a remote postgres_fdw using server, should probably be excluded from this timeout.Text within <para> should be indented one space (you missed both under listitem).
I'd suggest a comment that aside from a bit of resource consumption idle sessions do not interfere with the long-running stability of the server, unlike idle-in-transaction sessions which are controlled by the other configuration setting.
On Monday, November 16, 2020, Li Japin <japinli@hotmail.com> wrote:
<note>Consider setting this for specific users instead of as a server default. Client connections managed by connection poolers, or initiated indirectly like those by a remote postgres_fdw using server, should probably be excluded from this timeout.<note><para>- This parameter should be set to zero if you use postgres_fdw or some- connection-pooling software, because connections might be closed unexpectedly.+ This parameter should be set to zero if you use some connection-pooling software, or+ PostgreSQL servers used by postgres_fdw, because connections might be closed unexpectedly.</para></note>
Could you please explain how the idle-in-transaction interfere the long-running stability?
On Nov 17, 2020, at 10:53 AM, David G. Johnston <david.g.johnston@gmail.com> wrote:On Monday, November 16, 2020, Li Japin <japinli@hotmail.com> wrote:<note>Consider setting this for specific users instead of as a server default. Client connections managed by connection poolers, or initiated indirectly like those by a remote postgres_fdw using server, should probably be excluded from this timeout.<note><para>- This parameter should be set to zero if you use postgres_fdw or some- connection-pooling software, because connections might be closed unexpectedly.+ This parameter should be set to zero if you use some connection-pooling software, or+ PostgreSQL servers used by postgres_fdw, because connections might be closed unexpectedly.</para></note>Prefer mine, “or pg servers used by postgres_fdw”, doesn’t flow.Could you please explain how the idle-in-transaction interfere the long-running stability?From the docs (next section):This allows any locks held by that session to be released and the connection slot to be reused; it also allows tuples visible only to this transaction to be vacuumed. See Section 24.1 for more details about this.
Attachment
Dear Li, David,
> Additionally, using postgres_fdw within the server doesn't cause issues,
> its using postgres_fdw and the remote server having this setting set to zero that causes a problem.
I didn't know the fact that postgres_fdw can use within the server... Thanks.
I read optimize-setitimer patch, and looks basically good. I put what I understanding,
so please confirm it whether your implementation is correct.
(Maybe I missed some simultaneities, so please review anyone...)
[besic consept]
sigalrm_due_at means the time that interval timer will ring, and sigalrm_delivered means who calls schedule_alarm().
If fin_time of active_timeouts[0] is larger than or equal to sigalrm_due_at,
stop calling setitimer because handle_sig_alarm() will be call sooner.
[when call setitimer]
In the attached patch, setitimer() will be only called the following scenarios:
* when handle_sig_alarm() is called due to the pqsignal
* when a timeout is registered and its fin_time is later than active_timeous[0]
* when disable a timeout
* when handle_sig_alarm() is interrupted and rescheduled(?)
According to comments, handle_sig_alarm() may be interrupted because of the ereport.
I think if handle_sig_alarm() is interrupted before subsutituting sigalrm_due_at to true,
interval timer will be never set. Is it correct, or is my assumption wrong?
Lastly, I found that setitimer is obsolete and should change to another one. According to my man page:
```
POSIX.1-2001, SVr4, 4.4BSD (this call first appeared in 4.2BSD).
POSIX.1-2008 marks getitimer() and setitimer() obsolete,
recommending the use of the POSIX timers API (timer_gettime(2), timer_settime(2), etc.) instead.
```
Do you have an opinion for this? I think it should be changed
if all platform can support timer_settime system call, but this fix affects all timeouts,
so more considerations might be needed.
Best Regards,
Hayato Kuroda
FUJITSU LIMITED
On Nov 17, 2020, at 2:07 PM, kuroda.hayato@fujitsu.com wrote:Dear Li, David,> Additionally, using postgres_fdw within the server doesn't cause issues,> its using postgres_fdw and the remote server having this setting set to zero that causes a problem.I didn't know the fact that postgres_fdw can use within the server... Thanks.I read optimize-setitimer patch, and looks basically good. I put what I understanding,so please confirm it whether your implementation is correct.(Maybe I missed some simultaneities, so please review anyone...)[besic consept]sigalrm_due_at means the time that interval timer will ring, and sigalrm_delivered means who calls schedule_alarm().If fin_time of active_timeouts[0] is larger than or equal to sigalrm_due_at,stop calling setitimer because handle_sig_alarm() will be call sooner.
[when call setitimer]In the attached patch, setitimer() will be only called the following scenarios:* when handle_sig_alarm() is called due to the pqsignal* when a timeout is registered and its fin_time is later than active_timeous[0]* when disable a timeout* when handle_sig_alarm() is interrupted and rescheduled(?)According to comments, handle_sig_alarm() may be interrupted because of the ereport.I think if handle_sig_alarm() is interrupted before subsutituting sigalrm_due_at to true,interval timer will be never set. Is it correct, or is my assumption wrong?
Lastly, I found that setitimer is obsolete and should change to another one. According to my man page:```POSIX.1-2001, SVr4, 4.4BSD (this call first appeared in 4.2BSD).POSIX.1-2008 marks getitimer() and setitimer() obsolete,recommending the use of the POSIX timers API (timer_gettime(2), timer_settime(2), etc.) instead.```Do you have an opinion for this? I think it should be changedif all platform can support timer_settime system call, but this fix affects all timeouts,so more considerations might be needed.
Dear Li, > I’m not familiar with the system interrupt, however, > the sigalrm_due_at is subsutitue between HOLD_INTERRUPTS() > and RESUM_INTERRUPTS(), so I think it cannot be interrupted. > The following comments comes from miscadmin.h. Right, but how about before HOLD_INTERRUPTS()? If so, only calling handle_sig_alarm() is occurred, and Setitimer will not be set, I think. > Not sure! I find that Win32 do not support setitimer(), > PostgreSQL emulate setitimer() by creating a persistent thread to handle > the timer setting and notification upon timeout. Yes, set/getitimer() is the systemcall, and implemented only in the unix-like system. But I rethink that such a fix is categorized in the refactoring and we should separate topic. Hence we can ignore here. Best Regards, Hayato Kuroda FUJITSU LIMITED From: Li Japin <japinli@hotmail.com> Sent: Tuesday, November 17, 2020 6:02 PM To: Kuroda, Hayato/黒田 隼人 <kuroda.hayato@fujitsu.com> Cc: David G. Johnston <david.g.johnston@gmail.com>; Kyotaro Horiguchi <horikyota.ntt@gmail.com>; Thomas Munro <thomas.munro@gmail.com>;bharath.rupireddyforpostgres@gmail.com; pgsql-hackers@lists.postgresql.org Subject: Re: Terminate the idle sessions On Nov 17, 2020, at 2:07 PM, mailto:kuroda.hayato@fujitsu.com wrote: Dear Li, David, > Additionally, using postgres_fdw within the server doesn't cause issues, > its using postgres_fdw and the remote server having this setting set to zero that causes a problem. I didn't know the fact that postgres_fdw can use within the server... Thanks. I read optimize-setitimer patch, and looks basically good. I put what I understanding, so please confirm it whether your implementation is correct. (Maybe I missed some simultaneities, so please review anyone...) [besic consept] sigalrm_due_at means the time that interval timer will ring, and sigalrm_delivered means who calls schedule_alarm(). If fin_time of active_timeouts[0] is larger than or equal to sigalrm_due_at, stop calling setitimer because handle_sig_alarm() will be call sooner. Agree. The sigalrm_delivered means a timer has been handled by handle_sig_alarm(), so we should call setitimer() in next schedule_alarm(). [when call setitimer] In the attached patch, setitimer() will be only called the following scenarios: * when handle_sig_alarm() is called due to the pqsignal * when a timeout is registered and its fin_time is later than active_timeous[0] * when disable a timeout * when handle_sig_alarm() is interrupted and rescheduled(?) According to comments, handle_sig_alarm() may be interrupted because of the ereport. I think if handle_sig_alarm() is interrupted before subsutituting sigalrm_due_at to true, interval timer will be never set. Is it correct, or is my assumption wrong? I’m not familiar with the system interrupt, however, the sigalrm_due_at is subsutitue between HOLD_INTERRUPTS() and RESUM_INTERRUPTS(), so I think it cannot be interrupted. The following comments comes from miscadmin.h. > The HOLD_INTERRUPTS() and RESUME_INTERRUPTS() macros > allow code to ensure that no cancel or die interrupt will be accepted, > even if CHECK_FOR_INTERRUPTS() gets called in a subroutine. The interrupt > will be held off until CHECK_FOR_INTERRUPTS() is done outside any > HOLD_INTERRUPTS() ... RESUME_INTERRUPTS() section. Lastly, I found that setitimer is obsolete and should change to another one. According to my man page: ``` POSIX.1-2001, SVr4, 4.4BSD (this call first appeared in 4.2BSD). POSIX.1-2008 marks getitimer() and setitimer() obsolete, recommending the use of the POSIX timers API (timer_gettime(2), timer_settime(2), etc.) instead. ``` Do you have an opinion for this? I think it should be changed if all platform can support timer_settime system call, but this fix affects all timeouts, so more considerations might be needed. Not sure! I find that Win32 do not support setitimer(), PostgreSQL emulate setitimer() by creating a persistent thread tohandle the timer setting and notification upon timeout. So if we want to replace it, I think we should open a new thread to achieve this. -- Best regards Japin Li
On Nov 18, 2020, at 10:40 AM, kuroda.hayato@fujitsu.com wrote:
I’m not familiar with the system interrupt, however,
the sigalrm_due_at is subsutitue between HOLD_INTERRUPTS()
and RESUM_INTERRUPTS(), so I think it cannot be interrupted.
The following comments comes from miscadmin.h.
Right, but how about before HOLD_INTERRUPTS()?
If so, only calling handle_sig_alarm() is occurred, and
Setitimer will not be set, I think.
Dear Li, > Yeah, it might be occurred. Any suggestions to fix it? Oops.. I forgot putting my suggestion. Sorry. How about substituting sigalrm_delivered to true in the reschedule_timeouts()? Maybe this processing looks strange, so some comments should be put too. Here is an example: ```diff @@ -423,7 +423,14 @@ reschedule_timeouts(void) /* Reschedule the interrupt, if any timeouts remain active. */ if (num_active_timeouts > 0) + { + /* + * sigalrm_delivered is set to true, + * because any intrreputions might be occured. + */ + sigalrm_delivered = true; schedule_alarm(GetCurrentTimestamp()); + } } ``` Best Regards, Hayato Kuroda FUJITSU LIMITED
On Nov 18, 2020, at 2:22 PM, kuroda.hayato@fujitsu.com wrote:Oops.. I forgot putting my suggestion. Sorry.
How about substituting sigalrm_delivered to true in the reschedule_timeouts()?
Maybe this processing looks strange, so some comments should be put too.
Here is an example:
```diff
@@ -423,7 +423,14 @@ reschedule_timeouts(void)
/* Reschedule the interrupt, if any timeouts remain active. */
if (num_active_timeouts > 0)
+ {
+ /*
+ * sigalrm_delivered is set to true,
+ * because any intrreputions might be occured.
+ */
+ sigalrm_delivered = true;
schedule_alarm(GetCurrentTimestamp());
+ }
}
```
Attachment
Dear Li, > Thanks for your suggestion. Attached! I prefer your comments:-). I think this patch is mostly good. I looked whole the codes again and I found the following comment in the PostgresMain(): ```c /* * (5) turn off the idle-in-transaction timeout */ ``` Please mention about idle-session timeout and check another comment. Best Regards, Hayato Kuroda
Thanks! Add the comment for idle-session timeout.Dear Li,Thanks for your suggestion. Attached!I prefer your comments:-). I think this patch is mostly good. I looked whole the codes again and I found the following comment in the PostgresMain(): ```c /* * (5) turn off the idle-in-transaction timeout */ ``` Please mention about idle-session timeout and check another comment.
Attachment
Dear Li, > Thanks! Add the comment for idle-session timeout. I confirmed it. OK. I don't have any comments anymore. If no one has, I will change the status few days later. Other comments or suggestions to him? Best Regards, Hayato Kuroda FUJITSU LIMITED
No one have any comments, patch tester says OK, and I think this works well. I changed status to "Ready for Committer." Hayato Kuroda FUJITSU LIMITED -----Original Message----- From: kuroda.hayato@fujitsu.com <kuroda.hayato@fujitsu.com> Sent: Friday, November 20, 2020 11:05 AM To: 'japin' <japinli@hotmail.com> Cc: David G. Johnston <david.g.johnston@gmail.com>; Kyotaro Horiguchi <horikyota.ntt@gmail.com>; Thomas Munro <thomas.munro@gmail.com>;bharath.rupireddyforpostgres@gmail.com; pgsql-hackers@lists.postgresql.org Subject: RE: Terminate the idle sessions Dear Li, > Thanks! Add the comment for idle-session timeout. I confirmed it. OK. I don't have any comments anymore. If no one has, I will change the status few days later. Other comments or suggestions to him? Best Regards, Hayato Kuroda FUJITSU LIMITED
No one have any comments, patch tester says OK, and I think this works well.
I changed status to "Ready for Committer."
Hi, Kuroda Thank for your review. > On Nov 24, 2020, at 8:01 AM, kuroda.hayato@fujitsu.com wrote: > > No one have any comments, patch tester says OK, and I think this works well. > I changed status to "Ready for Committer." > > Hayato Kuroda > FUJITSU LIMITED > > -----Original Message----- > From: kuroda.hayato@fujitsu.com <kuroda.hayato@fujitsu.com> > Sent: Friday, November 20, 2020 11:05 AM > To: 'japin' <japinli@hotmail.com> > Cc: David G. Johnston <david.g.johnston@gmail.com>; Kyotaro Horiguchi <horikyota.ntt@gmail.com>; Thomas Munro <thomas.munro@gmail.com>;bharath.rupireddyforpostgres@gmail.com; pgsql-hackers@lists.postgresql.org > Subject: RE: Terminate the idle sessions > > Dear Li, > >> Thanks! Add the comment for idle-session timeout. > > I confirmed it. OK. > > > I don't have any comments anymore. If no one has, > I will change the status few days later. > Other comments or suggestions to him? > > Best Regards, > Hayato Kuroda > FUJITSU LIMITED > -- Best regards Japin Li
Sorry for ignoring this suggestion.On Nov 24, 2020, at 11:39 AM, David G. Johnston <david.g.johnston@gmail.com> wrote:On Mon, Nov 23, 2020 at 5:02 PM kuroda.hayato@fujitsu.com <kuroda.hayato@fujitsu.com> wrote:No one have any comments, patch tester says OK, and I think this works well.
I changed status to "Ready for Committer."Some proof-reading:v8-0001Documentation:My suggestion wasn't taken for the first note paragraph (review/author disagreement) and the current has the following issues:
"if you use some connection-pooling" software doesn't need the word "some"Don't substitute "pg" for the name of the product, PostgreSQL.The word "used" is a more stylistic dislike, but "connected to using postgres_fdw" would be a better choice IMO.Code (minor, but if you are in there anyway):
(5) turn off ... timeout (there are now two, timeouts should be plural)
How about use “foreign-data wrapper” replace “postgres_fdw”?
- This parameter should be set to zero if you use some connection-pooling software,- or pg servers used by postgres_fdw, because connections might be closed unexpectedly.+ This parameter should be set to zero if you use connection-pooling software,+ or <productname>PostgreSQL</productname> servers connected to using foreign-data+ wrapper, because connections might be closed unexpectedly.</para>
On Nov 24, 2020, at 11:20 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:On Mon, Nov 23, 2020 at 11:22 PM Li Japin <japinli@hotmail.com> wrote:How about use “foreign-data wrapper” replace “postgres_fdw”?I don't see much value in avoiding mentioning that specific term - my proposal turned it into an example instead of being exclusive.- This parameter should be set to zero if you use some connection-pooling software,- or pg servers used by postgres_fdw, because connections might be closed unexpectedly.+ This parameter should be set to zero if you use connection-pooling software,+ or <productname>PostgreSQL</productname> servers connected to using foreign-data+ wrapper, because connections might be closed unexpectedly.</para>Maybe:+ or your PostgreSQL server receives connection from postgres_fdw or similar middleware.+ Such software is expected to self-manage its connections.
Attachment
On Nov 24, 2020, at 11:20 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:On Mon, Nov 23, 2020 at 11:22 PM Li Japin <japinli@hotmail.com> wrote:How about use “foreign-data wrapper” replace “postgres_fdw”?I don't see much value in avoiding mentioning that specific term - my proposal turned it into an example instead of being exclusive.- This parameter should be set to zero if you use some connection-pooling software,- or pg servers used by postgres_fdw, because connections might be closed unexpectedly.+ This parameter should be set to zero if you use connection-pooling software,+ or <productname>PostgreSQL</productname> servers connected to using foreign-data+ wrapper, because connections might be closed unexpectedly.</para>Maybe:+ or your PostgreSQL server receives connection from postgres_fdw or similar middleware.+ Such software is expected to self-manage its connections.Thank you for your suggestion and patient! Fixed.```+ <para>+ This parameter should be set to zero if you use connection-pooling software,+ or <productname>PostgreSQL</productname> servers connected to using postgres_fdw+ or similar middleware (such software is expected to self-manage its connections),+ because connections might be closed unexpectedly.+ </para>```--Best regardsJapin Li
Status update for a commitfest entry.
As far as I see, all recommendations from reviewers were addressed in the last version of the patch.
It passes CFbot successfully, so I move it to Ready For Committer.
-- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Li Japin <japinli@hotmail.com> writes: > [ v9-0001-Allow-terminating-the-idle-sessions.patch ] I've reviewed and pushed this. A few notes: * Thomas' patch for improving timeout.c seems like a great idea, but it did indeed have a race condition, and I felt the comments could do with more work. * I'm not entirely comfortable with the name "idle_session_timeout", because it sounds like it applies to all idle states, but actually it only applies when we're not in a transaction. I left the name alone and tried to clarify the docs, but I wonder whether a different name would be better. (Alternatively, we could say that it does apply in all cases, making the effective timeout when in a transaction the smaller of the two GUCs. But that'd be more complex to implement and less flexible, so I'm not in favor of that answer.) * The SQLSTATE you chose for the new error condition seems pretty random. I do not see it in the SQL standard, so using a code that's within the spec-reserved code range is certainly wrong. I went with 08P02 (the "P" makes it outside the reserved range), but I'm not really happy either with using class 08 ("Connection Exception", which seems to be mainly meant for connection-request failures), or the fact that ERRCODE_IDLE_IN_TRANSACTION_SESSION_TIMEOUT is practically identical but it's not even in the same major class. Now 25 ("Invalid Transaction State") is certainly not right for this new error, but I think what that's telling us is that 25 was a misguided choice for the other error. In a green field I think I'd put both of them in class 53 ("Insufficient Resources") or maybe class 57 ("Operator Intervention"). Can we get away with renumbering the older error at this point? In any case I'd be inclined to move the new error to 53 or 57 --- anybody have a preference which? * I think the original intent in timeout.h was to have 10 slots available for run-time-defined timeout reasons. This is the third predefined one we've added since the header was created, so it's starting to look a little tight. I adjusted the code to hopefully preserve 10 free slots going forward. * I noted the discussion about dropping setitimer in place of some newer kernel API. I'm not sure that that is worth the trouble in isolation, but it might be worth doing if we can switch the whole module over to relying on CLOCK_MONOTONIC, so as to make its behavior less flaky if the sysadmin steps the system clock. Portability might be a big issue here though, plus we'd have to figure out how we want to define enable_timeout_at(), which is unlikely to want to use CLOCK_MONOTONIC values. In any case, that's surely material for a whole new thread. regards, tom lane
On Thu, Jan 7, 2021 at 12:55 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > * Thomas' patch for improving timeout.c seems like a great idea, but > it did indeed have a race condition, and I felt the comments could do > with more work. Oops, and thanks! Very happy to see this one in the tree. > * I'm not entirely comfortable with the name "idle_session_timeout", > because it sounds like it applies to all idle states, but actually > it only applies when we're not in a transaction. I left the name > alone and tried to clarify the docs, but I wonder whether a different > name would be better. (Alternatively, we could say that it does > apply in all cases, making the effective timeout when in a transaction > the smaller of the two GUCs. But that'd be more complex to implement > and less flexible, so I'm not in favor of that answer.) Hmm, it is a bit confusing, but having them separate is indeed more flexible. > * The SQLSTATE you chose for the new error condition seems pretty > random. I do not see it in the SQL standard, so using a code that's > within the spec-reserved code range is certainly wrong. I went with > 08P02 (the "P" makes it outside the reserved range), but I'm not > really happy either with using class 08 ("Connection Exception", > which seems to be mainly meant for connection-request failures), > or the fact that ERRCODE_IDLE_IN_TRANSACTION_SESSION_TIMEOUT is > practically identical but it's not even in the same major class. > Now 25 ("Invalid Transaction State") is certainly not right for > this new error, but I think what that's telling us is that 25 was a > misguided choice for the other error. In a green field I think I'd > put both of them in class 53 ("Insufficient Resources") or maybe class > 57 ("Operator Intervention"). Can we get away with renumbering the > older error at this point? In any case I'd be inclined to move the > new error to 53 or 57 --- anybody have a preference which? I don't have a strong view here, but 08 with a P doesn't seem crazy to me. Unlike eg 57014 (statement_timeout), 57017 (deadlock_timeout), 55P03 (lock_timeout... huh, I just noticed that DB2 uses 57017 for that, distinguished from deadlock by another error field), after these timeouts you don't have a session/connection anymore. The two are a bit different though: in the older one, you were in a transaction, and it seems to me quite newsworthy that your transaction has been aborted, information that is not conveyed quite so clearly with a connection-related error class. > * I noted the discussion about dropping setitimer in place of some > newer kernel API. I'm not sure that that is worth the trouble in > isolation, but it might be worth doing if we can switch the whole > module over to relying on CLOCK_MONOTONIC, so as to make its behavior > less flaky if the sysadmin steps the system clock. Portability > might be a big issue here though, plus we'd have to figure out how > we want to define enable_timeout_at(), which is unlikely to want to > use CLOCK_MONOTONIC values. In any case, that's surely material > for a whole new thread. +1
On Thu, Jan 7, 2021 at 3:03 PM Thomas Munro <thomas.munro@gmail.com> wrote: > On Thu, Jan 7, 2021 at 12:55 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > * The SQLSTATE you chose for the new error condition seems pretty > > random. I do not see it in the SQL standard, so using a code that's > > within the spec-reserved code range is certainly wrong. I went with > > 08P02 (the "P" makes it outside the reserved range), but I'm not > > really happy either with using class 08 ("Connection Exception", > > which seems to be mainly meant for connection-request failures), > > or the fact that ERRCODE_IDLE_IN_TRANSACTION_SESSION_TIMEOUT is > > practically identical but it's not even in the same major class. > > Now 25 ("Invalid Transaction State") is certainly not right for > > this new error, but I think what that's telling us is that 25 was a > > misguided choice for the other error. In a green field I think I'd > > put both of them in class 53 ("Insufficient Resources") or maybe class > > 57 ("Operator Intervention"). Can we get away with renumbering the > > older error at this point? In any case I'd be inclined to move the > > new error to 53 or 57 --- anybody have a preference which? > > I don't have a strong view here, but 08 with a P doesn't seem crazy to > me. Unlike eg 57014 (statement_timeout), 57017 (deadlock_timeout), > 55P03 (lock_timeout... huh, I just noticed that DB2 uses 57017 for > that, distinguished from deadlock by another error field), after these > timeouts you don't have a session/connection anymore. The two are a > bit different though: in the older one, you were in a transaction, and > it seems to me quite newsworthy that your transaction has been > aborted, information that is not conveyed quite so clearly with a > connection-related error class. Hmm, on further reflection it's still more similar than different and I'd probably have voted for 08xxx for the older one too if I'd been paying attention. One of the strange things about these errors is that they're asynchronous/unsolicited, but they appear to the client to be the response to their next request (if it doesn't eat ECONNRESET instead). That makes me think we should try to make it clear that it's a sort of lower level thing, and not really a response to your next request at all. Perhaps 08 does that. But it's not obvious... I see a couple of DB2 extension SQLSTATEs saying you have no connection: 57015 = "Connection to the local Db2 not established" and 51006 = "A valid connection has not been established", and then there's standard 08003 = "connection does not exist" which we're currently using to shout into the void when the *client* goes away (and also for dblink failure to find named connection, a pretty unrelated meaning).
Thomas Munro <thomas.munro@gmail.com> writes: > One of the strange things about these errors is that they're > asynchronous/unsolicited, but they appear to the client to be the > response to their next request (if it doesn't eat ECONNRESET instead). Right, which is what makes class 57 (operator intervention) seem attractive to me. From the client's standpoint these look little different from ERRCODE_ADMIN_SHUTDOWN or ERRCODE_CRASH_SHUTDOWN, which are in that category. regards, tom lane
On Jan 7, 2021, at 10:03 AM, Thomas Munro <thomas.munro@gmail.com> wrote:
* I'm not entirely comfortable with the name "idle_session_timeout",
because it sounds like it applies to all idle states, but actually
it only applies when we're not in a transaction. I left the name
alone and tried to clarify the docs, but I wonder whether a different
name would be better. (Alternatively, we could say that it does
apply in all cases, making the effective timeout when in a transaction
the smaller of the two GUCs. But that'd be more complex to implement
and less flexible, so I'm not in favor of that answer.)
Hmm, it is a bit confusing, but having them separate is indeed more flexible.
* The SQLSTATE you chose for the new error condition seems pretty
random. I do not see it in the SQL standard, so using a code that's
within the spec-reserved code range is certainly wrong. I went with
08P02 (the "P" makes it outside the reserved range), but I'm not
really happy either with using class 08 ("Connection Exception",
which seems to be mainly meant for connection-request failures),
or the fact that ERRCODE_IDLE_IN_TRANSACTION_SESSION_TIMEOUT is
practically identical but it's not even in the same major class.
Now 25 ("Invalid Transaction State") is certainly not right for
this new error, but I think what that's telling us is that 25 was a
misguided choice for the other error. In a green field I think I'd
put both of them in class 53 ("Insufficient Resources") or maybe class
57 ("Operator Intervention"). Can we get away with renumbering the
older error at this point? In any case I'd be inclined to move the
new error to 53 or 57 --- anybody have a preference which?
I don't have a strong view here, but 08 with a P doesn't seem crazy to
me. Unlike eg 57014 (statement_timeout), 57017 (deadlock_timeout),
55P03 (lock_timeout... huh, I just noticed that DB2 uses 57017 for
that, distinguished from deadlock by another error field), after these
timeouts you don't have a session/connection anymore. The two are a
bit different though: in the older one, you were in a transaction, and
it seems to me quite newsworthy that your transaction has been
aborted, information that is not conveyed quite so clearly with a
connection-related error class.
On Thu, Jan 7, 2021 at 4:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Thomas Munro <thomas.munro@gmail.com> writes: > > One of the strange things about these errors is that they're > > asynchronous/unsolicited, but they appear to the client to be the > > response to their next request (if it doesn't eat ECONNRESET instead). > > Right, which is what makes class 57 (operator intervention) seem > attractive to me. From the client's standpoint these look little > different from ERRCODE_ADMIN_SHUTDOWN or ERRCODE_CRASH_SHUTDOWN, > which are in that category. Yeah, that's a good argument.
Thomas Munro <thomas.munro@gmail.com> writes: > On Thu, Jan 7, 2021 at 4:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Thomas Munro <thomas.munro@gmail.com> writes: >>> One of the strange things about these errors is that they're >>> asynchronous/unsolicited, but they appear to the client to be the >>> response to their next request (if it doesn't eat ECONNRESET instead). >> Right, which is what makes class 57 (operator intervention) seem >> attractive to me. From the client's standpoint these look little >> different from ERRCODE_ADMIN_SHUTDOWN or ERRCODE_CRASH_SHUTDOWN, >> which are in that category. > Yeah, that's a good argument. Given the lack of commentary on this thread, I'm guessing that people aren't so excited about this topic that a change in the existing sqlstate assignment for ERRCODE_IDLE_IN_TRANSACTION_SESSION_TIMEOUT would fly. So I propose to change the new ERRCODE_IDLE_SESSION_TIMEOUT to be in class 57 and call it good. regards, tom lane
Dear Tom, > So I propose to change the new ERRCODE_IDLE_SESSION_TIMEOUT to be in > class 57 and call it good. I agreed your suggestion and I confirmed your commit. Thanks! Hayato Kuroda FUJITSU LIMITED