Thread: Function to kill backend
Hi! When debugging on win32, I've created myself a little function that I feel should be added to the "backend proper". While it adds a lot of vlaue on win32, I think it adds quite a bit of value on non-win32 platforms as well... The function is pg_kill_backend(<pid>,<signal>). superuser-only, of course. Which simply sends a signal to the specified backend - querycancel, terminate, etc. The advantage over using the kill command from a shell account is, well, you don't need shell access to the db server. On win32, that's going to be more common than on Unix - plus, if you want to signal a specific backend, you need a special tool (can't do from tas kmanager/service manager etc - service manager can only do the postmaster, and task manager can only do kill -9). I also think a function like this could be good to have for e.g. pgadmin, to implement some more "management functionality". For example, in MSSQL I can go into a view called "current activity", pick a "bad user", right-click and cancel query or terminate session. To do this remote, a funciton like this is required. pg_stat_activity can be used to get a list of sessions and their pids. The function should probably be complemented with a pg_get_postmasterpid or something along that way, to be able to send signals to th epostmaster itself. So, would such a function be accepted into the backend code? And if so, any preferences on where you want it put? //Magnus
Magnus Hagander wrote: >Hi! > >When debugging on win32, I've created myself a little function that I >feel should be added to the "backend proper". While it adds a lot of >vlaue on win32, I think it adds quite a bit of value on non-win32 >platforms as well... > >The function is pg_kill_backend(<pid>,<signal>). superuser-only, of >course. Which simply sends a signal to the specified backend - >querycancel, terminate, etc. > > If' we're going to have this shouldn't it be a proper command? And maybe an internal shutdown command to go with it? cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > Magnus Hagander wrote: >> The function is pg_kill_backend(<pid>,<signal>). superuser-only, of >> course. Which simply sends a signal to the specified backend - >> querycancel, terminate, etc. > If' we're going to have this shouldn't it be a proper command? And maybe > an internal shutdown command to go with it? I don't like the idea at all, but if we were to have something it would definitely need to be a lot more constrained than send-any-signal-to-any-postgres-process ... even for a superuser, that's a mighty fat-gauge foot-gun. regards, tom lane
>>Hi! >> >>When debugging on win32, I've created myself a little function that I >>feel should be added to the "backend proper". While it adds a lot of >>vlaue on win32, I think it adds quite a bit of value on non-win32 >>platforms as well... >> >>The function is pg_kill_backend(<pid>,<signal>). superuser-only, of >>course. Which simply sends a signal to the specified backend - >>querycancel, terminate, etc. >> >> > >If' we're going to have this shouldn't it be a proper command? >And maybe >an internal shutdown command to go with it? I guess it could be. I guess: 1) I don't know how to do one of those ;-) Which is why I didn't even think it. 2) Won't that clutter up the namespace more, by introducing more keywords that you can't use for other things? 3) Will it still be possible to do the kind of things Rod mentioned, e.g. SELECT pg_kill_backend(procpid, 'TERM') FROM pg_stat_activity WHERE current_query LIKE '<IDLE>%'; ? //Magnus
> -----Original Message----- > From: Magnus Hagander [mailto:mha@sollentuna.net] > Sent: Friday, April 02, 2004 2:34 PM > To: Andrew Dunstan; pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] Function to kill backend > > > >>Hi! > >> > >>When debugging on win32, I've created myself a little > function that I > >>feel should be added to the "backend proper". While it adds > a lot of > >>vlaue on win32, I think it adds quite a bit of value on non-win32 > >>platforms as well... > >> > >>The function is pg_kill_backend(<pid>,<signal>). superuser-only, of > >>course. Which simply sends a signal to the specified backend - > >>querycancel, terminate, etc. > >> > >> > > > >If' we're going to have this shouldn't it be a proper command? > >And maybe > >an internal shutdown command to go with it? > > I guess it could be. I guess: > > 1) I don't know how to do one of those ;-) Which is why I > didn't even think it. > > 2) Won't that clutter up the namespace more, by introducing > more keywords that you can't use for other things? > > 3) Will it still be possible to do the kind of things Rod > mentioned, e.g. SELECT pg_kill_backend(procpid, 'TERM') FROM > pg_stat_activity WHERE current_query LIKE '<IDLE>%'; ? Sybase had something like that: Syb_kill <pid> to kill a dangling process. And the undocumented: Syb_terminate <pid> to absolutely, positively kill it (Syb_kill only worked sometimes). In general, I think this approach is a bit worrisome. It reminds one of the famous tagline: "Tip: Don't kill -9 the postmaster."
Tom Lane wrote: >Andrew Dunstan <andrew@dunslane.net> writes: > > >>Magnus Hagander wrote: >> >> >>>The function is pg_kill_backend(<pid>,<signal>). superuser-only, of >>>course. Which simply sends a signal to the specified backend - >>>querycancel, terminate, etc. >>> >>> > > > >>If' we're going to have this shouldn't it be a proper command? And maybe >>an internal shutdown command to go with it? >> >> > >I don't like the idea at all, but if we were to have something >it would definitely need to be a lot more constrained than >send-any-signal-to-any-postgres-process ... even for a superuser, >that's a mighty fat-gauge foot-gun. > > > What sort of constraints do you have in mind? cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > Tom Lane wrote: >> it would definitely need to be a lot more constrained than >> send-any-signal-to-any-postgres-process ... even for a superuser, >> that's a mighty fat-gauge foot-gun. > What sort of constraints do you have in mind? I'd limit it to SIGINT (query cancel) and SIGTERM (fast shutdown), and I'm not even real sure about SIGTERM. That facility is designed to work in the case of shutting down all backends together --- I'm not sure I want to promise that it behaves pleasantly to SIGTERM one backend and leave the rest going. Nor do I see a real good use-case for it. Also, no killing processes that aren't regular backends (eg, the bgwriter, the stats processes, and most especially the postmaster). Another point is that killing by PID is not necessarily what you want to do --- kill by transaction ID might be a better API, especially for query-cancel cases. regards, tom lane
Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: > > Tom Lane wrote: > >> it would definitely need to be a lot more constrained than > >> send-any-signal-to-any-postgres-process ... even for a superuser, > >> that's a mighty fat-gauge foot-gun. > > > What sort of constraints do you have in mind? > > I'd limit it to SIGINT (query cancel) and SIGTERM (fast shutdown), > and I'm not even real sure about SIGTERM. That facility is designed to > work in the case of shutting down all backends together --- I'm not sure > I want to promise that it behaves pleasantly to SIGTERM one backend and > leave the rest going. Nor do I see a real good use-case for it. > > Also, no killing processes that aren't regular backends (eg, the > bgwriter, the stats processes, and most especially the postmaster). > > Another point is that killing by PID is not necessarily what you want to > do --- kill by transaction ID might be a better API, especially for > query-cancel cases. Seems like useful functionality. Right now, how does an administrator kill another backend from psql? They can't. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Seems like useful functionality. Right now, how does an administrator > kill another backend from psql? They can't. The question to ask is "should they be able to?" I think any such facility is inherently a security risk, since it means that a remote attacker who's managed to break into your superuser account can randomly zap other backends. Now admittedly there's plenty of other mischief he can do with superuser privs, but that doesn't mean we should hand him a pre-loaded, pre-sighted cannon. Having to log into the database server locally to execute such operations doesn't seem that bad to me. regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Seems like useful functionality. Right now, how does an administrator > > kill another backend from psql? They can't. > > The question to ask is "should they be able to?" > > I think any such facility is inherently a security risk, since it means > that a remote attacker who's managed to break into your superuser > account can randomly zap other backends. Now admittedly there's plenty > of other mischief he can do with superuser privs, but that doesn't mean > we should hand him a pre-loaded, pre-sighted cannon. > > Having to log into the database server locally to execute such > operations doesn't seem that bad to me. If they can read/write your data (as superuser), killing backends is the least worry. I can see it as useful as part of pg_stat_activity output. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
> Tom Lane wrote: >>> it would definitely need to be a lot more constrained than >>> send-any-signal-to-any-postgres-process ... even for a superuser, >>> that's a mighty fat-gauge foot-gun. > >> What sort of constraints do you have in mind? > >I'd limit it to SIGINT (query cancel) and SIGTERM (fast shutdown), Actually, that is a restriction that's already there - just didn't get into those details. Since the functino as I wrote it so far just takes signal name as a string (can't rely on signal numbers being identical across platforms, right?), and then comparing it with a fixed set of signals. >and I'm not even real sure about SIGTERM. That facility is designed to >work in the case of shutting down all backends together --- >I'm not sure >I want to promise that it behaves pleasantly to SIGTERM one backend and >leave the rest going. Nor do I see a real good use-case for it. Really? Then what is the recommended way of shutting down a backend that you are not connected to, as an administrator? Even if you are logged in with shell access? I may have been doing things wrong for a long time, because I have certainly killed backends with TERM many times without problems. If that's not safe, there really ought to be a tip on the mailinglists to complement the "don't kill -9 the postmaster" with "and don't ever kill the backends, period"? I'm sure I'm not the only one who has done that... >Also, no killing processes that aren't regular backends (eg, the >bgwriter, the stats processes, and most especially the postmaster). That sounds like a reasonable limitation to add. Either by specifically excluding these processes, or by limiting it to only work on the backends currently listed in pg_stat_activity. >Another point is that killing by PID is not necessarily what >you want to >do --- kill by transaction ID might be a better API, especially for >query-cancel cases. Well, in my scenarios, killing by PID is what I need. But I guess transaction IDs might be added to the pg_stat_activity, which would give me the same functionality (I usually check that one first to see what a backend does, before I do anything) - and then some, because the transaction id carries other information as well. Question on that - how will it handle an idle backend (that has not explicitly opened a transaction, and is not executing a command in an implicit transaction)? > I think any such facility is inherently a security risk, since it means > that a remote attacker who's managed to break into your superuser > account can randomly zap other backends. Now admittedly there's plenty > of other mischief he can do with superuser privs, but that doesn't mean > we should hand him a pre-loaded, pre-sighted cannon. > Having to log into the database server locally to execute such > operations doesn't seem that bad to me. It does to me. I prefer being able to admin the server without having to do a separate login. I also much prefer being able to delegate the capability to terminate a backend, interrupt a long-running query, etc to someone who does not have to have shell access on the server. I guess it depends on the environment. > Bruce Momjian <pgman@candle.pha.pa.us> writes: >> If they can read/write your data (as superuser), killing backends is the >> least worry. That's pretty much the assumption I was working under. //Magnus
Tom, > > Seems like useful functionality. Right now, how does an administrator > > kill another backend from psql? They can't. > > The question to ask is "should they be able to?" And the answer is, "Yes". This is a commonly requested feature by DBA's migrating from SQL Server and Oracle. In those databases, there is a GUI to monitor database requests, and potentially kill them to resolve deadlocks or runaway queries (though, in the case of SQL server, it does not work). Right now, it is very difficult for any of our GUI projects to construct such an interface due to the necessity of root shell access. > I think any such facility is inherently a security risk, since it means > that a remote attacker who's managed to break into your superuser > account can randomly zap other backends. Now admittedly there's plenty > of other mischief he can do with superuser privs, but that doesn't mean > we should hand him a pre-loaded, pre-sighted cannon. And requiring DBAs to use root shell access whenever they want to stop a runaway query is somehow a good security approach? If nothing else, it exposes lots of DBAs to the temptation to use SIGKILL instead off SIGINT or SIGTERM, making the database shut down. And I, personally, worry about the number of root shells I have to use, becuase every once in a while I forget and leave one open at the end of the day. Killing backends with runaway queries is a routine administrative task. It should be possible to accomplish it remotely, using tools provided by PostgreSQL instead of the command shell, because then it is possible for us to limit what those tools can do. Further, if an intruder has superuser access, having them kill random backends is the last thing I'm worried about. "DROP DATABASE" ranks a lot higher. In fact, it would be nice if they started killing random backends because then I'd know something was wrong. -- -Josh BerkusAglio Database SolutionsSan Francisco
> This is a commonly requested feature by DBA's migrating from SQL Server and > Oracle. In those databases, there is a GUI to monitor database requests, > and potentially kill them to resolve deadlocks or runaway queries (though, in > the case of SQL server, it does not work). Right now, it is very difficult > for any of our GUI projects to construct such an interface due to the > necessity of root shell access. Yes, MySQL can do it too. http://www.mysql.com/doc/en/KILL.html I would love to have a KILL <pid> command in postgres. I don't know how you would restrict it to only being able to kill postgres backends though. Chris
On 2-Apr-04, at 5:16 PM, Andrew Dunstan wrote: > If' we're going to have this shouldn't it be a proper command? Why? What benefit would this offer over implementing this feature as a function? -Neil
Josh Berkus <josh@agliodbs.com> writes: > Killing backends with runaway queries is a routine administrative > task. Cancelling runaway queries is a routine task. I'm less convinced that a remote kill (ie SIGTERM) facility is such a great idea. regards, tom lane
Tom Lane wrote: > Josh Berkus <josh@agliodbs.com> writes: > >>Killing backends with runaway queries is a routine administrative >>task. > > > Cancelling runaway queries is a routine task. I'm less convinced that a > remote kill (ie SIGTERM) facility is such a great idea. Of course, cancelling runaway queries on Oracle is only a necessity if the DBA hasn't made use of resource limits - PROFILEs. ;-) Mike Mascari
Neil Conway said: > On 2-Apr-04, at 5:16 PM, Andrew Dunstan wrote: >> If' we're going to have this shouldn't it be a proper command? > > Why? What benefit would this offer over implementing this feature as a > function? > psql help cheers andrew
>> Killing backends with runaway queries is a routine administrative >> task. > >Cancelling runaway queries is a routine task. I'm less >convinced that a >remote kill (ie SIGTERM) facility is such a great idea. Consider a scenario like: User A starts transaction. User A issues a LOCK TABLE (or does something to lock it) User A goes on vacation without commit/rollback User A might well be Program A instead, of course. Caught in a tight loop, waiting for user input, or whatever. In this case, SIGINT (query cancel) will not help, because all locks held by the transaction will still be held. If there was a way to "force rollback" a connection, that could be done. Buf AFAIK there are none? And would those be safer/better than terminating the backend? //Magnus
"Magnus Hagander" <mha@sollentuna.net> writes: > In this case, SIGINT (query cancel) will not help, because all locks > held by the transaction will still be held. Wrong. regards, tom lane
> > In this case, SIGINT (query cancel) will not help, because > all locks > > held by the transaction will still be held. > > Wrong. Really? Please point out where I am wrong in this: SESSION A: BEGIN TRANSACTION SESSION A: LOCK TABLE foo IN ACCESS EXCLUSIVE MODE SESSION B: SELECT * FROM foo <-- blocks (expected)! not-in-session: KILL -INT <pid_of_session_A>(at this point, <pid_of_session_A> is listed as "idle in transaction") SESSION B: <-- still blcoks.... SESSION A: ROLLBACK SESSION B: <-- now completes If I kill it with TERM instead of INT, SESSION B is released at that point. Am I missing something very obvious here? This is on:PostgreSQL 7.4.2 on i686-pc-linux-gnu, compiled by GCC 2.95.3 //Magnus
"Magnus Hagander" <mha@sollentuna.net> writes: >>> In this case, SIGINT (query cancel) will not help, because >>> all locks held by the transaction will still be held. >> >> Wrong. > Really? [ experiments... ] My apologies --- you are correct about the present behavior. If a SIGINT arrives while waiting for client input, it's just dropped on the floor. The locks *will* be dropped if the SIGINT arrives during actual query processing. It strikes me that this is incorrect behavior, at least for the case where the client has a transaction block open. It'd be better to define the interrupt as "transaction cancel". regards, tom lane
On Fri, 2004-04-02 at 16:52, Magnus Hagander wrote: > Hi! > > When debugging on win32, I've created myself a little function that I > feel should be added to the "backend proper". While it adds a lot of > vlaue on win32, I think it adds quite a bit of value on non-win32 > platforms as well... > > The function is pg_kill_backend(<pid>,<signal>). superuser-only, of > course. Which simply sends a signal to the specified backend - > querycancel, terminate, etc. Nice.. My new favourite command is going to be: SELECT pg_kill_backend(procpid, 'TERM') FROM pg_stat_activity WHERE current_query LIKE '<IDLE>%'; -- Rod Taylor <rbt [at] rbt [dot] ca> Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL PGP Key: http://www.rbt.ca/signature.asc
Bruce Momjian wrote: > Tom Lane wrote: >> Bruce Momjian <pgman@candle.pha.pa.us> writes: >> > Seems like useful functionality. Right now, how does an administrator >> > kill another backend from psql? They can't. >> >> The question to ask is "should they be able to?" >> >> I think any such facility is inherently a security risk, since it means >> that a remote attacker who's managed to break into your superuser >> account can randomly zap other backends. Now admittedly there's plenty >> of other mischief he can do with superuser privs, but that doesn't mean >> we should hand him a pre-loaded, pre-sighted cannon. >> >> Having to log into the database server locally to execute such >> operations doesn't seem that bad to me. > > If they can read/write your data (as superuser), killing backends is the > least worry. Even as superuser, they still need to get a lock to drop the table. So killing other backends will ... This is so pointless. If an attacker manages to become superuser in the compromised database, what good are restrictions against killing backends? I agree that it should be restricted to backends, with an identification based on Xid and SIGINT. But that's it. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
> >>> In this case, SIGINT (query cancel) will not help, because > >>> all locks held by the transaction will still be held. > >> > >> Wrong. > > > Really? > > [ experiments... ] My apologies --- you are correct about > the present behavior. If a SIGINT arrives while waiting for > client input, it's just dropped on the floor. The locks > *will* be dropped if the SIGINT arrives during actual query > processing. > > It strikes me that this is incorrect behavior, at least for > the case where the client has a transaction block open. It'd > be better to define the interrupt as "transaction cancel". That sounds reasonable. That would certainly solve this part. That leaves justo ne part of the issue - ability to "kill off" idle backends. Since they do use up backend "slots" (of which there is a limited number), I still think this wuold be good. Dunno if it might be possible to use the same signal for that, under the following scheme: if (query_running) cancel_query abort transaction else if (idle in transaction) abort transaction else if (idle not in transaction) graceful shutdown or if that is too confusing? Fromt he "idle not in transaction", it should be possibleo treat it the same way a "connection lost" would be, no? (Instead of forcibly going down as SIGTERM does). With that, I would definitly think limiting to SIGINT would be enough. A question on using xids for identification instead of PIDs: is there a xid assigned to a process thati s in the state "idle not in transaction"? Otherwise, there still needs to be a way to send a signal to those. I don't see why the PID is not a good idea (assuming only-backends-restriction, but is even that necessary if you just send SIGINT?) //Magnus
"Magnus Hagander" <mha@sollentuna.net> writes: > if (query_running) > cancel_query > abort transaction > else if (idle in transaction) > abort transaction > else if (idle not in transaction) > graceful shutdown > or if that is too confusing? Too hazardous. Say you meant to kill a slow query, only it finishes just before you press RETURN. The above definition silently promotes your query cancel into a session kill, which is not what you intended. (Silently promoting query cancel to transaction cancel seems okay to me, since a successful query cancel aborts the xact anyway. However this might need closer thought to consider what will happen with nested xacts.) > A question on using xids for identification instead of PIDs: is there a > xid assigned to a process thati s in the state "idle not in > transaction"? No, and I assert you do not need one if the facility is limited to query or transaction cancel; more, an XID-based identification would be less prone to race conditions of the kind just mentioned (you couldn't mistakenly zap the transaction after the one you meant to). If we are going to allow session kill then of course we need PID for that. regards, tom lane
> > if (query_running) > > cancel_query > > abort transaction > > else if (idle in transaction) > > abort transaction > > else if (idle not in transaction) > > graceful shutdown > > > or if that is too confusing? > > Too hazardous. Say you meant to kill a slow query, only it > finishes just before you press RETURN. The above definition > silently promotes your query cancel into a session kill, > which is not what you intended. Ok. That's a good point :-) > (Silently promoting query cancel to transaction cancel seems > okay to me, since a successful query cancel aborts the xact > anyway. However this might need closer thought to consider > what will happen with nested > xacts.) Sounds good to me. > > A question on using xids for identification instead of > PIDs: is there > > a xid assigned to a process thati s in the state "idle not in > > transaction"? > > No, and I assert you do not need one if the facility is > limited to query or transaction cancel; more, an XID-based > identification would be less prone to race conditions of the > kind just mentioned (you couldn't mistakenly zap the > transaction after the one you meant to). Yeah, under that assumption, it certainly is enough. > If we are going to allow session kill then of course we need > PID for that. I still say we need this. Is there some other good mechanism to do this that you see? Since SIGTERM is not good... (I for one would definitly be happy enough with being able to drop something that's in "idle without transaction", since you can always use SIGINT to get it down to that level) //Magnus
"Magnus Hagander" <mha@sollentuna.net> writes: >> If we are going to allow session kill then of course we need >> PID for that. > I still say we need this. Well, that seems to be the consensus, so I won't stand in the way. If you like the cancel-by-XID idea then I'd suggest providing two functions: pg_cancel_query(XID) to send SIGINT and pg_kill_session(PID) to send SIGTERM. I'm not sure if the other two combinations (SIGINT by PID and SIGTERM by XID) are worth providing. regards, tom lane
Tom Lane wrote: > "Magnus Hagander" <mha@sollentuna.net> writes: > >> If we are going to allow session kill then of course we need > >> PID for that. > > > I still say we need this. > > Well, that seems to be the consensus, so I won't stand in the way. > > If you like the cancel-by-XID idea then I'd suggest providing two > functions: pg_cancel_query(XID) to send SIGINT and pg_kill_session(PID) > to send SIGTERM. I'm not sure if the other two combinations (SIGINT by > PID and SIGTERM by XID) are worth providing. Agreed. Two functions, cancel by xid (skips cancel if xid changes during command), and kill by pid for remote admin apps, is great. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
> Tom Lane wrote: > > "Magnus Hagander" <mha@sollentuna.net> writes: > > >> If we are going to allow session kill then of course we need > > >> PID for that. > > > > > I still say we need this. > > > > Well, that seems to be the consensus, so I won't stand in the way. > > > > If you like the cancel-by-XID idea then I'd suggest providing two > > functions: pg_cancel_query(XID) to send SIGINT and > > pg_kill_session(PID) to send SIGTERM. I'm not sure if the > other two > > combinations (SIGINT by PID and SIGTERM by XID) are worth providing. > > Agreed. Two functions, cancel by xid (skips cancel if xid > changes during command), and kill by pid for remote admin > apps, is great. This sounds good to me, too. (Though I think SIGINT by PID might be good, I see no need for SIGTERM by XID). But are you saying it *is* safe with SIGTERM to a backend? I always thought it was before, but after what you said earlier in this thread, I changed my mind... (Unfort, I'm not well enough into the deep ends of the code to say which is right...) //Magnus
Tom Lane wrote: > "Magnus Hagander" <mha@sollentuna.net> writes: > > But are you saying it *is* safe with SIGTERM to a backend? > > I'm saying I'm not happy about promoting that to the status of a > supported feature. Up to now it's always been "if it breaks you > get to keep both pieces", but if there's a built-in function to do it > then people will naturally expect a certain level of reliability. > > Maybe it works fine, maybe it doesn't. But I foresee having to expend > work down the road because of this addition. I assume admins are already doing this (via kill), so whether it is supported or not, we should give folks a safe way to do this. This seems like basic functionality we should be giving admins, even if it does take some work on our part. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Bruce Momjian <pgman@candle.pha.pa.us> writes: > I assume admins are already doing this (via kill), so whether it is > supported or not, we should give folks a safe way to do this. I don't think it's an open-and-shut decision as to whether people actually *need* to do session kills (as opposed to query/transaction kills). The arguments presented so far are not convincing to my mind, certainly not convincing enough to buy into a commitment to do whatever it takes to support that. The fact that people do it doesn't make it a good idea. regards, tom lane
"Magnus Hagander" <mha@sollentuna.net> writes: > But are you saying it *is* safe with SIGTERM to a backend? I'm saying I'm not happy about promoting that to the status of a supported feature. Up to now it's always been "if it breaks you get to keep both pieces", but if there's a built-in function to do it then people will naturally expect a certain level of reliability. Maybe it works fine, maybe it doesn't. But I foresee having to expend work down the road because of this addition. regards, tom lane
Josh Berkus wrote: > Tom, > > > I don't think it's an open-and-shut decision as to whether people > > actually *need* to do session kills (as opposed to query/transaction > > kills). The arguments presented so far are not convincing to my mind, > > certainly not convincing enough to buy into a commitment to do whatever > > it takes to support that. > > Hmmm ... well, I can make a real-world case from my supported apps for > transaction/statement kills. But my support for session kills is just > hypothetical; any time I've had to kill off sessions, it's because I had to > shut the database down, and that's better done from the command line. > > My web apps which need to manage the number of connections do it through their > connection pool. > > So I would vote for Yes on SIGINT by XID, but No on SIGTERM by PID, if Tom > thinks there will be any significant support & troubleshooting involved for > the latter. > > Unless, of course, someone can give us a real business case that they have > actually encountered in production. Someone already posted some pseudocode where they wanted to kill idle backends, perhaps as part of connection pooling. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Bruce, > Someone already posted some pseudocode where they wanted to kill idle > backends, perhaps as part of connection pooling. I'm not talking about code. I'm talking about a *reason*. i.e.: "I'm administrator of the blah-blah project. We had a lot of trouble managing idle connections to PG because of blah-blah. A function to kill off idle connctions would really help us becuase blah-blah-blah." So far, all we've heard in favor of SIGTERM-by-PID are *hypothetical* cases. Now Tom's telling us that there is a real cost attached to having this feature. Before we do it anyway, I want to be convinced that someone really needs it. It is *not* our practice to add features "just because we can." Otherwise, I'll stick by my assertion that idle connection management should be done in the middleware and NOT by psql. -- -Josh BerkusAglio Database SolutionsSan Francisco
Tom, > I don't think it's an open-and-shut decision as to whether people > actually *need* to do session kills (as opposed to query/transaction > kills). The arguments presented so far are not convincing to my mind, > certainly not convincing enough to buy into a commitment to do whatever > it takes to support that. Hmmm ... well, I can make a real-world case from my supported apps for transaction/statement kills. But my support for session kills is just hypothetical; any time I've had to kill off sessions, it's because I had to shut the database down, and that's better done from the command line. My web apps which need to manage the number of connections do it through their connection pool. So I would vote for Yes on SIGINT by XID, but No on SIGTERM by PID, if Tom thinks there will be any significant support & troubleshooting involved for the latter. Unless, of course, someone can give us a real business case that they have actually encountered in production. -- Josh Berkus Aglio Database Solutions San Francisco
Josh Berkus <josh@agliodbs.com> writes: > So I would vote for Yes on SIGINT by XID, but No on SIGTERM by PID, if Tom > thinks there will be any significant support & troubleshooting involved for > the latter. Quite honestly, I don't know. We know that some people have done manual SIGTERMs and not been burnt; and I'm not aware of any reason why it wouldn't work; but I don't think it's well enough tested to be sure that it will work. The sort of problem that I fear could arise is analogous to the problem with kill -9'ing the postmaster: sure, the process is gone and the database on disk is okay, but there might be resource leaks or other problems left behind in shared memory. We would not see such problems in normal use because SIGTERM is associated with complete database shutdown and release of shared memory. With retail SIGTERM and leaving the database up, though, it's a whole new ballgame and the cleanup requirements become much stricter. So like I say, I'm hesitant to buy into supporting this without a fairly convincing argument that it's really needed. regards, tom lane
Bruce, > OK, you have a runaway report. You want to stop it. Query cancel is > only going to stop the current query, and once you do that the next > query is fed in so there is no way to actually stop the report, > especially if the report is not being run from the same machine as the > server (you can't kill the report process). How do you stop it without > SIGTERM? You don't want to shut down the postmaster. Hmmm ... but, at least in the case of my apps, killing the PG connection wouldn't fix things. Most apps I work on are designed to detect connection failure and reconnect. I suspect that most platforms that use connection pooling are the same. So your case would only work if you actually blocked all connections from that host -- not a capability we'd discussed. -- -Josh BerkusAglio Database SolutionsSan Francisco
Josh Berkus wrote: > Bruce, > > > Someone already posted some pseudocode where they wanted to kill idle > > backends, perhaps as part of connection pooling. > > I'm not talking about code. I'm talking about a *reason*. > > i.e.: "I'm administrator of the blah-blah project. We had a lot of trouble > managing idle connections to PG because of blah-blah. A function to kill > off idle connctions would really help us becuase blah-blah-blah." > > So far, all we've heard in favor of SIGTERM-by-PID are *hypothetical* cases. > Now Tom's telling us that there is a real cost attached to having this > feature. Before we do it anyway, I want to be convinced that someone really > needs it. It is *not* our practice to add features "just because we can." > > Otherwise, I'll stick by my assertion that idle connection management should > be done in the middleware and NOT by psql. OK, you have a runaway report. You want to stop it. Query cancel is only going to stop the current query, and once you do that the next query is fed in so there is no way to actually stop the report, especially if the report is not being run from the same machine as the server (you can't kill the report process). How do you stop it without SIGTERM? You don't want to shut down the postmaster. In fact, query cancel is actually more dangerous in this case because unless the report is designed to handle statement failures, it might just keep running and produce incorrect results because you canceled some random queries in the report. (I am thinking specifically of a psql batch job here.) -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Bruce Momjian <pgman@candle.pha.pa.us> writes: > If there is a problem, maybe we can fix it, or perhap have the kill > function use SIGINT, then wait for the query to cancel, then SIGTERM. Well, if someone could prove that the SIGTERM path is equivalent to a transaction-aborting error followed by normal client disconnect, I'd feel a lot better about its reliability. We know those two code paths work well. Right now I do not think they are equivalent, but maybe they could be made so. regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > If there is a problem, maybe we can fix it, or perhap have the kill > > function use SIGINT, then wait for the query to cancel, then SIGTERM. > > Well, if someone could prove that the SIGTERM path is equivalent to > a transaction-aborting error followed by normal client disconnect, > I'd feel a lot better about its reliability. We know those two code > paths work well. As far as I know, most database systems have the ability to kill an individual session, and I think we need that functionality too, as I described in my report example. > Right now I do not think they are equivalent, but maybe they could be > made so. Let me look into it. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Josh Berkus wrote: > Bruce, > > > OK, you have a runaway report. You want to stop it. Query cancel is > > only going to stop the current query, and once you do that the next > > query is fed in so there is no way to actually stop the report, > > especially if the report is not being run from the same machine as the > > server (you can't kill the report process). How do you stop it without > > SIGTERM? You don't want to shut down the postmaster. > > Hmmm ... but, at least in the case of my apps, killing the PG connection > wouldn't fix things. Most apps I work on are designed to detect connection > failure and reconnect. I suspect that most platforms that use connection > pooling are the same. So your case would only work if you actually blocked > all connections from that host -- not a capability we'd discussed. I don't think most apps reconnect on disconnect, except maybe pooled connections where you don't expect your state to be stable between connections. Certainly most reports can't just reconnect and keep going. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Bruce Momjian <pgman@candle.pha.pa.us> writes: > OK, you have a runaway report. You want to stop it. Query cancel is > only going to stop the current query, and once you do that the next > query is fed in so there is no way to actually stop the report, > especially if the report is not being run from the same machine as the > server (you can't kill the report process). > I don't think most apps reconnect on disconnect, except maybe pooled > connections where you don't expect your state to be stable between > connections. Certainly most reports can't just reconnect and keep > going. You're hypothecating a report generator that can recover from failed queries, but not a failed connection? Seems a rather contrived case. Stupid apps are most likely gonna curl up and die on any unexpected error (which is what the query cancel would look like to them). Smart apps may try harder to recover than you think. regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > OK, you have a runaway report. You want to stop it. Query cancel is > > only going to stop the current query, and once you do that the next > > query is fed in so there is no way to actually stop the report, > > especially if the report is not being run from the same machine as the > > server (you can't kill the report process). > > > I don't think most apps reconnect on disconnect, except maybe pooled > > connections where you don't expect your state to be stable between > > connections. Certainly most reports can't just reconnect and keep > > going. > > You're hypothecating a report generator that can recover from failed > queries, but not a failed connection? Seems a rather contrived case. > Stupid apps are most likely gonna curl up and die on any unexpected > error (which is what the query cancel would look like to them). Smart > apps may try harder to recover than you think. I figured some reports just continue on failed queries. Is that accurate? I don't know, but I know a psql script will continue, no? Will psql return an error code on exit from cancel? I think it does but am not sure. First people objected to this on security grounds, now that those were beat down, we have use-case complaints. Not having a way to kill backends is like having no way to kill a process except rebooting the server. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
pgman wrote: > Josh Berkus wrote: > > Tom, > > > > > I don't think it's an open-and-shut decision as to whether people > > > actually *need* to do session kills (as opposed to query/transaction > > > kills). The arguments presented so far are not convincing to my mind, > > > certainly not convincing enough to buy into a commitment to do whatever > > > it takes to support that. > > > > Hmmm ... well, I can make a real-world case from my supported apps for > > transaction/statement kills. But my support for session kills is just > > hypothetical; any time I've had to kill off sessions, it's because I had to > > shut the database down, and that's better done from the command line. > > > > My web apps which need to manage the number of connections do it through their > > connection pool. > > > > So I would vote for Yes on SIGINT by XID, but No on SIGTERM by PID, if Tom > > thinks there will be any significant support & troubleshooting involved for > > the latter. > > > > Unless, of course, someone can give us a real business case that they have > > actually encountered in production. > > Someone already posted some pseudocode where they wanted to kill idle > backends, perhaps as part of connection pooling. Tom, if you have concerns about SIGTERM while other backends keep running, would you share those. (Holding locks, shared memory?) I looked at die(), and it seemed pretty safe to me. It just sets some variables and returns. It is not like quickdie that calls exit(). If there is a problem, maybe we can fix it, or perhap have the kill function use SIGINT, then wait for the query to cancel, then SIGTERM. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Not having a way to kill backends is like having no way to kill a > process except rebooting the server. Some people think that making a database hard to kill is a good thing. regards, tom lane
On Tue, 2004-04-06 at 15:23, Tom Lane wrote: > Josh Berkus <josh@agliodbs.com> writes: > > So I would vote for Yes on SIGINT by XID, but No on SIGTERM by PID, if Tom > > thinks there will be any significant support & troubleshooting involved for > > the latter. > > So like I say, I'm hesitant to buy into supporting this without a fairly > convincing argument that it's really needed. It doesn't necessarily have to be a SIGTERM. The goal is to get rid of unwanted idlers (connections). Could SIGINT be extended with a command telling the daemon to shutdown or rollback the transaction as requested?
On Tue, 2004-04-06 at 15:10, Josh Berkus wrote: > Bruce, > > > OK, you have a runaway report. You want to stop it. Query cancel is > > only going to stop the current query, and once you do that the next > > query is fed in so there is no way to actually stop the report, > > especially if the report is not being run from the same machine as the > > server (you can't kill the report process). How do you stop it without > > SIGTERM? You don't want to shut down the postmaster. > > Hmmm ... but, at least in the case of my apps, killing the PG connection > wouldn't fix things. Most apps I work on are designed to detect connection > failure and reconnect. I suspect that most platforms that use connection > pooling are the same. So your case would only work if you actually blocked > all connections from that host -- not a capability we'd discussed. That would be a likely second step (go into pg_hba to block). But you still have to get rid of the idlers at some point.
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Not having a way to kill backends is like having no way to kill a > > process except rebooting the server. > > Some people think that making a database hard to kill is a good thing. I can't argue with that. :-) I am researching the SIGTERM processing right now and will post in a few minutes. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Rod Taylor wrote: > On Tue, 2004-04-06 at 15:23, Tom Lane wrote: > > Josh Berkus <josh@agliodbs.com> writes: > > > So I would vote for Yes on SIGINT by XID, but No on SIGTERM by PID, if Tom > > > thinks there will be any significant support & troubleshooting involved for > > > the latter. > > > > So like I say, I'm hesitant to buy into supporting this without a fairly > > convincing argument that it's really needed. > > It doesn't necessarily have to be a SIGTERM. The goal is to get rid of > unwanted idlers (connections). Could SIGINT be extended with a command > telling the daemon to shutdown or rollback the transaction as requested? Nope, a signal is just a signal with no other info passed. We could add it, but it would be more code. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > If there is a problem, maybe we can fix it, or perhap have the kill > > function use SIGINT, then wait for the query to cancel, then SIGTERM. > > Well, if someone could prove that the SIGTERM path is equivalent to > a transaction-aborting error followed by normal client disconnect, > I'd feel a lot better about its reliability. We know those two code > paths work well. > > Right now I do not think they are equivalent, but maybe they could be > made so. I just did a whitespace-ignore diff of the SIGINT (which is cancel and we know works), and SIGTERM (which is under study). Here is a diff of SIGINT vs. SIGTERM functions: 1,2c1,2< static void< StatementCancelHandler(SIGNAL_ARGS)---> void> die(SIGNAL_ARGS)6,10c6,7< /*< * Don't joggle theelbow of proc_exit, nor an already-in-progress< * abort< */< if (!proc_exit_inprogress && !InError)---> /* Don'tjoggle the elbow of proc_exit */> if (!proc_exit_inprogress)13c10< QueryCancelPending = true;---> ProcDiePending= true;16,18c13,14< * If it's safe to interrupt, and we're waiting for a lock,< * service theinterrupt immediately. No point in interrupting if< * we're waiting for input, however.---> * If it's safeto interrupt, and we're waiting for input or a> * lock, service the interrupt immediately26,33c22,26< if (LockWaitCancel())< {< DisableNotifyInterrupt();< InterruptHoldoffCount--;< ProcessInterrupts();< }< else< InterruptHoldoffCount--;---> DisableNotifyInterrupt();> /* Make sure CheckDeadLock won't run while shutting down... */> LockWaitCancel();> InterruptHoldoffCount--;> ProcessInterrupts(); The big difference seems to be the InError test, and the test in SIGINT whether LockWaitCancel() actually returns true or false. Also, DisableNotifyInterrupt() is called before LockWaitCancel(). Also, one sets QueryCancelPending and the other ProcDiePending. Those are handled by:voidProcessInterrupts(void){ /* OK to accept interrupt now? */ if (InterruptHoldoffCount != 0 || CritSectionCount!= 0) return; InterruptPending = false; if (ProcDiePending) { ProcDiePending = false; QueryCancelPending = false; /* ProcDie trumps QueryCancel */ ImmediateInterruptOK = false; /* notidle anymore */ DisableNotifyInterrupt(); ereport(FATAL, (errcode(ERRCODE_ADMIN_SHUTDOWN), errmsg("terminating connection due to administrator command"))); } if (QueryCancelPending) { QueryCancelPending = false; ImmediateInterruptOK = false; /* not idle anymore */ DisableNotifyInterrupt(); ereport(ERROR, (errcode(ERRCODE_QUERY_CANCELED), errmsg("canceling query due to user request"))); } /* If we get here, do nothing (probably, QueryCancelPending wasreset) */} and the only difference here seems to be the elog(SHUTDOWN) call. On first glance, I don't see anything dangerous about SIGTERM. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
The "function to kill backend" seems no longer in doubt, but the *reason* is still blurred, other than we don't want to bring down the postmaster to do this. So far, reasons given have been: 1. to kill idlers 2. to kill runaway queries/statements, which has transaction implications I'd like to be able to do both of those, though they are fairly different situations, perhaps with different implementations. >Tom Lane writes > Some people think that making a database hard to kill is a good thing. I'm with Tom on this. I'm working on PITR...what happens when you shut down backends uncleanly...how will we test all the possible event/states to ensure xlog history doesn't get broken? Please lets work on a "humane" approach to handling offending sessions...IMHO the various signals mentioned are merely blunt instruments; we have been exhorted daily not to "kill the postmaster", now we seem to be suggesting that we allow that to happen to backends. Isn't there a better way...? Other RDBMSs support the ability to terminate specific sessions, but none of them to my knowledge offer this facility by means of an external executable, which has many implications - how would you know (prove) it had been used? what security protection does the DBA have over its use? The way forward seems safest if this is a command, not an external executable. e.g. ALTER SYSTEM STOP BACKEND <x>. That way we have control over the implementation/porting, security, logging/audit. Anybody that wants to can then wrap that in a script if they choose. Internally, we could then implement it however we chose. Best Regards, Simon Riggs
Simon Riggs wrote: > > The "function to kill backend" seems no longer in doubt, but the > *reason* is still blurred, other than we don't want to bring down the > postmaster to do this. > So far, reasons given have been: > 1. to kill idlers > 2. to kill runaway queries/statements, which has transaction > implications > > I'd like to be able to do both of those, though they are fairly > different situations, perhaps with different implementations. > > >Tom Lane writes > > Some people think that making a database hard to kill is a good thing. > > I'm with Tom on this. I'm working on PITR...what happens when you shut > down backends uncleanly...how will we test all the possible event/states > to ensure xlog history doesn't get broken? Please lets work on a > "humane" approach to handling offending sessions...IMHO the various > signals mentioned are merely blunt instruments; we have been exhorted > daily not to "kill the postmaster", now we seem to be suggesting that we > allow that to happen to backends. Isn't there a better way...? > > Other RDBMSs support the ability to terminate specific sessions, but > none of them to my knowledge offer this facility by means of an external > executable, which has many implications - how would you know (prove) it > had been used? what security protection does the DBA have over its use? > > The way forward seems safest if this is a command, not an external > executable. e.g. ALTER SYSTEM STOP BACKEND <x>. That way we have control > over the implementation/porting, security, logging/audit. Anybody that > wants to can then wrap that in a script if they choose. > > Internally, we could then implement it however we chose. The current plan is to create server-side functions to do this. We believe sending a SIGTERM to a backend via kill already has this effect. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
>Bruce Momjian > Simon Riggs wrote: > > The "function to kill backend" seems no longer in doubt, but the > > *reason* is still blurred, other than we don't want to > bring down the > > postmaster to do this. > > So far, reasons given have been: > > 1. to kill idlers > > 2. to kill runaway queries/statements, which has transaction > > implications > > > > I'd like to be able to do both of those, though they are fairly > > different situations, perhaps with different implementations. > > > > >Tom Lane writes > > > Some people think that making a database hard to kill is > a good thing. > > > > I'm with Tom on this. I'm working on PITR...what happens > when you shut > > down backends uncleanly...how will we test all the possible > event/states > > to ensure xlog history doesn't get broken? Please lets work on a > > "humane" approach to handling offending sessions...IMHO the various > > signals mentioned are merely blunt instruments; we have > been exhorted > > daily not to "kill the postmaster", now we seem to be > suggesting that we > > allow that to happen to backends. Isn't there a better way...? > > > > Other RDBMSs support the ability to terminate specific sessions, but > > none of them to my knowledge offer this facility by means > of an external > > executable, which has many implications - how would you > know (prove) it > > had been used? what security protection does the DBA have > over its use? > > > > The way forward seems safest if this is a command, not an external > > executable. e.g. ALTER SYSTEM STOP BACKEND <x>. That way we > have control > > over the implementation/porting, security, logging/audit. > Anybody that > > wants to can then wrap that in a script if they choose. > > > > Internally, we could then implement it however we chose. > > The current plan is to create server-side functions to do this. We > believe sending a SIGTERM to a backend via kill already has > this effect. If I understand you then, you believe you have found a way to short cut implementing the server-side function. Even if the primary behaviour is indeed exactly similar, are the secondary and subsequent behaviours similar also? Does your plan allow for: i) logging the activity (both the send and the effect of receiving it) ii) full security control of the facility other secondary behaviours... Forgive me if I become over-protective on robustness issues... Best Regards, Simon
Bruce Momjian wrote: >pgman wrote: > > >>Josh Berkus wrote: >> >> >>>Tom, >>> >>> >>> >>>>I don't think it's an open-and-shut decision as to whether people >>>>actually *need* to do session kills (as opposed to query/transaction >>>>kills). The arguments presented so far are not convincing to my mind, >>>>certainly not convincing enough to buy into a commitment to do whatever >>>>it takes to support that. >>>> >>>> >>>Hmmm ... well, I can make a real-world case from my supported apps for >>>transaction/statement kills. But my support for session kills is just >>>hypothetical; any time I've had to kill off sessions, it's because I had to >>>shut the database down, and that's better done from the command line. >>> >>>My web apps which need to manage the number of connections do it through their >>>connection pool. >>> >>>So I would vote for Yes on SIGINT by XID, but No on SIGTERM by PID, if Tom >>>thinks there will be any significant support & troubleshooting involved for >>>the latter. >>> >>>Unless, of course, someone can give us a real business case that they have >>>actually encountered in production. >>> >>> >>Someone already posted some pseudocode where they wanted to kill idle >>backends, perhaps as part of connection pooling. >> >> > >Tom, if you have concerns about SIGTERM while other backends keep >running, would you share those. (Holding locks, shared memory?) I >looked at die(), and it seemed pretty safe to me. It just sets some >variables and returns. It is not like quickdie that calls exit(). > >If there is a problem, maybe we can fix it, or perhap have the kill >function use SIGINT, then wait for the query to cancel, then SIGTERM. > > > One other option to consider while your looking at killing off children as an admin is the ability to switch the database to single user mode for the admin via psql or other connection interface. In essence shutdown all other transactions and session and limit it to only one so that maintenance issues (reindexing, etc) can be done remotely and do not require shell access on the host machine. I think, ultimately, this may be what is being asked for: a way to manage postgresql internally via SQL or other functions rather than relying on user access to the machine (ie sending signals to processes). This is what some people have wanted to do with managing the connection settings in pg_hba.conf via a system table rather than or in addition to a file.
Simon Riggs wrote: > > > The way forward seems safest if this is a command, not an external > > > executable. e.g. ALTER SYSTEM STOP BACKEND <x>. That way we > > have control > > > over the implementation/porting, security, logging/audit. > > Anybody that > > > wants to can then wrap that in a script if they choose. > > > > > > Internally, we could then implement it however we chose. > > > > The current plan is to create server-side functions to do this. We > > believe sending a SIGTERM to a backend via kill already has > > this effect. > > If I understand you then, you believe you have found a way to short cut > implementing the server-side function. No, we are going to create server-side function that mimick SIGINT (though using xid, we can ignore if the xid change), and SIGTERM. Server-side functions are required for clients not running on the server. > Even if the primary behaviour is indeed exactly similar, are the > secondary and subsequent behaviours similar also? > You mean calling them several times? Well, SIGTERM will terminate, and SIGINT will keep canceling xid's, though you have to update the xid to the newer xid because you canceled the old one. > Does your plan allow for: > i) logging the activity (both the send and the effect of receiving it) > ii) full security control of the facility > other secondary behaviours... > > Forgive me if I become over-protective on robustness issues... No, these are super-user-only commands. Not much more we can do on that front. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Bruce Momjian <pgman@candle.pha.pa.us> writes: > On first glance, I don't see anything dangerous about SIGTERM. You haven't thought about it very hard :-( The major difference I see is that elog(FATAL) will call proc_exit directly from elog, rather than longjmp'ing back to PostgresMain. The case that we have confidence in involves elog(ERROR) returning to PostgresMain and then calling proc_exit from there (in the path where we get EOF from the client). This leaves me with a couple of concerns: * Notice all that cleanup/reset stuff in the "if (sigsetjmp())" block in PostgresMain. SIGTERM will cause proc_exit to be entered without any of that being done first. Does it work reliably? Shouldn't this be refactored to ensure the same things happen in both cases? * There are various places, especially in the PLs, that try to hook into error recovery by manipulating Warn_restart. Will any of them have problems if their error recovery code doesn't get called during SIGTERM exit? One possible refactoring is for elog(FATAL) to go ahead and longjmp back to PostgresMain, and at the end of the error recovery block check a flag and do proc_exit() if we're fataling. However I am not sure that this doesn't break the design for coping with elog's during proc_exit. Alvaro's nested-transaction work is another thing that's got to be thought about before touching this code. I have not yet seen any design for error recovery in the nested xact case, but I am sure it's going to need some changes right around here. regards, tom lane
> Otherwise, I'll stick by my assertion that idle connection management should > be done in the middleware and NOT by psql. Perhaps it should be, but as PostgreSQL picks up more and more vendor applications this is difficult for the person administrating the database. Consider a 3rd party application which has the issue but the support contract is such that you cannot affect the application itself (their support staff deals with it). If you need connections in the database for other applications and this 3rd party program is idling on several slots... -- Rod Taylor <rbt [at] rbt [dot] ca> Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL PGP Key: http://www.rbt.ca/signature.asc
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Not having a way to kill backends is like having no way to kill a > > process except rebooting the server. > > Some people think that making a database hard to kill is a good thing. Sure. But we're not talking about taking down the whole database, we're talking about taking down a connection. Killing the database is the equivalent of killing the OS. It should be hard to do either. But it should be easy to kill a process on an OS if you have the right permissions, and similiarly it should be easy to kill a connection to the database if you have the right permissions. With respect to nested transactions and other things that might make properly shutting down difficult, it seems to me that the SIGINT case is actually a harder case to deal with. Why? Because for the SIGTERM case, you basically have to do whatever is done whenever the connection itself drops. If we can't handle the connection itself dropping out arbitrarily then we have more serious problems than just how to handle SIGTERM. :-) But for SIGINT you have to decide whether to just abort the innermost transaction or the outermost one, and if it's the outermost one you have to abort then you have to provide the mechanism for it -- something that you might not have to deal with otherwise. So it seems that handling SIGTERM might actually be easy: you have the signal handler close the backend's side of the connection and let the connection-dropping logic kick in automatically, no? Thoughts? Am I completely off my rocker here? :-) -- Kevin Brown kevin@sysexperts.com