Thread: idle in txn query cancellation
Hi all, I know it is late in the cycle, but I still think that the current behaviour of ERRORing during execution of a query but FATALing during IDLE IN TRANSACTION is very confusing to the user. Especially as you are not even able to read the reason for getting disconnected because the client doesnt expect input in that state... The first patch adds the capability to add a flag to ereport like: ereport(ERROR | LOG_NO_CLIENT) Tom earlier suggested using COMERROR but thats just a version of LOG which doesnt report to the client. The patch makes that to be a synonym of LOG | LOG_NO_CLIENT. While its not the most pretty API I dont think its that bad because the directionality is somewhat a property of the loglevel. Beside it would generate a lot of useless noise and breakage. The second patch changes the FATAL during cancelling an idle in txn query into ERROR | LOG_NO_CLIENT. To avoid breaking the known state there also may no "ready for query" message get sent. The patch ensures that by setting and checking a "silent_error_while_idle" variable. That way the client will not see that an error occured until the next command sent but I dont think there is a solution to that in 9.0 timeframe if at all. The patch only adds that for the recovery conflict path for now. What do you think? Is it worth applying something like that now? If yes I would try to test the patch some more (obviously the patch survives the regression tests, but they do not seem to check the extended query protocol at all). One could argue that the LOG_NO_CLIENT flag should be added when a idle transaction gets terminated by force but I wouldn't bother. On a related note I would also like to get rid of the restriction that a normal query cancellation will only be done if no subtransactions are stacked. But I guess its too late for that? (I have a patch ready, some cleanup would be needed) The latter works by: - adding a explicit error code (which should be done regardless of this discussion) - avoiding to catch such error at a few places (plperl, plpython) - recursively aborting the subtransactions once the mainloop is reached - relying on the fact that the cancellation signal will get resent - possibly escalating to a FATAL if nothing happens after a certain number of tries Andres PS: I know I sort-of-promised a patch earlier, but it didn't work out time- wise.
On Sat, 2010-02-13 at 22:37 +0100, Andres Freund wrote: > I know it is late in the cycle No problem here. Thanks for your diligence. Will review. -- Simon Riggs www.2ndQuadrant.com
On Sat, 2010-02-13 at 22:37 +0100, Andres Freund wrote: > On a related note I would also like to get rid of the restriction that > a normal query cancellation will only be done if no subtransactions > are stacked. > But I guess its too late for that? (I have a patch ready, some cleanup > would be needed) > The latter works by: > - adding a explicit error code (which should be done regardless of > this > discussion) > - avoiding to catch such error at a few places (plperl, plpython) > - recursively aborting the subtransactions once the mainloop is > reached > - relying on the fact that the cancellation signal will get resent > - possibly escalating to a FATAL if nothing happens after a certain > number of tries Such an action needs to have a good, clear theoretical explanation with it to show that the interaction with savepoints is a good one. I toyed with the idea of a new level between ERROR and FATAL to allow ERRORs to be handled by savepoints still in all cases. -- Simon Riggs www.2ndQuadrant.com
On Sat, 2010-02-13 at 22:37 +0100, Andres Freund wrote: > The first patch adds the capability to add a flag to ereport like: > ereport(ERROR | LOG_NO_CLIENT) > Tom earlier suggested using COMERROR but thats just a version of LOG > which doesnt report to the client. The patch makes that to be a > synonym of LOG | LOG_NO_CLIENT. > While its not the most pretty API I dont think its that bad because > the directionality is somewhat a property of the loglevel. Beside it > would generate a lot of useless noise and breakage. > > The second patch changes the FATAL during cancelling an idle in txn > query into ERROR | LOG_NO_CLIENT. > To avoid breaking the known state there also may no "ready for query" > message get sent. The patch ensures that by setting and checking a > "silent_error_while_idle" variable. > > That way the client will not see that an error occured until the next > command sent but I dont think there is a solution to that in 9.0 > timeframe if at all. > > The patch only adds that for the recovery conflict path for now. > > What do you think? Is it worth applying something like that now? If > yes I would try to test the patch some more (obviously the patch > survives the regression tests, but they do not seem to check the > extended query protocol at all). I think that is much better than FATAL. If it works I think we should apply it for this release. -- Simon Riggs www.2ndQuadrant.com
On Monday 15 February 2010 09:50:08 Simon Riggs wrote: > On Sat, 2010-02-13 at 22:37 +0100, Andres Freund wrote: > > The first patch adds the capability to add a flag to ereport like: > > ereport(ERROR | LOG_NO_CLIENT) > > Tom earlier suggested using COMERROR but thats just a version of LOG > > which doesnt report to the client. The patch makes that to be a > > synonym of LOG | LOG_NO_CLIENT. > > While its not the most pretty API I dont think its that bad because > > the directionality is somewhat a property of the loglevel. Beside it > > would generate a lot of useless noise and breakage. > > > > The second patch changes the FATAL during cancelling an idle in txn > > query into ERROR | LOG_NO_CLIENT. > > To avoid breaking the known state there also may no "ready for query" > > message get sent. The patch ensures that by setting and checking a > > "silent_error_while_idle" variable. > > > > That way the client will not see that an error occured until the next > > command sent but I dont think there is a solution to that in 9.0 > > timeframe if at all. > > > > The patch only adds that for the recovery conflict path for now. > > > > What do you think? Is it worth applying something like that now? If > > yes I would try to test the patch some more (obviously the patch > > survives the regression tests, but they do not seem to check the > > extended query protocol at all). > > I think that is much better than FATAL. If it works I think we should > apply it for this release. It does work for me at least ;-). I only have marginal testing with the extended query protocol though and I think the error message needs to get improved somewhat. I plan to make testing the extended query protocol easier by making pgbench able to restart after a such an error (thats why I like the seperate error code for such cancellations...) The problem with the error message is, that errdetail_abort() uses MyProc- >recoveryConflictPending which is already unset when the errdetail is used. Unless you beat me I plan to provide a patch here (havent looked at how to do so yet though). Andres
On Monday 15 February 2010 09:47:09 Simon Riggs wrote: > On Sat, 2010-02-13 at 22:37 +0100, Andres Freund wrote: > > On a related note I would also like to get rid of the restriction that > > a normal query cancellation will only be done if no subtransactions > > are stacked. > > But I guess its too late for that? (I have a patch ready, some cleanup > > would be needed) > > The latter works by: > > - adding a explicit error code (which should be done regardless of > > this > > discussion) > > - avoiding to catch such error at a few places (plperl, plpython) > > - recursively aborting the subtransactions once the mainloop is > > reached > > - relying on the fact that the cancellation signal will get resent > > - possibly escalating to a FATAL if nothing happens after a certain > > number of tries > > Such an action needs to have a good, clear theoretical explanation with > it to show that the interaction with savepoints is a good one. I can provide a bit more explanation. The patch (other thread) already added some more comments but its definitely good to explain/define some more. Will post that to the thread with the patch, ok? > I toyed with the idea of a new level between ERROR and FATAL to allow > ERRORs to be handled by savepoints still in all cases. I have a hard time believing that it will help in that situation. Either you allow cleaning up process local resources in PG_TRY/PG_TRY in which situation you cant abort recursively at all places because the catching code block may very well reference resources associated with that snapshot or you abort the process in a way that there are no process local resources. How would the middleway between those work? Andres
On Sunday 14 February 2010 06:29:45 Simon Riggs wrote: > On Sat, 2010-02-13 at 22:37 +0100, Andres Freund wrote: > > I know it is late in the cycle > > No problem here. Thanks for your diligence. Will review. Got a chance to look at it? Andres
On Sun, 2010-03-14 at 19:50 +0100, Andres Freund wrote: > On Sunday 14 February 2010 06:29:45 Simon Riggs wrote: > > On Sat, 2010-02-13 at 22:37 +0100, Andres Freund wrote: > > > I know it is late in the cycle > > > > No problem here. Thanks for your diligence. Will review. > Got a chance to look at it? I need to spend my time on ensuring we can avoid the cancellation altogether, so I apologise for not reviewing. That's not a comment on your work or the possible effectiveness of the patch. Possibly others have the time to review? -- Simon Riggs www.2ndQuadrant.com
Hi Simon, On Sunday 14 March 2010 20:12:00 Simon Riggs wrote: > On Sun, 2010-03-14 at 19:50 +0100, Andres Freund wrote: > > On Sunday 14 February 2010 06:29:45 Simon Riggs wrote: > > > On Sat, 2010-02-13 at 22:37 +0100, Andres Freund wrote: > > > > I know it is late in the cycle > > > > > > No problem here. Thanks for your diligence. Will review. > > > > Got a chance to look at it? > > I need to spend my time on ensuring we can avoid the cancellation > altogether, so I apologise for not reviewing. That's not a comment on > your work or the possible effectiveness of the patch. Possibly others > have the time to review? I guess that wont go anywhere before 9.1? I think at least the error code should be adjusted before 9.0. Andres