Re: Hot Standby introduced problem with query cancel behavior - Mailing list pgsql-hackers
From: Simon Riggs
Subject: Re: Hot Standby introduced problem with query cancel behavior
Msg-id: 1263285603.19367.171544.camel@ebony
In response to: Re: Hot Standby introduced problem with query cancel behavior (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
On Tue, 2010-01-12 at 06:30 +0100, Andres Freund wrote:
> On 01/07/10 22:37, Andres Freund wrote:
> > On Thursday 07 January 2010 22:28:46 Tom Lane wrote:
> >> Andres Freund <andres@anarazel.de> writes:
> >>> I did not want to suggest using Simon's code there. Sorry for the
> >>> brevity. It should have read as "revert to old code and add two-step
> >>> killing (that's likely around 10 lines or so)".
> >>>
> >>> "Two-step killing" means that we signal ERROR a few times and, if
> >>> nothing happens that we like, we signal FATAL. As the code already
> >>> loops around signalling anyway, that should be easy to implement.
> >> Ah. This loop happens in the process that's trying to send the cancel
> >> signal, correct, not the one that needs to respond to it? That sounds
> >> fairly sane to me.
> > Yes.
> >
> >> There are some things we could do to make it more likely that a cancel
> >> of this type is accepted --- for instance, give it a distinct SQLSTATE
> >> code that *can not* be trapped by plpgsql EXCEPTION blocks --- but there
> >> is no practical way to guarantee it except elog(FATAL). I'm not
> >> entirely convinced that an untrappable error would be a good thing
> >> anyway; it's hard to argue that that's much better than a FATAL.
> > Well, a session which is usable after a transaction abort is quite
> > sensible - quite some software I know handles a failing transaction
> > much more gracefully than a session abort (e.g. because it has to deal
> > with serialization failures and such anyway).
> >
> > So making it caught by fewer places and degrading to FATAL sounds
> > sensible and relatively easy to me. Unless somebody disagrees I will
> > give it a shot.
> Ok, here is a stab at that:
>
> 1. Allow the signal multiplexing facility to transfer one sig_atomic_t
> worth of data. This is useful e.g. for making operations idempotent
> or more precise.
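For reference, the idea in point 1 could be sketched roughly like this; all names here (pending_cancel_target, send_cancel, cancel_applies) are invented for illustration, not the identifiers used in the actual patch:

```c
/* Sketch: let a multiplexed signal carry one sig_atomic_t of data
 * (here, the target's local backend id) so a cancel aimed at a previous
 * occupant of the same backend slot can be recognised as stale. */
#include <signal.h>

/* "payload" slot written by the sender before raising the signal */
static volatile sig_atomic_t pending_cancel_target = 0;

/* Sender side: record which session the cancel is meant for;
 * in a real system a kill(pid, SIGUSR1) or similar would follow. */
static void send_cancel(sig_atomic_t target_backend_id)
{
    pending_cancel_target = target_backend_id;
}

/* Receiver side: honour the cancel only if it names this session. */
static int cancel_applies(sig_atomic_t my_local_backend_id)
{
    return pending_cancel_target == my_local_backend_id;
}
```

So a backend that reused a slot (say, with local id 2) would ignore a cancel that was queued for the slot's previous occupant (id 1), which is what makes the operation idempotent in the common case.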
> In this the LocalBackendId is transported - that should make it
> impossible to cancel the wrong transaction

This doesn't remove the possibility of cancelling the wrong transaction, it just reduces the chances. You can still cancel a session with LocalBackendId == 1 and then have it accidentally cancel the next session with the same backend id. That's why I didn't chase this idea myself earlier. Closing that loophole isn't a priority and is best left until the other things have been cleaned up.

> 2. AbortTransactionAndAnySubtransactions is only used in the main
> loop's error handler, as it should be unproblematic there.
>
> In the current CVS code ConditionalVirtualXactLockTableWait() in
> ResolveRecoveryConflictWithVirtualXIDs does the wait for every try of
> cancelling the other transaction.
>
> I moved the retry logic into CancelVirtualTransaction(). If 50 tries
> of ERROR do nothing it degrades to FATAL.

It's a good idea but not the right one, IMHO. I'd rather that the backend decided whether the instruction results in an ERROR or a FATAL based upon its own state, rather than have the Startup process bang its head against the wall 50 times without knowing why.

> XXX: I temporarily do not use the skipExistingConflicts argument of
> GetConflictingVirtualXIDs - I don't understand its purpose and a bit
> of infrastructure is missing right now, as the recoveryConflictMode is
> not stored anywhere anymore. It can easily be added back, though.
>
> 3. Add a new error code ERRCODE_QUERY_CANCELED_HS for use with HS,
> indicating a failure that is more than a plain ERRCODE_QUERY_CANCELED
> - namely, it should not be caught from various places like savepoints
> and in PLs.
>
> As an example I checked for that error code in plpgsql and made it
> uncatchable.
>
> I am not sure how new error codes get chosen, though - and the name is
> not that nice. Opinions on that?

I don't trust it yet, as a mechanism. Changing only PL/pgSQL seems fairly broken also.
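The retry-then-escalate behaviour from point 2 above could be sketched as follows; the function and parameter names are invented for illustration, and only the limit of 50 is taken from the mail:

```c
/* Sketch (not the actual patch code) of "two-step killing": retry a
 * soft cancel (ERROR) a bounded number of times, then escalate to
 * terminating the session (FATAL). */

enum CancelSeverity { CANCEL_ERROR, CANCEL_FATAL };

#define MAX_SOFT_TRIES 50

/* target_yields_after: after how many ERROR signals the target backend
 * actually aborts its transaction; a negative value models a backend
 * that never responds (e.g. the error is trapped by an EXCEPTION
 * block).  Returns the severity that finally took effect. */
static enum CancelSeverity cancel_with_escalation(int target_yields_after)
{
    int attempt;

    for (attempt = 1; attempt <= MAX_SOFT_TRIES; attempt++)
    {
        /* the real code signals the backend and waits on its VXID lock */
        if (target_yields_after >= 0 && attempt >= target_yields_after)
            return CANCEL_ERROR;    /* soft cancel worked */
    }
    /* nothing happened that we liked: terminate the session instead */
    return CANCEL_FATAL;
}
```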
Again, this seems like something that should be handled separately.

> I copied quite a bit from Simon's earlier patch...

It changes a few things around and adds the ideas you've mentioned, though seeing them as code doesn't in itself move the discussion forwards. There are far too many new ideas in this patch for it to be even close to acceptable, to me. Yes, let's discuss these additional ideas, but after a basic patch has been applied. I do value your interest and input, though racing me to rework my own patch, then asking me to review it, seems like wasted time for both of us. I will post a new patch on ~Friday.

(Also, please make your braces { } follow the normal coding conventions for Postgres.)

> Currently the patch does not yet do anything to avoid letting the
> protocol get out of sync. What do you think about adding a flag for
> error codes not to communicate with the client (similarly to
> COMERROR)?
>
> So that one could do elog(ERROR & ERROR_NO_SEND_CLIENT, ...) or such?

Seems a fairly important piece.

-- 
Simon Riggs
www.2ndQuadrant.com