Thread: Error code for "terminating connection due to conflict with recovery"
While looking at the backend code, I realized that error code for "terminating connection due to conflict with recovery" is ERRCODE_ADMIN_SHUTDOWN. I thought the error code is for somewhat a human interruption, such as shutdown command issued by pg_ctl. Is the usage of the error code appropreate? -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
On Sat, Jan 8, 2011 at 9:52 AM, Tatsuo Ishii <ishii@postgresql.org> wrote: > While looking at the backend code, I realized that error code for > "terminating connection due to conflict with recovery" is > ERRCODE_ADMIN_SHUTDOWN. > > I thought the error code is for somewhat a human interruption, such as > shutdown command issued by pg_ctl. Is the usage of the error code > appropreate? That doesn't sound right to me. I'd have thought something in class 40. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> On Sat, Jan 8, 2011 at 9:52 AM, Tatsuo Ishii <ishii@postgresql.org> wrote: >> While looking at the backend code, I realized that error code for >> "terminating connection due to conflict with recovery" is >> ERRCODE_ADMIN_SHUTDOWN. >> >> I thought the error code is for somewhat a human interruption, such as >> shutdown command issued by pg_ctl. Is the usage of the error code >> appropreate? > > That doesn't sound right to me. I'd have thought something in class 40. What about: 40004 CONFLICT WITH RECOVERY conflict_with_recovery -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
On Tue, Jan 11, 2011 at 1:30 AM, Tatsuo Ishii <ishii@postgresql.org> wrote: >> On Sat, Jan 8, 2011 at 9:52 AM, Tatsuo Ishii <ishii@postgresql.org> wrote: >>> While looking at the backend code, I realized that error code for >>> "terminating connection due to conflict with recovery" is >>> ERRCODE_ADMIN_SHUTDOWN. >>> >>> I thought the error code is for somewhat a human interruption, such as >>> shutdown command issued by pg_ctl. Is the usage of the error code >>> appropreate? >> >> That doesn't sound right to me. I'd have thought something in class 40. > > What about: > > 40004 CONFLICT WITH RECOVERY conflict_with_recovery We should respect the following convention, from errcodes.h: * The convention is that new error codes defined by PostgreSQL in a* class defined by the standard have a subclass valuethat begins* with 'P'. In addition, error codes defined by PostgreSQL clients* (such as ecpg) have a class value thatbegins with 'Y'. And don't forget there are three places where the new error code would need to be added. Otherwise, +1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
>>> That doesn't sound right to me. I'd have thought something in class 40. >> >> What about: >> >> 40004 CONFLICT WITH RECOVERY conflict_with_recovery > > We should respect the following convention, from errcodes.h: > > * The convention is that new error codes defined by PostgreSQL in a > * class defined by the standard have a subclass value that begins > * with 'P'. In addition, error codes defined by PostgreSQL clients > * (such as ecpg) have a class value that begins with 'Y'. > > And don't forget there are three places where the new error code would > need to be added. > > Otherwise, +1. Ok. Here is the patch for this. I use 40P02, instead of 40004. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp *** a/doc/src/sgml/errcodes.sgml --- b/doc/src/sgml/errcodes.sgml *************** *** 985,990 **** --- 985,995 ---- <entry>deadlock_detected</entry> </row> + <row> + <entry><literal>40P02</literal></entry> + <entry>CONFLICT WITH RECOVERY</entry> + <entry>conflict_with_recovery</entry> + </row> <row> <entry spanname="span13"><emphasis role="bold">Class 42 — Syntax Error or Access Rule Violation</></entry> *** a/src/backend/tcop/postgres.c --- b/src/backend/tcop/postgres.c *************** *** 2909,2915 **** ProcessInterrupts(void) errdetail_recovery_conflict())); else if (RecoveryConflictPending) ereport(FATAL, ! (errcode(ERRCODE_ADMIN_SHUTDOWN), errmsg("terminating connection due to conflict withrecovery"), errdetail_recovery_conflict())); else --- 2909,2915 ---- errdetail_recovery_conflict())); else if (RecoveryConflictPending) ereport(FATAL, ! (errcode(ERRCODE_CONFLICT_WITH_RECOVERY), errmsg("terminating connection due to conflictwith recovery"), errdetail_recovery_conflict())); else *** a/src/include/utils/errcodes.h --- b/src/include/utils/errcodes.h *************** *** 243,248 **** --- 243,249 ---- #define ERRCODE_T_R_SERIALIZATION_FAILURE MAKE_SQLSTATE('4','0', '0','0','1') #define ERRCODE_T_R_STATEMENT_COMPLETION_UNKNOWN MAKE_SQLSTATE('4','0', '0','0','3') #define ERRCODE_T_R_DEADLOCK_DETECTED MAKE_SQLSTATE('4','0', 'P','0','1') + #define ERRCODE_CONFLICT_WITH_RECOVERY MAKE_SQLSTATE('4','0', 'P','0','2') /* Class 42 - Syntax Error or Access RuleViolation */ #define ERRCODE_SYNTAX_ERROR_OR_ACCESS_RULE_VIOLATION MAKE_SQLSTATE('4','2', '0','0','0') *** a/src/pl/plpgsql/src/plerrcodes.h --- b/src/pl/plpgsql/src/plerrcodes.h *************** *** 484,489 **** --- 484,493 ---- }, { + "conflict_with_recovery", ERRCODE_CONFLICT_WITH_RECOVERY + }, + + { "syntax_error_or_access_rule_violation", ERRCODE_SYNTAX_ERROR_OR_ACCESS_RULE_VIOLATION },
On Thu, Jan 13, 2011 at 2:13 AM, Tatsuo Ishii <ishii@postgresql.org> wrote: > Ok. Here is the patch for this. I use 40P02, instead of 40004. Please add this to the currently open CommitFest: https://commitfest.postgresql.org/action/commitfest_view/open -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> On Thu, Jan 13, 2011 at 2:13 AM, Tatsuo Ishii <ishii@postgresql.org> wrote: >> Ok. Here is the patch for this. I use 40P02, instead of 40004. > > Please add this to the currently open CommitFest: > > https://commitfest.postgresql.org/action/commitfest_view/open Done. Comments are welcome. Unless there's objection, I will commit it this weekend. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
Tatsuo Ishii <ishii@postgresql.org> writes: >> Please add this to the currently open CommitFest: >> https://commitfest.postgresql.org/action/commitfest_view/open > Done. Comments are welcome. Unless there's objection, I will commit it > this weekend. If you're expecting anyone to actually *review* it during the CF, that's a bit premature. regards, tom lane
> Tatsuo Ishii <ishii@postgresql.org> writes: >>> Please add this to the currently open CommitFest: >>> https://commitfest.postgresql.org/action/commitfest_view/open > >> Done. Comments are welcome. Unless there's objection, I will commit it >> this weekend. > > If you're expecting anyone to actually *review* it during the CF, > that's a bit premature. No problem to wait for longer. I will wait by the end of January for the present. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
On Thu, Jan 13, 2011 at 7:31 PM, Tatsuo Ishii <ishii@postgresql.org> wrote: >> Tatsuo Ishii <ishii@postgresql.org> writes: >>>> Please add this to the currently open CommitFest: >>>> https://commitfest.postgresql.org/action/commitfest_view/open >> >>> Done. Comments are welcome. Unless there's objection, I will commit it >>> this weekend. >> >> If you're expecting anyone to actually *review* it during the CF, >> that's a bit premature. > > No problem to wait for longer. I will wait by the end of January for > the present. Review: The only possible point of concern I see here is the naming of the C identifier. Everything else in class 40 uses ERRCODE_T_R_whatever, with T_R standing for transaction rollback. It's not obvious to me that that convention has any real value, but perhaps we ought to follow it here for the sake of consistency? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> Review: > > The only possible point of concern I see here is the naming of the C > identifier. Everything else in class 40 uses ERRCODE_T_R_whatever, > with T_R standing for transaction rollback. It's not obvious to me > that that convention has any real value, but perhaps we ought to > follow it here for the sake of consistency? Yeah. Actually at first I used "T_R" convention. After a few seconds thought, I realized that "T_R" is not appropreate by the same reason you feel. Possible other argument might be "Terminating connection always involves transaction rollback. So using T_R is ok". I'm not sure this argument is reasonable enough though. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
On Fri, Jan 14, 2011 at 10:53 AM, Tatsuo Ishii <ishii@postgresql.org> wrote: >> Review: >> >> The only possible point of concern I see here is the naming of the C >> identifier. Everything else in class 40 uses ERRCODE_T_R_whatever, >> with T_R standing for transaction rollback. It's not obvious to me >> that that convention has any real value, but perhaps we ought to >> follow it here for the sake of consistency? > > Yeah. Actually at first I used "T_R" convention. After a few seconds > thought, I realized that "T_R" is not appropreate by the same reason > you feel. Possible other argument might be "Terminating connection > always involves transaction rollback. So using T_R is ok". I'm not > sure this argument is reasonable enough though. Looking at this a bit more carefully, I notice that there are two cases when a recovery conflict occurs: - we cancel the currently running statement, or - we kill the whole connection Should those use the same error code, or different ones? This patch doesn't appear to update all the places where recovery conflicts can occur, which is probably not ideal. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Tatsuo Ishii <ishii@postgresql.org> writes: >> Review: >> The only possible point of concern I see here is the naming of the C >> identifier. Everything else in class 40 uses ERRCODE_T_R_whatever, >> with T_R standing for transaction rollback. It's not obvious to me >> that that convention has any real value, but perhaps we ought to >> follow it here for the sake of consistency? > Yeah. Actually at first I used "T_R" convention. After a few seconds > thought, I realized that "T_R" is not appropreate by the same reason > you feel. Possible other argument might be "Terminating connection > always involves transaction rollback. So using T_R is ok". I'm not > sure this argument is reasonable enough though. This is not only a matter of some macro name or other. According to the SQL standard, class 40 itself is defined as "transaction rollback". If the error condition can't reasonably be regarded as a subcase of that, you're making a bad choice of SQLSTATE code. regards, tom lane
On Fri, 2011-01-14 at 12:04 -0500, Tom Lane wrote: > Tatsuo Ishii <ishii@postgresql.org> writes: > >> Review: > >> The only possible point of concern I see here is the naming of the C > >> identifier. Everything else in class 40 uses ERRCODE_T_R_whatever, > >> with T_R standing for transaction rollback. It's not obvious to me > >> that that convention has any real value, but perhaps we ought to > >> follow it here for the sake of consistency? > > > Yeah. Actually at first I used "T_R" convention. After a few seconds > > thought, I realized that "T_R" is not appropreate by the same reason > > you feel. Possible other argument might be "Terminating connection > > always involves transaction rollback. So using T_R is ok". I'm not > > sure this argument is reasonable enough though. > > This is not only a matter of some macro name or other. According to the > SQL standard, class 40 itself is defined as "transaction rollback". > If the error condition can't reasonably be regarded as a subcase of > that, you're making a bad choice of SQLSTATE code. This whole thing is confused. No change is appropriate here at all. We issue ERRCODE_T_R_SERIALIZATION_FAILURE almost all of the time for recovery conflicts. We issue ERRCODE_ADMIN_SHUTDOWN only if the conflict is non-retryable, which occurs if someone drops the database that the user was connected to when they get kicked off. That code exists specifically to inform the user that they *cannot* reconnect. So pgpool should not be trying to trap that error and reconnect. So the whole basis for the existence of this patch is moot. -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
On Fri, Jan 14, 2011 at 12:29 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > This whole thing is confused. No change is appropriate here at all. > > We issue ERRCODE_T_R_SERIALIZATION_FAILURE almost all of the time for > recovery conflicts. > > We issue ERRCODE_ADMIN_SHUTDOWN only if the conflict is non-retryable, > which occurs if someone drops the database that the user was connected > to when they get kicked off. That code exists specifically to inform the > user that they *cannot* reconnect. So pgpool should not be trying to > trap that error and reconnect. CheckRecoveryConflictDeadlock() uses ERRCODE_QUERY_CANCELLED. ProcessInterrupts() sometimes uses ERRCODE_T_R_SERIALIZATION_FAILURE and sometimes uses ERRCODE_ADMIN_SHUTDOWN. It seems to me that it wouldn't be a bad thing to be a bit more consistent, and perhaps to have dedicated error codes for recovery conflicts. This bit strikes me as particularly strange: else if (RecoveryConflictPending && RecoveryConflictRetryable) { pgstat_report_recovery_conflict(RecoveryConflictReason); ereport(FATAL, (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE), errmsg("terminating connection due to conflict with recovery"), errdetail_recovery_conflict())); } else if (RecoveryConflictPending) { pgstat_report_recovery_conflict(RecoveryConflictReason); ereport(FATAL, (errcode(ERRCODE_ADMIN_SHUTDOWN), errmsg("terminating connection due to conflict with recovery"), errdetail_recovery_conflict())); } That's the same error message at the same severity level with two different SQLSTATEs depending on RecoveryConflictRetryable. Seems a bit cryptic. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Jan 14, 2011 at 1:51 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Jan 14, 2011 at 12:29 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >> This whole thing is confused. No change is appropriate here at all. >> >> We issue ERRCODE_T_R_SERIALIZATION_FAILURE almost all of the time for >> recovery conflicts. >> >> We issue ERRCODE_ADMIN_SHUTDOWN only if the conflict is non-retryable, >> which occurs if someone drops the database that the user was connected >> to when they get kicked off. That code exists specifically to inform the >> user that they *cannot* reconnect. So pgpool should not be trying to >> trap that error and reconnect. > > CheckRecoveryConflictDeadlock() uses ERRCODE_QUERY_CANCELLED. > ProcessInterrupts() sometimes uses ERRCODE_T_R_SERIALIZATION_FAILURE > and sometimes uses ERRCODE_ADMIN_SHUTDOWN. It seems to me that it > wouldn't be a bad thing to be a bit more consistent, and perhaps to > have dedicated error codes for recovery conflicts. This bit strikes > me as particularly strange: > > else if (RecoveryConflictPending && RecoveryConflictRetryable) > { > pgstat_report_recovery_conflict(RecoveryConflictReason); > ereport(FATAL, > > (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE), > errmsg("terminating connection due to > conflict with recovery"), > errdetail_recovery_conflict())); > } > else if (RecoveryConflictPending) > { > pgstat_report_recovery_conflict(RecoveryConflictReason); > ereport(FATAL, > (errcode(ERRCODE_ADMIN_SHUTDOWN), > errmsg("terminating connection due to > conflict with recovery"), > errdetail_recovery_conflict())); > } > > That's the same error message at the same severity level with two > different SQLSTATEs depending on RecoveryConflictRetryable. Seems a > bit cryptic. So what we do want to do about this? I'm pretty well convinced that we should NOT be issuing ERRCODE_ADMIN_SHUTDOWN for a recovery conflict, but that could be fixed by a trivial simplification of the code posted above, without introducing any new error code. I'd also be in favor of changing the one that uses ERRCODE_QUERY_CANCELLED to use ERRCODE_T_R_SERIALIZATION_FAILURE, as the former might be taken to imply active user intervention, and for consistency. It's no longer clear to me that we actually need a new error code for this - using the same one everywhere seems like it might be sufficient, unless someone wants to make an argument why it isn't. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> On Fri, Jan 14, 2011 at 1:51 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Fri, Jan 14, 2011 at 12:29 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >>> This whole thing is confused. No change is appropriate here at all. >>> >>> We issue ERRCODE_T_R_SERIALIZATION_FAILURE almost all of the time for >>> recovery conflicts. >>> >>> We issue ERRCODE_ADMIN_SHUTDOWN only if the conflict is non-retryable, >>> which occurs if someone drops the database that the user was connected >>> to when they get kicked off. That code exists specifically to inform the >>> user that they *cannot* reconnect. So pgpool should not be trying to >>> trap that error and reconnect. >> >> CheckRecoveryConflictDeadlock() uses ERRCODE_QUERY_CANCELLED. >> ProcessInterrupts() sometimes uses ERRCODE_T_R_SERIALIZATION_FAILURE >> and sometimes uses ERRCODE_ADMIN_SHUTDOWN. It seems to me that it >> wouldn't be a bad thing to be a bit more consistent, and perhaps to >> have dedicated error codes for recovery conflicts. This bit strikes >> me as particularly strange: >> >> else if (RecoveryConflictPending && RecoveryConflictRetryable) >> { >> pgstat_report_recovery_conflict(RecoveryConflictReason); >> ereport(FATAL, >> >> (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE), >> errmsg("terminating connection due to >> conflict with recovery"), >> errdetail_recovery_conflict())); >> } >> else if (RecoveryConflictPending) >> { >> pgstat_report_recovery_conflict(RecoveryConflictReason); >> ereport(FATAL, >> (errcode(ERRCODE_ADMIN_SHUTDOWN), >> errmsg("terminating connection due to >> conflict with recovery"), >> errdetail_recovery_conflict())); >> } >> >> That's the same error message at the same severity level with two >> different SQLSTATEs depending on RecoveryConflictRetryable. Seems a >> bit cryptic. > > So what we do want to do about this? > > I'm pretty well convinced that we should NOT be issuing > ERRCODE_ADMIN_SHUTDOWN for a recovery conflict, but that could be > fixed by a trivial simplification of the code posted above, without > introducing any new error code. I agree with ERRCODE_ADMIN_SHUTDOWN should not be used for a recovery conflict. And if your proposal does not need to introduce new error code, I also agree with not inventing new error code. > I'd also be in favor of changing the one that uses > ERRCODE_QUERY_CANCELLED to use ERRCODE_T_R_SERIALIZATION_FAILURE, as > the former might be taken to imply active user intervention, and for > consistency. +1. > It's no longer clear to me that we actually need a new error code for > this - using the same one everywhere seems like it might be > sufficient, unless someone wants to make an argument why it isn't. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company
On Fri, 2011-01-21 at 13:49 +0900, Tatsuo Ishii wrote: > > > > I'm pretty well convinced that we should NOT be issuing > > ERRCODE_ADMIN_SHUTDOWN for a recovery conflict, but that could be > > fixed by a trivial simplification of the code posted above, without > > introducing any new error code. > > I agree with ERRCODE_ADMIN_SHUTDOWN should not be used for a recovery > conflict. And if your proposal does not need to introduce new error > code, I also agree with not inventing new error code. > > I'd also be in favor of changing the one that uses > > ERRCODE_QUERY_CANCELLED to use ERRCODE_T_R_SERIALIZATION_FAILURE, as > > the former might be taken to imply active user intervention, and for > > consistency. > > +1. We already use ERRCODE_T_R_SERIALIZATION_FAILURE for retryable errors, which is almost every error. So no change required there. ERRCODE_ADMIN_SHUTDOWN is used only in situations where we cannot reconnect or retry because the database we said we wished to connect to no longer exists. That needs to be a different error code to a normal, retryable error, so that pgpool can tell the difference between things it can help with and things it cannot help with. -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
On Jan21, 2011, at 10:16 , Simon Riggs wrote: > On Fri, 2011-01-21 at 13:49 +0900, Tatsuo Ishii wrote: >> >>> I'm pretty well convinced that we should NOT be issuing >>> ERRCODE_ADMIN_SHUTDOWN for a recovery conflict, but that could be >>> fixed by a trivial simplification of the code posted above, without >>> introducing any new error code. >> >> I agree with ERRCODE_ADMIN_SHUTDOWN should not be used for a recovery >> conflict. And if your proposal does not need to introduce new error >> code, I also agree with not inventing new error code. > >>> I'd also be in favor of changing the one that uses >>> ERRCODE_QUERY_CANCELLED to use ERRCODE_T_R_SERIALIZATION_FAILURE, as >>> the former might be taken to imply active user intervention, and for >>> consistency. >> >> +1. > > We already use ERRCODE_T_R_SERIALIZATION_FAILURE for retryable errors, > which is almost every error. So no change required there. > > ERRCODE_ADMIN_SHUTDOWN is used only in situations where we cannot > reconnect or retry because the database we said we wished to connect to > no longer exists. That needs to be a different error code to a normal, > retryable error, so that pgpool can tell the difference between things > it can help with and things it cannot help with. Yeah. Clients absolutely need to be able to distinguish transient and permanent errors. Otherwise, how would a client know when to retry a transaction (as he needs to in case of a serialization anomaly) and when to report the error to the user? ERRCODE_T_R_SERIALIZATION_FAILURE and ERRCODE_T_R_DEADLOCK_DETECTED are probably both assumed to be transient failure by client aready. So we should use those two for transient recovery conflicts (i.e. those which go away if you retry) and something else for the others (like database dropped) This'd mean that the code is fine as it is, except that we should raise ERRCODE_T_R_DEADLOCK_DETECTED instead of ERRCODE_QUERY_CANCELED in CheckRecoveryConflictDeadlock(). I might be missing something though - Simon, what were your reasons for using ERRCODE_QUERY_CANCELED there? best regards, Florian Pflug
On Fri, 2011-01-21 at 13:09 +0100, Florian Pflug wrote: > > > >>> I'd also be in favor of changing the one that uses > >>> ERRCODE_QUERY_CANCELLED to use ERRCODE_T_R_SERIALIZATION_FAILURE, as > >>> the former might be taken to imply active user intervention, and for > >>> consistency. > >> > >> +1. > > > > We already use ERRCODE_T_R_SERIALIZATION_FAILURE for retryable errors, > > which is almost every error. So no change required there. > > > > ERRCODE_ADMIN_SHUTDOWN is used only in situations where we cannot > > reconnect or retry because the database we said we wished to connect to > > no longer exists. That needs to be a different error code to a normal, > > retryable error, so that pgpool can tell the difference between things > > it can help with and things it cannot help with. > > Yeah. Clients absolutely need to be able to distinguish transient and > permanent errors. Otherwise, how would a client know when to retry > a transaction (as he needs to in case of a serialization anomaly) and > when to report the error to the user? > > ERRCODE_T_R_SERIALIZATION_FAILURE and ERRCODE_T_R_DEADLOCK_DETECTED > are probably both assumed to be transient failure by client aready. So > we should use those two for transient recovery conflicts (i.e. those > which go away if you retry) and something else for the others (like > database dropped) > > This'd mean that the code is fine as it is, except that we should > raise ERRCODE_T_R_DEADLOCK_DETECTED instead of ERRCODE_QUERY_CANCELED > in CheckRecoveryConflictDeadlock(). I might be missing something though - > Simon, what were your reasons for using ERRCODE_QUERY_CANCELED there? Ah, thanks Florian. Now I understand. There are two related issues here. 1. The discussion around ERRCODE_ADMIN_SHUTDOWN is incorrect and the specific patch should be rejected as is. No changes are required in ProcessInterrupts(), nor new errcodes. 2. Robert is correct that CheckRecoveryConflictDeadlock() returns ERRCODE_QUERY_CANCELED. Thanks to Florian for noting that we had switched away from the original discussion onto another part of the code, which confused me. I agree the use of ERRCODE_QUERY_CANCELED is a mistake; CheckRecoveryConflictDeadlock() should return ERRCODE_T_R_DEADLOCK_DETECTED. This was an omission from my commit of 12 May 2010. Tatsuo, would you like to modify the patch to correct the issue in CheckRecoveryConflictDeadlock() ? Or would you prefer me to fix? This should be backpatched to 9.0. -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
On Fri, Jan 21, 2011 at 7:48 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > Ah, thanks Florian. Now I understand. There are two related issues here. > > 1. The discussion around ERRCODE_ADMIN_SHUTDOWN is incorrect and the > specific patch should be rejected as is. No changes are required in > ProcessInterrupts(), nor new errcodes. Can you please justify that statement instead of simply asserting it? Tatsuo-san and I both seem to agree that it looks wrong. ERRCODE_ADMIN_SHUTDOWN is in class 57, operator intervention, and it's used elsewhere when a SIGTERM is received and the database is shutting down. That's a world away from what's actually happening here. Wanting to have a different error code for this type of failure may make sense, but that doesn't mean that this is the right one. > 2. Robert is correct that CheckRecoveryConflictDeadlock() returns > ERRCODE_QUERY_CANCELED. Thanks to Florian for noting that we had > switched away from the original discussion onto another part of the > code, which confused me. I agree the use of ERRCODE_QUERY_CANCELED is a > mistake; CheckRecoveryConflictDeadlock() should return > ERRCODE_T_R_DEADLOCK_DETECTED. This was an omission from my commit of 12 > May 2010. This part sounds good. > This should be backpatched to 9.0. Hmm, I don't necessarily agree. The standard for changing behavior in an existing release is fairly high. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Jan 21, 2011 at 8:28 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Jan 21, 2011 at 7:48 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> Ah, thanks Florian. Now I understand. There are two related issues here. >> >> 1. The discussion around ERRCODE_ADMIN_SHUTDOWN is incorrect and the >> specific patch should be rejected as is. No changes are required in >> ProcessInterrupts(), nor new errcodes. > > Can you please justify that statement instead of simply asserting it? > Tatsuo-san and I both seem to agree that it looks wrong. > ERRCODE_ADMIN_SHUTDOWN is in class 57, operator intervention, and it's > used elsewhere when a SIGTERM is received and the database is shutting > down. That's a world away from what's actually happening here. > Wanting to have a different error code for this type of failure may > make sense, but that doesn't mean that this is the right one. > >> 2. Robert is correct that CheckRecoveryConflictDeadlock() returns >> ERRCODE_QUERY_CANCELED. Thanks to Florian for noting that we had >> switched away from the original discussion onto another part of the >> code, which confused me. I agree the use of ERRCODE_QUERY_CANCELED is a >> mistake; CheckRecoveryConflictDeadlock() should return >> ERRCODE_T_R_DEADLOCK_DETECTED. This was an omission from my commit of 12 >> May 2010. > > This part sounds good. > >> This should be backpatched to 9.0. > > Hmm, I don't necessarily agree. The standard for changing behavior in > an existing release is fairly high. It seems like we have consensus to change CheckRecoveryConflictDetected() to return ERRCODE_T_R_DEADLOCK_DETECTED in 9.1, but not on whether to also change that in 9.0 (votes: Robert - for, Simon - against) and arguably not on whether to change the case that returns ERROR_ADMIN_SHUTDOWN for a recovery conflict (votes: Robert, Tatsuo-san - for, Simon - against). Anyone else want to weigh in? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert, > On Fri, Jan 21, 2011 at 8:28 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Fri, Jan 21, 2011 at 7:48 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >>> Ah, thanks Florian. Now I understand. There are two related issues here. >>> >>> 1. The discussion around ERRCODE_ADMIN_SHUTDOWN is incorrect and the >>> specific patch should be rejected as is. No changes are required in >>> ProcessInterrupts(), nor new errcodes. >> >> Can you please justify that statement instead of simply asserting it? >> Tatsuo-san and I both seem to agree that it looks wrong. >> ERRCODE_ADMIN_SHUTDOWN is in class 57, operator intervention, and it's >> used elsewhere when a SIGTERM is received and the database is shutting >> down. That's a world away from what's actually happening here. >> Wanting to have a different error code for this type of failure may >> make sense, but that doesn't mean that this is the right one. >> >>> 2. Robert is correct that CheckRecoveryConflictDeadlock() returns >>> ERRCODE_QUERY_CANCELED. Thanks to Florian for noting that we had >>> switched away from the original discussion onto another part of the >>> code, which confused me. I agree the use of ERRCODE_QUERY_CANCELED is a >>> mistake; CheckRecoveryConflictDeadlock() should return >>> ERRCODE_T_R_DEADLOCK_DETECTED. This was an omission from my commit of 12 >>> May 2010. >> >> This part sounds good. >> >>> This should be backpatched to 9.0. >> >> Hmm, I don't necessarily agree. The standard for changing behavior in >> an existing release is fairly high. > > It seems like we have consensus to change > CheckRecoveryConflictDetected() to return > ERRCODE_T_R_DEADLOCK_DETECTED in 9.1, but not on whether to also > change that in 9.0 (votes: Robert - for, Simon - against) Please note that I'm with you. > and arguably > not on whether to change the case that returns ERROR_ADMIN_SHUTDOWN > for a recovery conflict (votes: Robert, Tatsuo-san - for, Simon - > against). > > Anyone else want to weigh in? > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company
Tatsuo Ishii <ishii@postgresql.org> writes: > On Fri, Jan 21, 2011 at 8:28 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> It seems like we have consensus to change >> CheckRecoveryConflictDetected() to return >> ERRCODE_T_R_DEADLOCK_DETECTED in 9.1, but not on whether to also >> change that in 9.0 (votes: Robert - for, Simon - against) > Please note that I'm with you. I'm inclined to agree that 9.0 is new enough that changing this shouldn't be too traumatic. If we leave it till 9.1 there will probably be more pain not less. regards, tom lane
On Sun, Jan 30, 2011 at 8:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Tatsuo Ishii <ishii@postgresql.org> writes: >> On Fri, Jan 21, 2011 at 8:28 AM, Robert Haas <robertmhaas@gmail.com> wrote: >>> It seems like we have consensus to change >>> CheckRecoveryConflictDetected() to return >>> ERRCODE_T_R_DEADLOCK_DETECTED in 9.1, but not on whether to also >>> change that in 9.0 (votes: Robert - for, Simon - against) > >> Please note that I'm with you. > > I'm inclined to agree that 9.0 is new enough that changing this > shouldn't be too traumatic. If we leave it till 9.1 there will > probably be more pain not less. I have done a poor job of summarizing the previous votes. Simon was arguing FOR back-patching, and I was arguing AGAINST it. But if you and Tatsuo-san are both in favor of it, then perhaps we should do it. However, that might require people to write application code which has to be aware of which minor version it's talking to, which is usually something we try hard to avoid. Any opinion on what to do about the one that's returning ERRCODE_ADMIN_SHUTDOWN? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Sun, Jan 30, 2011 at 8:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I'm inclined to agree that 9.0 is new enough that changing this >> shouldn't be too traumatic. �If we leave it till 9.1 there will >> probably be more pain not less. > But if you and Tatsuo-san are both in favor of it, then perhaps we > should do it. However, that might require people to write application > code which has to be aware of which minor version it's talking to, > which is usually something we try hard to avoid. Well, the thing is that if we delay changing this until 9.1, then pretty much every app that cares about this will have to be prepared to deal with both possibilities, and will thus have to contain some version-sensitive code, and will have to keep dealing with that for the support lifetime of 9.0.x. My thought is that if it changes in 9.0.4, client authors will probably be able to say "we support >= 9.0.4" and just deal with one behavior. 9.0 hasn't yet had so much uptake that this is an infeasible position, especially given the rate at which we're still finding serious bugs in it. 9.0.early isn't going to be getting run by the sorts of DBAs who would care about apps that care about this. IOW, I think we can still consider changing error codes that are new in 9.0.x as a bug fix. > Any opinion on what to do about the one that's returning ERRCODE_ADMIN_SHUTDOWN? Pretty much the same argument here, I think: if we are going to change the SQLSTATE we should do it now not later. However, I think Simon was actually arguing to not change this one either now or later, and that might also be a defensible position. BTW, so far as this goes: http://archives.postgresql.org/pgsql-hackers/2011-01/msg01152.php we should certainly *not* have the same text for two different SQLSTATEs. If it's worth distinguishing two cases then it's worth providing different texts that make it clear what the cases are. regards, tom lane
On Sun, Jan 30, 2011 at 8:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Any opinion on what to do about the one that's returning ERRCODE_ADMIN_SHUTDOWN? > > Pretty much the same argument here, I think: if we are going to change > the SQLSTATE we should do it now not later. However, I think Simon > was actually arguing to not change this one either now or later, and > that might also be a defensible position. > > BTW, so far as this goes: > http://archives.postgresql.org/pgsql-hackers/2011-01/msg01152.php > we should certainly *not* have the same text for two different > SQLSTATEs. If it's worth distinguishing two cases then it's worth > providing different texts that make it clear what the cases are. Well, then we either need to change the error codes to be the same, or the texts to be different. The only case in which we emit ERRCODE_ADMIN_SHUTDOWN is when the database gets dropped out from underneath the HS backend. I don't think it's worth having a separate path just to handle that case; if the user retries the operation it should quickly become clear that the database is gone. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Sun, Jan 30, 2011 at 8:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> BTW, so far as this goes: >> http://archives.postgresql.org/pgsql-hackers/2011-01/msg01152.php >> we should certainly *not* have the same text for two different >> SQLSTATEs. �If it's worth distinguishing two cases then it's worth >> providing different texts that make it clear what the cases are. > Well, then we either need to change the error codes to be the same, or > the texts to be different. > The only case in which we emit ERRCODE_ADMIN_SHUTDOWN is when the > database gets dropped out from underneath the HS backend. I don't > think it's worth having a separate path just to handle that case; if > the user retries the operation it should quickly become clear that the > database is gone. Somone, Tatsuo-san IIRC, was saying that pgpool would really like to know the difference so that it doesn't have to retry at all. I'm not sure how useful that really is, but that's the argument for having a distinct SQLSTATE. If we do believe that that's useful, I think it should be a unique new SQLSTATE that never means anything other than "your database got deleted from under you". Piggybacking that meaning on an existing SQLSTATE definitely seems completely bogus to me. regards, tom lane
On Sun, Jan 30, 2011 at 9:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Sun, Jan 30, 2011 at 8:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> BTW, so far as this goes: >>> http://archives.postgresql.org/pgsql-hackers/2011-01/msg01152.php >>> we should certainly *not* have the same text for two different >>> SQLSTATEs. If it's worth distinguishing two cases then it's worth >>> providing different texts that make it clear what the cases are. > >> Well, then we either need to change the error codes to be the same, or >> the texts to be different. > >> The only case in which we emit ERRCODE_ADMIN_SHUTDOWN is when the >> database gets dropped out from underneath the HS backend. I don't >> think it's worth having a separate path just to handle that case; if >> the user retries the operation it should quickly become clear that the >> database is gone. > > Somone, Tatsuo-san IIRC, was saying that pgpool would really like to > know the difference so that it doesn't have to retry at all. I'm not > sure how useful that really is, but that's the argument for having a > distinct SQLSTATE. If we do believe that that's useful, I think it > should be a unique new SQLSTATE that never means anything other than > "your database got deleted from under you". Piggybacking that meaning > on an existing SQLSTATE definitely seems completely bogus to me. Actually, it was Simon and Florian who were arguing that we needed to distinguish these cases from other types of recovery conflict; Tatsuo-san was arguing that we needed to distinguish a dropped-database-recovery-conflict from a cluster shutdown - the current choice of ERRCODE_ADMIN_SHUTDOWN makes that confusing. ISTM we can invent zero, one, or two new error codes here. If we invent zero, then we change all recovery conflicts to look like serialization failures and call it good. If we invent one, then we make retryable recovery conflicts look like serialization failures and the dropped-database case gets a newly minted error code that means just that. Or we can invent two, and make serialization failures different from recovery conflicts, and retryable recovery conflicts different from the dropped-database variety. I don't have a terribly strong opinion as between those options. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> Actually, it was Simon and Florian who were arguing that we needed to > distinguish these cases from other types of recovery conflict; > Tatsuo-san was arguing that we needed to distinguish a > dropped-database-recovery-conflict from a cluster shutdown - the > current choice of ERRCODE_ADMIN_SHUTDOWN makes that confusing. > > ISTM we can invent zero, one, or two new error codes here. If we > invent zero, then we change all recovery conflicts to look like > serialization failures and call it good. If we invent one, then we > make retryable recovery conflicts look like serialization failures and > the dropped-database case gets a newly minted error code that means > just that. Or we can invent two, and make serialization failures > different from recovery conflicts, and retryable recovery conflicts > different from the dropped-database variety. > > I don't have a terribly strong opinion as between those options. As a novice I am not sure why we _wouldn't_ create two new separate error codes --- it not not like they cost us anything, and they certainly sound distinct. The requirement to retry is clearly something we want to avoid if we get a new error code. Backpatching to 9.0 makes sense too, though the problem is the delay in getting the code into a released minor version. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: Error code for "terminating connection due to conflict with recovery"
From
"Kevin Grittner"
Date:
Bruce Momjian <bruce@momjian.us> wrote: > As a novice I am not sure why we _wouldn't_ create two new > separate error codes The argument for using SQLSTATE 40001 for failures which are strictly due to concurrency problems, and are likely to work if the transaction is retried, is that there is already a lot of software which knows how to do that. On the other hand, going into such code to turn that into a list of concurrency failure states is probably only going to cause pain to those with applications intended to work with multiple DBMS products without much modification. -Kevin
On Mon, 2011-01-31 at 09:46 -0500, Bruce Momjian wrote: > > Actually, it was Simon and Florian who were arguing that we needed to > > distinguish these cases from other types of recovery conflict; > > Tatsuo-san was arguing that we needed to distinguish a > > dropped-database-recovery-conflict from a cluster shutdown - the > > current choice of ERRCODE_ADMIN_SHUTDOWN makes that confusing. > > > > ISTM we can invent zero, one, or two new error codes here. If we > > invent zero, then we change all recovery conflicts to look like > > serialization failures and call it good. If we invent one, then we > > make retryable recovery conflicts look like serialization failures and > > the dropped-database case gets a newly minted error code that means > > just that. Or we can invent two, and make serialization failures > > different from recovery conflicts, and retryable recovery conflicts > > different from the dropped-database variety. > > > > I don't have a terribly strong opinion as between those options. > > As a novice I am not sure why we _wouldn't_ create two new separate > error codes --- it not not like they cost us anything, and they > certainly sound distinct. The requirement to retry is clearly something > we want to avoid if we get a new error code. It's the way it was because of discussion during 9.0. The errors for "serialization error" and "deadlock" are appropriate because everybody knows these exist. If you invent a new error code then you'll need to re-program loads of applications which would all "just work" if we use those error codes. Kevin specifically requested it for that reason. That leaves the error code for the drop database case. As both Tatsuo and myself have said, it needs to be different so we do not retry it. I don't personally think an edge case like that needs a whole new error code to explain it. > Backpatching to 9.0 makes sense too, though the problem is the delay in > getting the code into a released minor version. I treat this as a bugfix, so backpatch is appropriate. My earlier "fix" in 9.0 beta didn't cover the case pointed out by Robert/Florian and as Tatsuo points out, we need to have a retryable error for all common cases. -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
Kevin Grittner wrote: > Bruce Momjian <bruce@momjian.us> wrote: > > > As a novice I am not sure why we _wouldn't_ create two new > > separate error codes > > The argument for using SQLSTATE 40001 for failures which are > strictly due to concurrency problems, and are likely to work if the > transaction is retried, is that there is already a lot of software > which knows how to do that. On the other hand, going into such code > to turn that into a list of concurrency failure states is probably > only going to cause pain to those with applications intended to work > with multiple DBMS products without much modification. The way they usually handle that is by having a class of codes that designate that behavior, but I can see now that the number can't be subdivided. :-( -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On Mon, Jan 31, 2011 at 10:31 AM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > Bruce Momjian <bruce@momjian.us> wrote: >> As a novice I am not sure why we _wouldn't_ create two new >> separate error codes > > The argument for using SQLSTATE 40001 for failures which are > strictly due to concurrency problems, and are likely to work if the > transaction is retried, is that there is already a lot of software > which knows how to do that. On the other hand, going into such code > to turn that into a list of concurrency failure states is probably > only going to cause pain to those with applications intended to work > with multiple DBMS products without much modification. Yeah, I think that one's pretty logical. I think my vote is to either change the drop-database case to be the same as that, or to use a new error code. ERRCODE_ADMIN_SHUTDOWN is just strange. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, 2011-01-31 at 11:24 -0500, Robert Haas wrote: > On Mon, Jan 31, 2011 at 10:31 AM, Kevin Grittner > <Kevin.Grittner@wicourts.gov> wrote: > > Bruce Momjian <bruce@momjian.us> wrote: > >> As a novice I am not sure why we _wouldn't_ create two new > >> separate error codes > > > > The argument for using SQLSTATE 40001 for failures which are > > strictly due to concurrency problems, and are likely to work if the > > transaction is retried, is that there is already a lot of software > > which knows how to do that. On the other hand, going into such code > > to turn that into a list of concurrency failure states is probably > > only going to cause pain to those with applications intended to work > > with multiple DBMS products without much modification. > > Yeah, I think that one's pretty logical. > I think my vote is to either > change the drop-database case to be the same as that No, we shouldn't have a "can retry" errcode for a "must not retry" case. > , or to use a new > error code. ERRCODE_ADMIN_SHUTDOWN is just strange. It's not strange at all. It's the same error code as we use for all of the other cases listed. We need that because it is the current catch-all errcode for "cannot retry". The purpose of errcodes is to allow programs to check them and then act. It's pointless to add a new errcode that is so rare that nobody will ever program for it because they won't expect it, let alone test for it. Or at least won't assign any sensible priority to handling that error. -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
Simon Riggs <simon@2ndQuadrant.com> writes: > On Mon, 2011-01-31 at 11:24 -0500, Robert Haas wrote: >> , or to use a new >> error code. ERRCODE_ADMIN_SHUTDOWN is just strange. > It's not strange at all. It's the same error code as we use for all of > the other cases listed. We need that because it is the current > catch-all errcode for "cannot retry". > The purpose of errcodes is to allow programs to check them and then act. > It's pointless to add a new errcode that is so rare that nobody will > ever program for it because they won't expect it, let alone test for it. > Or at least won't assign any sensible priority to handling that error. The trouble with ERRCODE_ADMIN_SHUTDOWN is that it might lead a connection pooler to expect that *all* its connections are going bad, not just the ones that are connected to a specific database. I think this is a bad decision. Programs that are interested in testing for this case at all are likely to need to be worried about that distinction. Also, while I believe that ERRCODE_T_R_DEADLOCK_DETECTED is a reasonable catchall retry code, I don't think it's equally sane to think that ERRCODE_ADMIN_SHUTDOWN is a catchall non-retry code. regards, tom lane
On Mon, 2011-01-31 at 14:58 -0500, Tom Lane wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: > > On Mon, 2011-01-31 at 11:24 -0500, Robert Haas wrote: > >> , or to use a new > >> error code. ERRCODE_ADMIN_SHUTDOWN is just strange. > > > It's not strange at all. It's the same error code as we use for all of > > the other cases listed. We need that because it is the current > > catch-all errcode for "cannot retry". > > > The purpose of errcodes is to allow programs to check them and then act. > > It's pointless to add a new errcode that is so rare that nobody will > > ever program for it because they won't expect it, let alone test for it. > > Or at least won't assign any sensible priority to handling that error. > > The trouble with ERRCODE_ADMIN_SHUTDOWN is that it might lead a > connection pooler to expect that *all* its connections are going bad, > not just the ones that are connected to a specific database. I think > this is a bad decision. Programs that are interested in testing for this > case at all are likely to need to be worried about that distinction. That's a reasonable argument. My objection to a new code is only to one that is so specific that people have to program for ERRCODE_BLUE_MOON_ON_A_LEAP_YEAR. Can we invent a new "catch-all" that might be used here? Something that means "unknown operational error, not sure what to do". ERRCODE_ADMIN_OTHER or ERRCODE_ADMIN_UNCLASSIFIED. > Also, while I believe that ERRCODE_T_R_DEADLOCK_DETECTED is a reasonable > catchall retry code, I don't think it's equally sane to think that > ERRCODE_ADMIN_SHUTDOWN is a catchall non-retry code. -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
Simon Riggs <simon@2ndQuadrant.com> writes: > On Mon, 2011-01-31 at 14:58 -0500, Tom Lane wrote: >> The trouble with ERRCODE_ADMIN_SHUTDOWN is that it might lead a >> connection pooler to expect that *all* its connections are going bad, >> not just the ones that are connected to a specific database. I think >> this is a bad decision. Programs that are interested in testing for this >> case at all are likely to need to be worried about that distinction. > That's a reasonable argument. > My objection to a new code is only to one that is so specific that > people have to program for ERRCODE_BLUE_MOON_ON_A_LEAP_YEAR. What's wrong with ERRCODE_DATABASE_DROPPED, or something like that? > Can we invent a new "catch-all" that might be used here? Something that > means "unknown operational error, not sure what to do". Because that's not the situation here. We know exactly what a pooler should do. It might be an infrequent case, but obscurantism isn't going to help anyone. regards, tom lane
On 1/31/11 11:26 AM, Simon Riggs wrote: > The purpose of errcodes is to allow programs to check them and then act. > It's pointless to add a new errcode that is so rare that nobody will > ever program for it because they won't expect it, let alone test for it. > Or at least won't assign any sensible priority to handling that error. Personally, I would prefer a new error code because I *intend* to write specific code to deal with replication cancellation. Without a specific code, then I cannot distinguish replication cancel from whatever else it's lumped in with, except by regexping the error message. If we're not going to get a new code, then I'd prefer DEADLOCK_DETECTED because the behavior we expect from the client (try the whole transaction over from the beginning) is the same. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
On Mon, 2011-01-31 at 16:24 -0500, Tom Lane wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: > > On Mon, 2011-01-31 at 14:58 -0500, Tom Lane wrote: > >> The trouble with ERRCODE_ADMIN_SHUTDOWN is that it might lead a > >> connection pooler to expect that *all* its connections are going bad, > >> not just the ones that are connected to a specific database. I think > >> this is a bad decision. Programs that are interested in testing for this > >> case at all are likely to need to be worried about that distinction. > > > That's a reasonable argument. > > > My objection to a new code is only to one that is so specific that > > people have to program for ERRCODE_BLUE_MOON_ON_A_LEAP_YEAR. > > What's wrong with ERRCODE_DATABASE_DROPPED, or something like that? > > > Can we invent a new "catch-all" that might be used here? Something that > > means "unknown operational error, not sure what to do". > > Because that's not the situation here. We know exactly what a pooler > should do. It might be an infrequent case, but obscurantism isn't going > to help anyone. OK, that makes sense to me. I would make ERRCODE_DATABASE_DROPPED an Invalid Authorization error, rather than a Transaction Rollback code. So sqlstate 28P02 The sensible handling of such an error is not to retry, or at least to switch to an alternate if one is available. Ready to commit if no objection. -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
On Mon, Jan 31, 2011 at 6:07 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Mon, 2011-01-31 at 16:24 -0500, Tom Lane wrote: >> Simon Riggs <simon@2ndQuadrant.com> writes: >> > On Mon, 2011-01-31 at 14:58 -0500, Tom Lane wrote: >> >> The trouble with ERRCODE_ADMIN_SHUTDOWN is that it might lead a >> >> connection pooler to expect that *all* its connections are going bad, >> >> not just the ones that are connected to a specific database. I think >> >> this is a bad decision. Programs that are interested in testing for this >> >> case at all are likely to need to be worried about that distinction. >> >> > That's a reasonable argument. >> >> > My objection to a new code is only to one that is so specific that >> > people have to program for ERRCODE_BLUE_MOON_ON_A_LEAP_YEAR. >> >> What's wrong with ERRCODE_DATABASE_DROPPED, or something like that? >> >> > Can we invent a new "catch-all" that might be used here? Something that >> > means "unknown operational error, not sure what to do". >> >> Because that's not the situation here. We know exactly what a pooler >> should do. It might be an infrequent case, but obscurantism isn't going >> to help anyone. > > OK, that makes sense to me. > > I would make ERRCODE_DATABASE_DROPPED an Invalid Authorization error, > rather than a Transaction Rollback code. So sqlstate 28P02 > > The sensible handling of such an error is not to retry, or at least to > switch to an alternate if one is available. > > Ready to commit if no objection. ISTM it should still be in class 40. There's nothing wrong with the user's authorization; we've just decided to roll back the transaction for our own purposes. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, 2011-01-31 at 18:21 -0500, Robert Haas wrote: > > Ready to commit if no objection. > > ISTM it should still be in class 40. There's nothing wrong with the > user's authorization; we've just decided to roll back the transaction > for our own purposes. OK. BTW, anybody know why we have PL/pgSQL condition codes for conditions that can't be trapped by PL/pgSQL? ERRCODE_ADMIN_SHUTDOWN and ERRCODE_DATABASE_DROPPED are always FATAL. Seems like pointless code to me. -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Jan 31, 2011 at 6:07 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >> I would make ERRCODE_DATABASE_DROPPED an Invalid Authorization error, >> rather than a Transaction Rollback code. So sqlstate 28P02 > ISTM it should still be in class 40. There's nothing wrong with the > user's authorization; we've just decided to roll back the transaction > for our own purposes. I agree, 28 is a completely off-point category. But it wasn't in 40 before, either --- we are talking about where it currently says ADMIN_SHUTDOWN, no? I'd vote for keeping it in class 57 (operator intervention), as that is both sensible and a minimal change from current behavior. regards, tom lane
> BTW, anybody know why we have PL/pgSQL condition codes for conditions > that can't be trapped by PL/pgSQL? ERRCODE_ADMIN_SHUTDOWN and > ERRCODE_DATABASE_DROPPED are always FATAL. Seems like pointless code to > me. So we can support autonomous transactions in the future? -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
On Mon, Jan 31, 2011 at 7:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Mon, Jan 31, 2011 at 6:07 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >>> I would make ERRCODE_DATABASE_DROPPED an Invalid Authorization error, >>> rather than a Transaction Rollback code. So sqlstate 28P02 > >> ISTM it should still be in class 40. There's nothing wrong with the >> user's authorization; we've just decided to roll back the transaction >> for our own purposes. > > I agree, 28 is a completely off-point category. But it wasn't in 40 > before, either --- we are talking about where it currently says > ADMIN_SHUTDOWN, no? I'd vote for keeping it in class 57 (operator > intervention), as that is both sensible and a minimal change from > current behavior. Seems a little weird to me, since the administrator hasn't done anything. It's the system that has decide to roll the transaction back, not the operator. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Jan 31, 2011 at 7:13 PM, Josh Berkus <josh@agliodbs.com> wrote: >> BTW, anybody know why we have PL/pgSQL condition codes for conditions >> that can't be trapped by PL/pgSQL? ERRCODE_ADMIN_SHUTDOWN and >> ERRCODE_DATABASE_DROPPED are always FATAL. Seems like pointless code to >> me. > > So we can support autonomous transactions in the future? Huh? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Jan 31, 2011 at 7:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I agree, 28 is a completely off-point category. �But it wasn't in 40 >> before, either --- we are talking about where it currently says >> ADMIN_SHUTDOWN, no? �I'd vote for keeping it in class 57 (operator >> intervention), as that is both sensible and a minimal change from >> current behavior. > Seems a little weird to me, since the administrator hasn't done > anything. Sure he has: he issued the DROP DATABASE command that's causing the system to disconnect standby sessions. regards, tom lane
On Mon, 2011-01-31 at 19:12 -0500, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > On Mon, Jan 31, 2011 at 6:07 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > >> I would make ERRCODE_DATABASE_DROPPED an Invalid Authorization error, > >> rather than a Transaction Rollback code. So sqlstate 28P02 > > > ISTM it should still be in class 40. There's nothing wrong with the > > user's authorization; we've just decided to roll back the transaction > > for our own purposes. > > I agree, 28 is a completely off-point category. But it wasn't in 40 > before, either --- we are talking about where it currently says > ADMIN_SHUTDOWN, no? I'd vote for keeping it in class 57 (operator > intervention), as that is both sensible and a minimal change from > current behavior. Sorry about that chaps, didn't see your emails: I just committed. 57 was another one I considered, as was 08 connection failures. I don't think 40 was the right place, but my bed was calling. I'll sleep now and fix in my morning according to what y'all decide. -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
Simon Riggs <simon@2ndQuadrant.com> writes: > BTW, anybody know why we have PL/pgSQL condition codes for conditions > that can't be trapped by PL/pgSQL? ERRCODE_ADMIN_SHUTDOWN and > ERRCODE_DATABASE_DROPPED are always FATAL. Seems like pointless code to > me. There's a difference between not being able to trap the error and not being able to name it at all. You might wish to throw it, for instance. regards, tom lane
On Mon, Jan 31, 2011 at 7:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Mon, Jan 31, 2011 at 7:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> I agree, 28 is a completely off-point category. But it wasn't in 40 >>> before, either --- we are talking about where it currently says >>> ADMIN_SHUTDOWN, no? I'd vote for keeping it in class 57 (operator >>> intervention), as that is both sensible and a minimal change from >>> current behavior. > >> Seems a little weird to me, since the administrator hasn't done >> anything. > > Sure he has: he issued the DROP DATABASE command that's causing the > system to disconnect standby sessions. Well, I'm not sure how much this matters - as long as it's a dedicated error code, the user can write code to DTRT somehow. But I don't buy your argument. Ultimately, user activity causes any kind of recovery conflict. So I don't see why one particular kind of recovery conflict should be in a different class than all the others. It's the administrator (I guess) who ran VACUUM FREEZE and blew up every query running on the standby, too. But the user doesn't directly control when recovery conflicts get fired on the standby - it's the system that decides to do that. Right now, we happen to ignore max_standby_delay for drop database, but in general the coupling between when an action happens on the master and when conflicts occur on the standby is fairly loose. Then again - in theory, there's no reason why we couldn't drop a database on the master when it's in use, kicking out everyone using it with this very same error code. We don't happen to handle it that way right now, but... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Jan 31, 2011 at 7:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Robert Haas <robertmhaas@gmail.com> writes: >>> Seems a little weird to me, since the administrator hasn't done >>> anything. >> Sure he has: he issued the DROP DATABASE command that's causing the >> system to disconnect standby sessions. > Well, I'm not sure how much this matters - as long as it's a dedicated > error code, the user can write code to DTRT somehow. But I don't buy > your argument. Ultimately, user activity causes any kind of recovery > conflict. Well, yeah, but the predictability of the failure is pretty variable. In this case we can say that the error definitely would not have occurred if somebody hadn't done a DROP DATABASE on the master while there were live sessions in that DB on the slave. I think that's a sufficiently close coupling to say that the error is the result of an operator action. OTOH, the occurrence of deadlocks is (usually) a lot more dependent on random-chance timing of different transactions, and you usually can't point to any action that intentionally caused a deadlock. > Then again - in theory, there's no reason why we couldn't drop a > database on the master when it's in use, kicking out everyone using it > with this very same error code. We don't happen to handle it that way > right now, but... Yeah, that was in the back of my mind too. "DROP DATABASE foo FORCE", maybe? regards, tom lane
On Mon, Jan 31, 2011 at 8:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Then again - in theory, there's no reason why we couldn't drop a >> database on the master when it's in use, kicking out everyone using it >> with this very same error code. We don't happen to handle it that way >> right now, but... > > Yeah, that was in the back of my mind too. "DROP DATABASE foo FORCE", > maybe? I have to think some people would find that useful. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Error code for "terminating connection due to conflict with recovery"
From
Magnus Hagander
Date:
On Tue, Feb 1, 2011 at 03:29, Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Jan 31, 2011 at 8:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Then again - in theory, there's no reason why we couldn't drop a >>> database on the master when it's in use, kicking out everyone using it >>> with this very same error code. We don't happen to handle it that way >>> right now, but... >> >> Yeah, that was in the back of my mind too. "DROP DATABASE foo FORCE", >> maybe? > > I have to think some people would find that useful. Yes. If nothing else, it would save some typing :-) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
On Mon, 2011-01-31 at 20:27 -0500, Robert Haas wrote: > So I don't see why one particular kind of recovery conflict > should be in a different class than all the others. This point has been explained many times and is very clear in the code. It has a clear functional purpose, not decoration or mere personal opinion. -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
On Mon, 2011-01-31 at 20:52 -0500, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > On Mon, Jan 31, 2011 at 7:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> Robert Haas <robertmhaas@gmail.com> writes: > >>> Seems a little weird to me, since the administrator hasn't done > >>> anything. > > >> Sure he has: he issued the DROP DATABASE command that's causing the > >> system to disconnect standby sessions. > > > Well, I'm not sure how much this matters - as long as it's a dedicated > > error code, the user can write code to DTRT somehow. But I don't buy > > your argument. Ultimately, user activity causes any kind of recovery > > conflict. > > Well, yeah, but the predictability of the failure is pretty variable. > In this case we can say that the error definitely would not have > occurred if somebody hadn't done a DROP DATABASE on the master while > there were live sessions in that DB on the slave. I think that's a > sufficiently close coupling to say that the error is the result of an > operator action. OTOH, the occurrence of deadlocks is (usually) a lot > more dependent on random-chance timing of different transactions, and > you usually can't point to any action that intentionally caused a > deadlock. ERRCODE_DATABASE_DROPPED 57P04 looks best The previous code was 57P01 so this is least change, if nothing else. -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
On Tue, 2011-02-01 at 07:35 +0100, Magnus Hagander wrote: > On Tue, Feb 1, 2011 at 03:29, Robert Haas <robertmhaas@gmail.com> wrote: > > On Mon, Jan 31, 2011 at 8:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >>> Then again - in theory, there's no reason why we couldn't drop a > >>> database on the master when it's in use, kicking out everyone using it > >>> with this very same error code. We don't happen to handle it that way > >>> right now, but... > >> > >> Yeah, that was in the back of my mind too. "DROP DATABASE foo FORCE", > >> maybe? > > > > I have to think some people would find that useful. > > Yes. > > If nothing else, it would save some typing :-) I like it also. It allows me to get rid of the concept of "non-retryable recovery conflicts", so we just have one code path that works in both normal mode and standby. Sweet. Here's the basic patch, will work on the refactoring if no objections. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
Attachment
Simon Riggs <simon@2ndQuadrant.com> writes: > Here's the basic patch, will work on the refactoring if no objections. ResolveRecoveryConflictWithDatabase works when you're not in recovery? That seems pretty fragile at best. In any case, this is a 9.2 feature at the earliest, please do not expect people to spend time on it now. regards, tom lane
On Tue, 2011-02-01 at 11:43 -0500, Tom Lane wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: > > Here's the basic patch, will work on the refactoring if no objections. > > ResolveRecoveryConflictWithDatabase works when you're not in recovery? > That seems pretty fragile at best. In any case, this is a 9.2 feature > at the earliest, please do not expect people to spend time on it now. Err, no. Sorry if that wasn't clear. The patch works but requires refactoring, which I referred to above. What is now called ResolveRecoveryConflictWithDatabase() would be changed to a name that makes sense in both normal running and recovery. That seemed so obvious that I wouldn't be leaving it like that, so I just referred to that indirectly, which was confusing. That allows me to refactor out the code that we just spent too long discussing, so I was hoping to do that in this release. I wasn't expecting anybody else to spend time on it since it is simple and clear. I'll add it to 9.2. -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
On Tue, Feb 1, 2011 at 3:17 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > ERRCODE_DATABASE_DROPPED 57P04 looks best So I guess the only remaining issue is whether we should distinguish the error message text, as well as the error codes. Tom was in favor of that upthread, and I am too. Right now we have: else if (RecoveryConflictPending && RecoveryConflictRetryable) { pgstat_report_recovery_conflict(RecoveryConflictReason); ereport(FATAL, (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE), errmsg("terminating connection due to conflict with recovery"), errdetail_recovery_conflict())); } else if (RecoveryConflictPending) { /* Currently there is only one non-retryable recovery conflict */ Assert(RecoveryConflictReason == PROCSIG_RECOVERY_CONFLICT_DATABASE); pgstat_report_recovery_conflict(RecoveryConflictReason); ereport(FATAL, (errcode(ERRCODE_DATABASE_DROPPED), errmsg("terminating connection due to conflict with recovery"), errdetail_recovery_conflict())); } The simplest thing to do seems to be to make the second one read "terminating connection because the database has been dropped". -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, 2011-02-02 at 21:21 -0500, Robert Haas wrote: > On Tue, Feb 1, 2011 at 3:17 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > > ERRCODE_DATABASE_DROPPED 57P04 looks best > > So I guess the only remaining issue is whether we should distinguish > the error message text, as well as the error codes. Tom was in favor > of that upthread, and I am too. Right now we have: > > else if (RecoveryConflictPending && RecoveryConflictRetryable) > { > pgstat_report_recovery_conflict(RecoveryConflictReason); > ereport(FATAL, > > (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE), > errmsg("terminating connection due to > conflict with recovery"), > errdetail_recovery_conflict())); > } > else if (RecoveryConflictPending) > { > /* Currently there is only one non-retryable > recovery conflict */ > Assert(RecoveryConflictReason == > PROCSIG_RECOVERY_CONFLICT_DATABASE); > pgstat_report_recovery_conflict(RecoveryConflictReason); > ereport(FATAL, > (errcode(ERRCODE_DATABASE_DROPPED), > errmsg("terminating connection due to > conflict with recovery"), > errdetail_recovery_conflict())); > } > > The simplest thing to do seems to be to make the second one read > "terminating connection because the database has been dropped". The error text is already differentiated by errdetail_recovery_conflict(). -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
On Mon, Feb 7, 2011 at 4:11 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >> So I guess the only remaining issue is whether we should distinguish >> the error message text, as well as the error codes. Tom was in favor >> of that upthread, and I am too. Right now we have: > The error text is already differentiated by > errdetail_recovery_conflict(). OK. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company