Thread: Error code for "terminating connection due to conflict with recovery"

Error code for "terminating connection due to conflict with recovery"

From

Tatsuo Ishii

Date:

08 January 2011, 10:52:10

While looking at the backend code, I realized that error code for
"terminating connection due to conflict with recovery" is
ERRCODE_ADMIN_SHUTDOWN.

I thought the error code is for somewhat a human interruption, such as
shutdown command issued by pg_ctl. Is the usage of the error code
appropreate?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

Re: Error code for "terminating connection due to conflict with recovery"

From

Robert Haas

Date:

08 January 2011, 23:39:41

On Sat, Jan 8, 2011 at 9:52 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
> While looking at the backend code, I realized that error code for
> "terminating connection due to conflict with recovery" is
> ERRCODE_ADMIN_SHUTDOWN.
>
> I thought the error code is for somewhat a human interruption, such as
> shutdown command issued by pg_ctl. Is the usage of the error code
> appropreate?

That doesn't sound right to me.  I'd have thought something in class 40.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Error code for "terminating connection due to conflict with recovery"

From

Tatsuo Ishii

Date:

11 January 2011, 02:29:52

> On Sat, Jan 8, 2011 at 9:52 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
>> While looking at the backend code, I realized that error code for
>> "terminating connection due to conflict with recovery" is
>> ERRCODE_ADMIN_SHUTDOWN.
>>
>> I thought the error code is for somewhat a human interruption, such as
>> shutdown command issued by pg_ctl. Is the usage of the error code
>> appropreate?
> 
> That doesn't sound right to me.  I'd have thought something in class 40.

What about:

40004 CONFLICT WITH RECOVERY conflict_with_recovery
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

Re: Error code for "terminating connection due to conflict with recovery"

From

Robert Haas

Date:

11 January 2011, 07:22:11

On Tue, Jan 11, 2011 at 1:30 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
>> On Sat, Jan 8, 2011 at 9:52 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
>>> While looking at the backend code, I realized that error code for
>>> "terminating connection due to conflict with recovery" is
>>> ERRCODE_ADMIN_SHUTDOWN.
>>>
>>> I thought the error code is for somewhat a human interruption, such as
>>> shutdown command issued by pg_ctl. Is the usage of the error code
>>> appropreate?
>>
>> That doesn't sound right to me.  I'd have thought something in class 40.
>
> What about:
>
> 40004 CONFLICT WITH RECOVERY conflict_with_recovery

We should respect the following convention, from errcodes.h:
* The convention is that new error codes defined by PostgreSQL in a* class defined by the standard have a subclass
valuethat begins* with 'P'. In addition, error codes defined by PostgreSQL clients* (such as ecpg) have a class value
thatbegins with 'Y'. 

And don't forget there are three places where the new error code would
need to be added.

Otherwise, +1.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Error code for "terminating connection due to conflict with recovery"

From

Tatsuo Ishii

Date:

13 January 2011, 03:13:48

>>> That doesn't sound right to me.  I'd have thought something in class 40.
>>
>> What about:
>>
>> 40004 CONFLICT WITH RECOVERY conflict_with_recovery
>
> We should respect the following convention, from errcodes.h:
>
>  * The convention is that new error codes defined by PostgreSQL in a
>  * class defined by the standard have a subclass value that begins
>  * with 'P'. In addition, error codes defined by PostgreSQL clients
>  * (such as ecpg) have a class value that begins with 'Y'.
>
> And don't forget there are three places where the new error code would
> need to be added.
>
> Otherwise, +1.

Ok. Here is the patch for this. I use 40P02, instead of 40004.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
*** a/doc/src/sgml/errcodes.sgml
--- b/doc/src/sgml/errcodes.sgml
***************
*** 985,990 ****
--- 985,995 ---- <entry>deadlock_detected</entry> </row> 
+ <row>
+ <entry><literal>40P02</literal></entry>
+ <entry>CONFLICT WITH RECOVERY</entry>
+ <entry>conflict_with_recovery</entry>
+ </row>  <row> <entry spanname="span13"><emphasis role="bold">Class 42 — Syntax Error or Access Rule
Violation</></entry>
*** a/src/backend/tcop/postgres.c
--- b/src/backend/tcop/postgres.c
***************
*** 2909,2915 **** ProcessInterrupts(void)                      errdetail_recovery_conflict()));         else if
(RecoveryConflictPending)            ereport(FATAL,
 
!                     (errcode(ERRCODE_ADMIN_SHUTDOWN),               errmsg("terminating connection due to conflict
withrecovery"),                      errdetail_recovery_conflict()));         else
 
--- 2909,2915 ----                      errdetail_recovery_conflict()));         else if (RecoveryConflictPending)
      ereport(FATAL,
 
!                     (errcode(ERRCODE_CONFLICT_WITH_RECOVERY),               errmsg("terminating connection due to
conflictwith recovery"),                      errdetail_recovery_conflict()));         else
 
*** a/src/include/utils/errcodes.h
--- b/src/include/utils/errcodes.h
***************
*** 243,248 ****
--- 243,249 ---- #define ERRCODE_T_R_SERIALIZATION_FAILURE    MAKE_SQLSTATE('4','0', '0','0','1') #define
ERRCODE_T_R_STATEMENT_COMPLETION_UNKNOWN   MAKE_SQLSTATE('4','0', '0','0','3') #define ERRCODE_T_R_DEADLOCK_DETECTED
   MAKE_SQLSTATE('4','0', 'P','0','1')
 
+ #define ERRCODE_CONFLICT_WITH_RECOVERY    MAKE_SQLSTATE('4','0', 'P','0','2')  /* Class 42 - Syntax Error or Access
RuleViolation */ #define ERRCODE_SYNTAX_ERROR_OR_ACCESS_RULE_VIOLATION        MAKE_SQLSTATE('4','2', '0','0','0')
 
*** a/src/pl/plpgsql/src/plerrcodes.h
--- b/src/pl/plpgsql/src/plerrcodes.h
***************
*** 484,489 ****
--- 484,493 ---- },  {
+     "conflict_with_recovery", ERRCODE_CONFLICT_WITH_RECOVERY
+ },
+ 
+ {     "syntax_error_or_access_rule_violation", ERRCODE_SYNTAX_ERROR_OR_ACCESS_RULE_VIOLATION },

Re: Error code for "terminating connection due to conflict with recovery"

From

Robert Haas

Date:

13 January 2011, 06:45:55

On Thu, Jan 13, 2011 at 2:13 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
> Ok. Here is the patch for this. I use 40P02, instead of 40004.

Please add this to the currently open CommitFest:

https://commitfest.postgresql.org/action/commitfest_view/open

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Error code for "terminating connection due to conflict with recovery"

From

Tatsuo Ishii

Date:

13 January 2011, 20:04:00

> On Thu, Jan 13, 2011 at 2:13 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
>> Ok. Here is the patch for this. I use 40P02, instead of 40004.
> 
> Please add this to the currently open CommitFest:
> 
> https://commitfest.postgresql.org/action/commitfest_view/open

Done. Comments are welcome. Unless there's objection, I will commit it
this weekend.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

Re: Error code for "terminating connection due to conflict with recovery"

From

Tom Lane

Date:

13 January 2011, 20:15:16

Tatsuo Ishii <ishii@postgresql.org> writes:
>> Please add this to the currently open CommitFest:
>> https://commitfest.postgresql.org/action/commitfest_view/open

> Done. Comments are welcome. Unless there's objection, I will commit it
> this weekend.

If you're expecting anyone to actually *review* it during the CF,
that's a bit premature.
        regards, tom lane

Re: Error code for "terminating connection due to conflict with recovery"

From

Tatsuo Ishii

Date:

13 January 2011, 20:30:26

> Tatsuo Ishii <ishii@postgresql.org> writes:
>>> Please add this to the currently open CommitFest:
>>> https://commitfest.postgresql.org/action/commitfest_view/open
> 
>> Done. Comments are welcome. Unless there's objection, I will commit it
>> this weekend.
> 
> If you're expecting anyone to actually *review* it during the CF,
> that's a bit premature.

No problem to wait for longer. I will wait by the end of January for
the present.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

Re: Error code for "terminating connection due to conflict with recovery"

From

Robert Haas

Date:

14 January 2011, 09:20:42

On Thu, Jan 13, 2011 at 7:31 PM, Tatsuo Ishii <ishii@postgresql.org> wrote:
>> Tatsuo Ishii <ishii@postgresql.org> writes:
>>>> Please add this to the currently open CommitFest:
>>>> https://commitfest.postgresql.org/action/commitfest_view/open
>>
>>> Done. Comments are welcome. Unless there's objection, I will commit it
>>> this weekend.
>>
>> If you're expecting anyone to actually *review* it during the CF,
>> that's a bit premature.
>
> No problem to wait for longer. I will wait by the end of January for
> the present.

Review:

The only possible point of concern I see here is the naming of the C
identifier.  Everything else in class 40 uses ERRCODE_T_R_whatever,
with T_R standing for transaction rollback.  It's not obvious to me
that that convention has any real value, but perhaps we ought to
follow it here for the sake of consistency?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Error code for "terminating connection due to conflict with recovery"

From

Tatsuo Ishii

Date:

14 January 2011, 12:02:58

> Review:
> 
> The only possible point of concern I see here is the naming of the C
> identifier.  Everything else in class 40 uses ERRCODE_T_R_whatever,
> with T_R standing for transaction rollback.  It's not obvious to me
> that that convention has any real value, but perhaps we ought to
> follow it here for the sake of consistency?

Yeah. Actually at first I used "T_R" convention. After a few seconds
thought, I realized that "T_R" is not appropreate by the same reason
you feel. Possible other argument might be "Terminating connection
always involves transaction rollback. So using T_R is ok". I'm not
sure this argument is reasonable enough though.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

Re: Error code for "terminating connection due to conflict with recovery"

From

Robert Haas

Date:

14 January 2011, 12:48:27

On Fri, Jan 14, 2011 at 10:53 AM, Tatsuo Ishii <ishii@postgresql.org> wrote:
>> Review:
>>
>> The only possible point of concern I see here is the naming of the C
>> identifier.  Everything else in class 40 uses ERRCODE_T_R_whatever,
>> with T_R standing for transaction rollback.  It's not obvious to me
>> that that convention has any real value, but perhaps we ought to
>> follow it here for the sake of consistency?
>
> Yeah. Actually at first I used "T_R" convention. After a few seconds
> thought, I realized that "T_R" is not appropreate by the same reason
> you feel. Possible other argument might be "Terminating connection
> always involves transaction rollback. So using T_R is ok". I'm not
> sure this argument is reasonable enough though.

Looking at this a bit more carefully, I notice that there are two
cases when a recovery conflict occurs:

- we cancel the currently running statement, or
- we kill the whole connection

Should those use the same error code, or different ones?  This patch
doesn't appear to update all the places where recovery conflicts can
occur, which is probably not ideal.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Error code for "terminating connection due to conflict with recovery"

From

Tom Lane

Date:

14 January 2011, 13:04:14

Tatsuo Ishii <ishii@postgresql.org> writes:
>> Review:
>> The only possible point of concern I see here is the naming of the C
>> identifier.  Everything else in class 40 uses ERRCODE_T_R_whatever,
>> with T_R standing for transaction rollback.  It's not obvious to me
>> that that convention has any real value, but perhaps we ought to
>> follow it here for the sake of consistency?

> Yeah. Actually at first I used "T_R" convention. After a few seconds
> thought, I realized that "T_R" is not appropreate by the same reason
> you feel. Possible other argument might be "Terminating connection
> always involves transaction rollback. So using T_R is ok". I'm not
> sure this argument is reasonable enough though.

This is not only a matter of some macro name or other.  According to the
SQL standard, class 40 itself is defined as "transaction rollback".
If the error condition can't reasonably be regarded as a subcase of
that, you're making a bad choice of SQLSTATE code.
        regards, tom lane

Re: Error code for "terminating connection due to conflict with recovery"

From

Simon Riggs

Date:

14 January 2011, 13:30:01

On Fri, 2011-01-14 at 12:04 -0500, Tom Lane wrote:
> Tatsuo Ishii <ishii@postgresql.org> writes:
> >> Review:
> >> The only possible point of concern I see here is the naming of the C
> >> identifier.  Everything else in class 40 uses ERRCODE_T_R_whatever,
> >> with T_R standing for transaction rollback.  It's not obvious to me
> >> that that convention has any real value, but perhaps we ought to
> >> follow it here for the sake of consistency?
> 
> > Yeah. Actually at first I used "T_R" convention. After a few seconds
> > thought, I realized that "T_R" is not appropreate by the same reason
> > you feel. Possible other argument might be "Terminating connection
> > always involves transaction rollback. So using T_R is ok". I'm not
> > sure this argument is reasonable enough though.
> 
> This is not only a matter of some macro name or other.  According to the
> SQL standard, class 40 itself is defined as "transaction rollback".
> If the error condition can't reasonably be regarded as a subcase of
> that, you're making a bad choice of SQLSTATE code.

This whole thing is confused. No change is appropriate here at all.

We issue ERRCODE_T_R_SERIALIZATION_FAILURE almost all of the time for
recovery conflicts.

We issue ERRCODE_ADMIN_SHUTDOWN only if the conflict is non-retryable,
which occurs if someone drops the database that the user was connected
to when they get kicked off. That code exists specifically to inform the
user that they *cannot* reconnect. So pgpool should not be trying to
trap that error and reconnect.

So the whole basis for the existence of this patch is moot.

-- Simon Riggs           http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services

Re: Error code for "terminating connection due to conflict with recovery"

From

Robert Haas

Date:

14 January 2011, 14:51:09

On Fri, Jan 14, 2011 at 12:29 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> This whole thing is confused. No change is appropriate here at all.
>
> We issue ERRCODE_T_R_SERIALIZATION_FAILURE almost all of the time for
> recovery conflicts.
>
> We issue ERRCODE_ADMIN_SHUTDOWN only if the conflict is non-retryable,
> which occurs if someone drops the database that the user was connected
> to when they get kicked off. That code exists specifically to inform the
> user that they *cannot* reconnect. So pgpool should not be trying to
> trap that error and reconnect.

CheckRecoveryConflictDeadlock() uses ERRCODE_QUERY_CANCELLED.
ProcessInterrupts() sometimes uses ERRCODE_T_R_SERIALIZATION_FAILURE
and sometimes uses ERRCODE_ADMIN_SHUTDOWN.  It seems to me that it
wouldn't be a bad thing to be a bit more consistent, and perhaps to
have dedicated error codes for recovery conflicts.  This bit strikes
me as particularly strange:
               else if (RecoveryConflictPending && RecoveryConflictRetryable)               {
pgstat_report_recovery_conflict(RecoveryConflictReason);                      ereport(FATAL,
 

(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),                         errmsg("terminating connection due to
conflict with recovery"),                                        errdetail_recovery_conflict()));               }
       else if (RecoveryConflictPending)               {
pgstat_report_recovery_conflict(RecoveryConflictReason);                      ereport(FATAL,
          (errcode(ERRCODE_ADMIN_SHUTDOWN),                         errmsg("terminating connection due to
 
conflict with recovery"),                                        errdetail_recovery_conflict()));               }

That's the same error message at the same severity level with two
different SQLSTATEs depending on RecoveryConflictRetryable.  Seems a
bit cryptic.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Error code for "terminating connection due to conflict with recovery"

From

Robert Haas

Date:

21 January 2011, 00:24:49

On Fri, Jan 14, 2011 at 1:51 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jan 14, 2011 at 12:29 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> This whole thing is confused. No change is appropriate here at all.
>>
>> We issue ERRCODE_T_R_SERIALIZATION_FAILURE almost all of the time for
>> recovery conflicts.
>>
>> We issue ERRCODE_ADMIN_SHUTDOWN only if the conflict is non-retryable,
>> which occurs if someone drops the database that the user was connected
>> to when they get kicked off. That code exists specifically to inform the
>> user that they *cannot* reconnect. So pgpool should not be trying to
>> trap that error and reconnect.
>
> CheckRecoveryConflictDeadlock() uses ERRCODE_QUERY_CANCELLED.
> ProcessInterrupts() sometimes uses ERRCODE_T_R_SERIALIZATION_FAILURE
> and sometimes uses ERRCODE_ADMIN_SHUTDOWN.  It seems to me that it
> wouldn't be a bad thing to be a bit more consistent, and perhaps to
> have dedicated error codes for recovery conflicts.  This bit strikes
> me as particularly strange:
>
>                else if (RecoveryConflictPending && RecoveryConflictRetryable)
>                {
>                        pgstat_report_recovery_conflict(RecoveryConflictReason);
>                        ereport(FATAL,
>
> (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
>                          errmsg("terminating connection due to
> conflict with recovery"),
>                                         errdetail_recovery_conflict()));
>                }
>                else if (RecoveryConflictPending)
>                {
>                        pgstat_report_recovery_conflict(RecoveryConflictReason);
>                        ereport(FATAL,
>                                        (errcode(ERRCODE_ADMIN_SHUTDOWN),
>                          errmsg("terminating connection due to
> conflict with recovery"),
>                                         errdetail_recovery_conflict()));
>                }
>
> That's the same error message at the same severity level with two
> different SQLSTATEs depending on RecoveryConflictRetryable.  Seems a
> bit cryptic.

So what we do want to do about this?

I'm pretty well convinced that we should NOT be issuing
ERRCODE_ADMIN_SHUTDOWN for a recovery conflict, but that could be
fixed by a trivial simplification of the code posted above, without
introducing any new error code.

I'd also be in favor of changing the one that uses
ERRCODE_QUERY_CANCELLED to use ERRCODE_T_R_SERIALIZATION_FAILURE, as
the former might be taken to imply active user intervention, and for
consistency.

It's no longer clear to me that we actually need a new error code for
this - using the same one everywhere seems like it might be
sufficient, unless someone wants to make an argument why it isn't.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Error code for "terminating connection due to conflict with recovery"

From

Tatsuo Ishii

Date:

21 January 2011, 00:48:25

> On Fri, Jan 14, 2011 at 1:51 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, Jan 14, 2011 at 12:29 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> This whole thing is confused. No change is appropriate here at all.
>>>
>>> We issue ERRCODE_T_R_SERIALIZATION_FAILURE almost all of the time for
>>> recovery conflicts.
>>>
>>> We issue ERRCODE_ADMIN_SHUTDOWN only if the conflict is non-retryable,
>>> which occurs if someone drops the database that the user was connected
>>> to when they get kicked off. That code exists specifically to inform the
>>> user that they *cannot* reconnect. So pgpool should not be trying to
>>> trap that error and reconnect.
>>
>> CheckRecoveryConflictDeadlock() uses ERRCODE_QUERY_CANCELLED.
>> ProcessInterrupts() sometimes uses ERRCODE_T_R_SERIALIZATION_FAILURE
>> and sometimes uses ERRCODE_ADMIN_SHUTDOWN.  It seems to me that it
>> wouldn't be a bad thing to be a bit more consistent, and perhaps to
>> have dedicated error codes for recovery conflicts.  This bit strikes
>> me as particularly strange:
>>
>>                else if (RecoveryConflictPending && RecoveryConflictRetryable)
>>                {
>>                        pgstat_report_recovery_conflict(RecoveryConflictReason);
>>                        ereport(FATAL,
>>
>> (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
>>                          errmsg("terminating connection due to
>> conflict with recovery"),
>>                                         errdetail_recovery_conflict()));
>>                }
>>                else if (RecoveryConflictPending)
>>                {
>>                        pgstat_report_recovery_conflict(RecoveryConflictReason);
>>                        ereport(FATAL,
>>                                        (errcode(ERRCODE_ADMIN_SHUTDOWN),
>>                          errmsg("terminating connection due to
>> conflict with recovery"),
>>                                         errdetail_recovery_conflict()));
>>                }
>>
>> That's the same error message at the same severity level with two
>> different SQLSTATEs depending on RecoveryConflictRetryable.  Seems a
>> bit cryptic.
>
> So what we do want to do about this?
>
> I'm pretty well convinced that we should NOT be issuing
> ERRCODE_ADMIN_SHUTDOWN for a recovery conflict, but that could be
> fixed by a trivial simplification of the code posted above, without
> introducing any new error code.

I agree with ERRCODE_ADMIN_SHUTDOWN should not be used for a recovery
conflict. And if your proposal does not need to introduce new error
code, I also agree with not inventing new error code.

> I'd also be in favor of changing the one that uses
> ERRCODE_QUERY_CANCELLED to use ERRCODE_T_R_SERIALIZATION_FAILURE, as
> the former might be taken to imply active user intervention, and for
> consistency.

+1.

> It's no longer clear to me that we actually need a new error code for
> this - using the same one everywhere seems like it might be
> sufficient, unless someone wants to make an argument why it isn't.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

Re: Error code for "terminating connection due to conflict with recovery"

From

Simon Riggs

Date:

21 January 2011, 05:16:36

On Fri, 2011-01-21 at 13:49 +0900, Tatsuo Ishii wrote:
> > 
> > I'm pretty well convinced that we should NOT be issuing
> > ERRCODE_ADMIN_SHUTDOWN for a recovery conflict, but that could be
> > fixed by a trivial simplification of the code posted above, without
> > introducing any new error code.
> 
> I agree with ERRCODE_ADMIN_SHUTDOWN should not be used for a recovery
> conflict. And if your proposal does not need to introduce new error
> code, I also agree with not inventing new error code.

> > I'd also be in favor of changing the one that uses
> > ERRCODE_QUERY_CANCELLED to use ERRCODE_T_R_SERIALIZATION_FAILURE, as
> > the former might be taken to imply active user intervention, and for
> > consistency.
> 
> +1.

We already use ERRCODE_T_R_SERIALIZATION_FAILURE for retryable errors,
which is almost every error. So no change required there.

ERRCODE_ADMIN_SHUTDOWN is used only in situations where we cannot
reconnect or retry because the database we said we wished to connect to
no longer exists. That needs to be a different error code to a normal,
retryable error, so that pgpool can tell the difference between things
it can help with and things it cannot help with.

-- Simon Riggs           http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services

Re: Error code for "terminating connection due to conflict with recovery"

From

Florian Pflug

Date:

21 January 2011, 08:09:51

On Jan21, 2011, at 10:16 , Simon Riggs wrote:
> On Fri, 2011-01-21 at 13:49 +0900, Tatsuo Ishii wrote:
>> 
>>> I'm pretty well convinced that we should NOT be issuing
>>> ERRCODE_ADMIN_SHUTDOWN for a recovery conflict, but that could be
>>> fixed by a trivial simplification of the code posted above, without
>>> introducing any new error code.
>> 
>> I agree with ERRCODE_ADMIN_SHUTDOWN should not be used for a recovery
>> conflict. And if your proposal does not need to introduce new error
>> code, I also agree with not inventing new error code.
> 
>>> I'd also be in favor of changing the one that uses
>>> ERRCODE_QUERY_CANCELLED to use ERRCODE_T_R_SERIALIZATION_FAILURE, as
>>> the former might be taken to imply active user intervention, and for
>>> consistency.
>> 
>> +1.
> 
> We already use ERRCODE_T_R_SERIALIZATION_FAILURE for retryable errors,
> which is almost every error. So no change required there.
> 
> ERRCODE_ADMIN_SHUTDOWN is used only in situations where we cannot
> reconnect or retry because the database we said we wished to connect to
> no longer exists. That needs to be a different error code to a normal,
> retryable error, so that pgpool can tell the difference between things
> it can help with and things it cannot help with.

Yeah. Clients absolutely need to be able to distinguish transient and
permanent errors. Otherwise, how would a client know when to retry
a transaction (as he needs to in case of a serialization anomaly) and
when to report the error to the user?

ERRCODE_T_R_SERIALIZATION_FAILURE  and ERRCODE_T_R_DEADLOCK_DETECTED
are probably both assumed to be transient failure by client aready. So
we should use those two for transient recovery conflicts (i.e. those
which go away if you retry) and something else for the others (like
database dropped)

This'd mean that the code is fine as it is, except that we should
raise ERRCODE_T_R_DEADLOCK_DETECTED instead of ERRCODE_QUERY_CANCELED
in CheckRecoveryConflictDeadlock(). I might be missing something though -
Simon, what were your reasons for using ERRCODE_QUERY_CANCELED there?

best regards,
Florian Pflug

Re: Error code for "terminating connection due to conflict with recovery"

From

Simon Riggs

Date:

21 January 2011, 08:49:02

On Fri, 2011-01-21 at 13:09 +0100, Florian Pflug wrote:
> > 
> >>> I'd also be in favor of changing the one that uses
> >>> ERRCODE_QUERY_CANCELLED to use ERRCODE_T_R_SERIALIZATION_FAILURE, as
> >>> the former might be taken to imply active user intervention, and for
> >>> consistency.
> >> 
> >> +1.
> > 
> > We already use ERRCODE_T_R_SERIALIZATION_FAILURE for retryable errors,
> > which is almost every error. So no change required there.
> > 
> > ERRCODE_ADMIN_SHUTDOWN is used only in situations where we cannot
> > reconnect or retry because the database we said we wished to connect to
> > no longer exists. That needs to be a different error code to a normal,
> > retryable error, so that pgpool can tell the difference between things
> > it can help with and things it cannot help with.
> 
> Yeah. Clients absolutely need to be able to distinguish transient and
> permanent errors. Otherwise, how would a client know when to retry
> a transaction (as he needs to in case of a serialization anomaly) and
> when to report the error to the user?
> 
> ERRCODE_T_R_SERIALIZATION_FAILURE  and ERRCODE_T_R_DEADLOCK_DETECTED
> are probably both assumed to be transient failure by client aready. So
> we should use those two for transient recovery conflicts (i.e. those
> which go away if you retry) and something else for the others (like
> database dropped)
> 
> This'd mean that the code is fine as it is, except that we should
> raise ERRCODE_T_R_DEADLOCK_DETECTED instead of ERRCODE_QUERY_CANCELED
> in CheckRecoveryConflictDeadlock(). I might be missing something though -
> Simon, what were your reasons for using ERRCODE_QUERY_CANCELED there?

Ah, thanks Florian. Now I understand. There are two related issues here.

1. The discussion around ERRCODE_ADMIN_SHUTDOWN is incorrect and the
specific patch should be rejected as is. No changes are required in
ProcessInterrupts(), nor new errcodes.

2. Robert is correct that CheckRecoveryConflictDeadlock() returns
ERRCODE_QUERY_CANCELED. Thanks to Florian for noting that we had
switched away from the original discussion onto another part of the
code, which confused me. I agree the use of ERRCODE_QUERY_CANCELED is a
mistake; CheckRecoveryConflictDeadlock() should return
ERRCODE_T_R_DEADLOCK_DETECTED. This was an omission from my commit of 12
May 2010.

Tatsuo, would you like to modify the patch to correct the issue in
CheckRecoveryConflictDeadlock() ? Or would you prefer me to fix?

This should be backpatched to 9.0.

-- Simon Riggs           http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services

Re: Error code for "terminating connection due to conflict with recovery"

From

Robert Haas

Date:

21 January 2011, 09:28:40

On Fri, Jan 21, 2011 at 7:48 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Ah, thanks Florian. Now I understand. There are two related issues here.
>
> 1. The discussion around ERRCODE_ADMIN_SHUTDOWN is incorrect and the
> specific patch should be rejected as is. No changes are required in
> ProcessInterrupts(), nor new errcodes.

Can you please justify that statement instead of simply asserting it?
Tatsuo-san and I both seem to agree that it looks wrong.
ERRCODE_ADMIN_SHUTDOWN is in class 57, operator intervention, and it's
used elsewhere when a SIGTERM is received and the database is shutting
down.  That's a world away from what's actually happening here.
Wanting to have a different error code for this type of failure may
make sense, but that doesn't mean that this is the right one.

> 2. Robert is correct that CheckRecoveryConflictDeadlock() returns
> ERRCODE_QUERY_CANCELED. Thanks to Florian for noting that we had
> switched away from the original discussion onto another part of the
> code, which confused me. I agree the use of ERRCODE_QUERY_CANCELED is a
> mistake; CheckRecoveryConflictDeadlock() should return
> ERRCODE_T_R_DEADLOCK_DETECTED. This was an omission from my commit of 12
> May 2010.

This part sounds good.

> This should be backpatched to 9.0.

Hmm, I don't necessarily agree.  The standard for changing behavior in
an existing release is fairly high.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Error code for "terminating connection due to conflict with recovery"

From

Robert Haas

Date:

30 January 2011, 15:56:26

On Fri, Jan 21, 2011 at 8:28 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jan 21, 2011 at 7:48 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Ah, thanks Florian. Now I understand. There are two related issues here.
>>
>> 1. The discussion around ERRCODE_ADMIN_SHUTDOWN is incorrect and the
>> specific patch should be rejected as is. No changes are required in
>> ProcessInterrupts(), nor new errcodes.
>
> Can you please justify that statement instead of simply asserting it?
> Tatsuo-san and I both seem to agree that it looks wrong.
> ERRCODE_ADMIN_SHUTDOWN is in class 57, operator intervention, and it's
> used elsewhere when a SIGTERM is received and the database is shutting
> down.  That's a world away from what's actually happening here.
> Wanting to have a different error code for this type of failure may
> make sense, but that doesn't mean that this is the right one.
>
>> 2. Robert is correct that CheckRecoveryConflictDeadlock() returns
>> ERRCODE_QUERY_CANCELED. Thanks to Florian for noting that we had
>> switched away from the original discussion onto another part of the
>> code, which confused me. I agree the use of ERRCODE_QUERY_CANCELED is a
>> mistake; CheckRecoveryConflictDeadlock() should return
>> ERRCODE_T_R_DEADLOCK_DETECTED. This was an omission from my commit of 12
>> May 2010.
>
> This part sounds good.
>
>> This should be backpatched to 9.0.
>
> Hmm, I don't necessarily agree.  The standard for changing behavior in
> an existing release is fairly high.

It seems like we have consensus to change
CheckRecoveryConflictDetected() to return
ERRCODE_T_R_DEADLOCK_DETECTED in 9.1, but not on whether to also
change that in 9.0 (votes: Robert - for, Simon - against) and arguably
not on whether to change the case that returns ERROR_ADMIN_SHUTDOWN
for a recovery conflict (votes: Robert, Tatsuo-san - for, Simon -
against).

Anyone else want to weigh in?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Error code for "terminating connection due to conflict with recovery"

From

Tatsuo Ishii

Date:

30 January 2011, 20:33:42

Robert,

> On Fri, Jan 21, 2011 at 8:28 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, Jan 21, 2011 at 7:48 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> Ah, thanks Florian. Now I understand. There are two related issues here.
>>>
>>> 1. The discussion around ERRCODE_ADMIN_SHUTDOWN is incorrect and the
>>> specific patch should be rejected as is. No changes are required in
>>> ProcessInterrupts(), nor new errcodes.
>>
>> Can you please justify that statement instead of simply asserting it?
>> Tatsuo-san and I both seem to agree that it looks wrong.
>> ERRCODE_ADMIN_SHUTDOWN is in class 57, operator intervention, and it's
>> used elsewhere when a SIGTERM is received and the database is shutting
>> down.  That's a world away from what's actually happening here.
>> Wanting to have a different error code for this type of failure may
>> make sense, but that doesn't mean that this is the right one.
>>
>>> 2. Robert is correct that CheckRecoveryConflictDeadlock() returns
>>> ERRCODE_QUERY_CANCELED. Thanks to Florian for noting that we had
>>> switched away from the original discussion onto another part of the
>>> code, which confused me. I agree the use of ERRCODE_QUERY_CANCELED is a
>>> mistake; CheckRecoveryConflictDeadlock() should return
>>> ERRCODE_T_R_DEADLOCK_DETECTED. This was an omission from my commit of 12
>>> May 2010.
>>
>> This part sounds good.
>>
>>> This should be backpatched to 9.0.
>>
>> Hmm, I don't necessarily agree.  The standard for changing behavior in
>> an existing release is fairly high.
>
> It seems like we have consensus to change
> CheckRecoveryConflictDetected() to return
> ERRCODE_T_R_DEADLOCK_DETECTED in 9.1, but not on whether to also
> change that in 9.0 (votes: Robert - for, Simon - against)

Please note that I'm with you.

> and arguably
> not on whether to change the case that returns ERROR_ADMIN_SHUTDOWN
> for a recovery conflict (votes: Robert, Tatsuo-san - for, Simon -
> against).
>
> Anyone else want to weigh in?
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

Re: Error code for "terminating connection due to conflict with recovery"

From

Tom Lane

Date:

30 January 2011, 21:12:22

Tatsuo Ishii <ishii@postgresql.org> writes:
> On Fri, Jan 21, 2011 at 8:28 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> It seems like we have consensus to change
>> CheckRecoveryConflictDetected() to return
>> ERRCODE_T_R_DEADLOCK_DETECTED in 9.1, but not on whether to also
>> change that in 9.0 (votes: Robert - for, Simon - against)

> Please note that I'm with you.

I'm inclined to agree that 9.0 is new enough that changing this
shouldn't be too traumatic.  If we leave it till 9.1 there will
probably be more pain not less.
        regards, tom lane

Re: Error code for "terminating connection due to conflict with recovery"

From

Robert Haas

Date:

30 January 2011, 21:18:12

On Sun, Jan 30, 2011 at 8:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Tatsuo Ishii <ishii@postgresql.org> writes:
>> On Fri, Jan 21, 2011 at 8:28 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> It seems like we have consensus to change
>>> CheckRecoveryConflictDetected() to return
>>> ERRCODE_T_R_DEADLOCK_DETECTED in 9.1, but not on whether to also
>>> change that in 9.0 (votes: Robert - for, Simon - against)
>
>> Please note that I'm with you.
>
> I'm inclined to agree that 9.0 is new enough that changing this
> shouldn't be too traumatic.  If we leave it till 9.1 there will
> probably be more pain not less.

I have done a poor job of summarizing the previous votes.  Simon was
arguing FOR back-patching, and I was arguing AGAINST it.

But if you and Tatsuo-san are both in favor of it, then perhaps we
should do it.  However, that might require people to write application
code which has to be aware of which minor version it's talking to,
which is usually something we try hard to avoid.

Any opinion on what to do about the one that's returning ERRCODE_ADMIN_SHUTDOWN?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Error code for "terminating connection due to conflict with recovery"

From

Tom Lane

Date:

30 January 2011, 21:42:30

Robert Haas <robertmhaas@gmail.com> writes:
> On Sun, Jan 30, 2011 at 8:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm inclined to agree that 9.0 is new enough that changing this
>> shouldn't be too traumatic. �If we leave it till 9.1 there will
>> probably be more pain not less.

> But if you and Tatsuo-san are both in favor of it, then perhaps we
> should do it.  However, that might require people to write application
> code which has to be aware of which minor version it's talking to,
> which is usually something we try hard to avoid.

Well, the thing is that if we delay changing this until 9.1, then pretty
much every app that cares about this will have to be prepared to deal
with both possibilities, and will thus have to contain some
version-sensitive code, and will have to keep dealing with that for the
support lifetime of 9.0.x.  My thought is that if it changes in 9.0.4,
client authors will probably be able to say "we support >= 9.0.4" and
just deal with one behavior.  9.0 hasn't yet had so much uptake that
this is an infeasible position, especially given the rate at which we're
still finding serious bugs in it.  9.0.early isn't going to be getting
run by the sorts of DBAs who would care about apps that care about this.

IOW, I think we can still consider changing error codes that are new in
9.0.x as a bug fix.

> Any opinion on what to do about the one that's returning ERRCODE_ADMIN_SHUTDOWN?

Pretty much the same argument here, I think: if we are going to change
the SQLSTATE we should do it now not later.  However, I think Simon
was actually arguing to not change this one either now or later, and
that might also be a defensible position.

BTW, so far as this goes:
http://archives.postgresql.org/pgsql-hackers/2011-01/msg01152.php
we should certainly *not* have the same text for two different
SQLSTATEs.  If it's worth distinguishing two cases then it's worth
providing different texts that make it clear what the cases are.
        regards, tom lane

Re: Error code for "terminating connection due to conflict with recovery"

From

Robert Haas

Date:

30 January 2011, 21:53:36

On Sun, Jan 30, 2011 at 8:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Any opinion on what to do about the one that's returning ERRCODE_ADMIN_SHUTDOWN?
>
> Pretty much the same argument here, I think: if we are going to change
> the SQLSTATE we should do it now not later.  However, I think Simon
> was actually arguing to not change this one either now or later, and
> that might also be a defensible position.
>
> BTW, so far as this goes:
> http://archives.postgresql.org/pgsql-hackers/2011-01/msg01152.php
> we should certainly *not* have the same text for two different
> SQLSTATEs.  If it's worth distinguishing two cases then it's worth
> providing different texts that make it clear what the cases are.

Well, then we either need to change the error codes to be the same, or
the texts to be different.

The only case in which we emit ERRCODE_ADMIN_SHUTDOWN is when the
database gets dropped out from underneath the HS backend.  I don't
think it's worth having a separate path just to handle that case; if
the user retries the operation it should quickly become clear that the
database is gone.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Error code for "terminating connection due to conflict with recovery"

From

Tom Lane

Date:

30 January 2011, 22:18:55

Robert Haas <robertmhaas@gmail.com> writes:
> On Sun, Jan 30, 2011 at 8:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> BTW, so far as this goes:
>> http://archives.postgresql.org/pgsql-hackers/2011-01/msg01152.php
>> we should certainly *not* have the same text for two different
>> SQLSTATEs. �If it's worth distinguishing two cases then it's worth
>> providing different texts that make it clear what the cases are.

> Well, then we either need to change the error codes to be the same, or
> the texts to be different.

> The only case in which we emit ERRCODE_ADMIN_SHUTDOWN is when the
> database gets dropped out from underneath the HS backend.  I don't
> think it's worth having a separate path just to handle that case; if
> the user retries the operation it should quickly become clear that the
> database is gone.

Somone, Tatsuo-san IIRC, was saying that pgpool would really like to
know the difference so that it doesn't have to retry at all.  I'm not
sure how useful that really is, but that's the argument for having a
distinct SQLSTATE.  If we do believe that that's useful, I think it
should be a unique new SQLSTATE that never means anything other than
"your database got deleted from under you".  Piggybacking that meaning
on an existing SQLSTATE definitely seems completely bogus to me.
        regards, tom lane

Re: Error code for "terminating connection due to conflict with recovery"

From

Robert Haas

Date:

30 January 2011, 22:47:15

On Sun, Jan 30, 2011 at 9:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Sun, Jan 30, 2011 at 8:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> BTW, so far as this goes:
>>> http://archives.postgresql.org/pgsql-hackers/2011-01/msg01152.php
>>> we should certainly *not* have the same text for two different
>>> SQLSTATEs.  If it's worth distinguishing two cases then it's worth
>>> providing different texts that make it clear what the cases are.
>
>> Well, then we either need to change the error codes to be the same, or
>> the texts to be different.
>
>> The only case in which we emit ERRCODE_ADMIN_SHUTDOWN is when the
>> database gets dropped out from underneath the HS backend.  I don't
>> think it's worth having a separate path just to handle that case; if
>> the user retries the operation it should quickly become clear that the
>> database is gone.
>
> Somone, Tatsuo-san IIRC, was saying that pgpool would really like to
> know the difference so that it doesn't have to retry at all.  I'm not
> sure how useful that really is, but that's the argument for having a
> distinct SQLSTATE.  If we do believe that that's useful, I think it
> should be a unique new SQLSTATE that never means anything other than
> "your database got deleted from under you".  Piggybacking that meaning
> on an existing SQLSTATE definitely seems completely bogus to me.

Actually, it was Simon and Florian who were arguing that we needed to
distinguish these cases from other types of recovery conflict;
Tatsuo-san was arguing that we needed to distinguish a
dropped-database-recovery-conflict from a cluster shutdown - the
current choice of ERRCODE_ADMIN_SHUTDOWN makes that confusing.

ISTM we can invent zero, one, or two new error codes here.  If we
invent zero, then we change all recovery conflicts to look like
serialization failures and call it good.  If we invent one, then we
make retryable recovery conflicts look like serialization failures and
the dropped-database case gets a newly minted error code that means
just that.  Or we can invent two, and make serialization failures
different from recovery conflicts, and retryable recovery conflicts
different from the dropped-database variety.

I don't have a terribly strong opinion as between those options.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Error code for "terminating connection due to conflict with recovery"

From

Bruce Momjian

Date:

31 January 2011, 10:53:57

> Actually, it was Simon and Florian who were arguing that we needed to
> distinguish these cases from other types of recovery conflict;
> Tatsuo-san was arguing that we needed to distinguish a
> dropped-database-recovery-conflict from a cluster shutdown - the
> current choice of ERRCODE_ADMIN_SHUTDOWN makes that confusing.
> 
> ISTM we can invent zero, one, or two new error codes here.  If we
> invent zero, then we change all recovery conflicts to look like
> serialization failures and call it good.  If we invent one, then we
> make retryable recovery conflicts look like serialization failures and
> the dropped-database case gets a newly minted error code that means
> just that.  Or we can invent two, and make serialization failures
> different from recovery conflicts, and retryable recovery conflicts
> different from the dropped-database variety.
> 
> I don't have a terribly strong opinion as between those options.

As a novice I am not sure why we _wouldn't_ create two new separate
error codes --- it not not like they cost us anything, and they
certainly sound distinct.  The requirement to retry is clearly something
we want to avoid if we get a new error code.

Backpatching to 9.0 makes sense too, though the problem is the delay in
getting the code into a released minor version.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +

Re: Error code for "terminating connection due to conflict with recovery"

From

"Kevin Grittner"

Date:

31 January 2011, 11:31:33

Bruce Momjian <bruce@momjian.us> wrote:
> As a novice I am not sure why we _wouldn't_ create two new
> separate error codes
The argument for using SQLSTATE 40001 for failures which are
strictly due to concurrency problems, and are likely to work if the
transaction is retried, is that there is already a lot of software
which knows how to do that.  On the other hand, going into such code
to turn that into a list of concurrency failure states is probably
only going to cause pain to those with applications intended to work
with multiple DBMS products without much modification.
-Kevin

Re: Error code for "terminating connection due to conflict with recovery"

From

Simon Riggs

Date:

31 January 2011, 11:37:38

On Mon, 2011-01-31 at 09:46 -0500, Bruce Momjian wrote:
> > Actually, it was Simon and Florian who were arguing that we needed to
> > distinguish these cases from other types of recovery conflict;
> > Tatsuo-san was arguing that we needed to distinguish a
> > dropped-database-recovery-conflict from a cluster shutdown - the
> > current choice of ERRCODE_ADMIN_SHUTDOWN makes that confusing.
> > 
> > ISTM we can invent zero, one, or two new error codes here.  If we
> > invent zero, then we change all recovery conflicts to look like
> > serialization failures and call it good.  If we invent one, then we
> > make retryable recovery conflicts look like serialization failures and
> > the dropped-database case gets a newly minted error code that means
> > just that.  Or we can invent two, and make serialization failures
> > different from recovery conflicts, and retryable recovery conflicts
> > different from the dropped-database variety.
> > 
> > I don't have a terribly strong opinion as between those options.
> 
> As a novice I am not sure why we _wouldn't_ create two new separate
> error codes --- it not not like they cost us anything, and they
> certainly sound distinct.  The requirement to retry is clearly something
> we want to avoid if we get a new error code.

It's the way it was because of discussion during 9.0.
The errors for "serialization error" and "deadlock" are appropriate
because everybody knows these exist. If you invent a new error code then
you'll need to re-program loads of applications which would all "just
work" if we use those error codes. Kevin specifically requested it for
that reason.

That leaves the error code for the drop database case. As both Tatsuo
and myself have said, it needs to be different so we do not retry it. I
don't personally think an edge case like that needs a whole new error
code to explain it.

> Backpatching to 9.0 makes sense too, though the problem is the delay in
> getting the code into a released minor version.

I treat this as a bugfix, so backpatch is appropriate. My earlier "fix"
in 9.0 beta didn't cover the case pointed out by Robert/Florian and as
Tatsuo points out, we need to have a retryable error for all common
cases.

-- Simon Riggs           http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services

Re: Error code for "terminating connection due to conflict with recovery"

From

Bruce Momjian

Date:

31 January 2011, 11:51:29

Kevin Grittner wrote:
> Bruce Momjian <bruce@momjian.us> wrote:
>  
> > As a novice I am not sure why we _wouldn't_ create two new
> > separate error codes
>  
> The argument for using SQLSTATE 40001 for failures which are
> strictly due to concurrency problems, and are likely to work if the
> transaction is retried, is that there is already a lot of software
> which knows how to do that.  On the other hand, going into such code
> to turn that into a list of concurrency failure states is probably
> only going to cause pain to those with applications intended to work
> with multiple DBMS products without much modification.

The way they usually handle that is by having a class of codes that
designate that behavior, but I can see now that the number can't be
subdivided.  :-(

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +

Re: Error code for "terminating connection due to conflict with recovery"

From

Robert Haas

Date:

31 January 2011, 12:32:24

On Mon, Jan 31, 2011 at 10:31 AM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Bruce Momjian <bruce@momjian.us> wrote:
>> As a novice I am not sure why we _wouldn't_ create two new
>> separate error codes
>
> The argument for using SQLSTATE 40001 for failures which are
> strictly due to concurrency problems, and are likely to work if the
> transaction is retried, is that there is already a lot of software
> which knows how to do that.  On the other hand, going into such code
> to turn that into a list of concurrency failure states is probably
> only going to cause pain to those with applications intended to work
> with multiple DBMS products without much modification.

Yeah, I think that one's pretty logical.  I think my vote is to either
change the drop-database case to be the same as that, or to use a new
error code.  ERRCODE_ADMIN_SHUTDOWN is just strange.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Error code for "terminating connection due to conflict with recovery"

From

Simon Riggs

Date:

31 January 2011, 15:42:20

On Mon, 2011-01-31 at 11:24 -0500, Robert Haas wrote:
> On Mon, Jan 31, 2011 at 10:31 AM, Kevin Grittner
> <Kevin.Grittner@wicourts.gov> wrote:
> > Bruce Momjian <bruce@momjian.us> wrote:
> >> As a novice I am not sure why we _wouldn't_ create two new
> >> separate error codes
> >
> > The argument for using SQLSTATE 40001 for failures which are
> > strictly due to concurrency problems, and are likely to work if the
> > transaction is retried, is that there is already a lot of software
> > which knows how to do that.  On the other hand, going into such code
> > to turn that into a list of concurrency failure states is probably
> > only going to cause pain to those with applications intended to work
> > with multiple DBMS products without much modification.
> 
> Yeah, I think that one's pretty logical. 

>  I think my vote is to either
> change the drop-database case to be the same as that

No, we shouldn't have a "can retry" errcode for a "must not retry" case.

> , or to use a new
> error code.  ERRCODE_ADMIN_SHUTDOWN is just strange.

It's not strange at all. It's the same error code as we use for all of
the other cases listed. We need that because it is the current
catch-all errcode for "cannot retry".

The purpose of errcodes is to allow programs to check them and then act.
It's pointless to add a new errcode that is so rare that nobody will
ever program for it because they won't expect it, let alone test for it.
Or at least won't assign any sensible priority to handling that error.

-- Simon Riggs           http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services

Re: Error code for "terminating connection due to conflict with recovery"

From

Tom Lane

Date:

31 January 2011, 15:58:47

Simon Riggs <simon@2ndQuadrant.com> writes:
> On Mon, 2011-01-31 at 11:24 -0500, Robert Haas wrote:
>> , or to use a new
>> error code.  ERRCODE_ADMIN_SHUTDOWN is just strange.

> It's not strange at all. It's the same error code as we use for all of
> the other cases listed. We need that because it is the current
> catch-all errcode for "cannot retry".

> The purpose of errcodes is to allow programs to check them and then act.
> It's pointless to add a new errcode that is so rare that nobody will
> ever program for it because they won't expect it, let alone test for it.
> Or at least won't assign any sensible priority to handling that error.

The trouble with ERRCODE_ADMIN_SHUTDOWN is that it might lead a
connection pooler to expect that *all* its connections are going bad,
not just the ones that are connected to a specific database.  I think
this is a bad decision.  Programs that are interested in testing for this
case at all are likely to need to be worried about that distinction.

Also, while I believe that ERRCODE_T_R_DEADLOCK_DETECTED is a reasonable
catchall retry code, I don't think it's equally sane to think that
ERRCODE_ADMIN_SHUTDOWN is a catchall non-retry code.
        regards, tom lane

Re: Error code for "terminating connection due to conflict with recovery"

From

Simon Riggs

Date:

31 January 2011, 16:05:04

On Mon, 2011-01-31 at 14:58 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > On Mon, 2011-01-31 at 11:24 -0500, Robert Haas wrote:
> >> , or to use a new
> >> error code.  ERRCODE_ADMIN_SHUTDOWN is just strange.
> 
> > It's not strange at all. It's the same error code as we use for all of
> > the other cases listed. We need that because it is the current
> > catch-all errcode for "cannot retry".
> 
> > The purpose of errcodes is to allow programs to check them and then act.
> > It's pointless to add a new errcode that is so rare that nobody will
> > ever program for it because they won't expect it, let alone test for it.
> > Or at least won't assign any sensible priority to handling that error.
> 
> The trouble with ERRCODE_ADMIN_SHUTDOWN is that it might lead a
> connection pooler to expect that *all* its connections are going bad,
> not just the ones that are connected to a specific database.  I think
> this is a bad decision.  Programs that are interested in testing for this
> case at all are likely to need to be worried about that distinction.

That's a reasonable argument.

My objection to a new code is only to one that is so specific that
people have to program for ERRCODE_BLUE_MOON_ON_A_LEAP_YEAR.

Can we invent a new "catch-all" that might be used here? Something that
means "unknown operational error, not sure what to do".

ERRCODE_ADMIN_OTHER or ERRCODE_ADMIN_UNCLASSIFIED.

> Also, while I believe that ERRCODE_T_R_DEADLOCK_DETECTED is a reasonable
> catchall retry code, I don't think it's equally sane to think that
> ERRCODE_ADMIN_SHUTDOWN is a catchall non-retry code.

-- Simon Riggs           http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services

Re: Error code for "terminating connection due to conflict with recovery"

From

Tom Lane

Date:

31 January 2011, 17:25:14

Simon Riggs <simon@2ndQuadrant.com> writes:
> On Mon, 2011-01-31 at 14:58 -0500, Tom Lane wrote:
>> The trouble with ERRCODE_ADMIN_SHUTDOWN is that it might lead a
>> connection pooler to expect that *all* its connections are going bad,
>> not just the ones that are connected to a specific database.  I think
>> this is a bad decision.  Programs that are interested in testing for this
>> case at all are likely to need to be worried about that distinction.

> That's a reasonable argument.

> My objection to a new code is only to one that is so specific that
> people have to program for ERRCODE_BLUE_MOON_ON_A_LEAP_YEAR.

What's wrong with ERRCODE_DATABASE_DROPPED, or something like that?

> Can we invent a new "catch-all" that might be used here? Something that
> means "unknown operational error, not sure what to do".

Because that's not the situation here.  We know exactly what a pooler
should do.  It might be an infrequent case, but obscurantism isn't going
to help anyone.
        regards, tom lane

Re: Error code for "terminating connection due to conflict with recovery"

From

Josh Berkus

Date:

31 January 2011, 17:31:28

On 1/31/11 11:26 AM, Simon Riggs wrote:
> The purpose of errcodes is to allow programs to check them and then act.
> It's pointless to add a new errcode that is so rare that nobody will
> ever program for it because they won't expect it, let alone test for it.
> Or at least won't assign any sensible priority to handling that error.

Personally, I would prefer a new error code because I *intend* to write
specific code to deal with replication cancellation.  Without a specific
code, then I cannot distinguish replication cancel from whatever else
it's lumped in with, except by regexping the error message.

If we're not going to get a new code, then I'd prefer DEADLOCK_DETECTED
because the behavior we expect from the client (try the whole
transaction over from the beginning) is the same.

--                                  -- Josh Berkus                                    PostgreSQL Experts Inc.
                        http://www.pgexperts.com

Re: Error code for "terminating connection due to conflict with recovery"

From

Simon Riggs

Date:

31 January 2011, 19:07:39

On Mon, 2011-01-31 at 16:24 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > On Mon, 2011-01-31 at 14:58 -0500, Tom Lane wrote:
> >> The trouble with ERRCODE_ADMIN_SHUTDOWN is that it might lead a
> >> connection pooler to expect that *all* its connections are going bad,
> >> not just the ones that are connected to a specific database.  I think
> >> this is a bad decision.  Programs that are interested in testing for this
> >> case at all are likely to need to be worried about that distinction.
> 
> > That's a reasonable argument.
> 
> > My objection to a new code is only to one that is so specific that
> > people have to program for ERRCODE_BLUE_MOON_ON_A_LEAP_YEAR.
> 
> What's wrong with ERRCODE_DATABASE_DROPPED, or something like that?
> 
> > Can we invent a new "catch-all" that might be used here? Something that
> > means "unknown operational error, not sure what to do".
> 
> Because that's not the situation here.  We know exactly what a pooler
> should do.  It might be an infrequent case, but obscurantism isn't going
> to help anyone.

OK, that makes sense to me.

I would make ERRCODE_DATABASE_DROPPED an Invalid Authorization error,
rather than a Transaction Rollback code. So sqlstate 28P02

The sensible handling of such an error is not to retry, or at least to
switch to an alternate if one is available.

Ready to commit if no objection.

-- Simon Riggs           http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services

Re: Error code for "terminating connection due to conflict with recovery"

From

Robert Haas

Date:

31 January 2011, 19:21:14

On Mon, Jan 31, 2011 at 6:07 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Mon, 2011-01-31 at 16:24 -0500, Tom Lane wrote:
>> Simon Riggs <simon@2ndQuadrant.com> writes:
>> > On Mon, 2011-01-31 at 14:58 -0500, Tom Lane wrote:
>> >> The trouble with ERRCODE_ADMIN_SHUTDOWN is that it might lead a
>> >> connection pooler to expect that *all* its connections are going bad,
>> >> not just the ones that are connected to a specific database.  I think
>> >> this is a bad decision.  Programs that are interested in testing for this
>> >> case at all are likely to need to be worried about that distinction.
>>
>> > That's a reasonable argument.
>>
>> > My objection to a new code is only to one that is so specific that
>> > people have to program for ERRCODE_BLUE_MOON_ON_A_LEAP_YEAR.
>>
>> What's wrong with ERRCODE_DATABASE_DROPPED, or something like that?
>>
>> > Can we invent a new "catch-all" that might be used here? Something that
>> > means "unknown operational error, not sure what to do".
>>
>> Because that's not the situation here.  We know exactly what a pooler
>> should do.  It might be an infrequent case, but obscurantism isn't going
>> to help anyone.
>
> OK, that makes sense to me.
>
> I would make ERRCODE_DATABASE_DROPPED an Invalid Authorization error,
> rather than a Transaction Rollback code. So sqlstate 28P02
>
> The sensible handling of such an error is not to retry, or at least to
> switch to an alternate if one is available.
>
> Ready to commit if no objection.

ISTM it should still be in class 40.  There's nothing wrong with the
user's authorization; we've just decided to roll back the transaction
for our own purposes.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Error code for "terminating connection due to conflict with recovery"

From

Simon Riggs

Date:

31 January 2011, 20:09:18

On Mon, 2011-01-31 at 18:21 -0500, Robert Haas wrote:
> > Ready to commit if no objection.
> 
> ISTM it should still be in class 40.  There's nothing wrong with the
> user's authorization; we've just decided to roll back the transaction
> for our own purposes.

OK.

BTW, anybody know why we have PL/pgSQL condition codes for conditions
that can't be trapped by PL/pgSQL? ERRCODE_ADMIN_SHUTDOWN and
ERRCODE_DATABASE_DROPPED are always FATAL. Seems like pointless code to
me.

-- Simon Riggs           http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services

Re: Error code for "terminating connection due to conflict with recovery"

From

Tom Lane

Date:

31 January 2011, 20:12:47

Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Jan 31, 2011 at 6:07 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> I would make ERRCODE_DATABASE_DROPPED an Invalid Authorization error,
>> rather than a Transaction Rollback code. So sqlstate 28P02

> ISTM it should still be in class 40.  There's nothing wrong with the
> user's authorization; we've just decided to roll back the transaction
> for our own purposes.

I agree, 28 is a completely off-point category.  But it wasn't in 40
before, either --- we are talking about where it currently says
ADMIN_SHUTDOWN, no?  I'd vote for keeping it in class 57 (operator
intervention), as that is both sensible and a minimal change from
current behavior.
        regards, tom lane

Re: Error code for "terminating connection due to conflict with recovery"

From

Josh Berkus

Date:

31 January 2011, 20:13:54

> BTW, anybody know why we have PL/pgSQL condition codes for conditions
> that can't be trapped by PL/pgSQL? ERRCODE_ADMIN_SHUTDOWN and
> ERRCODE_DATABASE_DROPPED are always FATAL. Seems like pointless code to
> me.

So we can support autonomous transactions in the future?


--                                  -- Josh Berkus                                    PostgreSQL Experts Inc.
                        http://www.pgexperts.com

Re: Error code for "terminating connection due to conflict with recovery"

From

Robert Haas

Date:

31 January 2011, 20:17:04

On Mon, Jan 31, 2011 at 7:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Mon, Jan 31, 2011 at 6:07 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> I would make ERRCODE_DATABASE_DROPPED an Invalid Authorization error,
>>> rather than a Transaction Rollback code. So sqlstate 28P02
>
>> ISTM it should still be in class 40.  There's nothing wrong with the
>> user's authorization; we've just decided to roll back the transaction
>> for our own purposes.
>
> I agree, 28 is a completely off-point category.  But it wasn't in 40
> before, either --- we are talking about where it currently says
> ADMIN_SHUTDOWN, no?  I'd vote for keeping it in class 57 (operator
> intervention), as that is both sensible and a minimal change from
> current behavior.

Seems a little weird to me, since the administrator hasn't done
anything.  It's the system that has decide to roll the transaction
back, not the operator.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Error code for "terminating connection due to conflict with recovery"

From

Robert Haas

Date:

31 January 2011, 20:17:26

On Mon, Jan 31, 2011 at 7:13 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> BTW, anybody know why we have PL/pgSQL condition codes for conditions
>> that can't be trapped by PL/pgSQL? ERRCODE_ADMIN_SHUTDOWN and
>> ERRCODE_DATABASE_DROPPED are always FATAL. Seems like pointless code to
>> me.
>
> So we can support autonomous transactions in the future?

Huh?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Error code for "terminating connection due to conflict with recovery"

From

Tom Lane

Date:

31 January 2011, 20:26:16

Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Jan 31, 2011 at 7:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I agree, 28 is a completely off-point category. �But it wasn't in 40
>> before, either --- we are talking about where it currently says
>> ADMIN_SHUTDOWN, no? �I'd vote for keeping it in class 57 (operator
>> intervention), as that is both sensible and a minimal change from
>> current behavior.

> Seems a little weird to me, since the administrator hasn't done
> anything.

Sure he has: he issued the DROP DATABASE command that's causing the
system to disconnect standby sessions.
        regards, tom lane

Re: Error code for "terminating connection due to conflict with recovery"

From

Simon Riggs

Date:

31 January 2011, 20:30:47

On Mon, 2011-01-31 at 19:12 -0500, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Mon, Jan 31, 2011 at 6:07 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> >> I would make ERRCODE_DATABASE_DROPPED an Invalid Authorization error,
> >> rather than a Transaction Rollback code. So sqlstate 28P02
> 
> > ISTM it should still be in class 40.  There's nothing wrong with the
> > user's authorization; we've just decided to roll back the transaction
> > for our own purposes.
> 
> I agree, 28 is a completely off-point category.  But it wasn't in 40
> before, either --- we are talking about where it currently says
> ADMIN_SHUTDOWN, no?  I'd vote for keeping it in class 57 (operator
> intervention), as that is both sensible and a minimal change from
> current behavior.

Sorry about that chaps, didn't see your emails: I just committed.

57 was another one I considered, as was 08 connection failures. I don't
think 40 was the right place, but my bed was calling. I'll sleep now and
fix in my morning according to what y'all decide.

-- Simon Riggs           http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services

Re: Error code for "terminating connection due to conflict with recovery"

From

Tom Lane

Date:

31 January 2011, 20:33:48

Simon Riggs <simon@2ndQuadrant.com> writes:
> BTW, anybody know why we have PL/pgSQL condition codes for conditions
> that can't be trapped by PL/pgSQL? ERRCODE_ADMIN_SHUTDOWN and
> ERRCODE_DATABASE_DROPPED are always FATAL. Seems like pointless code to
> me.

There's a difference between not being able to trap the error and not
being able to name it at all.  You might wish to throw it, for instance.
        regards, tom lane

Re: Error code for "terminating connection due to conflict with recovery"

From

Robert Haas

Date:

31 January 2011, 21:34:39

On Mon, Jan 31, 2011 at 7:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Mon, Jan 31, 2011 at 7:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I agree, 28 is a completely off-point category.  But it wasn't in 40
>>> before, either --- we are talking about where it currently says
>>> ADMIN_SHUTDOWN, no?  I'd vote for keeping it in class 57 (operator
>>> intervention), as that is both sensible and a minimal change from
>>> current behavior.
>
>> Seems a little weird to me, since the administrator hasn't done
>> anything.
>
> Sure he has: he issued the DROP DATABASE command that's causing the
> system to disconnect standby sessions.

Well, I'm not sure how much this matters - as long as it's a dedicated
error code, the user can write code to DTRT somehow.  But I don't buy
your argument.  Ultimately, user activity causes any kind of recovery
conflict.  So I don't see why one particular kind of recovery conflict
should be in a different class than all the others.  It's the
administrator (I guess) who ran VACUUM FREEZE and blew up every query
running on the standby, too.  But the user doesn't directly control
when recovery conflicts get fired on the standby - it's the system
that decides to do that.  Right now, we happen to ignore
max_standby_delay for drop database, but in general the coupling
between when an action happens on the master and when conflicts occur
on the standby is fairly loose.

Then again - in theory, there's no reason why we couldn't drop a
database on the master when it's in use, kicking out everyone using it
with this very same error code.  We don't happen to handle it that way
right now, but...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Error code for "terminating connection due to conflict with recovery"

From

Tom Lane

Date:

31 January 2011, 21:52:20

Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Jan 31, 2011 at 7:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> Seems a little weird to me, since the administrator hasn't done
>>> anything.

>> Sure he has: he issued the DROP DATABASE command that's causing the
>> system to disconnect standby sessions.

> Well, I'm not sure how much this matters - as long as it's a dedicated
> error code, the user can write code to DTRT somehow.  But I don't buy
> your argument.  Ultimately, user activity causes any kind of recovery
> conflict.

Well, yeah, but the predictability of the failure is pretty variable.
In this case we can say that the error definitely would not have
occurred if somebody hadn't done a DROP DATABASE on the master while
there were live sessions in that DB on the slave.  I think that's a
sufficiently close coupling to say that the error is the result of an
operator action.  OTOH, the occurrence of deadlocks is (usually) a lot
more dependent on random-chance timing of different transactions, and
you usually can't point to any action that intentionally caused a
deadlock.

> Then again - in theory, there's no reason why we couldn't drop a
> database on the master when it's in use, kicking out everyone using it
> with this very same error code.  We don't happen to handle it that way
> right now, but...

Yeah, that was in the back of my mind too.  "DROP DATABASE foo FORCE",
maybe?
        regards, tom lane

Re: Error code for "terminating connection due to conflict with recovery"

From

Robert Haas

Date:

31 January 2011, 22:29:40

On Mon, Jan 31, 2011 at 8:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Then again - in theory, there's no reason why we couldn't drop a
>> database on the master when it's in use, kicking out everyone using it
>> with this very same error code.  We don't happen to handle it that way
>> right now, but...
>
> Yeah, that was in the back of my mind too.  "DROP DATABASE foo FORCE",
> maybe?

I have to think some people would find that useful.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Error code for "terminating connection due to conflict with recovery"

From

Magnus Hagander

Date:

01 February 2011, 02:35:35

On Tue, Feb 1, 2011 at 03:29, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Jan 31, 2011 at 8:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Then again - in theory, there's no reason why we couldn't drop a
>>> database on the master when it's in use, kicking out everyone using it
>>> with this very same error code.  We don't happen to handle it that way
>>> right now, but...
>>
>> Yeah, that was in the back of my mind too.  "DROP DATABASE foo FORCE",
>> maybe?
>
> I have to think some people would find that useful.

Yes.

If nothing else, it would save some typing :-)

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Error code for "terminating connection due to conflict with recovery"

From

Simon Riggs

Date:

01 February 2011, 04:13:10

On Mon, 2011-01-31 at 20:27 -0500, Robert Haas wrote:

> So I don't see why one particular kind of recovery conflict
> should be in a different class than all the others. 

This point has been explained many times and is very clear in the code.
It has a clear functional purpose, not decoration or mere personal
opinion.

-- Simon Riggs           http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services

Re: Error code for "terminating connection due to conflict with recovery"

From

Simon Riggs

Date:

01 February 2011, 04:18:08

On Mon, 2011-01-31 at 20:52 -0500, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Mon, Jan 31, 2011 at 7:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Robert Haas <robertmhaas@gmail.com> writes:
> >>> Seems a little weird to me, since the administrator hasn't done
> >>> anything.
> 
> >> Sure he has: he issued the DROP DATABASE command that's causing the
> >> system to disconnect standby sessions.
> 
> > Well, I'm not sure how much this matters - as long as it's a dedicated
> > error code, the user can write code to DTRT somehow.  But I don't buy
> > your argument.  Ultimately, user activity causes any kind of recovery
> > conflict.
> 
> Well, yeah, but the predictability of the failure is pretty variable.
> In this case we can say that the error definitely would not have
> occurred if somebody hadn't done a DROP DATABASE on the master while
> there were live sessions in that DB on the slave.  I think that's a
> sufficiently close coupling to say that the error is the result of an
> operator action.  OTOH, the occurrence of deadlocks is (usually) a lot
> more dependent on random-chance timing of different transactions, and
> you usually can't point to any action that intentionally caused a
> deadlock.

ERRCODE_DATABASE_DROPPED    57P04   looks best

The previous code was 57P01 so this is least change, if nothing else.

-- Simon Riggs           http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services

Re: Error code for "terminating connection due to conflict with recovery"

From

Simon Riggs

Date:

01 February 2011, 05:36:38

On Tue, 2011-02-01 at 07:35 +0100, Magnus Hagander wrote:
> On Tue, Feb 1, 2011 at 03:29, Robert Haas <robertmhaas@gmail.com> wrote:
> > On Mon, Jan 31, 2011 at 8:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >>> Then again - in theory, there's no reason why we couldn't drop a
> >>> database on the master when it's in use, kicking out everyone using it
> >>> with this very same error code.  We don't happen to handle it that way
> >>> right now, but...
> >>
> >> Yeah, that was in the back of my mind too.  "DROP DATABASE foo FORCE",
> >> maybe?
> >
> > I have to think some people would find that useful.
>
> Yes.
>
> If nothing else, it would save some typing :-)

I like it also. It allows me to get rid of the concept of "non-retryable
recovery conflicts", so we just have one code path that works in both
normal mode and standby. Sweet.

Here's the basic patch, will work on the refactoring if no objections.

--
 Simon Riggs           http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services

Attachment

drop_database_force.v1.patch

Re: Error code for "terminating connection due to conflict with recovery"

From

Tom Lane

Date:

01 February 2011, 12:48:09

Simon Riggs <simon@2ndQuadrant.com> writes:
> Here's the basic patch, will work on the refactoring if no objections.

ResolveRecoveryConflictWithDatabase works when you're not in recovery?
That seems pretty fragile at best.  In any case, this is a 9.2 feature
at the earliest, please do not expect people to spend time on it now.
        regards, tom lane

Re: Error code for "terminating connection due to conflict with recovery"

From

Simon Riggs

Date:

01 February 2011, 16:26:14

On Tue, 2011-02-01 at 11:43 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > Here's the basic patch, will work on the refactoring if no objections.
> 
> ResolveRecoveryConflictWithDatabase works when you're not in recovery?
> That seems pretty fragile at best.  In any case, this is a 9.2 feature
> at the earliest, please do not expect people to spend time on it now.

Err, no. Sorry if that wasn't clear. 

The patch works but requires refactoring, which I referred to above.
What is now called ResolveRecoveryConflictWithDatabase() would be
changed to a name that makes sense in both normal running and recovery.
That seemed so obvious that I wouldn't be leaving it like that, so I
just referred to that indirectly, which was confusing.

That allows me to refactor out the code that we just spent too long
discussing, so I was hoping to do that in this release. I wasn't
expecting anybody else to spend time on it since it is simple and clear.

I'll add it to 9.2.

-- Simon Riggs           http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services

Re: Error code for "terminating connection due to conflict with recovery"

From

Robert Haas

Date:

02 February 2011, 22:21:15

On Tue, Feb 1, 2011 at 3:17 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> ERRCODE_DATABASE_DROPPED    57P04   looks best

So I guess the only remaining issue is whether we should distinguish
the error message text, as well as the error codes.  Tom was in favor
of that upthread, and I am too.  Right now we have:
               else if (RecoveryConflictPending && RecoveryConflictRetryable)               {
pgstat_report_recovery_conflict(RecoveryConflictReason);                      ereport(FATAL, 

(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),                         errmsg("terminating connection due to
conflict with recovery"),                                        errdetail_recovery_conflict()));               }
       else if (RecoveryConflictPending)               {                       /* Currently there is only one
non-retryable
recovery conflict */                       Assert(RecoveryConflictReason ==
PROCSIG_RECOVERY_CONFLICT_DATABASE);                       pgstat_report_recovery_conflict(RecoveryConflictReason);
                 ereport(FATAL,                                       (errcode(ERRCODE_DATABASE_DROPPED),
         errmsg("terminating connection due to 
conflict with recovery"),                                        errdetail_recovery_conflict()));               }

The simplest thing to do seems to be to make the second one read
"terminating connection because the database has been dropped".

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Error code for "terminating connection due to conflict with recovery"

From

Simon Riggs

Date:

07 February 2011, 17:14:45

On Wed, 2011-02-02 at 21:21 -0500, Robert Haas wrote:
> On Tue, Feb 1, 2011 at 3:17 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > ERRCODE_DATABASE_DROPPED    57P04   looks best
> 
> So I guess the only remaining issue is whether we should distinguish
> the error message text, as well as the error codes.  Tom was in favor
> of that upthread, and I am too.  Right now we have:
> 
>                 else if (RecoveryConflictPending && RecoveryConflictRetryable)
>                 {
>                         pgstat_report_recovery_conflict(RecoveryConflictReason);
>                         ereport(FATAL,
> 
> (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
>                           errmsg("terminating connection due to
> conflict with recovery"),
>                                          errdetail_recovery_conflict()));
>                 }
>                 else if (RecoveryConflictPending)
>                 {
>                         /* Currently there is only one non-retryable
> recovery conflict */
>                         Assert(RecoveryConflictReason ==
> PROCSIG_RECOVERY_CONFLICT_DATABASE);
>                         pgstat_report_recovery_conflict(RecoveryConflictReason);
>                         ereport(FATAL,
>                                         (errcode(ERRCODE_DATABASE_DROPPED),
>                           errmsg("terminating connection due to
> conflict with recovery"),
>                                          errdetail_recovery_conflict()));
>                 }
> 
> The simplest thing to do seems to be to make the second one read
> "terminating connection because the database has been dropped".

The error text is already differentiated by
errdetail_recovery_conflict().

-- Simon Riggs           http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services

Re: Error code for "terminating connection due to conflict with recovery"

From

Robert Haas

Date:

07 February 2011, 22:22:23

On Mon, Feb 7, 2011 at 4:11 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> So I guess the only remaining issue is whether we should distinguish
>> the error message text, as well as the error codes.  Tom was in favor
>> of that upthread, and I am too.  Right now we have:
> The error text is already differentiated by
> errdetail_recovery_conflict().

OK.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company