Re: [HACKERS] Determine if an error is transient by its error code. - Mailing list pgsql-hackers

From Craig Ringer
Subject Re: [HACKERS] Determine if an error is transient by its error code.
Msg-id CAMsr+YG3VkoV4P2SAqi16PeBMb8WBKi_FG7p0jFktz1rd4n1cg@mail.gmail.com
In response to [HACKERS] Determine if an error is transient by its error code.  ("Dominick O'Dierno" <odiernod@gmail.com>)
Responses Re: [HACKERS] Determine if an error is transient by its error code.  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 20 March 2017 at 10:26, Dominick O'Dierno <odiernod@gmail.com> wrote:
> Hello folks,
>
> I'm trying to define a transient fault detection strategy for a client
> application when calling a postgres database.
>
> Essentially I want to determine by the error code if it is worth retrying
> the call (transient) or if the error was due to a bad query or programmer
> error, in which case don't retry.
>
> Going through the codes as posted here
> https://www.postgresql.org/docs/9.6/static/errcodes-appendix.html I had a go
> at making a list of error codes which may be transient:
>
> 53000: insufficient_resources
> 53100: disk_full
> 53200: out_of_memory
> 53300: too_many_connections
> 53400: configuration_limit_exceeded
> 57000: operator_intervention
> 57014: query_canceled
> 57P01: admin_shutdown
> 57P02: crash_shutdown
> 57P03: cannot_connect_now
> 57P04: database_dropped
> 58000: system_error
> 58030: io_error

Depends on how transient you mean, really.

I/O error, disk full, cannot_connect_now, etc. may or may not require
admin intervention.

I would argue that database_dropped isn't transient. But I guess you
might be re-creating it?
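One way to act on "depends on how transient you mean" is to give each candidate code a retry budget instead of a yes/no answer. This is a minimal sketch, assuming the client has the five-character SQLSTATE as a string (psycopg2, for example, exposes it as `pgcode` on its exceptions). The set names, `retry_budget`, `should_retry` and the budget numbers are hypothetical choices for illustration. They follow the groupings above: database_dropped gets no retries, and the codes that may need admin intervention get only a couple before the caller should escalate to a human.

```python
# Candidate codes from the question (five-character SQLSTATEs).
CANDIDATES = {
    "53000", "53100", "53200", "53300", "53400",
    "57000", "57014", "57P01", "57P02", "57P03", "57P04",
    "58000", "58030",
}

# database_dropped: retrying won't bring the database back by itself.
NOT_TRANSIENT = {"57P04"}

# May or may not clear up without admin intervention.
MAY_NEED_ADMIN = {
    "53100",  # disk_full
    "58030",  # io_error
    "57P03",  # cannot_connect_now
}


def retry_budget(sqlstate: str, default: int = 5, admin_budget: int = 2) -> int:
    """Hypothetical helper: how many retries to allow before escalating.

    The default budgets are illustrative assumptions, not recommendations.
    """
    if sqlstate not in CANDIDATES or sqlstate in NOT_TRANSIENT:
        return 0  # unknown or permanent: surface the error immediately
    if sqlstate in MAY_NEED_ADMIN:
        return admin_budget  # try briefly, then alert a human
    return default


def should_retry(sqlstate: str, retries_so_far: int) -> bool:
    """True while the code's retry budget has not been used up."""
    return retries_so_far < retry_budget(sqlstate)
```

Unknown codes get a budget of zero, so a bad query or programmer error is never retried by accident; widening the candidate set is then a deliberate decision.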

> These next few I am not sure whether they should be treated as transient or
> not, but I am guessing so
>
> 55P03: lock_not_available

Yeah, I'd say that's transient.

> 55006: object_in_use

Same.

> 55000: object_not_in_prerequisite_state

Varies. This can be a bit of a catchall error, encompassing things
that need configuration changes, things that need system state changes
(won't work in recovery or whatever), and things that will change in a
short span of time.

In general you'll need several classes of retry:

* just reissue the query (deadlock retry, etc.)
* reconnect and retry

etc.
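The two retry classes can be sketched as a dispatch on SQLSTATE. This is a minimal sketch under several assumptions: `DBError` is a hypothetical stand-in for whatever exception your driver raises, `op` should run the whole transaction rather than a single statement (after an error PostgreSQL aborts the current transaction, so reissuing one statement inside it would just fail again), and the code groupings are illustrative. Besides the codes discussed in this thread, it treats 40001 serialization_failure and 40P01 deadlock_detected as "reissue" and all of class 08 (connection exceptions) as "reconnect".

```python
import enum
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class DBError(Exception):
    """Hypothetical stand-in for a driver exception carrying a SQLSTATE."""

    def __init__(self, sqlstate: str, message: str = "") -> None:
        super().__init__(message or sqlstate)
        self.sqlstate = sqlstate


class RetryClass(enum.Enum):
    REISSUE = "reissue"      # same connection, run the transaction again
    RECONNECT = "reconnect"  # open a fresh connection, then run it again
    NONE = "none"            # bad query, programmer error, or needs a human


# Assumed groupings for illustration only; tune them for your application.
REISSUE_CODES = {
    "40001",  # serialization_failure
    "40P01",  # deadlock_detected
    "55P03",  # lock_not_available
    "55006",  # object_in_use
}
RECONNECT_CODES = {
    "57P01",  # admin_shutdown
    "57P02",  # crash_shutdown
    "57P03",  # cannot_connect_now
}


def retry_class(sqlstate: str) -> RetryClass:
    if sqlstate in REISSUE_CODES:
        return RetryClass.REISSUE
    # Class 08 (first two characters) covers connection exceptions.
    if sqlstate in RECONNECT_CODES or sqlstate.startswith("08"):
        return RetryClass.RECONNECT
    return RetryClass.NONE


def run_with_retry(
    op: Callable[[], T],
    reconnect: Callable[[], None],
    max_attempts: int = 3,
    delay: float = 0.1,
) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except DBError as err:
            kind = retry_class(err.sqlstate)
            # Give up on non-retryable errors or when attempts run out.
            if kind is RetryClass.NONE or attempt == max_attempts:
                raise
            if kind is RetryClass.RECONNECT:
                reconnect()
            time.sleep(delay * 2 ** (attempt - 1))  # exponential backoff
    raise AssertionError("unreachable")
```

Anything that maps to `RetryClass.NONE`, including a catchall like 55000 object_not_in_prerequisite_state, is re-raised on the first failure, which errs on the side of not retrying when the cause is unclear.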

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services