Thread: Handling transaction failure due to concurrency errors

Handling transaction failure due to concurrency errors

From: Christopher BROWN
Date: 2 March 2018
Hello,

I have an application running on a server that, generally once per
hour, imports batches of data across several tables.  The INSERT and
UPDATE operations apply to several tables and are executed in a
specific, repeatable order, all within one big TRANSACTION_SERIALIZABLE
operation (one commit at the end, no partial commits or rollbacks).  It
almost always works without error.  The files are provided by an
external organisation, which is automatically notified by e-mail when
an error occurs; sometimes the files contain inconsistent data, so they
need to know when to fix this.

However, it sometimes fails even when there's nothing wrong with their
file, and the message indicates a transaction concurrency error.  I'm
having trouble isolating the error: the tables affected by the
transaction normally aren't modified by any other action (they are
usually only read by other processes to create transformed data in
other tables).  If I repeat the operation with the same input, it works
without error.  As everything is done in the same transaction, I can't
figure out what's failing.  I'm also unable to recreate the error in a
development environment, so I don't know how to implement a solution
that recognises this specific case and queues a retry attempt (I don't
want to retry in any other case, because I want to reject files with
bad data, so I need to be able to distinguish the different causes).

Here's an edited stack trace:

16:37:05.878 ERROR - [AbstractRecordLoader.java:295:]
PrjCRF-20180223-160501.txt:715-111595 (PrcPv)
java.sql.BatchUpdateException: L'élément du batch 18 634 insert into
mpf_operation_store (stime, ref_mpf_operation, ref_mpf_store) values
('2018-02-23 15:37:02.024000 +01:00:00',
 40066, 199) on conflict (ref_mpf_operation, ref_mpf_store) do update
set stime = excluded.stime a été annulé. Appeler getNextException pour
en connaître la cause.
[English: Batch entry 18 634 "insert into mpf_operation_store … do
update set stime = excluded.stime" was aborted. Call getNextException
to see the cause.]
        at org.postgresql.jdbc2.AbstractJdbc2Statement$BatchResultHandler.handleError(AbstractJdbc2Statement.java:2762)
        at org.postgresql.core.v3.QueryExecutorImpl$ErrorTrackingResultHandler.handleError(QueryExecutorImpl.java:362)
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1999)
        at org.postgresql.core.v3.QueryExecutorImpl.flushIfDeadlockRisk(QueryExecutorImpl.java:1180)
        at org.postgresql.core.v3.QueryExecutorImpl.sendQuery(QueryExecutorImpl.java:1201)
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:412)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.executeBatch(AbstractJdbc2Statement.java:2929)
16:37:05.878 ERROR - [AbstractRecordLoader.java:308:]
PrjCRF-20180223-160501.txt:715-111595 (PrcPv)
org.postgresql.util.PSQLException: ERREUR: n'a pas pu sérialiser un
accès à cause des dépendances de lecture/écriture
parmi les transactions
  Détail : Reason code: Canceled on identification as a pivot, during
conflict in checking.
  Indice : La transaction pourrait réussir après une nouvelle tentative.
[English: ERROR: could not serialize access due to read/write
dependencies among transactions.  Detail: Reason code: Canceled on
identification as a pivot, during conflict in checking.  Hint: The
transaction might succeed if retried.]
        at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2270)
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1998)
        at org.postgresql.core.v3.QueryExecutorImpl.flushIfDeadlockRisk(QueryExecutorImpl.java:1180)
        at org.postgresql.core.v3.QueryExecutorImpl.sendQuery(QueryExecutorImpl.java:1201)
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:412)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.executeBatch(AbstractJdbc2Statement.java:2929)

Is there some attribute of the SQLException that I can reliably use to
detect this specific error case?  I'd rather not resort to parsing the
error message if at all possible; as you can see in the stack trace,
the text from the production server is part English, part French (the
production server's locale is French, the development server's is
English).

Any suggestions would be much appreciated.

Thanks,
Christopher


Re: Handling transaction failure due to concurrency errors

From: Tom Lane
Date: 2 March 2018, 16:10
Christopher BROWN <brown@reflexe.fr> writes:
> I have an application running on a server that, generally once per
> hour, imports batches of data across several tables.  The INSERT and
> UPDATE operations apply to several tables and are executed in a
> specific, repeatable order, all within one big TRANSACTION_SERIALIZABLE
> operation (one commit at the end, no partial commits or rollbacks).  It
> almost always works without error.  The files are provided by an
> external organisation, which is automatically notified by e-mail when
> an error occurs; sometimes the files contain inconsistent data, so they
> need to know when to fix this.

> However, it sometimes fails even when there's nothing wrong with their
> file, and the message indicates a transaction concurrency error.  I'm
> having trouble isolating the error: the tables affected by the
> transaction normally aren't modified by any other action (they are
> usually only read by other processes to create transformed data in
> other tables).

The short answer, most likely, is that the interlocks that check
for serializability violations are approximate in the conservative
direction, and will sometimes throw a serializability error even when
the changes made by two concurrent transactions didn't really conflict.

Basically, if you use SERIALIZABLE mode, you *must* be prepared to retry
on serialization failure, and you shouldn't really question too hard
why you got such an error.  It's the price you pay for not having to
analyze the concurrent behavior of your application logic in detail.

What you might consider is automatically retrying a few times, and only
kicking out an error for human consideration if the serializability
failure seems persistent.

            regards, tom lane
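[Editor's note: Tom's advice (bounded automatic retries, escalating to the existing e-mail alert only if the failure persists) could be sketched in Java as below. The class and method names are illustrative, not from the application under discussion, and the retryable-check is left pluggable because the thread has not yet pinned down the exact error code.]

```java
import java.sql.SQLException;
import java.util.concurrent.Callable;
import java.util.function.Predicate;

public class ImportRetry {
    // Runs the whole serializable import transaction up to maxAttempts times.
    // Non-retryable failures (e.g. bad input data) are rethrown immediately,
    // so the existing e-mail alert still fires for genuinely broken files.
    public static <T> T runWithRetries(Callable<T> transaction,
                                       Predicate<SQLException> retryable,
                                       int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        SQLException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return transaction.call();   // one full transaction per attempt
            } catch (SQLException e) {
                if (!retryable.test(e)) {
                    throw e;                 // bad data etc.: reject immediately
                }
                last = e;                    // concurrency error: try again
            }
        }
        throw last;                          // persistent failure: escalate
    }
}
```

Each attempt must re-run the whole serializable transaction from the start on a fresh transaction; a short randomised pause between attempts can also help the competing transactions interleave differently.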


Re: Handling transaction failure due to concurrency errors

From: Christopher BROWN
Date: 2 March 2018
Hi Tom,

Thanks for the quick reply.  OK for the explanation, and I don't mind implementing the retry logic for this case... I just don't know how to detect when my code encounters this case (as opposed to other cases that can arise, such as unresolved foreign keys when importing data; I don't want to get into an infinite retry loop because it will never work in these other cases).

For example, are there some specific values I can query for using SQLException.getErrorCode() or SQLException.getSQLState(), or perhaps I can check to see if the SQLException is an instance of a specific subclass?  I don't know where to start, because trial and error isn't an option: I'm unable to cause the same errors locally that I'm sometimes seeing on a live server.

Thanks,
Christopher




Re: Handling transaction failure due to concurrency errors

From: Tom Lane
Date: 2 March 2018, 16:21
Christopher BROWN <brown@reflexe.fr> writes:
> Thanks for the quick reply.  OK for the explanation, and I don't mind
> implementing the retry logic for this case... I just don't know how to
> detect when my code encounters this case (as opposed to other cases that
> can arise, such as unresolved foreign keys when importing data; I don't
> want to get into an infinite retry loop because it will never work in these
> other cases).

Yes, you should only retry in this way for the specific case of a
serialization failure (SQLSTATE 40001).

            regards, tom lane
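[Editor's note: one wrinkle worth checking against the stack trace earlier in the thread. The exception actually caught there is a java.sql.BatchUpdateException, and for batch updates the SQLSTATE of interest can sit on a chained exception rather than on the outer one (hence the driver's "call getNextException" hint). A defensive check, sketched here as an illustration, can walk the getNextException() chain.]

```java
import java.sql.SQLException;

public class SqlStateCheck {
    private static final String SERIALIZATION_FAILURE = "40001";

    // Returns true if the exception, or anything chained behind it via
    // getNextException(), carries SQLSTATE 40001 (serialization_failure).
    public static boolean isSerializationFailure(SQLException e) {
        for (SQLException cur = e; cur != null; cur = cur.getNextException()) {
            if (SERIALIZATION_FAILURE.equals(cur.getSQLState())) {
                return true;
            }
        }
        return false;
    }
}
```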


Re: Handling transaction failure due to concurrency errors

From: Christopher BROWN
Date: 2 March 2018, 16:27
Tom,

That's what I was looking for, so thanks, I'll give it a go.

Best regards,
Christopher




Re: Handling transaction failure due to concurrency errors

From: jwhiting@redhat.com
Date: 8 March 2018, 11:59
Hi Christopher,
On a side note, you should look at the Byteman project; Byteman makes
fault injection for testing purposes very easy.

http://byteman.jboss.org/

In your case, Byteman gets round the effort involved in precisely
re-creating a failure scenario, so you can test when your retry logic
kicks in (or not).

Jeremy
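[Editor's note: as an illustration only, a Byteman rule to force this failure path might look something like the following. The class and method names are guesses taken from the stack trace earlier in the thread, and the syntax should be checked against the Byteman documentation before use.]

```text
RULE simulate serialization failure
CLASS org.postgresql.jdbc2.AbstractJdbc2Statement
METHOD executeBatch
AT ENTRY
IF TRUE
DO throw new java.sql.SQLException("simulated serialization failure", "40001")
ENDRULE
```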


Re: Handling transaction failure due to concurrency errors

From: Christopher BROWN
Date: 8 March 2018
Hi Jeremy,

Thanks for the suggestion, trying out Byteman was already on my TODO list for other cases, and it could be handy here too.

Thanks,

Christopher



