Thread: Handling transaction failure due to concurrency errors
Hello, I have an application running on a server, that (once per hour, in general) imports batches of data across several tables. The INSERT and UPDATE operations apply to several tables, and are executed in a specific repeatable order, all within one big TRANSACTION_SERIALIZABLE operation (one commit at the end, no partial commits or rollbacks). It almost always works without error. The files are provided from an external organisation, and when an error occurs, they are automatically notified by e-mail. Sometimes there is inconsistent data in the files, so they need to know when to fix this. However, sometimes it fails even when there's nothing wrong with their file, and the message indicates a transaction concurrency error. I'm having trouble isolating the error; normally the tables affected by the transaction aren't affected by any other action (they are normally read by other processes to create transformed data in other tables). If I repeat the operation, with the same input, it works without error. As everything is done in the same transaction, I can't figure out what's failing. I'm unable to recreate the error in a development environment, so I don't really know how to implement a solution that recognises this specific case and could queue a re-try attempt (I don't want to re-try for any other case, because I want to reject files with bad data, so I need to figure out how to recognise different causes). Here's an edited stack trace: 16:37:05.878 ERROR - [AbstractRecordLoader.java:295:] PrjCRF-20180223-160501.txt:715-111595 (PrcPv) java.sql.BatchUpdateException: L'élément du batch 18 634 insert into mpf_operation_store (stime, ref_mpf_operation, ref_mpf_store) values ('2018-02-23 15:37:02.024000 +01:00:00', 40066, 199) on conflict (ref_mpf_operation, ref_mpf_store) do update set stime = excluded.stime a été annulé. Appeler getNextException pour en connaître la cause. at org.postgresql.jdbc2.AbstractJdbc2Statement$BatchResultHandler.handleError(AbstractJdbc2Statement.java:2762) at org.postgresql.core.v3.QueryExecutorImpl$ErrorTrackingResultHandler.handleError(QueryExecutorImpl.java:362) at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1999) at org.postgresql.core.v3.QueryExecutorImpl.flushIfDeadlockRisk(QueryExecutorImpl.java:1180) at org.postgresql.core.v3.QueryExecutorImpl.sendQuery(QueryExecutorImpl.java:1201) at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:412) at org.postgresql.jdbc2.AbstractJdbc2Statement.executeBatch(AbstractJdbc2Statement.java:2929) 16:37:05.878 ERROR - [AbstractRecordLoader.java:308:] PrjCRF-20180223-160501.txt:715-111595 (PrcPv) org.postgresql.util.PSQLException: ERREUR: n'a pas pu sérialiser un accès à cause des dépendances de lecture/écriture parmi les transactions Détail : Reason code: Canceled on identification as a pivot, during conflict in checking. Indice : La transaction pourrait réussir après une nouvelle tentative. at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2270) at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1998) at org.postgresql.core.v3.QueryExecutorImpl.flushIfDeadlockRisk(QueryExecutorImpl.java:1180) at org.postgresql.core.v3.QueryExecutorImpl.sendQuery(QueryExecutorImpl.java:1201) at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:412) at org.postgresql.jdbc2.AbstractJdbc2Statement.executeBatch(AbstractJdbc2Statement.java:2929) Is there some attribute of the SQLException that I can reliably use to detect this specific error case? I don't want to resort to parsing the error message if at all possible. As you can see in the stack trace, the text from the production server is part English, part French (the production server is in French, the development server is in English) Any suggestions would be much appreciated. Thanks, Christopher
Christopher BROWN <brown@reflexe.fr> writes: > I have an application running on a server, that (once per hour, in > general) imports batches of data across several tables. The INSERT > and UPDATE operations apply to several tables, and are executed in a > specific repeatable order, all within one big TRANSACTION_SERIALIZABLE > operation (one commit at the end, no partial commits or rollbacks). It > almost always works without error. The files are provided from an > external organisation, and when an error occurs, they are > automatically notified by e-mail. Sometimes there is inconsistent > data in the files, so they need to know when to fix this. > However, sometimes it fails even when there's nothing wrong with their > file, and the message indicates a transaction concurrency error. I'm > having trouble isolating the error; normally the tables affected by > the transaction aren't affected by any other action (they are normally > read by other processes to create transformed data in other tables). The short answer, most likely, is that the interlocks that check for serializability violations are approximate in the conservative direction, and will sometimes throw a serializability error even when the changes made by two concurrent transactions didn't really conflict. Basically, if you use SERIALIZABLE mode, you *must* be prepared to retry on serialization failure, and you shouldn't really question too hard why you got such an error. It's the price you pay for not having to analyze the concurrent behavior of your application logic in detail. What you might consider is automatically retrying a few times, and only kicking out an error for human consideration if the serializability failure seems persistent. regards, tom lane
Hi Tom,
Thanks for the quick reply. OK for the explanation, and I don't mind implementing the retry logic for this case... I just don't know how to detect when my code encounters this case (as opposed to other cases that can arise, such as unresolved foreign keys when importing data; I don't want to get into an infinite retry loop because it will never work in these other cases).
For example, are there some specific values I can query for using SQLException.getErrorCode() or SQLException.getSQLState(), or perhaps I can check to see if the SQLException is an instance of a specific subclass? I don't know where to start, because trial and error isn't an option: I'm unable to cause the same errors locally that I'm sometimes seeing on a live server.
Thanks
Christopher
On 2 March 2018 at 16:10, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Christopher BROWN <brown@reflexe.fr> writes:
> I have an application running on a server, that (once per hour, in
> general) imports batches of data across several tables. The INSERT
> and UPDATE operations apply to several tables, and are executed in a
> specific repeatable order, all within one big TRANSACTION_SERIALIZABLE
> operation (one commit at the end, no partial commits or rollbacks). It
> almost always works without error. The files are provided from an
> external organisation, and when an error occurs, they are
> automatically notified by e-mail. Sometimes there is inconsistent
> data in the files, so they need to know when to fix this.
> However, sometimes it fails even when there's nothing wrong with their
> file, and the message indicates a transaction concurrency error. I'm
> having trouble isolating the error; normally the tables affected by
> the transaction aren't affected by any other action (they are normally
> read by other processes to create transformed data in other tables).
The short answer, most likely, is that the interlocks that check
for serializability violations are approximate in the conservative
direction, and will sometimes throw a serializability error even when
the changes made by two concurrent transactions didn't really conflict.
Basically, if you use SERIALIZABLE mode, you *must* be prepared to retry
on serialization failure, and you shouldn't really question too hard
why you got such an error. It's the price you pay for not having to
analyze the concurrent behavior of your application logic in detail.
What you might consider is automatically retrying a few times, and only
kicking out an error for human consideration if the serializability
failure seems persistent.
regards, tom lane
Christopher BROWN <brown@reflexe.fr> writes: > Thanks for the quick reply. OK for the explanation, and I don't mind > implementing the retry logic for this case... I just don't know how to > detect when my code encounters this case (as opposed to other cases that > can arise, such as unresolved foreign keys when importing data; I don't > want to get into an infinite retry loop because it will never work in these > other cases). Yes, you should only retry in this way for the specific case of a serialization failure (SQLSTATE 40001). regards, tom lane
Tom,
That's what I was looking for, so thanks, I'll give it a go.
Best regards,
Christopher
On 2 March 2018 at 16:21, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Christopher BROWN <brown@reflexe.fr> writes:
> Thanks for the quick reply. OK for the explanation, and I don't mind
> implementing the retry logic for this case... I just don't know how to
> detect when my code encounters this case (as opposed to other cases that
> can arise, such as unresolved foreign keys when importing data; I don't
> want to get into an infinite retry loop because it will never work in these
> other cases).
Yes, you should only retry in this way for the specific case of a
serialization failure (SQLSTATE 40001).
regards, tom lane
Hi Christopher, On a side note you should look at the Byteman project. Byteman makes fault injection for testing purposes very easy. http://byteman.jboss.org/ In your case using Byteman gets round the effort involved to precisely re-create a failure scenario. Testing when your re-try logic kicks in (or not). Jeremy On Fri, 2018-03-02 at 16:27 +0100, Christopher BROWN wrote: > Tom, > > That's what I was looking for, so thanks, I'll give it a go. > > Best regards, > Christopher > > > > On 2 March 2018 at 16:21, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Christopher BROWN <brown@reflexe.fr> writes: > > > Thanks for the quick reply. OK for the explanation, and I don't > > mind > > > implementing the retry logic for this case... I just don't know > > how to > > > detect when my code encounters this case (as opposed to other > > cases that > > > can arise, such as unresolved foreign keys when importing data; I > > don't > > > want to get into an infinite retry loop because it will never > > work in these > > > other cases). > > > > Yes, you should only retry in this way for the specific case of a > > serialization failure (SQLSTATE 40001). > > > > regards, tom lane > >
Hi Jeremy,
Thanks for the suggestion, trying out Byteman was already on my TODO list for other cases, and it could be handy here too.
Thanks,
Christopher
On 8 March 2018 at 11:59, <jwhiting@redhat.com> wrote:
Hi Christopher,
On a side note you should look at the Byteman project. Byteman makes
fault injection for testing purposes very easy.
http://byteman.jboss.org/
In your case using Byteman gets round the effort involved to precisely
re-create a failure scenario. Testing when your re-try logic kicks in
(or not).
Jeremy
On Fri, 2018-03-02 at 16:27 +0100, Christopher BROWN wrote:
> Tom,
>
> That's what I was looking for, so thanks, I'll give it a go.
>
> Best regards,
> Christopher
>
>
>
> On 2 March 2018 at 16:21, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Christopher BROWN <brown@reflexe.fr> writes:
> > > Thanks for the quick reply. OK for the explanation, and I don't
> > mind
> > > implementing the retry logic for this case... I just don't know
> > how to
> > > detect when my code encounters this case (as opposed to other
> > cases that
> > > can arise, such as unresolved foreign keys when importing data; I
> > don't
> > > want to get into an infinite retry loop because it will never
> > work in these
> > > other cases).
> >
> > Yes, you should only retry in this way for the specific case of a
> > serialization failure (SQLSTATE 40001).
> >
> > regards, tom lane
>
>