Re: Documenting when to retry on serialization failure - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: Documenting when to retry on serialization failure |
Msg-id | CA+hUKG+PTRwdakDZ3hR263PJb8CcxmozuXCgKPKYgeVx-dOSAA@mail.gmail.com |
In response to | Documenting when to retry on serialization failure (Simon Riggs <simon.riggs@enterprisedb.com>) |
Responses | Re: Documenting when to retry on serialization failure |
List | pgsql-hackers |
On Fri, Dec 10, 2021 at 1:43 AM Simon Riggs <simon.riggs@enterprisedb.com> wrote:
> "Applications using this level must be prepared to retry transactions
> due to serialization failures."
> ...
> "When an application receives this error message, it should abort the
> current transaction and retry the whole transaction from the
> beginning."
>
> I note that the specific error codes this applies to are not
> documented, so let's discuss what the docs for that would look like.

+1 for naming the error.

> I had a conversation with Kevin Grittner about retry some years back
> and it seemed clear that the application should re-execute application
> logic from the beginning, rather than just slavishly re-execute the
> same SQL. But that is not documented either.

Right, the result of the first statement could cause the application to do something completely different the second time through.

I personally think the best way for applications to deal with this problem (and at least also deadlock, serialisation failure's pessimistic cousin) is to represent transactions as blocks of code that can be automatically retried, however that looks in your client language. It might be that you pass a function/closure/whatever-you-call-it to the transaction management code so it can rerun it if necessary, or that a function is decorated in some way that some magic infrastructure understands, but that's a little tricky to write about in a general enough way for our manual. (A survey of how this looks with various libraries and tools might make a neat conference talk, though.)

But isn't that exactly what that existing sentence "... from the beginning" is trying to say, especially with the following sentence ("The second time through...")? Hmm, yeah, perhaps that next sentence could be clearer.

> Is *automatic* retry possible? In all cases? None? Or maybe Some?
I'm aware of a couple of concrete cases that confound attempts to retry automatically: sometimes we report a unique constraint violation (UCV) or an exclusion constraint failure when we have the information required to diagnose a serialisation anomaly. In those cases, we really should figure out how to spit out SQLSTATE 40001 (serialization_failure) instead; otherwise, what is general-purpose auto-retry code supposed to do with a UCV? We fixed a single-index variant of this problem in commit fcff8a57. I have an idea for how this might be fixed for the multi-index UCV[1] and exclusion constraint[2] variants of the problem, but haven't actually tried yet. If there are other things that stand in the way of reliable automated retry (= a list of error codes a client library could look for), then I'd love to have a list of them.

> But what about the case of a single statement transaction? Can we just
> re-execute then? I guess if it didn't run anything other than
> IMMUTABLE functions then it should be OK, assuming the inputs
> themselves were immutable, which we've no way for the user to declare.
> Could we allow a user-defined auto_retry parameter?

I've wondered about that too, but so far it didn't seem worth the effort, since application developers need another solution for multi-statement retry anyway.

> We don't mention that a transaction might just repeatedly fail either.

According to the VLDB paper, the "safe retry" property (§ 5.4) means that a retry won't abort for the same reason (due to a cycle with the same set of other transactions as your last attempt), unless prepared transactions are involved (§ 7.1). This means that the whole system continues to make some kind of progress in the absence of two-phase commit (2PC), though of course your transaction might or might not fail because of a cycle with some other set of transactions.
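A minimal Python sketch of the closure-style retry described earlier in this reply, restricted to the error codes discussed above. The names `DatabaseError`, its `sqlstate` attribute, `run_transaction` and `RetriesExhausted` are hypothetical and not part of any PostgreSQL driver (psycopg2, for example, exposes the code as `pgcode`). Only 40001 and 40P01 are treated as retryable; unique and exclusion violations are re-raised because a generic wrapper cannot tell a disguised serialisation anomaly from a real conflict. The attempt cap and backoff values are arbitrary assumptions, chosen so that a transaction that keeps losing conflicts does not retry forever.

```python
import random
import time

# SQLSTATE codes from the PostgreSQL error-code appendix.
SERIALIZATION_FAILURE = "40001"  # serialization_failure
DEADLOCK_DETECTED = "40P01"      # deadlock_detected
UNIQUE_VIOLATION = "23505"       # unique_violation
EXCLUSION_VIOLATION = "23P01"    # exclusion_violation

# Only these are treated as "abort and rerun from the beginning".
RETRYABLE_SQLSTATES = frozenset({SERIALIZATION_FAILURE, DEADLOCK_DETECTED})


class DatabaseError(Exception):
    """Hypothetical stand-in for a driver exception that carries a SQLSTATE."""

    def __init__(self, sqlstate, message=""):
        super().__init__(message or f"SQLSTATE {sqlstate}")
        self.sqlstate = sqlstate


class RetriesExhausted(Exception):
    """Raised when every attempt failed with a retryable SQLSTATE."""


def run_transaction(txn_body, max_attempts=5, base_delay=0.01,
                    max_delay=1.0, sleep=time.sleep):
    """Call txn_body() and, on 40001/40P01, call it again from the start.

    txn_body is assumed to open, commit and (on error) roll back its own
    transaction. It should hold the application logic itself rather than a
    recorded list of SQL statements, so a retry can take a different path
    depending on what it reads this time.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return txn_body()
        except DatabaseError as err:
            # 23505 and 23P01 are deliberately re-raised: they can sometimes
            # hide a serialization anomaly, but a generic wrapper cannot tell
            # that apart from a genuine duplicate key or overlapping row.
            if err.sqlstate not in RETRYABLE_SQLSTATES:
                raise
            if attempt == max_attempts:
                raise RetriesExhausted(
                    f"gave up after {attempt} attempts") from err
            # Capped exponential backoff with full jitter, so a transaction
            # that keeps losing conflicts (for example against prepared
            # transactions) does not spin in a tight loop.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
    raise ValueError("max_attempts must be at least 1")


if __name__ == "__main__":
    calls = []

    def transfer():
        # Simulated body: loses two serialization conflicts, then commits.
        calls.append(len(calls) + 1)
        if len(calls) < 3:
            raise DatabaseError(SERIALIZATION_FAILURE)
        return "committed"

    print(run_transaction(transfer, sleep=lambda seconds: None))  # committed
    print(len(calls))  # 3
```

The `sleep` parameter is injectable so the loop can be exercised without real delays; a real client would pass a body that issues its own BEGIN/COMMIT through whatever driver it uses.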
Maybe that is too technical for our manual, which already provides the link to that paper, but it's interesting to note that with a naive retry-forever system you can get stuck in a busy-work loop until the conflicting prepared transactions go away.

[1] https://www.postgresql.org/message-id/flat/CAGPCyEZG76zjv7S31v_xPeLNRuzj-m%3DY2GOY7PEzu7vhB%3DyQog%40mail.gmail.com
[2] https://www.postgresql.org/message-id/flat/CAMTXbE-sq9JoihvG-ccC70jpjMr%2BDWmnYUj%2BVdnFRFSRuaaLZQ%40mail.gmail.com