Thread: PostgreSql: Canceled on conflict out to old pivot
Hi
We have a PostgreSQL 15 server serving around 30 databases, each with a single schema of the same layout. Each database is used by one application instance. The application consistently uses transactions with isolation level serializable to access the database, optimizing by using explicit read-only transactions where applicable. Once the server reaches 100% we get an increased number of serialization conflict errors. This is expected, due to more concurrent access. But I fail to explain this kind of error:
ERROR: could not serialize access due to read/write dependencies among transactions
Detail: Reason code: Canceled on identification as a pivot, with conflict out to old committed transaction 61866959.
There is a variation of the error:
PSQLException: ERROR: could not serialize access due to read/write dependencies among transactions
Detail: Reason code: Canceled on conflict out to old pivot 61940806.
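For context, this is roughly how the application opens its transactions (an illustrative sketch only; the real code goes through JDBC, and the table name "booking" is made up):

  -- read/write path
  BEGIN ISOLATION LEVEL SERIALIZABLE;
  UPDATE booking SET state = 'confirmed' WHERE id = 42;
  COMMIT;

  -- read-only path, declared explicitly as an optimization
  BEGIN ISOLATION LEVEL SERIALIZABLE READ ONLY;
  SELECT count(*) FROM booking WHERE state = 'confirmed';
  COMMIT;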
We're logging the id, begin, and end of every transaction. Transaction 61940806 was committed without errors. The transaction responsible for the above error was started 40 min later (and failed immediately). With 61866959 it is even more extreme: the first conflict error occurred 2.5 h after 61866959 was committed.
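(The id we log is taken at the start of each transaction, roughly like this; sketch only, the begin/end timestamps are recorded in the application layer:)

  -- pg_current_xact_id() assigns and returns the current transaction's id,
  -- which we write to our application log
  SELECT pg_current_xact_id();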
The DB table access pattern is too complex to lay out here. There are around 20 tables that are read from and written to. Transactions are usually short-lived; the longest transaction that could occur is about 1 min long. My understanding of serializable isolation is that only overlapping transactions can conflict. I can be pretty sure that in the above cases there is no single transaction that overlaps both 61940806 and the failing transaction 40 min later. Such long-running transactions would cause different types of errors in our system ("out of shared memory", "You might need to increase max_pred_locks_per_transaction").
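(To double-check that, we look for long-running transactions with a query along these lines; the 2 minute threshold is arbitrary:)

  -- sessions that have held a transaction open for more than 2 minutes
  SELECT pid, datname, xact_start, state, left(query, 60) AS query
  FROM pg_stat_activity
  WHERE xact_start < now() - interval '2 minutes'
  ORDER BY xact_start;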
Why does PostgreSQL detect a conflict with a transaction that was committed more than 1 h before? Can there be a long dependency chain between many short-running transactions? Does the high load prevent Postgres from doing some cleanup?
Cheers,
Eduard
On Mon, Nov 27, 2023 at 2:17 AM Wirch, Eduard <eduard.w@smart-host.com> wrote:
Hi. We have a PostgreSQL 15 server serving around 30 databases, each with a single schema of the same layout. Each database is used by one application instance. The application consistently uses transactions with isolation level serializable to access the database, optimizing by using explicit read-only transactions where applicable. Once the server reaches 100% we get an increased number of serialization conflict errors.
Maybe I haven't had my coffee yet, but 100% of what?