BUG #18457: Possible data loss needs to be evaluated - Mailing list pgsql-bugs
From | PG Bug reporting form |
---|---|
Subject | BUG #18457: Possible data loss needs to be evaluated |
Date | |
Msg-id | 18457-dc62fafe74c67bdd@postgresql.org Whole thread Raw |
Responses |
Re: BUG #18457: Possible data loss needs to be evaluated
Re: BUG #18457: Possible data loss needs to be evaluated |
List | pgsql-bugs |
The following bug has been logged on the website: Bug reference: 18457 Logged by: Julius Ernesti Email address: julius.ernesti@kpi4me.de PostgreSQL version: 14.10 Operating system: Rocky Linux 9 Description: Our services connect to the database via libpq. It is ensured that connections are not used by two threads at the same time, with several physical connections to the database. Here's what happened: A very large transaction, which makes inserts into different tables, could not be executed due to a unique key constraint. Due to a not yet known problem, this transaction was not rolled back properly and the connection was in an invalid state. The Portgresql server log logged the following messages every second over a period of hours: 2024-05-05 22:15:27.461 CEST [2287043] ERROR: current transaction is aborted, commands ignored until end of transaction block 2024-05-05 22:15:27.461 CEST [2287043] STATEMENT: SAVEPOINT SP_264211 This is probably because a thread has continued to use the broken connection to execute transactions. I assume that this thread also received corresponding messages, which I cannot verify due to the lack of logs. Other processes were still able to communicate successfully with the database. Inserts were executed and generated rowids, which are generated via sequences, were also returned. In retrospect, I can no longer find these rows in the database. This means that inserts were apparently executed, but no data exists in the final result. The inserts correspond to the same tables as the failed transaction mentioned above. As a result, we had a data loss of 1 day's work, which unfortunately was only noticed afterwards. Assumption: I am sure that many unfavorable factors have led to a fatal error here. I think that bugs from different systems have overlapped. We have not managed to reproduce the invalid state described. I think that the transactions described were successfully executed in the database. As we only need to store data first and do not select it at that point, the error only became apparent so late. At the moment I assume that the command “STATEMENT: SAVEPOINT SP_264211” caused my data to be rolled back again without me noticing in the service. I am sure that there is a link to the failed transaction, that perhaps some tables were still locked and the rollback droped those inserted rows. My knowledge of postgres has reached its limit here. I would like to evaluate whether inserts from other connections, which have been successfully executed, can be rolled back by another connection via the statements 2024-05-05 22:15:27.461 CEST [2287043] ERROR: current transaction is aborted, commands ignored until end of transaction block 2024-05-05 22:15:27.461 CEST [2287043] STATEMENT: SAVEPOINT SP_264211 For the sake of simplicity, here is another example: There are two connections A and B Via A: insert into test (id, val) values (2, 'this will be a unique key constraint error'); --> The Rollback statement will not be executed due to an unknown error Via B: insert into test (id, val) values (4, 'no error'); --> Successful Transaction Via A: insert into test (id, val) values (5, 'no error, but the is currently a running transaction') --> Failed. Maybe a rollback is executed, I dont know Via B; select * from test where id = 4; --> Empty query We are still trying to find out how this could have happened. If the example described actually leads to successful transactions being canceled again, I would describe this as a bug in Postgres. Our codes have been running for 4 years and we have never seen an error like this. Best regards. Julius Ernesti
pgsql-bugs by date: