Thread: BUG #18025: Probably we need to change behaviour of the checkpoint failures in PG
BUG #18025: Probably we need to change behaviour of the checkpoint failures in PG
From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      18025
Logged by:          Kishor Hargude
Email address:      hargudekishor@gmail.com
PostgreSQL version: 14.8
Operating system:   RHEL

Description:

Hello,

I have just witnessed a data loss scenario.

Checkpoint operations had been failing on the DB server for the last 8 hours,
which means no successful checkpoint had happened during that time. The server
then crashed because the disk space was exhausted, and it did not come back up
during crash recovery. The victim had moved a few WAL files from pg_wal to
another location, and moving those WAL files back to their original location
did not solve the problem: the server was unable to find a valid checkpoint
record. The victim did not have a backup that could be used, together with the
available archived WALs, to recover the data. So the only option left was
pg_resetwal. We tried every possible solution, but nothing worked, so we were
left with no choice other than pg_resetwal.

The victim had to opt for pg_resetwal, but the cost of that option was 8 hours
of data loss, since the last successful checkpoint had happened 8 hours
earlier. No other option was available.

The scenario above can happen to many PostgreSQL customers, because customers
often do not bother to monitor the DB logs on a daily or hourly basis for
errors/FATALs/PANICs (for events like checkpoint or archiving failures).
Because of that lack of log monitoring, or because of an improper backup and
recovery strategy, any customer can end up in the scenario described above,
which results in data loss.

In my opinion, to limit the data loss or damage in such a scenario, we should
think about changing the behaviour on checkpoint failure. Right now we just
log an error when a checkpoint fails and keep accepting write transactions on
the DB server, which keep generating WAL. Since failing checkpoints cannot
recycle or remove unneeded WAL files from pg_wal, the WAL keeps accumulating
and eventually fills the disk, the server crashes, and the scenario above can
follow.

Here is my thought on limiting the data loss in this situation: a failure of
the checkpoint operation can lead to data loss of several hours or days.
Customers will not accept their mistakes, but will blame PostgreSQL. I think
we can improve the behaviour of PostgreSQL when checkpoints keep failing. As
soon as PostgreSQL detects a checkpoint failure that cannot be resolved, it
should stop accepting write transactions and allow only read-only
transactions. This would give the customer the opportunity to export the
databases from the problematic cluster and build a new one, and it would
definitely limit the loss of data for any customer who runs into the scenario
above. It would also act as an immediate alarm for the customer to fix the
issue if it is fixable, or otherwise to go through the recovery steps. Having
downtime is always better than losing data.

Thank you.
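As an illustration of how the situation could be caught before the disk fills
up, here is a minimal monitoring sketch (it assumes a role allowed to call
pg_control_checkpoint() and pg_ls_waldir(); the exact alerting thresholds are
left open):

    -- Time since the last completed checkpoint; a value far beyond
    -- checkpoint_timeout means checkpoints are no longer completing.
    SELECT now() - checkpoint_time AS time_since_last_checkpoint
    FROM pg_control_checkpoint();

    -- Total WAL currently kept in pg_wal; compare this against
    -- max_wal_size and the free space on the partition holding pg_wal.
    SELECT pg_size_pretty(sum(size)) AS wal_on_disk
    FROM pg_ls_waldir();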
Re: BUG #18025: Probably we need to change behaviour of the checkpoint failures in PG
From
Laurenz Albe
Date:
On Mon, 2023-07-17 at 05:03 +0000, PG Bug reporting form wrote:
> I have just witnessed a data loss scenario.
>
> Checkpoint operations had been failing on the DB server for the last 8
> hours, which means no successful checkpoint had happened during that time.
> The server then crashed because the disk space was exhausted, and it did
> not come back up during crash recovery.

Mistake #1: you did not monitor disk space.

> The victim had moved a few WAL files from pg_wal to another location, and
> moving those WAL files back to their original location did not solve the
> problem.

Mistake #2: manually messing with the database directory.

> The server was unable to find a valid checkpoint record. The victim did
> not have a backup that could be used, together with the available archived
> WALs, to recover the data.

Mistake #0: no backup.

> So the only option left was pg_resetwal. We tried every possible solution,
> but nothing worked, so we were left with no choice other than pg_resetwal.

Mistake #3: running pg_resetwal.

"We tried every possible solution" sounds a bit like "we tried all the
haphazard things that came to our mind".

Sorry, this is not a bug, this is pilot error.

If PostgreSQL crashes because "pg_wal" runs out of disk space, you increase
the disk space, start PostgreSQL and let it complete crash recovery. It is
as simple as that.

Yours,
Laurenz Albe
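Once enough disk space has been added and the server has completed crash
recovery and is accepting connections again, a quick sanity check might look
like this (forcing a checkpoint needs superuser on 14.x, and doing so right
away is only a suggestion):

    -- Force an immediate checkpoint so that pg_wal can be trimmed back down.
    CHECKPOINT;

    -- Confirm that the control file now points at a recent checkpoint.
    SELECT checkpoint_time FROM pg_control_checkpoint();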
Re: BUG #18025: Probably we need to change behaviour of the checkpoint failures in PG
From
Michael Paquier
Date:
On Mon, Jul 17, 2023 at 09:53:32AM +0200, Laurenz Albe wrote:
> On Mon, 2023-07-17 at 05:03 +0000, PG Bug reporting form wrote:
>> Checkpoint operations had been failing on the DB server for the last 8
>> hours, which means no successful checkpoint had happened during that time.
>> The server then crashed because the disk space was exhausted, and it did
>> not come back up during crash recovery.
>
> Mistake #1: you did not monitor disk space.

max_wal_size is a very critical piece to adjust. It is usually recommended to
put pg_wal/ on its own partition, so that the space allocated for WAL records
is predictable across checkpoints. This is not a perfect science, as
max_wal_size is a soft limit, so one usually needs an extra margin on a WAL
partition. There have been some patches floating around to make that a hard
limit as well, but I don't think we've ever agreed on semantics that would be
acceptable when the authorized upper limit is reached.
--
Michael
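As a rough illustration for a 14.x server like the one reported here, the
checkpointer statistics show whether max_wal_size is sized sensibly for the
write load (the 8GB value below is only an example, not a recommendation):

    -- If checkpoints_req greatly outnumbers checkpoints_timed, checkpoints
    -- are being forced by WAL volume and max_wal_size is likely too small.
    SELECT checkpoints_timed, checkpoints_req FROM pg_stat_bgwriter;

    -- max_wal_size is a soft limit and can be changed with a reload:
    -- ALTER SYSTEM SET max_wal_size = '8GB';
    -- SELECT pg_reload_conf();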