On Mon, 2023-07-17 at 05:03 +0000, PG Bug reporting form wrote:
> I have just witnessed the data loss scenario.
>
> Scenario is like, there was checkpoint operation failures going on the DB
> server since last 8 hours which means no successful checkpoint happened in
> the DB server since last 8 hours. Then DB server went into the crash mode
> due to the exhausted disk space and did not came up as part of crash
> recovery.
Mistake #1: you did not monitor disk space.
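A minimal sketch of what such monitoring could look like, assuming a cron job or similar runs it regularly. The function names, the 20% threshold, and the idea of passing the "pg_wal" path on the command line are illustrative assumptions, not anything PostgreSQL ships:

```python
import shutil
import sys


def free_fraction(path):
    """Return the fraction of free space on the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def check_wal_space(path, min_free=0.20):
    """Return a warning string if free space is below `min_free`, else None.

    `min_free` is a hypothetical threshold; pick one that suits your WAL volume.
    """
    frac = free_fraction(path)
    if frac < min_free:
        return f"WARNING: only {frac:.0%} free on the filesystem of {path}"
    return None


if __name__ == "__main__":
    # Pass the real pg_wal directory as the first argument;
    # the current directory is only a placeholder default.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    message = check_wal_space(target)
    if message:
        print(message)
```

An alert from something like this, hours before the disk fills, leaves time to find out why checkpoints are failing or why WAL is not being removed.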
> Actually the victim had moved few WALs from the pg_wal to other
> location and reimporting those wal on original location also did not solved
> the problem.
Mistake #2: manually messing with the database directory.
> DB server was not able to find out the valid checkpoint record.
> The victim was not having the backup which he could use that backup to
> recover the data with the help of available archived WALs.
Mistake #0: no backup.
> So , the victim
> had only one option left in his hand that is pg_resetwal. We have tried
> every possible solution but did not worked so we did not left with more
> choices other than pg_Resetwal
Mistake #3: running pg_resetwal.
"We have tried every possible solution" sounds a bit like "we tried all the
haphazard things that came to our mind".
Sorry, this is not a bug, this is pilot error.
If PostgreSQL crashes because "pg_wal" runs out of disk space, you increase
the disk space, start PostgreSQL and let it complete crash recovery. It is
as simple as that.
Yours,
Laurenz Albe