Force the old transactions logs cleanup even if checkpoint is skipped - Mailing list pgsql-hackers

From Zakhlystov, Daniil (Nebius)
Subject Force the old transactions logs cleanup even if checkpoint is skipped
Msg-id AM9P190MB12346310F38B3FAF9287D1FFB5D6A@AM9P190MB1234.EURP190.PROD.OUTLOOK.COM
List pgsql-hackers
Hi, hackers!

I've stumbled into an interesting problem. Currently, if Postgres has nothing to write, it skips the checkpoint
that would otherwise be triggered by the checkpoint_timeout setting. However, we might face a temporary archiving
problem (for example, some network issues) that leaves a pile of WAL files stuck in pg_wal. After this temporary
issue is gone, the files get archived, but we are still unable to remove them, since we effectively skip the
checkpoint because we have nothing to write.

That might lead to a problem: suppose you've run out of disk space because of the temporary failure of the archiver.
After this temporary failure is gone, Postgres would be unable to recover from it automatically and would require
human attention to initiate a CHECKPOINT call.
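
As a stopgap, that manual step could be scripted. The following is only a minimal sketch and not part of any patch: the function names (count_done, run_checkpoint, checkpoint_if_backlog) and the threshold are hypothetical, and it assumes psql can connect with default settings. It counts the .done markers in pg_wal/archive_status and issues an explicit CHECKPOINT once too many have piled up.

```shell
#!/bin/bash
# Hypothetical stopgap: all names below are illustrative assumptions.

# Print the number of "<segment>.done" markers under <pgdata>/pg_wal/archive_status.
count_done() {
    local status_dir="$1/pg_wal/archive_status"
    find "$status_dir" -maxdepth 1 -type f -name '*.done' 2>/dev/null | wc -l | tr -d ' '
}

# Request an explicit checkpoint; kept separate so it can be replaced for dry runs.
run_checkpoint() {
    psql -c "checkpoint;"
}

# Run a checkpoint when more than <threshold> archived segments are still tracked.
checkpoint_if_backlog() {
    local pgdata="$1" threshold="$2" n
    n=$(count_done "$pgdata")
    if [ "$n" -gt "$threshold" ]; then
        run_checkpoint
    fi
}

# Example (e.g. from cron):
#   checkpoint_if_backlog /Users/usernamedt/test_archiver 16
```

This only automates the human intervention described above; it does not change the fact that the timed checkpoint is skipped on an idle system.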

I suggest changing this behavior by trying to clean up the old WAL even if we skip the main checkpoint routine. I've
attached the patch that does exactly that.

What do you think?

To reproduce the issue, you might repeat the following steps:

1. Init Postgres:
pg_ctl initdb -D /Users/usernamedt/test_archiver

2. Add the archiver script to simulate failure:
➜  ~ cat /Users/usernamedt/command.sh
#!/bin/bash

false

3. Then alter the PostgreSQL conf:

archive_mode = on
checkpoint_timeout = 30s
archive_command = '/Users/usernamedt/command.sh'
log_min_messages = debug1

4. Then start Postgres:
/usr/local/pgsql/bin/pg_ctl -D /Users/usernamedt/test_archiver -l logfile start

5. Insert some data:
pgbench -i -s 30 -d postgres

6. Trigger checkpoint to flush all data:
psql -c "checkpoint;"

7. Alter the archiver script to simulate the end of archiver issues:
➜  ~ cat /Users/usernamedt/command.sh
#!/bin/bash

true

8. Check that the WAL files are actually archived but not removed:
➜  ~ ls -lha /Users/usernamedt/test_archiver/pg_wal/archive_status | head
total 0
drwx------@ 48 usernamedt  LD\Domain Users   1.5K Oct 17 17:44 .
drwx------@ 50 usernamedt  LD\Domain Users   1.6K Oct 17 17:43 ..
-rw-------@  1 usernamedt  LD\Domain Users     0B Oct 17 17:42 000000010000000000000040.done
...
-rw-------@  1 usernamedt  LD\Domain Users     0B Oct 17 17:43 00000001000000000000006D.done

Meanwhile, the server log shows that the timed checkpoint keeps being skipped:
2023-10-17 18:03:44.621 +04 [71737] DEBUG:  checkpoint skipped because system is idle
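
The check in step 8 can be made a bit more explicit. Here is a rough sketch, assuming the standard layout where each archived segment has a "<segment>.done" marker in pg_wal/archive_status; the function name list_archived_but_present is made up for illustration. It prints the segments that are marked as archived while the segment file itself is still on disk.

```shell
#!/bin/bash
# Hypothetical helper: lists WAL segments that are archived (.done marker
# exists) but whose segment file is still present in pg_wal.
list_archived_but_present() {
    local wal_dir="$1/pg_wal"
    local marker seg
    for marker in "$wal_dir"/archive_status/*.done; do
        # If the glob matched nothing, the literal pattern is returned; skip it.
        [ -e "$marker" ] || continue
        seg=$(basename "$marker" .done)
        if [ -f "$wal_dir/$seg" ]; then
            echo "$seg"
        fi
    done
    return 0
}

# Example:
#   list_archived_but_present /Users/usernamedt/test_archiver | wc -l
```

A non-empty result that does not shrink while the log keeps printing the "checkpoint skipped" message would match the behavior described here.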

Thanks,

Daniil Zakhlystov
