Thread: PgBackRest fails due to filesystem full
List,
I am running pgBackRest 2.52.1 on RHEL 9.3 with EDB 16, backing up to a remote repo server. Everything was working fine, and backups were taken daily by a cron job.
But the / partition reached 100% utilization, and the pgbackrest backup failed the other day. That is how I found out that the daily cron-scheduled backup script had stopped working. I freed space on the / file system by removing a few log files from /var/pgbackrest/DBCluster1
I then tried to rerun the backup script (after deleting the log files from /var, / now has 50% free space), but after running for 2 or 3 minutes pgbackrest fails as follows.
[root@dbtest log]# sudo -u postgres pgbackrest --stanza=DBCluster1_Repo --type=full backup
2025-04-07 14:29:36.171 P00 INFO: backup command begin 2.52.1: --delta --exec-id=4175219-0893aa9e --log-level-console=info --log-level-file=debug --pg1-host=10.x.0.y --pg1-host-user=enterprisedb --pg1-path=/data/edb/as16/data --pg-version-force=16 --process-max=5 --repo1-block --repo1-bundle --repo1-cipher-pass=<redacted> --repo1-cipher-type=aes-256-cbc --repo1-path=/data/DB_BKUPS --repo1-retention-diff=6 --repo1-retention-full=3 --stanza=DBCluster1_Repo --start-fast --type=full
2025-04-07 14:29:40.007 P00 INFO: execute non-exclusive backup start: backup begins after the requested immediate checkpoint completes
2025-04-07 14:29:41.383 P00 INFO: backup start archive = 00000001000001EB0000004C, lsn = 1EB/4C0003D8
2025-04-07 14:29:41.383 P00 INFO: check archive for prior segment 00000001000001EB0000004B
ERROR: [082]: WAL segment 00000001000001EB0000004B was not archived before the 60000ms timeout
HINT: check the archive_command to ensure that all options are correct (especially --stanza).
HINT: check the PostgreSQL server log for errors.
HINT: run the 'start' command if the stanza was previously stopped.
2025-04-07 14:30:41.383 P00 INFO: backup command end: aborted with exception [082]
I ran the backup script again, but each time it fails with the same error (each time naming a different WAL segment):
2025-04-07 14:33:03.382 P00 INFO: check archive for prior segment 00000001000001EB0000004D
ERROR: [082]: WAL segment 00000001000001EB0000004D was not archived before the 60000ms timeout
HINT: check the archive_command to ensure that all options are correct (especially --stanza).
HINT: check the PostgreSQL server log for errors.
HINT: run the 'start' command if the stanza was previously stopped.
2025-04-07 14:34:03.382 P00 INFO: backup command end: aborted with exception [082]
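For reference, the "prior segment" in each error is the WAL segment immediately before the backup's start segment (4B before 4C in the first run above). A minimal sketch of that arithmetic, assuming the default 16MB wal_segment_size (256 segments per log file); the function name prev_wal_segment is made up for illustration and is not part of pgbackrest:

```shell
# Sketch: print the WAL segment name that precedes the given one.
# Assumption: default 16MB wal_segment_size, so the last 8 hex digits
# run from 00000000 to 000000FF before the middle 8 digits increment.
prev_wal_segment() {
    name=$1
    tli=$(printf '%s' "$name" | cut -c1-8)    # timeline ID
    log=$(printf '%s' "$name" | cut -c9-16)   # log file number
    seg=$(printf '%s' "$name" | cut -c17-24)  # segment within the log file
    log_n=$((0x$log))
    seg_n=$((0x$seg))
    if [ "$seg_n" -gt 0 ]; then
        seg_n=$((seg_n - 1))
    elif [ "$log_n" -gt 0 ]; then
        log_n=$((log_n - 1))
        seg_n=255                             # 0xFF, last segment of the previous log file
    else
        echo "no segment precedes $name" >&2
        return 1
    fi
    printf '%s%08X%08X\n' "$tli" "$log_n" "$seg_n"
}
```

For example, prev_wal_segment 00000001000001EB0000004C prints 00000001000001EB0000004B, the segment named in the first error.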
This may be because WAL segments from the DB server could not be archived while the file system on the repo server was full, which I only noticed 2 days later.
Any hints on how I can rectify this issue and get pgbackrest working again?
How can I ensure the consistency of the backups and WAL files, since WAL files may be missing from the period when the repo server's file system was full?
Thanks in advance
Krishane
On Mon, Apr 7, 2025 at 5:32 AM KK CHN <kkchn.in@gmail.com> wrote:
ERROR: [082]: WAL segment 00000001000001EB0000004B was not archived before the 60000ms timeout
This is the part you need to focus on. Look at your Postgres logs and find out why the archiver is failing. You can also test this without trying a whole backup by using the "check" command: https://pgbackrest.org/command.html#command-check
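One way to see what the archiver is doing, besides reading the server log, is the pg_stat_archiver view. A minimal sketch to run on the database host; the connection settings in PSQL_CMD (user enterprisedb, database postgres) are assumptions, so adjust them for the actual server:

```shell
# Sketch only: PSQL_CMD is an assumed connection command, not taken from the thread.
PSQL_CMD=${PSQL_CMD:-psql -X -U enterprisedb -d postgres}

# pg_stat_archiver reports the last archived WAL and the last failed attempt.
ARCHIVER_SQL="SELECT archived_count, last_archived_wal, last_archived_time,
       failed_count, last_failed_wal, last_failed_time
  FROM pg_stat_archiver;"

# Defined here but not invoked: call archiver_status on the database host.
archiver_status() {
    $PSQL_CMD -c "$ARCHIVER_SQL"
}
```

If failed_count keeps rising, or last_failed_time is newer than last_archived_time, the archive_command is still failing and the server log should say why.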
Cheers,
Greg
--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support
On Tue, Apr 8, 2025 at 10:28 PM Greg Sabino Mullane <htamfids@gmail.com> wrote:
On Mon, Apr 7, 2025 at 5:32 AM KK CHN <kkchn.in@gmail.com> wrote:
ERROR: [082]: WAL segment 00000001000001EB0000004B was not archived before the 60000ms timeout
This is the part you need to focus on. Look at your Postgres logs and find out why the archiver is failing. You can also test this without trying a whole backup by using the "check" command: https://pgbackrest.org/command.html#command-check
I ran the check and it reports success:
[root@dbtest ~]# sudo -u postgres pgbackrest --stanza=DBCluster1_Repo --log-level-console=info check
2025-04-09 10:52:26.148 P00 INFO: check command begin 2.52.1: --exec-id=384808-715e8496 --log-level-console=info --log-level-file=debug --pg1-host=10.x.x.x --pg1-host-user=enterprisedb --pg1-path=/data/edb/as16/data --pg-version-force=16 --repo1-cipher-pass=<redacted> --repo1-cipher-type=aes-256-cbc --repo1-path=/data/DB_BKUPS --stanza=DBCluster1_Repo
2025-04-09 10:52:30.502 P00 INFO: check repo1 configuration (primary)
2025-04-09 10:52:31.003 P00 INFO: check repo1 archive for WAL (primary)
2025-04-09 10:52:36.305 P00 INFO: WAL segment 00000001000001ED00000017 successfully archived to '/data/DB_BKUPS/archive/DBCluster1_Repo/16-1/00000001000001ED/00000001000001ED00000017-8609407e8b9a1827a9d9b3e170dcc53e7af46bac.gz' on repo1
2025-04-09 10:52:36.721 P00 INFO: check command end: completed successfully (10575ms)
Then, to test that pgbackrest works, I ran
[root@dbtest ~]# sudo -u postgres pgbackrest --stanza=DBCluster1_Repo --type=diff backup
It says
2025-04-09 10:53:52.521 P00 INFO: backup '20250407-150858F' cannot be resumed: resume only valid for full backup
^C2025-04-09 10:54:03.351 P00 INFO: backup command end: terminated on signal [SIGINT]
But the info command (sudo -u postgres pgbackrest --stanza=DBCluster1_Repo info) does not show any backup named 20250407-150858F. The existing full backups are 20250316-232631F and the two full backups before it.
Similarly, the latest diff backup is 20250316-232631F_20250329-172215D, with only older diffs before it, and the latest incr backup is 20250316-232631F_20250330-083923I, with nothing later. So since 2025-03-30 all full/diff/incr backups have failed (since the / partition ran out of space).
Nothing else is reported by the info command.
How can I get pgbackrest back to taking backups normally? [If WAL files are missing, can we never take full/diff/incr backups again? What is the workaround or solution for this situation?]
Any hints much appreciated.
Krishane
Try creating a new stanza, and doing a full backup from it.
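A rough sketch of what that sequence usually looks like. It only prints the commands rather than running them. The stanza name DBCluster1_Repo2 is hypothetical, and the sketch assumes a matching stanza section has already been added to pgbackrest.conf on both hosts and that archive_command on the database server has been changed to the new stanza name and reloaded:

```shell
# Sketch: prints (does not run) a typical sequence for a fresh stanza.
# NEW_STANZA is a hypothetical name, not one used in this thread.
NEW_STANZA=${NEW_STANZA:-DBCluster1_Repo2}

new_stanza_plan() {
    # 1. Initialise the new stanza in the repository.
    echo "sudo -u postgres pgbackrest --stanza=$NEW_STANZA stanza-create"
    # 2. Confirm that WAL archiving reaches the new stanza.
    echo "sudo -u postgres pgbackrest --stanza=$NEW_STANZA check"
    # 3. Take a fresh full backup to start a new backup chain.
    echo "sudo -u postgres pgbackrest --stanza=$NEW_STANZA --type=full backup"
}
```

Running new_stanza_plan lists the three commands so they can be reviewed before anything is executed. The old stanza and its backups stay in the repository until they are removed separately.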
On Wed, Apr 9, 2025 at 1:49 AM KK CHN <kkchn.in@gmail.com> wrote:
How can I proceed to bring pgbackrest back to take backups to normal?
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!