Thread: PgBackRest fails due to filesystem full

PgBackRest fails due to filesystem full

From
KK CHN
Date:
List, 

I am running pgBackRest 2.52.1 on RHEL 9.3 with EDB 16, backing up to a remote repo server. Everything was working fine, and backups were taken daily by a cron job.

But the other day the pgbackrest backup failed because the / partition was 100% full. That is how I came to know that the daily backup script run from cron was no longer working. I made space in the / file system by removing a few log files from /var/pgbackrest/DBCluster1.

I then tried to run the backup again (after deleting some log files from /var, / now has 50% free space), but after running for 2 or 3 minutes pgbackrest fails as follows.


[root@dbtest log]# sudo -u postgres pgbackrest --stanza=DBCluster1_Repo --type=full backup
2025-04-07 14:29:36.171 P00   INFO: backup command begin 2.52.1: --delta --exec-id=4175219-0893aa9e --log-level-console=info --log-level-file=debug --pg1-host=10.x.0.y --pg1-host-user=enterprisedb --pg1-path=/data/edb/as16/data --pg-version-force=16 --process-max=5 --repo1-block --repo1-bundle --repo1-cipher-pass=<redacted> --repo1-cipher-type=aes-256-cbc --repo1-path=/data/DB_BKUPS --repo1-retention-diff=6 --repo1-retention-full=3 --stanza=DBCluster1_Repo  --start-fast --type=full
2025-04-07 14:29:40.007 P00   INFO: execute non-exclusive backup start: backup begins after the requested immediate checkpoint completes
2025-04-07 14:29:41.383 P00   INFO: backup start archive = 00000001000001EB0000004C, lsn = 1EB/4C0003D8
2025-04-07 14:29:41.383 P00   INFO: check archive for prior segment 00000001000001EB0000004B
ERROR: [082]: WAL segment 00000001000001EB0000004B was not archived before the 60000ms timeout
       HINT: check the archive_command to ensure that all options are correct (especially --stanza).
       HINT: check the PostgreSQL server log for errors.
       HINT: run the 'start' command if the stanza was previously stopped.

I ran the backup script again, but each time it fails with the same error, each time naming a new WAL segment:

2025-04-07 14:30:41.383 P00   INFO: backup command end: aborted with exception [082]

2025-04-07 14:33:03.382 P00   INFO: check archive for prior segment 00000001000001EB0000004D
ERROR: [082]: WAL segment 00000001000001EB0000004D was not archived before the 60000ms timeout
       HINT: check the archive_command to ensure that all options are correct (especially --stanza).
       HINT: check the PostgreSQL server log for errors.
       HINT: run the 'start' command if the stanza was previously stopped.

2025-04-07 14:34:03.382 P00   INFO: backup command end: aborted with exception [082]



This may be because WAL segments from the DB server could not be archived while the file system on the repo server was full, which I only noticed two days later.

Any hints on how I can rectify this issue and get pgbackrest working again?

How can I ensure the consistency of the backups and WAL files, given that some WAL files may be missing from the period when the repo server file system was full?
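
For reference when looking for gaps: the segment names in the errors above follow a fixed pattern of 8 hex digits of timeline, 8 of log number and 8 of segment number. The following is a minimal Python sketch, not part of pgBackRest, that computes the prior segment (as in the "check archive for prior segment" log line) and lists gaps in a set of names. It assumes the default 16 MB WAL segment size, and the function names are made up for illustration.

```python
# Sketch: WAL segment name arithmetic, assuming the default 16 MB segment
# size (256 segments per 4 GB log number). Function names are illustrative.

SEGMENTS_PER_LOG = 0x100  # assumption: wal_segment_size is the 16 MB default


def parse_segment(name: str) -> tuple[int, int]:
    """Split a 24-hex-digit name into (timeline, linear segment number)."""
    if len(name) != 24:
        raise ValueError(f"not a WAL segment name: {name!r}")
    timeline = int(name[0:8], 16)
    log = int(name[8:16], 16)
    seg = int(name[16:24], 16)
    return timeline, log * SEGMENTS_PER_LOG + seg


def format_segment(timeline: int, number: int) -> str:
    """Inverse of parse_segment."""
    log, seg = divmod(number, SEGMENTS_PER_LOG)
    return f"{timeline:08X}{log:08X}{seg:08X}"


def prior_segment(name: str) -> str:
    """Name of the segment immediately before the given one."""
    timeline, number = parse_segment(name)
    return format_segment(timeline, number - 1)


def missing_segments(names: list[str]) -> list[str]:
    """Names absent between the lowest and highest segment of one timeline."""
    parsed = [parse_segment(n) for n in names]
    timelines = {t for t, _ in parsed}
    if len(timelines) != 1:
        raise ValueError("expected segments from exactly one timeline")
    timeline = timelines.pop()
    present = {number for _, number in parsed}
    return [
        format_segment(timeline, n)
        for n in range(min(present), max(present) + 1)
        if n not in present
    ]
```

With the start segment from the log above, prior_segment("00000001000001EB0000004C") gives 00000001000001EB0000004B, the segment the backup was waiting for. Archived files in the repo carry a checksum and compression suffix after the 24-character name, so that suffix would need to be trimmed before passing names to missing_segments.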



Thanks in advance
Krishane





Re: PgBackRest fails due to filesystem full

From
Greg Sabino Mullane
Date:
On Mon, Apr 7, 2025 at 5:32 AM KK CHN <kkchn.in@gmail.com> wrote:
ERROR: [082]: WAL segment 00000001000001EB0000004B was not archived before the 60000ms timeout

This is the part you need to focus on. Look at your Postgres logs and find out why the archiver is failing. You can also test this without trying a whole backup by using the "check" command: https://pgbackrest.org/command.html#command-check
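
One way to see whether the archiver is currently failing, besides reading the server log, is the pg_stat_archiver statistics view. The sketch below is only an illustration under assumptions: it presumes the EDB server exposes the standard PostgreSQL view, the query would be run separately (for example with psql), and the helper name archiver_is_failing is made up here.

```python
from datetime import datetime
from typing import Optional

# Query to run on the database server; pg_stat_archiver is a standard
# PostgreSQL statistics view (assumed to be available on EDB as well).
ARCHIVER_QUERY = """
SELECT last_archived_wal, last_archived_time,
       last_failed_wal, last_failed_time, failed_count
FROM pg_stat_archiver;
"""


def archiver_is_failing(last_archived_time: Optional[datetime],
                        last_failed_time: Optional[datetime]) -> bool:
    """True when the most recent recorded archive attempt was a failure."""
    if last_failed_time is None:
        return False  # no failure has been recorded
    if last_archived_time is None:
        return True   # failures recorded, nothing archived yet
    return last_failed_time > last_archived_time
```

If last_failed_time is newer than last_archived_time, the value of last_failed_wal names the segment to look for in the server log.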
 
Cheers,
Greg

--
Enterprise Postgres Software Products & Tech Support

Re: PgBackRest fails due to filesystem full

From
KK CHN
Date:


On Tue, Apr 8, 2025 at 10:28 PM Greg Sabino Mullane <htamfids@gmail.com> wrote:
On Mon, Apr 7, 2025 at 5:32 AM KK CHN <kkchn.in@gmail.com> wrote:
ERROR: [082]: WAL segment 00000001000001EB0000004B was not archived before the 60000ms timeout

This is the part you need to focus on. Look at your Postgres logs and find out why the archiver is failing. You can also test this without trying a whole backup by using the "check" command: https://pgbackrest.org/command.html#command-check

I ran the check and it reports success:

[root@dbtest ~]# sudo -u postgres pgbackrest --stanza=DBCluster1_Repo  --log-level-console=info check 

[root@dbtest ~]# 2025-04-09 10:52:26.148 P00   INFO: check command begin 2.52.1: --exec-id=384808-715e8496 --log-level-console=info --log-level-file=debug --pg1-host=10.x.x.x   --pg1-host-user=enterprisedb --pg1-path=/data/edb/as16/data --pg-version-force=16 --repo1-cipher-pass=<redacted> --repo1-cipher-type=aes-256-cbc --repo1-path=/data/DB_BKUPS --stanza=DBCluster1_Repo
2025-04-09 10:52:30.502 P00   INFO: check repo1 configuration (primary)
2025-04-09 10:52:31.003 P00   INFO: check repo1 archive for WAL (primary)
2025-04-09 10:52:36.305 P00   INFO: WAL segment 00000001000001ED00000017 successfully archived to '/data/DB_BKUPS/archive/DBCluster1_Repo/16-1/00000001000001ED/00000001000001ED00000017-8609407e8b9a1827a9d9b3e170dcc53e7af46bac.gz' on repo1
2025-04-09 10:52:36.721 P00   INFO: check command end: completed successfully (10575ms)




Then, to test whether pgbackrest works, I ran:
[root@dbtest ~]# sudo -u postgres pgbackrest --stanza=DBCluster1_Repo --type=diff backup

It says:

2025-04-09 10:53:52.521 P00   INFO: backup '20250407-150858F' cannot be resumed: resume only valid for full backup
^C2025-04-09 10:54:03.351 P00   INFO: backup command end: terminated on signal [SIGINT]

But the info command (# sudo -u postgres pgbackrest --stanza=DBCluster1_Repo info) never shows that a backup 20250407-150858F exists. The existing full backups are 20250316-232631F and the two full backups prior to it.

Similarly, the last diff backup I have is 20250316-232631F_20250329-172215D, with only earlier diffs before it, and one incr backup, 20250316-232631F_20250330-083923I, with nothing later than that. So since 2025-03-30, all backups (full/diff/incr) have failed, since the / partition ran out of space.
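
The labels above encode the backup type and a timestamp: a full backup is YYYYMMDD-HHMMSSF, and a diff or incr label is the parent full label, an underscore, and a second timestamp ending in D or I. As a rough sketch for reading them (the helper names parse_label and latest_label are made up for illustration and are not pgBackRest commands):

```python
import re
from datetime import datetime

# Full:      YYYYMMDD-HHMMSSF
# Diff/incr: <full label>_YYYYMMDD-HHMMSS followed by D or I
LABEL_RE = re.compile(
    r"^(?P<full>\d{8}-\d{6})F(?:_(?P<child>\d{8}-\d{6})(?P<kind>[DI]))?$"
)
KINDS = {"D": "diff", "I": "incr"}


def parse_label(label: str) -> dict:
    """Return the type, parent full label and timestamp encoded in a label."""
    match = LABEL_RE.match(label)
    if match is None:
        raise ValueError(f"unrecognised backup label: {label!r}")
    stamp = match.group("child") or match.group("full")
    return {
        "type": KINDS.get(match.group("kind"), "full"),
        "full": match.group("full") + "F",
        "time": datetime.strptime(stamp, "%Y%m%d-%H%M%S"),
    }


def latest_label(labels: list[str]) -> str:
    """Label with the most recent embedded timestamp."""
    return max(labels, key=lambda label: parse_label(label)["time"])
```

Applied to the three labels mentioned above, latest_label picks 20250316-232631F_20250330-083923I, which agrees with 2025-03-30 being the last date on which a backup completed.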

Nothing else is reported by the info command.


How can I get pgbackrest back to taking backups normally? [If WAL files are missing, can we never take full/diff/incr backups again? What is the workaround or solution for this situation?]

Any hints are much appreciated.

Krishane
 
 

Re: PgBackRest fails due to filesystem full

From
Ron Johnson
Date:
Try creating a new stanza, and doing a full backup from it.




--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!