Thread: Running rsync backups in pg15

Running rsync backups in pg15

From
Murthy Nunna
Date:

Hi,

In PG14 and earlier, there is no requirement to keep a database connection open while rsync is in progress. However, PG15+ requires rsync to run while the same database session that executed SELECT pg_backup_start('label') is still open. This change requires a rewrite of our existing scripts.

Currently (pg14), in a bash script run from cron:

  1. psql: SELECT pg_start_backup('label');
  2. rsync
  3. psql: SELECT pg_stop_backup();

In pg15 and later, in a bash script run from cron:

  psql
    SELECT pg_backup_start('label');
    \! run-rsync-script
    SELECT pg_backup_stop();

It can be done, but it makes it ugly to check for errors and so forth that occur in the rsync script.

Has anybody found an elegant way of doing this?

Thank you in advance for your ideas.

Re: Running rsync backups in pg15

From
Ron Johnson
Date:
Run pgbackrest instead of rsync.

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

Re: Running rsync backups in pg15

From
Evan Rempel
Date:
We use a similar approach, but instead of rsync we use our backup software directly, which is an incremental-forever tool. It backs up TB-sized databases in just minutes. Switching to pgbackrest is actually a step backwards for us.

But as the OP states, if you have to keep the PostgreSQL session open between pg_start_backup and pg_stop_backup, then we will have to make a significant architectural change.

Does anyone know if there is a straightforward way to allow pg_start_backup and pg_stop_backup to be run in different sessions?


--
Evan



Re: Running rsync backups in pg15

From
Scott Ribe
Date:
> On Nov 7, 2024, at 10:47 AM, Evan Rempel <erempel@uvic.ca> wrote:
>
> Anyone know if there is a straight forward way to allows the pg_start_backup and the pg_stop_backup to be run in different sessions?

Would it be feasible for you to write a small tool which executes pg_start_backup, runs in the background, and on command executes pg_stop_backup? Could maybe be done with a shell script & pipes...
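Scott's background-process-plus-pipe idea can be sketched in bash. This is a minimal sketch under stated assumptions, not a finished tool. It assumes PostgreSQL 15+ (pg_backup_start()/pg_backup_stop()), connection settings from the usual libpq environment variables, and WAL archiving already configured as for the pg14 scripts. It also assumes no user-defined tablespaces, so the tablespace_map that pg_backup_stop() also returns is ignored. The name backup_with_session, the flag-file handshake and the exclude list are illustrative choices. A named pipe keeps one psql session open while rsync runs as an ordinary shell command, so its exit status can be checked directly.

```shell
#!/usr/bin/env bash
# Sketch, not a finished tool: keep ONE psql session open for the whole rsync
# (PostgreSQL 15+ non-exclusive backup). backup_with_session is a hypothetical name.
# Assumes libpq env vars (PGHOST, PGUSER, ...) and no user-defined tablespaces,
# so the tablespace_map returned by pg_backup_stop() is not written.

backup_with_session() (   # subshell body: the trap and fd 3 stay local
  src=$1 dest=$2 label=${3:-rsync-backup}
  work=$(mktemp -d) || exit 1
  trap 'rm -rf "$work"' EXIT
  mkfifo "$work/in" || exit 1

  # psql reads its commands from the FIFO; the session lives until fd 3 is closed.
  psql -X -q -t -A -v ON_ERROR_STOP=1 -v label="$label" \
    <"$work/in" >"$work/stop.out" &
  pid=$!
  exec 3>"$work/in"

  # psql touches the flag file only after pg_backup_start() has returned.
  cat >&3 <<EOF
SELECT pg_backup_start(:'label') \g /dev/null
\! touch '$work/started'
EOF
  until [ -e "$work/started" ]; do
    kill -0 "$pid" 2>/dev/null || { echo "pg_backup_start failed" >&2; exit 1; }
    sleep 1
  done

  # Plain shell error handling for rsync; exit code 24 ("some source files
  # vanished") is expected on a running cluster. fd 3 is not passed to rsync.
  rsync -a --delete \
    --exclude='/pg_wal/*' --exclude='/postmaster.pid' --exclude='/postmaster.opts' \
    --exclude='/pg_replslot/*' --exclude='/pg_stat_tmp/*' \
    "$src"/ "$dest"/ 3>&-
  rc=$?
  if [ "$rc" -ne 0 ] && [ "$rc" -ne 24 ]; then
    echo "rsync failed (exit $rc); ending the session aborts the backup" >&2
    exit 1
  fi

  # Stop the backup, then close the FIFO so psql reads EOF and exits.
  cat >&3 <<'EOF'
SELECT labelfile FROM pg_backup_stop();
EOF
  exec 3>&-
  wait "$pid" || { echo "pg_backup_stop failed" >&2; exit 1; }

  # psql prints the value plus one newline; $(<...) drops trailing newlines and
  # printf restores the single newline that ends the label text.
  printf '%s\n' "$(<"$work/stop.out")" >"$dest/backup_label"
)
```

Here is a usage example with hypothetical paths: cron would call `backup_with_session /var/lib/pgsql/15/data /backup/pgdata nightly` and check its exit status like any other command. If something fails before pg_backup_stop() is sent, the subshell exits and psql reads end-of-file and disconnects. PostgreSQL then aborts the non-exclusive backup, because the session that started it has ended. The label text returned by pg_backup_stop() has to land in the backup byte for byte, which is why the extra newline from psql is trimmed.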




Re: Running rsync backups in pg15

From
Ron Johnson
Date:
On Thu, Nov 7, 2024 at 12:47 PM Evan Rempel <erempel@uvic.ca> wrote:
> We use a similar approach, but instead of using rsync, we use our backup software directly which is an incremental forever tool. Allows backup of TB DBs in short minutes. Switching to pgbackrest is actually a step backwards for us.

Last night's pgbackrest incremental backup of a 5.1TB database took a whopping 92 seconds.  How's that a backwards step?

Sure, the weekly full backup takes 84 minutes, but that's in no way, shape, or form painfully slow.

Re: Running rsync backups in pg15

From
Evan Rempel
Date:
Using pgbackrest requires additional local disk space to stage the backup before my actual backup software performs the off-server backup.

Incremental-forever backups mean there is no need for a weekly full backup. I never have to do the 84 minutes of backup.

The same is true for restores. pgbackrest requires a two-stage restore (actually three if you include WAL replay): once from my actual backup system into the staging area, and then a second time to restore the pgbackrest files into the PostgreSQL data directory. There are even more steps with pgbackrest if I have to go back to the last full backup and roll forward.

What I am hearing from everyone is that there isn't a way within PostgreSQL to remove the requirement of keeping the session open. I will have to write scripts to handle a persistent connection for the duration of the backup 🙁


--
Evan



Re: Running rsync backups in pg15

From
Gerald Drouillard
Date:
You could take a snapshot if your data directory is on a ZFS file system, then send it to another server wherever you like.


RE: Running rsync backups in pg15

From
Murthy Nunna
Date:

rsync backups are not just incremental backups. They are incremental merged backups: you always get a full backup at the end.

My database is 18TB. The very first backup took 14 hours. We just keep overwriting this full backup with daily rsyncs, which take about 15 to 30 minutes depending on the activity since the last backup.

I feel the new way (pg_backup_start) is a step backward. I did not really see any issue with the old way. When I crashed my cluster (pg14), it nicely removed the backup label, if present, at restart. I think that is good enough.

From: Doug Reynolds <mav@wastegate.net>
Sent: Thursday, November 7, 2024 1:50 PM
To: Pgsql-admin <pgsql-admin@lists.postgresql.org>
Subject: Re: Running rsync backups in pg15

I would be curious to see how long it would take to restore a 5TB database from each WAL ever created on the system.

We used to do a similar process with Oracle on a 30TB database years ago, but it literally took 26-28 hours to do a full backup.

Doug




Re: Running rsync backups in pg15

From
Laurenz Albe
Date:
On Thu, 2024-11-07 at 16:35 +0000, Murthy Nunna wrote:
> In PG14 and earlier, there is no requirement to keep database connection while rsync
> is in progress. However, there is a change in PG15+ that requires rsync to be while
> we have the same database session open that executes SELECT pg_backup_start('label').
> This change requires a rewrite of existing scripts we have.
>  
> It can be done, but it makes it ugly to check errors and so forth that occur in the rsync script.
>  
> Anybody found an elegant way of doing this?

You could try
https://github.com/cybertec-postgresql/safe-backup

Yours,
Laurenz Albe



Re: Running rsync backups in pg15

From
Ron Johnson
Date:
On Thu, Nov 7, 2024 at 3:59 PM Murthy Nunna <mnunna@fnal.gov> wrote:
> rsync backups are not just incremental backups. They are incremental merged backups. You always get full backup at the end of the backup.
>
> My database is 18TB. The very first backup took 14 hours. We just keep overwriting this full backup with daily rsyncs. The daily rsyncs take about 15 to 30 minutes depending up on the activity since last backup.

That looks a whole lot like the results you get from async Streaming Replication.

Which is, of course, a DR solution, NOT a backup solution.


RE: Running rsync backups in pg15

From
Murthy Nunna
Date:

Yes. We already have streaming replication, but we still rely on rsync backups for PITR. We keep two copies of rsync backups; the oldest full rsync is 7 days old, and we keep all WAL for 8 days. We do not want to change this scheme. There are some bells and whistles during rsync to monitor the backup. With the new requirement (pg_backup_start/stop must run in the same connection), we end up rewriting our backup scripts.

As noted earlier, I did a crash test in pg14 (which uses the old functions pg_start_backup/stop in separate connections), but I did not encounter any issue such as a lingering backup_label file that prevents the cluster from restarting. What I noticed is that pg14 nicely renames the backup_label file before restarting the cluster.

Please (pretty please), somebody tell me (a URL that explains it is good enough) what problem is solved by introducing the new restriction of having to run pg_backup_start/stop in the same connection. I am pretty sure I am missing the point… Maybe I should run the crash test in a different way to reproduce the issue.


Re: Running rsync backups in pg15

From
Ron Johnson
Date:
According to https://www.postgresql.org/docs/14/continuous-archiving.html#BACKUP-LOWLEVEL-BASE-BACKUP Section 26.3.3.2, SELECT pg_start_backup('label') is how you start Exclusive Low Level backups.

https://www.postgresql.org/docs/release/15.0/ says that this has been removed.


Re: Running rsync backups in pg15

From
Scott Ribe
Date:
> On Nov 8, 2024, at 8:32 AM, Ron Johnson <ronljohnsonjr@gmail.com> wrote:
>
> According to https://www.postgresql.org/docs/14/continuous-archiving.html#BACKUP-LOWLEVEL-BASE-BACKUP Section 26.3.3.2, SELECT pg_start_backup('label') is how you start Exclusive Low Level backups.

That link is to PG 14 docs, exclusive backups were dropped in 15.


Re: Running rsync backups in pg15

From
Ron Johnson
Date:
Right.  I also linked to the 15 Release Notes.


Re: Running rsync backups in pg15

From
Scott Ribe
Date:
> On Nov 8, 2024, at 8:43 AM, Ron Johnson <ronljohnsonjr@gmail.com> wrote:
>
> Right.  I also linked to the 15 Release Notes.

I seem to have read your post as the inverse of what you meant.


RE: Running rsync backups in pg15

From
Murthy Nunna
Date:

Yes, I know. Exclusive backups are removed.

From the release notes of 15:

Remove long-deprecated exclusive backup mode (David Steele, Nathan Bossart)

If the database server stops abruptly while in this mode, the server could fail to start. The non-exclusive backup mode is considered superior for all purposes. Functions pg_start_backup()/pg_stop_backup() have been renamed to pg_backup_start()/pg_backup_stop(), and the functions pg_backup_start_time() and pg_is_in_backup() have been removed.

Sorry, my question was not clear. According to the release notes above, if the database server stops abruptly while in this mode, the server could fail to start. However, I crash-tested in pg14 (stopped abruptly during an exclusive backup), and the cluster still started fine: pg14 renamed backup_label to backup_label.old and restarted the cluster successfully.

Repeating my question: what problem, if any, is resolved by removing exclusive backups in pg15 and above? There must be certain conditions other than just an abrupt server crash that prompted the removal of exclusive backups. I am just trying to find out what they are.

Re: Running rsync backups in pg15

From
Fujii Masao
Date:

On 2024/11/09 1:53, Murthy Nunna wrote:
> Yes, I know. Exclusive backups are removed.
>
> From release notes of 15:
>
>   * Remove long-deprecated *exclusive backup mode* <https://www.postgresql.org/docs/15/continuous-archiving.html#BACKUP-BASE-BACKUP> (David Steele, Nathan Bossart)
>
> If the database server stops abruptly while in this mode, the server could fail to start. The non-exclusive backup mode is considered superior for all purposes. Functions pg_start_backup()/pg_stop_backup() have been renamed to pg_backup_start()/pg_backup_stop(), and the functions pg_backup_start_time() and pg_is_in_backup() have been removed.
>
> Sorry my question was not clear. As per above release notes - If the database server stops abruptly while in this mode, the server could fail to start. However, I crash tested in pg14 (stopped abruptly during exclusive backup) but the cluster still started fine. Pg14 renamed backup_label to backup_label.old and restarted the cluster successfully.

If the server crashes during backup mode and the WAL files indicated by backup_label
as the recovery starting point are removed (due to checkpoints during backup mode),
the server won't start. You can reproduce this issue with the following steps, for example.

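--------------------
# PostgreSQL 14 or earlier: pg_start_backup() starts an exclusive backup by default
initdb -D data
pg_ctl -D data start
psql -c "select pg_start_backup('test', true)"
# WAL switches plus checkpoints during backup mode let the WAL named in backup_label be removed
psql -c "select pg_switch_wal(); checkpoint"
psql -c "select pg_switch_wal(); checkpoint"
# Simulate a crash: SIGKILL the postmaster, leaving backup_label in place
kill -9 $(head -1 data/postmaster.pid)
# This start fails: recovery cannot find its starting WAL
pg_ctl -D data start
--------------------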

I understand some people still prefer exclusive backups for some reasons, which is
why I developed the pg_exclusive_backup extension that provides functions for
exclusive backups on PostgreSQL 15 or later. However, since this is an unofficial extension,
it's generally better to update your backup method or script to use non-exclusive backups.
https://github.com/MasaoFujii/pg_exclusive_backup

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
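Fujii's explanation can be checked directly. The first line of backup_label names the WAL segment that recovery starts from, and in his reproduction the checkpoints remove that segment from pg_wal. The sketch below is report-only and makes several assumptions. The name check_backup_label_wal is hypothetical. The function looks only in pg_wal, not in any WAL archive. As Laurenz points out later in the thread, its result should not be used to automate removing backup_label.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic (report only, changes nothing): does the WAL segment that
# backup_label names as the recovery starting point still exist in pg_wal?
check_backup_label_wal() {
  local pgdata=$1 seg
  if [ ! -f "$pgdata/backup_label" ]; then
    echo "no backup_label"
    return 0
  fi
  # First line looks like: START WAL LOCATION: 0/2000028 (file 000000010000000000000002)
  seg=$(sed -n '1s/.*(file \([0-9A-F]*\)).*/\1/p' "$pgdata/backup_label")
  if [ -z "$seg" ]; then
    echo "unrecognized backup_label"
    return 2
  elif [ -f "$pgdata/pg_wal/$seg" ]; then
    echo "present: $seg"
  else
    echo "missing: $seg"
    return 1
  fi
}
```

Run against the data directory from the reproduction after the kill, a "missing" result would match the failed start that Murthy observed.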




RE: Running rsync backups in pg15

From
Murthy Nunna
Date:
Thanks, Fujii, for your response.

I ran the test you outlined. I killed the postmaster, and the start failed. But simply removing the backup_label file allowed the server to restart successfully.

There is no data corruption. Is there a scenario where data can be corrupted because of an exclusive backup?

I saw your extension where you put the old routines back with some additional checks... But I am hesitant to use the old functionality if there really is some issue that I am not realizing.

Thanks again!


Re: Running rsync backups in pg15

From
Laurenz Albe
Date:
On Sat, 2024-11-09 at 17:24 +0000, Murthy Nunna wrote:
> I ran the test you outlined. Killed postmaster. Start failed. But simply removing the backup_label
> file made the server to restart successfully.
> There is no data corruption. Is there a scenario where data can be corrupted because of exclusive backup?

You would normally end up with data corruption or an error if you removed "backup_label"
from a properly taken backup.  Manually removing "backup_label" is dangerous.  It is safe
to do if the server crashed while in exclusive backup mode, but it is detrimental anywhere
else.

Yours,
Laurenz Albe



RE: Running rsync backups in pg15

From
Murthy Nunna
Date:
Thanks, Laurenz. I appreciate your explanation.

So, in the test case that Fujii sent, the server crashed during the backup, so it was safe to remove "backup_label".

However, in the case of a completed exclusive backup, if you remove "backup_label" and then try to bring up the server using the same backed-up data as PGDATA, you will have a corrupted server. I get it. But I see this case as a user deliberately corrupting the server. There are tons of ways to corrupt a database from outside Postgres, and Postgres cannot be responsible for these kinds of user-introduced corruption.

In my humble opinion, with respect to all, if this is the only reason (removing "backup_label" and thus introducing corruption) for removing exclusive backups, I think approving this change was an oversight by the Postgres Development Group.



Re: Running rsync backups in pg15

From
Laurenz Albe
Date:
On Sun, 2024-11-10 at 15:05 +0000, Murthy Nunna wrote:
> So, in the test case that Fujii sent, the server crashed during backup so it was safe to remove "backup_label".
>
> However, in case of a completed exclusive backup, if you remove "backup_label" and then try to bring up server
> using the same backed up data as PGDATA, then you will have corrupted server. I get it.
>
> In my humble opinion with respect to all, if this is the only reason (removing "backup_label" thus introducing
> corruption) for removing exclusive backups, I think it is an oversight by the Postgres Development Group
> approving this change.

The other reason is that in this day of automation it is unappealing that there is a case where a
crashed server cannot start without manual intervention.  And automating the removal of "backup_label"
is not a good idea...

But I see your point, and I argued similarly when the exclusive backup mode was removed.
However, the majority disagreed.

See the discussion in
https://www.postgresql.org/message-id/flat/ac7339ca-3718-3c93-929f-99e725d1172c%40pgmasters.net

Yours,
Laurenz Albe



RE: Running rsync backups in pg15

From
Murthy Nunna
Date:
Thanks, Laurenz. Makes me feel better this has been discussed. That is ok, majority rules.


RE: Running rsync backups in pg15

From
Michel SALAIS
Date:

Hi,

First of all, have you ever thought about what happens if your database crashes while you are updating your single backup? That is a really dangerous scenario!

But anyway, I suppose you are using psql to run pg_start_backup() and pg_stop_backup().

What about this one (I didn't change the function names):

SELECT pg_start_backup('label');

\! your_rsync_script…

SELECT pg_stop_backup();

This could be made more sophisticated if you want…

Regards

Michel SALAIS
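Michel's psql-only approach can also carry the rsync error check, which was the OP's concern. The sketch below assumes PostgreSQL 15+ function names and a psql that supports \if (PostgreSQL 10+). psql runs a backquoted command inside a meta-command argument and substitutes its output, so \if can branch on whether the rsync script succeeded. On failure, a RAISE EXCEPTION under ON_ERROR_STOP makes psql exit with status 3, and ending the session aborts the backup. make_backup_sql is a hypothetical helper, and the rsync script path is whatever the existing scripts use.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a psql script that runs the whole backup in ONE session
# (PostgreSQL 15+ names). The rsync script's exit status decides which branch runs.
make_backup_sql() {
  local rsync_script=$1
  # In the generated script:
  #  - psql runs the backquoted command in a shell and \if tests its t/f output;
  #    the script's own output goes to stderr so only t/f is captured, and its
  #    stdin is /dev/null so it cannot consume psql's input.
  #  - on failure, RAISE EXCEPTION plus ON_ERROR_STOP ends psql (exit status 3),
  #    and ending the session aborts the backup.
  cat <<EOF
SELECT pg_backup_start(:'label') \g /dev/null
\if \`$rsync_script </dev/null >&2 && echo t || echo f\`
SELECT labelfile FROM pg_backup_stop();
\else
DO \$\$ BEGIN RAISE EXCEPTION 'rsync failed, backup aborted'; END \$\$;
\endif
EOF
}
```

Here is a usage example with a hypothetical script path: `make_backup_sql /usr/local/bin/run-rsync-script | psql -X -q -t -A -v ON_ERROR_STOP=1 -v label=nightly > /backup/backup_label.raw`, then check psql's exit status as usual. The raw file holds the label text plus one extra newline added by psql, which has to be trimmed before the file is saved as backup_label in the backup. If the rsync script can exit with 24 (vanished files), the script itself has to treat that as success.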
