Thread: Add function to return backup_label and tablespace_map

Add function to return backup_label and tablespace_map

From
Fujii Masao
Date:
Hi,

Since an exclusive backup method was dropped in v15, in v15 or later, we need to create backup_label and tablespace_map
filesfrom the result of pg_backup_stop() when taking a base backup using low level backup API. One issue when doing
thisis that; there is no simple way to create those files from two columns "labelfile" and "spcmapfile" that
pg_backup_stop()returns if we execute it via psql. Probaby we need to store those columns in a temporary file and run
someOS commands or script to separate that file into backup_label and tablespace_map. This is not simple way, and which
wouldprevent users from migrating their backup scripts using psql from an exclusive backup method to non-exclusive one,
I'mafraid.
 

To enable us to do that more easily, how about adding the pg_backup_label() function that returns backup_label and
tablespace_map?I'm thinking to make this function available just after pg_backup_start() finishes, also even after
pg_backup_stop()finishes. For example, this function allows us to take a backup using the following psql script file.
 

------------------------------
SELECT * FROM pg_backup_start('test');
\! cp -a $PGDATA /backup
SELECT * FROM pg_backup_stop();

\pset tuples_only on
\pset format unaligned

\o /backup/data/backup_label
SELECT labelfile FROM pg_backup_label();

\o /backup/data/tablespace_map
SELECT spcmapfile FROM pg_backup_label();
------------------------------

Attached is the WIP patch to add pg_backup_label function. No tests nor docs have been added yet, but if we can
successfullyreach the consensus for adding the function, I will update the patch.
 

Thought?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Attachment

Re: Add function to return backup_label and tablespace_map

From
Bharath Rupireddy
Date:
On Thu, Jul 7, 2022 at 10:14 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
> Hi,
>
> Since an exclusive backup method was dropped in v15, in v15 or later, we need to create backup_label and
tablespace_mapfiles from the result of pg_backup_stop() when taking a base backup using low level backup API. One issue
whendoing this is that; there is no simple way to create those files from two columns "labelfile" and "spcmapfile" that
pg_backup_stop()returns if we execute it via psql. Probaby we need to store those columns in a temporary file and run
someOS commands or script to separate that file into backup_label and tablespace_map. This is not simple way, and which
wouldprevent users from migrating their backup scripts using psql from an exclusive backup method to non-exclusive one,
I'mafraid. 
>
> To enable us to do that more easily, how about adding the pg_backup_label() function that returns backup_label and
tablespace_map?I'm thinking to make this function available just after pg_backup_start() finishes, also even after
pg_backup_stop()finishes. For example, this function allows us to take a backup using the following psql script file. 
>
> ------------------------------
> SELECT * FROM pg_backup_start('test');
> \! cp -a $PGDATA /backup
> SELECT * FROM pg_backup_stop();
>
> \pset tuples_only on
> \pset format unaligned
>
> \o /backup/data/backup_label
> SELECT labelfile FROM pg_backup_label();
>
> \o /backup/data/tablespace_map
> SELECT spcmapfile FROM pg_backup_label();
> ------------------------------
>
> Attached is the WIP patch to add pg_backup_label function. No tests nor docs have been added yet, but if we can
successfullyreach the consensus for adding the function, I will update the patch. 
>
> Thought?

+1 for making it easy for the user to create backup_label and
tablespace_map files. With the patch, the label_file and
tblspc_map_file contents are preserved until the lifecycle of the
session or the next run of pg_backup_start, I'm not sure if we need to
worry more about it.

Why can't we have functions like pg_create_backup_label() and
pg_create_tablespace_map() which create the 'backup_label' and
'tablespace_map' files respectively in the data directory and also
return the contents as output columns?

Also, we can let users run these create functions only once (perhaps
after the pg_backup_stop is called which is when the contents will be
consistent). If we allow these functions to read the label_file or
tblspc_map_file contents during the backup before stop backup, they
may not be consistent. We can have a new sessionBackupState something
like SESSION_BACKUP_READY_TO_COLLECT_INFO or SESSION_BACKUP_DONE and
after the new function calls sessionBackupState  goes to
SESSION_BACKUP_NONE) and the contents of label_file and
tblspc_map_file are freed up.

In the docs, it's good if we can clearly specify the steps to use all
of these functions.

Regards,
Bharath Rupireddy.



Re: Add function to return backup_label and tablespace_map

From
Bharath Rupireddy
Date:
On Fri, Jul 8, 2022 at 3:31 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Thu, Jul 7, 2022 at 10:14 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> >
> > Hi,
> >
> > Since an exclusive backup method was dropped in v15, in v15 or later, we need to create backup_label and
tablespace_mapfiles from the result of pg_backup_stop() when taking a base backup using low level backup API. One issue
whendoing this is that; there is no simple way to create those files from two columns "labelfile" and "spcmapfile" that
pg_backup_stop()returns if we execute it via psql. Probaby we need to store those columns in a temporary file and run
someOS commands or script to separate that file into backup_label and tablespace_map. This is not simple way, and which
wouldprevent users from migrating their backup scripts using psql from an exclusive backup method to non-exclusive one,
I'mafraid. 
> >
> > To enable us to do that more easily, how about adding the pg_backup_label() function that returns backup_label and
tablespace_map?I'm thinking to make this function available just after pg_backup_start() finishes, also even after
pg_backup_stop()finishes. For example, this function allows us to take a backup using the following psql script file. 
> >
> > ------------------------------
> > SELECT * FROM pg_backup_start('test');
> > \! cp -a $PGDATA /backup
> > SELECT * FROM pg_backup_stop();
> >
> > \pset tuples_only on
> > \pset format unaligned
> >
> > \o /backup/data/backup_label
> > SELECT labelfile FROM pg_backup_label();
> >
> > \o /backup/data/tablespace_map
> > SELECT spcmapfile FROM pg_backup_label();
> > ------------------------------
> >
> > Attached is the WIP patch to add pg_backup_label function. No tests nor docs have been added yet, but if we can
successfullyreach the consensus for adding the function, I will update the patch. 
> >
> > Thought?
>
> +1 for making it easy for the user to create backup_label and
> tablespace_map files. With the patch, the label_file and
> tblspc_map_file contents are preserved until the lifecycle of the
> session or the next run of pg_backup_start, I'm not sure if we need to
> worry more about it.
>
> Why can't we have functions like pg_create_backup_label() and
> pg_create_tablespace_map() which create the 'backup_label' and
> 'tablespace_map' files respectively in the data directory and also
> return the contents as output columns?
>
> Also, we can let users run these create functions only once (perhaps
> after the pg_backup_stop is called which is when the contents will be
> consistent). If we allow these functions to read the label_file or
> tblspc_map_file contents during the backup before stop backup, they
> may not be consistent. We can have a new sessionBackupState something
> like SESSION_BACKUP_READY_TO_COLLECT_INFO or SESSION_BACKUP_DONE and
> after the new function calls sessionBackupState  goes to
> SESSION_BACKUP_NONE) and the contents of label_file and
> tblspc_map_file are freed up.
>
> In the docs, it's good if we can clearly specify the steps to use all
> of these functions.

Forgot to mention a comment on the v1 patch: we'll need to revoke
permissions from the public for pg_backup_label (or whatever the new
function(s) that'll be introduced) as well similar to pg_backup_start
and pg_backup_stop.

Regards,
Bharath Rupireddy.



Re: Add function to return backup_label and tablespace_map

From
David Steele
Date:
On 7/7/22 12:43, Fujii Masao wrote:

> Since an exclusive backup method was dropped in v15, in v15 or later, we 
> need to create backup_label and tablespace_map files from the result of 
> pg_backup_stop() when taking a base backup using low level backup API. 
> One issue when doing this is that; there is no simple way to create 
> those files from two columns "labelfile" and "spcmapfile" that 
> pg_backup_stop() returns if we execute it via psql. Probaby we need to 
> store those columns in a temporary file and run some OS commands or 
> script to separate that file into backup_label and tablespace_map. 

Why not just select these columns into a temp table:

create temp table backup_result as select * from pg_backup_stop(...);

Then they can be easily dumped with \o by selecting from the temp table.

> To enable us to do that more easily, how about adding the 
> pg_backup_label() function that returns backup_label and tablespace_map? 
> I'm thinking to make this function available just after 
> pg_backup_start() finishes

This makes me nervous as I'm sure users will immediately start writing 
backup_label into PGDATA to make their lives easier. Having backup_label 
in PGDATA for a running cluster causes problems and is the major reason 
we deprecated and then removed the exclusive method. In addition, what 
little protection we had from this condition has been removed.

Regards,
-David



Re: Add function to return backup_label and tablespace_map

From
Bharath Rupireddy
Date:
On Fri, Jul 8, 2022 at 5:12 PM David Steele <david@pgmasters.net> wrote:
>
> > To enable us to do that more easily, how about adding the
> > pg_backup_label() function that returns backup_label and tablespace_map?
> > I'm thinking to make this function available just after
> > pg_backup_start() finishes
>
> This makes me nervous as I'm sure users will immediately start writing
> backup_label into PGDATA to make their lives easier. Having backup_label
> in PGDATA for a running cluster causes problems and is the major reason
> we deprecated and then removed the exclusive method. In addition, what
> little protection we had from this condition has been removed.

IIUC, with the new mechanism, we don't need a backup_label file to be
present in the data directory after pg_backup_stop? If yes, where will
the postgres recover from if it crashes after pg_backup_stop before
the next checkpoint? I'm trying to understand the significance of the
backup_label and tablespace_map contents after the removal of
exclusive backup.

Also, do we need the read_backup_label part of the code [1]?


[1]
    if (read_backup_label(&CheckPointLoc, &CheckPointTLI, &backupEndRequired,
                          &backupFromStandby))
    {
        List       *tablespaces = NIL;

        /*
         * Archive recovery was requested, and thanks to the backup label
         * file, we know how far we need to replay to reach consistency. Enter
         * archive recovery directly.
         */
        InArchiveRecovery = true;
        if (StandbyModeRequested)
            StandbyMode = true;

        /*
         * When a backup_label file is present, we want to roll forward from
         * the checkpoint it identifies, rather than using pg_control.
         */

Regards,
Bharath Rupireddy.



Re: Add function to return backup_label and tablespace_map

From
David Steele
Date:
On 7/8/22 07:53, Bharath Rupireddy wrote:
> On Fri, Jul 8, 2022 at 5:12 PM David Steele <david@pgmasters.net> wrote:
>>
>>> To enable us to do that more easily, how about adding the
>>> pg_backup_label() function that returns backup_label and tablespace_map?
>>> I'm thinking to make this function available just after
>>> pg_backup_start() finishes
>>
>> This makes me nervous as I'm sure users will immediately start writing
>> backup_label into PGDATA to make their lives easier. Having backup_label
>> in PGDATA for a running cluster causes problems and is the major reason
>> we deprecated and then removed the exclusive method. In addition, what
>> little protection we had from this condition has been removed.
> 
> IIUC, with the new mechanism, we don't need a backup_label file to be
> present in the data directory after pg_backup_stop? If yes, where will
> the postgres recover from if it crashes after pg_backup_stop before
> the next checkpoint? I'm trying to understand the significance of the
> backup_label and tablespace_map contents after the removal of
> exclusive backup.

backup_label should be written directly into the backup and should be 
present when the backup is restored and before recovery begins. It 
should not be present in a normally operating cluster or it will cause 
problems after crashes and restarts.

> Also, do we need the read_backup_label part of the code [1]?

Yes, since the backup_label is required for recovery.

Regards,
-David



Re: Add function to return backup_label and tablespace_map

From
Julien Rouhaud
Date:
On Fri, Jul 8, 2022 at 7:42 PM David Steele <david@pgmasters.net> wrote:
>
> On 7/7/22 12:43, Fujii Masao wrote:
>
> > Since an exclusive backup method was dropped in v15, in v15 or later, we
> > need to create backup_label and tablespace_map files from the result of
> > pg_backup_stop() when taking a base backup using low level backup API.
> > One issue when doing this is that; there is no simple way to create
> > those files from two columns "labelfile" and "spcmapfile" that
> > pg_backup_stop() returns if we execute it via psql. Probaby we need to
> > store those columns in a temporary file and run some OS commands or
> > script to separate that file into backup_label and tablespace_map.
>
> Why not just select these columns into a temp table:
>
> create temp table backup_result as select * from pg_backup_stop(...);
>
> Then they can be easily dumped with \o by selecting from the temp table.

That wouldn't help people making backups from standby servers.



Re: Add function to return backup_label and tablespace_map

From
David Steele
Date:
On 7/8/22 08:22, Julien Rouhaud wrote:
> On Fri, Jul 8, 2022 at 7:42 PM David Steele <david@pgmasters.net> wrote:
>>
>> On 7/7/22 12:43, Fujii Masao wrote:
>>
>>> Since an exclusive backup method was dropped in v15, in v15 or later, we
>>> need to create backup_label and tablespace_map files from the result of
>>> pg_backup_stop() when taking a base backup using low level backup API.
>>> One issue when doing this is that; there is no simple way to create
>>> those files from two columns "labelfile" and "spcmapfile" that
>>> pg_backup_stop() returns if we execute it via psql. Probaby we need to
>>> store those columns in a temporary file and run some OS commands or
>>> script to separate that file into backup_label and tablespace_map.
>>
>> Why not just select these columns into a temp table:
>>
>> create temp table backup_result as select * from pg_backup_stop(...);
>>
>> Then they can be easily dumped with \o by selecting from the temp table.
> 
> That wouldn't help people making backups from standby servers.

Ah, yes, good point. This should work on a standby, though:

select quote_literal(labelfile) as backup_label from pg_backup_stop(...) 
\gset
\pset tuples_only on
\pset format unaligned
\o /backup_path/backup_label
select :backup_label;

Regards,
-David



Re: Add function to return backup_label and tablespace_map

From
Christoph Berg
Date:
Re: David Steele
> > To enable us to do that more easily, how about adding the
> > pg_backup_label() function that returns backup_label and tablespace_map?
> > I'm thinking to make this function available just after
> > pg_backup_start() finishes

I was just wondering: Why is "labelfile" only returned by
pg_backup_stop()? All the info in there is already available at
pg_backup_start() time. Having the output available earlier would
allow writing the backup_label into the backup directory, or store it
along some filesystem snapshot that is already immutable by the time
pg_backup_stop is called.

If we rename all functions anyway for PG15, we could move the info
from stop to start.

> This makes me nervous as I'm sure users will immediately start writing
> backup_label into PGDATA to make their lives easier. Having backup_label in
> PGDATA for a running cluster causes problems and is the major reason we
> deprecated and then removed the exclusive method. In addition, what little
> protection we had from this condition has been removed.

Is that really an argument for making the life of everyone else
harder?

Christoph



Re: Add function to return backup_label and tablespace_map

From
David Steele
Date:

On 7/8/22 09:10, Christoph Berg wrote:
> Re: David Steele
>>> To enable us to do that more easily, how about adding the
>>> pg_backup_label() function that returns backup_label and tablespace_map?
>>> I'm thinking to make this function available just after
>>> pg_backup_start() finishes
> 
> I was just wondering: Why is "labelfile" only returned by
> pg_backup_stop()? All the info in there is already available at
> pg_backup_start() time. 

Not sure exactly why this decision was made in 9.6 (might be because 
tablespace_map does need to be generated at stop time), but I'm planning 
to add data to this file in PG16 that is only available at stop time. In 
particular, backup software would like to know the earliest possible 
time that can be used for PITR and right now this needs to be 
approximated. Would be good to have that in backup_label along with 
start time. Min recovery xid would also be very useful.

> Having the output available earlier would
> allow writing the backup_label into the backup directory, or store it
> along some filesystem snapshot that is already immutable by the time
> pg_backup_stop is called.

What is precluded by getting the backup label after pg_backup_stop()? 
Perhaps a more detailed example here would be helpful.

> If we rename all functions anyway for PG15, we could move the info
> from stop to start.
> 
>> This makes me nervous as I'm sure users will immediately start writing
>> backup_label into PGDATA to make their lives easier. Having backup_label in
>> PGDATA for a running cluster causes problems and is the major reason we
>> deprecated and then removed the exclusive method. In addition, what little
>> protection we had from this condition has been removed.
> 
> Is that really an argument for making the life of everyone else
> harder?

I don't see how anyone's life is made harder unless the plan is to write 
backup_label into PGDATA, which should not be done.

As we've noted before, there's no point in pretending that doing backup 
correctly is easy because it is definitely not.

Regards,
-David



Re: Add function to return backup_label and tablespace_map

From
David Steele
Date:
On 7/8/22 09:09, David Steele wrote:
> On 7/8/22 08:22, Julien Rouhaud wrote:
>> On Fri, Jul 8, 2022 at 7:42 PM David Steele <david@pgmasters.net> wrote:
>>>
>>> On 7/7/22 12:43, Fujii Masao wrote:
>>>
>>>> Since an exclusive backup method was dropped in v15, in v15 or 
>>>> later, we
>>>> need to create backup_label and tablespace_map files from the result of
>>>> pg_backup_stop() when taking a base backup using low level backup API.
>>>> One issue when doing this is that; there is no simple way to create
>>>> those files from two columns "labelfile" and "spcmapfile" that
>>>> pg_backup_stop() returns if we execute it via psql. Probaby we need to
>>>> store those columns in a temporary file and run some OS commands or
>>>> script to separate that file into backup_label and tablespace_map.
>>>
>>> Why not just select these columns into a temp table:
>>>
>>> create temp table backup_result as select * from pg_backup_stop(...);
>>>
>>> Then they can be easily dumped with \o by selecting from the temp table.
>>
>> That wouldn't help people making backups from standby servers.
> 
> Ah, yes, good point. This should work on a standby, though:
> 
> select quote_literal(labelfile) as backup_label from pg_backup_stop(...) 
> \gset
> \pset tuples_only on
> \pset format unaligned
> \o /backup_path/backup_label
> select :backup_label;

Looks like I made that more complicated than it needed to be:

select * from pg_backup_stop(...) \gset
\pset tuples_only on
\pset format unaligned
\o /backup_path/backup_label
select :'labelfile';

Regards,
-David



Re: Add function to return backup_label and tablespace_map

From
Kyotaro Horiguchi
Date:
At Fri, 8 Jul 2022 01:43:49 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in 
> finishes. For example, this function allows us to take a backup using
> the following psql script file.
> 
> ------------------------------
> SELECT * FROM pg_backup_start('test');
> \! cp -a $PGDATA /backup
> SELECT * FROM pg_backup_stop();
> 
> \pset tuples_only on
> \pset format unaligned
> 
> \o /backup/data/backup_label
> SELECT labelfile FROM pg_backup_label();
> 
> \o /backup/data/tablespace_map
> SELECT spcmapfile FROM pg_backup_label();
> ------------------------------

As David mentioned, we can do the same thing now by using \gset, when
we want to save the files on the client side. (File copy is done on
the server side by the steps, though.)

Thinking about another scenario of generating those files server-side
(this is safer than the client-side method regarding to
line-separators and the pset settings, I think).  We can do that by
using admingpack instead, with simpler steps.

SELECT lsn, labelfile, spcmapfile
       pg_file_write('/tmp/backup_label', labelfile, false),
       pg_file_write('/tmp/tablespace_map', spcmapfile, false)
FROM pg_backup_stop();

However, if pg_file_write() fails, the data are gone.  But \gset also
works here.

select pg_backup_start('s1');
SELECT * FROM pg_backup_stop() \gset
SELECT pg_file_write('/tmp/backup_label', :'labelfile', false);
SELECT pg_file_write('/tmp/tablespace_map', :'spcmapfile', false);

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Add function to return backup_label and tablespace_map

From
Fujii Masao
Date:

On 2022/07/08 23:11, David Steele wrote:
> Looks like I made that more complicated than it needed to be:
> 
> select * from pg_backup_stop(...) \gset
> \pset tuples_only on
> \pset format unaligned
> \o /backup_path/backup_label
> select :'labelfile';

Thanks! I had completely forgotten \gset command.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Add function to return backup_label and tablespace_map

From
Stephen Frost
Date:
Greetings,

* Fujii Masao (masao.fujii@oss.nttdata.com) wrote:
> On 2022/07/08 23:11, David Steele wrote:
> >Looks like I made that more complicated than it needed to be:
> >
> >select * from pg_backup_stop(...) \gset
> >\pset tuples_only on
> >\pset format unaligned
> >\o /backup_path/backup_label
> >select :'labelfile';
>
> Thanks! I had completely forgotten \gset command.

Seems like it might make sense to consider using a better format for
these files and also to allow us to more easily add things in the future
(ending LSN, ending time, etc) for backup tools to be able to leverage.
Perhaps we should change this to having just a single file returned,
instead of two, and use JSON for it, as a more general and extensible
format that we've already got code to work with..?

Thanks,

Stephen

Attachment