Thread: Add function to return backup_label and tablespace_map
Hi, Since an exclusive backup method was dropped in v15, in v15 or later, we need to create backup_label and tablespace_map filesfrom the result of pg_backup_stop() when taking a base backup using low level backup API. One issue when doing thisis that; there is no simple way to create those files from two columns "labelfile" and "spcmapfile" that pg_backup_stop()returns if we execute it via psql. Probaby we need to store those columns in a temporary file and run someOS commands or script to separate that file into backup_label and tablespace_map. This is not simple way, and which wouldprevent users from migrating their backup scripts using psql from an exclusive backup method to non-exclusive one, I'mafraid. To enable us to do that more easily, how about adding the pg_backup_label() function that returns backup_label and tablespace_map?I'm thinking to make this function available just after pg_backup_start() finishes, also even after pg_backup_stop()finishes. For example, this function allows us to take a backup using the following psql script file. ------------------------------ SELECT * FROM pg_backup_start('test'); \! cp -a $PGDATA /backup SELECT * FROM pg_backup_stop(); \pset tuples_only on \pset format unaligned \o /backup/data/backup_label SELECT labelfile FROM pg_backup_label(); \o /backup/data/tablespace_map SELECT spcmapfile FROM pg_backup_label(); ------------------------------ Attached is the WIP patch to add pg_backup_label function. No tests nor docs have been added yet, but if we can successfullyreach the consensus for adding the function, I will update the patch. Thought? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Attachment
On Thu, Jul 7, 2022 at 10:14 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > Hi, > > Since an exclusive backup method was dropped in v15, in v15 or later, we need to create backup_label and tablespace_mapfiles from the result of pg_backup_stop() when taking a base backup using low level backup API. One issue whendoing this is that; there is no simple way to create those files from two columns "labelfile" and "spcmapfile" that pg_backup_stop()returns if we execute it via psql. Probaby we need to store those columns in a temporary file and run someOS commands or script to separate that file into backup_label and tablespace_map. This is not simple way, and which wouldprevent users from migrating their backup scripts using psql from an exclusive backup method to non-exclusive one, I'mafraid. > > To enable us to do that more easily, how about adding the pg_backup_label() function that returns backup_label and tablespace_map?I'm thinking to make this function available just after pg_backup_start() finishes, also even after pg_backup_stop()finishes. For example, this function allows us to take a backup using the following psql script file. > > ------------------------------ > SELECT * FROM pg_backup_start('test'); > \! cp -a $PGDATA /backup > SELECT * FROM pg_backup_stop(); > > \pset tuples_only on > \pset format unaligned > > \o /backup/data/backup_label > SELECT labelfile FROM pg_backup_label(); > > \o /backup/data/tablespace_map > SELECT spcmapfile FROM pg_backup_label(); > ------------------------------ > > Attached is the WIP patch to add pg_backup_label function. No tests nor docs have been added yet, but if we can successfullyreach the consensus for adding the function, I will update the patch. > > Thought? +1 for making it easy for the user to create backup_label and tablespace_map files. With the patch, the label_file and tblspc_map_file contents are preserved until the lifecycle of the session or the next run of pg_backup_start, I'm not sure if we need to worry more about it. Why can't we have functions like pg_create_backup_label() and pg_create_tablespace_map() which create the 'backup_label' and 'tablespace_map' files respectively in the data directory and also return the contents as output columns? Also, we can let users run these create functions only once (perhaps after the pg_backup_stop is called which is when the contents will be consistent). If we allow these functions to read the label_file or tblspc_map_file contents during the backup before stop backup, they may not be consistent. We can have a new sessionBackupState something like SESSION_BACKUP_READY_TO_COLLECT_INFO or SESSION_BACKUP_DONE and after the new function calls sessionBackupState goes to SESSION_BACKUP_NONE) and the contents of label_file and tblspc_map_file are freed up. In the docs, it's good if we can clearly specify the steps to use all of these functions. Regards, Bharath Rupireddy.
On Fri, Jul 8, 2022 at 3:31 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Thu, Jul 7, 2022 at 10:14 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > > Hi, > > > > Since an exclusive backup method was dropped in v15, in v15 or later, we need to create backup_label and tablespace_mapfiles from the result of pg_backup_stop() when taking a base backup using low level backup API. One issue whendoing this is that; there is no simple way to create those files from two columns "labelfile" and "spcmapfile" that pg_backup_stop()returns if we execute it via psql. Probaby we need to store those columns in a temporary file and run someOS commands or script to separate that file into backup_label and tablespace_map. This is not simple way, and which wouldprevent users from migrating their backup scripts using psql from an exclusive backup method to non-exclusive one, I'mafraid. > > > > To enable us to do that more easily, how about adding the pg_backup_label() function that returns backup_label and tablespace_map?I'm thinking to make this function available just after pg_backup_start() finishes, also even after pg_backup_stop()finishes. For example, this function allows us to take a backup using the following psql script file. > > > > ------------------------------ > > SELECT * FROM pg_backup_start('test'); > > \! cp -a $PGDATA /backup > > SELECT * FROM pg_backup_stop(); > > > > \pset tuples_only on > > \pset format unaligned > > > > \o /backup/data/backup_label > > SELECT labelfile FROM pg_backup_label(); > > > > \o /backup/data/tablespace_map > > SELECT spcmapfile FROM pg_backup_label(); > > ------------------------------ > > > > Attached is the WIP patch to add pg_backup_label function. No tests nor docs have been added yet, but if we can successfullyreach the consensus for adding the function, I will update the patch. > > > > Thought? > > +1 for making it easy for the user to create backup_label and > tablespace_map files. With the patch, the label_file and > tblspc_map_file contents are preserved until the lifecycle of the > session or the next run of pg_backup_start, I'm not sure if we need to > worry more about it. > > Why can't we have functions like pg_create_backup_label() and > pg_create_tablespace_map() which create the 'backup_label' and > 'tablespace_map' files respectively in the data directory and also > return the contents as output columns? > > Also, we can let users run these create functions only once (perhaps > after the pg_backup_stop is called which is when the contents will be > consistent). If we allow these functions to read the label_file or > tblspc_map_file contents during the backup before stop backup, they > may not be consistent. We can have a new sessionBackupState something > like SESSION_BACKUP_READY_TO_COLLECT_INFO or SESSION_BACKUP_DONE and > after the new function calls sessionBackupState goes to > SESSION_BACKUP_NONE) and the contents of label_file and > tblspc_map_file are freed up. > > In the docs, it's good if we can clearly specify the steps to use all > of these functions. Forgot to mention a comment on the v1 patch: we'll need to revoke permissions from the public for pg_backup_label (or whatever the new function(s) that'll be introduced) as well similar to pg_backup_start and pg_backup_stop. Regards, Bharath Rupireddy.
On 7/7/22 12:43, Fujii Masao wrote: > Since an exclusive backup method was dropped in v15, in v15 or later, we > need to create backup_label and tablespace_map files from the result of > pg_backup_stop() when taking a base backup using low level backup API. > One issue when doing this is that; there is no simple way to create > those files from two columns "labelfile" and "spcmapfile" that > pg_backup_stop() returns if we execute it via psql. Probaby we need to > store those columns in a temporary file and run some OS commands or > script to separate that file into backup_label and tablespace_map. Why not just select these columns into a temp table: create temp table backup_result as select * from pg_backup_stop(...); Then they can be easily dumped with \o by selecting from the temp table. > To enable us to do that more easily, how about adding the > pg_backup_label() function that returns backup_label and tablespace_map? > I'm thinking to make this function available just after > pg_backup_start() finishes This makes me nervous as I'm sure users will immediately start writing backup_label into PGDATA to make their lives easier. Having backup_label in PGDATA for a running cluster causes problems and is the major reason we deprecated and then removed the exclusive method. In addition, what little protection we had from this condition has been removed. Regards, -David
On Fri, Jul 8, 2022 at 5:12 PM David Steele <david@pgmasters.net> wrote: > > > To enable us to do that more easily, how about adding the > > pg_backup_label() function that returns backup_label and tablespace_map? > > I'm thinking to make this function available just after > > pg_backup_start() finishes > > This makes me nervous as I'm sure users will immediately start writing > backup_label into PGDATA to make their lives easier. Having backup_label > in PGDATA for a running cluster causes problems and is the major reason > we deprecated and then removed the exclusive method. In addition, what > little protection we had from this condition has been removed. IIUC, with the new mechanism, we don't need a backup_label file to be present in the data directory after pg_backup_stop? If yes, where will the postgres recover from if it crashes after pg_backup_stop before the next checkpoint? I'm trying to understand the significance of the backup_label and tablespace_map contents after the removal of exclusive backup. Also, do we need the read_backup_label part of the code [1]? [1] if (read_backup_label(&CheckPointLoc, &CheckPointTLI, &backupEndRequired, &backupFromStandby)) { List *tablespaces = NIL; /* * Archive recovery was requested, and thanks to the backup label * file, we know how far we need to replay to reach consistency. Enter * archive recovery directly. */ InArchiveRecovery = true; if (StandbyModeRequested) StandbyMode = true; /* * When a backup_label file is present, we want to roll forward from * the checkpoint it identifies, rather than using pg_control. */ Regards, Bharath Rupireddy.
On 7/8/22 07:53, Bharath Rupireddy wrote: > On Fri, Jul 8, 2022 at 5:12 PM David Steele <david@pgmasters.net> wrote: >> >>> To enable us to do that more easily, how about adding the >>> pg_backup_label() function that returns backup_label and tablespace_map? >>> I'm thinking to make this function available just after >>> pg_backup_start() finishes >> >> This makes me nervous as I'm sure users will immediately start writing >> backup_label into PGDATA to make their lives easier. Having backup_label >> in PGDATA for a running cluster causes problems and is the major reason >> we deprecated and then removed the exclusive method. In addition, what >> little protection we had from this condition has been removed. > > IIUC, with the new mechanism, we don't need a backup_label file to be > present in the data directory after pg_backup_stop? If yes, where will > the postgres recover from if it crashes after pg_backup_stop before > the next checkpoint? I'm trying to understand the significance of the > backup_label and tablespace_map contents after the removal of > exclusive backup. backup_label should be written directly into the backup and should be present when the backup is restored and before recovery begins. It should not be present in a normally operating cluster or it will cause problems after crashes and restarts. > Also, do we need the read_backup_label part of the code [1]? Yes, since the backup_label is required for recovery. Regards, -David
On Fri, Jul 8, 2022 at 7:42 PM David Steele <david@pgmasters.net> wrote: > > On 7/7/22 12:43, Fujii Masao wrote: > > > Since an exclusive backup method was dropped in v15, in v15 or later, we > > need to create backup_label and tablespace_map files from the result of > > pg_backup_stop() when taking a base backup using low level backup API. > > One issue when doing this is that; there is no simple way to create > > those files from two columns "labelfile" and "spcmapfile" that > > pg_backup_stop() returns if we execute it via psql. Probaby we need to > > store those columns in a temporary file and run some OS commands or > > script to separate that file into backup_label and tablespace_map. > > Why not just select these columns into a temp table: > > create temp table backup_result as select * from pg_backup_stop(...); > > Then they can be easily dumped with \o by selecting from the temp table. That wouldn't help people making backups from standby servers.
On 7/8/22 08:22, Julien Rouhaud wrote: > On Fri, Jul 8, 2022 at 7:42 PM David Steele <david@pgmasters.net> wrote: >> >> On 7/7/22 12:43, Fujii Masao wrote: >> >>> Since an exclusive backup method was dropped in v15, in v15 or later, we >>> need to create backup_label and tablespace_map files from the result of >>> pg_backup_stop() when taking a base backup using low level backup API. >>> One issue when doing this is that; there is no simple way to create >>> those files from two columns "labelfile" and "spcmapfile" that >>> pg_backup_stop() returns if we execute it via psql. Probaby we need to >>> store those columns in a temporary file and run some OS commands or >>> script to separate that file into backup_label and tablespace_map. >> >> Why not just select these columns into a temp table: >> >> create temp table backup_result as select * from pg_backup_stop(...); >> >> Then they can be easily dumped with \o by selecting from the temp table. > > That wouldn't help people making backups from standby servers. Ah, yes, good point. This should work on a standby, though: select quote_literal(labelfile) as backup_label from pg_backup_stop(...) \gset \pset tuples_only on \pset format unaligned \o /backup_path/backup_label select :backup_label; Regards, -David
Re: David Steele > > To enable us to do that more easily, how about adding the > > pg_backup_label() function that returns backup_label and tablespace_map? > > I'm thinking to make this function available just after > > pg_backup_start() finishes I was just wondering: Why is "labelfile" only returned by pg_backup_stop()? All the info in there is already available at pg_backup_start() time. Having the output available earlier would allow writing the backup_label into the backup directory, or store it along some filesystem snapshot that is already immutable by the time pg_backup_stop is called. If we rename all functions anyway for PG15, we could move the info from stop to start. > This makes me nervous as I'm sure users will immediately start writing > backup_label into PGDATA to make their lives easier. Having backup_label in > PGDATA for a running cluster causes problems and is the major reason we > deprecated and then removed the exclusive method. In addition, what little > protection we had from this condition has been removed. Is that really an argument for making the life of everyone else harder? Christoph
On 7/8/22 09:10, Christoph Berg wrote: > Re: David Steele >>> To enable us to do that more easily, how about adding the >>> pg_backup_label() function that returns backup_label and tablespace_map? >>> I'm thinking to make this function available just after >>> pg_backup_start() finishes > > I was just wondering: Why is "labelfile" only returned by > pg_backup_stop()? All the info in there is already available at > pg_backup_start() time. Not sure exactly why this decision was made in 9.6 (might be because tablespace_map does need to be generated at stop time), but I'm planning to add data to this file in PG16 that is only available at stop time. In particular, backup software would like to know the earliest possible time that can be used for PITR and right now this needs to be approximated. Would be good to have that in backup_label along with start time. Min recovery xid would also be very useful. > Having the output available earlier would > allow writing the backup_label into the backup directory, or store it > along some filesystem snapshot that is already immutable by the time > pg_backup_stop is called. What is precluded by getting the backup label after pg_backup_stop()? Perhaps a more detailed example here would be helpful. > If we rename all functions anyway for PG15, we could move the info > from stop to start. > >> This makes me nervous as I'm sure users will immediately start writing >> backup_label into PGDATA to make their lives easier. Having backup_label in >> PGDATA for a running cluster causes problems and is the major reason we >> deprecated and then removed the exclusive method. In addition, what little >> protection we had from this condition has been removed. > > Is that really an argument for making the life of everyone else > harder? I don't see how anyone's life is made harder unless the plan is to write backup_label into PGDATA, which should not be done. As we've noted before, there's no point in pretending that doing backup correctly is easy because it is definitely not. Regards, -David
On 7/8/22 09:09, David Steele wrote: > On 7/8/22 08:22, Julien Rouhaud wrote: >> On Fri, Jul 8, 2022 at 7:42 PM David Steele <david@pgmasters.net> wrote: >>> >>> On 7/7/22 12:43, Fujii Masao wrote: >>> >>>> Since an exclusive backup method was dropped in v15, in v15 or >>>> later, we >>>> need to create backup_label and tablespace_map files from the result of >>>> pg_backup_stop() when taking a base backup using low level backup API. >>>> One issue when doing this is that; there is no simple way to create >>>> those files from two columns "labelfile" and "spcmapfile" that >>>> pg_backup_stop() returns if we execute it via psql. Probaby we need to >>>> store those columns in a temporary file and run some OS commands or >>>> script to separate that file into backup_label and tablespace_map. >>> >>> Why not just select these columns into a temp table: >>> >>> create temp table backup_result as select * from pg_backup_stop(...); >>> >>> Then they can be easily dumped with \o by selecting from the temp table. >> >> That wouldn't help people making backups from standby servers. > > Ah, yes, good point. This should work on a standby, though: > > select quote_literal(labelfile) as backup_label from pg_backup_stop(...) > \gset > \pset tuples_only on > \pset format unaligned > \o /backup_path/backup_label > select :backup_label; Looks like I made that more complicated than it needed to be: select * from pg_backup_stop(...) \gset \pset tuples_only on \pset format unaligned \o /backup_path/backup_label select :'labelfile'; Regards, -David
At Fri, 8 Jul 2022 01:43:49 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in > finishes. For example, this function allows us to take a backup using > the following psql script file. > > ------------------------------ > SELECT * FROM pg_backup_start('test'); > \! cp -a $PGDATA /backup > SELECT * FROM pg_backup_stop(); > > \pset tuples_only on > \pset format unaligned > > \o /backup/data/backup_label > SELECT labelfile FROM pg_backup_label(); > > \o /backup/data/tablespace_map > SELECT spcmapfile FROM pg_backup_label(); > ------------------------------ As David mentioned, we can do the same thing now by using \gset, when we want to save the files on the client side. (File copy is done on the server side by the steps, though.) Thinking about another scenario of generating those files server-side (this is safer than the client-side method regarding to line-separators and the pset settings, I think). We can do that by using admingpack instead, with simpler steps. SELECT lsn, labelfile, spcmapfile pg_file_write('/tmp/backup_label', labelfile, false), pg_file_write('/tmp/tablespace_map', spcmapfile, false) FROM pg_backup_stop(); However, if pg_file_write() fails, the data are gone. But \gset also works here. select pg_backup_start('s1'); SELECT * FROM pg_backup_stop() \gset SELECT pg_file_write('/tmp/backup_label', :'labelfile', false); SELECT pg_file_write('/tmp/tablespace_map', :'spcmapfile', false); regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On 2022/07/08 23:11, David Steele wrote: > Looks like I made that more complicated than it needed to be: > > select * from pg_backup_stop(...) \gset > \pset tuples_only on > \pset format unaligned > \o /backup_path/backup_label > select :'labelfile'; Thanks! I had completely forgotten \gset command. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Greetings, * Fujii Masao (masao.fujii@oss.nttdata.com) wrote: > On 2022/07/08 23:11, David Steele wrote: > >Looks like I made that more complicated than it needed to be: > > > >select * from pg_backup_stop(...) \gset > >\pset tuples_only on > >\pset format unaligned > >\o /backup_path/backup_label > >select :'labelfile'; > > Thanks! I had completely forgotten \gset command. Seems like it might make sense to consider using a better format for these files and also to allow us to more easily add things in the future (ending LSN, ending time, etc) for backup tools to be able to leverage. Perhaps we should change this to having just a single file returned, instead of two, and use JSON for it, as a more general and extensible format that we've already got code to work with..? Thanks, Stephen