Thread: Updated backup APIs for non-exclusive backups
* Introduces a new argument for pg_start_backup(), for "exclusive". It defaults to true, which means that just calling pg_start_backup() like before will behave exactly like before. If it's called with exclusive='f', a non-exclusive backup is started, and the backup label is collected in memory instead of in the backup_label file.
Attachment
On Wed, Feb 10, 2016 at 7:46 AM, Magnus Hagander <magnus@hagander.net> wrote: > Per discussionat the developer meeting in Brussels, here's a patch that > makes some updates to the backup APIs, to support non-exclusive backups > without using pg_basebackup. The idea is to fix at least three main issues > that are there today -- that you cannot run concurrent backups, that the > backup_label file is created in the data directory which makes it impossible > to distinguish between a cluster restored from backup and one that crashed > while a backup was running, and a cluster can get "stuck" in backup mode if > the backup script/software crashes. > > To make this work, this patch: > > * Introduces a new argument for pg_start_backup(), for "exclusive". It > defaults to true, which means that just calling pg_start_backup() like > before will behave exactly like before. If it's called with exclusive='f', a > non-exclusive backup is started, and the backup label is collected in memory > instead of in the backup_label file. > > * If the client disconnects with a non-exclusive backup running, the backup > is automatically aborted. This is the same thing that pg_basebackup does. To > use these non-exclusive backups the backup software will have to maintain a > persistent connection to the database -- something that should not be a > problem for any of the modern ones out there (but would've been slightly > trickier back in the days when we suggested shellscripts) > > * A new version of pg_stop_backup is created, taking an argument specifying > if it's exclusive or not. The current version of pg_stop_backup() continues > to work for exclusive backups, with no changes to behavior. The new > pg_stop_backup will return a record of three columns instead of just the > value -- the LSN (pglsn), the backup label file (text) and the tablespace > map file (text). If used to cancel an exclusive backup, backup label file > and tablespace map will be NULL. At this point it's up to the backup > software to write down the files in the location of the backup. > > > Attached patch currently doesn't include full documentation, since Bruce was > going to restructure that section of the docs (also based on the > devmeeting). I will write up the docs once that is done (I assume it will be > soon enough, or I'll go do it regardless), but I wanted to get some review > in on the code while waiting. Wow, this is a great idea. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi, On 2016-02-10 13:46:05 +0100, Magnus Hagander wrote: > Per discussionat the developer meeting in Brussels, here's a patch that > makes some updates to the backup APIs, to support non-exclusive backups > without using pg_basebackup. Thanks for following through with this! > * If the client disconnects with a non-exclusive backup running, the backup > is automatically aborted. This is the same thing that pg_basebackup does. > To use these non-exclusive backups the backup software will have to > maintain a persistent connection to the database -- something that should > not be a problem for any of the modern ones out there (but would've been > slightly trickier back in the days when we suggested shellscripts) I think we might want to make this one optional, but I'm not going to fight super hard for it. > * A new version of pg_stop_backup is created, taking an argument specifying > if it's exclusive or not. The current version of pg_stop_backup() continues > to work for exclusive backups, with no changes to behavior. The new > pg_stop_backup will return a record of three columns instead of just the > value -- the LSN (pglsn), the backup label file (text) and the tablespace > map file (text). I wonder if we shouldn't change the API a bit more aggressively. Right now we return the label and the map - but what if there's more files at some later point? One way would be to make it a SETOF function returning 'filename', 'content' or such. Makes it less clear what to do with the lsn admittedly. Regards, Andres
Hi,
On 2016-02-10 13:46:05 +0100, Magnus Hagander wrote:
> Per discussionat the developer meeting in Brussels, here's a patch that
> makes some updates to the backup APIs, to support non-exclusive backups
> without using pg_basebackup.
Thanks for following through with this!
> * If the client disconnects with a non-exclusive backup running, the backup
> is automatically aborted. This is the same thing that pg_basebackup does.
> To use these non-exclusive backups the backup software will have to
> maintain a persistent connection to the database -- something that should
> not be a problem for any of the modern ones out there (but would've been
> slightly trickier back in the days when we suggested shellscripts)
I think we might want to make this one optional, but I'm not going to
fight super hard for it.
> * A new version of pg_stop_backup is created, taking an argument specifying
> if it's exclusive or not. The current version of pg_stop_backup() continues
> to work for exclusive backups, with no changes to behavior. The new
> pg_stop_backup will return a record of three columns instead of just the
> value -- the LSN (pglsn), the backup label file (text) and the tablespace
> map file (text).
I wonder if we shouldn't change the API a bit more aggressively. Right
now we return the label and the map - but what if there's more files at
some later point? One way would be to make it a SETOF function returning
'filename', 'content' or such. Makes it less clear what to do with the
lsn admittedly.
* Andres Freund (andres@anarazel.de) wrote: > On 2016-02-10 13:46:05 +0100, Magnus Hagander wrote: > > * If the client disconnects with a non-exclusive backup running, the backup > > is automatically aborted. This is the same thing that pg_basebackup does. > > To use these non-exclusive backups the backup software will have to > > maintain a persistent connection to the database -- something that should > > not be a problem for any of the modern ones out there (but would've been > > slightly trickier back in the days when we suggested shellscripts) > > I think we might want to make this one optional, but I'm not going to > fight super hard for it. I was thinking along the same lines. > > * A new version of pg_stop_backup is created, taking an argument specifying > > if it's exclusive or not. The current version of pg_stop_backup() continues > > to work for exclusive backups, with no changes to behavior. The new > > pg_stop_backup will return a record of three columns instead of just the > > value -- the LSN (pglsn), the backup label file (text) and the tablespace > > map file (text). > > I wonder if we shouldn't change the API a bit more aggressively. Right > now we return the label and the map - but what if there's more files at > some later point? One way would be to make it a SETOF function returning > 'filename', 'content' or such. Makes it less clear what to do with the > lsn admittedly. If we make the 'client disconnect means abort' optional then we'd also need to modify the API of stop backup to specify which backup to stop, I'd think. Thanks! Stephen
* Magnus Hagander (magnus@hagander.net) wrote: > On Wed, Feb 10, 2016 at 2:46 PM, Andres Freund <andres@anarazel.de> wrote: > > On 2016-02-10 13:46:05 +0100, Magnus Hagander wrote: > > > Per discussionat the developer meeting in Brussels, here's a patch that > > > makes some updates to the backup APIs, to support non-exclusive backups > > > without using pg_basebackup. > > > > Thanks for following through with this! > > > > > * If the client disconnects with a non-exclusive backup running, the > > backup > > > is automatically aborted. This is the same thing that pg_basebackup does. > > > To use these non-exclusive backups the backup software will have to > > > maintain a persistent connection to the database -- something that should > > > not be a problem for any of the modern ones out there (but would've been > > > slightly trickier back in the days when we suggested shellscripts) > > > > I think we might want to make this one optional, but I'm not going to > > fight super hard for it. > > Not sure what you're referring to here. Do you mean being able to make a > non-exclusive backup while not maintaining a connection? That's going to > make things a *lot* more complicated, as we'd have to invent something like > "backup slots" similar to what we're doing with replication slots. I doubt > it's worth all that work and complexity. Hrmmm. If that's the case then perhaps you're right. I liked the general idea of not having to maintain a TCP connection during the entire backup (TCP connections can be annoyingly finicky in certain environments...), but I'm not sure it's worth a lot of additional complexity. Thanks! Stephen
On 2/10/16 9:44 AM, Stephen Frost wrote: > * Magnus Hagander (magnus@hagander.net) wrote: >> On Wed, Feb 10, 2016 at 2:46 PM, Andres Freund <andres@anarazel.de> wrote: >>> On 2016-02-10 13:46:05 +0100, Magnus Hagander wrote: >>>> * If the client disconnects with a non-exclusive backup running, the >>> backup >>>> is automatically aborted. This is the same thing that pg_basebackup does. >>>> To use these non-exclusive backups the backup software will have to >>>> maintain a persistent connection to the database -- something that should >>>> not be a problem for any of the modern ones out there (but would've been >>>> slightly trickier back in the days when we suggested shellscripts) >>> >>> I think we might want to make this one optional, but I'm not going to >>> fight super hard for it. >> >> Not sure what you're referring to here. Do you mean being able to make a >> non-exclusive backup while not maintaining a connection? That's going to >> make things a *lot* more complicated, as we'd have to invent something like >> "backup slots" similar to what we're doing with replication slots. I doubt >> it's worth all that work and complexity. Agreed. > Hrmmm. If that's the case then perhaps you're right. I liked the > general idea of not having to maintain a TCP connection during the > entire backup (TCP connections can be annoyingly finicky in certain > environments...), but I'm not sure it's worth a lot of additional > complexity. Well, pgBackRest holds a connection to PostgreSQL through the entire backup and will abort the backup if it is severed. The connection is always held locally, though, even if the master backup process is on a different server. I haven't gotten any reports of aborted backups due to the connection failing. -- -David david@pgmasters.net
* David Steele (david@pgmasters.net) wrote: > On 2/10/16 9:44 AM, Stephen Frost wrote: > > Hrmmm. If that's the case then perhaps you're right. I liked the > > general idea of not having to maintain a TCP connection during the > > entire backup (TCP connections can be annoyingly finicky in certain > > environments...), but I'm not sure it's worth a lot of additional > > complexity. > > Well, pgBackRest holds a connection to PostgreSQL through the entire > backup and will abort the backup if it is severed. The connection is > always held locally, though, even if the master backup process is on a > different server. I haven't gotten any reports of aborted backups due > to the connection failing. Yeah, I know, I had been thinking it might be nice to not do that at some point in the future, but thinking on it further, we've already got a "pick up where you left off" capability with pgbackrest, so it's really not a huge deal if the backup fails and has to be restarted, and this does remove the need (or at least deprecate) to use the "stop an already-running backup if you find one" option that pgbackrest has. Thanks! Stephen
On 2/10/16 7:46 AM, Magnus Hagander wrote: > Per discussion at the developer meeting in Brussels, here's a patch that > makes some updates to the backup APIs, to support non-exclusive backups > without using pg_basebackup. <...> This sounds like a great idea and I have signed up to review. > * A new version of pg_stop_backup is created, taking an argument > specifying if it's exclusive or not. The current version of > pg_stop_backup() continues to work for exclusive backups, with no > changes to behavior. The new pg_stop_backup will return a record of > three columns instead of just the value -- the LSN (pglsn), the backup > label file (text) and the tablespace map file (text). If used to cancel > an exclusive backup, backup label file and tablespace map will be NULL. > At this point it's up to the backup software to write down the files in > the location of the backup. It would be nice if this new pg_stop_backup() function also output the exact time when the backup stopped. Depending on how long pg_stop_backup() waits for WAL to be archived a time-stamp recorded after pg_stop_backup() returns can be pretty far off. This information is handy for automating selection of a backup when doing time-based PITR (or so the user can manually select). For exclusive backups it is possible to parse the .backup file to get this information but non-exclusive backups do not output the .backup file. I would be happy to see the time-stamp returned from the pg_start_backup() function as well. It's a bigger change, but once pg_start_backup() returns multiple columns it will be easier to add more columns in the future. In fact, it might be better if backup_label and tablespace_map were output by pg_start_backup(). This would give the backup software more information to work with from the start. Thanks! -- -David david@pgmasters.net
On 2/10/16 7:46 AM, Magnus Hagander wrote:
> Per discussion at the developer meeting in Brussels, here's a patch that
> makes some updates to the backup APIs, to support non-exclusive backups
> without using pg_basebackup. <...>
This sounds like a great idea and I have signed up to review.
> * A new version of pg_stop_backup is created, taking an argument
> specifying if it's exclusive or not. The current version of
> pg_stop_backup() continues to work for exclusive backups, with no
> changes to behavior. The new pg_stop_backup will return a record of
> three columns instead of just the value -- the LSN (pglsn), the backup
> label file (text) and the tablespace map file (text). If used to cancel
> an exclusive backup, backup label file and tablespace map will be NULL.
> At this point it's up to the backup software to write down the files in
> the location of the backup.
It would be nice if this new pg_stop_backup() function also output the
exact time when the backup stopped. Depending on how long
pg_stop_backup() waits for WAL to be archived a time-stamp recorded
after pg_stop_backup() returns can be pretty far off.
This information is handy for automating selection of a backup when
doing time-based PITR (or so the user can manually select). For
exclusive backups it is possible to parse the .backup file to get this
information but non-exclusive backups do not output the .backup file.
I would be happy to see the time-stamp returned from the
pg_start_backup() function as well. It's a bigger change, but once
pg_start_backup() returns multiple columns it will be easier to add more
columns in the future.
In fact, it might be better if backup_label and tablespace_map were
output by pg_start_backup(). This would give the backup software more
information to work with from the start.
On 2016-02-10 16:50:26 +0100, Magnus Hagander wrote: > > I would be happy to see the time-stamp returned from the > > pg_start_backup() function as well. It's a bigger change, but once > > pg_start_backup() returns multiple columns it will be easier to add more > > columns in the future. > > > > In fact, it might be better if backup_label and tablespace_map were > > output by pg_start_backup(). This would give the backup software more > > information to work with from the start. > > > > I was trying to figure out why that's a bad idea, but I can't really say > why :) > > For pg_basebackup backups, I think the reason we put it at the end is > simply that we don't want to write it into the backup directory until the > rest of the backup is complete, since it's not going to be useful before > then. But that requirement can certainly be put on the backup client > instead of the server. With the newer API it's going to have to keep those > things around anyway. > > That would then change the function syntax for pg_start_backup() but > instead make pg_stop_backup() now behave the same as it did before. > > Would anybody else like to comment on if that's a good idea or if there are > any holes in it? :) I don't think that's a good idea. It makes it impossible to add information to labels about the minimum recovery point and such. Currently there's some rather fragile logic to discover that, but I'd really like to get rid of that at some point. Regards, Andres
On 2/10/16 11:01 AM, Andres Freund wrote: > On 2016-02-10 16:50:26 +0100, Magnus Hagander wrote: >>> I would be happy to see the time-stamp returned from the >>> pg_start_backup() function as well. It's a bigger change, but once >>> pg_start_backup() returns multiple columns it will be easier to add more >>> columns in the future. >>> >>> In fact, it might be better if backup_label and tablespace_map were >>> output by pg_start_backup(). This would give the backup software more >>> information to work with from the start. >> >> I was trying to figure out why that's a bad idea, but I can't really say >> why :) >> >> For pg_basebackup backups, I think the reason we put it at the end is >> simply that we don't want to write it into the backup directory until the >> rest of the backup is complete, since it's not going to be useful before >> then. But that requirement can certainly be put on the backup client >> instead of the server. With the newer API it's going to have to keep those >> things around anyway. >> >> That would then change the function syntax for pg_start_backup() but >> instead make pg_stop_backup() now behave the same as it did before. >> >> Would anybody else like to comment on if that's a good idea or if there are >> any holes in it? :) > > I don't think that's a good idea. It makes it impossible to add > information to labels about the minimum recovery point and > such. Currently there's some rather fragile logic to discover that, but > I'd really like to get rid of that at some point. That makes sense. The backup_label "as is" could be output at the beginning but if we want to add the minimum recovery point it would need to be output at the end. It seems like tablespace_map could still be output at the beginning, though. -- -David david@pgmasters.net
On 2/10/16 10:50 AM, Magnus Hagander wrote: > On Wed, Feb 10, 2016 at 4:38 PM, David Steele <david@pgmasters.net > > This information is handy for automating selection of a backup when > doing time-based PITR (or so the user can manually select). For > exclusive backups it is possible to parse the .backup file to get this > information but non-exclusive backups do not output the .backup file. > > > The non-exclusive backups *do* output the backup_label file, it just > shows up as a text field instead of as a separate file. You're supposed > to write that data to a file in the backup program. I meant the .backup file (e.g. 000000010000008C0000001A.00000028.backup) that is archived along with the WAL for an exlcusive backup. I believe this is currently the only place to get the stop time (without reading the WAL segments). -- -David david@pgmasters.net
On 2016-02-10 11:06:01 -0500, David Steele wrote: > That makes sense. The backup_label "as is" could be output at the > beginning but if we want to add the minimum recovery point it would need > to be output at the end. > > It seems like tablespace_map could still be output at the beginning, though. I don't really see enough benefit to go that way. What are you thinking of using the information for ("This would give the backup software more information to work with from the start.")? Andres
On 2/10/16 11:12 AM, Andres Freund wrote: > On 2016-02-10 11:06:01 -0500, David Steele wrote: >> That makes sense. The backup_label "as is" could be output at the >> beginning but if we want to add the minimum recovery point it would need >> to be output at the end. >> >> It seems like tablespace_map could still be output at the beginning, though. > > I don't really see enough benefit to go that way. What are you thinking > of using the information for ("This would give the backup software more > information to work with from the start.")? Currently backup software that handles tablespaces needs to read the pg_tblspc directory to build an oid/path mapping in order to know which tablespace directories to copy. Since Postgres is already doing this when it creates tablespace_map it seems like it's redundant for the backup software to do it again. Since tablespace_map is created in do_pg_start_backup() and I don't see how it could be updated after that, I think it is logical to output it from pg_start_backup(). I don't feel strongly about it, though. -- -David david@pgmasters.net
Hi Magnus, thanks for working on this topic. What it does is very similar to what Barman's pgespresso extension does and I'd like to see it implemented soon in the core. I've added myself as reviewer for the patch on commitfest site. Regards, Marco -- Marco Nenciarini - 2ndQuadrant Italy PostgreSQL Training, Services and Support marco.nenciarini@2ndQuadrant.it | www.2ndQuadrant.it
Hi Magnus, I've finally found some time to take a look to the patch. It applies with some fuzziness on master, but the result looks correct. Unfortunately the OID of the new pg_stop_backup function conflicts with "pg_blocking_pids()" patch (52f5d578d6c29bf254e93c69043b817d4047ca67). After changing it the patch does not compile: gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -Wno-unused-command-line-argument -g -O0 -g -fno-omit-frame-pointer -I../../../../src/include -I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.11.sdk/usr/include/libxml2 -I/usr/local/include -I/usr/local/opt/openssl/include -c -o xlog.o xlog.c -MMD -MP -MF .deps/xlog.Po xlog.c:10000:19: error: use of undeclared identifier 'tblspc_mapfbuf'; did you mean 'tblspcmapfile'? initStringInfo(&tblspc_mapfbuf); ^~~~~~~~~~~~~~ tblspcmapfile xlog.c:9790:19: note: 'tblspcmapfile' declared here StringInfo tblspcmapfile, bool infotbssize, ^ xlog.c:10073:22: error: use of undeclared identifier 'tblspc_mapfbuf'; did you mean 'tblspcmapfile'? appendStringInfo(&tblspc_mapfbuf, "%s %s\n", ti->oid, ti->path); ^~~~~~~~~~~~~~ tblspcmapfile xlog.c:9790:19: note: 'tblspcmapfile' declared here StringInfo tblspcmapfile, bool infotbssize, ^ xlog.c:10092:19: error: use of undeclared identifier 'labelfbuf'; did you mean 'labelfile'? initStringInfo(&labelfbuf); ^~~~~~~~~ labelfile xlog.c:9789:19: note: 'labelfile' declared here StringInfo labelfile, DIR *tblspcdir, List **tablespaces, ^ xlog.c:10099:21: error: use of undeclared identifier 'labelfbuf'; did you mean 'labelfile'? appendStringInfo(&labelfbuf, "START WAL LOCATION: %X/%X (file %s)\n", ^~~~~~~~~ labelfile xlog.c:9789:19: note: 'labelfile' declared here StringInfo labelfile, DIR *tblspcdir, List **tablespaces, ^ xlog.c:10101:21: error: use of undeclared identifier 'labelfbuf'; did you mean 'labelfile'? appendStringInfo(&labelfbuf, "CHECKPOINT LOCATION: %X/%X\n", ^~~~~~~~~ labelfile xlog.c:9789:19: note: 'labelfile' declared here StringInfo labelfile, DIR *tblspcdir, List **tablespaces, ^ xlog.c:10103:21: error: use of undeclared identifier 'labelfbuf'; did you mean 'labelfile'? appendStringInfo(&labelfbuf, "BACKUP METHOD: %s\n", ^~~~~~~~~ labelfile xlog.c:9789:19: note: 'labelfile' declared here StringInfo labelfile, DIR *tblspcdir, List **tablespaces, ^ xlog.c:10105:21: error: use of undeclared identifier 'labelfbuf'; did you mean 'labelfile'? appendStringInfo(&labelfbuf, "BACKUP FROM: %s\n", ^~~~~~~~~ labelfile xlog.c:9789:19: note: 'labelfile' declared here StringInfo labelfile, DIR *tblspcdir, List **tablespaces, ^ xlog.c:10107:21: error: use of undeclared identifier 'labelfbuf'; did you mean 'labelfile'? appendStringInfo(&labelfbuf, "START TIME: %s\n", strfbuf); ^~~~~~~~~ labelfile xlog.c:9789:19: note: 'labelfile' declared here StringInfo labelfile, DIR *tblspcdir, List **tablespaces, ^ xlog.c:10108:21: error: use of undeclared identifier 'labelfbuf'; did you mean 'labelfile'? appendStringInfo(&labelfbuf, "LABEL: %s\n", backupidstr); ^~~~~~~~~ labelfile xlog.c:9789:19: note: 'labelfile' declared here StringInfo labelfile, DIR *tblspcdir, List **tablespaces, ^ xlog.c:10142:15: error: use of undeclared identifier 'labelfbuf' if (fwrite(labelfbuf.data, labelfbuf.len, 1, fp) != 1 || ^ xlog.c:10142:31: error: use of undeclared identifier 'labelfbuf' if (fwrite(labelfbuf.data, labelfbuf.len, 1, fp) != 1 || ^ xlog.c:10151:10: error: use of undeclared identifier 'labelfbuf' pfree(labelfbuf.data); ^ xlog.c:10154:8: error: use of undeclared identifier 'tblspc_mapfbuf' if (tblspc_mapfbuf.len > 0) ^ xlog.c:10178:16: error: use of undeclared identifier 'tblspc_mapfbuf' if (fwrite(tblspc_mapfbuf.data, tblspc_mapfbuf.len, 1, fp) != 1 || ^ xlog.c:10178:37: error: use of undeclared identifier 'tblspc_mapfbuf' if (fwrite(tblspc_mapfbuf.data, tblspc_mapfbuf.len, 1, fp) != 1 || ^ xlog.c:10189:10: error: use of undeclared identifier 'tblspc_mapfbuf' pfree(tblspc_mapfbuf.data); ^ xlog.c:10193:17: error: use of undeclared identifier 'labelfbuf' *labelfile = labelfbuf.data; ^ xlog.c:10194:8: error: use of undeclared identifier 'tblspc_mapfbuf' if (tblspc_mapfbuf.len > 0) ^ xlog.c:10195:22: error: use of undeclared identifier 'tblspc_mapfbuf' *tblspcmapfile = tblspc_mapfbuf.data; ^ 19 errors generated. make[4]: *** [xlog.o] Error 1 make[3]: *** [transam-recursive] Error 2 make[2]: *** [access-recursive] Error 2 make[1]: *** [all-backend-recurse] Error 2 make: *** [all-src-recurse] Error 2 I've searched in past commits for any recent change that involves the affected lines, but I have not found any. Maybe some changes are missing? Regards, Marco -- Marco Nenciarini - 2ndQuadrant Italy PostgreSQL Training, Services and Support marco.nenciarini@2ndQuadrant.it | www.2ndQuadrant.it
Hi Magnus, On 3/2/16 12:49 PM, Marco Nenciarini wrote: > I've finally found some time to take a look to the patch. > > It applies with some fuzziness on master, but the result looks correct. > Unfortunately the OID of the new pg_stop_backup function conflicts with > "pg_blocking_pids()" patch (52f5d578d6c29bf254e93c69043b817d4047ca67). > > After changing it the patch does not compile: It's been two weeks since Marco found these issues in the patch. Please provide an updated patch soon or I will mark this "returned with feedback". Thanks, -- -David david@pgmasters.net
Hi Magnus,
On 3/2/16 12:49 PM, Marco Nenciarini wrote:
> I've finally found some time to take a look to the patch.
>
> It applies with some fuzziness on master, but the result looks correct.
> Unfortunately the OID of the new pg_stop_backup function conflicts with
> "pg_blocking_pids()" patch (52f5d578d6c29bf254e93c69043b817d4047ca67).
>
> After changing it the patch does not compile:
It's been two weeks since Marco found these issues in the patch. Please
provide an updated patch soon or I will mark this "returned with feedback".
Hi Magnus,
I've finally found some time to take a look to the patch.
It applies with some fuzziness on master, but the result looks correct.
Unfortunately the OID of the new pg_stop_backup function conflicts with
"pg_blocking_pids()" patch (52f5d578d6c29bf254e93c69043b817d4047ca67).
After changing it the patch does not compile:
gcc -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute
-Wformat-security -fno-strict-aliasing -fwrapv
-Wno-unused-command-line-argument -g -O0 -g -fno-omit-frame-pointer
-I../../../../src/include
-I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.11.sdk/usr/include/libxml2
-I/usr/local/include -I/usr/local/opt/openssl/include -c -o xlog.o
xlog.c -MMD -MP -MF .deps/xlog.Po
xlog.c:10000:19: error: use of undeclared identifier 'tblspc_mapfbuf';
Attachment
Hi Magnus, On 3/19/16 8:15 AM, Magnus Hagander wrote: > I've attached an updated patch, which is rebased on current master and > includes the oid fix. Before doing a thorough review of this patch there are a few points I would like to consider: * I think it's really important to provide the stop time in some fashion when using this new technique. I would prefer a new column to be returned from pg_stop_backup() but I could live with STOP TIME being recorded in the label file. STOP TIME should probably be included in the label file anyway. * It seems like STOP WAL LOCATION should now also be recorded in the label file. Preferably this would used by recovery to determine when the database has reach consistency but that could be a future patch. I'm not very happy with the current method of using pg_control to get this information as it assumes that pg_control is copied last (at least based on the code comments). Thanks, -- -David david@pgmasters.net
On 3/19/16 8:15 AM, Magnus Hagander wrote:
> I've attached an updated patch, which is rebased on current master and
> includes the oid fix.
Before doing a thorough review of this patch there are a few points I
would like to consider:
* I think it's really important to provide the stop time in some fashion
when using this new technique. I would prefer a new column to be
returned from pg_stop_backup() but I could live with STOP TIME being
recorded in the label file. STOP TIME should probably be included in
the label file anyway.
* It seems like STOP WAL LOCATION should now also be recorded in the
label file. Preferably this would used by recovery to determine when
the database has reach consistency but that could be a future patch.
I'm not very happy with the current method of using pg_control to get
this information as it assumes that pg_control is copied last (at least
based on the code comments).
On 3/22/16 12:14 PM, Magnus Hagander wrote: > On Tue, Mar 22, 2016 at 5:08 PM, David Steele <david@pgmasters.net > <mailto:david@pgmasters.net>> wrote: > > On 3/19/16 8:15 AM, Magnus Hagander wrote: > > > I've attached an updated patch, which is rebased on current master and > > includes the oid fix. > > Before doing a thorough review of this patch there are a few points I > would like to consider: > > * I think it's really important to provide the stop time in some fashion > when using this new technique. I would prefer a new column to be > returned from pg_stop_backup() but I could live with STOP TIME being > recorded in the label file. STOP TIME should probably be included in > the label file anyway. > > Adding the stop time column should be a simple addition and I don't see > a problem with that. I think I misunderstood your original request on > that. Because you are just talking about returning a timestamptz with > the "right now" value for when you called pg_stop_backup()? Or to be > specific, just before pg_Stop_backup *finished*. Or do you mean when > pg_stop_backup() started? What would be ideal is the minimum time that could be used for PITR. In an exclusive backup that's the time the end-of-backup record is written to WAL. In a non-exlusive backup I'm not quite sure how that works. > Doing it in the backup label file is obviously a different target, where > we might need to consider backwards compatibility, Should we? Physical backups can only be restored in the same version so I'm not sure why it would be a problem? Do you mean for programs outside of Postgres that are parsing this file? > * It seems like STOP WAL LOCATION should now also be recorded in the > label file. Preferably this would used by recovery to determine when > the database has reach consistency but that could be a future patch. > I'm not very happy with the current method of using pg_control to get > this information as it assumes that pg_control is copied last (at least > based on the code comments). > > That seems entirely out of scope for this patch, though. Doesn't mean it > shouldn't be done, but that's a separate thing. Fair enough. -- -David david@pgmasters.net
On 3/22/16 12:14 PM, Magnus Hagander wrote:
> On Tue, Mar 22, 2016 at 5:08 PM, David Steele <david@pgmasters.net
> <mailto:david@pgmasters.net>> wrote:
>
> On 3/19/16 8:15 AM, Magnus Hagander wrote:
>
> > I've attached an updated patch, which is rebased on current master and
> > includes the oid fix.
>
> Before doing a thorough review of this patch there are a few points I
> would like to consider:
>
> * I think it's really important to provide the stop time in some fashion
> when using this new technique. I would prefer a new column to be
> returned from pg_stop_backup() but I could live with STOP TIME being
> recorded in the label file. STOP TIME should probably be included in
> the label file anyway.
>
> Adding the stop time column should be a simple addition and I don't see
> a problem with that. I think I misunderstood your original request on
> that. Because you are just talking about returning a timestamptz with
> the "right now" value for when you called pg_stop_backup()? Or to be
> specific, just before pg_Stop_backup *finished*. Or do you mean when
> pg_stop_backup() started?
What would be ideal is the minimum time that could be used for PITR. In
an exclusive backup that's the time the end-of-backup record is written
to WAL. In a non-exlusive backup I'm not quite sure how that works.
> Doing it in the backup label file is obviously a different target, where
> we might need to consider backwards compatibility, Should we?
Physical backups can only be restored in the same version so I'm not
sure why it would be a problem? Do you mean for programs outside of
Postgres that are parsing this file?
Doing it in the backup label file is obviously a different target, where we might need to consider backwards compatibility, Should we?
On 3/22/16 12:31 PM, Magnus Hagander wrote: > On Tue, Mar 22, 2016 at 5:27 PM, David Steele <david@pgmasters.net > <mailto:david@pgmasters.net>> wrote:> > > Adding the stop time column should be a simple addition and I don't see > > a problem with that. I think I misunderstood your original request on > > that. Because you are just talking about returning a timestamptz with > > the "right now" value for when you called pg_stop_backup()? Or to be > > specific, just before pg_Stop_backup *finished*. Or do you mean when > > pg_stop_backup() started? > > What would be ideal is the minimum time that could be used for PITR. In > an exclusive backup that's the time the end-of-backup record is written > to WAL. In a non-exlusive backup I'm not quite sure how that works. I guess I was hoping that you would know. I fine with just getting the current timestamp as is currently done in do_pg_stop_backup(). It's not perfect but it will be pretty close. I thought some more about putting STOP_WAL_LOCATION into the backup label and I think this is an important step. Without that the recovery process needs to use pg_control to determine when the database has reach consistency and that will only work if pg_control was copied last. In summary, I think the patch should be updated to include stop_time as a column and add STOP_WAL_LOCATION and STOP_TIME to backup_label. The recovery process should read STOP_WAL_LOCATION from backup_label and use that to decide when the database has become consistent. Thanks, -- -David david@pgmasters.net
On 3/22/16 12:31 PM, Magnus Hagander wrote:On Tue, Mar 22, 2016 at 5:27 PM, David Steele <david@pgmasters.net>
<mailto:david@pgmasters.net>> wrote:> Adding the stop time column should be a simple addition and I don't see
> a problem with that. I think I misunderstood your original request on
> that. Because you are just talking about returning a timestamptz with
> the "right now" value for when you called pg_stop_backup()? Or to be
> specific, just before pg_Stop_backup *finished*. Or do you mean when
> pg_stop_backup() started?
What would be ideal is the minimum time that could be used for PITR. In
an exclusive backup that's the time the end-of-backup record is written
to WAL. In a non-exlusive backup I'm not quite sure how that works.
I guess I was hoping that you would know. I fine with just getting the current timestamp as is currently done in do_pg_stop_backup(). It's not perfect but it will be pretty close.
I thought some more about putting STOP_WAL_LOCATION into the backup label and I think this is an important step. Without that the recovery process needs to use pg_control to determine when the database has reach consistency and that will only work if pg_control was copied last.
In summary, I think the patch should be updated to include stop_time as a column and add STOP_WAL_LOCATION and STOP_TIME to backup_label. The recovery process should read STOP_WAL_LOCATION from backup_label and use that to decide when the database has become consistent.
I was chatting with Stephen about this earlier, and one thing we considered was that maybe we should return pg_control as a bytea field from pg_stop_backup() thereby strongly hinting to tools that they should write it out from there rather than copy it from the data directory (we can't stop them from copying it from the data directory like we do for backup_label of course, but if we return the contents and document that's how it should be done, it's pretty clear).
On Tue, Mar 29, 2016 at 4:36 PM, David Steele <david@pgmasters.net> wrote:On 3/22/16 12:31 PM, Magnus Hagander wrote:On Tue, Mar 22, 2016 at 5:27 PM, David Steele <david@pgmasters.net>
<mailto:david@pgmasters.net>> wrote:> Adding the stop time column should be a simple addition and I don't see
> a problem with that. I think I misunderstood your original request on
> that. Because you are just talking about returning a timestamptz with
> the "right now" value for when you called pg_stop_backup()? Or to be
> specific, just before pg_Stop_backup *finished*. Or do you mean when
> pg_stop_backup() started?
What would be ideal is the minimum time that could be used for PITR. In
an exclusive backup that's the time the end-of-backup record is written
to WAL. In a non-exlusive backup I'm not quite sure how that works.
I guess I was hoping that you would know. I fine with just getting the current timestamp as is currently done in do_pg_stop_backup(). It's not perfect but it will be pretty close.
I thought some more about putting STOP_WAL_LOCATION into the backup label and I think this is an important step. Without that the recovery process needs to use pg_control to determine when the database has reach consistency and that will only work if pg_control was copied last.
In summary, I think the patch should be updated to include stop_time as a column and add STOP_WAL_LOCATION and STOP_TIME to backup_label. The recovery process should read STOP_WAL_LOCATION from backup_label and use that to decide when the database has become consistent.Is that the only reason we need pg_control to be copied last? I thought there were other reasons for that.
I was chatting with Stephen about this earlier, and one thing we considered was that maybe we should return pg_control as a bytea field from pg_stop_backup() thereby strongly hinting to tools that they should write it out from there rather than copy it from the data directory (we can't stop them from copying it from the data directory like we do for backup_label of course, but if we return the contents and document that's how it should be done, it's pretty clear).But if we can actually remove the whole requirement on the order of the copy of pg_control by doing that it certainly seems worthwhile.So - I can definitely see the argument for returning the stop wal *location*. But I'm still not sure what the definition of the time would be? We can't return it before we know what it means...
On 3/29/16 2:09 PM, Magnus Hagander wrote: > I had a chat with Heikki, and here's another suggestion: > > 1. We don't touch the current exclusive backups at all, as previously > discussed, other than deprecating their use. For backwards compat. > > 2. For new backups, we return the contents of pg_control as a bytea from > pg_stop_backup(). We tell backup programs they are supposed to write > this out as pg_control.backup, *not* as pg_control. > > 3a. On recovery, if it's an exclusive backup, we do as we did before. > > 3b. on recovery, in non-exclusive backups (determined from > backup_label), we check that pg_control.backup exists *and* that > pg_control does *not* exist. That guards us reasonably against backup > programs that do the wrong thing, and we know we get the correct version > of pg_control. > > 4. (we can still add the stop location to the backup_label file in case > backup programs find it useful, but we don't use it in recovery) > > Thoughts about this approach? This certainly looks like it would work but it raises the barrier for implementing backups by quite a lot. It's fine for backrest or barman but it won't be pleasant for anyone who has home-grown scripts. -- -David david@pgmasters.net
On Tue, Mar 29, 2016 at 6:40 PM, Magnus Hagander <magnus@hagander.net> wrote:So - I can definitely see the argument for returning the stop wal *location*. But I'm still not sure what the definition of the time would be? We can't return it before we know what it means...I had a chat with Heikki, and here's another suggestion:1. We don't touch the current exclusive backups at all, as previously discussed, other than deprecating their use. For backwards compat.2. For new backups, we return the contents of pg_control as a bytea from pg_stop_backup(). We tell backup programs they are supposed to write this out as pg_control.backup, *not* as pg_control.3a. On recovery, if it's an exlcusive backup, we do as we did before.3b. on recovery, in non-exclusive backups (determined from backup_label), we check that pg_control.backup exists *and* that pg_control does *not* exist.
On 3/29/16 2:09 PM, Magnus Hagander wrote:
> I had a chat with Heikki, and here's another suggestion:
>
> 1. We don't touch the current exclusive backups at all, as previously
> discussed, other than deprecating their use. For backwards compat.
>
> 2. For new backups, we return the contents of pg_control as a bytea from
> pg_stop_backup(). We tell backup programs they are supposed to write
> this out as pg_control.backup, *not* as pg_control.
>
> 3a. On recovery, if it's an exclusive backup, we do as we did before.
>
> 3b. on recovery, in non-exclusive backups (determined from
> backup_label), we check that pg_control.backup exists *and* that
> pg_control does *not* exist. That guards us reasonably against backup
> programs that do the wrong thing, and we know we get the correct version
> of pg_control.
>
> 4. (we can still add the stop location to the backup_label file in case
> backup programs find it useful, but we don't use it in recovery)
>
> Thoughts about this approach?
This certainly looks like it would work but it raises the barrier for
implementing backups by quite a lot. It's fine for backrest or barman
but it won't be pleasant for anyone who has home-grown scripts.
On Tue, Mar 29, 2016 at 11:39 PM, Magnus Hagander <magnus@hagander.net> wrote:On Tue, Mar 29, 2016 at 6:40 PM, Magnus Hagander <magnus@hagander.net> wrote:So - I can definitely see the argument for returning the stop wal *location*. But I'm still not sure what the definition of the time would be? We can't return it before we know what it means...I had a chat with Heikki, and here's another suggestion:1. We don't touch the current exclusive backups at all, as previously discussed, other than deprecating their use. For backwards compat.2. For new backups, we return the contents of pg_control as a bytea from pg_stop_backup(). We tell backup programs they are supposed to write this out as pg_control.backup, *not* as pg_control.3a. On recovery, if it's an exlcusive backup, we do as we did before.3b. on recovery, in non-exclusive backups (determined from backup_label), we check that pg_control.backup exists *and* that pg_control does *not* exist.Currently pg_control has been read before backup_label file, so as per this proposal do you want to change that? If yes, I think that will make this patch more invasive with respect to handling of failure modes. Also as David points out, I also feel that it will raise the bar for usage of this API.
On 3/30/16 4:18 AM, Magnus Hagander wrote: > > On Wed, Mar 30, 2016 at 4:10 AM, David Steele <david@pgmasters.net > <mailto:david@pgmasters.net>> wrote: > > This certainly looks like it would work but it raises the barrier for > implementing backups by quite a lot. It's fine for backrest or barman > but it won't be pleasant for anyone who has home-grown scripts. > > > How much does it really raise the bar, though? > > It would go from "copy all files and make damn sure you copy pg_control > last, and rename it to pg_control.backup" to "take this binary blob you > got from the server and write it to pg_control.backup"? > > Also, the target of these APIs is specifically the backup tools and not > homewritten scripts. Then what would home-grown scripts use, the deprecated API that we know has issues? > A simple shellscript will have trouble enough using > it in the first place since it requires a persistent connection to the > database. All that's required is to spawn a psql process. I'm no shell expert but that's simple enough. > But those scripts are likely broken anyway. Yes, probably. Backup and especially archiving correctly are harder than most people realize. > <...> > > The main reason for Heikki to suggest this one over the other basic one > is that it brings protection against the "backup script/program crashed > halfway through but the user still tried to restore from that". They will > outright fail because there is no pg_control.backup in that case. But if we are going to make this complicated I'm not sure it's a big deal just to require pg_control to be copied last. pgBackRest already does that and Barman probably does, too. I don't see "don't copy pg_control" and "copy pg_control last" as all that different in terms of complexity. pgBackRest also *restores* pg_control last which I think is pretty important and would not be addressed by this solution. > If we > don't care about that, then we can go back to just saying "copy > pg_control last and we're done". But you yourself complained about that > requirement because it's too easy to get wrong (though you advocated > using backup_label to transfer the data over -- but that has the > potential for getting more complicated if we now or at any point in the > future want more than one field to transfer, for example). Perhaps. I'm still not convinced that getting some information from backup_label and other information from pg_control is a good solution. I would rather write the recovery point into the backup_label and use that instead of the value in pg_control. Text files are much easier to parse and push around accurately (and test). -- -David david@pgmasters.net
On 3/30/16 4:18 AM, Magnus Hagander wrote:
>
> On Wed, Mar 30, 2016 at 4:10 AM, David Steele <david@pgmasters.net
> <mailto:david@pgmasters.net>> wrote:
>
> This certainly looks like it would work but it raises the barrier for
> implementing backups by quite a lot. It's fine for backrest or barman
> but it won't be pleasant for anyone who has home-grown scripts.
>
>
> How much does it really raise the bar, though?
>
> It would go from "copy all files and make damn sure you copy pg_control
> last, and rename it to pg_control.backup" to "take this binary blob you
> got from the server and write it to pg_control.backup"?
>
> Also, the target of these APIs is specifically the backup tools and not
> homewritten scripts.
Then what would home-grown scripts use, the deprecated API that we know
has issues?
> A simple shellscript will have trouble enough using
> it in the first place since it requires a persistent connection to the
> database.
All that's required is to spawn a psql process. I'm no shell expert but
that's simple enough.
> But those scripts are likely broken anyway.
Yes, probably. Backup and especially archiving correctly are harder
than most people realize.
> The main reason for Heikki to suggest this one over the other basic one
> is that it brings protection against the "backup script/program crashed
> halfway through but the user still tried to restore from that". They will
> outright fail because there is no pg_control.backup in that case.
But if we are going to make this complicated I'm not sure it's a big
deal just to require pg_control to be copied last. pgBackRest already
does that and Barman probably does, too.
I don't see "don't copy pg_control" and "copy pg_control last" as all
that different in terms of complexity.
pgBackRest also *restores* pg_control last which I think is pretty
important and would not be addressed by this solution.
> If we
> don't care about that, then we can go back to just saying "copy
> pg_control last and we're done". But you yourself complained about that
> requirement because it's too easy to get wrong (though you advocated
> using backup_label to transfer the data over -- but that has the
> potential for getting more complicated if we now or at any point in the
> future want more than one field to transfer, for example).
Perhaps. I'm still not convinced that getting some information from
backup_label and other information from pg_control is a good solution.
I would rather write the recovery point into the backup_label and use
that instead of the value in pg_control. Text files are much easier to
parse and push around accurately (and test).
On Thu, Mar 31, 2016 at 4:00 AM, David Steele <david@pgmasters.net> wrote:On 3/30/16 4:18 AM, Magnus Hagander wrote:
>
> On Wed, Mar 30, 2016 at 4:10 AM, David Steele <david@pgmasters.net
> <mailto:david@pgmasters.net>> wrote:
>
> This certainly looks like it would work but it raises the barrier for
> implementing backups by quite a lot. It's fine for backrest or barman
> but it won't be pleasant for anyone who has home-grown scripts.
>
>
> How much does it really raise the bar, though?
>
> It would go from "copy all files and make damn sure you copy pg_control
> last, and rename it to pg_control.backup" to "take this binary blob you
> got from the server and write it to pg_control.backup"?
>
> Also, the target of these APIs is specifically the backup tools and not
> homewritten scripts.
Then what would home-grown scripts use, the deprecated API that we know
has issues?They should use either pg_basebackup/pg_receivexlog, or they should use a tool like backrest or barman.> A simple shellscript will have trouble enough using
> it in the first place since it requires a persistent connection to the
> database.
All that's required is to spawn a psql process. I'm no shell expert but
that's simple enough.Oh, it's certainly doable, but you also need to come back and talk to it at a later time (to call stop backup), and do something useful with a multiline output. None of that is particularly easy. Certainly doable, but it becomes the wrong tool for the job.> But those scripts are likely broken anyway.
Yes, probably. Backup and especially archiving correctly are harder
than most people realize.Exactly. Which is why we should discourage people from doing that, and instead point them to the tools that do the job right. This is the whole reason we're making this change in the first place.> The main reason for Heikki to suggest this one over the other basic one
> is that it brings protection against the "backup script/program crashed
> halfway through but the user still tried to restore from that". They will
> outright fail because there is no pg_control.backup in that case.
But if we are going to make this complicated I'm not sure it's a big
deal just to require pg_control to be copied last. pgBackRest already
does that and Barman probably does, too.It does (I did doublecheck that at some point).I don't see "don't copy pg_control" and "copy pg_control last" as all
that different in terms of complexity.
pgBackRest also *restores* pg_control last which I think is pretty
important and would not be addressed by this solution.So maybe we should at least start that way. And just document that clearly, and then use the patch that we have right now?Except we add so it still returns the stoppoint() as well (which is returned from the current pg_stop_backup, but not in the new one).
On Thu, Mar 31, 2016 at 4:00 AM, David Steele <david@pgmasters.net> wrote:So maybe we should at least start that way. And just document that clearly, and then use the patch that we have right now?Except we add so it still returns the stoppoint() as well (which is returned from the current pg_stop_backup, but not in the new one).We can always extend the function with more columns later if we need - it's changing the ones we have that's a big problem there :)
On Wed, Mar 2, 2016 at 6:49 PM, Marco Nenciarini <marco.nenciarini@2ndquadrant.it> wrote:I've attached an updated patch, which is rebased on current master and includes the oid fix.
+ <entry>Finish performing exclusive on-line backup (restricted to superusers or replication roles)</entry>
+ </row>
+ <row>
+ <entry>
+ <literal><function>pg_stop_backup(<parameter>exclusive</> <type>boolean</>)</function></literal>
+ </entry>
+ <entry><type>setof record</type></entry>
+ <entry>Finish performing exclusive or non-exclusive on-line backup (restricted to superusers or replication roles)</entry>
<para>
<function>pg_start_backup</> accepts an
arbitrary user-defined label for the backup. (Typically this would be
the name under which the backup dump file will be stored.) The function
writes a backup label file (<filename>backup_label</>) and, if there
are any links in the <filename>pg_tblspc/</> directory, a tablespace map
file (<filename>tablespace_map</>) into the database cluster's data
directory, performs a checkpoint, and then returns the backup's starting
transaction log location as text. The user can ignore this result value,
but it is provided in case it is useful.
CREATE OR REPLACE FUNCTION
- pg_start_backup(label text, fast boolean DEFAULT false)
+ pg_start_backup(label text, fast boolean DEFAULT false, exclusive boolean DEFAULT true)
RETURNS pg_lsn STRICT VOLATILE LANGUAGE internal AS 'pg_start_backup';
On Sat, Mar 19, 2016 at 5:45 PM, Magnus Hagander <magnus@hagander.net> wrote:On Wed, Mar 2, 2016 at 6:49 PM, Marco Nenciarini <marco.nenciarini@2ndquadrant.it> wrote:I've attached an updated patch, which is rebased on current master and includes the oid fix.+ <entry>Finish performing exclusive on-line backup (restricted to superusers or replication roles)</entry>
+ </row>
+ <row>
+ <entry>
+ <literal><function>pg_stop_backup(<parameter>exclusive</> <type>boolean</>)</function></literal>
+ </entry>
+ <entry><type>setof record</type></entry>
+ <entry>Finish performing exclusive or non-exclusive on-line backup (restricted to superusers or replication roles)</entry>
Isn't it better to indicate that user needs to form a backup_label and tablespace_map file from the output of this API and those needs to be dropped into data directory?
Also, I think below part of documentation for pg_start_backup() needs to be modified:<para>
<function>pg_start_backup</> accepts an
arbitrary user-defined label for the backup. (Typically this would be
the name under which the backup dump file will be stored.) The function
writes a backup label file (<filename>backup_label</>) and, if there
are any links in the <filename>pg_tblspc/</> directory, a tablespace map
file (<filename>tablespace_map</>) into the database cluster's data
directory, performs a checkpoint, and then returns the backup's starting
transaction log location as text. The user can ignore this result value,
but it is provided in case it is useful.
Similarly, there is a description for pg_stop_backup which needs to be modified.
CREATE OR REPLACE FUNCTION
- pg_start_backup(label text, fast boolean DEFAULT false)
+ pg_start_backup(label text, fast boolean DEFAULT false, exclusive boolean DEFAULT true)
RETURNS pg_lsn STRICT VOLATILE LANGUAGE internal AS 'pg_start_backup';
One thing, that might be slightly inconvenient for user is if he or she wants to use this API for non-exclusive backups then, they need to pass the value of second parameter as well which doesn't seem to be a big issue.
On 3/19/16 8:15 AM, Magnus Hagander wrote: > I've attached an updated patch, which is rebased on current master and > includes the oid fix. I've now had a chance to go through this in detail and test thoroughly. We had a lot of discussion about copying pg_controllast and that led me to believe that there might be some sort of race condition with multiple backups running concurrently. I tried every trick I could think of and everything worked as expected so as far as I can see this requirement only applies to backups taken from a standby which agrees with the comments in do_pg_stop_backup(). Note that I only tested backups against a master database since that's what is modified by this patch. I'm still not entirely comfortable with how backups against a standby are done and it that case still think it would be best to write the stop wal location into backup_label, but as you said before that would be a completely separate patch. So in the end I'm fine with the code as it stands with the exception of the pg_stop_backup2() naming. I'd prefer a more verbose name here but I'm unable to come up with anything very good. How about pg_stop_backup_v2()? This function could also use a few more comments, at least at the top of the exclusive and non-exclusive blocks. Except for those minor details I believe this is "ready for committer" unless anybody has an objection. -- -David david@pgmasters.net
On Fri, Apr 1, 2016 at 6:47 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:Also, I think below part of documentation for pg_start_backup() needs to be modified:<para>
<function>pg_start_backup</> accepts an
arbitrary user-defined label for the backup. (Typically this would be
the name under which the backup dump file will be stored.) The function
writes a backup label file (<filename>backup_label</>) and, if there
are any links in the <filename>pg_tblspc/</> directory, a tablespace map
file (<filename>tablespace_map</>) into the database cluster's data
directory, performs a checkpoint, and then returns the backup's starting
transaction log location as text. The user can ignore this result value,
but it is provided in case it is useful.
That one definitely needs to be fixed, as it's part of the reference. Good spot.Similarly, there is a description for pg_stop_backup which needs to be modified.That's the one you're referring to in your first commend above, is it not? Or is there one more that you mean?
<para>
<function>pg_stop_backup</> removes the label file and, if it exists,
the <filename>tablespace_map</> file created by
<function>pg_start_backup</>, and creates a backup history file in
the transaction log archive area. The history file includes the label given to
<function>pg_start_backup</>, the starting and ending transaction log locations for
the backup, and the starting and ending times of the backup. The return
value is the backup's ending transaction log location (which again
can be ignored). After recording the ending location, the current
transaction log insertion
point is automatically advanced to the next transaction log file, so that the
ending transaction log file can be archived immediately to complete the backup.
</para>
On Mon, Apr 4, 2016 at 4:31 PM, Magnus Hagander <magnus@hagander.net> wrote:On Fri, Apr 1, 2016 at 6:47 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:Also, I think below part of documentation for pg_start_backup() needs to be modified:<para>
<function>pg_start_backup</> accepts an
arbitrary user-defined label for the backup. (Typically this would be
the name under which the backup dump file will be stored.) The function
writes a backup label file (<filename>backup_label</>) and, if there
are any links in the <filename>pg_tblspc/</> directory, a tablespace map
file (<filename>tablespace_map</>) into the database cluster's data
directory, performs a checkpoint, and then returns the backup's starting
transaction log location as text. The user can ignore this result value,
but it is provided in case it is useful.
That one definitely needs to be fixed, as it's part of the reference. Good spot.Similarly, there is a description for pg_stop_backup which needs to be modified.That's the one you're referring to in your first commend above, is it not? Or is there one more that you mean?I am referring to below part of docs in func.sgml<para>
<function>pg_stop_backup</> removes the label file and, if it exists,
the <filename>tablespace_map</> file created by
<function>pg_start_backup</>, and creates a backup history file in
the transaction log archive area. The history file includes the label given to
<function>pg_start_backup</>, the starting and ending transaction log locations for
the backup, and the starting and ending times of the backup. The return
value is the backup's ending transaction log location (which again
can be ignored). After recording the ending location, the current
transaction log insertion
point is automatically advanced to the next transaction log file, so that the
ending transaction log file can be archived immediately to complete the backup.
</para>
Attachment
On 4/5/16 8:05 AM, Magnus Hagander wrote: > PFA a better one (I think), also with the rename and added comments that > David was asking for. > > Barring objections, I will apply this version. This version looks good to me. -- -David david@pgmasters.net
On Mon, Apr 4, 2016 at 3:15 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:On Mon, Apr 4, 2016 at 4:31 PM, Magnus Hagander <magnus@hagander.net> wrote:On Fri, Apr 1, 2016 at 6:47 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:Also, I think below part of documentation for pg_start_backup() needs to be modified:<para>
<function>pg_start_backup</> accepts an
arbitrary user-defined label for the backup. (Typically this would be
the name under which the backup dump file will be stored.) The function
writes a backup label file (<filename>backup_label</>) and, if there
are any links in the <filename>pg_tblspc/</> directory, a tablespace map
file (<filename>tablespace_map</>) into the database cluster's data
directory, performs a checkpoint, and then returns the backup's starting
transaction log location as text. The user can ignore this result value,
but it is provided in case it is useful.
That one definitely needs to be fixed, as it's part of the reference. Good spot.Similarly, there is a description for pg_stop_backup which needs to be modified.That's the one you're referring to in your first commend above, is it not? Or is there one more that you mean?I am referring to below part of docs in func.sgml<para>
<function>pg_stop_backup</> removes the label file and, if it exists,
the <filename>tablespace_map</> file created by
<function>pg_start_backup</>, and creates a backup history file in
the transaction log archive area. The history file includes the label given to
<function>pg_start_backup</>, the starting and ending transaction log locations for
the backup, and the starting and ending times of the backup. The return
value is the backup's ending transaction log location (which again
can be ignored). After recording the ending location, the current
transaction log insertion
point is automatically advanced to the next transaction log file, so that the
ending transaction log file can be archived immediately to complete the backup.
</para>
Ah, gotcha. Clearly I wasn't paying attention properly.PFA a better one (I think), also with the rename and added comments that David was asking for.
On Tue, Apr 5, 2016 at 5:35 PM, Magnus Hagander <magnus@hagander.net> wrote:On Mon, Apr 4, 2016 at 3:15 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:On Mon, Apr 4, 2016 at 4:31 PM, Magnus Hagander <magnus@hagander.net> wrote:On Fri, Apr 1, 2016 at 6:47 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:Also, I think below part of documentation for pg_start_backup() needs to be modified:<para>
<function>pg_start_backup</> accepts an
arbitrary user-defined label for the backup. (Typically this would be
the name under which the backup dump file will be stored.) The function
writes a backup label file (<filename>backup_label</>) and, if there
are any links in the <filename>pg_tblspc/</> directory, a tablespace map
file (<filename>tablespace_map</>) into the database cluster's data
directory, performs a checkpoint, and then returns the backup's starting
transaction log location as text. The user can ignore this result value,
but it is provided in case it is useful.
That one definitely needs to be fixed, as it's part of the reference. Good spot.Similarly, there is a description for pg_stop_backup which needs to be modified.That's the one you're referring to in your first commend above, is it not? Or is there one more that you mean?I am referring to below part of docs in func.sgml<para>
<function>pg_stop_backup</> removes the label file and, if it exists,
the <filename>tablespace_map</> file created by
<function>pg_start_backup</>, and creates a backup history file in
the transaction log archive area. The history file includes the label given to
<function>pg_start_backup</>, the starting and ending transaction log locations for
the backup, and the starting and ending times of the backup. The return
value is the backup's ending transaction log location (which again
can be ignored). After recording the ending location, the current
transaction log insertion
point is automatically advanced to the next transaction log file, so that the
ending transaction log file can be archived immediately to complete the backup.
</para>
Ah, gotcha. Clearly I wasn't paying attention properly.PFA a better one (I think), also with the rename and added comments that David was asking for.This version looks good. Apart from review, I have done some tests (including regression test), everything seems to be fine.
On Tue, Apr 05, 2016 at 08:15:16PM +0200, Magnus Hagander wrote: > I've pushed this version, and also added the item from the Brussels > developer meeting to actually rewrite the main backup docs to the open > items so they are definitely not forgotten for 9.6. Here's that PostgreSQL 9.6 open item: * Update of backup documentation (assigned to Bruce at [https://wiki.postgresql.org/wiki/FOSDEM/PGDay_2016_Developer_MeetingBrussels Developer Meeting], but others are surely allowedto work on it as well) ** Made required by 7117685461af50f50c03f43e6a622284c8d54694 since the current documentation is now incorrect Unless Bruce endorsed this development strategy, I think it unfair for commit 7117685 to impose a deadline on his backup.sgml project. Did commit 7117685 indeed make the documentation "incorrect?" Coverage in the Backup and Restore chapter would be a good thing, but I don't think that gap alone makes 7117685, with its reference-only documentation, incorrect. Did 7117685 impair documentation in any other respect? Incidentally, I'm not clear on the extent of Bruce's plans to change backup documentation. Relevant meeting note fragment: Magnus: We need a new robust API fornon-exclusive backups Simon: Keep but deprecate the existing API. Need to find a betterway to ensure users have the required xlog in backups Craig: Our docs are in the wrong order. pg_basebackup shouldbe first, ahead of manual methods. Action point: Re-arrange backup docs page (Bruce) The chapter already does describe pg_basebackup before describing pg_start_backup; what else did the plan entail? Thanks, nm
The chapter already does describe pg_basebackup before describing
pg_start_backup; what else did the plan entail?
On Tue, Apr 05, 2016 at 08:15:16PM +0200, Magnus Hagander wrote:
> I've pushed this version, and also added the item from the Brussels
> developer meeting to actually rewrite the main backup docs to the open
> items so they are definitely not forgotten for 9.6.
Here's that PostgreSQL 9.6 open item:
* Update of backup documentation (assigned to Bruce at [https://wiki.postgresql.org/wiki/FOSDEM/PGDay_2016_Developer_Meeting Brussels Developer Meeting], but others are surely allowed to work on it as well)
** Made required by 7117685461af50f50c03f43e6a622284c8d54694 since the current documentation is now incorrect
Unless Bruce endorsed this development strategy, I think it unfair for commit
7117685 to impose a deadline on his backup.sgml project. Did commit 7117685
indeed make the documentation "incorrect?" Coverage in the Backup and Restore
chapter would be a good thing, but I don't think that gap alone makes 7117685,
with its reference-only documentation, incorrect. Did 7117685 impair
documentation in any other respect?
Incidentally, I'm not clear on the extent of Bruce's plans to change backup
documentation. Relevant meeting note fragment:
Magnus: We need a new robust API fornon-exclusive backups
Simon: Keep but deprecate the existing API.
Need to find a better way to ensure users have the required xlog in backups
Craig: Our docs are in the wrong order. pg_basebackup should be first, ahead of manual methods.
Action point: Re-arrange backup docs page (Bruce)
The chapter already does describe pg_basebackup before describing
pg_start_backup; what else did the plan entail?
On Wed, Apr 06, 2016 at 09:17:22AM +0200, Magnus Hagander wrote: > On Wed, Apr 6, 2016 at 6:42 AM, Noah Misch <noah@leadboat.com> wrote: > > On Tue, Apr 05, 2016 at 08:15:16PM +0200, Magnus Hagander wrote: > > > I've pushed this version, and also added the item from the Brussels > > > developer meeting to actually rewrite the main backup docs to the open > > > items so they are definitely not forgotten for 9.6. > > > > Here's that PostgreSQL 9.6 open item: > > > > * Update of backup documentation (assigned to Bruce at [ > > https://wiki.postgresql.org/wiki/FOSDEM/PGDay_2016_Developer_Meeting > > Brussels Developer Meeting], but others are surely allowed to work on it as > > well) > > ** Made required by 7117685461af50f50c03f43e6a622284c8d54694 since the > > current documentation is now incorrect > > > > Unless Bruce endorsed this development strategy, I think it unfair for > > commit > > 7117685 to impose a deadline on his backup.sgml project. Did commit > > 7117685 > > > > I agree that's a horrible wording. What it meant to imply was a deadline > for *me* to help take that off Bruce's shoulders before then while also > opening the doors for others to work on it should they want. Re-reading it > now I can see that's not at all what it says. I'll reword (or rather, > remove most of that, since it's been discussed separately and doesn't > actually need to go on the open items list which is a list and not a > discussion) Ahh. Your original wording did admit your interpretation; I just didn't think of that interpretation. > > The chapter already does describe pg_basebackup before describing > > pg_start_backup; what else did the plan entail? > Recommending people to primarily use tried and tested ways of doing backups > (including a reference to a list, probably on the wiki, of known external > tools that "do things the right way", such as backrest or barman) rather > than focusing on examples that aren't necessarily even correct, and > certainly not complete. > > Mentioning the actual problems of exclusive base backups. (And obviously > talk about how using pg_start_backup/pg_stop_backup now makes it possible > to do "low level" backups without the risks of exclusive backups -- which > is the "incomplete" part from my note. > > More clearly outlining backup vs dump, and possibly (I can't actually > remember if we decided the second step) put base backups before pg_dump > since the topic is backups. Unless you especially want to self-impose the same tight resolution schedule that 9.6 regressions experience, let's move this to the "Non-bugs" section. Which do you prefer? I don't think the opportunity for more documentation in light of 7117685 constitutes a regression, and I don't want "Open Issues" to double as a parking lot for slow-moving non-regressions.
On Wed, Apr 06, 2016 at 09:17:22AM +0200, Magnus Hagander wrote:
> On Wed, Apr 6, 2016 at 6:42 AM, Noah Misch <noah@leadboat.com> wrote:
> > On Tue, Apr 05, 2016 at 08:15:16PM +0200, Magnus Hagander wrote:
> > > I've pushed this version, and also added the item from the Brussels
> > > developer meeting to actually rewrite the main backup docs to the open
> > > items so they are definitely not forgotten for 9.6.
> >
> > Here's that PostgreSQL 9.6 open item:
> >
> > * Update of backup documentation (assigned to Bruce at [
> > https://wiki.postgresql.org/wiki/FOSDEM/PGDay_2016_Developer_Meeting
> > Brussels Developer Meeting], but others are surely allowed to work on it as
> > well)
> > ** Made required by 7117685461af50f50c03f43e6a622284c8d54694 since the
> > current documentation is now incorrect
> >
> > Unless Bruce endorsed this development strategy, I think it unfair for
> > commit
> > 7117685 to impose a deadline on his backup.sgml project. Did commit
> > 7117685
> >
>
> I agree that's a horrible wording. What it meant to imply was a deadline
> for *me* to help take that off Bruce's shoulders before then while also
> opening the doors for others to work on it should they want. Re-reading it
> now I can see that's not at all what it says. I'll reword (or rather,
> remove most of that, since it's been discussed separately and doesn't
> actually need to go on the open items list which is a list and not a
> discussion)
Ahh. Your original wording did admit your interpretation; I just didn't think
of that interpretation.
> > The chapter already does describe pg_basebackup before describing
> > pg_start_backup; what else did the plan entail?
> Recommending people to primarily use tried and tested ways of doing backups
> (including a reference to a list, probably on the wiki, of known external
> tools that "do things the right way", such as backrest or barman) rather
> than focusing on examples that aren't necessarily even correct, and
> certainly not complete.
>
> Mentioning the actual problems of exclusive base backups. (And obviously
> talk about how using pg_start_backup/pg_stop_backup now makes it possible
> to do "low level" backups without the risks of exclusive backups -- which
> is the "incomplete" part from my note.
>
> More clearly outlining backup vs dump, and possibly (I can't actually
> remember if we decided the second step) put base backups before pg_dump
> since the topic is backups.
Unless you especially want to self-impose the same tight resolution schedule
that 9.6 regressions experience, let's move this to the "Non-bugs" section.
Which do you prefer? I don't think the opportunity for more documentation in
light of 7117685 constitutes a regression, and I don't want "Open Issues" to
double as a parking lot for slow-moving non-regressions.
Well, if we *don't* do the rewrite before we release it, then we have to instead put information about the new version of the functions into the old structure I think.
On Mon, Apr 11, 2016 at 5:22 AM, Magnus Hagander <magnus@hagander.net> wrote: > Well, if we *don't* do the rewrite before we release it, then we have to > instead put information about the new version of the functions into the old > structure I think. I think that you should have done that in the patch as committed. Maybe you could take an hour and go do that now, and then you can do the rewrite when you get to it. Tracking open issues and getting them resolved is a lot of work, so it kind of frustrates me that you committed this patch knowing that it was going to create one when, with only slightly more work, you could have avoided that. Perhaps that is rigid and intolerant of me, but if I had done that for even 25% of the patches I committed this CommitFest, the open-issues list would be a bloodbath right now. I apologize if this sounds harsh. I don't mean to be a jerk. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Apr 11, 2016 at 11:22:27AM +0200, Magnus Hagander wrote: > On Thu, Apr 7, 2016 at 5:15 AM, Noah Misch <noah@leadboat.com> wrote: > > Unless you especially want to self-impose the same tight resolution > > schedule > > that 9.6 regressions experience, let's move this to the "Non-bugs" section. > > Which do you prefer? I don't think the opportunity for more documentation > > in > > light of 7117685 constitutes a regression, and I don't want "Open Issues" > > to > > double as a parking lot for slow-moving non-regressions. > > > > Well, if we *don't* do the rewrite before we release it, then we have to > instead put information about the new version of the functions into the old > structure I think. > > So I think it's an open issue. Works for me... [This is a generic notification.] The above-described topic is currently a PostgreSQL 9.6 open item. Magnus, since you committed the patch believed to have created it, you own this open item. If that responsibility lies elsewhere, please let us know whose responsibility it is to fix this. Since new open items may be discovered at any time and I want to plan to have them all fixed well in advance of the ship date, I will appreciate your efforts toward speedy resolution. Please present, within 72 hours, a plan to fix the defect within seven days of this message. Thanks. > But maybe we should have a separate section > on the open items list for documentation issues? I tihnk we've had that > some times before. Maybe. If the list were starting to get crowded with doc-only items, that would certainly have benefits.
On Mon, Apr 11, 2016 at 11:22:27AM +0200, Magnus Hagander wrote:
> On Thu, Apr 7, 2016 at 5:15 AM, Noah Misch <noah@leadboat.com> wrote:
> > Unless you especially want to self-impose the same tight resolution
> > schedule
> > that 9.6 regressions experience, let's move this to the "Non-bugs" section.
> > Which do you prefer? I don't think the opportunity for more documentation
> > in
> > light of 7117685 constitutes a regression, and I don't want "Open Issues"
> > to
> > double as a parking lot for slow-moving non-regressions.
> >
>
> Well, if we *don't* do the rewrite before we release it, then we have to
> instead put information about the new version of the functions into the old
> structure I think.
>
> So I think it's an open issue.
Works for me...
[This is a generic notification.]
The above-described topic is currently a PostgreSQL 9.6 open item. Magnus,
since you committed the patch believed to have created it, you own this open
item. If that responsibility lies elsewhere, please let us know whose
responsibility it is to fix this. Since new open items may be discovered at
any time and I want to plan to have them all fixed well in advance of the ship
date, I will appreciate your efforts toward speedy resolution. Please
present, within 72 hours, a plan to fix the defect within seven days of this
message. Thanks.
On Tue, Apr 12, 2016 at 10:08:23PM +0200, Magnus Hagander wrote: > On Tue, Apr 12, 2016 at 8:39 AM, Noah Misch <noah@leadboat.com> wrote: > > On Mon, Apr 11, 2016 at 11:22:27AM +0200, Magnus Hagander wrote: > > > Well, if we *don't* do the rewrite before we release it, then we have to > > > instead put information about the new version of the functions into the > > old > > > structure I think. > > > > > > So I think it's an open issue. > > > > Works for me... > > > > [This is a generic notification.] > > > > The above-described topic is currently a PostgreSQL 9.6 open item. Magnus, > > since you committed the patch believed to have created it, you own this > > open > > item. If that responsibility lies elsewhere, please let us know whose > > responsibility it is to fix this. Since new open items may be discovered > > at > > any time and I want to plan to have them all fixed well in advance of the > > ship > > date, I will appreciate your efforts toward speedy resolution. Please > > present, within 72 hours, a plan to fix the defect within seven days of > > this > > message. Thanks. > > > > I won't have time to do the bigger rewrite/reordeirng by then, but I can > certainly commit to having the smaller updates done to cover the new > functionality in less than a week. If nothing else, that'll be something > for me to do on the flight over to pgconf.us. Thanks for that plan; it sounds good.
On Tue, Apr 12, 2016 at 10:08:23PM +0200, Magnus Hagander wrote:
> On Tue, Apr 12, 2016 at 8:39 AM, Noah Misch <noah@leadboat.com> wrote:
> > On Mon, Apr 11, 2016 at 11:22:27AM +0200, Magnus Hagander wrote:
> > > Well, if we *don't* do the rewrite before we release it, then we have to
> > > instead put information about the new version of the functions into the
> > old
> > > structure I think.
> > >
> > > So I think it's an open issue.
> >
> > Works for me...
> >
> > [This is a generic notification.]
> >
> > The above-described topic is currently a PostgreSQL 9.6 open item. Magnus,
> > since you committed the patch believed to have created it, you own this
> > open
> > item. If that responsibility lies elsewhere, please let us know whose
> > responsibility it is to fix this. Since new open items may be discovered
> > at
> > any time and I want to plan to have them all fixed well in advance of the
> > ship
> > date, I will appreciate your efforts toward speedy resolution. Please
> > present, within 72 hours, a plan to fix the defect within seven days of
> > this
> > message. Thanks.
> >
>
> I won't have time to do the bigger rewrite/reordeirng by then, but I can
> certainly commit to having the smaller updates done to cover the new
> functionality in less than a week. If nothing else, that'll be something
> for me to do on the flight over to pgconf.us.
Thanks for that plan; it sounds good.
Here's a suggested patch.
Attachment
On Sat, Apr 16, 2016 at 06:22:47PM +0200, Magnus Hagander wrote: > > On Tue, Apr 12, 2016 at 10:08:23PM +0200, Magnus Hagander wrote: > > > I won't have time to do the bigger rewrite/reordeirng by then, but I can > > > certainly commit to having the smaller updates done to cover the new > > > functionality in less than a week. > There is some duplication between the non-exclusive and exclusive backup > sections, but I wanted to make sure that each set of instructions can just > be followed top-to-bottom. > > I've also removed some tips that aren't really necessary as part of the > step-by-step instructions in order to keep things from exploding in size. > > Finally, I've changed references to "backup dump" to just be "backup", > because it's confusing to call them something with dumps in when it's not > pg_dump. Enough that I got partially confused myself while editing... > > Comments? I scanned this briefly, and it looks reasonable. I recommend committing it forthwith. > *** a/doc/src/sgml/backup.sgml > --- b/doc/src/sgml/backup.sgml > *************** > *** 818,823 **** test ! -f /mnt/server/archivedir/00000001000000A900000065 && cp pg_xlog/ > --- 818,838 ---- > simple. It is very important that these steps are executed in > sequence, and that the success of a step is verified before > proceeding to the next step. > + </para> > + <para> > + Low level base backups can be made in a non-exclusive or an exclusive > + way. The non-exclusive method is recommended and the exclusive one will > + at some point be deprecated and removed. "I will deprecate X at some point" has the same effect as "I deprecate X now." If you have no doubt you want to deprecate it, I advise plainer phrasing like, "The exclusive method is deprecated and will eventually be removed." That is to say, just deprecate it right now. If you have doubts, omit deprecation.
On Sun, Apr 17, 2016 at 1:22 AM, Magnus Hagander <magnus@hagander.net> wrote: > On Wed, Apr 13, 2016 at 4:07 AM, Noah Misch <noah@leadboat.com> wrote: >> >> On Tue, Apr 12, 2016 at 10:08:23PM +0200, Magnus Hagander wrote: >> > On Tue, Apr 12, 2016 at 8:39 AM, Noah Misch <noah@leadboat.com> wrote: >> > > On Mon, Apr 11, 2016 at 11:22:27AM +0200, Magnus Hagander wrote: >> > > > Well, if we *don't* do the rewrite before we release it, then we >> > > > have to >> > > > instead put information about the new version of the functions into >> > > > the >> > > old >> > > > structure I think. >> > > > >> > > > So I think it's an open issue. >> > > >> > > Works for me... >> > > >> > > [This is a generic notification.] >> > > >> > > The above-described topic is currently a PostgreSQL 9.6 open item. >> > > Magnus, >> > > since you committed the patch believed to have created it, you own >> > > this >> > > open >> > > item. If that responsibility lies elsewhere, please let us know whose >> > > responsibility it is to fix this. Since new open items may be >> > > discovered >> > > at >> > > any time and I want to plan to have them all fixed well in advance of >> > > the >> > > ship >> > > date, I will appreciate your efforts toward speedy resolution. Please >> > > present, within 72 hours, a plan to fix the defect within seven days >> > > of >> > > this >> > > message. Thanks. >> > > >> > >> > I won't have time to do the bigger rewrite/reordeirng by then, but I can >> > certainly commit to having the smaller updates done to cover the new >> > functionality in less than a week. If nothing else, that'll be something >> > for me to do on the flight over to pgconf.us. >> >> Thanks for that plan; it sounds good. > > > Here's a suggested patch. > > There is some duplication between the non-exclusive and exclusive backup > sections, but I wanted to make sure that each set of instructions can just > be followed top-to-bottom. > > I've also removed some tips that aren't really necessary as part of the > step-by-step instructions in order to keep things from exploding in size. > > Finally, I've changed references to "backup dump" to just be "backup", > because it's confusing to call them something with dumps in when it's not > pg_dump. Enough that I got partially confused myself while editing... > > Comments? + Low level base backups can be made in a non-exclusive or an exclusive + way. The non-exclusive method is recommended and the exclusive one will + at some point be deprecated and removed. I don't object to add a non-exclusive mode of low level backup, but I disagree to mark an exclusive backup as deprecated at least until we can alleviate some pains that a non-exclusive mode causes. One example of the pain, in a non-exclusive backup, we need to keep the IDLE connection which was used to execute pg_start_backup(), until the end of backup. Of course a backup can take a very long time. In this case the IDLE connection also needs to remain for such a long time. If it's accidentally terminated (e.g., because of IDLE connection), the backup fails and needs to be taken again from the beginning. Another pain in a non-exclusive backup is to have to execute both pg_start_backup() and pg_stop_backup() on the same connection. Please imagine the case where psql is used to execute those two backup functions (I believe that there are many users who do this). For example, psql -c "SELECT pg_start_backup()" rsync, cp, tar, storage backup, or something psql -c "SELECT pg_stop_backup()" A non-exclusive backup breaks the above very simple steps because two backup functions are executed on different connections. So, how should we modify the steps for a non-exclusive backup? Basically we need to pause psql after pg_start_backup(), signal it to resume after the copy of database cluster is taken, and make it execute pg_stop_backup(). I'm afraid that the backup script will be complicated because of this pain of non-exclusive backup. + The <function>pg_stop_backup</> will return one row with three + values. The second of these fields should be written to a file named + <filename>backup_label</> in the root directory of the backup. The + third field should be written to a file named + <filename>tablespace_map</> unless the field is empty. How should we write those two values to different files when we execute pg_stop_backup() via psql? Whole output of pg_stop_backup() should be written to a transient file and it should be filtered and written to different two files by using some Linux commands? This also seems to make the backup script more complicated. Regards, -- Fujii Masao
+ Low level base backups can be made in a non-exclusive or an exclusiveOn Sun, Apr 17, 2016 at 1:22 AM, Magnus Hagander <magnus@hagander.net> wrote:
> On Wed, Apr 13, 2016 at 4:07 AM, Noah Misch <noah@leadboat.com> wrote:
>>
>> On Tue, Apr 12, 2016 at 10:08:23PM +0200, Magnus Hagander wrote:
>> > On Tue, Apr 12, 2016 at 8:39 AM, Noah Misch <noah@leadboat.com> wrote:
>> > > On Mon, Apr 11, 2016 at 11:22:27AM +0200, Magnus Hagander wrote:
>> > > > Well, if we *don't* do the rewrite before we release it, then we
>> > > > have to
>> > > > instead put information about the new version of the functions into
>> > > > the
>> > > old
>> > > > structure I think.
>> > > >
>> > > > So I think it's an open issue.
>> > >
>> > > Works for me...
>> > >
>> > > [This is a generic notification.]
>> > >
>> > > The above-described topic is currently a PostgreSQL 9.6 open item.
>> > > Magnus,
>> > > since you committed the patch believed to have created it, you own
>> > > this
>> > > open
>> > > item. If that responsibility lies elsewhere, please let us know whose
>> > > responsibility it is to fix this. Since new open items may be
>> > > discovered
>> > > at
>> > > any time and I want to plan to have them all fixed well in advance of
>> > > the
>> > > ship
>> > > date, I will appreciate your efforts toward speedy resolution. Please
>> > > present, within 72 hours, a plan to fix the defect within seven days
>> > > of
>> > > this
>> > > message. Thanks.
>> > >
>> >
>> > I won't have time to do the bigger rewrite/reordeirng by then, but I can
>> > certainly commit to having the smaller updates done to cover the new
>> > functionality in less than a week. If nothing else, that'll be something
>> > for me to do on the flight over to pgconf.us.
>>
>> Thanks for that plan; it sounds good.
>
>
> Here's a suggested patch.
>
> There is some duplication between the non-exclusive and exclusive backup
> sections, but I wanted to make sure that each set of instructions can just
> be followed top-to-bottom.
>
> I've also removed some tips that aren't really necessary as part of the
> step-by-step instructions in order to keep things from exploding in size.
>
> Finally, I've changed references to "backup dump" to just be "backup",
> because it's confusing to call them something with dumps in when it's not
> pg_dump. Enough that I got partially confused myself while editing...
>
> Comments?
+ way. The non-exclusive method is recommended and the exclusive one will
+ at some point be deprecated and removed.
I don't object to add a non-exclusive mode of low level backup,
but I disagree to mark an exclusive backup as deprecated at least
until we can alleviate some pains that a non-exclusive mode causes.
One example of the pain, in a non-exclusive backup, we need to keep
the IDLE connection which was used to execute pg_start_backup(),
until the end of backup. Of course a backup can take a very
long time. In this case the IDLE connection also needs to remain
for such a long time. If it's accidentally terminated (e.g., because
of IDLE connection), the backup fails and needs to be taken again
from the beginning.
Another pain in a non-exclusive backup is to have to execute both
pg_start_backup() and pg_stop_backup() on the same connection.
Please imagine the case where psql is used to execute those two
backup functions (I believe that there are many users who do this).
For example,
psql -c "SELECT pg_start_backup()"
rsync, cp, tar, storage backup, or something
psql -c "SELECT pg_stop_backup()"
A non-exclusive backup breaks the above very simple steps because
two backup functions are executed on different connections.
So, how should we modify the steps for a non-exclusive backup?
Basically we need to pause psql after pg_start_backup(), signal it
to resume after the copy of database cluster is taken, and make
it execute pg_stop_backup(). I'm afraid that the backup script
will be complicated because of this pain of non-exclusive backup.
+ The <function>pg_stop_backup</> will return one row with three
+ values. The second of these fields should be written to a file named
+ <filename>backup_label</> in the root directory of the backup. The
+ third field should be written to a file named
+ <filename>tablespace_map</> unless the field is empty.
How should we write those two values to different files when
we execute pg_stop_backup() via psql? Whole output of
pg_stop_backup() should be written to a transient file and
it should be filtered and written to different two files by
using some Linux commands? This also seems to make the backup
script more complicated.
On Wed, Apr 20, 2016 at 1:32 PM, Magnus Hagander <magnus@hagander.net> wrote: > Note that we have not marked them as deprecated. We're just giving warnings > that they will be deprecated. But I think that is being said here is that maybe they won't be deprecated, at least not any time soon. And therefore maybe we shouldn't say so. Frankly, I think that's right. It is one thing to say that the new method is preferred - +1 for that. But the old method is going to continue to be used by many people for a long time, and in some cases will be superior. That's not something we can deprecate, unless I'm misunderstanding the situation. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Apr 22, 2016 at 11:53:46AM -0400, Robert Haas wrote: > On Wed, Apr 20, 2016 at 1:32 PM, Magnus Hagander <magnus@hagander.net> wrote: > > Note that we have not marked them as deprecated. We're just giving warnings > > that they will be deprecated. > > But I think that is being said here is that maybe they won't be > deprecated, at least not any time soon. And therefore maybe we > shouldn't say so. > > Frankly, I think that's right. It is one thing to say that the new > method is preferred - +1 for that. But the old method is going to > continue to be used by many people for a long time, and in some cases > will be superior. That's not something we can deprecate, unless I'm > misunderstanding the situation. I agree with Robert. One the one hand we are saying pg_stop_backup() doesn't work well in psql because you get those two file contents output that you have to write, and on the other hand we are saying we are going to deprecate the method that does work well in psql? I must be missing something too, as that makes no sense. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
On Fri, Apr 22, 2016 at 11:53:46AM -0400, Robert Haas wrote:
> On Wed, Apr 20, 2016 at 1:32 PM, Magnus Hagander <magnus@hagander.net> wrote:
> > Note that we have not marked them as deprecated. We're just giving warnings
> > that they will be deprecated.
>
> But I think that is being said here is that maybe they won't be
> deprecated, at least not any time soon. And therefore maybe we
> shouldn't say so.
>
> Frankly, I think that's right. It is one thing to say that the new
> method is preferred - +1 for that. But the old method is going to
> continue to be used by many people for a long time, and in some cases
> will be superior. That's not something we can deprecate, unless I'm
> misunderstanding the situation.
I agree with Robert. One the one hand we are saying pg_stop_backup()
doesn't work well in psql because you get those two file contents output
that you have to write, and on the other hand we are saying we are going
to deprecate the method that does work well in psql? I must be missing
something too, as that makes no sense.
* Magnus Hagander (magnus@hagander.net) wrote: > On Sun, Apr 24, 2016 at 5:37 AM, Bruce Momjian <bruce@momjian.us> wrote: > > > On Fri, Apr 22, 2016 at 11:53:46AM -0400, Robert Haas wrote: > > > On Wed, Apr 20, 2016 at 1:32 PM, Magnus Hagander <magnus@hagander.net> > > wrote: > > > > Note that we have not marked them as deprecated. We're just giving > > warnings > > > > that they will be deprecated. > > > > > > But I think that is being said here is that maybe they won't be > > > deprecated, at least not any time soon. And therefore maybe we > > > shouldn't say so. > > > > > > Frankly, I think that's right. It is one thing to say that the new > > > method is preferred - +1 for that. But the old method is going to > > > continue to be used by many people for a long time, and in some cases > > > will be superior. That's not something we can deprecate, unless I'm > > > misunderstanding the situation. > > > > I agree with Robert. One the one hand we are saying pg_stop_backup() > > doesn't work well in psql because you get those two file contents output > > that you have to write, and on the other hand we are saying we are going > > to deprecate the method that does work well in psql? I must be missing > > something too, as that makes no sense. > > I don't agree. I don't see how "making a backup using psql" is more > important than "making a backup without potentially dangerous sideeffects". > But if others don't agree, could one of you at least provide an example of > how you'd like the docs to read in a way that doesn't deprecate the unsafe > way but still informs the user properly? I'm with Magnus on this, primairly because I've come to understand just how dangerous the old backup method is. That method *should* be deprecated and discouraged. A backup method which means your database doesn't restart properly if the system crashes during the backup is *bad*. Fixing that means using something more complicated than the old method and that's a bit of a pain in psql, but that doesn't mean we should tell people that the old method is an acceptable approach. Perhaps we can look at improving psql to make it easier to use it for the new backup method but, honestly, all these hackish scripts to do backups aren't doing a lot of things that a real backup solution needs to do. Improving psql for this is a bit like new paint on the titanic. Thanks! Stephen
Fixing that means using something more complicated than the old method
and that's a bit of a pain in psql, but that doesn't mean we should tell
people that the old method is an acceptable approach.
On Thu, Apr 21, 2016 at 2:32 AM, Magnus Hagander <magnus@hagander.net> wrote: > On Wed, Apr 20, 2016 at 1:12 AM, Fujii Masao <masao.fujii@gmail.com> wrote: >> >> On Sun, Apr 17, 2016 at 1:22 AM, Magnus Hagander <magnus@hagander.net> >> wrote: >> > On Wed, Apr 13, 2016 at 4:07 AM, Noah Misch <noah@leadboat.com> wrote: >> >> >> >> On Tue, Apr 12, 2016 at 10:08:23PM +0200, Magnus Hagander wrote: >> >> > On Tue, Apr 12, 2016 at 8:39 AM, Noah Misch <noah@leadboat.com> >> >> > wrote: >> >> > > On Mon, Apr 11, 2016 at 11:22:27AM +0200, Magnus Hagander wrote: >> >> > > > Well, if we *don't* do the rewrite before we release it, then we >> >> > > > have to >> >> > > > instead put information about the new version of the functions >> >> > > > into >> >> > > > the >> >> > > old >> >> > > > structure I think. >> >> > > > >> >> > > > So I think it's an open issue. >> >> > > >> >> > > Works for me... >> >> > > >> >> > > [This is a generic notification.] >> >> > > >> >> > > The above-described topic is currently a PostgreSQL 9.6 open item. >> >> > > Magnus, >> >> > > since you committed the patch believed to have created it, you own >> >> > > this >> >> > > open >> >> > > item. If that responsibility lies elsewhere, please let us know >> >> > > whose >> >> > > responsibility it is to fix this. Since new open items may be >> >> > > discovered >> >> > > at >> >> > > any time and I want to plan to have them all fixed well in advance >> >> > > of >> >> > > the >> >> > > ship >> >> > > date, I will appreciate your efforts toward speedy resolution. >> >> > > Please >> >> > > present, within 72 hours, a plan to fix the defect within seven >> >> > > days >> >> > > of >> >> > > this >> >> > > message. Thanks. >> >> > > >> >> > >> >> > I won't have time to do the bigger rewrite/reordeirng by then, but I >> >> > can >> >> > certainly commit to having the smaller updates done to cover the new >> >> > functionality in less than a week. If nothing else, that'll be >> >> > something >> >> > for me to do on the flight over to pgconf.us. >> >> >> >> Thanks for that plan; it sounds good. >> > >> > >> > Here's a suggested patch. >> > >> > There is some duplication between the non-exclusive and exclusive backup >> > sections, but I wanted to make sure that each set of instructions can >> > just >> > be followed top-to-bottom. >> > >> > I've also removed some tips that aren't really necessary as part of the >> > step-by-step instructions in order to keep things from exploding in >> > size. >> > >> > Finally, I've changed references to "backup dump" to just be "backup", >> > because it's confusing to call them something with dumps in when it's >> > not >> > pg_dump. Enough that I got partially confused myself while editing... >> > >> > Comments? >> >> + Low level base backups can be made in a non-exclusive or an exclusive >> + way. The non-exclusive method is recommended and the exclusive one >> will >> + at some point be deprecated and removed. >> >> I don't object to add a non-exclusive mode of low level backup, >> but I disagree to mark an exclusive backup as deprecated at least >> until we can alleviate some pains that a non-exclusive mode causes. > > > Note that we have not marked them as deprecated. We're just giving warnings > that they will be deprecated. > >> >> >> One example of the pain, in a non-exclusive backup, we need to keep >> the IDLE connection which was used to execute pg_start_backup(), >> until the end of backup. Of course a backup can take a very >> long time. In this case the IDLE connection also needs to remain >> for such a long time. If it's accidentally terminated (e.g., because >> of IDLE connection), the backup fails and needs to be taken again >> from the beginning. > > > > Yes, it's a bit annoying. But it's something you can handle. Unlike the > problems that exist with an exclusive backup which you *cannot* handle from > the backup script/commands. I know many systems which are currently running well in production and handling that problem of an exclusive backup. For example, they use Pacemaker/Corosync to shared-disk HA configuration. PostgreSQL resource agent for Pacemaker removes backup_label in the case of failover to prevent the standby server from failing to recover from the crash. This is not so neat solution, but they could live with the problem for a long time, so far. I'm NOT against migrating their backup scripts so that new method is used, at the end. On the other hand, I think that we don't need to warn and rush them to do that at least until new method will be polished. >> Another pain in a non-exclusive backup is to have to execute both >> pg_start_backup() and pg_stop_backup() on the same connection. > > > Pretty sure that's the same one you just listed? Yes, the sources of those pains are the same. I wonder if it's better to store the backup state information in shared memory, catalog, flat file or something instead of local memory so that we can execute pg_stop_backup() in different session from that pg_start_backup() is called. Maybe I'm missing something basic, but why do those functions need to be called in the same session at all? >> Please imagine the case where psql is used to execute those two >> backup functions (I believe that there are many users who do this). >> For example, >> >> psql -c "SELECT pg_start_backup()" >> rsync, cp, tar, storage backup, or something >> psql -c "SELECT pg_stop_backup()" >> >> A non-exclusive backup breaks the above very simple steps because >> two backup functions are executed on different connections. >> >> So, how should we modify the steps for a non-exclusive backup? > > > You should at least not use psql in that way. You need to open psql in a > pipe and drive it through that. Or use a more appropriate client. > > >> Basically we need to pause psql after pg_start_backup(), signal it >> to resume after the copy of database cluster is taken, and make >> it execute pg_stop_backup(). I'm afraid that the backup script >> will be complicated because of this pain of non-exclusive backup. > > > Yes, if you insist on using psql - which is not a good tool for this - then > yes. But you could for example make it a psql script that does something > along > SELECT pg_start_backup(); > \! rsync ... > SELECT pg_stop_backup() > > Or as said above, drive it through a pipe instead. Or use an appropriate > client library. So, what's the appropriate client library for new backup method? It's better to document that. > > The bottom line is yes, it makes it a bit worse. But it is something you can > handle quite simply from your client. And wen you do, it will be *right*. > > You have no way at all to fix the problem if your server crashes during an > exclusive backup, for example. No matter how you connect to the database, > because that's not where that problem is. > > >> >> >> + The <function>pg_stop_backup</> will return one row with three >> + values. The second of these fields should be written to a file named >> + <filename>backup_label</> in the root directory of the backup. The >> + third field should be written to a file named >> + <filename>tablespace_map</> unless the field is empty. >> >> How should we write those two values to different files when >> we execute pg_stop_backup() via psql? Whole output of >> pg_stop_backup() should be written to a transient file and >> it should be filtered and written to different two files by >> using some Linux commands? This also seems to make the backup >> script more complicated. > > > Yes, you would have to do that. And yes, psql is not a good tool for this. > So the solution is to not force yourself to use a tool that doesn't work > well for this problem. So it's better to document a tool which can work well or how to use psql for new backup method (i.e., how to create backup_label and tablespace_map files from the output of psql, of course it's also better to write the example). Regards, -- Fujii Masao
I know many systems which are currently running well in production andOn Thu, Apr 21, 2016 at 2:32 AM, Magnus Hagander <magnus@hagander.net> wrote:
> On Wed, Apr 20, 2016 at 1:12 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>
>> On Sun, Apr 17, 2016 at 1:22 AM, Magnus Hagander <magnus@hagander.net>
>> wrote:
>> > On Wed, Apr 13, 2016 at 4:07 AM, Noah Misch <noah@leadboat.com> wrote:
>> >>
>> >> On Tue, Apr 12, 2016 at 10:08:23PM +0200, Magnus Hagander wrote:
>> >> > On Tue, Apr 12, 2016 at 8:39 AM, Noah Misch <noah@leadboat.com>
>> >> > wrote:
>> >> > > On Mon, Apr 11, 2016 at 11:22:27AM +0200, Magnus Hagander wrote:
> Yes, it's a bit annoying. But it's something you can handle. Unlike the
> problems that exist with an exclusive backup which you *cannot* handle from
> the backup script/commands.
handling that problem of an exclusive backup. For example, they use
Pacemaker/Corosync to shared-disk HA configuration. PostgreSQL resource
agent for Pacemaker removes backup_label in the case of failover to
prevent the standby server from failing to recover from the crash.
This is not so neat solution, but they could live with the problem for
a long time, so far.
I'm NOT against migrating their backup scripts so that new method is used,
at the end. On the other hand, I think that we don't need to warn and rush
them to do that at least until new method will be polished.
>> Another pain in a non-exclusive backup is to have to execute both
>> pg_start_backup() and pg_stop_backup() on the same connection.
>
>
> Pretty sure that's the same one you just listed?
Yes, the sources of those pains are the same.
I wonder if it's better to store the backup state information in shared
memory, catalog, flat file or something instead of local memory
so that we can execute pg_stop_backup() in different session from that
pg_start_backup() is called. Maybe I'm missing something basic, but
why do those functions need to be called in the same session at all?
> Yes, if you insist on using psql - which is not a good tool for this - then
> yes. But you could for example make it a psql script that does something
> along
> SELECT pg_start_backup();
> \! rsync ...
> SELECT pg_stop_backup()
>
> Or as said above, drive it through a pipe instead. Or use an appropriate
> client library.
So, what's the appropriate client library for new backup method?
It's better to document that.
>> How should we write those two values to different files when
>> we execute pg_stop_backup() via psql? Whole output of
>> pg_stop_backup() should be written to a transient file and
>> it should be filtered and written to different two files by
>> using some Linux commands? This also seems to make the backup
>> script more complicated.
>
>
> Yes, you would have to do that. And yes, psql is not a good tool for this.
> So the solution is to not force yourself to use a tool that doesn't work
> well for this problem.
So it's better to document a tool which can work well or how to use psql
for new backup method (i.e., how to create backup_label and tablespace_map
files from the output of psql, of course it's also better to write
the example).
On 4/24/16 11:49 AM, Stephen Frost wrote: > * Magnus Hagander (magnus@hagander.net) wrote: >> On Sun, Apr 24, 2016 at 5:37 AM, Bruce Momjian <bruce@momjian.us> wrote: >> >>> On Fri, Apr 22, 2016 at 11:53:46AM -0400, Robert Haas wrote: >>> >>>> Frankly, I think that's right. It is one thing to say that the new >>>> method is preferred - +1 for that. But the old method is going to >>>> continue to be used by many people for a long time, and in some cases >>>> will be superior. That's not something we can deprecate, unless I'm >>>> misunderstanding the situation. >>> >>> I agree with Robert. One the one hand we are saying pg_stop_backup() >>> doesn't work well in psql because you get those two file contents output >>> that you have to write, and on the other hand we are saying we are going >>> to deprecate the method that does work well in psql? I must be missing >>> something too, as that makes no sense. >> >> I don't agree. I don't see how "making a backup using psql" is more >> important than "making a backup without potentially dangerous sideeffects". >> But if others don't agree, could one of you at least provide an example of >> how you'd like the docs to read in a way that doesn't deprecate the unsafe >> way but still informs the user properly? > > I'm with Magnus on this, primairly because I've come to understand just > how dangerous the old backup method is. That method *should* be > deprecated and discouraged. A backup method which means your database > doesn't restart properly if the system crashes during the backup is > *bad*. +1 > Perhaps we can look at improving psql to make it easier to use it for > the new backup method but, honestly, all these hackish scripts to do > backups aren't doing a lot of things that a real backup solution needs > to do. Improving psql for this is a bit like new paint on the titanic. Personally I think we do the users a disservice by implying that backup is as simple as calling two functions and copying the files. Oh, and don't forget to include WAL! -- -David david@pgmasters.net
On Mon, Apr 25, 2016 at 9:11 AM, Fujii Masao <masao.fujii@gmail.com> wrote: > PostgreSQL resource agent for Pacemaker removes backup_label in > the case of failover to prevent the standby server from failing > to recover from the crash. Yikes! I hope that the providers of Pacemaker document that they are using a technique which can silently corrupt the cluster. http://tbeitr.blogspot.com/2015/07/deleting-backuplabel-on-restore-will.html > This is not so neat solution, but they could live with the > problem for a long time, so far. It can work sometimes. It can also produce corruption which will silently return incorrect results from queries for a long time before some other symptom of the corruption surfaces. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
I'm a bit late to the party, but I only recently noticed that the exclusive backup API is deprecated. On Sun, 2016-04-24 at 11:49 -0400, Stephen Frost wrote: > * Magnus Hagander (magnus@hagander.net) wrote: > > On Sun, Apr 24, 2016 at 5:37 AM, Bruce Momjian <bruce@momjian.us> wrote: > > > On Fri, Apr 22, 2016 at 11:53:46AM -0400, Robert Haas wrote: > > > > > > > > Frankly, I think that's right. It is one thing to say that the new > > > > method is preferred - +1 for that. But the old method is going to > > > > continue to be used by many people for a long time, and in some cases > > > > will be superior. That's not something we can deprecate, unless I'm > > > > misunderstanding the situation. > > > > > > I agree with Robert. One the one hand we are saying pg_stop_backup() > > > doesn't work well in psql because you get those two file contents output > > > that you have to write, and on the other hand we are saying we are going > > > to deprecate the method that does work well in psql? I must be missing > > > something too, as that makes no sense. > > > > I don't agree. I don't see how "making a backup using psql" is more > > important than "making a backup without potentially dangerous sideeffects". > > But if others don't agree, could one of you at least provide an example of > > how you'd like the docs to read in a way that doesn't deprecate the unsafe > > way but still informs the user properly? > > I'm with Magnus on this, primairly because I've come to understand just > how dangerous the old backup method is. That method *should* be > deprecated and discouraged. A backup method which means your database > doesn't restart properly if the system crashes during the backup is > *bad*. > > Fixing that means using something more complicated than the old method > and that's a bit of a pain in psql, but that doesn't mean we should tell > people that the old method is an acceptable approach. > > Perhaps we can look at improving psql to make it easier to use it for > the new backup method but, honestly, all these hackish scripts to do > backups aren't doing a lot of things that a real backup solution needs > to do. Improving psql for this is a bit like new paint on the titanic. I guess that the danger mentioned above is that the database may not restart if it crashes while in exclusive backup mode, because the WAL containing the starting checkpoint has already been purged. True, that is a problem. I would say the typical use case for the exclusive backup method is the following (and I have seen it often): You have some kind of backup software that does file system backups and is able to run a "pre-backup" and "post-backup" script. The backup is triggered by the backup software. Another thing that is problematic with non-exclusive backups is that you have to write the backup_label file into the backup after the backup has been taken. With a technique like the above, you cannot easily do that. So in essence, all backup methods that work like the above won't be possible any more once the exclusive backup is gone. I expect that that will make a lot of users unhappy. Would it be an option to change the semantics so that "backup_label" is ignored if there is no "recovery.conf"? Of course that opens another way to corrupt your database (by starting it from a backup without first creating "recovery.conf"), but we could add another big warning. I'd say that the people who ignore such a warning are the same people that happily remove "backup_label" when they see the following message upon starting a cluster from a backup without recovery: If you are not restoring from a backup, try removing the file "backup_label". Yours, Laurenz Albe
Greetings, * Laurenz Albe (laurenz.albe@cybertec.at) wrote: > On Sun, 2016-04-24 at 11:49 -0400, Stephen Frost wrote: > > * Magnus Hagander (magnus@hagander.net) wrote: > > > On Sun, Apr 24, 2016 at 5:37 AM, Bruce Momjian <bruce@momjian.us> wrote: > > > > On Fri, Apr 22, 2016 at 11:53:46AM -0400, Robert Haas wrote: > > > > > > > > > > Frankly, I think that's right. It is one thing to say that the new > > > > > method is preferred - +1 for that. But the old method is going to > > > > > continue to be used by many people for a long time, and in some cases > > > > > will be superior. That's not something we can deprecate, unless I'm > > > > > misunderstanding the situation. > > > > > > > > I agree with Robert. One the one hand we are saying pg_stop_backup() > > > > doesn't work well in psql because you get those two file contents output > > > > that you have to write, and on the other hand we are saying we are going > > > > to deprecate the method that does work well in psql? I must be missing > > > > something too, as that makes no sense. > > > > > > I don't agree. I don't see how "making a backup using psql" is more > > > important than "making a backup without potentially dangerous sideeffects". > > > But if others don't agree, could one of you at least provide an example of > > > how you'd like the docs to read in a way that doesn't deprecate the unsafe > > > way but still informs the user properly? > > > > I'm with Magnus on this, primairly because I've come to understand just > > how dangerous the old backup method is. That method *should* be > > deprecated and discouraged. A backup method which means your database > > doesn't restart properly if the system crashes during the backup is > > *bad*. > > > > Fixing that means using something more complicated than the old method > > and that's a bit of a pain in psql, but that doesn't mean we should tell > > people that the old method is an acceptable approach. > > > > Perhaps we can look at improving psql to make it easier to use it for > > the new backup method but, honestly, all these hackish scripts to do > > backups aren't doing a lot of things that a real backup solution needs > > to do. Improving psql for this is a bit like new paint on the titanic. > > I guess that the danger mentioned above is that the database may not > restart if it crashes while in exclusive backup mode, because the > WAL containing the starting checkpoint has already been purged. The database would think it's trying to do recovery from a backup, not doing crash recovery like it should be doing, so, yes, that's pretty ugly. > I would say the typical use case for the exclusive backup method is > the following (and I have seen it often): > > You have some kind of backup software that does file system backups > and is able to run a "pre-backup" and "post-backup" script. > The backup is triggered by the backup software. Seeing it often doesn't make it a good solution. Running just pre-backup and post-backup scripts and copying the filesystem isn't enough to perform an online PostgreSQL backup- the WAL needs to be collected as well, and you need to make sure that you have all of the WAL before the backup can be considered complete. On restore, you're going to need to create a recovery.conf (at least in released versions) which provides a restore command (needed even in HEAD today) to get the old WAL, so having to also create the backup_label file shouldn't be that difficult. Lastly, if you really want, you can extract out the data from pg_stop_backup in whatever your post-backup script is. > Another thing that is problematic with non-exclusive backups is that > you have to write the backup_label file into the backup after the > backup has been taken. With a technique like the above, you cannot > easily do that. ... why not? You need to create the recovery.conf anyway, and you need to be archiving WAL somewhere, so it certainly seems like you could put the backup_label there too. > So in essence, all backup methods that work like the above won't be > possible any more once the exclusive backup is gone. That's why it's been deprecated and not removed... but hopefully we will be able to remove it in the future as it really isn't a good API. > I expect that that will make a lot of users unhappy. If it means that they implement a better backup strategy, then it's probably a good thing, which is the goal of this. > Would it be an option to change the semantics so that "backup_label" > is ignored if there is no "recovery.conf"? Of course that opens > another way to corrupt your database (by starting it from a backup > without first creating "recovery.conf"), but we could add another > big warning. My recollection is that other approaches were considered to deal with this but none of them were all that good, though that was two years ago at this point. Certainly the issue with the above where a restored backup without a recovery.conf could unwittingly be treated as if it was crash recovery and end up with a very incorrect database as a result, without any clear indication of that being the case, is quite bad. And, really, the arguments for keeping this API around and figuring out how to 'fix' it would have to outweigh the cost of asking users to reevaluate their existing solutions and implementing ones that use the exclusive API, which is pretty clearly better. What's more, any change we make to the deprecated API has the chance of breaking existing users too. If we're going to make some kind of breaking change, let's make it in a way that eliminates as many possible pitfalls as we can, and that's basically what the non-exclusive API does. I don't see any compelling argument for trying to do something half-way any more today than I did two years ago when this was being discussed. Thanks! Stephen
Attachment
On Sun, 2018-11-25 at 13:50 -0500, Stephen Frost wrote: > I don't see any compelling argument for trying to do something half-way > any more today than I did two years ago when this was being discussed. That may well be so. It may be better to make users unhappy than to make them very unhappy... But I find the following points unconvincing: > > I would say the typical use case for the exclusive backup method is > > the following (and I have seen it often): > > > > You have some kind of backup software that does file system backups > > and is able to run a "pre-backup" and "post-backup" script. > > The backup is triggered by the backup software. > > Seeing it often doesn't make it a good solution. Running just > pre-backup and post-backup scripts and copying the filesystem isn't > enough to perform an online PostgreSQL backup- the WAL needs to be > collected as well, and you need to make sure that you have all of the > WAL before the backup can be considered complete. Yes, that's why "pg_stop_backup" has the "wait_for_archive" parameter. So this is not a problem. > On restore, you're > going to need to create a recovery.conf (at least in released versions) > which provides a restore command (needed even in HEAD today) to get the > old WAL, so having to also create the backup_label file shouldn't be > that difficult. You write "recovery.conf" upon recovery, when you have the restored backup, so you have it on a file system. No problem adding a file then. This is entirely different from adding a "backup_label" file to a backup that has been taken by a backup software in some arbitrary format in some arbitrary location (think snapshot). So these two cases don't compare. > Lastly, if you really want, you can extract out the data from > pg_stop_backup in whatever your post-backup script is. Come on, now. You usually use backup techniques like that because you can't get your large database backed up in the available time window otherwise. > > Another thing that is problematic with non-exclusive backups is that > > you have to write the backup_label file into the backup after the > > backup has been taken. With a technique like the above, you cannot > > easily do that. > > ... why not? You need to create the recovery.conf anyway, and you need > to be archiving WAL somewhere, so it certainly seems like you could put > the backup_label there too. As I said above, you don't add "recovery.conf" to the backup right away, so these two cases don't compare. > > I expect that that will make a lot of users unhappy. > > If it means that they implement a better backup strategy, then it's > probably a good thing, which is the goal of this. I thought our goal is to provide convenient backup methods... Ignoring "backup_label" on restart, as I suggested in my previous message, probably isn't such a hot idea. But what's wrong with retaining the exclusive backup method and just sticking a big "Warning: this may cause a restart to fail after a crash" on it? That sure wouldn't be unsafe. If somebody prefers a simpler backup method at the price of having to manually restart the server after a crash, what's wrong with that? Why not give them the choice? I'd say that one could write a server start script that just removes "backup_label", but I am sure someone will argue that then somebody will restore a backup and then restart the server without first recovering the database... Yours, Laurenz Albe
On Sun, 2018-11-25 at 13:50 -0500, Stephen Frost wrote:
> I don't see any compelling argument for trying to do something half-way
> any more today than I did two years ago when this was being discussed.
That may well be so. It may be better to make users unhappy than to
make them very unhappy...
But I find the following points unconvincing:
> > I would say the typical use case for the exclusive backup method is
> > the following (and I have seen it often):
> >
> > You have some kind of backup software that does file system backups
> > and is able to run a "pre-backup" and "post-backup" script.
> > The backup is triggered by the backup software.
>
> Seeing it often doesn't make it a good solution. Running just
> pre-backup and post-backup scripts and copying the filesystem isn't
> enough to perform an online PostgreSQL backup- the WAL needs to be
> collected as well, and you need to make sure that you have all of the
> WAL before the backup can be considered complete.
Yes, that's why "pg_stop_backup" has the "wait_for_archive" parameter.
So this is not a problem.
> On restore, you're
> going to need to create a recovery.conf (at least in released versions)
> which provides a restore command (needed even in HEAD today) to get the
> old WAL, so having to also create the backup_label file shouldn't be
> that difficult.
You write "recovery.conf" upon recovery, when you have the restored
backup, so you have it on a file system. No problem adding a file then.
This is entirely different from adding a "backup_label" file to
a backup that has been taken by a backup software in some arbitrary
format in some arbitrary location (think snapshot).
> Lastly, if you really want, you can extract out the data from
> pg_stop_backup in whatever your post-backup script is.
Come on, now.
You usually use backup techniques like that because you can't get
your large database backed up in the available time window otherwise.
> > Another thing that is problematic with non-exclusive backups is that
> > you have to write the backup_label file into the backup after the
> > backup has been taken. With a technique like the above, you cannot
> > easily do that.
>
> ... why not? You need to create the recovery.conf anyway, and you need
> to be archiving WAL somewhere, so it certainly seems like you could put
> the backup_label there too.
As I said above, you don't add "recovery.conf" to the backup right away,
so these two cases don't compare.
> > I expect that that will make a lot of users unhappy.
>
> If it means that they implement a better backup strategy, then it's
> probably a good thing, which is the goal of this.
I thought our goal is to provide convenient backup methods...
Ignoring "backup_label" on restart, as I suggested in my previous message,
probably isn't such a hot idea.
But what's wrong with retaining the exclusive backup method and just
sticking a big "Warning: this may cause a restart to fail after a crash"
on it? That sure wouldn't be unsafe.
Stephen Frost wrote: > > > Seeing it often doesn't make it a good solution. Running just > > > pre-backup and post-backup scripts and copying the filesystem isn't > > > enough to perform an online PostgreSQL backup- the WAL needs to be > > > collected as well, and you need to make sure that you have all of the > > > WAL before the backup can be considered complete. > > > > Yes, that's why "pg_stop_backup" has the "wait_for_archive" parameter. > > So this is not a problem. > > That doesn’t actually make sure you have all of the WAL reliably saved > across the backup, it just cares what archive command returns, which is > sadly often a bad thing to depend on. I certainly wouldn’t rely on only > that for any system I cared about. If you write a bad archive_command, you have a problem. But that is quite unrelated to the problem at hand, if I am not mistaken. > > > On restore, you're > > > going to need to create a recovery.conf (at least in released versions) > > > which provides a restore command (needed even in HEAD today) to get the > > > old WAL, so having to also create the backup_label file shouldn't be > > > that difficult. > > > > You write "recovery.conf" upon recovery, when you have the restored > > backup, so you have it on a file system. No problem adding a file then. > > > > This is entirely different from adding a "backup_label" file to > > a backup that has been taken by a backup software in some arbitrary > > format in some arbitrary location (think snapshot). > > There isn’t any need to write the backup label before you restore the database, > just as you write recovery.conf then. Granted. But it is pretty convenient, and writing it to the data directory right away is a good thing on top, because it reduces the danger of inadvertedly starting the backup without recovery. > > > Lastly, if you really want, you can extract out the data from > > > pg_stop_backup in whatever your post-backup script is. > > > > Come on, now. > > You usually use backup techniques like that because you can't get > > your large database backed up in the available time window otherwise. > > I’m not following what you’re trying to get at here, why can’t you extract > the data for the backup label from pg_stop_backup..? Certainly other tools > do, even ones that do extremely fast parallel backups.. the two are > completely independent. > > Did you think I meant pg_basebackup..? I certaily didn’t. Oh yes, I misunderstood. Sorry. Yes, you can come up with a post-backup script that somehow communicates with your pre-backup script to get the information, but it sure is inconvenient. Simplicity is good in backup solutions, because complicated things tend to break more easily. > > I thought our goal is to provide convenient backup methods... > > Correctness would be first and having a broken system because of a crash during a backup isn’t correct. "Not starting up without manual intervention" is not actually broken... > > But what's wrong with retaining the exclusive backup method and just > > sticking a big "Warning: this may cause a restart to fail after a crash" > > on it? That sure wouldn't be unsafe. > > I haven’t seen anyone pushing for it to be removed immediately, but users should > not use it and newcomers would be much better served by using the non exclusive api. > There is a reason it was deprecated and it’s because it simply isn’t a good API. > Coming along a couple years later and saying that it’s a good API while ignoring > the issues that it has doesn’t change that. I don't think I'm ignoring the issues, I just think there is a valid use case for the exclusive backup API, with all its caveats. Of course I'm not arguing on behalf of organizations running lots of databases for whom manual intervention after a crash is unacceptable. I'm arguing on behalf of users that run a few databases, want a simple backup solution and are ready to deal with the shortcomings. But I will gladly accept defeat in this matter, I just needed to vent my opinion. Yours, Laurenz Albe
Stephen Frost wrote:
> > > Lastly, if you really want, you can extract out the data from
> > > pg_stop_backup in whatever your post-backup script is.
> >
> > Come on, now.
> > You usually use backup techniques like that because you can't get
> > your large database backed up in the available time window otherwise.
>
> I’m not following what you’re trying to get at here, why can’t you extract
> the data for the backup label from pg_stop_backup..? Certainly other tools
> do, even ones that do extremely fast parallel backups.. the two are
> completely independent.
>
> Did you think I meant pg_basebackup..? I certaily didn’t.
Oh yes, I misunderstood. Sorry.
Yes, you can come up with a post-backup script that somehow communicates
with your pre-backup script to get the information, but it sure is
inconvenient. Simplicity is good in backup solutions, because complicated
things tend to break more easily.
> > I thought our goal is to provide convenient backup methods...
>
> Correctness would be first and having a broken system because of a crash during a backup isn’t correct.
"Not starting up without manual intervention" is not actually broken...
I'm arguing on behalf of users that run a few databases, want a simple backup
solution and are ready to deal with the shortcomings.
Stephen Frost wrote:
> > > On restore, you're
> > > going to need to create a recovery.conf (at least in released versions)
> > > which provides a restore command (needed even in HEAD today) to get the
> > > old WAL, so having to also create the backup_label file shouldn't be
> > > that difficult.
> >
> > You write "recovery.conf" upon recovery, when you have the restored
> > backup, so you have it on a file system. No problem adding a file then.
> >
> > This is entirely different from adding a "backup_label" file to
> > a backup that has been taken by a backup software in some arbitrary
> > format in some arbitrary location (think snapshot).
>
> There isn’t any need to write the backup label before you restore the database,
> just as you write recovery.conf then.
Granted.
But it is pretty convenient, and writing it to the data directory right away
is a good thing on top, because it reduces the danger of inadvertedly
starting the backup without recovery.
> > > Lastly, if you really want, you can extract out the data from
> > > pg_stop_backup in whatever your post-backup script is.
> >
> > Come on, now.
> > You usually use backup techniques like that because you can't get
> > your large database backed up in the available time window otherwise.
>
> I’m not following what you’re trying to get at here, why can’t you extract
> the data for the backup label from pg_stop_backup..? Certainly other tools
> do, even ones that do extremely fast parallel backups.. the two are
> completely independent.
>
> Did you think I meant pg_basebackup..? I certaily didn’t.
Oh yes, I misunderstood. Sorry.
Yes, you can come up with a post-backup script that somehow communicates
with your pre-backup script to get the information, but it sure is
inconvenient. Simplicity is good in backup solutions, because complicated
things tend to break more easily.
> > I thought our goal is to provide convenient backup methods...
>
> Correctness would be first and having a broken system because of a crash during a backup isn’t correct.
"Not starting up without manual intervention" is not actually broken...
On Sun, 2018-11-25 at 16:04 -0500, Stephen Frost wrote: > > > There isn’t any need to write the backup label before you restore the database, > > > just as you write recovery.conf then. > > > > Granted. > > But it is pretty convenient, and writing it to the data directory right away > > is a good thing on top, because it reduces the danger of inadvertedly > > starting the backup without recovery. > > Writing it into the data directory is *not* a good thing because as soon as you do > that you risk there being an issue if there’s a crash. Writing into the backup > isn’t a bad idea but if you’re managing your backups then writing it somewhere else > (such as where you write your WAL) and associating it with the backup (presumably > it has a label) should make it easy to pull back when you restore. If there is a crash during the backup procedure, the backup is bad. Doesn't matter during which part of the backup procedure it happens. > > Yes, you can come up with a post-backup script that somehow communicates > > with your pre-backup script to get the information, but it sure is > > inconvenient. Simplicity is good in backup solutions, because complicated > > things tend to break more easily. > > Not sure what communication is necessary here..? The data needed for the backup > label file comes from pg_stop_backup in a non-exclusive backup. Right, and pg_stop_backup has to be run from the "pre-backup" script. Yours, Laurenz Albe
On Sun, 2018-11-25 at 22:01 +0100, Magnus Hagander wrote: [about managing backups from pre- and post-file-system-backup scrips] > I agree with your point that it's not an uncommon thing to need. If a good solution > for it can be proposed that requires the exclusive backup interface, then I wouldn't > be against un-deprecating that. But that's going to require a lot more than just a > documentation change, IMHO. What could perhaps be handled with a documentation change, > however, is to document a good way for this type of setup to use the new interfaces. Good point, and it puts the ball in my court :^) > > I'm arguing on behalf of users that run a few databases, want a simple backup > > solution and are ready to deal with the shortcomings. > > Those that want a simple backup solution have one -- pg_basebackup. > > The exclusive backup API is *not* simple. It is convenient, but it is not simple. > > Actually having a simple API that worked with the pre/post backup scripts, that > would be an improvement that we should definitely want! Right; unfortunately it is not simple to come up with one that doesn't exhibit the problems of the existing exclusive backup. Perhaps it's theoretically impossible. The goal is to disambiguate what a file system backup sees in backup mode and what the startup process sees after a crash in backup mode, and I can't see how that could be done. Yours, Laurenz Albe
On Sun, 2018-11-25 at 22:01 +0100, Magnus Hagander wrote:
[about managing backups from pre- and post-file-system-backup scrips]
> I agree with your point that it's not an uncommon thing to need. If a good solution
> for it can be proposed that requires the exclusive backup interface, then I wouldn't
> be against un-deprecating that. But that's going to require a lot more than just a
> documentation change, IMHO. What could perhaps be handled with a documentation change,
> however, is to document a good way for this type of setup to use the new interfaces.
Good point, and it puts the ball in my court :^)
> > I'm arguing on behalf of users that run a few databases, want a simple backup
> > solution and are ready to deal with the shortcomings.
>
> Those that want a simple backup solution have one -- pg_basebackup.
>
> The exclusive backup API is *not* simple. It is convenient, but it is not simple.
>
> Actually having a simple API that worked with the pre/post backup scripts, that
> would be an improvement that we should definitely want!
Right; unfortunately it is not simple to come up with one that doesn't exhibit
the problems of the existing exclusive backup.
Perhaps it's theoretically impossible. The goal is to disambiguate what a file
system backup sees in backup mode and what the startup process sees after a crash
in backup mode, and I can't see how that could be done.
On 11/26/18 12:31 AM, Laurenz Albe wrote: > On Sun, 2018-11-25 at 16:04 -0500, Stephen Frost wrote: >>>> There isn’t any need to write the backup label before you restore the database, >>>> just as you write recovery.conf then. >>> >>> Granted. >>> But it is pretty convenient, and writing it to the data directory right away >>> is a good thing on top, because it reduces the danger of inadvertedly >>> starting the backup without recovery. >> >> Writing it into the data directory is *not* a good thing because as soon as you do >> that you risk there being an issue if there’s a crash. Writing into the backup >> isn’t a bad idea but if you’re managing your backups then writing it somewhere else >> (such as where you write your WAL) and associating it with the backup (presumably >> it has a label) should make it easy to pull back when you restore. > > If there is a crash during the backup procedure, the backup is bad. > Doesn't matter during which part of the backup procedure it happens. Yes, but in this case with exclusive backups your cluster also will not start. That's a lot different than a bad backup because you should have many backups and a way to mark when they are complete. If your cluster requires manual intervention to start because a backup was in progress that's not a good thing. I'm of the opinion that we should remove the exclusive option for PG12. The recovery changes recently committed will completely break any automated recovery written for PG <= 11 so we might as go all the way. Requiring non-exclusive backup will be an easy change for tools that already understand non-exclusive backup, which has been available in the last three versions of Postgres. The exclusive backup method tries to make a hard problem look simple but does not succeed and causes issues besides. Performing backups properly *is* hard and we don't do anyone a service by making it look like something can be done with a trivial bash script. Regards, -- -David david@pgmasters.net
Greetings, * David Steele (david@pgmasters.net) wrote: > On 11/26/18 12:31 AM, Laurenz Albe wrote: > >If there is a crash during the backup procedure, the backup is bad. > >Doesn't matter during which part of the backup procedure it happens. > > Yes, but in this case with exclusive backups your cluster also will not > start. That's a lot different than a bad backup because you should have > many backups and a way to mark when they are complete. If your cluster > requires manual intervention to start because a backup was in progress > that's not a good thing. > > I'm of the opinion that we should remove the exclusive option for PG12. The > recovery changes recently committed will completely break any automated > recovery written for PG <= 11 so we might as go all the way. Yeah, after chatting with David about that this morning, I agree. Breaking everything around how restores works means that anyone doing backups today needs to re-evaluate their restore procedures, even if it's not an automated script, so it makes sense to also remove the deprecated exclusive backup mode at the same time. > Requiring non-exclusive backup will be an easy change for tools that already > understand non-exclusive backup, which has been available in the last three > versions of Postgres. > > The exclusive backup method tries to make a hard problem look simple but > does not succeed and causes issues besides. Performing backups properly > *is* hard and we don't do anyone a service by making it look like something > can be done with a trivial bash script. +1 Thanks! Stephen
Attachment
Greetings, * Laurenz Albe (laurenz.albe@cybertec.at) wrote: > On Sun, 2018-11-25 at 16:04 -0500, Stephen Frost wrote: > > > Yes, you can come up with a post-backup script that somehow communicates > > > with your pre-backup script to get the information, but it sure is > > > inconvenient. Simplicity is good in backup solutions, because complicated > > > things tend to break more easily. > > > > Not sure what communication is necessary here..? The data needed for the backup > > label file comes from pg_stop_backup in a non-exclusive backup. > > Right, and pg_stop_backup has to be run from the "pre-backup" script. No, pg_stop_backup has has to be run by a process that's been connected since the pre-backup script ran, but you could certainly have some process that starts from pre-backup, detaches, and then waits for a signal from the post-backup script to run pg_stop_backup and record the results somewhere. Yes, that's rather annoying to do from a bash script, but I'm well past the point of thinking that bash scripts are a good way to run backups in PG. Your efforts in this area might be better put to use by writing a good tool which works with the non-exclusive API but can be called from the pre/post backup scripts your backup solution provides. Unfortunately, it seems pretty unlikely that there's a good generic way to write such a tool or we could include it in PG. Thanks! Stephen
Attachment
On Mon, 2018-11-26 at 10:18 +0100, Magnus Hagander wrote: > > [about managing backups from pre- and post-file-system-backup scrips] > > > I agree with your point that it's not an uncommon thing to need. If a good solution > > > for it can be proposed that requires the exclusive backup interface, then I wouldn't > > > be against un-deprecating that. But that's going to require a lot more than just a > > > documentation change, IMHO. What could perhaps be handled with a documentation change, > > > however, is to document a good way for this type of setup to use the new interfaces. > > > > Good point, and it puts the ball in my court :^) > > Enjoy :) I have come up with some sample code here: https://github.com/cybertec-postgresql/safe-backup This just uses bash and psql. Does that look reasonably safe? It's probably too big to be introduced into the documentation, but maybe we could add it to the Wiki. Yours, Laurenz Albe
On 12/11/18 5:33 PM, Laurenz Albe wrote: > On Mon, 2018-11-26 at 10:18 +0100, Magnus Hagander wrote: >>> [about managing backups from pre- and post-file-system-backup scrips] >>>> I agree with your point that it's not an uncommon thing to need. If a good solution >>>> for it can be proposed that requires the exclusive backup interface, then I wouldn't >>>> be against un-deprecating that. But that's going to require a lot more than just a >>>> documentation change, IMHO. What could perhaps be handled with a documentation change, >>>> however, is to document a good way for this type of setup to use the new interfaces. >>> >>> Good point, and it puts the ball in my court :^) >> >> Enjoy :) > > I have come up with some sample code here: > https://github.com/cybertec-postgresql/safe-backup > > This just uses bash and psql. > Does that look reasonably safe? > > It's probably too big to be introduced into the documentation, but maybe > we could add it to the Wiki. My bash-fu is not all that great, but it seems to me you could do this all in one script and forgo the table entirely. Just do the file copy at about line 60 in pgpre.sh. It does look workable to me, just wonder if it could be simpler. -- -David david@pgmasters.net
On Tue, 2018-12-11 at 23:43 -0500, David Steele wrote: > > > > [about managing backups from pre- and post-file-system-backup scrips] > > I have come up with some sample code here: > > https://github.com/cybertec-postgresql/safe-backup > > > > This just uses bash and psql. > > Does that look reasonably safe? > > > > It's probably too big to be introduced into the documentation, but maybe > > we could add it to the Wiki. > > My bash-fu is not all that great, but it seems to me you could do this > all in one script and forgo the table entirely. Just do the file copy > at about line 60 in pgpre.sh. > > It does look workable to me, just wonder if it could be simpler. Thanks for having a look. Yes, it looks appallingly complicated. Sure, if I knew there was a file to write, and I knew where to write it, I could do it in the "pre" script. But since I have no idea how the actual backup is performed and how the "backup_label" file is going to be saved, I thought it best to return the information to the caller and persist it somewhere, and only the "post" script can actually return the information. Yours, Laurenz Albe
On 12/12/18 12:58 AM, Laurenz Albe wrote: > On Tue, 2018-12-11 at 23:43 -0500, David Steele wrote: >>>>> [about managing backups from pre- and post-file-system-backup scrips] >>> I have come up with some sample code here: >>> https://github.com/cybertec-postgresql/safe-backup >>> >>> This just uses bash and psql. >>> Does that look reasonably safe? >>> >>> It's probably too big to be introduced into the documentation, but maybe >>> we could add it to the Wiki. >> >> My bash-fu is not all that great, but it seems to me you could do this >> all in one script and forgo the table entirely. Just do the file copy >> at about line 60 in pgpre.sh. >> >> It does look workable to me, just wonder if it could be simpler. > > Thanks for having a look. Yes, it looks appallingly complicated. > > Sure, if I knew there was a file to write, and I knew where to write > it, I could do it in the "pre" script. Good point! > But since I have no idea how the > actual backup is performed and how the "backup_label" file is going to > be saved, I thought it best to return the information to the caller and > persist it somewhere, and only the "post" script can actually return the > information. Perhaps it could be done in a bash file that is included into the user's script, rather than doing pre/post. So, they would have start and stop backup methods they could call that would manage the psql process in the background. In that case it should be possible to eliminate the table. I do see the table as a weak spot. It appears to work, but does add a lot of complexity. Limiting to one backup at a time can be a feature, though. In pgBackRest we use advisory locks to enforce this. -- -David david@pgmasters.net