Thread: Updated backup APIs for non-exclusive backups

Updated backup APIs for non-exclusive backups

From
Magnus Hagander
Date:
Per discussionat the developer meeting in Brussels, here's a patch that makes some updates to the backup APIs, to support non-exclusive backups without using pg_basebackup. The idea is to fix at least three main issues that are there today -- that you cannot run concurrent backups, that the backup_label file is created in the data directory which makes it impossible to distinguish between a cluster restored from backup and one that crashed while a backup was running, and a cluster can get "stuck" in backup mode if the backup script/software crashes.

To make this work, this patch:

* Introduces a new argument for pg_start_backup(), for "exclusive". It defaults to true, which means that just calling pg_start_backup() like before will behave exactly like before. If it's called with exclusive='f', a non-exclusive backup is started, and the backup label is collected in memory instead of in the backup_label file.

* If the client disconnects with a non-exclusive backup running, the backup is automatically aborted. This is the same thing that pg_basebackup does. To use these non-exclusive backups the backup software will have to maintain a persistent connection to the database -- something that should not be a problem for any of the modern ones out there (but would've been slightly trickier back in the days when we suggested shellscripts)

* A new version of pg_stop_backup is created, taking an argument specifying if it's exclusive or not. The current version of pg_stop_backup() continues to work for exclusive backups, with no changes to behavior. The new pg_stop_backup will return a record of three columns instead of just the value -- the LSN (pglsn), the backup label file (text) and the tablespace map file (text). If used to cancel an exclusive backup, backup label file and tablespace map will be NULL. At this point it's up to the backup software to write down the files in the location of the backup.


Attached patch currently doesn't include full documentation, since Bruce was going to restructure that section of the docs (also based on the devmeeting). I will write up the docs once that is done (I assume it will be soon enough, or I'll go do it regardless), but I wanted to get some review in on the code while waiting.
Attachment

Re: Updated backup APIs for non-exclusive backups

From
Robert Haas
Date:
On Wed, Feb 10, 2016 at 7:46 AM, Magnus Hagander <magnus@hagander.net> wrote:
> Per discussionat the developer meeting in Brussels, here's a patch that
> makes some updates to the backup APIs, to support non-exclusive backups
> without using pg_basebackup. The idea is to fix at least three main issues
> that are there today -- that you cannot run concurrent backups, that the
> backup_label file is created in the data directory which makes it impossible
> to distinguish between a cluster restored from backup and one that crashed
> while a backup was running, and a cluster can get "stuck" in backup mode if
> the backup script/software crashes.
>
> To make this work, this patch:
>
> * Introduces a new argument for pg_start_backup(), for "exclusive". It
> defaults to true, which means that just calling pg_start_backup() like
> before will behave exactly like before. If it's called with exclusive='f', a
> non-exclusive backup is started, and the backup label is collected in memory
> instead of in the backup_label file.
>
> * If the client disconnects with a non-exclusive backup running, the backup
> is automatically aborted. This is the same thing that pg_basebackup does. To
> use these non-exclusive backups the backup software will have to maintain a
> persistent connection to the database -- something that should not be a
> problem for any of the modern ones out there (but would've been slightly
> trickier back in the days when we suggested shellscripts)
>
> * A new version of pg_stop_backup is created, taking an argument specifying
> if it's exclusive or not. The current version of pg_stop_backup() continues
> to work for exclusive backups, with no changes to behavior. The new
> pg_stop_backup will return a record of three columns instead of just the
> value -- the LSN (pglsn), the backup label file (text) and the tablespace
> map file (text). If used to cancel an exclusive backup, backup label file
> and tablespace map will be NULL. At this point it's up to the backup
> software to write down the files in the location of the backup.
>
>
> Attached patch currently doesn't include full documentation, since Bruce was
> going to restructure that section of the docs (also based on the
> devmeeting). I will write up the docs once that is done (I assume it will be
> soon enough, or I'll go do it regardless), but I wanted to get some review
> in on the code while waiting.

Wow, this is a great idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Updated backup APIs for non-exclusive backups

From
Andres Freund
Date:
Hi,

On 2016-02-10 13:46:05 +0100, Magnus Hagander wrote:
> Per discussionat the developer meeting in Brussels, here's a patch that
> makes some updates to the backup APIs, to support non-exclusive backups
> without using pg_basebackup.

Thanks for following through with this!

> * If the client disconnects with a non-exclusive backup running, the backup
> is automatically aborted. This is the same thing that pg_basebackup does.
> To use these non-exclusive backups the backup software will have to
> maintain a persistent connection to the database -- something that should
> not be a problem for any of the modern ones out there (but would've been
> slightly trickier back in the days when we suggested shellscripts)

I think we might want to make this one optional, but I'm not going to
fight super hard for it.

> * A new version of pg_stop_backup is created, taking an argument specifying
> if it's exclusive or not. The current version of pg_stop_backup() continues
> to work for exclusive backups, with no changes to behavior. The new
> pg_stop_backup will return a record of three columns instead of just the
> value -- the LSN (pglsn), the backup label file (text) and the tablespace
> map file (text).

I wonder if we shouldn't change the API a bit more aggressively. Right
now we return the label and the map - but what if there's more files at
some later point? One way would be to make it a SETOF function returning
'filename', 'content' or such.  Makes it less clear what to do with the
lsn admittedly.

Regards,

Andres



Re: Updated backup APIs for non-exclusive backups

From
Magnus Hagander
Date:


On Wed, Feb 10, 2016 at 2:46 PM, Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2016-02-10 13:46:05 +0100, Magnus Hagander wrote:
> Per discussionat the developer meeting in Brussels, here's a patch that
> makes some updates to the backup APIs, to support non-exclusive backups
> without using pg_basebackup.

Thanks for following through with this!

> * If the client disconnects with a non-exclusive backup running, the backup
> is automatically aborted. This is the same thing that pg_basebackup does.
> To use these non-exclusive backups the backup software will have to
> maintain a persistent connection to the database -- something that should
> not be a problem for any of the modern ones out there (but would've been
> slightly trickier back in the days when we suggested shellscripts)

I think we might want to make this one optional, but I'm not going to
fight super hard for it.

Not sure what you're referring to here. Do you mean being able to make a non-exclusive backup while not maintaining a connection? That's going to make things a *lot* more complicated, as we'd have to invent something like "backup slots" similar to what we're doing with replication slots. I doubt it's worth all that work and complexity.
 

> * A new version of pg_stop_backup is created, taking an argument specifying
> if it's exclusive or not. The current version of pg_stop_backup() continues
> to work for exclusive backups, with no changes to behavior. The new
> pg_stop_backup will return a record of three columns instead of just the
> value -- the LSN (pglsn), the backup label file (text) and the tablespace
> map file (text).

I wonder if we shouldn't change the API a bit more aggressively. Right
now we return the label and the map - but what if there's more files at
some later point? One way would be to make it a SETOF function returning
'filename', 'content' or such.  Makes it less clear what to do with the
lsn admittedly.

*Adding* more columns to the API shouldn't be a problem in the future. If there's another file, we can return a fourth column. A backup program is going to have to know about those things anyway and shouldn't just blindly write those files to the backup, IMO.
 

--

Re: Updated backup APIs for non-exclusive backups

From
Stephen Frost
Date:
* Andres Freund (andres@anarazel.de) wrote:
> On 2016-02-10 13:46:05 +0100, Magnus Hagander wrote:
> > * If the client disconnects with a non-exclusive backup running, the backup
> > is automatically aborted. This is the same thing that pg_basebackup does.
> > To use these non-exclusive backups the backup software will have to
> > maintain a persistent connection to the database -- something that should
> > not be a problem for any of the modern ones out there (but would've been
> > slightly trickier back in the days when we suggested shellscripts)
>
> I think we might want to make this one optional, but I'm not going to
> fight super hard for it.

I was thinking along the same lines.

> > * A new version of pg_stop_backup is created, taking an argument specifying
> > if it's exclusive or not. The current version of pg_stop_backup() continues
> > to work for exclusive backups, with no changes to behavior. The new
> > pg_stop_backup will return a record of three columns instead of just the
> > value -- the LSN (pglsn), the backup label file (text) and the tablespace
> > map file (text).
>
> I wonder if we shouldn't change the API a bit more aggressively. Right
> now we return the label and the map - but what if there's more files at
> some later point? One way would be to make it a SETOF function returning
> 'filename', 'content' or such.  Makes it less clear what to do with the
> lsn admittedly.

If we make the 'client disconnect means abort' optional then we'd also
need to modify the API of stop backup to specify which backup to stop,
I'd think.

Thanks!

Stephen

Re: Updated backup APIs for non-exclusive backups

From
Stephen Frost
Date:
* Magnus Hagander (magnus@hagander.net) wrote:
> On Wed, Feb 10, 2016 at 2:46 PM, Andres Freund <andres@anarazel.de> wrote:
> > On 2016-02-10 13:46:05 +0100, Magnus Hagander wrote:
> > > Per discussionat the developer meeting in Brussels, here's a patch that
> > > makes some updates to the backup APIs, to support non-exclusive backups
> > > without using pg_basebackup.
> >
> > Thanks for following through with this!
> >
> > > * If the client disconnects with a non-exclusive backup running, the
> > backup
> > > is automatically aborted. This is the same thing that pg_basebackup does.
> > > To use these non-exclusive backups the backup software will have to
> > > maintain a persistent connection to the database -- something that should
> > > not be a problem for any of the modern ones out there (but would've been
> > > slightly trickier back in the days when we suggested shellscripts)
> >
> > I think we might want to make this one optional, but I'm not going to
> > fight super hard for it.
>
> Not sure what you're referring to here. Do you mean being able to make a
> non-exclusive backup while not maintaining a connection? That's going to
> make things a *lot* more complicated, as we'd have to invent something like
> "backup slots" similar to what we're doing with replication slots. I doubt
> it's worth all that work and complexity.

Hrmmm.  If that's the case then perhaps you're right.  I liked the
general idea of not having to maintain a TCP connection during the
entire backup (TCP connections can be annoyingly finicky in certain
environments...), but I'm not sure it's worth a lot of additional
complexity.

Thanks!

Stephen

Re: Updated backup APIs for non-exclusive backups

From
David Steele
Date:
On 2/10/16 9:44 AM, Stephen Frost wrote:
> * Magnus Hagander (magnus@hagander.net) wrote:
>> On Wed, Feb 10, 2016 at 2:46 PM, Andres Freund <andres@anarazel.de> wrote:
>>> On 2016-02-10 13:46:05 +0100, Magnus Hagander wrote:
>>>> * If the client disconnects with a non-exclusive backup running, the
>>> backup
>>>> is automatically aborted. This is the same thing that pg_basebackup does.
>>>> To use these non-exclusive backups the backup software will have to
>>>> maintain a persistent connection to the database -- something that should
>>>> not be a problem for any of the modern ones out there (but would've been
>>>> slightly trickier back in the days when we suggested shellscripts)
>>>
>>> I think we might want to make this one optional, but I'm not going to
>>> fight super hard for it.
>>
>> Not sure what you're referring to here. Do you mean being able to make a
>> non-exclusive backup while not maintaining a connection? That's going to
>> make things a *lot* more complicated, as we'd have to invent something like
>> "backup slots" similar to what we're doing with replication slots. I doubt
>> it's worth all that work and complexity.

Agreed.

> Hrmmm.  If that's the case then perhaps you're right.  I liked the
> general idea of not having to maintain a TCP connection during the
> entire backup (TCP connections can be annoyingly finicky in certain
> environments...), but I'm not sure it's worth a lot of additional
> complexity.

Well, pgBackRest holds a connection to PostgreSQL through the entire
backup and will abort the backup if it is severed.  The connection is
always held locally, though, even if the master backup process is on a
different server.  I haven't gotten any reports of aborted backups due
to the connection failing.

--
-David
david@pgmasters.net


Re: Updated backup APIs for non-exclusive backups

From
Stephen Frost
Date:
* David Steele (david@pgmasters.net) wrote:
> On 2/10/16 9:44 AM, Stephen Frost wrote:
> > Hrmmm.  If that's the case then perhaps you're right.  I liked the
> > general idea of not having to maintain a TCP connection during the
> > entire backup (TCP connections can be annoyingly finicky in certain
> > environments...), but I'm not sure it's worth a lot of additional
> > complexity.
>
> Well, pgBackRest holds a connection to PostgreSQL through the entire
> backup and will abort the backup if it is severed.  The connection is
> always held locally, though, even if the master backup process is on a
> different server.  I haven't gotten any reports of aborted backups due
> to the connection failing.

Yeah, I know, I had been thinking it might be nice to not do that at
some point in the future, but thinking on it further, we've already got
a "pick up where you left off" capability with pgbackrest, so it's
really not a huge deal if the backup fails and has to be restarted, and
this does remove the need (or at least deprecate) to use the "stop an
already-running backup if you find one" option that pgbackrest has.

Thanks!

Stephen

Re: Updated backup APIs for non-exclusive backups

From
David Steele
Date:
On 2/10/16 7:46 AM, Magnus Hagander wrote:
> Per discussion at the developer meeting in Brussels, here's a patch that
> makes some updates to the backup APIs, to support non-exclusive backups
> without using pg_basebackup. <...>

This sounds like a great idea and I have signed up to review.

> * A new version of pg_stop_backup is created, taking an argument
> specifying if it's exclusive or not. The current version of
> pg_stop_backup() continues to work for exclusive backups, with no
> changes to behavior. The new pg_stop_backup will return a record of
> three columns instead of just the value -- the LSN (pglsn), the backup
> label file (text) and the tablespace map file (text). If used to cancel
> an exclusive backup, backup label file and tablespace map will be NULL.
> At this point it's up to the backup software to write down the files in
> the location of the backup.

It would be nice if this new pg_stop_backup() function also output the
exact time when the backup stopped.  Depending on how long
pg_stop_backup() waits for WAL to be archived a time-stamp recorded
after pg_stop_backup() returns can be pretty far off.

This information is handy for automating selection of a backup when
doing time-based PITR (or so the user can manually select).  For
exclusive backups it is possible to parse the .backup file to get this
information but non-exclusive backups do not output the .backup file.

I would be happy to see the time-stamp returned from the
pg_start_backup() function as well.  It's a bigger change, but once
pg_start_backup() returns multiple columns it will be easier to add more
columns in the future.

In fact, it might be better if backup_label and tablespace_map were
output by pg_start_backup().  This would give the backup software more
information to work with from the start.

Thanks!
--
-David
david@pgmasters.net


Re: Updated backup APIs for non-exclusive backups

From
Magnus Hagander
Date:


On Wed, Feb 10, 2016 at 4:38 PM, David Steele <david@pgmasters.net> wrote:
On 2/10/16 7:46 AM, Magnus Hagander wrote:
> Per discussion at the developer meeting in Brussels, here's a patch that
> makes some updates to the backup APIs, to support non-exclusive backups
> without using pg_basebackup. <...>

This sounds like a great idea and I have signed up to review.

> * A new version of pg_stop_backup is created, taking an argument
> specifying if it's exclusive or not. The current version of
> pg_stop_backup() continues to work for exclusive backups, with no
> changes to behavior. The new pg_stop_backup will return a record of
> three columns instead of just the value -- the LSN (pglsn), the backup
> label file (text) and the tablespace map file (text). If used to cancel
> an exclusive backup, backup label file and tablespace map will be NULL.
> At this point it's up to the backup software to write down the files in
> the location of the backup.

It would be nice if this new pg_stop_backup() function also output the
exact time when the backup stopped.  Depending on how long
pg_stop_backup() waits for WAL to be archived a time-stamp recorded
after pg_stop_backup() returns can be pretty far off.

This information is handy for automating selection of a backup when
doing time-based PITR (or so the user can manually select).  For
exclusive backups it is possible to parse the .backup file to get this
information but non-exclusive backups do not output the .backup file.

The non-exclusive backups *do* output the backup_label file, it just shows up as a text field instead of as a separate file. You're supposed to write that data to a file in the backup program.

 
I would be happy to see the time-stamp returned from the
pg_start_backup() function as well.  It's a bigger change, but once
pg_start_backup() returns multiple columns it will be easier to add more
columns in the future.

In fact, it might be better if backup_label and tablespace_map were
output by pg_start_backup().  This would give the backup software more
information to work with from the start.

I was trying to figure out why that's a bad idea, but I can't really say why :)

For pg_basebackup backups, I think the reason we put it at the end is simply that we don't want to write it into the backup directory until the rest of the backup is complete, since it's not going to be useful before then. But that requirement can certainly be put on the backup client instead of the server. With the newer API it's going to have to keep those things around anyway.

That would then change the function syntax for pg_start_backup() but instead make pg_stop_backup() now behave the same as it did before.

Would anybody else like to comment on if that's a good idea or if there are any holes in it? :)
 
--

Re: Updated backup APIs for non-exclusive backups

From
Andres Freund
Date:
On 2016-02-10 16:50:26 +0100, Magnus Hagander wrote:
> > I would be happy to see the time-stamp returned from the
> > pg_start_backup() function as well.  It's a bigger change, but once
> > pg_start_backup() returns multiple columns it will be easier to add more
> > columns in the future.
> >
> > In fact, it might be better if backup_label and tablespace_map were
> > output by pg_start_backup().  This would give the backup software more
> > information to work with from the start.
> >
> 
> I was trying to figure out why that's a bad idea, but I can't really say
> why :)
> 
> For pg_basebackup backups, I think the reason we put it at the end is
> simply that we don't want to write it into the backup directory until the
> rest of the backup is complete, since it's not going to be useful before
> then. But that requirement can certainly be put on the backup client
> instead of the server. With the newer API it's going to have to keep those
> things around anyway.
> 
> That would then change the function syntax for pg_start_backup() but
> instead make pg_stop_backup() now behave the same as it did before.
> 
> Would anybody else like to comment on if that's a good idea or if there are
> any holes in it? :)

I don't think that's a good idea. It makes it impossible to add
information to labels about the minimum recovery point and
such. Currently there's some rather fragile logic to discover that, but
I'd really like to get rid of that at some point.

Regards,

Andres



Re: Updated backup APIs for non-exclusive backups

From
David Steele
Date:
On 2/10/16 11:01 AM, Andres Freund wrote:
> On 2016-02-10 16:50:26 +0100, Magnus Hagander wrote:
>>> I would be happy to see the time-stamp returned from the
>>> pg_start_backup() function as well.  It's a bigger change, but once
>>> pg_start_backup() returns multiple columns it will be easier to add more
>>> columns in the future.
>>>
>>> In fact, it might be better if backup_label and tablespace_map were
>>> output by pg_start_backup().  This would give the backup software more
>>> information to work with from the start.
>>
>> I was trying to figure out why that's a bad idea, but I can't really say
>> why :)
>>
>> For pg_basebackup backups, I think the reason we put it at the end is
>> simply that we don't want to write it into the backup directory until the
>> rest of the backup is complete, since it's not going to be useful before
>> then. But that requirement can certainly be put on the backup client
>> instead of the server. With the newer API it's going to have to keep those
>> things around anyway.
>>
>> That would then change the function syntax for pg_start_backup() but
>> instead make pg_stop_backup() now behave the same as it did before.
>>
>> Would anybody else like to comment on if that's a good idea or if there are
>> any holes in it? :)
>
> I don't think that's a good idea. It makes it impossible to add
> information to labels about the minimum recovery point and
> such. Currently there's some rather fragile logic to discover that, but
> I'd really like to get rid of that at some point.

That makes sense.  The backup_label "as is" could be output at the
beginning but if we want to add the minimum recovery point it would need
to be output at the end.

It seems like tablespace_map could still be output at the beginning, though.

--
-David
david@pgmasters.net


Re: Updated backup APIs for non-exclusive backups

From
David Steele
Date:
On 2/10/16 10:50 AM, Magnus Hagander wrote:
> On Wed, Feb 10, 2016 at 4:38 PM, David Steele <david@pgmasters.net
>
>     This information is handy for automating selection of a backup when
>     doing time-based PITR (or so the user can manually select).  For
>     exclusive backups it is possible to parse the .backup file to get this
>     information but non-exclusive backups do not output the .backup file.
>
>
> The non-exclusive backups *do* output the backup_label file, it just
> shows up as a text field instead of as a separate file. You're supposed
> to write that data to a file in the backup program.

I meant the .backup file (e.g. 000000010000008C0000001A.00000028.backup)
that is archived along with the WAL for an exlcusive backup.  I believe
this is currently the only place to get the stop time (without reading
the WAL segments).

--
-David
david@pgmasters.net


Re: Updated backup APIs for non-exclusive backups

From
Andres Freund
Date:
On 2016-02-10 11:06:01 -0500, David Steele wrote:
> That makes sense.  The backup_label "as is" could be output at the
> beginning but if we want to add the minimum recovery point it would need
> to be output at the end.
> 
> It seems like tablespace_map could still be output at the beginning, though.

I don't really see enough benefit to go that way. What are you thinking
of using the information for ("This would give the backup software more
information to work with from the start.")?

Andres



Re: Updated backup APIs for non-exclusive backups

From
David Steele
Date:
On 2/10/16 11:12 AM, Andres Freund wrote:
> On 2016-02-10 11:06:01 -0500, David Steele wrote:
>> That makes sense.  The backup_label "as is" could be output at the
>> beginning but if we want to add the minimum recovery point it would need
>> to be output at the end.
>>
>> It seems like tablespace_map could still be output at the beginning, though.
>
> I don't really see enough benefit to go that way. What are you thinking
> of using the information for ("This would give the backup software more
> information to work with from the start.")?

Currently backup software that handles tablespaces needs to read the
pg_tblspc directory to build an oid/path mapping in order to know which
tablespace directories to copy.  Since Postgres is already doing this
when it creates tablespace_map it seems like it's redundant for the
backup software to do it again.

Since tablespace_map is created in do_pg_start_backup() and I don't see
how it could be updated after that, I think it is logical to output it
from pg_start_backup().  I don't feel strongly about it, though.

--
-David
david@pgmasters.net


Re: Updated backup APIs for non-exclusive backups

From
Marco Nenciarini
Date:
Hi Magnus,

thanks for working on this topic.
What it does is very similar to what Barman's pgespresso extension does and I'd like to see it implemented soon in the
core.

I've added myself as reviewer for the patch on commitfest site.

Regards,
Marco

--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciarini@2ndQuadrant.it | www.2ndQuadrant.it


Re: Updated backup APIs for non-exclusive backups

From
Marco Nenciarini
Date:
Hi Magnus,

I've finally found some time to take a look to the patch.

It applies with some fuzziness on master, but the result looks correct.
Unfortunately the OID of the new pg_stop_backup function conflicts with
"pg_blocking_pids()" patch (52f5d578d6c29bf254e93c69043b817d4047ca67).

After changing it the patch does not compile:

gcc -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute
-Wformat-security -fno-strict-aliasing -fwrapv
-Wno-unused-command-line-argument -g -O0 -g -fno-omit-frame-pointer
-I../../../../src/include

-I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.11.sdk/usr/include/libxml2
-I/usr/local/include -I/usr/local/opt/openssl/include -c -o xlog.o
xlog.c -MMD -MP -MF .deps/xlog.Po
xlog.c:10000:19: error: use of undeclared identifier 'tblspc_mapfbuf';
did you mean 'tblspcmapfile'?
initStringInfo(&tblspc_mapfbuf);
^~~~~~~~~~~~~~
tblspcmapfile
xlog.c:9790:19: note: 'tblspcmapfile' declared here
StringInfo tblspcmapfile, bool infotbssize,
^
xlog.c:10073:22: error: use of undeclared identifier 'tblspc_mapfbuf';
did you mean 'tblspcmapfile'?
appendStringInfo(&tblspc_mapfbuf, "%s %s\n", ti->oid, ti->path);
^~~~~~~~~~~~~~
tblspcmapfile
xlog.c:9790:19: note: 'tblspcmapfile' declared here
StringInfo tblspcmapfile, bool infotbssize,
^
xlog.c:10092:19: error: use of undeclared identifier 'labelfbuf'; did
you mean 'labelfile'?
initStringInfo(&labelfbuf);
^~~~~~~~~
labelfile
xlog.c:9789:19: note: 'labelfile' declared here
StringInfo labelfile, DIR *tblspcdir, List **tablespaces,
^
xlog.c:10099:21: error: use of undeclared identifier 'labelfbuf'; did
you mean 'labelfile'?
appendStringInfo(&labelfbuf, "START WAL LOCATION: %X/%X (file %s)\n",
^~~~~~~~~
labelfile
xlog.c:9789:19: note: 'labelfile' declared here
StringInfo labelfile, DIR *tblspcdir, List **tablespaces,
^
xlog.c:10101:21: error: use of undeclared identifier 'labelfbuf'; did
you mean 'labelfile'?
appendStringInfo(&labelfbuf, "CHECKPOINT LOCATION: %X/%X\n",
^~~~~~~~~
labelfile
xlog.c:9789:19: note: 'labelfile' declared here
StringInfo labelfile, DIR *tblspcdir, List **tablespaces,
^
xlog.c:10103:21: error: use of undeclared identifier 'labelfbuf'; did
you mean 'labelfile'?
appendStringInfo(&labelfbuf, "BACKUP METHOD: %s\n",
^~~~~~~~~
labelfile
xlog.c:9789:19: note: 'labelfile' declared here
StringInfo labelfile, DIR *tblspcdir, List **tablespaces,
^
xlog.c:10105:21: error: use of undeclared identifier 'labelfbuf'; did
you mean 'labelfile'?
appendStringInfo(&labelfbuf, "BACKUP FROM: %s\n",
^~~~~~~~~
labelfile
xlog.c:9789:19: note: 'labelfile' declared here
StringInfo labelfile, DIR *tblspcdir, List **tablespaces,
^
xlog.c:10107:21: error: use of undeclared identifier 'labelfbuf'; did
you mean 'labelfile'?
appendStringInfo(&labelfbuf, "START TIME: %s\n", strfbuf);
^~~~~~~~~
labelfile
xlog.c:9789:19: note: 'labelfile' declared here
StringInfo labelfile, DIR *tblspcdir, List **tablespaces,
^
xlog.c:10108:21: error: use of undeclared identifier 'labelfbuf'; did
you mean 'labelfile'?
appendStringInfo(&labelfbuf, "LABEL: %s\n", backupidstr);
^~~~~~~~~
labelfile
xlog.c:9789:19: note: 'labelfile' declared here
StringInfo labelfile, DIR *tblspcdir, List **tablespaces,
^
xlog.c:10142:15: error: use of undeclared identifier 'labelfbuf'
if (fwrite(labelfbuf.data, labelfbuf.len, 1, fp) != 1 ||
^
xlog.c:10142:31: error: use of undeclared identifier 'labelfbuf'
if (fwrite(labelfbuf.data, labelfbuf.len, 1, fp) != 1 ||
^
xlog.c:10151:10: error: use of undeclared identifier 'labelfbuf'
pfree(labelfbuf.data);
^
xlog.c:10154:8: error: use of undeclared identifier 'tblspc_mapfbuf'
if (tblspc_mapfbuf.len > 0)
^
xlog.c:10178:16: error: use of undeclared identifier 'tblspc_mapfbuf'
if (fwrite(tblspc_mapfbuf.data, tblspc_mapfbuf.len, 1, fp) != 1 ||
^
xlog.c:10178:37: error: use of undeclared identifier 'tblspc_mapfbuf'
if (fwrite(tblspc_mapfbuf.data, tblspc_mapfbuf.len, 1, fp) != 1 ||
^
xlog.c:10189:10: error: use of undeclared identifier 'tblspc_mapfbuf'
pfree(tblspc_mapfbuf.data);
^
xlog.c:10193:17: error: use of undeclared identifier 'labelfbuf'
*labelfile = labelfbuf.data;
^
xlog.c:10194:8: error: use of undeclared identifier 'tblspc_mapfbuf'
if (tblspc_mapfbuf.len > 0)
^
xlog.c:10195:22: error: use of undeclared identifier 'tblspc_mapfbuf'
*tblspcmapfile = tblspc_mapfbuf.data;
^
19 errors generated.
make[4]: *** [xlog.o] Error 1
make[3]: *** [transam-recursive] Error 2
make[2]: *** [access-recursive] Error 2
make[1]: *** [all-backend-recurse] Error 2
make: *** [all-src-recurse] Error 2

I've searched in past commits for any recent change that involves the
affected lines, but I have not found any.
Maybe some changes are missing?

Regards,
Marco

--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciarini@2ndQuadrant.it | www.2ndQuadrant.it


Re: Updated backup APIs for non-exclusive backups

From
David Steele
Date:
Hi Magnus,

On 3/2/16 12:49 PM, Marco Nenciarini wrote:

> I've finally found some time to take a look to the patch.
> 
> It applies with some fuzziness on master, but the result looks correct.
> Unfortunately the OID of the new pg_stop_backup function conflicts with
> "pg_blocking_pids()" patch (52f5d578d6c29bf254e93c69043b817d4047ca67).
>
> After changing it the patch does not compile:

It's been two weeks since Marco found these issues in the patch.  Please
provide an updated patch soon or I will mark this "returned with feedback".

Thanks,
-- 
-David
david@pgmasters.net



Re: Updated backup APIs for non-exclusive backups

From
Magnus Hagander
Date:
On Wed, Mar 16, 2016 at 5:06 PM, David Steele <david@pgmasters.net> wrote:
Hi Magnus,

On 3/2/16 12:49 PM, Marco Nenciarini wrote:

> I've finally found some time to take a look to the patch.
>
> It applies with some fuzziness on master, but the result looks correct.
> Unfortunately the OID of the new pg_stop_backup function conflicts with
> "pg_blocking_pids()" patch (52f5d578d6c29bf254e93c69043b817d4047ca67).
>
> After changing it the patch does not compile:

It's been two weeks since Marco found these issues in the patch.  Please
provide an updated patch soon or I will mark this "returned with feedback".


Hi!

Thanks for the note - I had missed Marco's response completely!

I'll take a look at it once Nordic PGDay is done! 


--

Re: Updated backup APIs for non-exclusive backups

From
Magnus Hagander
Date:
On Wed, Mar 2, 2016 at 6:49 PM, Marco Nenciarini <marco.nenciarini@2ndquadrant.it> wrote:
Hi Magnus,

Hi!

First, again my apologies for completely missing that you had posted this review!

 
I've finally found some time to take a look to the patch.

It applies with some fuzziness on master, but the result looks correct.
Unfortunately the OID of the new pg_stop_backup function conflicts with
"pg_blocking_pids()" patch (52f5d578d6c29bf254e93c69043b817d4047ca67).

Fixed, thanks!
 


After changing it the patch does not compile:

It compiles fine for me, and with no warnings.

 

gcc -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute
-Wformat-security -fno-strict-aliasing -fwrapv
-Wno-unused-command-line-argument -g -O0 -g -fno-omit-frame-pointer
-I../../../../src/include
-I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.11.sdk/usr/include/libxml2
-I/usr/local/include -I/usr/local/opt/openssl/include -c -o xlog.o
xlog.c -MMD -MP -MF .deps/xlog.Po
xlog.c:10000:19: error: use of undeclared identifier 'tblspc_mapfbuf';

Eh. There is no presence of "tblspc_mapfbuf" after the patch. I think it looks like the "applies with fuzziness" actually wasn't correct, and you ended up with bad code with a mix of the old and the new code in it.

I've attached an updated patch, which is rebased on current master and includes the oid fix.


--

Attachment

Re: Updated backup APIs for non-exclusive backups

From
David Steele
Date:
Hi Magnus,

On 3/19/16 8:15 AM, Magnus Hagander wrote:

> I've attached an updated patch, which is rebased on current master and
> includes the oid fix.

Before doing a thorough review of this patch there are a few points I
would like to consider:

* I think it's really important to provide the stop time in some fashion
when using this new technique.  I would prefer a new column to be
returned from pg_stop_backup() but I could live with STOP TIME being
recorded in the label file.  STOP TIME should probably be included in
the label file anyway.

* It seems like STOP WAL LOCATION should now also be recorded in the
label file.  Preferably this would used by recovery to determine when
the database has reach consistency but that could be a future patch.
I'm not very happy with the current method of using pg_control to get
this information as it assumes that pg_control is copied last (at least
based on the code comments).

Thanks,
-- 
-David
david@pgmasters.net



Re: Updated backup APIs for non-exclusive backups

From
Magnus Hagander
Date:
On Tue, Mar 22, 2016 at 5:08 PM, David Steele <david@pgmasters.net> wrote:

On 3/19/16 8:15 AM, Magnus Hagander wrote:

> I've attached an updated patch, which is rebased on current master and
> includes the oid fix.

Before doing a thorough review of this patch there are a few points I
would like to consider:

* I think it's really important to provide the stop time in some fashion
when using this new technique.  I would prefer a new column to be
returned from pg_stop_backup() but I could live with STOP TIME being
recorded in the label file.  STOP TIME should probably be included in
the label file anyway.

Adding the stop time column should be a simple addition and I don't see a problem with that. I think I misunderstood your original request on that. Because you are just talking about returning a timestamptz with the "right now" value for when you called pg_stop_backup()? Or to be specific, just before pg_Stop_backup *finished*. Or do you mean when pg_stop_backup() started?

Doing it in the backup label file is obviously a different target, where we might need to consider backwards compatibility, Should we?

 
* It seems like STOP WAL LOCATION should now also be recorded in the
label file.  Preferably this would used by recovery to determine when
the database has reach consistency but that could be a future patch.
I'm not very happy with the current method of using pg_control to get
this information as it assumes that pg_control is copied last (at least
based on the code comments).

That seems entirely out of scope for this patch, though. Doesn't mean it shouldn't be done, but that's a separate thing.

-- 

Re: Updated backup APIs for non-exclusive backups

From
David Steele
Date:
On 3/22/16 12:14 PM, Magnus Hagander wrote:
> On Tue, Mar 22, 2016 at 5:08 PM, David Steele <david@pgmasters.net
> <mailto:david@pgmasters.net>> wrote:
> 
>     On 3/19/16 8:15 AM, Magnus Hagander wrote:
> 
>     > I've attached an updated patch, which is rebased on current master and
>     > includes the oid fix.
> 
>     Before doing a thorough review of this patch there are a few points I
>     would like to consider:
> 
>     * I think it's really important to provide the stop time in some fashion
>     when using this new technique.  I would prefer a new column to be
>     returned from pg_stop_backup() but I could live with STOP TIME being
>     recorded in the label file.  STOP TIME should probably be included in
>     the label file anyway.
> 
> Adding the stop time column should be a simple addition and I don't see
> a problem with that. I think I misunderstood your original request on
> that. Because you are just talking about returning a timestamptz with
> the "right now" value for when you called pg_stop_backup()? Or to be
> specific, just before pg_Stop_backup *finished*. Or do you mean when
> pg_stop_backup() started?

What would be ideal is the minimum time that could be used for PITR.  In
an exclusive backup that's the time the end-of-backup record is written
to WAL.  In a non-exlusive backup I'm not quite sure how that works.

> Doing it in the backup label file is obviously a different target, where
> we might need to consider backwards compatibility, Should we?

Physical backups can only be restored in the same version so I'm not
sure why it would be a problem?  Do you mean for programs outside of
Postgres that are parsing this file?

>     * It seems like STOP WAL LOCATION should now also be recorded in the
>     label file.  Preferably this would used by recovery to determine when
>     the database has reach consistency but that could be a future patch.
>     I'm not very happy with the current method of using pg_control to get
>     this information as it assumes that pg_control is copied last (at least
>     based on the code comments).
> 
> That seems entirely out of scope for this patch, though. Doesn't mean it
> shouldn't be done, but that's a separate thing.

Fair enough.

-- 
-David
david@pgmasters.net



Re: Updated backup APIs for non-exclusive backups

From
Magnus Hagander
Date:
On Tue, Mar 22, 2016 at 5:27 PM, David Steele <david@pgmasters.net> wrote:
On 3/22/16 12:14 PM, Magnus Hagander wrote:
> On Tue, Mar 22, 2016 at 5:08 PM, David Steele <david@pgmasters.net
> <mailto:david@pgmasters.net>> wrote:
>
>     On 3/19/16 8:15 AM, Magnus Hagander wrote:
>
>     > I've attached an updated patch, which is rebased on current master and
>     > includes the oid fix.
>
>     Before doing a thorough review of this patch there are a few points I
>     would like to consider:
>
>     * I think it's really important to provide the stop time in some fashion
>     when using this new technique.  I would prefer a new column to be
>     returned from pg_stop_backup() but I could live with STOP TIME being
>     recorded in the label file.  STOP TIME should probably be included in
>     the label file anyway.
>
> Adding the stop time column should be a simple addition and I don't see
> a problem with that. I think I misunderstood your original request on
> that. Because you are just talking about returning a timestamptz with
> the "right now" value for when you called pg_stop_backup()? Or to be
> specific, just before pg_Stop_backup *finished*. Or do you mean when
> pg_stop_backup() started?

What would be ideal is the minimum time that could be used for PITR.  In
an exclusive backup that's the time the end-of-backup record is written
to WAL.  In a non-exlusive backup I'm not quite sure how that works.

Having an actual definition of that is kind of required before adding it :P
 
> Doing it in the backup label file is obviously a different target, where
> we might need to consider backwards compatibility, Should we?

Physical backups can only be restored in the same version so I'm not
sure why it would be a problem?  Do you mean for programs outside of
Postgres that are parsing this file?

Yes, I meant programs outside postgres.

--

Re: Updated backup APIs for non-exclusive backups

From
Craig Ringer
Date:
On 23 March 2016 at 00:14, Magnus Hagander <magnus@hagander.net> wrote:
 
Doing it in the backup label file is obviously a different target, where we might need to consider backwards compatibility, Should we?

As part of the failover slots a few folks at 2ndQ looked into whether tools would get upset by the addition of new label file entries. Everything we found used regexps to find what they wanted and didn't care in the slightest about new lines. So I wouldn't worry too much. 

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Updated backup APIs for non-exclusive backups

From
David Steele
Date:
On 3/22/16 12:31 PM, Magnus Hagander wrote:

> On Tue, Mar 22, 2016 at 5:27 PM, David Steele <david@pgmasters.net
> <mailto:david@pgmasters.net>> wrote:>
>     > Adding the stop time column should be a simple addition and I don't see
>     > a problem with that. I think I misunderstood your original request on
>     > that. Because you are just talking about returning a timestamptz with
>     > the "right now" value for when you called pg_stop_backup()? Or to be
>     > specific, just before pg_Stop_backup *finished*. Or do you mean when
>     > pg_stop_backup() started?
>
>     What would be ideal is the minimum time that could be used for PITR.  In
>     an exclusive backup that's the time the end-of-backup record is written
>     to WAL.  In a non-exlusive backup I'm not quite sure how that works.

I guess I was hoping that you would know.  I fine with just getting the 
current timestamp as is currently done in do_pg_stop_backup().  It's not 
perfect but it will be pretty close.

I thought some more about putting STOP_WAL_LOCATION into the backup 
label and I think this is an important step.  Without that the recovery 
process needs to use pg_control to determine when the database has reach 
consistency and that will only work if pg_control was copied last.

In summary, I think the patch should be updated to include stop_time as 
a column and add STOP_WAL_LOCATION and STOP_TIME to backup_label.  The 
recovery process should read STOP_WAL_LOCATION from backup_label and use 
that to decide when the database has become consistent.

Thanks,
-- 
-David
david@pgmasters.net



Re: Updated backup APIs for non-exclusive backups

From
Magnus Hagander
Date:


On Tue, Mar 29, 2016 at 4:36 PM, David Steele <david@pgmasters.net> wrote:
On 3/22/16 12:31 PM, Magnus Hagander wrote:

On Tue, Mar 22, 2016 at 5:27 PM, David Steele <david@pgmasters.net
<mailto:david@pgmasters.net>> wrote:
>
    > Adding the stop time column should be a simple addition and I don't see
    > a problem with that. I think I misunderstood your original request on
    > that. Because you are just talking about returning a timestamptz with
    > the "right now" value for when you called pg_stop_backup()? Or to be
    > specific, just before pg_Stop_backup *finished*. Or do you mean when
    > pg_stop_backup() started?

    What would be ideal is the minimum time that could be used for PITR.  In
    an exclusive backup that's the time the end-of-backup record is written
    to WAL.  In a non-exlusive backup I'm not quite sure how that works.

I guess I was hoping that you would know.  I fine with just getting the current timestamp as is currently done in do_pg_stop_backup().  It's not perfect but it will be pretty close.

I thought some more about putting STOP_WAL_LOCATION into the backup label and I think this is an important step.  Without that the recovery process needs to use pg_control to determine when the database has reach consistency and that will only work if pg_control was copied last.

In summary, I think the patch should be updated to include stop_time as a column and add STOP_WAL_LOCATION and STOP_TIME to backup_label.  The recovery process should read STOP_WAL_LOCATION from backup_label and use that to decide when the database has become consistent.

Is that the only reason we need pg_control to be copied last? I thought there were other reasons for that.

I was chatting with Stephen about this earlier, and one thing we considered was that maybe we should return pg_control as a bytea field from pg_stop_backup() thereby strongly hinting to tools that they should write it out from there rather than copy it from the data directory (we can't stop them from copying it from the data directory like we do for backup_label of course, but if we return the contents and document that's how it should be done, it's pretty clear).

But if we can actually remove the whole requirement on the order of the copy of pg_control by doing that it certainly seems worthwhile.


So - I can definitely see the argument for returning the stop wal *location*. But I'm still not sure what the definition of the time would be? We can't return it before we know what it means...


--

Re: Updated backup APIs for non-exclusive backups

From
Magnus Hagander
Date:
On Tue, Mar 29, 2016 at 6:40 PM, Magnus Hagander <magnus@hagander.net> wrote:


On Tue, Mar 29, 2016 at 4:36 PM, David Steele <david@pgmasters.net> wrote:
On 3/22/16 12:31 PM, Magnus Hagander wrote:

On Tue, Mar 22, 2016 at 5:27 PM, David Steele <david@pgmasters.net
<mailto:david@pgmasters.net>> wrote:
>
    > Adding the stop time column should be a simple addition and I don't see
    > a problem with that. I think I misunderstood your original request on
    > that. Because you are just talking about returning a timestamptz with
    > the "right now" value for when you called pg_stop_backup()? Or to be
    > specific, just before pg_Stop_backup *finished*. Or do you mean when
    > pg_stop_backup() started?

    What would be ideal is the minimum time that could be used for PITR.  In
    an exclusive backup that's the time the end-of-backup record is written
    to WAL.  In a non-exlusive backup I'm not quite sure how that works.

I guess I was hoping that you would know.  I fine with just getting the current timestamp as is currently done in do_pg_stop_backup().  It's not perfect but it will be pretty close.

I thought some more about putting STOP_WAL_LOCATION into the backup label and I think this is an important step.  Without that the recovery process needs to use pg_control to determine when the database has reach consistency and that will only work if pg_control was copied last.

In summary, I think the patch should be updated to include stop_time as a column and add STOP_WAL_LOCATION and STOP_TIME to backup_label.  The recovery process should read STOP_WAL_LOCATION from backup_label and use that to decide when the database has become consistent.

Is that the only reason we need pg_control to be copied last? I thought there were other reasons for that.

I was chatting with Stephen about this earlier, and one thing we considered was that maybe we should return pg_control as a bytea field from pg_stop_backup() thereby strongly hinting to tools that they should write it out from there rather than copy it from the data directory (we can't stop them from copying it from the data directory like we do for backup_label of course, but if we return the contents and document that's how it should be done, it's pretty clear).

But if we can actually remove the whole requirement on the order of the copy of pg_control by doing that it certainly seems worthwhile.


So - I can definitely see the argument for returning the stop wal *location*. But I'm still not sure what the definition of the time would be? We can't return it before we know what it means...


I had a chat with Heikki, and here's another suggestion:

1. We don't touch the current exclusive backups at all, as previously discussed, other than deprecating their use. For backwards compat.

2. For new backups, we return the contents of pg_control as a bytea from pg_stop_backup(). We tell backup programs they are supposed to write this out as pg_control.backup, *not* as pg_control.

3a. On recovery, if it's an exlcusive backup, we do as we did before.

3b. on recovery, in non-exclusive backups (determined from backup_label), we check that pg_control.backup exists *and* that pg_control does *not* exist. That guards us reasonably against backup programs that do the wrong thing, and we know we get the correct version of pg_control.

4. (we can still add the stop location to the backup_label file in case backup programs find it useful, but we don't use it in recovery)

Thoughts about this approach?

--

Re: Updated backup APIs for non-exclusive backups

From
David Steele
Date:
On 3/29/16 2:09 PM, Magnus Hagander wrote:

> I had a chat with Heikki, and here's another suggestion:
> 
> 1. We don't touch the current exclusive backups at all, as previously
> discussed, other than deprecating their use. For backwards compat.
> 
> 2. For new backups, we return the contents of pg_control as a bytea from
> pg_stop_backup(). We tell backup programs they are supposed to write
> this out as pg_control.backup, *not* as pg_control.
> 
> 3a. On recovery, if it's an exclusive backup, we do as we did before.
> 
> 3b. on recovery, in non-exclusive backups (determined from
> backup_label), we check that pg_control.backup exists *and* that
> pg_control does *not* exist. That guards us reasonably against backup
> programs that do the wrong thing, and we know we get the correct version
> of pg_control.
> 
> 4. (we can still add the stop location to the backup_label file in case
> backup programs find it useful, but we don't use it in recovery)
> 
> Thoughts about this approach?

This certainly looks like it would work but it raises the barrier for
implementing backups by quite a lot.  It's fine for backrest or barman
but it won't be pleasant for anyone who has home-grown scripts.

-- 
-David
david@pgmasters.net



Re: Updated backup APIs for non-exclusive backups

From
Amit Kapila
Date:
On Tue, Mar 29, 2016 at 11:39 PM, Magnus Hagander <magnus@hagander.net> wrote:
On Tue, Mar 29, 2016 at 6:40 PM, Magnus Hagander <magnus@hagander.net> wrote:
So - I can definitely see the argument for returning the stop wal *location*. But I'm still not sure what the definition of the time would be? We can't return it before we know what it means...


I had a chat with Heikki, and here's another suggestion:

1. We don't touch the current exclusive backups at all, as previously discussed, other than deprecating their use. For backwards compat.

2. For new backups, we return the contents of pg_control as a bytea from pg_stop_backup(). We tell backup programs they are supposed to write this out as pg_control.backup, *not* as pg_control.

3a. On recovery, if it's an exlcusive backup, we do as we did before.

3b. on recovery, in non-exclusive backups (determined from backup_label), we check that pg_control.backup exists *and* that pg_control does *not* exist.

Currently pg_control has been read before backup_label file, so as per this proposal do you want to change that?  If yes, I think that will make this patch more invasive with respect to handling of failure modes.  Also as David points out, I also feel that it will raise the bar for usage of this API.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Updated backup APIs for non-exclusive backups

From
Magnus Hagander
Date:


On Wed, Mar 30, 2016 at 4:10 AM, David Steele <david@pgmasters.net> wrote:
On 3/29/16 2:09 PM, Magnus Hagander wrote:

> I had a chat with Heikki, and here's another suggestion:
>
> 1. We don't touch the current exclusive backups at all, as previously
> discussed, other than deprecating their use. For backwards compat.
>
> 2. For new backups, we return the contents of pg_control as a bytea from
> pg_stop_backup(). We tell backup programs they are supposed to write
> this out as pg_control.backup, *not* as pg_control.
>
> 3a. On recovery, if it's an exclusive backup, we do as we did before.
>
> 3b. on recovery, in non-exclusive backups (determined from
> backup_label), we check that pg_control.backup exists *and* that
> pg_control does *not* exist. That guards us reasonably against backup
> programs that do the wrong thing, and we know we get the correct version
> of pg_control.
>
> 4. (we can still add the stop location to the backup_label file in case
> backup programs find it useful, but we don't use it in recovery)
>
> Thoughts about this approach?

This certainly looks like it would work but it raises the barrier for
implementing backups by quite a lot.  It's fine for backrest or barman
but it won't be pleasant for anyone who has home-grown scripts.


How much does it really raise the bar, though?

It would go from "copy all files and make damn sure you copy pg_control last, and rename it to pg_control.backup" to "take this binary blob you got from the server and write it to pg_control.backup"?

Also, the target of these APIs is specifically the backup tools and not homewritten scripts. A simple shellscript will have trouble enough using it in the first place since it requires a persistent connection to the database. But those scripts are likely broken anyway.

You can of course keep the current requirements which is just "copy pg_control last", but if we do that we have zero way of checking that that happened, and you may end up with subtly broken restores if the backup software gets it wrong. (Of course it can get the rename/writeout thing wrong as well, but that's going to be a lot more obvious if you're doing it wrong).

The main reason for Heikki to suggest this one over the other basic one is that it brings protection against the "backup script/program crashed halfway through but the user still tried to restore from that".They will outright fail becuase there is no pg_control.backup in that case. If we don't care about that, then we can go back to just saying "copy pg_control last and we're done". But you yourself complained about that requirement because it's too easy to get wrong (though you advocated using backup_label to transfer the data over -- but that has the potential for getting more complicated if we now or at any point in the future want more than one field to transfer, for example).


--

Re: Updated backup APIs for non-exclusive backups

From
Magnus Hagander
Date:


On Wed, Mar 30, 2016 at 10:09 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Mar 29, 2016 at 11:39 PM, Magnus Hagander <magnus@hagander.net> wrote:
On Tue, Mar 29, 2016 at 6:40 PM, Magnus Hagander <magnus@hagander.net> wrote:
So - I can definitely see the argument for returning the stop wal *location*. But I'm still not sure what the definition of the time would be? We can't return it before we know what it means...


I had a chat with Heikki, and here's another suggestion:

1. We don't touch the current exclusive backups at all, as previously discussed, other than deprecating their use. For backwards compat.

2. For new backups, we return the contents of pg_control as a bytea from pg_stop_backup(). We tell backup programs they are supposed to write this out as pg_control.backup, *not* as pg_control.

3a. On recovery, if it's an exlcusive backup, we do as we did before.

3b. on recovery, in non-exclusive backups (determined from backup_label), we check that pg_control.backup exists *and* that pg_control does *not* exist.

Currently pg_control has been read before backup_label file, so as per this proposal do you want to change that?  If yes, I think that will make this patch more invasive with respect to handling of failure modes.  Also as David points out, I also feel that it will raise the bar for usage of this API.

Yes, we'd have to change that. I don't think it's going to be much more invasive than reading part of it from pg_control and part of it from backup_label, as suggested by David. It would be a bit more complicated than what we have today - but it would move complication from user scripts (that are likely to get it wrong) to a central place in the backend (where we can be more certain that it's at least less wrong). 


--

Re: Updated backup APIs for non-exclusive backups

From
David Steele
Date:
On 3/30/16 4:18 AM, Magnus Hagander wrote:
> 
> On Wed, Mar 30, 2016 at 4:10 AM, David Steele <david@pgmasters.net
> <mailto:david@pgmasters.net>> wrote:
> 
>     This certainly looks like it would work but it raises the barrier for
>     implementing backups by quite a lot.  It's fine for backrest or barman
>     but it won't be pleasant for anyone who has home-grown scripts.
> 
> 
> How much does it really raise the bar, though?
> 
> It would go from "copy all files and make damn sure you copy pg_control
> last, and rename it to pg_control.backup" to "take this binary blob you
> got from the server and write it to pg_control.backup"?
> 
> Also, the target of these APIs is specifically the backup tools and not
> homewritten scripts.

Then what would home-grown scripts use, the deprecated API that we know
has issues?

> A simple shellscript will have trouble enough using
> it in the first place since it requires a persistent connection to the
> database. 

All that's required is to spawn a psql process.  I'm no shell expert but
that's simple enough.

> But those scripts are likely broken anyway.

Yes, probably.  Backup and especially archiving correctly are harder
than most people realize.

> <...>
> 
> The main reason for Heikki to suggest this one over the other basic one
> is that it brings protection against the "backup script/program crashed
> halfway through but the user still tried to restore from that". They will
> outright fail because there is no pg_control.backup in that case.

But if we are going to make this complicated I'm not sure it's a big
deal just to require pg_control to be copied last.  pgBackRest already
does that and Barman probably does, too.

I don't see "don't copy pg_control" and "copy pg_control last" as all
that different in terms of complexity.

pgBackRest also *restores* pg_control last which I think is pretty
important and would not be addressed by this solution.

> If we
> don't care about that, then we can go back to just saying "copy
> pg_control last and we're done". But you yourself complained about that
> requirement because it's too easy to get wrong (though you advocated
> using backup_label to transfer the data over -- but that has the
> potential for getting more complicated if we now or at any point in the
> future want more than one field to transfer, for example).

Perhaps.  I'm still not convinced that getting some information from
backup_label and other information from pg_control is a good solution.
I would rather write the recovery point into the backup_label and use
that instead of the value in pg_control.  Text files are much easier to
parse and push around accurately (and test).

-- 
-David
david@pgmasters.net



Re: Updated backup APIs for non-exclusive backups

From
Magnus Hagander
Date:
On Thu, Mar 31, 2016 at 4:00 AM, David Steele <david@pgmasters.net> wrote:
On 3/30/16 4:18 AM, Magnus Hagander wrote:
>
> On Wed, Mar 30, 2016 at 4:10 AM, David Steele <david@pgmasters.net
> <mailto:david@pgmasters.net>> wrote:
>
>     This certainly looks like it would work but it raises the barrier for
>     implementing backups by quite a lot.  It's fine for backrest or barman
>     but it won't be pleasant for anyone who has home-grown scripts.
>
>
> How much does it really raise the bar, though?
>
> It would go from "copy all files and make damn sure you copy pg_control
> last, and rename it to pg_control.backup" to "take this binary blob you
> got from the server and write it to pg_control.backup"?
>
> Also, the target of these APIs is specifically the backup tools and not
> homewritten scripts.

Then what would home-grown scripts use, the deprecated API that we know
has issues?

They should use either pg_basebackup/pg_receivexlog, or they should use a tool like backrest or barman.

 
> A simple shellscript will have trouble enough using
> it in the first place since it requires a persistent connection to the
> database.

All that's required is to spawn a psql process.  I'm no shell expert but
that's simple enough.

Oh, it's certainly doable, but you also need to come back and talk to it at a later time (to call stop backup), and do something useful with a multiline output. None of that is particularly easy. Certainly doable, but it becomes the wrong tool for the job.

 
> But those scripts are likely broken anyway.

Yes, probably.  Backup and especially archiving correctly are harder
than most people realize.

Exactly. Which is why we should discourage people from doing that, and instead point them to the tools that do the job right. This is the whole reason we're making this change in the first place.

 
> The main reason for Heikki to suggest this one over the other basic one
> is that it brings protection against the "backup script/program crashed
> halfway through but the user still tried to restore from that". They will
> outright fail because there is no pg_control.backup in that case.

But if we are going to make this complicated I'm not sure it's a big
deal just to require pg_control to be copied last.  pgBackRest already
does that and Barman probably does, too.

It does (I did doublecheck that at some point).

 
I don't see "don't copy pg_control" and "copy pg_control last" as all
that different in terms of complexity.

pgBackRest also *restores* pg_control last which I think is pretty
important and would not be addressed by this solution.

So maybe we should at least start that way. And just document that clearly, and then use the patch that we have right now?

Except we add so it still returns the stoppoint() as well (which is returned from the current pg_stop_backup, but not in the new one).

We can always extend the function with more columns later if we need - it's changing the ones we have that's a big problem there :)

Then we start this way and we can keep improving if necessary. But the advantage of sticking to this for now is we don't have to touch the recovery side at all. And we're pretty late in the cycle ATM.

 
> If we
> don't care about that, then we can go back to just saying "copy
> pg_control last and we're done". But you yourself complained about that
> requirement because it's too easy to get wrong (though you advocated
> using backup_label to transfer the data over -- but that has the
> potential for getting more complicated if we now or at any point in the
> future want more than one field to transfer, for example).

Perhaps.  I'm still not convinced that getting some information from
backup_label and other information from pg_control is a good solution.
I would rather write the recovery point into the backup_label and use
that instead of the value in pg_control.  Text files are much easier to
parse and push around accurately (and test).

What about all the *other* functionality that's then in pg_control?

We could do as Heikki at one point suggested in our chat as well which is basically write all of pg_control out to the backup_label file, and reconstruct it from that. But that makes that file a lot more complicated...

//Magnus

Re: Updated backup APIs for non-exclusive backups

From
Magnus Hagander
Date:
On Thu, Mar 31, 2016 at 2:19 PM, Magnus Hagander <magnus@hagander.net> wrote:
On Thu, Mar 31, 2016 at 4:00 AM, David Steele <david@pgmasters.net> wrote:
On 3/30/16 4:18 AM, Magnus Hagander wrote:
>
> On Wed, Mar 30, 2016 at 4:10 AM, David Steele <david@pgmasters.net
> <mailto:david@pgmasters.net>> wrote:
>
>     This certainly looks like it would work but it raises the barrier for
>     implementing backups by quite a lot.  It's fine for backrest or barman
>     but it won't be pleasant for anyone who has home-grown scripts.
>
>
> How much does it really raise the bar, though?
>
> It would go from "copy all files and make damn sure you copy pg_control
> last, and rename it to pg_control.backup" to "take this binary blob you
> got from the server and write it to pg_control.backup"?
>
> Also, the target of these APIs is specifically the backup tools and not
> homewritten scripts.

Then what would home-grown scripts use, the deprecated API that we know
has issues?

They should use either pg_basebackup/pg_receivexlog, or they should use a tool like backrest or barman.

 
> A simple shellscript will have trouble enough using
> it in the first place since it requires a persistent connection to the
> database.

All that's required is to spawn a psql process.  I'm no shell expert but
that's simple enough.

Oh, it's certainly doable, but you also need to come back and talk to it at a later time (to call stop backup), and do something useful with a multiline output. None of that is particularly easy. Certainly doable, but it becomes the wrong tool for the job.

 
> But those scripts are likely broken anyway.

Yes, probably.  Backup and especially archiving correctly are harder
than most people realize.

Exactly. Which is why we should discourage people from doing that, and instead point them to the tools that do the job right. This is the whole reason we're making this change in the first place.

 
> The main reason for Heikki to suggest this one over the other basic one
> is that it brings protection against the "backup script/program crashed
> halfway through but the user still tried to restore from that". They will
> outright fail because there is no pg_control.backup in that case.

But if we are going to make this complicated I'm not sure it's a big
deal just to require pg_control to be copied last.  pgBackRest already
does that and Barman probably does, too.

It does (I did doublecheck that at some point).

 
I don't see "don't copy pg_control" and "copy pg_control last" as all
that different in terms of complexity.

pgBackRest also *restores* pg_control last which I think is pretty
important and would not be addressed by this solution.

So maybe we should at least start that way. And just document that clearly, and then use the patch that we have right now?

Except we add so it still returns the stoppoint() as well (which is returned from the current pg_stop_backup, but not in the new one).

Eh, nevermind. We do already return the stoppoint. I was looking at the wrong version of the patch. So basically that's current version of patch, but with proper docs of course.

--

Re: Updated backup APIs for non-exclusive backups

From
Amit Kapila
Date:
On Thu, Mar 31, 2016 at 5:49 PM, Magnus Hagander <magnus@hagander.net> wrote:
On Thu, Mar 31, 2016 at 4:00 AM, David Steele <david@pgmasters.net> wrote:
So maybe we should at least start that way. And just document that clearly, and then use the patch that we have right now?

Except we add so it still returns the stoppoint() as well (which is returned from the current pg_stop_backup, but not in the new one).

We can always extend the function with more columns later if we need - it's changing the ones we have that's a big problem there :)


+1 for doing that way for now.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Updated backup APIs for non-exclusive backups

From
Amit Kapila
Date:
On Sat, Mar 19, 2016 at 5:45 PM, Magnus Hagander <magnus@hagander.net> wrote:
On Wed, Mar 2, 2016 at 6:49 PM, Marco Nenciarini <marco.nenciarini@2ndquadrant.it> wrote:

I've attached an updated patch, which is rebased on current master and includes the oid fix.



+       <entry>Finish performing exclusive on-line backup (restricted to superusers or replication roles)</entry>

+      </row>

+      <row>

+       <entry>

+        <literal><function>pg_stop_backup(<parameter>exclusive</> <type>boolean</>)</function></literal>

+        </entry>

+       <entry><type>setof record</type></entry>

+       <entry>Finish performing exclusive or non-exclusive on-line backup (restricted to superusers or replication roles)</entry>



Isn't it better to indicate that user needs to form a backup_label and tablespace_map file from the output of this API and those needs to be dropped into data directory?

Also, I think below part of documentation for pg_start_backup() needs to be modified:

<para>

    <function>pg_start_backup</> accepts an

    arbitrary user-defined label for the backup.  (Typically this would be

    the name under which the backup dump file will be stored.)  The function

    writes a backup label file (<filename>backup_label</>) and, if there

    are any links in the <filename>pg_tblspc/</> directory, a tablespace map

    file (<filename>tablespace_map</>) into the database cluster's data

    directory, performs a checkpoint, and then returns the backup's starting

    transaction log location as text.  The user can ignore this result value,

    but it is provided in case it is useful.



Similarly, there is a description for pg_stop_backup which needs to be modified.


CREATE OR REPLACE FUNCTION

-  pg_start_backup(label text, fast boolean DEFAULT false)

+  pg_start_backup(label text, fast boolean DEFAULT false, exclusive boolean DEFAULT true)

   RETURNS pg_lsn STRICT VOLATILE LANGUAGE internal AS 'pg_start_backup';



One thing, that might be slightly inconvenient for user is if he or she wants to use this API for non-exclusive backups then, they need to pass the value of second parameter as well which doesn't seem to be a big issue.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Updated backup APIs for non-exclusive backups

From
Magnus Hagander
Date:
On Fri, Apr 1, 2016 at 6:47 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Mar 19, 2016 at 5:45 PM, Magnus Hagander <magnus@hagander.net> wrote:
On Wed, Mar 2, 2016 at 6:49 PM, Marco Nenciarini <marco.nenciarini@2ndquadrant.it> wrote:

I've attached an updated patch, which is rebased on current master and includes the oid fix.



+       <entry>Finish performing exclusive on-line backup (restricted to superusers or replication roles)</entry>

+      </row>

+      <row>

+       <entry>

+        <literal><function>pg_stop_backup(<parameter>exclusive</> <type>boolean</>)</function></literal>

+        </entry>

+       <entry><type>setof record</type></entry>

+       <entry>Finish performing exclusive or non-exclusive on-line backup (restricted to superusers or replication roles)</entry>



Isn't it better to indicate that user needs to form a backup_label and tablespace_map file from the output of this API and those needs to be dropped into data directory?

I think that documentation should go in the "usage" part of the documentation, and not the reference to the function itself.

This is the documentation that is not written yet of course. I was planinng to wait for Bruce to finish his restructuring of the backup documentation in general, but latest news on that was that it's several months away, so I am going to go ahead and write it anyway, without waiting for that (or possibly do the restructure at hte same time). That's where the process of how to use these functions belong.
 
Also, I think below part of documentation for pg_start_backup() needs to be modified:

<para>

    <function>pg_start_backup</> accepts an

    arbitrary user-defined label for the backup.  (Typically this would be

    the name under which the backup dump file will be stored.)  The function

    writes a backup label file (<filename>backup_label</>) and, if there

    are any links in the <filename>pg_tblspc/</> directory, a tablespace map

    file (<filename>tablespace_map</>) into the database cluster's data

    directory, performs a checkpoint, and then returns the backup's starting

    transaction log location as text.  The user can ignore this result value,

    but it is provided in case it is useful.


That one definitely needs to be fixed, as it's part of the reference. Good spot.

 
Similarly, there is a description for pg_stop_backup which needs to be modified.

That's the one you're referring to in your first commend above, is it not? Or is there one more that you mean?

 

CREATE OR REPLACE FUNCTION

-  pg_start_backup(label text, fast boolean DEFAULT false)

+  pg_start_backup(label text, fast boolean DEFAULT false, exclusive boolean DEFAULT true)

   RETURNS pg_lsn STRICT VOLATILE LANGUAGE internal AS 'pg_start_backup';



One thing, that might be slightly inconvenient for user is if he or she wants to use this API for non-exclusive backups then, they need to pass the value of second parameter as well which doesn't seem to be a big issue.

Well, they have to pass it *somehow*. The other option would be to have a different function, which I think doesn't help at all. And we do *not* want the behaviour of existing scripts to implicitly change, so we can't have the default be non-exclusive.

--

Re: Updated backup APIs for non-exclusive backups

From
David Steele
Date:
On 3/19/16 8:15 AM, Magnus Hagander wrote:

> I've attached an updated patch, which is rebased on current master and
> includes the oid fix.

I've now had a chance to go through this in detail and test thoroughly.  We had a lot of discussion about copying
pg_controllast and that led 
 
me to believe that there might be some sort of race condition with 
multiple backups running concurrently.

I tried every trick I could think of and everything worked as expected 
so as far as I can see this requirement only applies to backups taken 
from a standby which agrees with the comments in do_pg_stop_backup().

Note that I only tested backups against a master database since that's 
what is modified by this patch.  I'm still not entirely comfortable with 
how backups against a standby are done and it that case still think it 
would be best to write the stop wal location into backup_label, but as 
you said before that would be a completely separate patch.

So in the end I'm fine with the code as it stands with the exception of 
the pg_stop_backup2() naming.  I'd prefer a more verbose name here but 
I'm unable to come up with anything very good.  How about 
pg_stop_backup_v2()?  This function could also use a few more comments, 
at least at the top of the exclusive and non-exclusive blocks.

Except for those minor details I believe this is "ready for committer" 
unless anybody has an objection.

-- 
-David
david@pgmasters.net



Re: Updated backup APIs for non-exclusive backups

From
Amit Kapila
Date:
On Mon, Apr 4, 2016 at 4:31 PM, Magnus Hagander <magnus@hagander.net> wrote:
On Fri, Apr 1, 2016 at 6:47 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
 
Also, I think below part of documentation for pg_start_backup() needs to be modified:

<para>

    <function>pg_start_backup</> accepts an

    arbitrary user-defined label for the backup.  (Typically this would be

    the name under which the backup dump file will be stored.)  The function

    writes a backup label file (<filename>backup_label</>) and, if there

    are any links in the <filename>pg_tblspc/</> directory, a tablespace map

    file (<filename>tablespace_map</>) into the database cluster's data

    directory, performs a checkpoint, and then returns the backup's starting

    transaction log location as text.  The user can ignore this result value,

    but it is provided in case it is useful.


That one definitely needs to be fixed, as it's part of the reference. Good spot.

 
Similarly, there is a description for pg_stop_backup which needs to be modified.

That's the one you're referring to in your first commend above, is it not? Or is there one more that you mean?


I am referring to below part of docs in func.sgml

<para>

    <function>pg_stop_backup</> removes the label file and, if it exists,

    the <filename>tablespace_map</> file created by

    <function>pg_start_backup</>, and creates a backup history file in

    the transaction log archive area.  The history file includes the label given to

    <function>pg_start_backup</>, the starting and ending transaction log locations for

    the backup, and the starting and ending times of the backup.  The return

    value is the backup's ending transaction log location (which again

    can be ignored).  After recording the ending location, the current

    transaction log insertion

    point is automatically advanced to the next transaction log file, so that the

    ending transaction log file can be archived immediately to complete the backup.

 </para>




With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Updated backup APIs for non-exclusive backups

From
Magnus Hagander
Date:


On Mon, Apr 4, 2016 at 3:15 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Apr 4, 2016 at 4:31 PM, Magnus Hagander <magnus@hagander.net> wrote:
On Fri, Apr 1, 2016 at 6:47 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
 
Also, I think below part of documentation for pg_start_backup() needs to be modified:

<para>

    <function>pg_start_backup</> accepts an

    arbitrary user-defined label for the backup.  (Typically this would be

    the name under which the backup dump file will be stored.)  The function

    writes a backup label file (<filename>backup_label</>) and, if there

    are any links in the <filename>pg_tblspc/</> directory, a tablespace map

    file (<filename>tablespace_map</>) into the database cluster's data

    directory, performs a checkpoint, and then returns the backup's starting

    transaction log location as text.  The user can ignore this result value,

    but it is provided in case it is useful.


That one definitely needs to be fixed, as it's part of the reference. Good spot.

 
Similarly, there is a description for pg_stop_backup which needs to be modified.

That's the one you're referring to in your first commend above, is it not? Or is there one more that you mean?


I am referring to below part of docs in func.sgml

<para>

    <function>pg_stop_backup</> removes the label file and, if it exists,

    the <filename>tablespace_map</> file created by

    <function>pg_start_backup</>, and creates a backup history file in

    the transaction log archive area.  The history file includes the label given to

    <function>pg_start_backup</>, the starting and ending transaction log locations for

    the backup, and the starting and ending times of the backup.  The return

    value is the backup's ending transaction log location (which again

    can be ignored).  After recording the ending location, the current

    transaction log insertion

    point is automatically advanced to the next transaction log file, so that the

    ending transaction log file can be archived immediately to complete the backup.

 </para>



Ah, gotcha. Clearly I wasn't paying attention properly.

PFA a better one (I think), also with the rename and added comments that David was asking for.

Barring objections, I will apply this version.
 
--
Attachment

Re: Updated backup APIs for non-exclusive backups

From
David Steele
Date:
On 4/5/16 8:05 AM, Magnus Hagander wrote:

> PFA a better one (I think), also with the rename and added comments that
> David was asking for.
>
> Barring objections, I will apply this version.

This version looks good to me.

-- 
-David
david@pgmasters.net



Re: Updated backup APIs for non-exclusive backups

From
Amit Kapila
Date:
On Tue, Apr 5, 2016 at 5:35 PM, Magnus Hagander <magnus@hagander.net> wrote:


On Mon, Apr 4, 2016 at 3:15 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Apr 4, 2016 at 4:31 PM, Magnus Hagander <magnus@hagander.net> wrote:
On Fri, Apr 1, 2016 at 6:47 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
 
Also, I think below part of documentation for pg_start_backup() needs to be modified:

<para>

    <function>pg_start_backup</> accepts an

    arbitrary user-defined label for the backup.  (Typically this would be

    the name under which the backup dump file will be stored.)  The function

    writes a backup label file (<filename>backup_label</>) and, if there

    are any links in the <filename>pg_tblspc/</> directory, a tablespace map

    file (<filename>tablespace_map</>) into the database cluster's data

    directory, performs a checkpoint, and then returns the backup's starting

    transaction log location as text.  The user can ignore this result value,

    but it is provided in case it is useful.


That one definitely needs to be fixed, as it's part of the reference. Good spot.

 
Similarly, there is a description for pg_stop_backup which needs to be modified.

That's the one you're referring to in your first commend above, is it not? Or is there one more that you mean?


I am referring to below part of docs in func.sgml

<para>

    <function>pg_stop_backup</> removes the label file and, if it exists,

    the <filename>tablespace_map</> file created by

    <function>pg_start_backup</>, and creates a backup history file in

    the transaction log archive area.  The history file includes the label given to

    <function>pg_start_backup</>, the starting and ending transaction log locations for

    the backup, and the starting and ending times of the backup.  The return

    value is the backup's ending transaction log location (which again

    can be ignored).  After recording the ending location, the current

    transaction log insertion

    point is automatically advanced to the next transaction log file, so that the

    ending transaction log file can be archived immediately to complete the backup.

 </para>



Ah, gotcha. Clearly I wasn't paying attention properly.

PFA a better one (I think), also with the rename and added comments that David was asking for.


This version looks good.  Apart from review, I have done some tests (including regression test), everything seems to be fine.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Updated backup APIs for non-exclusive backups

From
Magnus Hagander
Date:


On Tue, Apr 5, 2016 at 5:57 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Apr 5, 2016 at 5:35 PM, Magnus Hagander <magnus@hagander.net> wrote:


On Mon, Apr 4, 2016 at 3:15 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Apr 4, 2016 at 4:31 PM, Magnus Hagander <magnus@hagander.net> wrote:
On Fri, Apr 1, 2016 at 6:47 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
 
Also, I think below part of documentation for pg_start_backup() needs to be modified:

<para>

    <function>pg_start_backup</> accepts an

    arbitrary user-defined label for the backup.  (Typically this would be

    the name under which the backup dump file will be stored.)  The function

    writes a backup label file (<filename>backup_label</>) and, if there

    are any links in the <filename>pg_tblspc/</> directory, a tablespace map

    file (<filename>tablespace_map</>) into the database cluster's data

    directory, performs a checkpoint, and then returns the backup's starting

    transaction log location as text.  The user can ignore this result value,

    but it is provided in case it is useful.


That one definitely needs to be fixed, as it's part of the reference. Good spot.

 
Similarly, there is a description for pg_stop_backup which needs to be modified.

That's the one you're referring to in your first commend above, is it not? Or is there one more that you mean?


I am referring to below part of docs in func.sgml

<para>

    <function>pg_stop_backup</> removes the label file and, if it exists,

    the <filename>tablespace_map</> file created by

    <function>pg_start_backup</>, and creates a backup history file in

    the transaction log archive area.  The history file includes the label given to

    <function>pg_start_backup</>, the starting and ending transaction log locations for

    the backup, and the starting and ending times of the backup.  The return

    value is the backup's ending transaction log location (which again

    can be ignored).  After recording the ending location, the current

    transaction log insertion

    point is automatically advanced to the next transaction log file, so that the

    ending transaction log file can be archived immediately to complete the backup.

 </para>



Ah, gotcha. Clearly I wasn't paying attention properly.

PFA a better one (I think), also with the rename and added comments that David was asking for.


This version looks good.  Apart from review, I have done some tests (including regression test), everything seems to be fine.


I've pushed this version, and also added the item from the Brussels developer meeting to actually rewrite the main backup docs to the open items so they are definitely not forgotten for 9.6. 

--

Re: Updated backup APIs for non-exclusive backups

From
Noah Misch
Date:
On Tue, Apr 05, 2016 at 08:15:16PM +0200, Magnus Hagander wrote:
> I've pushed this version, and also added the item from the Brussels
> developer meeting to actually rewrite the main backup docs to the open
> items so they are definitely not forgotten for 9.6.

Here's that PostgreSQL 9.6 open item:

* Update of backup documentation (assigned to Bruce at
[https://wiki.postgresql.org/wiki/FOSDEM/PGDay_2016_Developer_MeetingBrussels Developer Meeting], but others are surely
allowedto work on it as well)
 
** Made required by 7117685461af50f50c03f43e6a622284c8d54694 since the current documentation is now incorrect

Unless Bruce endorsed this development strategy, I think it unfair for commit
7117685 to impose a deadline on his backup.sgml project.  Did commit 7117685
indeed make the documentation "incorrect?"  Coverage in the Backup and Restore
chapter would be a good thing, but I don't think that gap alone makes 7117685,
with its reference-only documentation, incorrect.  Did 7117685 impair
documentation in any other respect?


Incidentally, I'm not clear on the extent of Bruce's plans to change backup
documentation.  Relevant meeting note fragment:
 Magnus: We need a new robust API fornon-exclusive backups Simon: Keep but deprecate the existing API. Need to find a
betterway to ensure users have the required xlog in backups Craig: Our docs are in the wrong order. pg_basebackup
shouldbe first, ahead of manual methods. Action point: Re-arrange backup docs page (Bruce)
 

The chapter already does describe pg_basebackup before describing
pg_start_backup; what else did the plan entail?

Thanks,
nm



Re: Updated backup APIs for non-exclusive backups

From
Craig Ringer
Date:
On 6 April 2016 at 12:42, Noah Misch <noah@leadboat.com> wrote:
 

The chapter already does describe pg_basebackup before describing
pg_start_backup; what else did the plan entail?



24.1. SQL Dump
24.1.1. Restoring the Dump
24.1.2. Using pg_dumpall
24.1.3. Handling Large Databases
24.2. File System Level Backup
24.3. Continuous Archiving and Point-in-Time Recovery (PITR)
24.3.1. Setting Up WAL Archiving
24.3.2. Making a Base Backup
24.3.3. Making a Base Backup Using the Low Level API
24.3.4. Recovering Using a Continuous Archive Backup
24.3.5. Timelines
24.3.6. Tips and Examples
24.3.7. Caveats 

"File System Level Backup" should be last, and should link to pg_basebackup as the strongly recommended method.

I'm not too keen on how pg_basebackup is only covered under continuous archiving and PITR, but OTOH that's really how it's mostly used .... and I think most people who're doing physical backups are doing continuous archiving. Or should be.

I'd also be inclined to add a hint in the sql dumps section to the effect that while there's no incremental/differential pg_dump support, you can do incrementals with continuous archiving.

In "Making a base backup" discussion of pg_basebackup should probably mention that 'pg_basebackup -X stream' is required if you want a self-contained base backup.

I think that'd address most of the confusion I've seen on occasion.

I'd love to spend several solid days reorganizing and updating those docs some day, but I think that for now moving some chunks around will still help.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Updated backup APIs for non-exclusive backups

From
Magnus Hagander
Date:


On Wed, Apr 6, 2016 at 6:42 AM, Noah Misch <noah@leadboat.com> wrote:
On Tue, Apr 05, 2016 at 08:15:16PM +0200, Magnus Hagander wrote:
> I've pushed this version, and also added the item from the Brussels
> developer meeting to actually rewrite the main backup docs to the open
> items so they are definitely not forgotten for 9.6.

Here's that PostgreSQL 9.6 open item:

* Update of backup documentation (assigned to Bruce at [https://wiki.postgresql.org/wiki/FOSDEM/PGDay_2016_Developer_Meeting Brussels Developer Meeting], but others are surely allowed to work on it as well)
** Made required by 7117685461af50f50c03f43e6a622284c8d54694 since the current documentation is now incorrect

Unless Bruce endorsed this development strategy, I think it unfair for commit
7117685 to impose a deadline on his backup.sgml project.  Did commit 7117685

I agree that's a horrible wording. What it meant to imply was a deadline for *me* to help take that off Bruce's shoulders before then while also opening the doors for others to work on it should they want. Re-reading it now I can see that's not at all what it says. I'll reword (or rather, remove most of that, since it's been discussed separately and doesn't actually need to go on the open items list which is a list and not a discussion)

(And yes, I chatted with Bruce about doing that weeks ago, so he knows well about it - just that's not what the entry says).

 
indeed make the documentation "incorrect?"  Coverage in the Backup and Restore
chapter would be a good thing, but I don't think that gap alone makes 7117685,
with its reference-only documentation, incorrect.  Did 7117685 impair
documentation in any other respect?

Probably incomplete is a better word there as well.

 
Incidentally, I'm not clear on the extent of Bruce's plans to change backup
documentation.  Relevant meeting note fragment:

  Magnus: We need a new robust API fornon-exclusive backups
  Simon: Keep but deprecate the existing API.
  Need to find a better way to ensure users have the required xlog in backups
  Craig: Our docs are in the wrong order. pg_basebackup should be first, ahead of manual methods.
  Action point: Re-arrange backup docs page (Bruce)

The chapter already does describe pg_basebackup before describing
pg_start_backup; what else did the plan entail?


Recommending people to primarily use tried and tested ways of doing backups (including a reference to a list, probably on the wiki, of known external tools that "do things the right way", such as backrest or barman) rather than focusing on examples that aren't necessarily even correct, and certainly not complete.

Mentioning the actual problems of exclusive base backups. (And obviously talk about how using pg_start_backup/pg_stop_backup now makes it possible to do "low level" backups without the risks of exclusive backups -- which is the "incomplete" part from my note.

More clearly outlining backup vs dump, and possibly (I can't actually remember if we decided the second step) put base backups before pg_dump since the topic is backups.

--

Re: Updated backup APIs for non-exclusive backups

From
Noah Misch
Date:
On Wed, Apr 06, 2016 at 09:17:22AM +0200, Magnus Hagander wrote:
> On Wed, Apr 6, 2016 at 6:42 AM, Noah Misch <noah@leadboat.com> wrote:
> > On Tue, Apr 05, 2016 at 08:15:16PM +0200, Magnus Hagander wrote:
> > > I've pushed this version, and also added the item from the Brussels
> > > developer meeting to actually rewrite the main backup docs to the open
> > > items so they are definitely not forgotten for 9.6.
> >
> > Here's that PostgreSQL 9.6 open item:
> >
> > * Update of backup documentation (assigned to Bruce at [
> > https://wiki.postgresql.org/wiki/FOSDEM/PGDay_2016_Developer_Meeting
> > Brussels Developer Meeting], but others are surely allowed to work on it as
> > well)
> > ** Made required by 7117685461af50f50c03f43e6a622284c8d54694 since the
> > current documentation is now incorrect
> >
> > Unless Bruce endorsed this development strategy, I think it unfair for
> > commit
> > 7117685 to impose a deadline on his backup.sgml project.  Did commit
> > 7117685
> >
> 
> I agree that's a horrible wording. What it meant to imply was a deadline
> for *me* to help take that off Bruce's shoulders before then while also
> opening the doors for others to work on it should they want. Re-reading it
> now I can see that's not at all what it says. I'll reword (or rather,
> remove most of that, since it's been discussed separately and doesn't
> actually need to go on the open items list which is a list and not a
> discussion)

Ahh.  Your original wording did admit your interpretation; I just didn't think
of that interpretation.

> > The chapter already does describe pg_basebackup before describing
> > pg_start_backup; what else did the plan entail?

> Recommending people to primarily use tried and tested ways of doing backups
> (including a reference to a list, probably on the wiki, of known external
> tools that "do things the right way", such as backrest or barman) rather
> than focusing on examples that aren't necessarily even correct, and
> certainly not complete.
> 
> Mentioning the actual problems of exclusive base backups. (And obviously
> talk about how using pg_start_backup/pg_stop_backup now makes it possible
> to do "low level" backups without the risks of exclusive backups -- which
> is the "incomplete" part from my note.
> 
> More clearly outlining backup vs dump, and possibly (I can't actually
> remember if we decided the second step) put base backups before pg_dump
> since the topic is backups.

Unless you especially want to self-impose the same tight resolution schedule
that 9.6 regressions experience, let's move this to the "Non-bugs" section.
Which do you prefer?  I don't think the opportunity for more documentation in
light of 7117685 constitutes a regression, and I don't want "Open Issues" to
double as a parking lot for slow-moving non-regressions.



Re: Updated backup APIs for non-exclusive backups

From
Magnus Hagander
Date:

On Thu, Apr 7, 2016 at 5:15 AM, Noah Misch <noah@leadboat.com> wrote:
On Wed, Apr 06, 2016 at 09:17:22AM +0200, Magnus Hagander wrote:
> On Wed, Apr 6, 2016 at 6:42 AM, Noah Misch <noah@leadboat.com> wrote:
> > On Tue, Apr 05, 2016 at 08:15:16PM +0200, Magnus Hagander wrote:
> > > I've pushed this version, and also added the item from the Brussels
> > > developer meeting to actually rewrite the main backup docs to the open
> > > items so they are definitely not forgotten for 9.6.
> >
> > Here's that PostgreSQL 9.6 open item:
> >
> > * Update of backup documentation (assigned to Bruce at [
> > https://wiki.postgresql.org/wiki/FOSDEM/PGDay_2016_Developer_Meeting
> > Brussels Developer Meeting], but others are surely allowed to work on it as
> > well)
> > ** Made required by 7117685461af50f50c03f43e6a622284c8d54694 since the
> > current documentation is now incorrect
> >
> > Unless Bruce endorsed this development strategy, I think it unfair for
> > commit
> > 7117685 to impose a deadline on his backup.sgml project.  Did commit
> > 7117685
> >
>
> I agree that's a horrible wording. What it meant to imply was a deadline
> for *me* to help take that off Bruce's shoulders before then while also
> opening the doors for others to work on it should they want. Re-reading it
> now I can see that's not at all what it says. I'll reword (or rather,
> remove most of that, since it's been discussed separately and doesn't
> actually need to go on the open items list which is a list and not a
> discussion)

Ahh.  Your original wording did admit your interpretation; I just didn't think
of that interpretation.

> > The chapter already does describe pg_basebackup before describing
> > pg_start_backup; what else did the plan entail?

> Recommending people to primarily use tried and tested ways of doing backups
> (including a reference to a list, probably on the wiki, of known external
> tools that "do things the right way", such as backrest or barman) rather
> than focusing on examples that aren't necessarily even correct, and
> certainly not complete.
>
> Mentioning the actual problems of exclusive base backups. (And obviously
> talk about how using pg_start_backup/pg_stop_backup now makes it possible
> to do "low level" backups without the risks of exclusive backups -- which
> is the "incomplete" part from my note.
>
> More clearly outlining backup vs dump, and possibly (I can't actually
> remember if we decided the second step) put base backups before pg_dump
> since the topic is backups.

Unless you especially want to self-impose the same tight resolution schedule
that 9.6 regressions experience, let's move this to the "Non-bugs" section.
Which do you prefer?  I don't think the opportunity for more documentation in
light of 7117685 constitutes a regression, and I don't want "Open Issues" to
double as a parking lot for slow-moving non-regressions.

Well, if we *don't* do the rewrite before we release it, then we have to instead put information about the new version of the functions into the old structure I think.

So I think it's an open issue. But maybe we should have a separate section on the open items list for documentation issues? I tihnk we've had that some times before.

--

Re: Updated backup APIs for non-exclusive backups

From
Robert Haas
Date:
On Mon, Apr 11, 2016 at 5:22 AM, Magnus Hagander <magnus@hagander.net> wrote:
> Well, if we *don't* do the rewrite before we release it, then we have to
> instead put information about the new version of the functions into the old
> structure I think.

I think that you should have done that in the patch as committed.
Maybe you could take an hour and go do that now, and then you can do
the rewrite when you get to it.

Tracking open issues and getting them resolved is a lot of work, so it
kind of frustrates me that you committed this patch knowing that it
was going to create one when, with only slightly more work, you could
have avoided that.  Perhaps that is rigid and intolerant of me, but if
I had done that for even 25% of the patches I committed this
CommitFest, the open-issues list would be a bloodbath right now.

I apologize if this sounds harsh.  I don't mean to be a jerk.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Updated backup APIs for non-exclusive backups

From
Noah Misch
Date:
On Mon, Apr 11, 2016 at 11:22:27AM +0200, Magnus Hagander wrote:
> On Thu, Apr 7, 2016 at 5:15 AM, Noah Misch <noah@leadboat.com> wrote:
> > Unless you especially want to self-impose the same tight resolution
> > schedule
> > that 9.6 regressions experience, let's move this to the "Non-bugs" section.
> > Which do you prefer?  I don't think the opportunity for more documentation
> > in
> > light of 7117685 constitutes a regression, and I don't want "Open Issues"
> > to
> > double as a parking lot for slow-moving non-regressions.
> >
> 
> Well, if we *don't* do the rewrite before we release it, then we have to
> instead put information about the new version of the functions into the old
> structure I think.
> 
> So I think it's an open issue.

Works for me...

[This is a generic notification.]

The above-described topic is currently a PostgreSQL 9.6 open item.  Magnus,
since you committed the patch believed to have created it, you own this open
item.  If that responsibility lies elsewhere, please let us know whose
responsibility it is to fix this.  Since new open items may be discovered at
any time and I want to plan to have them all fixed well in advance of the ship
date, I will appreciate your efforts toward speedy resolution.  Please
present, within 72 hours, a plan to fix the defect within seven days of this
message.  Thanks.

> But maybe we should have a separate section
> on the open items list for documentation issues? I tihnk we've had that
> some times before.

Maybe.  If the list were starting to get crowded with doc-only items, that
would certainly have benefits.



Re: Updated backup APIs for non-exclusive backups

From
Magnus Hagander
Date:
On Tue, Apr 12, 2016 at 8:39 AM, Noah Misch <noah@leadboat.com> wrote:
On Mon, Apr 11, 2016 at 11:22:27AM +0200, Magnus Hagander wrote:
> On Thu, Apr 7, 2016 at 5:15 AM, Noah Misch <noah@leadboat.com> wrote:
> > Unless you especially want to self-impose the same tight resolution
> > schedule
> > that 9.6 regressions experience, let's move this to the "Non-bugs" section.
> > Which do you prefer?  I don't think the opportunity for more documentation
> > in
> > light of 7117685 constitutes a regression, and I don't want "Open Issues"
> > to
> > double as a parking lot for slow-moving non-regressions.
> >
>
> Well, if we *don't* do the rewrite before we release it, then we have to
> instead put information about the new version of the functions into the old
> structure I think.
>
> So I think it's an open issue.

Works for me...

[This is a generic notification.]

The above-described topic is currently a PostgreSQL 9.6 open item.  Magnus,
since you committed the patch believed to have created it, you own this open
item.  If that responsibility lies elsewhere, please let us know whose
responsibility it is to fix this.  Since new open items may be discovered at
any time and I want to plan to have them all fixed well in advance of the ship
date, I will appreciate your efforts toward speedy resolution.  Please
present, within 72 hours, a plan to fix the defect within seven days of this
message.  Thanks.

I won't have time to do the bigger rewrite/reordeirng by then, but I can certainly commit to having the smaller updates done to cover the new functionality in less than a week. If nothing else, that'll be something for me to do on the flight over to pgconf.us.

--

Re: Updated backup APIs for non-exclusive backups

From
Noah Misch
Date:
On Tue, Apr 12, 2016 at 10:08:23PM +0200, Magnus Hagander wrote:
> On Tue, Apr 12, 2016 at 8:39 AM, Noah Misch <noah@leadboat.com> wrote:
> > On Mon, Apr 11, 2016 at 11:22:27AM +0200, Magnus Hagander wrote:
> > > Well, if we *don't* do the rewrite before we release it, then we have to
> > > instead put information about the new version of the functions into the
> > old
> > > structure I think.
> > >
> > > So I think it's an open issue.
> >
> > Works for me...
> >
> > [This is a generic notification.]
> >
> > The above-described topic is currently a PostgreSQL 9.6 open item.  Magnus,
> > since you committed the patch believed to have created it, you own this
> > open
> > item.  If that responsibility lies elsewhere, please let us know whose
> > responsibility it is to fix this.  Since new open items may be discovered
> > at
> > any time and I want to plan to have them all fixed well in advance of the
> > ship
> > date, I will appreciate your efforts toward speedy resolution.  Please
> > present, within 72 hours, a plan to fix the defect within seven days of
> > this
> > message.  Thanks.
> >
> 
> I won't have time to do the bigger rewrite/reordeirng by then, but I can
> certainly commit to having the smaller updates done to cover the new
> functionality in less than a week. If nothing else, that'll be something
> for me to do on the flight over to pgconf.us.

Thanks for that plan; it sounds good.



Re: Updated backup APIs for non-exclusive backups

From
Magnus Hagander
Date:
On Wed, Apr 13, 2016 at 4:07 AM, Noah Misch <noah@leadboat.com> wrote:
On Tue, Apr 12, 2016 at 10:08:23PM +0200, Magnus Hagander wrote:
> On Tue, Apr 12, 2016 at 8:39 AM, Noah Misch <noah@leadboat.com> wrote:
> > On Mon, Apr 11, 2016 at 11:22:27AM +0200, Magnus Hagander wrote:
> > > Well, if we *don't* do the rewrite before we release it, then we have to
> > > instead put information about the new version of the functions into the
> > old
> > > structure I think.
> > >
> > > So I think it's an open issue.
> >
> > Works for me...
> >
> > [This is a generic notification.]
> >
> > The above-described topic is currently a PostgreSQL 9.6 open item.  Magnus,
> > since you committed the patch believed to have created it, you own this
> > open
> > item.  If that responsibility lies elsewhere, please let us know whose
> > responsibility it is to fix this.  Since new open items may be discovered
> > at
> > any time and I want to plan to have them all fixed well in advance of the
> > ship
> > date, I will appreciate your efforts toward speedy resolution.  Please
> > present, within 72 hours, a plan to fix the defect within seven days of
> > this
> > message.  Thanks.
> >
>
> I won't have time to do the bigger rewrite/reordeirng by then, but I can
> certainly commit to having the smaller updates done to cover the new
> functionality in less than a week. If nothing else, that'll be something
> for me to do on the flight over to pgconf.us.

Thanks for that plan; it sounds good.

Here's a suggested patch.

There is some duplication between the non-exclusive and exclusive backup sections, but I wanted to make sure that each set of instructions can just be followed top-to-bottom.

I've also removed some tips that aren't really necessary as part of the step-by-step instructions in order to keep things from exploding in size.

Finally, I've changed references to "backup dump" to just be "backup", because it's confusing to call them something with dumps in when it's not pg_dump. Enough that I got partially confused myself while editing...

Comments?

--
Attachment

Re: Updated backup APIs for non-exclusive backups

From
Noah Misch
Date:
On Sat, Apr 16, 2016 at 06:22:47PM +0200, Magnus Hagander wrote:
> > On Tue, Apr 12, 2016 at 10:08:23PM +0200, Magnus Hagander wrote:
> > > I won't have time to do the bigger rewrite/reordeirng by then, but I can
> > > certainly commit to having the smaller updates done to cover the new
> > > functionality in less than a week.

> There is some duplication between the non-exclusive and exclusive backup
> sections, but I wanted to make sure that each set of instructions can just
> be followed top-to-bottom.
> 
> I've also removed some tips that aren't really necessary as part of the
> step-by-step instructions in order to keep things from exploding in size.
> 
> Finally, I've changed references to "backup dump" to just be "backup",
> because it's confusing to call them something with dumps in when it's not
> pg_dump. Enough that I got partially confused myself while editing...
> 
> Comments?

I scanned this briefly, and it looks reasonable.  I recommend committing it
forthwith.

> *** a/doc/src/sgml/backup.sgml
> --- b/doc/src/sgml/backup.sgml
> ***************
> *** 818,823 **** test ! -f /mnt/server/archivedir/00000001000000A900000065 && cp pg_xlog/
> --- 818,838 ----
>       simple. It is very important that these steps are executed in
>       sequence, and that the success of a step is verified before
>       proceeding to the next step.
> +    </para>
> +    <para>
> +     Low level base backups can be made in a non-exclusive or an exclusive
> +     way. The non-exclusive method is recommended and the exclusive one will
> +     at some point be deprecated and removed.

"I will deprecate X at some point" has the same effect as "I deprecate X now."
If you have no doubt you want to deprecate it, I advise plainer phrasing like,
"The exclusive method is deprecated and will eventually be removed."  That is
to say, just deprecate it right now.  If you have doubts, omit deprecation.



Re: Updated backup APIs for non-exclusive backups

From
Fujii Masao
Date:
On Sun, Apr 17, 2016 at 1:22 AM, Magnus Hagander <magnus@hagander.net> wrote:
> On Wed, Apr 13, 2016 at 4:07 AM, Noah Misch <noah@leadboat.com> wrote:
>>
>> On Tue, Apr 12, 2016 at 10:08:23PM +0200, Magnus Hagander wrote:
>> > On Tue, Apr 12, 2016 at 8:39 AM, Noah Misch <noah@leadboat.com> wrote:
>> > > On Mon, Apr 11, 2016 at 11:22:27AM +0200, Magnus Hagander wrote:
>> > > > Well, if we *don't* do the rewrite before we release it, then we
>> > > > have to
>> > > > instead put information about the new version of the functions into
>> > > > the
>> > > old
>> > > > structure I think.
>> > > >
>> > > > So I think it's an open issue.
>> > >
>> > > Works for me...
>> > >
>> > > [This is a generic notification.]
>> > >
>> > > The above-described topic is currently a PostgreSQL 9.6 open item.
>> > > Magnus,
>> > > since you committed the patch believed to have created it, you own
>> > > this
>> > > open
>> > > item.  If that responsibility lies elsewhere, please let us know whose
>> > > responsibility it is to fix this.  Since new open items may be
>> > > discovered
>> > > at
>> > > any time and I want to plan to have them all fixed well in advance of
>> > > the
>> > > ship
>> > > date, I will appreciate your efforts toward speedy resolution.  Please
>> > > present, within 72 hours, a plan to fix the defect within seven days
>> > > of
>> > > this
>> > > message.  Thanks.
>> > >
>> >
>> > I won't have time to do the bigger rewrite/reordeirng by then, but I can
>> > certainly commit to having the smaller updates done to cover the new
>> > functionality in less than a week. If nothing else, that'll be something
>> > for me to do on the flight over to pgconf.us.
>>
>> Thanks for that plan; it sounds good.
>
>
> Here's a suggested patch.
>
> There is some duplication between the non-exclusive and exclusive backup
> sections, but I wanted to make sure that each set of instructions can just
> be followed top-to-bottom.
>
> I've also removed some tips that aren't really necessary as part of the
> step-by-step instructions in order to keep things from exploding in size.
>
> Finally, I've changed references to "backup dump" to just be "backup",
> because it's confusing to call them something with dumps in when it's not
> pg_dump. Enough that I got partially confused myself while editing...
>
> Comments?

+    Low level base backups can be made in a non-exclusive or an exclusive
+    way. The non-exclusive method is recommended and the exclusive one will
+    at some point be deprecated and removed.

I don't object to add a non-exclusive mode of low level backup,
but I disagree to mark an exclusive backup as deprecated at least
until we can alleviate some pains that a non-exclusive mode causes.

One example of the pain, in a non-exclusive backup, we need to keep
the IDLE connection which was used to execute pg_start_backup(),
until the end of backup. Of course a backup can take a very
long time. In this case the IDLE connection also needs to remain
for such a long time. If it's accidentally terminated (e.g., because
of IDLE connection), the backup fails and needs to be taken again
from the beginning.

Another pain in a non-exclusive backup is to have to execute both
pg_start_backup() and pg_stop_backup() on the same connection.
Please imagine the case where psql is used to execute those two
backup functions (I believe that there are many users who do this).
For example,
   psql -c "SELECT pg_start_backup()"   rsync, cp, tar, storage backup, or something   psql -c "SELECT
pg_stop_backup()"

A non-exclusive backup breaks the above very simple steps because
two backup functions are executed on different connections.
So, how should we modify the steps for a non-exclusive backup?
Basically we need to pause psql after pg_start_backup(), signal it
to resume after the copy of database cluster is taken, and make
it execute pg_stop_backup(). I'm afraid that the backup script
will be complicated because of this pain of non-exclusive backup.

+     The <function>pg_stop_backup</> will return one row with three
+     values. The second of these fields should be written to a file named
+     <filename>backup_label</> in the root directory of the backup. The
+     third field should be written to a file named
+     <filename>tablespace_map</> unless the field is empty.

How should we write those two values to different files when
we execute pg_stop_backup() via psql? Whole output of
pg_stop_backup() should be written to a transient file and
it should be filtered and written to different two files by
using some Linux commands? This also seems to make the backup
script more complicated.

Regards,

-- 
Fujii Masao



Re: Updated backup APIs for non-exclusive backups

From
Magnus Hagander
Date:
On Wed, Apr 20, 2016 at 1:12 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
On Sun, Apr 17, 2016 at 1:22 AM, Magnus Hagander <magnus@hagander.net> wrote:
> On Wed, Apr 13, 2016 at 4:07 AM, Noah Misch <noah@leadboat.com> wrote:
>>
>> On Tue, Apr 12, 2016 at 10:08:23PM +0200, Magnus Hagander wrote:
>> > On Tue, Apr 12, 2016 at 8:39 AM, Noah Misch <noah@leadboat.com> wrote:
>> > > On Mon, Apr 11, 2016 at 11:22:27AM +0200, Magnus Hagander wrote:
>> > > > Well, if we *don't* do the rewrite before we release it, then we
>> > > > have to
>> > > > instead put information about the new version of the functions into
>> > > > the
>> > > old
>> > > > structure I think.
>> > > >
>> > > > So I think it's an open issue.
>> > >
>> > > Works for me...
>> > >
>> > > [This is a generic notification.]
>> > >
>> > > The above-described topic is currently a PostgreSQL 9.6 open item.
>> > > Magnus,
>> > > since you committed the patch believed to have created it, you own
>> > > this
>> > > open
>> > > item.  If that responsibility lies elsewhere, please let us know whose
>> > > responsibility it is to fix this.  Since new open items may be
>> > > discovered
>> > > at
>> > > any time and I want to plan to have them all fixed well in advance of
>> > > the
>> > > ship
>> > > date, I will appreciate your efforts toward speedy resolution.  Please
>> > > present, within 72 hours, a plan to fix the defect within seven days
>> > > of
>> > > this
>> > > message.  Thanks.
>> > >
>> >
>> > I won't have time to do the bigger rewrite/reordeirng by then, but I can
>> > certainly commit to having the smaller updates done to cover the new
>> > functionality in less than a week. If nothing else, that'll be something
>> > for me to do on the flight over to pgconf.us.
>>
>> Thanks for that plan; it sounds good.
>
>
> Here's a suggested patch.
>
> There is some duplication between the non-exclusive and exclusive backup
> sections, but I wanted to make sure that each set of instructions can just
> be followed top-to-bottom.
>
> I've also removed some tips that aren't really necessary as part of the
> step-by-step instructions in order to keep things from exploding in size.
>
> Finally, I've changed references to "backup dump" to just be "backup",
> because it's confusing to call them something with dumps in when it's not
> pg_dump. Enough that I got partially confused myself while editing...
>
> Comments?

+    Low level base backups can be made in a non-exclusive or an exclusive
+    way. The non-exclusive method is recommended and the exclusive one will
+    at some point be deprecated and removed.

I don't object to add a non-exclusive mode of low level backup,
but I disagree to mark an exclusive backup as deprecated at least
until we can alleviate some pains that a non-exclusive mode causes.

Note that we have not marked them as deprecated. We're just giving warnings that they will be deprecated.
 

One example of the pain, in a non-exclusive backup, we need to keep
the IDLE connection which was used to execute pg_start_backup(),
until the end of backup. Of course a backup can take a very
long time. In this case the IDLE connection also needs to remain
for such a long time. If it's accidentally terminated (e.g., because
of IDLE connection), the backup fails and needs to be taken again
from the beginning.


Yes, it's a bit annoying. But it's something you can handle. Unlike the problems that exist with an exclusive backup which you *cannot* handle from the backup script/commands.

 

Another pain in a non-exclusive backup is to have to execute both
pg_start_backup() and pg_stop_backup() on the same connection.

Pretty sure that's the same one you just listed?

 
Please imagine the case where psql is used to execute those two
backup functions (I believe that there are many users who do this).
For example,

    psql -c "SELECT pg_start_backup()"
    rsync, cp, tar, storage backup, or something
    psql -c "SELECT pg_stop_backup()"

A non-exclusive backup breaks the above very simple steps because
two backup functions are executed on different connections.
So, how should we modify the steps for a non-exclusive backup?

You should at least not use psql in that way. You need to open psql in a pipe and drive it through that. Or use a more appropriate client.


Basically we need to pause psql after pg_start_backup(), signal it
to resume after the copy of database cluster is taken, and make
it execute pg_stop_backup(). I'm afraid that the backup script
will be complicated because of this pain of non-exclusive backup.

Yes, if you insist on using psql - which is not a good tool for this - then yes. But you could for example make it a psql script that does something along
SELECT pg_start_backup();
\! rsync ...
SELECT pg_stop_backup()

Or as said above, drive it through a pipe instead. Or use an appropriate client library.

The bottom line is yes, it makes it a bit worse. But it is something you can handle quite simply from your client. And wen you do, it will be *right*.

You have no way at all to fix the problem if your server crashes during an exclusive backup, for example. No matter how you connect to the database, because that's not where that problem is.

 

+     The <function>pg_stop_backup</> will return one row with three
+     values. The second of these fields should be written to a file named
+     <filename>backup_label</> in the root directory of the backup. The
+     third field should be written to a file named
+     <filename>tablespace_map</> unless the field is empty.

How should we write those two values to different files when
we execute pg_stop_backup() via psql? Whole output of
pg_stop_backup() should be written to a transient file and
it should be filtered and written to different two files by
using some Linux commands? This also seems to make the backup
script more complicated.

Yes, you would have to do that. And yes, psql is not a good tool for this. So the solution is to not force yourself to use a tool that doesn't work well for this problem. But if you do, then yes, you will have to go through some extra pain to handle it.

--

Re: Updated backup APIs for non-exclusive backups

From
Robert Haas
Date:
On Wed, Apr 20, 2016 at 1:32 PM, Magnus Hagander <magnus@hagander.net> wrote:
> Note that we have not marked them as deprecated. We're just giving warnings
> that they will be deprecated.

But I think that is being said here is that maybe they won't be
deprecated, at least not any time soon.  And therefore maybe we
shouldn't say so.

Frankly, I think that's right.  It is one thing to say that the new
method is preferred - +1 for that.  But the old method is going to
continue to be used by many people for a long time, and in some cases
will be superior.  That's not something we can deprecate, unless I'm
misunderstanding the situation.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Updated backup APIs for non-exclusive backups

From
Bruce Momjian
Date:
On Fri, Apr 22, 2016 at 11:53:46AM -0400, Robert Haas wrote:
> On Wed, Apr 20, 2016 at 1:32 PM, Magnus Hagander <magnus@hagander.net> wrote:
> > Note that we have not marked them as deprecated. We're just giving warnings
> > that they will be deprecated.
> 
> But I think that is being said here is that maybe they won't be
> deprecated, at least not any time soon.  And therefore maybe we
> shouldn't say so.
> 
> Frankly, I think that's right.  It is one thing to say that the new
> method is preferred - +1 for that.  But the old method is going to
> continue to be used by many people for a long time, and in some cases
> will be superior.  That's not something we can deprecate, unless I'm
> misunderstanding the situation.

I agree with Robert.  One the one hand we are saying pg_stop_backup()
doesn't work well in psql because you get those two file contents output
that you have to write, and on the other hand we are saying we are going
to deprecate the method that does work well in psql?  I must be missing
something too, as that makes no sense.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +



Re: Updated backup APIs for non-exclusive backups

From
Magnus Hagander
Date:


On Sun, Apr 24, 2016 at 5:37 AM, Bruce Momjian <bruce@momjian.us> wrote:
On Fri, Apr 22, 2016 at 11:53:46AM -0400, Robert Haas wrote:
> On Wed, Apr 20, 2016 at 1:32 PM, Magnus Hagander <magnus@hagander.net> wrote:
> > Note that we have not marked them as deprecated. We're just giving warnings
> > that they will be deprecated.
>
> But I think that is being said here is that maybe they won't be
> deprecated, at least not any time soon.  And therefore maybe we
> shouldn't say so.
>
> Frankly, I think that's right.  It is one thing to say that the new
> method is preferred - +1 for that.  But the old method is going to
> continue to be used by many people for a long time, and in some cases
> will be superior.  That's not something we can deprecate, unless I'm
> misunderstanding the situation.

I agree with Robert.  One the one hand we are saying pg_stop_backup()
doesn't work well in psql because you get those two file contents output
that you have to write, and on the other hand we are saying we are going
to deprecate the method that does work well in psql?  I must be missing
something too, as that makes no sense.

I don't agree. I don't see how "making a backup using psql" is more important than "making a backup without potentially dangerous sideeffects". But if others don't agree, could one of you at least provide an example of how you'd like the docs to read in a way that doesn't deprecate the unsafe way but still informs the user properly? 

--

Re: Updated backup APIs for non-exclusive backups

From
Stephen Frost
Date:
* Magnus Hagander (magnus@hagander.net) wrote:
> On Sun, Apr 24, 2016 at 5:37 AM, Bruce Momjian <bruce@momjian.us> wrote:
>
> > On Fri, Apr 22, 2016 at 11:53:46AM -0400, Robert Haas wrote:
> > > On Wed, Apr 20, 2016 at 1:32 PM, Magnus Hagander <magnus@hagander.net>
> > wrote:
> > > > Note that we have not marked them as deprecated. We're just giving
> > warnings
> > > > that they will be deprecated.
> > >
> > > But I think that is being said here is that maybe they won't be
> > > deprecated, at least not any time soon.  And therefore maybe we
> > > shouldn't say so.
> > >
> > > Frankly, I think that's right.  It is one thing to say that the new
> > > method is preferred - +1 for that.  But the old method is going to
> > > continue to be used by many people for a long time, and in some cases
> > > will be superior.  That's not something we can deprecate, unless I'm
> > > misunderstanding the situation.
> >
> > I agree with Robert.  One the one hand we are saying pg_stop_backup()
> > doesn't work well in psql because you get those two file contents output
> > that you have to write, and on the other hand we are saying we are going
> > to deprecate the method that does work well in psql?  I must be missing
> > something too, as that makes no sense.
>
> I don't agree. I don't see how "making a backup using psql" is more
> important than "making a backup without potentially dangerous sideeffects".
> But if others don't agree, could one of you at least provide an example of
> how you'd like the docs to read in a way that doesn't deprecate the unsafe
> way but still informs the user properly?

I'm with Magnus on this, primairly because I've come to understand just
how dangerous the old backup method is.  That method *should* be
deprecated and discouraged.  A backup method which means your database
doesn't restart properly if the system crashes during the backup is
*bad*.

Fixing that means using something more complicated than the old method
and that's a bit of a pain in psql, but that doesn't mean we should tell
people that the old method is an acceptable approach.

Perhaps we can look at improving psql to make it easier to use it for
the new backup method but, honestly, all these hackish scripts to do
backups aren't doing a lot of things that a real backup solution needs
to do.  Improving psql for this is a bit like new paint on the titanic.

Thanks!

Stephen

Re: Updated backup APIs for non-exclusive backups

From
Craig Ringer
Date:
On 24 April 2016 at 23:49, Stephen Frost <sfrost@snowman.net> wrote:
 

Fixing that means using something more complicated than the old method
and that's a bit of a pain in psql, but that doesn't mean we should tell
people that the old method is an acceptable approach.

+1

Frankly, I don't find doing backups with psql this way interesting. I'm a bit astonished that anyone's doing it, really. Most of the time I'm doing anything more complicated than pg_basebackup I've got more flexible tools at hand than psql too.

\gset should be a workaround for anyone who really wants to do this in psql for some reason.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Updated backup APIs for non-exclusive backups

From
Fujii Masao
Date:
On Thu, Apr 21, 2016 at 2:32 AM, Magnus Hagander <magnus@hagander.net> wrote:
> On Wed, Apr 20, 2016 at 1:12 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>
>> On Sun, Apr 17, 2016 at 1:22 AM, Magnus Hagander <magnus@hagander.net>
>> wrote:
>> > On Wed, Apr 13, 2016 at 4:07 AM, Noah Misch <noah@leadboat.com> wrote:
>> >>
>> >> On Tue, Apr 12, 2016 at 10:08:23PM +0200, Magnus Hagander wrote:
>> >> > On Tue, Apr 12, 2016 at 8:39 AM, Noah Misch <noah@leadboat.com>
>> >> > wrote:
>> >> > > On Mon, Apr 11, 2016 at 11:22:27AM +0200, Magnus Hagander wrote:
>> >> > > > Well, if we *don't* do the rewrite before we release it, then we
>> >> > > > have to
>> >> > > > instead put information about the new version of the functions
>> >> > > > into
>> >> > > > the
>> >> > > old
>> >> > > > structure I think.
>> >> > > >
>> >> > > > So I think it's an open issue.
>> >> > >
>> >> > > Works for me...
>> >> > >
>> >> > > [This is a generic notification.]
>> >> > >
>> >> > > The above-described topic is currently a PostgreSQL 9.6 open item.
>> >> > > Magnus,
>> >> > > since you committed the patch believed to have created it, you own
>> >> > > this
>> >> > > open
>> >> > > item.  If that responsibility lies elsewhere, please let us know
>> >> > > whose
>> >> > > responsibility it is to fix this.  Since new open items may be
>> >> > > discovered
>> >> > > at
>> >> > > any time and I want to plan to have them all fixed well in advance
>> >> > > of
>> >> > > the
>> >> > > ship
>> >> > > date, I will appreciate your efforts toward speedy resolution.
>> >> > > Please
>> >> > > present, within 72 hours, a plan to fix the defect within seven
>> >> > > days
>> >> > > of
>> >> > > this
>> >> > > message.  Thanks.
>> >> > >
>> >> >
>> >> > I won't have time to do the bigger rewrite/reordeirng by then, but I
>> >> > can
>> >> > certainly commit to having the smaller updates done to cover the new
>> >> > functionality in less than a week. If nothing else, that'll be
>> >> > something
>> >> > for me to do on the flight over to pgconf.us.
>> >>
>> >> Thanks for that plan; it sounds good.
>> >
>> >
>> > Here's a suggested patch.
>> >
>> > There is some duplication between the non-exclusive and exclusive backup
>> > sections, but I wanted to make sure that each set of instructions can
>> > just
>> > be followed top-to-bottom.
>> >
>> > I've also removed some tips that aren't really necessary as part of the
>> > step-by-step instructions in order to keep things from exploding in
>> > size.
>> >
>> > Finally, I've changed references to "backup dump" to just be "backup",
>> > because it's confusing to call them something with dumps in when it's
>> > not
>> > pg_dump. Enough that I got partially confused myself while editing...
>> >
>> > Comments?
>>
>> +    Low level base backups can be made in a non-exclusive or an exclusive
>> +    way. The non-exclusive method is recommended and the exclusive one
>> will
>> +    at some point be deprecated and removed.
>>
>> I don't object to add a non-exclusive mode of low level backup,
>> but I disagree to mark an exclusive backup as deprecated at least
>> until we can alleviate some pains that a non-exclusive mode causes.
>
>
> Note that we have not marked them as deprecated. We're just giving warnings
> that they will be deprecated.
>
>>
>>
>> One example of the pain, in a non-exclusive backup, we need to keep
>> the IDLE connection which was used to execute pg_start_backup(),
>> until the end of backup. Of course a backup can take a very
>> long time. In this case the IDLE connection also needs to remain
>> for such a long time. If it's accidentally terminated (e.g., because
>> of IDLE connection), the backup fails and needs to be taken again
>> from the beginning.
>
>
>
> Yes, it's a bit annoying. But it's something you can handle. Unlike the
> problems that exist with an exclusive backup which you *cannot* handle from
> the backup script/commands.

I know many systems which are currently running well in production and
handling that problem of an exclusive backup. For example, they use
Pacemaker/Corosync to shared-disk HA configuration. PostgreSQL resource
agent for Pacemaker removes backup_label in the case of failover to
prevent the standby server from failing to recover from the crash.
This is not so neat solution, but they could live with the problem for
a long time, so far.

I'm NOT against migrating their backup scripts so that new method is used,
at the end. On the other hand, I think that we don't need to warn and rush
them to do that at least until new method will be polished.

>> Another pain in a non-exclusive backup is to have to execute both
>> pg_start_backup() and pg_stop_backup() on the same connection.
>
>
> Pretty sure that's the same one you just listed?

Yes, the sources of those pains are the same.

I wonder if it's better to store the backup state information in shared
memory, catalog, flat file or something instead of local memory
so that we can execute pg_stop_backup() in different session from that
pg_start_backup() is called. Maybe I'm missing something basic, but
why do those functions need to be called in the same session at all?

>> Please imagine the case where psql is used to execute those two
>> backup functions (I believe that there are many users who do this).
>> For example,
>>
>>     psql -c "SELECT pg_start_backup()"
>>     rsync, cp, tar, storage backup, or something
>>     psql -c "SELECT pg_stop_backup()"
>>
>> A non-exclusive backup breaks the above very simple steps because
>> two backup functions are executed on different connections.
>>
>> So, how should we modify the steps for a non-exclusive backup?
>
>
> You should at least not use psql in that way. You need to open psql in a
> pipe and drive it through that. Or use a more appropriate client.
>
>
>> Basically we need to pause psql after pg_start_backup(), signal it
>> to resume after the copy of database cluster is taken, and make
>> it execute pg_stop_backup(). I'm afraid that the backup script
>> will be complicated because of this pain of non-exclusive backup.
>
>
> Yes, if you insist on using psql - which is not a good tool for this - then
> yes. But you could for example make it a psql script that does something
> along
> SELECT pg_start_backup();
> \! rsync ...
> SELECT pg_stop_backup()
>
> Or as said above, drive it through a pipe instead. Or use an appropriate
> client library.

So, what's the appropriate client library for new backup method?
It's better to document that.

>
> The bottom line is yes, it makes it a bit worse. But it is something you can
> handle quite simply from your client. And wen you do, it will be *right*.
>
> You have no way at all to fix the problem if your server crashes during an
> exclusive backup, for example. No matter how you connect to the database,
> because that's not where that problem is.
>
>
>>
>>
>> +     The <function>pg_stop_backup</> will return one row with three
>> +     values. The second of these fields should be written to a file named
>> +     <filename>backup_label</> in the root directory of the backup. The
>> +     third field should be written to a file named
>> +     <filename>tablespace_map</> unless the field is empty.
>>
>> How should we write those two values to different files when
>> we execute pg_stop_backup() via psql? Whole output of
>> pg_stop_backup() should be written to a transient file and
>> it should be filtered and written to different two files by
>> using some Linux commands? This also seems to make the backup
>> script more complicated.
>
>
> Yes, you would have to do that. And yes, psql is not a good tool for this.
> So the solution is to not force yourself to use a tool that doesn't work
> well for this problem.

So it's better to document a tool which can work well or how to use psql
for new backup method (i.e., how to create backup_label and tablespace_map
files from the output of psql, of course it's also better to write
the example).

Regards,

-- 
Fujii Masao



Re: Updated backup APIs for non-exclusive backups

From
Magnus Hagander
Date:


On Mon, Apr 25, 2016 at 4:11 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
On Thu, Apr 21, 2016 at 2:32 AM, Magnus Hagander <magnus@hagander.net> wrote:
> On Wed, Apr 20, 2016 at 1:12 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>
>> On Sun, Apr 17, 2016 at 1:22 AM, Magnus Hagander <magnus@hagander.net>
>> wrote:
>> > On Wed, Apr 13, 2016 at 4:07 AM, Noah Misch <noah@leadboat.com> wrote:
>> >>
>> >> On Tue, Apr 12, 2016 at 10:08:23PM +0200, Magnus Hagander wrote:
>> >> > On Tue, Apr 12, 2016 at 8:39 AM, Noah Misch <noah@leadboat.com>
>> >> > wrote:
>> >> > > On Mon, Apr 11, 2016 at 11:22:27AM +0200, Magnus Hagander wrote:
> Yes, it's a bit annoying. But it's something you can handle. Unlike the
> problems that exist with an exclusive backup which you *cannot* handle from
> the backup script/commands.

I know many systems which are currently running well in production and
handling that problem of an exclusive backup. For example, they use
Pacemaker/Corosync to shared-disk HA configuration. PostgreSQL resource
agent for Pacemaker removes backup_label in the case of failover to
prevent the standby server from failing to recover from the crash.
This is not so neat solution, but they could live with the problem for
a long time, so far.

I'm NOT against migrating their backup scripts so that new method is used,
at the end. On the other hand, I think that we don't need to warn and rush
them to do that at least until new method will be polished.

We have not put a timeframe on the deprecation. Given that, they can expect to have multiple releases. And surely they don't use psql in the first place, but an interface that gives them more control already, don't they?

 
>> Another pain in a non-exclusive backup is to have to execute both
>> pg_start_backup() and pg_stop_backup() on the same connection.
>
>
> Pretty sure that's the same one you just listed?

Yes, the sources of those pains are the same.

I wonder if it's better to store the backup state information in shared
memory, catalog, flat file or something instead of local memory
so that we can execute pg_stop_backup() in different session from that
pg_start_backup() is called. Maybe I'm missing something basic, but
why do those functions need to be called in the same session at all?

So that we can automatically terminate backup mode if the client disappears.


> Yes, if you insist on using psql - which is not a good tool for this - then
> yes. But you could for example make it a psql script that does something
> along
> SELECT pg_start_backup();
> \! rsync ...
> SELECT pg_stop_backup()
>
> Or as said above, drive it through a pipe instead. Or use an appropriate
> client library.

So, what's the appropriate client library for new backup method?
It's better to document that.


Take your pick. *Any* client library will work fine. libpq, psycopg2, JDBC, DBD::Pg, npgsql, doesn't matter.

psql is not a client *library*, it's a client.



>> How should we write those two values to different files when
>> we execute pg_stop_backup() via psql? Whole output of
>> pg_stop_backup() should be written to a transient file and
>> it should be filtered and written to different two files by
>> using some Linux commands? This also seems to make the backup
>> script more complicated.
>
>
> Yes, you would have to do that. And yes, psql is not a good tool for this.
> So the solution is to not force yourself to use a tool that doesn't work
> well for this problem.

So it's better to document a tool which can work well or how to use psql
for new backup method (i.e., how to create backup_label and tablespace_map
files from the output of psql, of course it's also better to write
the example).

We didn't have examples before, did we?

These APIs are directed at *tool developers*. Surely they can be trusted to be able to write a file.

*End users* should not be using those APIs at all. They should be using pg_basebackup, or one of the other tools. That's the "bigger documentation rewrite" that was referenced in the thread, but nobody could commit to getting done before beta.

--

Re: Updated backup APIs for non-exclusive backups

From
David Steele
Date:
On 4/24/16 11:49 AM, Stephen Frost wrote:
> * Magnus Hagander (magnus@hagander.net) wrote:
>> On Sun, Apr 24, 2016 at 5:37 AM, Bruce Momjian <bruce@momjian.us> wrote:
>>
>>> On Fri, Apr 22, 2016 at 11:53:46AM -0400, Robert Haas wrote:
>>>
>>>> Frankly, I think that's right.  It is one thing to say that the new
>>>> method is preferred - +1 for that.  But the old method is going to
>>>> continue to be used by many people for a long time, and in some cases
>>>> will be superior.  That's not something we can deprecate, unless I'm
>>>> misunderstanding the situation.
>>>
>>> I agree with Robert.  One the one hand we are saying pg_stop_backup()
>>> doesn't work well in psql because you get those two file contents output
>>> that you have to write, and on the other hand we are saying we are going
>>> to deprecate the method that does work well in psql?  I must be missing
>>> something too, as that makes no sense.
>>
>> I don't agree. I don't see how "making a backup using psql" is more
>> important than "making a backup without potentially dangerous sideeffects".
>> But if others don't agree, could one of you at least provide an example of
>> how you'd like the docs to read in a way that doesn't deprecate the unsafe
>> way but still informs the user properly?
>
> I'm with Magnus on this, primairly because I've come to understand just
> how dangerous the old backup method is.  That method *should* be
> deprecated and discouraged.  A backup method which means your database
> doesn't restart properly if the system crashes during the backup is
> *bad*.

+1

> Perhaps we can look at improving psql to make it easier to use it for
> the new backup method but, honestly, all these hackish scripts to do
> backups aren't doing a lot of things that a real backup solution needs
> to do.  Improving psql for this is a bit like new paint on the titanic.

Personally I think we do the users a disservice by implying that backup
is as simple as calling two functions and copying the files.  Oh, and
don't forget to include WAL!

--
-David
david@pgmasters.net


Re: Updated backup APIs for non-exclusive backups

From
Kevin Grittner
Date:
On Mon, Apr 25, 2016 at 9:11 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

> PostgreSQL resource agent for Pacemaker removes backup_label in
> the case of failover to prevent the standby server from failing
> to recover from the crash.

Yikes!  I hope that the providers of Pacemaker document that they
are using a technique which can silently corrupt the cluster.

http://tbeitr.blogspot.com/2015/07/deleting-backuplabel-on-restore-will.html

> This is not so neat solution, but they could live with the
> problem for a long time, so far.

It can work sometimes.  It can also produce corruption which will
silently return incorrect results from queries for a long time
before some other symptom of the corruption surfaces.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Updated backup APIs for non-exclusive backups

From
Laurenz Albe
Date:
I'm a bit late to the party, but I only recently noticed that the exclusive
backup API is deprecated.

On Sun, 2016-04-24 at 11:49 -0400, Stephen Frost wrote:
> * Magnus Hagander (magnus@hagander.net) wrote:
> > On Sun, Apr 24, 2016 at 5:37 AM, Bruce Momjian <bruce@momjian.us> wrote:
> > > On Fri, Apr 22, 2016 at 11:53:46AM -0400, Robert Haas wrote:
> > > > 
> > > > Frankly, I think that's right.  It is one thing to say that the new
> > > > method is preferred - +1 for that.  But the old method is going to
> > > > continue to be used by many people for a long time, and in some cases
> > > > will be superior.  That's not something we can deprecate, unless I'm
> > > > misunderstanding the situation.
> > > 
> > > I agree with Robert.  One the one hand we are saying pg_stop_backup()
> > > doesn't work well in psql because you get those two file contents output
> > > that you have to write, and on the other hand we are saying we are going
> > > to deprecate the method that does work well in psql?  I must be missing
> > > something too, as that makes no sense.
> > 
> > I don't agree. I don't see how "making a backup using psql" is more
> > important than "making a backup without potentially dangerous sideeffects".
> > But if others don't agree, could one of you at least provide an example of
> > how you'd like the docs to read in a way that doesn't deprecate the unsafe
> > way but still informs the user properly?
> 
> I'm with Magnus on this, primairly because I've come to understand just
> how dangerous the old backup method is.  That method *should* be
> deprecated and discouraged.  A backup method which means your database
> doesn't restart properly if the system crashes during the backup is
> *bad*.
> 
> Fixing that means using something more complicated than the old method
> and that's a bit of a pain in psql, but that doesn't mean we should tell
> people that the old method is an acceptable approach.
> 
> Perhaps we can look at improving psql to make it easier to use it for
> the new backup method but, honestly, all these hackish scripts to do
> backups aren't doing a lot of things that a real backup solution needs
> to do.  Improving psql for this is a bit like new paint on the titanic.

I guess that the danger mentioned above is that the database may not
restart if it crashes while in exclusive backup mode, because the
WAL containing the starting checkpoint has already been purged.
True, that is a problem.

I would say the typical use case for the exclusive backup method is
the following (and I have seen it often):

You have some kind of backup software that does file system backups
and is able to run a "pre-backup" and "post-backup" script.
The backup is triggered by the backup software.

Another thing that is problematic with non-exclusive backups is that
you have to write the backup_label file into the backup after the
backup has been taken.  With a technique like the above, you cannot
easily do that.

So in essence, all backup methods that work like the above won't be
possible any more once the exclusive backup is gone.


I expect that that will make a lot of users unhappy.

Would it be an option to change the semantics so that "backup_label"
is ignored if there is no "recovery.conf"?  Of course that opens
another way to corrupt your database (by starting it from a backup
without first creating "recovery.conf"), but we could add another
big warning.

I'd say that the people who ignore such a warning are the same
people that happily remove "backup_label" when they see the following
message upon starting a cluster from a backup without recovery:

  If you are not restoring from a backup, try removing the file "backup_label".

Yours,
Laurenz Albe



Re: Updated backup APIs for non-exclusive backups

From
Stephen Frost
Date:
Greetings,

* Laurenz Albe (laurenz.albe@cybertec.at) wrote:
> On Sun, 2016-04-24 at 11:49 -0400, Stephen Frost wrote:
> > * Magnus Hagander (magnus@hagander.net) wrote:
> > > On Sun, Apr 24, 2016 at 5:37 AM, Bruce Momjian <bruce@momjian.us> wrote:
> > > > On Fri, Apr 22, 2016 at 11:53:46AM -0400, Robert Haas wrote:
> > > > >
> > > > > Frankly, I think that's right.  It is one thing to say that the new
> > > > > method is preferred - +1 for that.  But the old method is going to
> > > > > continue to be used by many people for a long time, and in some cases
> > > > > will be superior.  That's not something we can deprecate, unless I'm
> > > > > misunderstanding the situation.
> > > >
> > > > I agree with Robert.  One the one hand we are saying pg_stop_backup()
> > > > doesn't work well in psql because you get those two file contents output
> > > > that you have to write, and on the other hand we are saying we are going
> > > > to deprecate the method that does work well in psql?  I must be missing
> > > > something too, as that makes no sense.
> > >
> > > I don't agree. I don't see how "making a backup using psql" is more
> > > important than "making a backup without potentially dangerous sideeffects".
> > > But if others don't agree, could one of you at least provide an example of
> > > how you'd like the docs to read in a way that doesn't deprecate the unsafe
> > > way but still informs the user properly?
> >
> > I'm with Magnus on this, primairly because I've come to understand just
> > how dangerous the old backup method is.  That method *should* be
> > deprecated and discouraged.  A backup method which means your database
> > doesn't restart properly if the system crashes during the backup is
> > *bad*.
> >
> > Fixing that means using something more complicated than the old method
> > and that's a bit of a pain in psql, but that doesn't mean we should tell
> > people that the old method is an acceptable approach.
> >
> > Perhaps we can look at improving psql to make it easier to use it for
> > the new backup method but, honestly, all these hackish scripts to do
> > backups aren't doing a lot of things that a real backup solution needs
> > to do.  Improving psql for this is a bit like new paint on the titanic.
>
> I guess that the danger mentioned above is that the database may not
> restart if it crashes while in exclusive backup mode, because the
> WAL containing the starting checkpoint has already been purged.

The database would think it's trying to do recovery from a backup, not
doing crash recovery like it should be doing, so, yes, that's pretty
ugly.

> I would say the typical use case for the exclusive backup method is
> the following (and I have seen it often):
>
> You have some kind of backup software that does file system backups
> and is able to run a "pre-backup" and "post-backup" script.
> The backup is triggered by the backup software.

Seeing it often doesn't make it a good solution.  Running just
pre-backup and post-backup scripts and copying the filesystem isn't
enough to perform an online PostgreSQL backup- the WAL needs to be
collected as well, and you need to make sure that you have all of the
WAL before the backup can be considered complete.  On restore, you're
going to need to create a recovery.conf (at least in released versions)
which provides a restore command (needed even in HEAD today) to get the
old WAL, so having to also create the backup_label file shouldn't be
that difficult.

Lastly, if you really want, you can extract out the data from
pg_stop_backup in whatever your post-backup script is.

> Another thing that is problematic with non-exclusive backups is that
> you have to write the backup_label file into the backup after the
> backup has been taken.  With a technique like the above, you cannot
> easily do that.

... why not?  You need to create the recovery.conf anyway, and you need
to be archiving WAL somewhere, so it certainly seems like you could put
the backup_label there too.

> So in essence, all backup methods that work like the above won't be
> possible any more once the exclusive backup is gone.

That's why it's been deprecated and not removed...  but hopefully we
will be able to remove it in the future as it really isn't a good API.

> I expect that that will make a lot of users unhappy.

If it means that they implement a better backup strategy, then it's
probably a good thing, which is the goal of this.

> Would it be an option to change the semantics so that "backup_label"
> is ignored if there is no "recovery.conf"?  Of course that opens
> another way to corrupt your database (by starting it from a backup
> without first creating "recovery.conf"), but we could add another
> big warning.

My recollection is that other approaches were considered to deal with
this but none of them were all that good, though that was two years ago
at this point.  Certainly the issue with the above where a restored
backup without a recovery.conf could unwittingly be treated as if it
was crash recovery and end up with a very incorrect database as a
result, without any clear indication of that being the case, is quite
bad.

And, really, the arguments for keeping this API around and figuring out
how to 'fix' it would have to outweigh the cost of asking users to
reevaluate their existing solutions and implementing ones that use the
exclusive API, which is pretty clearly better.  What's more, any change
we make to the deprecated API has the chance of breaking existing users
too.  If we're going to make some kind of breaking change, let's make it
in a way that eliminates as many possible pitfalls as we can, and that's
basically what the non-exclusive API does.

I don't see any compelling argument for trying to do something half-way
any more today than I did two years ago when this was being discussed.

Thanks!

Stephen

Attachment

Re: Updated backup APIs for non-exclusive backups

From
Laurenz Albe
Date:
On Sun, 2018-11-25 at 13:50 -0500, Stephen Frost wrote:
> I don't see any compelling argument for trying to do something half-way
> any more today than I did two years ago when this was being discussed.

That may well be so.  It may be better to make users unhappy than to
make them very unhappy...

But I find the following points unconvincing:

> > I would say the typical use case for the exclusive backup method is
> > the following (and I have seen it often):
> > 
> > You have some kind of backup software that does file system backups
> > and is able to run a "pre-backup" and "post-backup" script.
> > The backup is triggered by the backup software.
> 
> Seeing it often doesn't make it a good solution.  Running just
> pre-backup and post-backup scripts and copying the filesystem isn't
> enough to perform an online PostgreSQL backup- the WAL needs to be
> collected as well, and you need to make sure that you have all of the
> WAL before the backup can be considered complete.

Yes, that's why "pg_stop_backup" has the "wait_for_archive" parameter.
So this is not a problem.

> On restore, you're
> going to need to create a recovery.conf (at least in released versions)
> which provides a restore command (needed even in HEAD today) to get the
> old WAL, so having to also create the backup_label file shouldn't be
> that difficult.

You write "recovery.conf" upon recovery, when you have the restored
backup, so you have it on a file system.  No problem adding a file then.

This is entirely different from adding a "backup_label" file to
a backup that has been taken by a backup software in some arbitrary
format in some arbitrary location (think snapshot).

So these two cases don't compare.

> Lastly, if you really want, you can extract out the data from
> pg_stop_backup in whatever your post-backup script is.

Come on, now.
You usually use backup techniques like that because you can't get
your large database backed up in the available time window otherwise.

> > Another thing that is problematic with non-exclusive backups is that
> > you have to write the backup_label file into the backup after the
> > backup has been taken.  With a technique like the above, you cannot
> > easily do that.
> 
> ... why not?  You need to create the recovery.conf anyway, and you need
> to be archiving WAL somewhere, so it certainly seems like you could put
> the backup_label there too.

As I said above, you don't add "recovery.conf" to the backup right away,
so these two cases don't compare.

> > I expect that that will make a lot of users unhappy.
> 
> If it means that they implement a better backup strategy, then it's
> probably a good thing, which is the goal of this.

I thought our goal is to provide convenient backup methods...


Ignoring "backup_label" on restart, as I suggested in my previous message,
probably isn't such a hot idea.

But what's wrong with retaining the exclusive backup method and just
sticking a big "Warning: this may cause a restart to fail after a crash"
on it?  That sure wouldn't be unsafe.

If somebody prefers a simpler backup method at the price of having to
manually restart the server after a crash, what's wrong with that?
Why not give them the choice?

I'd say that one could write a server start script that just removes
"backup_label", but I am sure someone will argue that then somebody
will restore a backup and then restart the server without first
recovering the database...

Yours,
Laurenz Albe



Re: Updated backup APIs for non-exclusive backups

From
Stephen Frost
Date:
Greetings,

On Sun, Nov 25, 2018 at 14:17 Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Sun, 2018-11-25 at 13:50 -0500, Stephen Frost wrote:
> I don't see any compelling argument for trying to do something half-way
> any more today than I did two years ago when this was being discussed.

That may well be so.  It may be better to make users unhappy than to
make them very unhappy...

But I find the following points unconvincing:

> > I would say the typical use case for the exclusive backup method is
> > the following (and I have seen it often):
> >
> > You have some kind of backup software that does file system backups
> > and is able to run a "pre-backup" and "post-backup" script.
> > The backup is triggered by the backup software.
>
> Seeing it often doesn't make it a good solution.  Running just
> pre-backup and post-backup scripts and copying the filesystem isn't
> enough to perform an online PostgreSQL backup- the WAL needs to be
> collected as well, and you need to make sure that you have all of the
> WAL before the backup can be considered complete.

Yes, that's why "pg_stop_backup" has the "wait_for_archive" parameter.
So this is not a problem.

That doesn’t actually make sure you have all of the WAL reliably saved across the backup, it just cares what archive command returns, which is sadly often a bad thing to depend on.  I certainly wouldn’t rely on only that for any system I cared about. 

> On restore, you're
> going to need to create a recovery.conf (at least in released versions)
> which provides a restore command (needed even in HEAD today) to get the
> old WAL, so having to also create the backup_label file shouldn't be
> that difficult.

You write "recovery.conf" upon recovery, when you have the restored
backup, so you have it on a file system.  No problem adding a file then.

This is entirely different from adding a "backup_label" file to
a backup that has been taken by a backup software in some arbitrary
format in some arbitrary location (think snapshot).

There isn’t any need to write the backup label before you restore the database, just as you write recovery.conf then.

> Lastly, if you really want, you can extract out the data from
> pg_stop_backup in whatever your post-backup script is.

Come on, now.
You usually use backup techniques like that because you can't get
your large database backed up in the available time window otherwise.

I’m not following what you’re trying to get at here, why can’t you extract the data for the backup label from pg_stop_backup..?  Certainly other tools do, even ones that do extremely fast parallel backups..  the two are completely independent.

Did you think I meant pg_basebackup..?  I certaily didn’t.

> > Another thing that is problematic with non-exclusive backups is that
> > you have to write the backup_label file into the backup after the
> > backup has been taken.  With a technique like the above, you cannot
> > easily do that.
>
> ... why not?  You need to create the recovery.conf anyway, and you need
> to be archiving WAL somewhere, so it certainly seems like you could put
> the backup_label there too.

As I said above, you don't add "recovery.conf" to the backup right away,
so these two cases don't compare.

There’s no requirement that you add the backup label contents immediately either, you just need to keep track of it and restore it when you restore the database and create the recovery.conf file.

> > I expect that that will make a lot of users unhappy.
>
> If it means that they implement a better backup strategy, then it's
> probably a good thing, which is the goal of this.

I thought our goal is to provide convenient backup methods...

Correctness would be first and having a broken system because of a crash during a backup isn’t correct.

Ignoring "backup_label" on restart, as I suggested in my previous message,
probably isn't such a hot idea.

Agreed. 

But what's wrong with retaining the exclusive backup method and just
sticking a big "Warning: this may cause a restart to fail after a crash"
on it?  That sure wouldn't be unsafe.

I haven’t seen anyone pushing for it to be removed immediately, but users should not use it and newcomers would be much better served by using the non exclusive api.  There is a reason it was deprecated and it’s because it simply isn’t a good API. Coming along a couple years later and saying that it’s a good API while ignoring the issues that it has doesn’t change that.

Thanks!

Stephen

Re: Updated backup APIs for non-exclusive backups

From
Laurenz Albe
Date:
Stephen Frost wrote: 
> > > Seeing it often doesn't make it a good solution.  Running just
> > > pre-backup and post-backup scripts and copying the filesystem isn't
> > > enough to perform an online PostgreSQL backup- the WAL needs to be
> > > collected as well, and you need to make sure that you have all of the
> > > WAL before the backup can be considered complete.
> > 
> > Yes, that's why "pg_stop_backup" has the "wait_for_archive" parameter.
> > So this is not a problem.
> 
> That doesn’t actually make sure you have all of the WAL reliably saved
> across the backup, it just cares what archive command returns, which is
> sadly often a bad thing to depend on.  I certainly wouldn’t rely on only
> that for any system I cared about. 

If you write a bad archive_command, you have a problem.
But that is quite unrelated to the problem at hand, if I am not mistaken.

> > > On restore, you're
> > > going to need to create a recovery.conf (at least in released versions)
> > > which provides a restore command (needed even in HEAD today) to get the
> > > old WAL, so having to also create the backup_label file shouldn't be
> > > that difficult.
> > 
> > You write "recovery.conf" upon recovery, when you have the restored
> > backup, so you have it on a file system.  No problem adding a file then.
> > 
> > This is entirely different from adding a "backup_label" file to
> > a backup that has been taken by a backup software in some arbitrary
> > format in some arbitrary location (think snapshot).
> 
> There isn’t any need to write the backup label before you restore the database,
> just as you write recovery.conf then.

Granted.
But it is pretty convenient, and writing it to the data directory right away
is a good thing on top, because it reduces the danger of inadvertedly
starting the backup without recovery.

> > > Lastly, if you really want, you can extract out the data from
> > > pg_stop_backup in whatever your post-backup script is.
> > 
> > Come on, now.
> > You usually use backup techniques like that because you can't get
> > your large database backed up in the available time window otherwise.
> 
> I’m not following what you’re trying to get at here, why can’t you extract
> the data for the backup label from pg_stop_backup..?  Certainly other tools
> do, even ones that do extremely fast parallel backups..  the two are
> completely independent.
> 
> Did you think I meant pg_basebackup..?  I certaily didn’t.

Oh yes, I misunderstood.  Sorry.

Yes, you can come up with a post-backup script that somehow communicates
with your pre-backup script to get the information, but it sure is
inconvenient.  Simplicity is good in backup solutions, because complicated
things tend to break more easily.

> > I thought our goal is to provide convenient backup methods...
> 
> Correctness would be first and having a broken system because of a crash during a backup isn’t correct.

"Not starting up without manual intervention" is not actually broken...

> > But what's wrong with retaining the exclusive backup method and just
> > sticking a big "Warning: this may cause a restart to fail after a crash"
> > on it?  That sure wouldn't be unsafe.
> 
> I haven’t seen anyone pushing for it to be removed immediately, but users should
> not use it and newcomers would be much better served by using the non exclusive api.
> There is a reason it was deprecated and it’s because it simply isn’t a good API.
> Coming along a couple years later and saying that it’s a good API while ignoring
> the issues that it has doesn’t change that.

I don't think I'm ignoring the issues, I just think there is a valid use case for
the exclusive backup API, with all its caveats.

Of course I'm not arguing on behalf of organizations running lots of databases
for whom manual intervention after a crash is unacceptable.

I'm arguing on behalf of users that run a few databases, want a simple backup
solution and are ready to deal with the shortcomings.


But I will gladly accept defeat in this matter, I just needed to vent my opinion.

Yours,
Laurenz Albe



Re: Updated backup APIs for non-exclusive backups

From
Magnus Hagander
Date:


On Sun, Nov 25, 2018 at 9:45 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
Stephen Frost wrote:
> > > Lastly, if you really want, you can extract out the data from
> > > pg_stop_backup in whatever your post-backup script is.
> >
> > Come on, now.
> > You usually use backup techniques like that because you can't get
> > your large database backed up in the available time window otherwise.
>
> I’m not following what you’re trying to get at here, why can’t you extract
> the data for the backup label from pg_stop_backup..?  Certainly other tools
> do, even ones that do extremely fast parallel backups..  the two are
> completely independent.
>
> Did you think I meant pg_basebackup..?  I certaily didn’t.

Oh yes, I misunderstood.  Sorry.

Yes, you can come up with a post-backup script that somehow communicates
with your pre-backup script to get the information, but it sure is
inconvenient.  Simplicity is good in backup solutions, because complicated
things tend to break more easily.

Yes, simplicity is good.

The problem with the previous interface is that it made things *look* simple, but they were not. Thus, people got into all sorts of trouble because they got things wrong.

The new interface is simple, and much harder to get wrong. What it isn't, is that it's not as convenient as the old one. But correctness is a lot more important than convenience.

That said, I agree with your point that it's not an uncommon thing to need. If a good solution for it can be proposed that requires the exclusive backup interface, then I wouldn't be against un-deprecating that.  But that's going to require a lot more than just a documentation change, IMHO. What could perhaps be handled with a documentation change, however, is to document a good way for this type of setup to use the new interfaces.


> > I thought our goal is to provide convenient backup methods...
>
> Correctness would be first and having a broken system because of a crash during a backup isn’t correct.

"Not starting up without manual intervention" is not actually broken...

Correct. But that's not the worst case scenario here. Yes, some of the bad ones are the ones amplified by said manual intervention, but user interaction at the restore point is going to be even harder to get right.


I'm arguing on behalf of users that run a few databases, want a simple backup
solution and are ready to deal with the shortcomings.

Those that want a simple backup solution have one -- pg_basebackup.

The exclusive backup API is *not* simple. It is convenient, but it is not simple.

Actually having a simple API that worked with the pre/post backup scripts, that would be an improvement that we should definitely want!

--

Re: Updated backup APIs for non-exclusive backups

From
Stephen Frost
Date:
Greetings,

On Sun, Nov 25, 2018 at 15:45 Laurenz Albe <laurenz.albe@cybertec.at> wrote:
Stephen Frost wrote:
> > > On restore, you're
> > > going to need to create a recovery.conf (at least in released versions)
> > > which provides a restore command (needed even in HEAD today) to get the
> > > old WAL, so having to also create the backup_label file shouldn't be
> > > that difficult.
> >
> > You write "recovery.conf" upon recovery, when you have the restored
> > backup, so you have it on a file system.  No problem adding a file then.
> >
> > This is entirely different from adding a "backup_label" file to
> > a backup that has been taken by a backup software in some arbitrary
> > format in some arbitrary location (think snapshot).
>
> There isn’t any need to write the backup label before you restore the database,
> just as you write recovery.conf then.

Granted.
But it is pretty convenient, and writing it to the data directory right away
is a good thing on top, because it reduces the danger of inadvertedly
starting the backup without recovery.

Writing it into the data directory is *not* a good thing because as soon as you do that you risk there being an issue if there’s a crash.  Writing into the backup isn’t a bad idea but if you’re managing your backups then writing it somewhere else (such as where you write your WAL) and associating it with the backup (presumably it has a label) should make it easy to pull back when you restore. 

> > > Lastly, if you really want, you can extract out the data from
> > > pg_stop_backup in whatever your post-backup script is.
> >
> > Come on, now.
> > You usually use backup techniques like that because you can't get
> > your large database backed up in the available time window otherwise.
>
> I’m not following what you’re trying to get at here, why can’t you extract
> the data for the backup label from pg_stop_backup..?  Certainly other tools
> do, even ones that do extremely fast parallel backups..  the two are
> completely independent.
>
> Did you think I meant pg_basebackup..?  I certaily didn’t.

Oh yes, I misunderstood.  Sorry.

Yes, you can come up with a post-backup script that somehow communicates
with your pre-backup script to get the information, but it sure is
inconvenient.  Simplicity is good in backup solutions, because complicated
things tend to break more easily.

Not sure what communication is necessary here..?   The data needed for the backup label file comes from pg_stop_backup in a non-exclusive backup.  You *should* be grabbing the starting WAL from the start backup as well, to confirm that you have all of the WAL for the backup, but you don’t actually need to in order to write out the backup label.

> > I thought our goal is to provide convenient backup methods...
>
> Correctness would be first and having a broken system because of a crash during a backup isn’t correct.

"Not starting up without manual intervention" is not actually broken...

Of course it is.  Having the behavior of the system be completely different depending on if a backup happened to be running or not is just plain bad and is broken.  It’s not a feature. Had this issue been realized when the exclusive backup mode was developed I suspect it never would have been implemented that way to begin with...

Thanks!

Stephen

Re: Updated backup APIs for non-exclusive backups

From
Laurenz Albe
Date:
On Sun, 2018-11-25 at 16:04 -0500, Stephen Frost wrote:
> > > There isn’t any need to write the backup label before you restore the database,
> > > just as you write recovery.conf then.
> > 
> > Granted.
> > But it is pretty convenient, and writing it to the data directory right away
> > is a good thing on top, because it reduces the danger of inadvertedly
> > starting the backup without recovery.
> 
> Writing it into the data directory is *not* a good thing because as soon as you do
> that you risk there being an issue if there’s a crash.  Writing into the backup
> isn’t a bad idea but if you’re managing your backups then writing it somewhere else
> (such as where you write your WAL) and associating it with the backup (presumably
> it has a label) should make it easy to pull back when you restore. 

If there is a crash during the backup procedure, the backup is bad.
Doesn't matter during which part of the backup procedure it happens.

> > Yes, you can come up with a post-backup script that somehow communicates
> > with your pre-backup script to get the information, but it sure is
> > inconvenient.  Simplicity is good in backup solutions, because complicated
> > things tend to break more easily.
> 
> Not sure what communication is necessary here..?   The data needed for the backup
> label file comes from pg_stop_backup in a non-exclusive backup.

Right, and pg_stop_backup has to be run from the "pre-backup" script.

Yours,
Laurenz Albe




Re: Updated backup APIs for non-exclusive backups

From
Laurenz Albe
Date:
On Sun, 2018-11-25 at 22:01 +0100, Magnus Hagander wrote:
[about managing backups from pre- and post-file-system-backup scrips]
> I agree with your point that it's not an uncommon thing to need. If a good solution
> for it can be proposed that requires the exclusive backup interface, then I wouldn't
> be against un-deprecating that.  But that's going to require a lot more than just a
> documentation change, IMHO. What could perhaps be handled with a documentation change,
> however, is to document a good way for this type of setup to use the new interfaces.

Good point, and it puts the ball in my court :^)

> > I'm arguing on behalf of users that run a few databases, want a simple backup
> > solution and are ready to deal with the shortcomings.
> 
> Those that want a simple backup solution have one -- pg_basebackup.
> 
> The exclusive backup API is *not* simple. It is convenient, but it is not simple.
> 
> Actually having a simple API that worked with the pre/post backup scripts, that
> would be an improvement that we should definitely want!

Right; unfortunately it is not simple to come up with one that doesn't exhibit
the problems of the existing exclusive backup.

Perhaps it's theoretically impossible.  The goal is to disambiguate what a file
system backup sees in backup mode and what the startup process sees after a crash
in backup mode, and I can't see how that could be done.

Yours,
Laurenz Albe



Re: Updated backup APIs for non-exclusive backups

From
Magnus Hagander
Date:
On Mon, Nov 26, 2018 at 6:44 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Sun, 2018-11-25 at 22:01 +0100, Magnus Hagander wrote:
[about managing backups from pre- and post-file-system-backup scrips]
> I agree with your point that it's not an uncommon thing to need. If a good solution
> for it can be proposed that requires the exclusive backup interface, then I wouldn't
> be against un-deprecating that.  But that's going to require a lot more than just a
> documentation change, IMHO. What could perhaps be handled with a documentation change,
> however, is to document a good way for this type of setup to use the new interfaces.

Good point, and it puts the ball in my court :^)

Enjoy :)

 
> > I'm arguing on behalf of users that run a few databases, want a simple backup
> > solution and are ready to deal with the shortcomings.
>
> Those that want a simple backup solution have one -- pg_basebackup.
>
> The exclusive backup API is *not* simple. It is convenient, but it is not simple.
>
> Actually having a simple API that worked with the pre/post backup scripts, that
> would be an improvement that we should definitely want!

Right; unfortunately it is not simple to come up with one that doesn't exhibit
the problems of the existing exclusive backup.

Right, it turns out to actually be a hard problem. The old API pretended it wasn't, which wasn't really very helpful in the long run...


Perhaps it's theoretically impossible.  The goal is to disambiguate what a file
system backup sees in backup mode and what the startup process sees after a crash
in backup mode, and I can't see how that could be done.

Not if it's in the same physical location, no, I think that's really hard. 

--

Re: Updated backup APIs for non-exclusive backups

From
David Steele
Date:
On 11/26/18 12:31 AM, Laurenz Albe wrote:
> On Sun, 2018-11-25 at 16:04 -0500, Stephen Frost wrote:
>>>> There isn’t any need to write the backup label before you restore the database,
>>>> just as you write recovery.conf then.
>>>
>>> Granted.
>>> But it is pretty convenient, and writing it to the data directory right away
>>> is a good thing on top, because it reduces the danger of inadvertedly
>>> starting the backup without recovery.
>>
>> Writing it into the data directory is *not* a good thing because as soon as you do
>> that you risk there being an issue if there’s a crash.  Writing into the backup
>> isn’t a bad idea but if you’re managing your backups then writing it somewhere else
>> (such as where you write your WAL) and associating it with the backup (presumably
>> it has a label) should make it easy to pull back when you restore.
> 
> If there is a crash during the backup procedure, the backup is bad.
> Doesn't matter during which part of the backup procedure it happens.

Yes, but in this case with exclusive backups your cluster also will not 
start.  That's a lot different than a bad backup because you should have 
many backups and a way to mark when they are complete.  If your cluster 
requires manual intervention to start because a backup was in progress 
that's not a good thing.

I'm of the opinion that we should remove the exclusive option for PG12. 
The recovery changes recently committed will completely break any 
automated recovery written for PG <= 11 so we might as go all the way.

Requiring non-exclusive backup will be an easy change for tools that 
already understand non-exclusive backup, which has been available in the 
last three versions of Postgres.

The exclusive backup method tries to make a hard problem look simple but 
does not succeed and causes issues besides.  Performing backups properly 
*is* hard and we don't do anyone a service by making it look like 
something can be done with a trivial bash script.

Regards,

-- 
-David
david@pgmasters.net


Re: Updated backup APIs for non-exclusive backups

From
Stephen Frost
Date:
Greetings,

* David Steele (david@pgmasters.net) wrote:
> On 11/26/18 12:31 AM, Laurenz Albe wrote:
> >If there is a crash during the backup procedure, the backup is bad.
> >Doesn't matter during which part of the backup procedure it happens.
>
> Yes, but in this case with exclusive backups your cluster also will not
> start.  That's a lot different than a bad backup because you should have
> many backups and a way to mark when they are complete.  If your cluster
> requires manual intervention to start because a backup was in progress
> that's not a good thing.
>
> I'm of the opinion that we should remove the exclusive option for PG12. The
> recovery changes recently committed will completely break any automated
> recovery written for PG <= 11 so we might as go all the way.

Yeah, after chatting with David about that this morning, I agree.
Breaking everything around how restores works means that anyone doing
backups today needs to re-evaluate their restore procedures, even if
it's not an automated script, so it makes sense to also remove the
deprecated exclusive backup mode at the same time.

> Requiring non-exclusive backup will be an easy change for tools that already
> understand non-exclusive backup, which has been available in the last three
> versions of Postgres.
>
> The exclusive backup method tries to make a hard problem look simple but
> does not succeed and causes issues besides.  Performing backups properly
> *is* hard and we don't do anyone a service by making it look like something
> can be done with a trivial bash script.

+1

Thanks!

Stephen

Attachment

Re: Updated backup APIs for non-exclusive backups

From
Stephen Frost
Date:
Greetings,

* Laurenz Albe (laurenz.albe@cybertec.at) wrote:
> On Sun, 2018-11-25 at 16:04 -0500, Stephen Frost wrote:
> > > Yes, you can come up with a post-backup script that somehow communicates
> > > with your pre-backup script to get the information, but it sure is
> > > inconvenient.  Simplicity is good in backup solutions, because complicated
> > > things tend to break more easily.
> >
> > Not sure what communication is necessary here..?   The data needed for the backup
> > label file comes from pg_stop_backup in a non-exclusive backup.
>
> Right, and pg_stop_backup has to be run from the "pre-backup" script.

No, pg_stop_backup has has to be run by a process that's been connected
since the pre-backup script ran, but you could certainly have some
process that starts from pre-backup, detaches, and then waits for a
signal from the post-backup script to run pg_stop_backup and record the
results somewhere.

Yes, that's rather annoying to do from a bash script, but I'm well past
the point of thinking that bash scripts are a good way to run backups in
PG.  Your efforts in this area might be better put to use by writing a
good tool which works with the non-exclusive API but can be called from
the pre/post backup scripts your backup solution provides.
Unfortunately, it seems pretty unlikely that there's a good generic way
to write such a tool or we could include it in PG.

Thanks!

Stephen

Attachment

Re: Updated backup APIs for non-exclusive backups

From
Laurenz Albe
Date:
On Mon, 2018-11-26 at 10:18 +0100, Magnus Hagander wrote:
> > [about managing backups from pre- and post-file-system-backup scrips]
> > > I agree with your point that it's not an uncommon thing to need. If a good solution
> > > for it can be proposed that requires the exclusive backup interface, then I wouldn't
> > > be against un-deprecating that.  But that's going to require a lot more than just a
> > > documentation change, IMHO. What could perhaps be handled with a documentation change,
> > > however, is to document a good way for this type of setup to use the new interfaces.
> > 
> > Good point, and it puts the ball in my court :^)
> 
> Enjoy :)

I have come up with some sample code here:
https://github.com/cybertec-postgresql/safe-backup

This just uses bash and psql.
Does that look reasonably safe?

It's probably too big to be introduced into the documentation, but maybe
we could add it to the Wiki.

Yours,
Laurenz Albe



Re: Updated backup APIs for non-exclusive backups

From
David Steele
Date:
On 12/11/18 5:33 PM, Laurenz Albe wrote:
> On Mon, 2018-11-26 at 10:18 +0100, Magnus Hagander wrote:
>>> [about managing backups from pre- and post-file-system-backup scrips]
>>>> I agree with your point that it's not an uncommon thing to need. If a good solution
>>>> for it can be proposed that requires the exclusive backup interface, then I wouldn't
>>>> be against un-deprecating that.  But that's going to require a lot more than just a
>>>> documentation change, IMHO. What could perhaps be handled with a documentation change,
>>>> however, is to document a good way for this type of setup to use the new interfaces.
>>>
>>> Good point, and it puts the ball in my court :^)
>>
>> Enjoy :)
> 
> I have come up with some sample code here:
> https://github.com/cybertec-postgresql/safe-backup
> 
> This just uses bash and psql.
> Does that look reasonably safe?
> 
> It's probably too big to be introduced into the documentation, but maybe
> we could add it to the Wiki.

My bash-fu is not all that great, but it seems to me you could do this
all in one script and forgo the table entirely.  Just do the file copy
at about line 60 in pgpre.sh.

It does look workable to me, just wonder if it could be simpler.

-- 
-David
david@pgmasters.net


Re: Updated backup APIs for non-exclusive backups

From
Laurenz Albe
Date:
On Tue, 2018-12-11 at 23:43 -0500, David Steele wrote:
> > > > [about managing backups from pre- and post-file-system-backup scrips]
> > I have come up with some sample code here:
> > https://github.com/cybertec-postgresql/safe-backup
> > 
> > This just uses bash and psql.
> > Does that look reasonably safe?
> > 
> > It's probably too big to be introduced into the documentation, but maybe
> > we could add it to the Wiki.
> 
> My bash-fu is not all that great, but it seems to me you could do this
> all in one script and forgo the table entirely.  Just do the file copy
> at about line 60 in pgpre.sh.
> 
> It does look workable to me, just wonder if it could be simpler.

Thanks for having a look.  Yes, it looks appallingly complicated.

Sure, if I knew there was a file to write, and I knew where to write
it, I could do it in the "pre" script.  But since I have no idea how the
actual backup is performed and how the "backup_label" file is going to
be saved, I thought it best to return the information to the caller and
persist it somewhere, and only the "post" script can actually return the
information.

Yours,
Laurenz Albe



Re: Updated backup APIs for non-exclusive backups

From
David Steele
Date:
On 12/12/18 12:58 AM, Laurenz Albe wrote:
> On Tue, 2018-12-11 at 23:43 -0500, David Steele wrote:
>>>>> [about managing backups from pre- and post-file-system-backup scrips]
>>> I have come up with some sample code here:
>>> https://github.com/cybertec-postgresql/safe-backup
>>>
>>> This just uses bash and psql.
>>> Does that look reasonably safe?
>>>
>>> It's probably too big to be introduced into the documentation, but maybe
>>> we could add it to the Wiki.
>>
>> My bash-fu is not all that great, but it seems to me you could do this
>> all in one script and forgo the table entirely.  Just do the file copy
>> at about line 60 in pgpre.sh.
>>
>> It does look workable to me, just wonder if it could be simpler.
> 
> Thanks for having a look.  Yes, it looks appallingly complicated.
> 
> Sure, if I knew there was a file to write, and I knew where to write
> it, I could do it in the "pre" script.  

Good point!

> But since I have no idea how the
> actual backup is performed and how the "backup_label" file is going to
> be saved, I thought it best to return the information to the caller and
> persist it somewhere, and only the "post" script can actually return the
> information.

Perhaps it could be done in a bash file that is included into the user's
script, rather than doing pre/post.

So, they would have start and stop backup methods they could call that
would manage the psql process in the background.  In that case it should
be possible to eliminate the table.

I do see the table as a weak spot.  It appears to work, but does add a
lot of complexity.  Limiting to one backup at a time can be a feature,
though.  In pgBackRest we use advisory locks to enforce this.

-- 
-David
david@pgmasters.net