Re: Updated backup APIs for non-exclusive backups - Mailing list pgsql-hackers

From Magnus Hagander
Subject Re: Updated backup APIs for non-exclusive backups
Date
Msg-id CABUevEzG4SXXzNZ_ZBb8H6GeVSvrzi63V51Q46HSsvU=f2zE+g@mail.gmail.com
In response to Re: Updated backup APIs for non-exclusive backups  (Fujii Masao <masao.fujii@gmail.com>)
Responses Re: Updated backup APIs for non-exclusive backups  (Robert Haas <robertmhaas@gmail.com>)
Re: Updated backup APIs for non-exclusive backups  (Fujii Masao <masao.fujii@gmail.com>)
List pgsql-hackers
On Wed, Apr 20, 2016 at 1:12 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
On Sun, Apr 17, 2016 at 1:22 AM, Magnus Hagander <magnus@hagander.net> wrote:
> On Wed, Apr 13, 2016 at 4:07 AM, Noah Misch <noah@leadboat.com> wrote:
>>
>> On Tue, Apr 12, 2016 at 10:08:23PM +0200, Magnus Hagander wrote:
>> > On Tue, Apr 12, 2016 at 8:39 AM, Noah Misch <noah@leadboat.com> wrote:
>> > > On Mon, Apr 11, 2016 at 11:22:27AM +0200, Magnus Hagander wrote:
>> > > > Well, if we *don't* do the rewrite before we release it, then we have to
>> > > > instead put information about the new version of the functions into the
>> > > > old structure I think.
>> > > >
>> > > > So I think it's an open issue.
>> > >
>> > > Works for me...
>> > >
>> > > [This is a generic notification.]
>> > >
>> > > The above-described topic is currently a PostgreSQL 9.6 open item. Magnus,
>> > > since you committed the patch believed to have created it, you own this open
>> > > item.  If that responsibility lies elsewhere, please let us know whose
>> > > responsibility it is to fix this.  Since new open items may be discovered at
>> > > any time and I want to plan to have them all fixed well in advance of the ship
>> > > date, I will appreciate your efforts toward speedy resolution.  Please
>> > > present, within 72 hours, a plan to fix the defect within seven days of this
>> > > message.  Thanks.
>> > >
>> >
>> > I won't have time to do the bigger rewrite/reordering by then, but I can
>> > certainly commit to having the smaller updates done to cover the new
>> > functionality in less than a week. If nothing else, that'll be something
>> > for me to do on the flight over to pgconf.us.
>>
>> Thanks for that plan; it sounds good.
>
>
> Here's a suggested patch.
>
> There is some duplication between the non-exclusive and exclusive backup
> sections, but I wanted to make sure that each set of instructions can just
> be followed top-to-bottom.
>
> I've also removed some tips that aren't really necessary as part of the
> step-by-step instructions in order to keep things from exploding in size.
>
> Finally, I've changed references to "backup dump" to just be "backup",
> because it's confusing to call them something with dumps in when it's not
> pg_dump. Enough that I got partially confused myself while editing...
>
> Comments?

+    Low level base backups can be made in a non-exclusive or an exclusive
+    way. The non-exclusive method is recommended and the exclusive one will
+    at some point be deprecated and removed.

I don't object to adding a non-exclusive mode of low level backup,
but I disagree with marking the exclusive backup as deprecated, at least
until we can alleviate some of the pain that the non-exclusive mode causes.

Note that we have not marked them as deprecated. We're just giving warnings that they will be deprecated.
 

One example of the pain, in a non-exclusive backup, we need to keep
the IDLE connection which was used to execute pg_start_backup(),
until the end of backup. Of course a backup can take a very
long time. In this case the IDLE connection also needs to remain
for such a long time. If it's accidentally terminated (e.g., because
of IDLE connection), the backup fails and needs to be taken again
from the beginning.


Yes, it's a bit annoying. But it's something you can handle, unlike the problems that exist with an exclusive backup, which you *cannot* handle from the backup script/commands.

Another pain in a non-exclusive backup is to have to execute both
pg_start_backup() and pg_stop_backup() on the same connection.

Pretty sure that's the same one you just listed?

 
Please imagine the case where psql is used to execute those two
backup functions (I believe that there are many users who do this).
For example,

    psql -c "SELECT pg_start_backup()"
    rsync, cp, tar, storage backup, or something
    psql -c "SELECT pg_stop_backup()"

A non-exclusive backup breaks the above very simple steps because
two backup functions are executed on different connections.
So, how should we modify the steps for a non-exclusive backup?

You should at least not use psql in that way. You need to open psql in a pipe and drive it through that. Or use a more appropriate client.


Basically we need to pause psql after pg_start_backup(), signal it
to resume after the copy of database cluster is taken, and make
it execute pg_stop_backup(). I'm afraid that the backup script
will be complicated because of this pain of non-exclusive backup.

Yes, if you insist on using psql - which is not a good tool for this - then yes. But you could for example make it a psql script that does something along the lines of:

    SELECT pg_start_backup();
    \! rsync ...
    SELECT pg_stop_backup();

Or as said above, drive it through a pipe instead. Or use an appropriate client library.
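
As a rough illustration of the client-library route, here is a minimal Python sketch. It assumes a DB-API connection object (for example one opened with psycopg2, ideally with autocommit enabled) and the 9.6 signatures pg_start_backup(label, fast, exclusive) and pg_stop_backup(exclusive). The function name run_non_exclusive_backup, the copy_files callable and the default label are hypothetical names made up for this example, not anything from the patch.

```python
def run_non_exclusive_backup(conn, copy_files, label="example-backup"):
    """Run start / copy / stop on ONE session and return the stop row.

    conn       -- an open DB-API connection (assumption: e.g. psycopg2)
    copy_files -- hypothetical callable that copies the data directory
                  (rsync, tar, storage snapshot, ...)
    """
    cur = conn.cursor()
    # Third argument false selects the non-exclusive mode.
    cur.execute("SELECT pg_start_backup(%s, false, false)", (label,))
    cur.fetchone()

    # The session stays open (and idle) while the copy runs.
    copy_files()

    # Must be issued on the same connection that started the backup.
    cur.execute("SELECT * FROM pg_stop_backup(false)")
    return cur.fetchone()
```

If copy_files raises, the exception propagates and pg_stop_backup() is never called, so once the connection goes away the backup is simply abandoned rather than left half-finished on the server.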

The bottom line is yes, it makes it a bit worse. But it is something you can handle quite simply from your client. And when you do, it will be *right*.

You have no way at all to fix the problem if your server crashes during an exclusive backup, for example, no matter how you connect to the database, because that's not where that problem is.

+     The <function>pg_stop_backup</> will return one row with three
+     values. The second of these fields should be written to a file named
+     <filename>backup_label</> in the root directory of the backup. The
+     third field should be written to a file named
+     <filename>tablespace_map</> unless the field is empty.

How should we write those two values to different files when
we execute pg_stop_backup() via psql? Should the whole output of
pg_stop_backup() be written to a temporary file, and then
filtered and written to two different files by
using some Linux commands? This also seems to make the backup
script more complicated.

Yes, you would have to do that. And yes, psql is not a good tool for this. So the solution is to not force yourself to use a tool that doesn't work well for this problem. But if you do, then yes, you will have to go through some extra pain to handle it.
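
With a client library the splitting is a few lines. A minimal Python sketch, assuming the row returned by pg_stop_backup() arrives as a three-element tuple in the order described in the patch text (first value unused here, second is the backup label contents, third is the tablespace map contents). The helper name write_backup_files is hypothetical.

```python
import os


def write_backup_files(stop_row, backup_dir):
    """Write backup_label (and tablespace_map, if non-empty) into backup_dir.

    stop_row -- the single row returned by pg_stop_backup(false),
                assumed to be a 3-tuple of strings
    Returns the list of file names that were written.
    """
    _first, label_text, spcmap_text = stop_row

    with open(os.path.join(backup_dir, "backup_label"), "w") as f:
        f.write(label_text)
    written = ["backup_label"]

    # The third field is only written out when it is not empty.
    if spcmap_text:
        with open(os.path.join(backup_dir, "tablespace_map"), "w") as f:
            f.write(spcmap_text)
        written.append("tablespace_map")

    return written
```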
