Thread: Allowing multiple concurrent base backups

Allowing multiple concurrent base backups

From
Heikki Linnakangas
Date:
Now that we have a basic over-the-wire base backup capability in
walsender, it would be nice to allow taking multiple base backups at the
same time. It might not seem very useful at first, but it makes it
easier to set up standbys for small databases. At the moment, if you
want to set up two standbys, you have to either take a single base
backup and distribute it to both standbys, or somehow coordinate that
they don't try to take the base backup at the same time. Also, you don't
want initializing a standby to conflict with a nightly backup cron script.

So, this patch modifies the internal do_pg_start/stop_backup functions
so that in addition to the traditional mode of operation, where a
backup_label file is created in the data directory and backed up along
with all other files, the backup label file can be returned to the
caller, and the caller is responsible for including it in the backup.
The code in replication/basebackup.c includes it in the tar file that's
streamed to the client, as "backup_label".
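
To illustrate the injection idea with a standalone sketch (this is
illustrative only, not the patch text - the real code uses the
backend's own tar routines in replication/basebackup.c): it amounts to
emitting a tar header for a member that never exists on disk, followed
by the contents padded to a 512-byte boundary:

    #include <stdio.h>
    #include <string.h>

    static void
    tar_inject(FILE *out, const char *name, const char *data, size_t len)
    {
        char        hdr[512];
        char        zeros[512];
        unsigned long sum = 0;
        int         i;

        memset(hdr, 0, sizeof(hdr));
        memset(zeros, 0, sizeof(zeros));
        strncpy(hdr, name, 99);                    /* member name */
        sprintf(&hdr[100], "%07o", 0600);          /* mode */
        sprintf(&hdr[108], "%07o", 0);             /* uid */
        sprintf(&hdr[116], "%07o", 0);             /* gid */
        sprintf(&hdr[124], "%011lo", (unsigned long) len);  /* size */
        sprintf(&hdr[136], "%011lo", 0UL);         /* mtime */
        memset(&hdr[148], ' ', 8);                 /* spaces while summing */
        hdr[156] = '0';                            /* typeflag: regular file */
        for (i = 0; i < 512; i++)
            sum += (unsigned char) hdr[i];
        sprintf(&hdr[148], "%06lo", sum);          /* header checksum */

        fwrite(hdr, 1, 512, out);
        fwrite(data, 1, len, out);
        if (len % 512 != 0)                        /* pad to block boundary */
            fwrite(zeros, 1, 512 - (len % 512), out);
    }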

To make that safe, I've changed forcePageWrites into an integer.
Whenever a backup is started, it's incremented, and when one ends, it's
decremented. When forcePageWrites == 0, there's no backup in progress.
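
For illustration, the bookkeeping looks roughly like this (lock and
field names follow xlog.c conventions, but treat the details as a
sketch rather than the patch text):

    LWLockAcquire(WALInsertLock, LW_EXCLUSIVE);
    XLogCtl->Insert.forcePageWrites++;      /* a backup is starting */
    LWLockRelease(WALInsertLock);

    /* ... the backup runs ... */

    LWLockAcquire(WALInsertLock, LW_EXCLUSIVE);
    Assert(XLogCtl->Insert.forcePageWrites > 0);
    XLogCtl->Insert.forcePageWrites--;      /* this backup is done */
    LWLockRelease(WALInsertLock);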

The user-visible pg_start_backup() function is not changed. You can only
have one backup started that way in progress at a time. But you can run
streaming base backups concurrently with a backup started the
traditional way with pg_start_backup().

I implemented this in two ways, and can't decide which I like better:

1. The contents of the backup label file are returned to the caller of
do_pg_start_backup() as a palloc'd string.

2. do_pg_start_backup() creates a temporary file that the backup label
is written to (instead of "backup_label").

Implementation 1 changes more code, as pg_start/stop_backup() need to be
changed to write/read from memory instead of file, but the result isn't
any more complicated. Nevertheless, I somehow feel more comfortable with 2.
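
(For reference, approach 1's interface is roughly the following; the
parameter name here is a sketch, not necessarily the patch text:

    /*
     * On return, *labelfile points to a palloc'd copy of the backup
     * label contents.  pg_start_backup() writes it out as
     * $PGDATA/backup_label; a streaming base backup injects it into
     * the tar stream instead.
     */
    XLogRecPtr do_pg_start_backup(const char *backupidstr, bool fast,
                                  char **labelfile);
)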

Patches for both approaches attached. They're also available in my
github repository at git@github.com:hlinnaka/postgres.git.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Attachment

Re: Allowing multiple concurrent base backups

From
Tom Lane
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> I implemented this in two ways, and can't decide which I like better:

> 1. The contents of the backup label file are returned to the caller of 
> do_pg_start_backup() as a palloc'd string.

> 2. do_pg_start_backup() creates a temporary file that the backup label 
> is written to (instead of "backup_label").

> Implementation 1 changes more code, as pg_start/stop_backup() need to be 
> changed to write/read from memory instead of file, but the result isn't 
> any more complicated. Nevertheless, I somehow feel more comfortable with 2.

Seems like either one of these is fairly problematic in that you have to
have some monstrous kluge to get the backup_label file to appear with
the right name in the tarfile.  How badly do we actually need this?
I don't think the use-case for concurrent base backups is all that large
in practice given the I/O hit it's going to involve.
        regards, tom lane


Re: Allowing multiple concurrent base backups

From
Magnus Hagander
Date:
On Tue, Jan 11, 2011 at 19:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> I implemented this in two ways, and can't decide which I like better:
>
>> 1. The contents of the backup label file are returned to the caller of
>> do_pg_start_backup() as a palloc'd string.
>
>> 2. do_pg_start_backup() creates a temporary file that the backup label
>> is written to (instead of "backup_label").
>
>> Implementation 1 changes more code, as pg_start/stop_backup() need to be
>> changed to write/read from memory instead of file, but the result isn't
>> any more complicated. Nevertheless, I somehow feel more comfortable with 2.
>
> Seems like either one of these is fairly problematic in that you have to
> have some monstrous kluge to get the backup_label file to appear with
> the right name in the tarfile.  How badly do we actually need this?
> I don't think the use-case for concurrent base backups is all that large
> in practice given the I/O hit it's going to involve.

I think it can be done more cleanly in the tar file injection - I've
been chatting with Heikki offlist about that. Not sure yet, but I have
a feeling it can.

As for the use-case, it doesn't necessarily involve a huge I/O hit if
either the entire DB is in RAM (not all that uncommon - though then
the backup finishes quickly as well), or if the *client* is slow, thus
making the server backlogged on sending the base backup.

Making it possible to do it concurrently also makes the feature *much*
easier to use. The overlap might be just a few seconds, for example -
which would probably have no major effect, but takes a lot more code on
the other end to work around.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: Allowing multiple concurrent base backups

From
Heikki Linnakangas
Date:
On 11.01.2011 20:51, Tom Lane wrote:
> Heikki Linnakangas<heikki.linnakangas@enterprisedb.com>  writes:
>> I implemented this in two ways, and can't decide which I like better:
>
>> 1. The contents of the backup label file are returned to the caller of
>> do_pg_start_backup() as a palloc'd string.
>
>> 2. do_pg_start_backup() creates a temporary file that the backup label
>> is written to (instead of "backup_label").
>
>> Implementation 1 changes more code, as pg_start/stop_backup() need to be
>> changed to write/read from memory instead of file, but the result isn't
>> any more complicated. Nevertheless, I somehow feel more comfortable with 2.
>
> Seems like either one of these is fairly problematic in that you have to
> have some monstrous kluge to get the backup_label file to appear with
> the right name in the tarfile.

Oh. I'm surprised you feel that way - that part didn't feel ugly or 
kludgey at all to me.

>  How badly do we actually need this?
> I don't think the use-case for concurrent base backups is all that large
> in practice given the I/O hit it's going to involve.

It makes it very convenient to set up standbys, without having to worry 
that you'll conflict e.g. with a nightly backup. I don't imagine people 
will use streaming base backups for very large databases anyway.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com


Re: Allowing multiple concurrent base backups

From
Josh Berkus
Date:
> It makes it very convenient to set up standbys, without having to worry
> that you'll conflict e.g with a nightly backup. I don't imagine people
> will use streaming base backups for very large databases anyway.

Also, imagine that you're provisioning a 10-node replication cluster on
EC2.  This would make that worlds easier.

--
  -- Josh Berkus
     PostgreSQL Experts Inc.
     http://www.pgexperts.com


Re: Allowing multiple concurrent base backups

From
Tom Lane
Date:
Magnus Hagander <magnus@hagander.net> writes:
> On Tue, Jan 11, 2011 at 19:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Seems like either one of these is fairly problematic in that you have to
>> have some monstrous kluge to get the backup_label file to appear with
>> the right name in the tarfile.  How badly do we actually need this?
>> I don't think the use-case for concurrent base backups is all that large
>> in practice given the I/O hit it's going to involve.

> I think it can be done cleaner in the tar file injection - I've been
> chatting with Heikki offlist about that. Not sure, but I have a
> feeling it does.

One point that I'm particularly interested to see how you'll kluge it
is ensuring that the tarball contains only the desired temp data and not
also the "real" $PGDATA/backup_label, should there be a normal base
backup being done concurrently with the streamed one.

The whole thing just seems too fragile and dangerous to be worth dealing
with given that actual usage will be a corner case.  *I* sure wouldn't
trust it to work when the chips were down.
        regards, tom lane


Re: Allowing multiple concurrent base backups

From
Heikki Linnakangas
Date:
On 11.01.2011 20:17, Heikki Linnakangas wrote:
> Patches for both approaches attached. They're also available in my
> github repository at git@github.com:hlinnaka/postgres.git.

Just so people won't report the same issues again, a couple of bugs have 
already cropped up (thanks Magnus):

* a backup_label file in the data directory should now not be tarred up 
- we're injecting a different backup_label file there.

* pg_start_backup_callback needs to be updated to just decrement 
forcePageWrites, not reset it to 'false'

* pg_stop_backup() decrements forcePageWrites twice. oops.
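
For the second one, the callback should end up looking something like
this (a sketch, assuming the counter form of forcePageWrites):

    static void
    pg_start_backup_callback(int code, Datum arg)
    {
        /* Error during pg_start_backup: undo only our own increment. */
        LWLockAcquire(WALInsertLock, LW_EXCLUSIVE);
        Assert(XLogCtl->Insert.forcePageWrites > 0);
        XLogCtl->Insert.forcePageWrites--;
        LWLockRelease(WALInsertLock);
    }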

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com


Re: Allowing multiple concurrent base backups

From
Dimitri Fontaine
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

> Now that we have a basic over-the-wire base backup capability in walsender,
> it would be nice to allow taking multiple base backups at the same time.

I would prefer to be able to take a base backup from a standby, or is
that already possible?  What about cascading the wal shipping?  Maybe
those ideas are much bigger projects, but if I don't ask, I don't get to
know :)

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support


Re: Allowing multiple concurrent base backups

From
Heikki Linnakangas
Date:
On 11.01.2011 21:50, Dimitri Fontaine wrote:
> Heikki Linnakangas<heikki.linnakangas@enterprisedb.com>  writes:
>
>> Now that we have a basic over-the-wire base backup capability in walsender,
>> it would be nice to allow taking multiple base backups at the same time.
>
> I would prefer to be able to take a base backup from a standby, or is
> that already possible?  What about cascading the wal shipping?  Maybe
> those ideas are much bigger projects, but if I don't ask, I don't get to
> know :)

Yeah, those are bigger projects..

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com


Re: Allowing multiple concurrent base backups

From
Jeff Davis
Date:
On Tue, 2011-01-11 at 20:17 +0200, Heikki Linnakangas wrote:
> So, this patch modifies the internal do_pg_start/stop_backup functions 
> so that in addition to the traditional mode of operation, where a 
> backup_label file is created in the data directory where it's backed up 
> along with all other files, the backup label file is be returned to the 
> caller, and the caller is responsible for including it in the backup. 
> The code in replication/basebackup.c includes it in the tar file that's 
> streamed the client, as "backup_label".

Perhaps we can use this more intelligent form of base backup to
differentiate between:a. a primary that has crashed while a backup was in progress; andb. an online backup that is
beingrestored.
 

Allowing the user to do an unrestricted file copy as a base backup
doesn't allow us to make that differentiation. That led to the two bugs
that we fixed in StartupXLOG(). And right now there are still two
problems remaining (albeit less severe):

  1. If it's a primary recovering from a crash, and there is a
backup_label file, and the WAL referenced in the backup_label exists,
then it does a bunch of extra work during recovery; and
  2. In the same situation, if the WAL referenced in the backup_label
does not exist, then it PANICs with a HINT to tell you to remove the
backup_label.

Is this an opportunity to solve these problems and simplify the code?

Regards,
    Jeff Davis



Re: Allowing multiple concurrent base backups

From
Heikki Linnakangas
Date:
On 11.01.2011 22:16, Jeff Davis wrote:
> On Tue, 2011-01-11 at 20:17 +0200, Heikki Linnakangas wrote:
>> So, this patch modifies the internal do_pg_start/stop_backup functions
>> so that in addition to the traditional mode of operation, where a
>> backup_label file is created in the data directory where it's backed up
>> along with all other files, the backup label file is be returned to the
>> caller, and the caller is responsible for including it in the backup.
>> The code in replication/basebackup.c includes it in the tar file that's
>> streamed the client, as "backup_label".
>
> Perhaps we can use this more intelligent form of base backup to
> differentiate between:
>   a. a primary that has crashed while a backup was in progress; and
>   b. an online backup that is being restored.
>
> Allowing the user to do an unrestricted file copy as a base backup
> doesn't allow us to make that differentiation. That lead to the two bugs
> that we fixed in StartupXLOG(). And right now there are still two
> problems remaining (albeit less severe):
>
>   1. If it's a primary recovering from a crash, and there is a
> backup_label file, and the WAL referenced in the backup_label exists,
> then it does a bunch of extra work during recovery; and
>   2. In the same situation, if the WAL referenced in the backup_label
> does not exist, then it PANICs with a HINT to tell you to remove the
> backup_label.
>
> Is this an opportunity to solve these problems and simplify the code?

It won't change the situation for pg_start_backup(), but with the patch 
the base backups done via streaming won't have those issues, because 
backup_label is not created (with that name) in the master.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com


Re: Allowing multiple concurrent base backups

From
Jeff Davis
Date:
On Tue, 2011-01-11 at 22:56 +0200, Heikki Linnakangas wrote:
> >   1. If it's a primary recovering from a crash, and there is a
> > backup_label file, and the WAL referenced in the backup_label exists,
> > then it does a bunch of extra work during recovery; and
> >   2. In the same situation, if the WAL referenced in the backup_label
> > does not exist, then it PANICs with a HINT to tell you to remove the
> > backup_label.
> >
> > Is this an opportunity to solve these problems and simplify the code?
> 
> It won't change the situation for pg_start_backup(), but with the patch 
> the base backups done via streaming won't have those issues, because 
> backup_label is not created (with that name) in the master.

Do you think we should change the backup protocol for normal base
backups to try to fix this? Or do you think that the simplicity of
unrestricted file copy is worth these problems?

We could probably make some fairly minor changes, like making a file on
the primary and telling users to exclude it from any base backup. The
danger, of course, is that they do copy it, and their backup is
compromised.

Regards,
    Jeff Davis




Re: Allowing multiple concurrent base backups

From
Magnus Hagander
Date:
On Tue, Jan 11, 2011 at 22:51, Jeff Davis <pgsql@j-davis.com> wrote:
> On Tue, 2011-01-11 at 22:56 +0200, Heikki Linnakangas wrote:
>> >   1. If it's a primary recovering from a crash, and there is a
>> > backup_label file, and the WAL referenced in the backup_label exists,
>> > then it does a bunch of extra work during recovery; and
>> >   2. In the same situation, if the WAL referenced in the backup_label
>> > does not exist, then it PANICs with a HINT to tell you to remove the
>> > backup_label.
>> >
>> > Is this an opportunity to solve these problems and simplify the code?
>>
>> It won't change the situation for pg_start_backup(), but with the patch
>> the base backups done via streaming won't have those issues, because
>> backup_label is not created (with that name) in the master.
>
> Do you think we should change the backup protocol for normal base
> backups to try to fix this? Or do you think that the simplicity of
> unrestricted file copy is worth these problems?
>
> We could probably make some fairly minor changes, like making a file on
> the primary and telling users to exclude it from any base backup. The
> danger, of course, is that they do copy it, and their backup is
> compromised.


I think keeping the flexibility is important. If it does add an extra
step I think that's ok once we have pg_basebackup, but it must be
reasonably *safe*. Corrupt backups from forgetting to exclude a file
seems not so.

But if the problem is you forgot to exclude it, can't you just remove
it at a later time?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: Allowing multiple concurrent base backups

From
Robert Haas
Date:
On Jan 11, 2011, at 2:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The whole thing just seems too fragile and dangerous to be worth dealing
> with given that actual usage will be a corner case.  *I* sure wouldn't
> trust it to work when the chips were down.

I hope this assessment proves to be incorrect, because like Magnus and
Heikki, I think this will be a major usability improvement. It takes us
from "there's a way to do that" to "it just works".

...Robert

Re: Allowing multiple concurrent base backups

From
David Fetter
Date:
On Tue, Jan 11, 2011 at 05:06:34PM -0500, Robert Haas wrote:
> On Jan 11, 2011, at 2:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > The whole thing just seems too fragile and dangerous to be worth dealing
> > with given that actual usage will be a corner case.  *I* sure wouldn't
> > trust it to work when the chips were down.
> 
> I hope this assessment proves to be incorrect, because like Magnus and
> Heikki, I think this will be a major usability improvement. It takes us
> from "there's a way to do that" to "it just works".

(It just works)++ :)

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: Allowing multiple concurrent base backups

From
Jeff Davis
Date:
On Tue, 2011-01-11 at 23:07 +0100, Magnus Hagander wrote: 
> I think keeping the flexibility is important. If it does add an extra
> step I think that's ok once we have pg_basebackup, but it must be
> reasonably *safe*. Corrupt backups from forgetting to exclude a file
> seems not so.

Agreed.

> But if the problem is you forgot to exclude it, can't you just remove
> it at a later time?

If you think you are recovering the primary, and it's really the backup,
then you get corruption. It's too late to remove a file after that
(unless you have a backup of your backup ;) ).

If you think you are restoring a backup, and it's really a primary that
crashed, then you run into one of the two problems that I mentioned
(which are less severe than corruption, but very annoying).

Regards,
    Jeff Davis



Re: Allowing multiple concurrent base backups

From
Cédric Villemain
Date:
2011/1/11 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>:
> On 11.01.2011 21:50, Dimitri Fontaine wrote:
>>
>> Heikki Linnakangas<heikki.linnakangas@enterprisedb.com>  writes:
>>
>>> Now that we have a basic over-the-wire base backup capability in
>>> walsender,
>>> it would be nice to allow taking multiple base backups at the same time.
>>
>> I would prefer to be able to take a base backup from a standby, or is
>> that already possible?  What about cascading the wal shipping?  Maybe
>> those ideas are much bigger projects, but if I don't ask, I don't get to
>> know :)
>
> Yeah, those are bigger projects..

Way more interesting, IMHO.
The feature to allow multiple backups at the same time looks useless to me.
If the DB is small, then it is not that hard to do it before or after a
possible nightly backup.
If the DB is large, is it even necessary to comment?

The only case I find interesting is if you can use the bandwidth of
more than one ethernet device; otherwise I would use rsync between nodes
after the initial base backup, pretty sure.

>
> --
>  Heikki Linnakangas
>  EnterpriseDB   http://www.enterprisedb.com
>



--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support


Re: Allowing multiple concurrent base backups

From
Heikki Linnakangas
Date:
On 11.01.2011 23:51, Jeff Davis wrote:
> On Tue, 2011-01-11 at 22:56 +0200, Heikki Linnakangas wrote:
>>>    1. If it's a primary recovering from a crash, and there is a
>>> backup_label file, and the WAL referenced in the backup_label exists,
>>> then it does a bunch of extra work during recovery; and
>>>    2. In the same situation, if the WAL referenced in the backup_label
>>> does not exist, then it PANICs with a HINT to tell you to remove the
>>> backup_label.
>>>
>>> Is this an opportunity to solve these problems and simplify the code?
>>
>> It won't change the situation for pg_start_backup(), but with the patch
>> the base backups done via streaming won't have those issues, because
>> backup_label is not created (with that name) in the master.
>
> Do you think we should change the backup protocol for normal base
> backups to try to fix this? Or do you think that the simplicity of
> unrestricted file copy is worth these problems?
>
> We could probably make some fairly minor changes, like making a file on
> the primary and telling users to exclude it from any base backup. The
> danger, of course, is that they do copy it, and their backup is
> compromised.

Yeah, I don't think it's a good idea to provide such a foot-gun.

Last time we discussed this there was the idea of creating a file in 
$PGDATA, and removing read-permissions from it, so that it couldn't be 
accidentally included in the backup. Even with that safeguard, it 
doesn't feel very safe - a backup running as root or some other special 
privileges might still be able to read it.
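
That is, something like this (a sketch; the file name is made up for
illustration):

    #include <fcntl.h>

    /* Create the marker with no read permission, so an unprivileged
     * file-copying backup cannot pick it up. */
    int fd = open("backup_in_progress", O_CREAT | O_EXCL | O_WRONLY, 0000);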

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com


Re: Allowing multiple concurrent base backups

From
marcin mank
Date:
On Tue, Jan 11, 2011 at 8:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> On Tue, Jan 11, 2011 at 19:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Seems like either one of these is fairly problematic in that you have to
>>> have some monstrous kluge to get the backup_label file to appear with
>>> the right name in the tarfile.  How badly do we actually need this?
>>> I don't think the use-case for concurrent base backups is all that large
>>> in practice given the I/O hit it's going to involve.
>
>> I think it can be done cleaner in the tar file injection - I've been
>> chatting with Heikki offlist about that. Not sure, but I have a
>> feeling it does.
>
> One point that I'm particularly interested to see how you'll kluge it
> is ensuring that the tarball contains only the desired temp data and not
> also the "real" $PGDATA/backup_label, should there be a normal base
> backup being done concurrently with the streamed one.
>
> The whole thing just seems too fragile and dangerous to be worth dealing
> with given that actual usage will be a corner case.  *I* sure wouldn't
> trust it to work when the chips were down.
>

Maybe if pg_start_backup() notices that there is another backup
running, it should block, waiting for the other session to run
pg_stop_backup()? Or have a new function like pg_start_backup_wait()?

Considering that parallel base backups would be IO-bound (or
network-bound), there is little need to actually run them in parallel.

Greetings
Marcin


Re: Allowing multiple concurrent base backups

From
David Fetter
Date:
On Wed, Jan 12, 2011 at 10:26:05AM +0100, marcin mank wrote:
> On Tue, Jan 11, 2011 at 8:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Magnus Hagander <magnus@hagander.net> writes:
> >> On Tue, Jan 11, 2011 at 19:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >>> Seems like either one of these is fairly problematic in that you have to
> >>> have some monstrous kluge to get the backup_label file to appear with
> >>> the right name in the tarfile.  How badly do we actually need this?
> >>> I don't think the use-case for concurrent base backups is all that large
> >>> in practice given the I/O hit it's going to involve.
> >
> >> I think it can be done cleaner in the tar file injection - I've been
> >> chatting with Heikki offlist about that. Not sure, but I have a
> >> feeling it does.
> >
> > One point that I'm particularly interested to see how you'll kluge it
> > is ensuring that the tarball contains only the desired temp data and not
> > also the "real" $PGDATA/backup_label, should there be a normal base
> > backup being done concurrently with the streamed one.
> >
> > The whole thing just seems too fragile and dangerous to be worth dealing
> > with given that actual usage will be a corner case.  *I* sure wouldn't
> > trust it to work when the chips were down.
> 
> Maybe if pg_start_backup() notices that there is another backup
> running should block waiting for another session to run
> pg_stop_backup() ? Or have a new function like pg_start_backup_wait()
> ?
> 
> Considering that parallell base backups would be io-bound (or
> network-bound), there is little need to actually run them in parallell

That's not actually true.  Backups at the moment are CPU-bound, and
running them in parallel is one way to make them closer to I/O-bound,
which is what they *should* be.

There are other proposals out there, and some work being done, to make
backups less dependent on CPU, among them:

- Making the on-disk representation smaller
- Making COPY more efficient

As far as I know, none of this work is public yet.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: Allowing multiple concurrent base backups

From
Heikki Linnakangas
Date:
On 12.01.2011 17:15, David Fetter wrote:
> On Wed, Jan 12, 2011 at 10:26:05AM +0100, marcin mank wrote:
>> Considering that parallell base backups would be io-bound (or
>> network-bound), there is little need to actually run them in parallell
>
> That's not actually true.  Backups at the moment are CPU-bound, and
> running them in parallel is one way to make them closer to I/O-bound,
> which is what they *should* be.

That's a different kind of "parallel". We're talking about taking 
multiple backups in parallel, each using one process, and you're talking 
about taking one backup using multiple parallel processes or threads.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com


Re: Allowing multiple concurrent base backups

From
David Fetter
Date:
On Wed, Jan 12, 2011 at 05:17:38PM +0200, Heikki Linnakangas wrote:
> On 12.01.2011 17:15, David Fetter wrote:
> >On Wed, Jan 12, 2011 at 10:26:05AM +0100, marcin mank wrote:
> >>Considering that parallell base backups would be io-bound (or
> >>network-bound), there is little need to actually run them in parallell
> >
> >That's not actually true.  Backups at the moment are CPU-bound, and
> >running them in parallel is one way to make them closer to I/O-bound,
> >which is what they *should* be.
> 
> That's a different kind of "parallel". We're talking about taking
> multiple backups in parallel, each using one process, and you're
> talking about taking one backup using multiple parallel processes or
> threads.

Good point.  The idea that IO/network bandwidth is always saturated by
one backup just isn't true, though.  Take the case of multiple
independent NICs, e.g.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: Allowing multiple concurrent base backups

From
Aidan Van Dyk
Date:
On Wed, Jan 12, 2011 at 10:15 AM, David Fetter <david@fetter.org> wrote:

>> Considering that parallell base backups would be io-bound (or
>> network-bound), there is little need to actually run them in parallell
>
> That's not actually true.  Backups at the moment are CPU-bound, and
> running them in parallel is one way to make them closer to I/O-bound,
> which is what they *should* be.

Remember, we're talking about filesystem base backups here.  If your
CPU can't handle a stream from disk -> network, byte for byte (maybe
encrypting it), then you've spent *WAAAAY* too much on your storage
sub-system, and way too little on CPU.

I can see trying to parallelize the base backup such that each
table-space could be run concurrently, but that's about it.

> There are other proposals out there, and some work being done, to make
> backups less dependent on CPU, among them:
>
> - Making the on-disk representation smaller
> - Making COPY more efficient
>
> As far as I know, none of this work is public yet.

pg_dump is another story.  But it's not related to base backups for
PIT Recovery/Replication.

a.

--
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.


Re: Allowing multiple concurrent base backups

From
Tom Lane
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> On 12.01.2011 17:15, David Fetter wrote:
>> On Wed, Jan 12, 2011 at 10:26:05AM +0100, marcin mank wrote:
>>> Considering that parallell base backups would be io-bound (or
>>> network-bound), there is little need to actually run them in parallell
>> 
>> That's not actually true.  Backups at the moment are CPU-bound, and
>> running them in parallel is one way to make them closer to I/O-bound,
>> which is what they *should* be.

> That's a different kind of "parallel". We're talking about taking 
> multiple backups in parallel, each using one process, and you're talking 
> about taking one backup using multiple parallel processes or threads.

Even more to the point, you're confusing pg_dump with a base backup.
The reason pg_dump eats a lot of CPU is primarily COPY's data conversion
and formatting requirements, none of which will happen in a base backup
(streaming or otherwise).
        regards, tom lane


Re: Allowing multiple concurrent base backups

From
David Fetter
Date:
On Wed, Jan 12, 2011 at 10:24:31AM -0500, Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> > On 12.01.2011 17:15, David Fetter wrote:
> >> On Wed, Jan 12, 2011 at 10:26:05AM +0100, marcin mank wrote:
> >>> Considering that parallell base backups would be io-bound (or
> >>> network-bound), there is little need to actually run them in
> >>> parallell
> >> 
> >> That's not actually true.  Backups at the moment are CPU-bound,
> >> and running them in parallel is one way to make them closer to
> >> I/O-bound, which is what they *should* be.
> 
> > That's a different kind of "parallel". We're talking about taking
> > multiple backups in parallel, each using one process, and you're
> > talking about taking one backup using multiple parallel processes
> > or threads.
> 
> Even more to the point, you're confusing pg_dump with a base backup.
> The reason pg_dump eats a lot of CPU is primarily COPY's data
> conversion and formatting requirements, none of which will happen in
> a base backup (streaming or otherwise).

Oops.  That'll teach me to post before coffee. :P

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: Allowing multiple concurrent base backups

From
"Ross J. Reedstrom"
Date:
On Tue, Jan 11, 2011 at 11:06:18AM -0800, Josh Berkus wrote:
> 
> > It makes it very convenient to set up standbys, without having to worry
> > that you'll conflict e.g with a nightly backup. I don't imagine people
> > will use streaming base backups for very large databases anyway.
> 
> Also, imagine that you're provisioning a 10-node replication cluster on
> EC2.  This would make that worlds easier.

Hmm, perhaps. My concern is that a naive attempt to do that is going to
have 10 base-backups happening at the same time, completely slamming the
master, and none of them completing in a reasonable time. Is this
possible, or is it that simultaneity will buy you hot caches, so that
backups #2 -> #10 all run faster?

Ross
-- 
Ross Reedstrom, Ph.D.                                 reedstrm@rice.edu
Systems Engineer & Admin, Research Scientist        phone: 713-348-6166
Connexions                  http://cnx.org            fax: 713-348-3665
Rice University MS-375, Houston, TX 77005
GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E  F888 D3AE 810E 88F0 BEDE



Re: Allowing multiple concurrent base backups

From
Robert Haas
Date:
On Thu, Jan 13, 2011 at 2:19 PM, Ross J. Reedstrom <reedstrm@rice.edu> wrote:
> On Tue, Jan 11, 2011 at 11:06:18AM -0800, Josh Berkus wrote:
>>
>> > It makes it very convenient to set up standbys, without having to worry
>> > that you'll conflict e.g with a nightly backup. I don't imagine people
>> > will use streaming base backups for very large databases anyway.
>>
>> Also, imagine that you're provisioning a 10-node replication cluster on
>> EC2.  This would make that worlds easier.
>
> Hmm, perhaps. My concern is that a naive attempt to do that is going to
> have 10 base-backups happening at the same time, completely slamming the
> master, and none of them completing is a reasonable time. Is this
> possible, or is it that simultaneity will buy you hot caches and backup
> #2 -> #10 all run faster?

That's going to depend on the situation.  If the database fits in
memory, then it's just going to work.  If it fits on disk, it's less
obvious whether it'll be good or bad, but an arbitrary limitation here
doesn't serve us well.

P.S. Your reply-to header is busted.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Allowing multiple concurrent base backups

From
Josh Berkus
Date:
On 1/13/11 12:11 PM, Robert Haas wrote:
> That's going to depend on the situation.  If the database fits in
> memory, then it's just going to work.  If it fits on disk, it's less
> obvious whether it'll be good or bad, but an arbitrary limitation here
> doesn't serve us well.

FWIW, if we had this feature right now in 9.0 we (PGX) would be using
it.  We run into the case of DB in memory, multiple slaves fairly often
these days.

--
  -- Josh Berkus
     PostgreSQL Experts Inc.
     http://www.pgexperts.com


Re: Allowing multiple concurrent base backups

From
Heikki Linnakangas
Date:
On 13.01.2011 22:57, Josh Berkus wrote:
> On 1/13/11 12:11 PM, Robert Haas wrote:
>> That's going to depend on the situation.  If the database fits in
>> memory, then it's just going to work.  If it fits on disk, it's less
>> obvious whether it'll be good or bad, but an arbitrary limitation here
>> doesn't serve us well.
>
> FWIW, if we had this feature right now in 9.0 we (PGX) would be using
> it.  We run into the case of DB in memory, multiple slaves fairly often
> these days.

Anyway, here's an updated patch with all the known issues fixed.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Attachment

Re: Allowing multiple concurrent base backups

From
Magnus Hagander
Date:
On Thu, Jan 13, 2011 at 22:32, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> On 13.01.2011 22:57, Josh Berkus wrote:
>>
>> On 1/13/11 12:11 PM, Robert Haas wrote:
>>>
>>> That's going to depend on the situation.  If the database fits in
>>> memory, then it's just going to work.  If it fits on disk, it's less
>>> obvious whether it'll be good or bad, but an arbitrary limitation here
>>> doesn't serve us well.
>>
>> FWIW, if we had this feature right now in 9.0 we (PGX) would be using
>> it.  We run into the case of DB in memory, multiple slaves fairly often
>> these days.
>
> Anyway, here's an updated patch with all the known issues fixed.

It's kind of crude that do_pg_stop_backup() has to parse the backup
label with sscanf() when it's coming from a non-exclusive backup - we
could pass the location as a parameter instead, to make it cleaner.
There might be a point to it being the same in both codepaths, but I'm
not sure it's that important...
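
(The parse in question has roughly this shape - a sketch, not the
patch text:

    XLogRecPtr  startpoint;
    char        startxlogfilename[MAXFNAMELEN];

    if (sscanf(labelfile, "START WAL LOCATION: %X/%X (file %24s)",
               &startpoint.xlogid, &startpoint.xrecoff,
               startxlogfilename) != 3)
        elog(ERROR, "invalid data in backup label");
)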

Doesn't this code put the backup label in *every* tarfile, and not
just the main one? From what I can tell the code in
SendBackupDirectory() relies on labelfile being NULL in tar files for
"other tablespaces", but the caller never sets this.

Finally, I'd move the addition of the labelfile to its own function,
but that's just me ;)


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: Allowing multiple concurrent base backups

From
Simon Riggs
Date:
On Thu, 2011-01-13 at 23:32 +0200, Heikki Linnakangas wrote:
> On 13.01.2011 22:57, Josh Berkus wrote:
> > On 1/13/11 12:11 PM, Robert Haas wrote:
> >> That's going to depend on the situation.  If the database fits in
> >> memory, then it's just going to work.  If it fits on disk, it's less
> >> obvious whether it'll be good or bad, but an arbitrary limitation here
> >> doesn't serve us well.
> >
> > FWIW, if we had this feature right now in 9.0 we (PGX) would be using
> > it.  We run into the case of DB in memory, multiple slaves fairly often
> > these days.
> 
> Anyway, here's an updated patch with all the known issues fixed.

It's good we have this as an option and I like the way you've done this.

--
 Simon Riggs           http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services



Re: Allowing multiple concurrent base backups

From
Heikki Linnakangas
Date:
On 13.01.2011 23:32, Heikki Linnakangas wrote:
> Anyway, here's an updated patch with all the known issues fixed.

Another updated patch. Fixed bitrot, and addressed the privilege issue
Fujii-san raised here:
http://archives.postgresql.org/message-id/4D380560.3040400@enterprisedb.com.
I changed the privilege checks so that pg_start/stop_backup() functions
require superuser privileges again, but not for a base backup via the
replication protocol (replication privilege is needed to establish a
replication connection to begin with).

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Attachment

Re: Allowing multiple concurrent base backups

From
Magnus Hagander
Date:
On Mon, Jan 24, 2011 at 17:52, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> On 13.01.2011 23:32, Heikki Linnakangas wrote:
>>
>> Anyway, here's an updated patch with all the known issues fixed.
>
> Another updated patch. Fixed bitrot, and addressed the privilege issue
> Fujii-san raised here:
> http://archives.postgresql.org/message-id/4D380560.3040400@enterprisedb.com.
> I changed the privilege checks so that pg_start/stop_backup() functions
> require superuser privileges again, but not for a base backup via the
> replication protocol (replication privilege is needed to establish a
> replication connection to begin with).

I'm not entirely sure the replication privilege removal is correct.
Doing that, it's no longer possible to deploy a slave *without* using
pg_basebackup, unless you are superuser. Do we really want to put that
restriction back in?

(And if we do, the docs probably need an update...)

I can't see an explicit check for the user attempting to do
pg_stop_backup() when there is a nonexclusive backup running? Maybe
I'm reading it wrong? The case being when a user has started a backup
with pg_basebackup but then connects and manually does a
pg_stop_backup. ISTM it drops us in a codepath that just doesn't do the
decrement, but also doesn't throw an error?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: Allowing multiple concurrent base backups

From
Heikki Linnakangas
Date:
On 24.01.2011 20:22, Magnus Hagander wrote:
> On Mon, Jan 24, 2011 at 17:52, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com>  wrote:
>> Another updated patch. Fixed bitrot, and addressed the privilege issue
>> Fujii-san raised here:
>> http://archives.postgresql.org/message-id/4D380560.3040400@enterprisedb.com.
>> I changed the privilege checks so that pg_start/stop_backup() functions
>> require superuser privileges again, but not for a base backup via the
>> replication protocol (replication privilege is needed to establish a
>> replication connection to begin with).
>
> I'm not entirely sure the replication privilege removal is correct.
> Doing that, it's no longer possible to deploy a slave *without* using
> pg_basebackup, unless you are superuser. Do we really want to put that
> restriction back in?

Hmm, I thought we do, I thought that was changed just to make 
pg_basebackup work without superuser privileges. But I guess it makes 
sense apart from that. So the question is, should 'replication' 
privilege be enough to use pg_start/stop_backup(), or should we require 
superuser access for that? In any case, replication privilege will be 
enough for pg_basebackup.

Ok, I won't touch that. But then we'll need to decide what to do about 
Fujii's observation 
(http://archives.postgresql.org/pgsql-hackers/2011-01/msg01934.php):
> Both the user with REPLICATION privilege and the superuser can
> call pg_stop_backup. But only superuser can connect to the server
> to cancel online backup during shutdown. The non-superuser with
> REPLICATION privilege cannot. Is this behavior intentional? Or just
> oversight?


> I can't see an explicit check for the user ttempting to do
> pg_stop_backup() when there is a nonexclusive backup running? Maybe
> I'm reading it wrong? The case being when a user has started a backup
> with pg_basebackup but then connects and manually does a
> pg_stop_backup. ISTM it drops us ina codepath that just doesn't do the
> decrement, but also doesn't throw an error?

It throws an error later when it won't find backup_label. For better or 
worse, it's always been like that.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com


Re: Allowing multiple concurrent base backups

From
Magnus Hagander
Date:
On Mon, Jan 24, 2011 at 21:14, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> On 24.01.2011 20:22, Magnus Hagander wrote:
>> I can't see an explicit check for the user ttempting to do
>> pg_stop_backup() when there is a nonexclusive backup running? Maybe
>> I'm reading it wrong? The case being when a user has started a backup
>> with pg_basebackup but then connects and manually does a
>> pg_stop_backup. ISTM it drops us ina codepath that just doesn't do the
>> decrement, but also doesn't throw an error?
>
> It throws an error later when it won't find backup_label. For better or
> worse, it's always been like that.

Isn't that going to leave us in a broken state though? As in, a
mistaken pg_stop_backup() will decrement the counter, both breaking the
streaming base backup and also possibly throwing an assert when that
one eventually wants to do its do_pg_stop_backup()?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: Allowing multiple concurrent base backups

From
Heikki Linnakangas
Date:
On 24.01.2011 22:31, Magnus Hagander wrote:
> On Mon, Jan 24, 2011 at 21:14, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com>  wrote:
>> On 24.01.2011 20:22, Magnus Hagander wrote:
>>> I can't see an explicit check for the user ttempting to do
>>> pg_stop_backup() when there is a nonexclusive backup running? Maybe
>>> I'm reading it wrong? The case being when a user has started a backup
>>> with pg_basebackup but then connects and manually does a
>>> pg_stop_backup. ISTM it drops us ina codepath that just doesn't do the
>>> decrement, but also doesn't throw an error?
>>
>> It throws an error later when it won't find backup_label. For better or
>> worse, it's always been like that.
>
> Isn't that going to leave us in a broken state though? As in a
> mistaken pg_stop_backup() will decrement the counter both breaking the
> streaming base backup, and also possibly throwing an assert when that
> one eventually wants to do it's do_pg_stop_backup()?

Umm, no. pg_stop_backup() won't decrement the counter unless there's an
exclusive backup in operation according to the exclusiveBackup flag.

Hmm, perhaps the code would be more readable if instead of the
forcePageWrites counter that counts exclusive and non-exclusive backups,
and an exclusiveBackup boolean indicating if one of the in-progress
backups is an exclusive one, we had a counter that only counts
non-exclusive backups, plus a boolean indicating if an exclusive backup
is in progress in addition to them.
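
That is, something like this in the shared-memory bookkeeping (field
names as sketched here, to be finalized in the patch):

    bool    forcePageWrites;     /* true while any backup is in progress */
    bool    exclusiveBackup;     /* a pg_start_backup()-style backup? */
    int     nonExclusiveBackups; /* # of streaming base backups running */

with the invariant that forcePageWrites is true iff exclusiveBackup is
set or nonExclusiveBackups > 0.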

Attached is a patch for that (against master branch, including only xlog.c).

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Attachment

Re: Allowing multiple concurrent base backups

From
Fujii Masao
Date:
On Tue, Jan 25, 2011 at 5:14 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
>> I'm not entirely sure the replication privilege removal is correct.
>> Doing that, it's no longer possible to deploy a slave *without* using
>> pg_basebackup, unless you are superuser. Do we really want to put that
>> restriction back in?
>
> Hmm, I thought we do, I thought that was changed just to make pg_basebackup
> work without superuser privileges.

If we encourage users not to use the "replication" privilege to log in
to the database, putting that restriction back seems reasonable.

> Ok, I won't touch that. But then we'll need to decide what to do about
> Fujii's observation
> (http://archives.postgresql.org/pgsql-hackers/2011-01/msg01934.php):

Yes. If we allow the "replication" users to call pg_start/stop_backup,
we should also allow them to connect to the database even during
shutdown in order to cancel the backup.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Allowing multiple concurrent base backups

From
Fujii Masao
Date:
On Tue, Jan 25, 2011 at 6:02 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Hmm, perhaps the code would be more readable if instead of the
> forcePageWrites counter that counts exclusive and non-exclusive backups, and
> an exclusiveBackup boolean indicating if one of the in-progress backups is
> an exclusive one, we had a counter that only counts non-exclusive backups,
> plus a boolean indicating if an exclusive backup is in progress in addition
> to them.
>
> Attached is a patch for that (against master branch, including only xlog.c).

I read this patch and the previously posted one. Those look good.

Comments:

+ * do_pg_start_backup is the workhorse of the user-visible pg_stop_backup()
+ * function.

Typo: s/do_pg_start_backup/do_pg_stop_backup

It would be helpful to explain this behavior in pg_basebackup.sgml or elsewhere.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Allowing multiple concurrent base backups

From
Fujii Masao
Date:
On Tue, Jan 25, 2011 at 1:02 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Tue, Jan 25, 2011 at 6:02 AM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> Hmm, perhaps the code would be more readable if instead of the
>> forcePageWrites counter that counts exclusive and non-exclusive backups, and
>> an exclusiveBackup boolean indicating if one of the in-progress backups is
>> an exclusive one, we had a counter that only counts non-exclusive backups,
>> plus a boolean indicating if an exclusive backup is in progress in addition
>> to them.
>>
>> Attached is a patch for that (against master branch, including only xlog.c).
>
> I read this patch and previous-posted one. Those look good.
>
> Comments:
>
> + * do_pg_start_backup is the workhorse of the user-visible pg_stop_backup()
> + * function.
>
> Typo: s/do_pg_start_backup/do_pg_stop_backup
>
> It's helpful to explain about this behavior in pg_basebackup.sgml or elsewhere.

When I read the patch, I found that pg_stop_backup removes the backup
history file as soon as it creates the file, if archive_mode is not
enabled. This looks like an oversight. Should we prevent pg_stop_backup
from removing the fresh history file? Or should we prevent it from
creating the history file in the first place, since it's not used at
all if archiving is disabled?
(If archiving is enabled, the history file can be used to clean the
archived files up.)
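
I.e., roughly (a sketch; writeBackupHistoryFile is a hypothetical
helper standing in for the code that builds the file):

    if (XLogArchivingActive())
    {
        /* write the .backup history file and notify the archiver */
        writeBackupHistoryFile(startpoint, stoppoint);
        CleanupBackupHistory();
    }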

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Allowing multiple concurrent base backups

From
Heikki Linnakangas
Date:
On 25.01.2011 06:02, Fujii Masao wrote:
> On Tue, Jan 25, 2011 at 6:02 AM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com>  wrote:
>> Hmm, perhaps the code would be more readable if instead of the
>> forcePageWrites counter that counts exclusive and non-exclusive backups, and
>> an exclusiveBackup boolean indicating if one of the in-progress backups is
>> an exclusive one, we had a counter that only counts non-exclusive backups,
>> plus a boolean indicating if an exclusive backup is in progress in addition
>> to them.
>>
>> Attached is a patch for that (against master branch, including only xlog.c).
>
> I read this patch and previous-posted one. Those look good.
>
> Comments:
>
> + * do_pg_start_backup is the workhorse of the user-visible pg_stop_backup()
> + * function.
>
> Typo: s/do_pg_start_backup/do_pg_stop_backup
>
> It's helpful to explain about this behavior in pg_basebackup.sgml or elsewhere.

Thanks. I've committed this now, fixing that, and hopefully all the 
other issues mentioned in this thread.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com


Re: Allowing multiple concurrent base backups

From
Heikki Linnakangas
Date:
On 27.01.2011 15:15, Fujii Masao wrote:
> When I read the patch, I found that pg_stop_backup removes the backup history
> file as soon as it creates the file, if archive_mode is not enabled.
> This looks like
> oversight. We should prevent pg_stop_backup from removing the fresh history
> file? Or we should prevent pg_stop_backup from creating the history
> file from the
> beginning since it's not used at all if archiving is disabled?
> (If archiving is enabled, the history file can be used to clean the
> archived files up).

Hmm, good point. It's harmless, but creating the history file in the 
first place sure seems like a waste of time.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com


Re: Allowing multiple concurrent base backups

From
Fujii Masao
Date:
On Tue, Feb 1, 2011 at 1:31 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Hmm, good point. It's harmless, but creating the history file in the first
> place sure seems like a waste of time.

The attached patch changes pg_stop_backup so that it doesn't create
the backup history file if archiving is not enabled.

When I tested the multiple backups, I found that they can have the same
checkpoint location and the same history file name.

--------------------
$ for ((i=0; i<4; i++)); do
pg_basebackup -D test$i -c fast -x -l test$i &
done

$ cat test0/backup_label
START WAL LOCATION: 0/20000B0 (file 000000010000000000000002)
CHECKPOINT LOCATION: 0/20000E8
START TIME: 2011-02-01 12:12:31 JST
LABEL: test0

$ cat test1/backup_label
START WAL LOCATION: 0/20000B0 (file 000000010000000000000002)
CHECKPOINT LOCATION: 0/20000E8
START TIME: 2011-02-01 12:12:31 JST
LABEL: test1

$ cat test2/backup_label
START WAL LOCATION: 0/20000B0 (file 000000010000000000000002)
CHECKPOINT LOCATION: 0/20000E8
START TIME: 2011-02-01 12:12:31 JST
LABEL: test2

$ cat test3/backup_label
START WAL LOCATION: 0/20000B0 (file 000000010000000000000002)
CHECKPOINT LOCATION: 0/20000E8
START TIME: 2011-02-01 12:12:31 JST
LABEL: test3

$ ls archive/*.backup
archive/000000010000000000000002.000000B0.backup
--------------------

This would cause a serious problem, because the backup-end record
which indicates the same "START WAL LOCATION" can be written by the
first backup before the others finish. So we might wrongly think that
we've already reached a consistent state upon reading the backup-end
record (written by the first backup), before reading the last required
WAL file.

        /*
         * Force a CHECKPOINT.    Aside from being necessary to prevent torn
         * page problems, this guarantees that two successive backup runs will
         * have different checkpoint positions and hence different history
         * file names, even if nothing happened in between.
         *
         * We use CHECKPOINT_IMMEDIATE only if requested by user (via passing
         * fast = true).  Otherwise this can take awhile.
         */
        RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT |
                          (fast ? CHECKPOINT_IMMEDIATE : 0));

This problem happens because the above code (in do_pg_start_backup)
doesn't actually ensure that concurrent backups have different
checkpoint locations. ISTM that we should change the above, or something
elsewhere, to ensure that. Or we should include the backup label name in
the backup-end record, to prevent a recovery from reading a backup-end
record that is not its own.
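
With the latter idea, recovery would only accept a backup-end record
that matches its own backup, roughly like this (a sketch based on the
existing XLOG_BACKUP_END redo handling; xlrec here carries the start
location recorded at pg_start_backup time):

    if (XLByteEQ(xlrec.startpoint, ControlFile->backupStartPoint))
    {
        /* this is our backup's end record: we are consistent now */
        MemSet(&ControlFile->backupStartPoint, 0, sizeof(XLogRecPtr));
    }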

Thought?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: Allowing multiple concurrent base backups

From
Robert Haas
Date:
On Mon, Jan 31, 2011 at 10:45 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Tue, Feb 1, 2011 at 1:31 AM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> Hmm, good point. It's harmless, but creating the history file in the first
>> place sure seems like a waste of time.
>
> The attached patch changes pg_stop_backup so that it doesn't create
> the backup history file if archiving is not enabled.
>
> When I tested the multiple backups, I found that they can have the same
> checkpoint location and the same history file name.
>
> --------------------
> $ for ((i=0; i<4; i++)); do
> pg_basebackup -D test$i -c fast -x -l test$i &
> done
>
> $ cat test0/backup_label
> START WAL LOCATION: 0/20000B0 (file 000000010000000000000002)
> CHECKPOINT LOCATION: 0/20000E8
> START TIME: 2011-02-01 12:12:31 JST
> LABEL: test0
>
> $ cat test1/backup_label
> START WAL LOCATION: 0/20000B0 (file 000000010000000000000002)
> CHECKPOINT LOCATION: 0/20000E8
> START TIME: 2011-02-01 12:12:31 JST
> LABEL: test1
>
> $ cat test2/backup_label
> START WAL LOCATION: 0/20000B0 (file 000000010000000000000002)
> CHECKPOINT LOCATION: 0/20000E8
> START TIME: 2011-02-01 12:12:31 JST
> LABEL: test2
>
> $ cat test3/backup_label
> START WAL LOCATION: 0/20000B0 (file 000000010000000000000002)
> CHECKPOINT LOCATION: 0/20000E8
> START TIME: 2011-02-01 12:12:31 JST
> LABEL: test3
>
> $ ls archive/*.backup
> archive/000000010000000000000002.000000B0.backup
> --------------------
>
> This would cause a serious problem. Because the backup-end record
> which indicates the same "START WAL LOCATION" can be written by the
> first backup before the other finishes. So we might think wrongly that
> we've already reached a consistency state by reading the backup-end
> record (written by the first backup) before reading the last required WAL
> file.
>
>                /*
>                 * Force a CHECKPOINT.  Aside from being necessary to prevent torn
>                 * page problems, this guarantees that two successive backup runs will
>                 * have different checkpoint positions and hence different history
>                 * file names, even if nothing happened in between.
>                 *
>                 * We use CHECKPOINT_IMMEDIATE only if requested by user (via passing
>                 * fast = true).  Otherwise this can take awhile.
>                 */
>                RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT |
>                                                  (fast ? CHECKPOINT_IMMEDIATE : 0));
>
> This problem happens because the above code (in do_pg_start_backup)
> doesn't actually ensure that concurrent backups get different checkpoint
> locations. ISTM that we should change this code, or something elsewhere,
> to guarantee that. Alternatively, we could include the backup label name
> in the backup-end record, to prevent a recovery from reading a backup-end
> record that isn't its own.
>
> Thoughts?

This patch is on the 9.1 open items list, but I don't understand it
well enough to know whether it's correct.  Can someone else pick it
up?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Allowing multiple concurrent base backups

From
Heikki Linnakangas
Date:
On 17.03.2011 21:39, Robert Haas wrote:
> On Mon, Jan 31, 2011 at 10:45 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> [...]
>> This problem happens because the above code (in do_pg_start_backup)
>> doesn't actually ensure that concurrent backups get different
>> checkpoint locations. ISTM that we should change this code, or
>> something elsewhere, to guarantee that.

Yes, good point.

>> Alternatively, we could include the backup label name in the backup-end
>> record, to prevent a recovery from reading a backup-end record that
>> isn't its own.

Backup labels are not guaranteed to be unique either, so including the
backup label in the backup-end record doesn't solve the problem. But
something else, such as a backup-start counter in shared memory or the
process ID, would work.
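As a rough sketch of the counter idea (backupStartCounter is a
hypothetical field, not something in the current XLogCtl struct):

        /* hypothetical: take a unique ID when a backup starts */
        uint32          backupId;

        SpinLockAcquire(&XLogCtl->info_lck);
        backupId = ++XLogCtl->backupStartCounter;       /* hypothetical field */
        SpinLockRelease(&XLogCtl->info_lck);

The ID would be written into both the backup_label file and the
end-of-backup record, so that recovery only accepts a record carrying
its own ID.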

It won't make the history file names unique, though. Now that we rely on
the end-of-backup record for detecting end-of-backup, the history files
are only for documentation purposes. Do we want to give up on history
files for backups performed with pg_basebackup? Or we could include the
backup counter or similar in the filename too.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com


Re: Allowing multiple concurrent base backups

From
Heikki Linnakangas
Date:
On 18.03.2011 10:48, Heikki Linnakangas wrote:
> [...]
>>> This problem happens because the above code (in do_pg_start_backup)
>>> doesn't actually ensure that concurrent backups get different
>>> checkpoint locations. ISTM that we should change this code, or
>>> something elsewhere, to guarantee that.
>
> Yes, good point.

Here's a patch based on that approach, ensuring that each base backup
uses a different checkpoint as the start location. I think I'll commit
this, rather than invent a new unique ID mechanism for backups. The
latter would need changes in recovery and the control file too, and I
don't feel like tinkering with that at this stage.
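
The gist is to loop in do_pg_start_backup until the new checkpoint's redo
pointer differs from the start location of every previous backup, roughly
like this (simplified sketch, inside do_pg_start_backup where "fast" is
the caller's argument; details may differ from the committed code):

        XLogRecPtr      startpoint;
        bool            gotUniqueStartpoint = false;

        do
        {
                /* force a new checkpoint and wait for it to complete */
                RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT |
                                  (fast ? CHECKPOINT_IMMEDIATE : 0));

                /* read back the redo pointer of the checkpoint we just forced */
                LWLockAcquire(ControlFileLock, LW_SHARED);
                startpoint = ControlFile->checkPointCopy.redo;
                LWLockRelease(ControlFileLock);

                /*
                 * If an earlier backup already claimed this start location,
                 * loop around and force another checkpoint; otherwise
                 * remember it as the latest backup start location.
                 */
                LWLockAcquire(WALInsertLock, LW_SHARED);
                if (XLByteLT(XLogCtl->Insert.lastBackupStart, startpoint))
                {
                        XLogCtl->Insert.lastBackupStart = startpoint;
                        gotUniqueStartpoint = true;
                }
                LWLockRelease(WALInsertLock);
        } while (!gotUniqueStartpoint);

lastBackupStart is a new field in shared memory, initialized to 0/0 at
startup. Looping on another checkpoint is cheap here, because hardly any
buffers have been dirtied since the previous one.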

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Attachment

Re: Allowing multiple concurrent base backups

From
Heikki Linnakangas
Date:
On 18.03.2011 13:56, Heikki Linnakangas wrote:
> [...]
>
> Here's a patch based on that approach, ensuring that each base backup
> uses a different checkpoint as the start location. I think I'll commit
> this, rather than invent a new unique ID mechanism for backups. The
> latter would need changes in recovery and the control file too, and I
> don't feel like tinkering with that at this stage.

Ok, committed this.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com