Thread: Allowing multiple concurrent base backups
Now that we have a basic over-the-wire base backup capability in walsender, it would be nice to allow taking multiple base backups at the same time. It might not seem very useful at first, but it makes it easier to set up standbys for small databases. At the moment, if you want to set up two standbys, you have to either take a single base backup and distribute it to both standbys, or somehow coordinate that they don't try to take the base backup at the same time. Also, you don't want initializing a standby to conflict with a nightly backup cron script.

So, this patch modifies the internal do_pg_start/stop_backup functions so that in addition to the traditional mode of operation, where a backup_label file is created in the data directory where it's backed up along with all other files, the backup label file can be returned to the caller, and the caller is responsible for including it in the backup. The code in replication/basebackup.c includes it in the tar file that's streamed to the client, as "backup_label".

To make that safe, I've changed forcePageWrites into an integer. Whenever a backup is started, it's incremented, and when one ends, it's decremented. When forcePageWrites == 0, there's no backup in progress.

The user-visible pg_start_backup() function is not changed. You can only have one backup started that way in progress at a time. But you can do streaming base backups at the same time as a traditional pg_start_backup().

I implemented this in two ways, and can't decide which I like better:

1. The contents of the backup label file are returned to the caller of do_pg_start_backup() as a palloc'd string.

2. do_pg_start_backup() creates a temporary file that the backup label is written to (instead of "backup_label").

Implementation 1 changes more code, as pg_start/stop_backup() need to be changed to write/read from memory instead of a file, but the result isn't any more complicated. Nevertheless, I somehow feel more comfortable with 2.
Patches for both approaches attached. They're also available in my github repository at git@github.com:hlinnaka/postgres.git. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Attachment
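[Editor's note] The forcePageWrites change described in the opening mail - a boolean flag becoming a counter of backups in progress - can be sketched as below. This is an illustrative sketch only, not the patch itself: the `_sketch` function names are made up, and the real code manipulates this state in shared memory under a lock.

```c
#include <assert.h>

/* Sketch of the described change: forcePageWrites becomes a counter of
 * backups in progress rather than a boolean.  Full-page writes must be
 * forced whenever the counter is nonzero. */
static int forcePageWrites = 0;

static void
do_pg_start_backup_sketch(void)
{
    forcePageWrites++;          /* one more backup in progress */
}

static void
do_pg_stop_backup_sketch(void)
{
    forcePageWrites--;          /* this backup is done */
}
```

With a counter, an exclusive pg_start_backup() and any number of streaming base backups can overlap: full-page writes stay forced until the last one decrements the counter back to zero.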
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > I implemented this in two ways, and can't decide which I like better: > 1. The contents of the backup label file are returned to the caller of > do_pg_start_backup() as a palloc'd string. > 2. do_pg_start_backup() creates a temporary file that the backup label > is written to (instead of "backup_label"). > Implementation 1 changes more code, as pg_start/stop_backup() need to be > changed to write/read from memory instead of file, but the result isn't > any more complicated. Nevertheless, I somehow feel more comfortable with 2. Seems like either one of these is fairly problematic in that you have to have some monstrous kluge to get the backup_label file to appear with the right name in the tarfile. How badly do we actually need this? I don't think the use-case for concurrent base backups is all that large in practice given the I/O hit it's going to involve. regards, tom lane
On Tue, Jan 11, 2011 at 19:51, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: >> I implemented this in two ways, and can't decide which I like better: > >> 1. The contents of the backup label file are returned to the caller of >> do_pg_start_backup() as a palloc'd string. > >> 2. do_pg_start_backup() creates a temporary file that the backup label >> is written to (instead of "backup_label"). > >> Implementation 1 changes more code, as pg_start/stop_backup() need to be >> changed to write/read from memory instead of file, but the result isn't >> any more complicated. Nevertheless, I somehow feel more comfortable with 2. > > Seems like either one of these is fairly problematic in that you have to > have some monstrous kluge to get the backup_label file to appear with > the right name in the tarfile. How badly do we actually need this? > I don't think the use-case for concurrent base backups is all that large > in practice given the I/O hit it's going to involve. I think it can be done cleaner in the tar file injection - I've been chatting with Heikki offlist about that. Not sure, but I have a feeling it does. As for the use-case, it doesn't necessarily involve a huge I/O hit if either the entire DB is in RAM (not all that uncommon - though then the backup finishes quickly as well), or if the *client* is slow, thus making the server backlogged on sending the base backup. Making it possible to do this concurrently also makes the feature *much* easier to use. The overlap might be just a few seconds, for example - which would probably have no major effect, but takes a lot more code on the other end to work around. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
On 11.01.2011 20:51, Tom Lane wrote: > Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> writes: >> I implemented this in two ways, and can't decide which I like better: > >> 1. The contents of the backup label file are returned to the caller of >> do_pg_start_backup() as a palloc'd string. > >> 2. do_pg_start_backup() creates a temporary file that the backup label >> is written to (instead of "backup_label"). > >> Implementation 1 changes more code, as pg_start/stop_backup() need to be >> changed to write/read from memory instead of file, but the result isn't >> any more complicated. Nevertheless, I somehow feel more comfortable with 2. > > Seems like either one of these is fairly problematic in that you have to > have some monstrous kluge to get the backup_label file to appear with > the right name in the tarfile. Oh. I'm surprised you feel that way - that part didn't feel ugly or kludgey at all to me. > How badly do we actually need this? > I don't think the use-case for concurrent base backups is all that large > in practice given the I/O hit it's going to involve. It makes it very convenient to set up standbys, without having to worry that you'll conflict e.g with a nightly backup. I don't imagine people will use streaming base backups for very large databases anyway. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
> It makes it very convenient to set up standbys, without having to worry > that you'll conflict e.g with a nightly backup. I don't imagine people > will use streaming base backups for very large databases anyway. Also, imagine that you're provisioning a 10-node replication cluster on EC2. This would make that worlds easier. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
Magnus Hagander <magnus@hagander.net> writes: > On Tue, Jan 11, 2011 at 19:51, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Seems like either one of these is fairly problematic in that you have to >> have some monstrous kluge to get the backup_label file to appear with >> the right name in the tarfile. How badly do we actually need this? >> I don't think the use-case for concurrent base backups is all that large >> in practice given the I/O hit it's going to involve. > I think it can be done cleaner in the tar file injection - I've been > chatting with Heikki offlist about that. Not sure, but I have a > feeling it does. One point that I'm particularly interested to see how you'll kluge it is ensuring that the tarball contains only the desired temp data and not also the "real" $PGDATA/backup_label, should there be a normal base backup being done concurrently with the streamed one. The whole thing just seems too fragile and dangerous to be worth dealing with given that actual usage will be a corner case. *I* sure wouldn't trust it to work when the chips were down. regards, tom lane
On 11.01.2011 20:17, Heikki Linnakangas wrote: > Patches for both approaches attached. They're also available in my > github repository at git@github.com:hlinnaka/postgres.git. Just so people won't report the same issues again, a couple of bugs have already cropped up (thanks Magnus): * a backup_label file in the data directory should now not be tarred up - we're injecting a different backup_label file there. * pg_start_backup_callback needs to be updated to just decrement forcePageWrites, not reset it to 'false' * pg_stop_backup() decrements forcePageWrites twice. oops. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
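[Editor's note] For readers wondering what "injecting" a backup_label member into the tar stream entails: the server must synthesize a tar member header for contents that exist only in memory, since no backup_label file is written to the data directory in this mode. The sketch below is a hypothetical, simplified ustar header builder, not the actual basebackup.c code; the function name is made up.

```c
#include <stdio.h>
#include <string.h>

/* Build a minimal 512-byte ustar header for a regular file of the
 * given size.  The checksum is computed over the whole header with
 * the checksum field itself treated as eight spaces, per the ustar
 * format; the file data then follows, zero-padded to 512 bytes. */
static void
make_tar_header(char hdr[512], const char *name, unsigned long size)
{
    unsigned long sum = 0;
    int         i;

    memset(hdr, 0, 512);
    strncpy(hdr, name, 99);                      /* member name */
    snprintf(&hdr[100], 8, "%07o", 0600u);       /* mode */
    snprintf(&hdr[108], 8, "%07o", 0u);          /* uid */
    snprintf(&hdr[116], 8, "%07o", 0u);          /* gid */
    snprintf(&hdr[124], 12, "%011lo", size);     /* size, in octal */
    snprintf(&hdr[136], 12, "%011o", 0u);        /* mtime */
    memset(&hdr[148], ' ', 8);   /* checksum field counts as spaces */
    hdr[156] = '0';                              /* typeflag: regular file */
    memcpy(&hdr[257], "ustar", 5);               /* magic */

    for (i = 0; i < 512; i++)
        sum += (unsigned char) hdr[i];
    snprintf(&hdr[148], 7, "%06lo", sum);        /* 6 octal digits + NUL */
    hdr[155] = ' ';
}
```

Because the header is built from an in-memory string, the label can appear in the tarfile under the name "backup_label" even though a file by that name never existed in $PGDATA - which is also why the real backup_label (from a concurrent exclusive backup) must be skipped when tarring the data directory.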
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > Now that we have a basic over-the-wire base backup capability in walsender, > it would be nice to allow taking multiple base backups at the same time. I would prefer to be able to take a base backup from a standby, or is that already possible? What about cascading the wal shipping? Maybe those ideas are much bigger projects, but if I don't ask, I don't get to know :) Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On 11.01.2011 21:50, Dimitri Fontaine wrote: > Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> writes: > >> Now that we have a basic over-the-wire base backup capability in walsender, >> it would be nice to allow taking multiple base backups at the same time. > > I would prefer to be able to take a base backup from a standby, or is > that already possible? What about cascading the wal shipping? Maybe > those ideas are much bigger projects, but if I don't ask, I don't get to > know :) Yeah, those are bigger projects.. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Tue, 2011-01-11 at 20:17 +0200, Heikki Linnakangas wrote: > So, this patch modifies the internal do_pg_start/stop_backup functions > so that in addition to the traditional mode of operation, where a > backup_label file is created in the data directory where it's backed up > along with all other files, the backup label file is be returned to the > caller, and the caller is responsible for including it in the backup. > The code in replication/basebackup.c includes it in the tar file that's > streamed the client, as "backup_label". Perhaps we can use this more intelligent form of base backup to differentiate between:
a. a primary that has crashed while a backup was in progress; and
b. an online backup that is being restored.
Allowing the user to do an unrestricted file copy as a base backup doesn't allow us to make that differentiation. That led to the two bugs that we fixed in StartupXLOG(). And right now there are still two problems remaining (albeit less severe):
1. If it's a primary recovering from a crash, and there is a backup_label file, and the WAL referenced in the backup_label exists, then it does a bunch of extra work during recovery; and
2. In the same situation, if the WAL referenced in the backup_label does not exist, then it PANICs with a HINT to tell you to remove the backup_label.
Is this an opportunity to solve these problems and simplify the code? Regards, Jeff Davis
On 11.01.2011 22:16, Jeff Davis wrote: > On Tue, 2011-01-11 at 20:17 +0200, Heikki Linnakangas wrote: >> So, this patch modifies the internal do_pg_start/stop_backup functions >> so that in addition to the traditional mode of operation, where a >> backup_label file is created in the data directory where it's backed up >> along with all other files, the backup label file is be returned to the >> caller, and the caller is responsible for including it in the backup. >> The code in replication/basebackup.c includes it in the tar file that's >> streamed the client, as "backup_label". > > Perhaps we can use this more intelligent form of base backup to > differentiate between: > a. a primary that has crashed while a backup was in progress; and > b. an online backup that is being restored. > > Allowing the user to do an unrestricted file copy as a base backup > doesn't allow us to make that differentiation. That lead to the two bugs > that we fixed in StartupXLOG(). And right now there are still two > problems remaining (albeit less severe): > > 1. If it's a primary recovering from a crash, and there is a > backup_label file, and the WAL referenced in the backup_label exists, > then it does a bunch of extra work during recovery; and > 2. In the same situation, if the WAL referenced in the backup_label > does not exist, then it PANICs with a HINT to tell you to remove the > backup_label. > > Is this an opportunity to solve these problems and simplify the code? It won't change the situation for pg_start_backup(), but with the patch the base backups done via streaming won't have those issues, because backup_label is not created (with that name) in the master. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Tue, 2011-01-11 at 22:56 +0200, Heikki Linnakangas wrote: > > 1. If it's a primary recovering from a crash, and there is a > > backup_label file, and the WAL referenced in the backup_label exists, > > then it does a bunch of extra work during recovery; and > > 2. In the same situation, if the WAL referenced in the backup_label > > does not exist, then it PANICs with a HINT to tell you to remove the > > backup_label. > > > > Is this an opportunity to solve these problems and simplify the code? > > It won't change the situation for pg_start_backup(), but with the patch > the base backups done via streaming won't have those issues, because > backup_label is not created (with that name) in the master. Do you think we should change the backup protocol for normal base backups to try to fix this? Or do you think that the simplicity of unrestricted file copy is worth these problems? We could probably make some fairly minor changes, like making a file on the primary and telling users to exclude it from any base backup. The danger, of course, is that they do copy it, and their backup is compromised. Regards, Jeff Davis
On Tue, Jan 11, 2011 at 22:51, Jeff Davis <pgsql@j-davis.com> wrote: > On Tue, 2011-01-11 at 22:56 +0200, Heikki Linnakangas wrote: >> > 1. If it's a primary recovering from a crash, and there is a >> > backup_label file, and the WAL referenced in the backup_label exists, >> > then it does a bunch of extra work during recovery; and >> > 2. In the same situation, if the WAL referenced in the backup_label >> > does not exist, then it PANICs with a HINT to tell you to remove the >> > backup_label. >> > >> > Is this an opportunity to solve these problems and simplify the code? >> >> It won't change the situation for pg_start_backup(), but with the patch >> the base backups done via streaming won't have those issues, because >> backup_label is not created (with that name) in the master. > > Do you think we should change the backup protocol for normal base > backups to try to fix this? Or do you think that the simplicity of > unrestricted file copy is worth these problems? > > We could probably make some fairly minor changes, like making a file on > the primary and telling users to exclude it from any base backup. The > danger, of course, is that they do copy it, and their backup is > compromised. I think keeping the flexibility is important. If it does add an extra step I think that's ok once we have pg_basebackup, but it must be reasonably *safe*. Corrupt backups from forgetting to exclude a file seems not so. But if the problem is you forgot to exclude it, can't you just remove it at a later time? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
On Jan 11, 2011, at 2:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > The whole thing just seems too fragile and dangerous to be worth dealing > with given that actual usage will be a corner case. *I* sure wouldn't > trust it to work when the chips were down. I hope this assessment proves to be incorrect, because like Magnus and Heikki, I think this will be a major usability improvement. It takes us from "there's a way to do that" to "it just works". ...Robert
On Tue, Jan 11, 2011 at 05:06:34PM -0500, Robert Haas wrote: > On Jan 11, 2011, at 2:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > The whole thing just seems too fragile and dangerous to be worth dealing > > with given that actual usage will be a corner case. *I* sure wouldn't > > trust it to work when the chips were down. > > I hope this assessment proves to be incorrect, because like Magnus and Heikki, I think this will be a major usability improvement. It takes us from "there's a way to do that" to "it just works". (It just works)++ :) Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Tue, 2011-01-11 at 23:07 +0100, Magnus Hagander wrote: > I think keeping the flexibility is important. If it does add an extra > step I think that's ok once we have pg_basebackup, but it must be > reasonably *safe*. Corrupt backups from forgetting to exclude a file > seems not so. Agreed. > But if the problem is you forgot to exclude it, can't you just remove > it at a later time? If you think you are recovering the primary, and it's really the backup, then you get corruption. It's too late to remove a file after that (unless you have a backup of your backup ;) ). If you think you are restoring a backup, and it's really a primary that crashed, then you run into one of the two problems that I mentioned (which are less severe than corruption, but very annoying). Regards, Jeff Davis
2011/1/11 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>: > On 11.01.2011 21:50, Dimitri Fontaine wrote: >> >> Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> writes: >> >>> Now that we have a basic over-the-wire base backup capability in >>> walsender, >>> it would be nice to allow taking multiple base backups at the same time. >> >> I would prefer to be able to take a base backup from a standby, or is >> that already possible? What about cascading the wal shipping? Maybe >> those ideas are much bigger projects, but if I don't ask, I don't get to >> know :) > > Yeah, those are bigger projects.. Way more interesting, IMHO. The feature to allow multiple backups at the same time looks useless to me. If the DB is small, then it is not that hard to do that before or after a possible nightly backup. If the DB is large, is it necessary to comment? The only case I find interesting is if you can use the bandwidth of more than one ethernet device, else I would use rsync between nodes after the initial basebackup, pretty sure. -- Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
On 11.01.2011 23:51, Jeff Davis wrote: > On Tue, 2011-01-11 at 22:56 +0200, Heikki Linnakangas wrote: >>> 1. If it's a primary recovering from a crash, and there is a >>> backup_label file, and the WAL referenced in the backup_label exists, >>> then it does a bunch of extra work during recovery; and >>> 2. In the same situation, if the WAL referenced in the backup_label >>> does not exist, then it PANICs with a HINT to tell you to remove the >>> backup_label. >>> >>> Is this an opportunity to solve these problems and simplify the code? >> >> It won't change the situation for pg_start_backup(), but with the patch >> the base backups done via streaming won't have those issues, because >> backup_label is not created (with that name) in the master. > > Do you think we should change the backup protocol for normal base > backups to try to fix this? Or do you think that the simplicity of > unrestricted file copy is worth these problems? > > We could probably make some fairly minor changes, like making a file on > the primary and telling users to exclude it from any base backup. The > danger, of course, is that they do copy it, and their backup is > compromised. Yeah, I don't think it's a good idea to provide such a foot-gun. Last time we discussed this there was the idea of creating a file in $PGDATA, and removing read-permissions from it, so that it couldn't be accidentally included in the backup. Even with that safeguard, it doesn't feel very safe - a backup running as root or some other special privileges might still be able to read it. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Tue, Jan 11, 2011 at 8:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Magnus Hagander <magnus@hagander.net> writes: >> On Tue, Jan 11, 2011 at 19:51, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Seems like either one of these is fairly problematic in that you have to >>> have some monstrous kluge to get the backup_label file to appear with >>> the right name in the tarfile. How badly do we actually need this? >>> I don't think the use-case for concurrent base backups is all that large >>> in practice given the I/O hit it's going to involve. > >> I think it can be done cleaner in the tar file injection - I've been >> chatting with Heikki offlist about that. Not sure, but I have a >> feeling it does. > > One point that I'm particularly interested to see how you'll kluge it > is ensuring that the tarball contains only the desired temp data and not > also the "real" $PGDATA/backup_label, should there be a normal base > backup being done concurrently with the streamed one. > > The whole thing just seems too fragile and dangerous to be worth dealing > with given that actual usage will be a corner case. *I* sure wouldn't > trust it to work when the chips were down. > Maybe if pg_start_backup() notices that there is another backup running, it should block waiting for the other session to run pg_stop_backup()? Or have a new function like pg_start_backup_wait()? Considering that parallel base backups would be I/O-bound (or network-bound), there is little need to actually run them in parallel. Greetings Marcin
On Wed, Jan 12, 2011 at 10:26:05AM +0100, marcin mank wrote: > On Tue, Jan 11, 2011 at 8:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Magnus Hagander <magnus@hagander.net> writes: > >> On Tue, Jan 11, 2011 at 19:51, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >>> Seems like either one of these is fairly problematic in that you have to > >>> have some monstrous kluge to get the backup_label file to appear with > >>> the right name in the tarfile. How badly do we actually need this? > >>> I don't think the use-case for concurrent base backups is all that large > >>> in practice given the I/O hit it's going to involve. > > > >> I think it can be done cleaner in the tar file injection - I've been > >> chatting with Heikki offlist about that. Not sure, but I have a > >> feeling it does. > > > > One point that I'm particularly interested to see how you'll kluge it > > is ensuring that the tarball contains only the desired temp data and not > > also the "real" $PGDATA/backup_label, should there be a normal base > > backup being done concurrently with the streamed one. > > > > The whole thing just seems too fragile and dangerous to be worth dealing > > with given that actual usage will be a corner case. *I* sure wouldn't > > trust it to work when the chips were down. > > Maybe if pg_start_backup() notices that there is another backup > running should block waiting for another session to run > pg_stop_backup() ? Or have a new function like pg_start_backup_wait() > ? > > Considering that parallell base backups would be io-bound (or > network-bound), there is little need to actually run them in parallell That's not actually true. Backups at the moment are CPU-bound, and running them in parallel is one way to make them closer to I/O-bound, which is what they *should* be. 
There are other proposals out there, and some work being done, to make backups less dependent on CPU, among them: - Making the on-disk representation smaller - Making COPY more efficient As far as I know, none of this work is public yet. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On 12.01.2011 17:15, David Fetter wrote: > On Wed, Jan 12, 2011 at 10:26:05AM +0100, marcin mank wrote: >> Considering that parallell base backups would be io-bound (or >> network-bound), there is little need to actually run them in parallell > > That's not actually true. Backups at the moment are CPU-bound, and > running them in parallel is one way to make them closer to I/O-bound, > which is what they *should* be. That's a different kind of "parallel". We're talking about taking multiple backups in parallel, each using one process, and you're talking about taking one backup using multiple parallel processes or threads. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Wed, Jan 12, 2011 at 05:17:38PM +0200, Heikki Linnakangas wrote: > On 12.01.2011 17:15, David Fetter wrote: > >On Wed, Jan 12, 2011 at 10:26:05AM +0100, marcin mank wrote: > >>Considering that parallell base backups would be io-bound (or > >>network-bound), there is little need to actually run them in parallell > > > >That's not actually true. Backups at the moment are CPU-bound, and > >running them in parallel is one way to make them closer to I/O-bound, > >which is what they *should* be. > > That's a different kind of "parallel". We're talking about taking > multiple backups in parallel, each using one process, and you're > talking about taking one backup using multiple parallel processes or > threads. Good point. The idea that IO/network bandwidth is always saturated by one backup just isn't true, though. Take the case of multiple independent NICs, e.g. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Wed, Jan 12, 2011 at 10:15 AM, David Fetter <david@fetter.org> wrote: >> Considering that parallell base backups would be io-bound (or >> network-bound), there is little need to actually run them in parallell > > That's not actually true. Backups at the moment are CPU-bound, and > running them in parallel is one way to make them closer to I/O-bound, > which is what they *should* be. Remember, we're talking about filesystem base backups here. If your CPU can't handle a stream from disk -> network, byte for byte (maybe encrypting it), then you've spent *WAAAAY* too much on your storage sub-system, and way too little on CPU. I can see trying to "parallelize" the base backup such that each tablespace could be run concurrently, but that's about it. > There are other proposals out there, and some work being done, to make > backups less dependent on CPU, among them: > > - Making the on-disk representation smaller > - Making COPY more efficient > > As far as I know, none of this work is public yet. pg_dump is another story. But it's not related to base backups for PIT Recovery/Replication. a. -- Aidan Van Dyk Create like a god, aidan@highrise.ca command like a king, http://www.highrise.ca/ work like a slave.
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > On 12.01.2011 17:15, David Fetter wrote: >> On Wed, Jan 12, 2011 at 10:26:05AM +0100, marcin mank wrote: >>> Considering that parallell base backups would be io-bound (or >>> network-bound), there is little need to actually run them in parallell >> >> That's not actually true. Backups at the moment are CPU-bound, and >> running them in parallel is one way to make them closer to I/O-bound, >> which is what they *should* be. > That's a different kind of "parallel". We're talking about taking > multiple backups in parallel, each using one process, and you're talking > about taking one backup using multiple parallel processes or threads. Even more to the point, you're confusing pg_dump with a base backup. The reason pg_dump eats a lot of CPU is primarily COPY's data conversion and formatting requirements, none of which will happen in a base backup (streaming or otherwise). regards, tom lane
On Wed, Jan 12, 2011 at 10:24:31AM -0500, Tom Lane wrote: > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > > On 12.01.2011 17:15, David Fetter wrote: > >> On Wed, Jan 12, 2011 at 10:26:05AM +0100, marcin mank wrote: > >>> Considering that parallell base backups would be io-bound (or > >>> network-bound), there is little need to actually run them in > >>> parallell > >> > >> That's not actually true. Backups at the moment are CPU-bound, > >> and running them in parallel is one way to make them closer to > >> I/O-bound, which is what they *should* be. > > > That's a different kind of "parallel". We're talking about taking > > multiple backups in parallel, each using one process, and you're > > talking about taking one backup using multiple parallel processes > > or threads. > > Even more to the point, you're confusing pg_dump with a base backup. > The reason pg_dump eats a lot of CPU is primarily COPY's data > conversion and formatting requirements, none of which will happen in > a base backup (streaming or otherwise). Oops. That'll teach me to post before coffee. :P Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Tue, Jan 11, 2011 at 11:06:18AM -0800, Josh Berkus wrote: > > > It makes it very convenient to set up standbys, without having to worry > > that you'll conflict e.g with a nightly backup. I don't imagine people > > will use streaming base backups for very large databases anyway. > > Also, imagine that you're provisioning a 10-node replication cluster on > EC2. This would make that worlds easier. Hmm, perhaps. My concern is that a naive attempt to do that is going to have 10 base-backups happening at the same time, completely slamming the master, and none of them completing in a reasonable time. Is this possible, or will simultaneity buy you hot caches, so that backups #2 -> #10 all run faster? Ross -- Ross Reedstrom, Ph.D. reedstrm@rice.edu Systems Engineer & Admin, Research Scientist phone: 713-348-6166 Connexions http://cnx.org fax: 713-348-3665 Rice University MS-375, Houston, TX 77005 GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E F888 D3AE 810E 88F0 BEDE
On Thu, Jan 13, 2011 at 2:19 PM, Ross J. Reedstrom <reedstrm@rice.edu> wrote: > On Tue, Jan 11, 2011 at 11:06:18AM -0800, Josh Berkus wrote: >> >> > It makes it very convenient to set up standbys, without having to worry >> > that you'll conflict e.g with a nightly backup. I don't imagine people >> > will use streaming base backups for very large databases anyway. >> >> Also, imagine that you're provisioning a 10-node replication cluster on >> EC2. This would make that worlds easier. > > Hmm, perhaps. My concern is that a naive attempt to do that is going to > have 10 base-backups happening at the same time, completely slamming the > master, and none of them completing is a reasonable time. Is this > possible, or is it that simultaneity will buy you hot caches and backup > #2 -> #10 all run faster? That's going to depend on the situation. If the database fits in memory, then it's just going to work. If it fits on disk, it's less obvious whether it'll be good or bad, but an arbitrary limitation here doesn't serve us well. P.S. Your reply-to header is busted. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 1/13/11 12:11 PM, Robert Haas wrote: > That's going to depend on the situation. If the database fits in > memory, then it's just going to work. If it fits on disk, it's less > obvious whether it'll be good or bad, but an arbitrary limitation here > doesn't serve us well. FWIW, if we had this feature right now in 9.0 we (PGX) would be using it. We run into the case of DB in memory, multiple slaves fairly often these days. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
On 13.01.2011 22:57, Josh Berkus wrote: > On 1/13/11 12:11 PM, Robert Haas wrote: >> That's going to depend on the situation. If the database fits in >> memory, then it's just going to work. If it fits on disk, it's less >> obvious whether it'll be good or bad, but an arbitrary limitation here >> doesn't serve us well. > > FWIW, if we had this feature right now in 9.0 we (PGX) would be using > it. We run into the case of DB in memory, multiple slaves fairly often > these days. Anyway, here's an updated patch with all the known issues fixed. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Attachment
On Thu, Jan 13, 2011 at 22:32, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > On 13.01.2011 22:57, Josh Berkus wrote: >> >> On 1/13/11 12:11 PM, Robert Haas wrote: >>> >>> That's going to depend on the situation. If the database fits in >>> memory, then it's just going to work. If it fits on disk, it's less >>> obvious whether it'll be good or bad, but an arbitrary limitation here >>> doesn't serve us well. >> >> FWIW, if we had this feature right now in 9.0 we (PGX) would be using >> it. We run into the case of DB in memory, multiple slaves fairly often >> these days. > > Anyway, here's an updated patch with all the known issues fixed. It's kind of crude that do_pg_stop_backup() has to parse the backup label with sscanf() when it's coming from a non-exclusive backup - we could pass the location as a parameter instead, to make it cleaner. There might be a point to keeping it the same in both codepaths, but I'm not sure it's that important... Doesn't this code put the backup label in *every* tarfile, and not just the main one? From what I can tell the code in SendBackupDirectory() relies on labelfile being NULL in tar files for "other tablespaces", but the caller never sets this. Finally, I'd move the addition of the labelfile to its own function, but that's just me ;) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
On Thu, 2011-01-13 at 23:32 +0200, Heikki Linnakangas wrote: > On 13.01.2011 22:57, Josh Berkus wrote: > > On 1/13/11 12:11 PM, Robert Haas wrote: > >> That's going to depend on the situation. If the database fits in > >> memory, then it's just going to work. If it fits on disk, it's less > >> obvious whether it'll be good or bad, but an arbitrary limitation here > >> doesn't serve us well. > > > > FWIW, if we had this feature right now in 9.0 we (PGX) would be using > > it. We run into the case of DB in memory, multiple slaves fairly often > > these days. > > Anyway, here's an updated patch with all the known issues fixed. It's good we have this as an option and I like the way you've done this. -- Simon Riggs http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services
On 13.01.2011 23:32, Heikki Linnakangas wrote: > Anyway, here's an updated patch with all the known issues fixed. Another updated patch. Fixed bitrot, and addressed the privilege issue Fujii-san raised here: http://archives.postgresql.org/message-id/4D380560.3040400@enterprisedb.com. I changed the privilege checks so that pg_start/stop_backup() functions require superuser privileges again, but not for a base backup via the replication protocol (replication privilege is needed to establish a replication connection to begin with). -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Attachment
On Mon, Jan 24, 2011 at 17:52, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > On 13.01.2011 23:32, Heikki Linnakangas wrote: >> >> Anyway, here's an updated patch with all the known issues fixed. > > Another updated patch. Fixed bitrot, and addressed the privilege issue > Fujii-san raised here: > http://archives.postgresql.org/message-id/4D380560.3040400@enterprisedb.com. > I changed the privilege checks so that pg_start/stop_backup() functions > require superuser privileges again, but not for a base backup via the > replication protocol (replication privilege is needed to establish a > replication connection to begin with). I'm not entirely sure the replication privilege removal is correct. Doing that, it's no longer possible to deploy a slave *without* using pg_basebackup, unless you are superuser. Do we really want to put that restriction back in? (And if we do, the docs probably need an update...) I can't see an explicit check for the user attempting to do pg_stop_backup() when there is a non-exclusive backup running - maybe I'm reading it wrong? The case being when a user has started a backup with pg_basebackup but then connects and manually does a pg_stop_backup. ISTM it drops us in a codepath that just doesn't do the decrement, but also doesn't throw an error? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
On 24.01.2011 20:22, Magnus Hagander wrote: > On Mon, Jan 24, 2011 at 17:52, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >> Another updated patch. Fixed bitrot, and addressed the privilege issue >> Fujii-san raised here: >> http://archives.postgresql.org/message-id/4D380560.3040400@enterprisedb.com. >> I changed the privilege checks so that pg_start/stop_backup() functions >> require superuser privileges again, but not for a base backup via the >> replication protocol (replication privilege is needed to establish a >> replication connection to begin with). > > I'm not entirely sure the replication privilege removal is correct. > Doing that, it's no longer possible to deploy a slave *without* using > pg_basebackup, unless you are superuser. Do we really want to put that > restriction back in? Hmm, I thought we did; I thought that was changed just to make pg_basebackup work without superuser privileges. But I guess it makes sense apart from that. So the question is, should 'replication' privilege be enough to use pg_start/stop_backup(), or should we require superuser access for that? In any case, replication privilege will be enough for pg_basebackup. Ok, I won't touch that. But then we'll need to decide what to do about Fujii's observation (http://archives.postgresql.org/pgsql-hackers/2011-01/msg01934.php): > Both the user with REPLICATION privilege and the superuser can > call pg_stop_backup. But only superuser can connect to the server > to cancel online backup during shutdown. The non-superuser with > REPLICATION privilege cannot. Is this behavior intentional? Or just > oversight? > I can't see an explicit check for the user attempting to do > pg_stop_backup() when there is a non-exclusive backup running? Maybe > I'm reading it wrong? The case being when a user has started a backup > with pg_basebackup but then connects and manually does a > pg_stop_backup. 
ISTM it drops us in a codepath that just doesn't do the > decrement, but also doesn't throw an error? It throws an error later when it doesn't find backup_label. For better or worse, it's always been like that. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Mon, Jan 24, 2011 at 21:14, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > On 24.01.2011 20:22, Magnus Hagander wrote: >> I can't see an explicit check for the user attempting to do >> pg_stop_backup() when there is a non-exclusive backup running? Maybe >> I'm reading it wrong? The case being when a user has started a backup >> with pg_basebackup but then connects and manually does a >> pg_stop_backup. ISTM it drops us in a codepath that just doesn't do the >> decrement, but also doesn't throw an error? > > It throws an error later when it doesn't find backup_label. For better or > worse, it's always been like that. Isn't that going to leave us in a broken state though? As in, a mistaken pg_stop_backup() will decrement the counter, both breaking the streaming base backup and also possibly triggering an assert when that one eventually wants to do its do_pg_stop_backup()? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
On 24.01.2011 22:31, Magnus Hagander wrote: > On Mon, Jan 24, 2011 at 21:14, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >> On 24.01.2011 20:22, Magnus Hagander wrote: >>> I can't see an explicit check for the user attempting to do >>> pg_stop_backup() when there is a non-exclusive backup running? Maybe >>> I'm reading it wrong? The case being when a user has started a backup >>> with pg_basebackup but then connects and manually does a >>> pg_stop_backup. ISTM it drops us in a codepath that just doesn't do the >>> decrement, but also doesn't throw an error? >> >> It throws an error later when it doesn't find backup_label. For better or >> worse, it's always been like that. > > Isn't that going to leave us in a broken state though? As in, a > mistaken pg_stop_backup() will decrement the counter, both breaking the > streaming base backup and also possibly triggering an assert when that > one eventually wants to do its do_pg_stop_backup()? Umm, no. pg_stop_backup() won't decrement the counter unless there's an exclusive backup in progress according to the exclusiveBackup flag. Hmm, perhaps the code would be more readable if, instead of a forcePageWrites counter that counts both exclusive and non-exclusive backups plus an exclusiveBackup boolean indicating whether one of the in-progress backups is exclusive, we had a counter that only counts non-exclusive backups, plus a boolean indicating whether an exclusive backup is in progress in addition to them. Attached is a patch for that (against the master branch, including only xlog.c). -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Attachment
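As a rough illustration of the bookkeeping Heikki describes above (a counter of non-exclusive backups plus a flag for the single allowed exclusive backup), here is a minimal sketch. The struct and function names are invented for illustration; they are not the actual xlog.c symbols, and the real code keeps this state in shared memory under a lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simplification of the proposed shared-memory state:
 * full-page writes must stay forced while any backup is running. */
typedef struct BackupState
{
    int  nonExclusiveBackups;   /* streaming base backups in progress */
    bool exclusiveBackup;       /* pg_start_backup() backup in progress */
} BackupState;

static bool
start_backup(BackupState *s, bool exclusive)
{
    if (exclusive)
    {
        if (s->exclusiveBackup)
            return false;       /* only one exclusive backup at a time */
        s->exclusiveBackup = true;
    }
    else
        s->nonExclusiveBackups++;   /* streaming backups can stack freely */
    return true;
}

static void
stop_backup(BackupState *s, bool exclusive)
{
    if (exclusive)
        s->exclusiveBackup = false;
    else
        s->nonExclusiveBackups--;
}

/* Replaces the old overloaded forcePageWrites counter: page writes are
 * forced whenever either kind of backup is still running. */
static bool
force_page_writes(const BackupState *s)
{
    return s->exclusiveBackup || s->nonExclusiveBackups > 0;
}
```

This also shows why a mistaken pg_stop_backup() against a streaming backup is less dangerous in the refactored scheme: the exclusive flag and the non-exclusive counter can no longer corrupt each other.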
On Tue, Jan 25, 2011 at 5:14 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: >> I'm not entirely sure the replication privilege removal is correct. >> Doing that, it's no longer possible to deploy a slave *without* using >> pg_basebackup, unless you are superuser. Do we really want to put that >> restriction back in? > > Hmm, I thought we did; I thought that was changed just to make pg_basebackup > work without superuser privileges. If we encourage users not to use the "replication" privilege to log in to the database, putting that restriction back seems reasonable. > Ok, I won't touch that. But then we'll need to decide what to do about > Fujii's observation > (http://archives.postgresql.org/pgsql-hackers/2011-01/msg01934.php): Yes. If we allow the "replication" users to call pg_start/stop_backup, we should also allow them to connect to the database even during shutdown in order to cancel the backup. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Tue, Jan 25, 2011 at 6:02 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > Hmm, perhaps the code would be more readable if instead of the > forcePageWrites counter that counts exclusive and non-exclusive backups, and > an exclusiveBackup boolean indicating if one of the in-progress backups is > an exclusive one, we had a counter that only counts non-exclusive backups, > plus a boolean indicating if an exclusive backup is in progress in addition > to them. > > Attached is a patch for that (against master branch, including only xlog.c). I read this patch and the previously posted one. Both look good. Comments: + * do_pg_start_backup is the workhorse of the user-visible pg_stop_backup() + * function. Typo: s/do_pg_start_backup/do_pg_stop_backup It would be helpful to explain this behavior in pg_basebackup.sgml or elsewhere. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Tue, Jan 25, 2011 at 1:02 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Tue, Jan 25, 2011 at 6:02 AM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >> Hmm, perhaps the code would be more readable if instead of the >> forcePageWrites counter that counts exclusive and non-exclusive backups, and >> an exclusiveBackup boolean indicating if one of the in-progress backups is >> an exclusive one, we had a counter that only counts non-exclusive backups, >> plus a boolean indicating if an exclusive backup is in progress in addition >> to them. >> >> Attached is a patch for that (against master branch, including only xlog.c). > > I read this patch and the previously posted one. Both look good. > > Comments: > > + * do_pg_start_backup is the workhorse of the user-visible pg_stop_backup() > + * function. > > Typo: s/do_pg_start_backup/do_pg_stop_backup > > It would be helpful to explain this behavior in pg_basebackup.sgml or elsewhere. When I read the patch, I found that pg_stop_backup removes the backup history file as soon as it creates the file, if archive_mode is not enabled. This looks like an oversight. Should we prevent pg_stop_backup from removing the fresh history file? Or should we prevent pg_stop_backup from creating the history file in the first place, since it's not used at all if archiving is disabled? (If archiving is enabled, the history file can be used to clean up the archived files.) Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On 25.01.2011 06:02, Fujii Masao wrote: > On Tue, Jan 25, 2011 at 6:02 AM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >> Hmm, perhaps the code would be more readable if instead of the >> forcePageWrites counter that counts exclusive and non-exclusive backups, and >> an exclusiveBackup boolean indicating if one of the in-progress backups is >> an exclusive one, we had a counter that only counts non-exclusive backups, >> plus a boolean indicating if an exclusive backup is in progress in addition >> to them. >> >> Attached is a patch for that (against master branch, including only xlog.c). > > I read this patch and previous-posted one. Those look good. > > Comments: > > + * do_pg_start_backup is the workhorse of the user-visible pg_stop_backup() > + * function. > > Typo: s/do_pg_start_backup/do_pg_stop_backup > > It's helpful to explain about this behavior in pg_basebackup.sgml or elsewhere. Thanks. I've committed this now, fixing that, and hopefully all the other issues mentioned in this thread. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On 27.01.2011 15:15, Fujii Masao wrote: > When I read the patch, I found that pg_stop_backup removes the backup history > file as soon as it creates the file, if archive_mode is not enabled. > This looks like an oversight. Should we prevent pg_stop_backup from removing > the fresh history file? Or should we prevent pg_stop_backup from creating the > history file in the first place, since it's not used at all if archiving is > disabled? (If archiving is enabled, the history file can be used to clean up > the archived files.) Hmm, good point. It's harmless, but creating the history file in the first place sure seems like a waste of time. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
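The change being discussed can be sketched in a few lines. This is a hypothetical simplification, not the actual patch: the function name, signature, and the idea of returning the would-be path are invented for illustration; the real code writes the file into pg_xlog and relies on XLogArchivingActive().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Only build the backup history file when WAL archiving is active: its
 * only remaining use is to let archive cleanup know which archived
 * segments a given backup needed. */
static bool
maybe_write_history_file(bool archiving_active,
                         const char *seg, const char *offset,
                         char *path, size_t pathlen)
{
    if (!archiving_active)
        return false;           /* nothing to do without archiving */

    /* The history file name is derived from the backup's start location,
     * e.g. 000000010000000000000002.000000B0.backup */
    snprintf(path, pathlen, "%s.%s.backup", seg, offset);
    return true;
}
```

With this shape, the create-then-immediately-remove dance that Fujii spotted simply never happens when archive_mode is off.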
On Tue, Feb 1, 2011 at 1:31 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > Hmm, good point. It's harmless, but creating the history file in the first > place sure seems like a waste of time. The attached patch changes pg_stop_backup so that it doesn't create the backup history file if archiving is not enabled. When I tested multiple concurrent backups, I found that they can have the same checkpoint location and the same history file name. -------------------- $ for ((i=0; i<4; i++)); do pg_basebackup -D test$i -c fast -x -l test$i & done $ cat test0/backup_label START WAL LOCATION: 0/20000B0 (file 000000010000000000000002) CHECKPOINT LOCATION: 0/20000E8 START TIME: 2011-02-01 12:12:31 JST LABEL: test0 $ cat test1/backup_label START WAL LOCATION: 0/20000B0 (file 000000010000000000000002) CHECKPOINT LOCATION: 0/20000E8 START TIME: 2011-02-01 12:12:31 JST LABEL: test1 $ cat test2/backup_label START WAL LOCATION: 0/20000B0 (file 000000010000000000000002) CHECKPOINT LOCATION: 0/20000E8 START TIME: 2011-02-01 12:12:31 JST LABEL: test2 $ cat test3/backup_label START WAL LOCATION: 0/20000B0 (file 000000010000000000000002) CHECKPOINT LOCATION: 0/20000E8 START TIME: 2011-02-01 12:12:31 JST LABEL: test3 $ ls archive/*.backup archive/000000010000000000000002.000000B0.backup -------------------- This could cause a serious problem, because a backup-end record pointing at the same "START WAL LOCATION" can be written by the first backup before the others finish. So we might wrongly think that we've already reached a consistent state by reading the backup-end record (written by the first backup) before reading the last required WAL file. /* * Force a CHECKPOINT. Aside from being necessary to prevent torn * page problems, this guarantees that two successive backup runs will * have different checkpoint positions and hence different history * file names, even if nothing happened in between. 
* * We use CHECKPOINT_IMMEDIATE only if requested by user (via passing * fast = true). Otherwise this can take awhile. */ RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | (fast ? CHECKPOINT_IMMEDIATE : 0)); This problem happens because the above code (in do_pg_start_backup) doesn't actually ensure that concurrent backups have different checkpoint locations. ISTM that we should change the above, or something elsewhere, to ensure that. Or we could include the backup label name in the backup-end record, to prevent a recovery from reading a backup-end record that is not its own. Thoughts? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
On Mon, Jan 31, 2011 at 10:45 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Tue, Feb 1, 2011 at 1:31 AM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >> Hmm, good point. It's harmless, but creating the history file in the first >> place sure seems like a waste of time. > > The attached patch changes pg_stop_backup so that it doesn't create > the backup history file if archiving is not enabled. > > When I tested multiple concurrent backups, I found that they can have the same > checkpoint location and the same history file name. > > [test output snipped; see upthread] > > This problem happens because the above code (in do_pg_start_backup) > doesn't actually ensure that concurrent backups have different > checkpoint locations. ISTM that we should change the above or elsewhere > to ensure that. Or we could include the backup label name in the backup-end > record, to prevent a recovery from reading a backup-end record that is not > its own. > > Thoughts? This patch is on the 9.1 open items list, but I don't understand it well enough to know whether it's correct. Can someone else pick it up? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 17.03.2011 21:39, Robert Haas wrote: > On Mon, Jan 31, 2011 at 10:45 PM, Fujii Masao<masao.fujii@gmail.com> wrote: >> The attached patch changes pg_stop_backup so that it doesn't create >> the backup history file if archiving is not enabled. >> >> When I tested multiple concurrent backups, I found that they can have the same >> checkpoint location and the same history file name. >> >> [test output snipped; see upthread] >> >> This problem happens because the above code (in do_pg_start_backup) >> doesn't actually ensure that concurrent backups have different >> checkpoint locations. ISTM that we should change the above or elsewhere >> to ensure that. Yes, good point. >> Or we could include the backup label name in the backup-end >> record, to prevent a recovery from reading a backup-end record that is not >> its own. Backup labels are not guaranteed to be unique either, so including the backup label in the backup-end record doesn't solve the problem. But something else, like a backup-start counter in shared memory or the process id, would work. It won't make the history file names unique, though. Now that we rely on the end-of-backup record for detecting end-of-backup, the history files are just for documentation purposes. Do we want to give up on history files for backups performed with pg_basebackup? Or we can include the backup counter or similar in the filename too. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On 18.03.2011 10:48, Heikki Linnakangas wrote: > On 17.03.2011 21:39, Robert Haas wrote: >> On Mon, Jan 31, 2011 at 10:45 PM, Fujii Masao<masao.fujii@gmail.com> wrote: >>> When I tested multiple concurrent backups, I found that they can have the same >>> checkpoint location and the same history file name. >>> >>> [test output snipped; see upthread] >>> >>> This problem happens because the above code (in do_pg_start_backup) >>> doesn't actually ensure that concurrent backups have different >>> checkpoint locations. ISTM that we should change the above or elsewhere >>> to ensure that. > > Yes, good point. Here's a patch based on that approach, ensuring that each base backup uses a different checkpoint as the start location. I think I'll commit this, rather than invent a new unique ID mechanism for backups. The latter would need changes in recovery and the control file too, and I don't feel like tinkering with that at this stage. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Attachment
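The approach above can be modeled with a small sketch. Everything here is hypothetical (the function names, the callback, and the retry shape are invented for illustration; the real change is inside do_pg_start_backup and uses RequestCheckpoint and shared-memory state), but it captures the invariant: a backup never starts at the same checkpoint location as the previous one.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* stand-in for PostgreSQL's WAL pointer */

/* Keep requesting checkpoints until we get a location distinct from the
 * one used by the most recently started backup, so concurrent backups
 * can never share a start checkpoint (and hence a history file name or
 * an ambiguous backup-end record). */
static XLogRecPtr
choose_backup_checkpoint(XLogRecPtr last_backup_ckpt,
                         XLogRecPtr (*force_checkpoint)(void))
{
    XLogRecPtr ckpt = force_checkpoint();

    /* On an idle system a forced checkpoint can land on the same spot
     * as the previous backup's; retry until the location moves. */
    while (ckpt == last_backup_ckpt)
        ckpt = force_checkpoint();
    return ckpt;
}

/* Test stand-in for RequestCheckpoint(): each call returns the current
 * insert position, then advances it (a checkpoint record was written). */
static XLogRecPtr fake_insert_pos = 0x20000E8;

static XLogRecPtr
fake_force_checkpoint(void)
{
    XLogRecPtr p = fake_insert_pos;
    fake_insert_pos += 0x70;
    return p;
}
```

This is why the committed fix doesn't need a separate unique backup ID: distinct start checkpoints already disambiguate the backup-end records that recovery waits for.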
On 18.03.2011 13:56, Heikki Linnakangas wrote: > Here's a patch based on that approach, ensuring that each base backup > uses a different checkpoint as the start location. I think I'll commit > this, rather than invent a new unique ID mechanism for backups. The > latter would need changes in recovery and the control file too, and I > don't feel like tinkering with that at this stage. Ok, committed this. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com