Thread: Include WAL in base backup
Here's a cutdown version of the idea about including WAL in the base backup. What I initially wanted was to introduce a way to guarantee that the required WAL (with some sort of cutoff of course) would be available for the backup, but I ran out of time for that. We can always add that later. For now, you need to set wal_keep_segments to make it work properly, but if you do the idea is that the tar file/stream generated in the base backup will include all the required WAL files. That means that you can start a postmaster directly against the directory extracted from the tarball, and you no longer need to set up archive logging to get a backup. I've got some refactoring I want to do around the SendBackupDirectory() function after this, but a review of the functionality first would be good. And obviously, documentation is still necessary. The patch to pg_basebackup applies on top of the previously posted version. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Attachment
Magnus Hagander <magnus@hagander.net> writes: > Here's a cutdown version of the idea about including WAL in the base > backup. What I initially wanted was to introduce a way to guarantee > that the required WAL (with some sort of cutoff of course) would be > available for the backup, but I ran out of time for that. We can > always add that later. What if you start a concurrent process that begins streaming the WAL segments just before you start the backup, and you stop it after having stopped the backup. I would think that then, the local received files would be complete. We would only need a program able to stream the WAL segments and build WAL files from that… do you know about one? :) > For now, you need to set wal_keep_segments to make it work properly, That's quite a big foot gun, isn't it? You would have to at least offer an option to check for your backup or to call it broken when you miss some WAL files on the server. The only other safe option I know about that's not a pg_streamrecv subprocess would be to require archiving for the duration of the base backup, but I think we agreed that it would be nice being able to bypass that. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Sat, Jan 15, 2011 at 23:32, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > Magnus Hagander <magnus@hagander.net> writes: >> Here's a cutdown version of the idea about including WAL in the base >> backup. What I initially wanted was to introduce a way to guarantee >> that the required WAL (with some sort of cutoff of course) would be >> available for the backup, but I ran out of time for that. We can >> always add that later. > > What if you start a concurrent process that begins streaming the WAL > segments just before you start the backup, and you stop it after having > stopped the backup. I would think that then, the local received files > would be complete. We would only need a program able to stream the WAL > segments and build WAL files from that… do you know about one? :) Sure, if you stream the backups "on the side", then you don't need this feature. This is for "very simple filesystem backups", without the need to set up streaming of archiving. If you're streaming the WAL separately, you'd use pg_basebackup in non-wal mode. That's why it's optional. >> For now, you need to set wal_keep_segments to make it work properly, > > That's quite a big foot gun, isn't it? You would have to at least offer > an option to check for your backup or to call it broken when you miss > some WAL files on the server. Oh, it will give you an ERROR if any of the required WAL files aren't around anymore. So you will know that your backup is incomplete. > The only other safe option I know about that's not a pg_streamrecv > subprocess would be to require archiving for the duration of the base > backup, but I think we agreed that it would be nice being able to bypass > that. No, the safe option I'm envisioning for this one is to just prevent the server from recycling the logfiles until they have been sent off. Witha cap so we don't prevent recycling forever if the client hangs. This is the part I started working on, but didn't hav etime to finish for the CF. I've not given up on it, it's just waiting for later. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Magnus Hagander <magnus@hagander.net> writes: >> What if you start a concurrent process that begins streaming the WAL >> segments just before you start the backup, and you stop it after having >> stopped the backup. I would think that then, the local received files >> would be complete. We would only need a program able to stream the WAL >> segments and build WAL files from that… do you know about one? :) > > Sure, if you stream the backups "on the side", then you don't need > this feature. This is for "very simple filesystem backups", without > the need to set up streaming of archiving. What I meant is: why don't we provide an option to automate just that behavior in pg_basebackup? It looks like a fork() then calling code you already wrote. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Sun, Jan 16, 2011 at 18:45, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > Magnus Hagander <magnus@hagander.net> writes: >>> What if you start a concurrent process that begins streaming the WAL >>> segments just before you start the backup, and you stop it after having >>> stopped the backup. I would think that then, the local received files >>> would be complete. We would only need a program able to stream the WAL >>> segments and build WAL files from that… do you know about one? :) >> >> Sure, if you stream the backups "on the side", then you don't need >> this feature. This is for "very simple filesystem backups", without >> the need to set up streaming of archiving. > > What I meant is: why don't we provide an option to automate just that > behavior in pg_basebackup? It looks like a fork() then calling code you > already wrote. Ah, I see. That's a good idea. However, it's not quite that simple. "just adding a fork()" doesn't work on all our platforms, and you need to deal with feedback and such between them as well. I think it's definitely something worth doing in the long run, but I think we should start with the simpler way. Oh, and this might be the use-case for integrating the streaming log code as well :-) But if we plan to do that, perhaps we should pick a different name for the binary? Or maybe just share code with another one later.. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Magnus Hagander <magnus@hagander.net> writes: > However, it's not quite that simple. "just adding a fork()" doesn't > work on all our platforms, and you need to deal with feedback and such > between them as well. I'd think client-side, we have an existing implementation with the parallel pg_restore option. Don't know (yet) how easy it is to reuse that code… > Oh, and this might be the use-case for integrating the streaming log > code as well :-) But if we plan to do that, perhaps we should pick a > different name for the binary? Or maybe just share code with another > one later.. You're talking about the pg_streamrecv binary? Then yes, my idea about it is that it should become the default archive client we ship with PostgreSQL. And grow into offering a way to be the default restore command too. I'd see the current way that the standby can switch between restoring from archive and from a live stream as a first step into that direction. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Sun, Jan 16, 2011 at 20:13, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > Magnus Hagander <magnus@hagander.net> writes: >> However, it's not quite that simple. "just adding a fork()" doesn't >> work on all our platforms, and you need to deal with feedback and such >> between them as well. > > I'd think client-side, we have an existing implementation with the > parallel pg_restore option. Don't know (yet) how easy it is to reuse > that code… True. But however we do it, it will be significantly more complex than just including the WAL. And I want to make sure we get *something* done in time for 9.1, and then improve upon it. If we can get the improvement into 9.1 that's great, but if not it will have to wait until 9.2 - and I don't want to leave us without anything for 9.1. >> Oh, and this might be the use-case for integrating the streaming log >> code as well :-) But if we plan to do that, perhaps we should pick a >> different name for the binary? Or maybe just share code with another >> one later.. > > You're talking about the pg_streamrecv binary? Then yes, my idea about > it is that it should become the default archive client we ship with > PostgreSQL. And grow into offering a way to be the default restore > command too. I'd see the current way that the standby can switch > between restoring from archive and from a live stream as a first step > into that direction. Right. I did put both the base backup and the wal streaming in the same binary earlier, and the only shared code was the one to connect to the db. So I split them apart again. This is the reason to put them back together perhaps - or just have the ability for pg_basebackup to fork()+exec() the wal streamer. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Magnus Hagander <magnus@hagander.net> writes: > But however we do it, it will be significantly more complex than just > including the WAL. And I want to make sure we get *something* done in > time for 9.1, and then improve upon it. If we can get the improvement > into 9.1 that's great, but if not it will have to wait until 9.2 - and > I don't want to leave us without anything for 9.1. +1. The only point I'm not clear on is the complexity, and I trust you to cut off at the right point here… meanwhile, I'm still asking for this little more until you say your plate's full :) > Right. I did put both the base backup and the wal streaming in the > same binary earlier, and the only shared code was the one to connect > to the db. So I split them apart again. This is the reason to put them > back together perhaps - or just have the ability for pg_basebackup to > fork()+exec() the wal streamer. That would be awesome. Then pg_streamrecv could somehow accept options that make it suitable for use as an archive command, connecting to your (still?) third-party daemon? At this point it'd be pg_walsender :) Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Greetings, * Magnus Hagander (magnus@hagander.net) wrote: > For now, you need to set wal_keep_segments to make it work properly, > but if you do the idea is that the tar file/stream generated in the > base backup will include all the required WAL files. Is there some reason to not ERROR outright if we're asked to provide WAL and wal_keep_segments isn't set..? I'd rather do that than only ERROR when a particular WAL is missing.. That could lead to transient backup errors that an inexperienced sysadmin or DBA might miss or ignore. They'll notice if it doesn't work the first time they try it and spits out a hint about wal_keep_segments. > That means that > you can start a postmaster directly against the directory extracted > from the tarball, and you no longer need to set up archive logging to > get a backup. Like that part. > I've got some refactoring I want to do around the > SendBackupDirectory() function after this, but a review of the > functionality first would be good. And obviously, documentation is > still necessary. mkay, I'm not going to try to make this ready for committer, but will provide my comments on it overall. Bit difficult to review when someone else is reviewing the base patch too. :/ Here goes: - I'm not a huge fan of the whole 'closetar' option, that feels really rather wrong to me. Why not just open it and closeit in perform_base_backup(), unconditionally? - I wonder if you're not getting to a level where you shold be using a struct to pass the relevant information to perform_base_backup()instead of adding more arguments on.. That's going to get unwieldy at some point. - Why not initialize logid and logseg like so?: int logid = startptr.xlogid;int logseg = startptr.xrecoff / XLogSegSize; Then use those in your elog? Seems cleaner to me. - A #define right in the middle of a while loop...? Really? - The grammar changes strike me as.. odd. Typically, you would have an 'option' production that you can then have a listof and then let each option be whatever the OR'd set of options is. Wouldn't the current grammar require that you putthe options in a specific order? That'd blow. @@ -687,7 +690,7 @@ BaseBackup() * once since it can be relocated, and it will be checked before we do *anything anyway. */ - if (basedir != NULL && i > 0) + if (basedir != NULL && !PQgetisnull(res, i, 1)) verify_dir_is_empty_or_create(PQgetvalue(res, i, 1)); } - Should the 'i > 0' conditional above still be there..? So, that's my review from just reading the source code and the thread.. Hope it's useful, sorry it's not more. :/ Thanks, Stephen
Magnus, * Stephen Frost (sfrost@snowman.net) wrote: > mkay, I'm not going to try to make this ready for committer, but will > provide my comments on it overall. Bit difficult to review when someone > else is reviewing the base patch too. :/ I went ahead and marked it as 'waiting on author', since I'd like feedback on my comments, but I'll try to make time in the next few days to apply the two patches and test it out, unless I hear otherwise. Thanks, Stephen
On Thu, Jan 20, 2011 at 05:03, Stephen Frost <sfrost@snowman.net> wrote: > Greetings, > > * Magnus Hagander (magnus@hagander.net) wrote: >> For now, you need to set wal_keep_segments to make it work properly, >> but if you do the idea is that the tar file/stream generated in the >> base backup will include all the required WAL files. > > Is there some reason to not ERROR outright if we're asked to provide WAL > and wal_keep_segments isn't set..? I'd rather do that than only ERROR > when a particular WAL is missing.. That could lead to transient backup > errors that an inexperienced sysadmin or DBA might miss or ignore. > They'll notice if it doesn't work the first time they try it and spits > out a hint about wal_keep_segments. Well, in a "smaller:ish" database you can easily do the full backup before you run out of segments in the data directory even when you haven't set wal_keep_segments. If we error out, we force extra work on the user in the trivial case. I'd rather not change that, but instead (as Fujii-san has also mentioned is needed anyway) put some more effort into documenting in which cases you need to set it. >> I've got some refactoring I want to do around the >> SendBackupDirectory() function after this, but a review of the >> functionality first would be good. And obviously, documentation is >> still necessary. > > mkay, I'm not going to try to make this ready for committer, but will > provide my comments on it overall. Bit difficult to review when someone > else is reviewing the base patch too. :/ Heh, yeah. > Here goes: > > - I'm not a huge fan of the whole 'closetar' option, that feels really > rather wrong to me. Why not just open it and close it in > perform_base_backup(), unconditionally? Yeah, we could move the whole thing up there. Or, as I mentioned in an IM conversation with Heikki, just get rid of SendBackupDirectory() completely and inline it inside the loop in perform_base_backup(). Given that it's basically just 5 lines + a call to sendDir().. > - I wonder if you're not getting to a level where you shold be using a > struct to pass the relevant information to perform_base_backup() > instead of adding more arguments on.. That's going to get unwieldy at > some point. Yeah, probably. We *could* pass the BaseBackupCmd struct from the parser all the way in - or is that cheating too much on abstractions? It seems if we don't, we're just going to hav ea copy of that struct without the NodeTag member.. > - Why not initialize logid and logseg like so?: > > int logid = startptr.xlogid; > int logseg = startptr.xrecoff / XLogSegSize; > > Then use those in your elog? Seems cleaner to me. Hmm. Yes. Agreed. > - A #define right in the middle of a while loop...? Really? Haha, yeah, that was a typo. I didn't remember the name of the variable so I stuck it there for testing and forgot it. It should be ThisTimeLineID, and no #define at all. > - The grammar changes strike me as.. odd. Typically, you would have an > 'option' production that you can then have a list of and then let each > option be whatever the OR'd set of options is. Wouldn't the current > grammar require that you put the options in a specific order? That'd > blow. It does require them in a specific order. The advantage is that it makes the code easier. and it's not like end users are expected to run them anyway... Now, I'm no bison export, so it might be an easy fix. But the way I could figure, to make them order indepdent I have to basically collect them up together as a List instead of just as a struct, and then loop through that list to build a struct later. If someone who knows Bison better can tell me a neater way to do that, I'll be happy to change :-) > @@ -687,7 +690,7 @@ BaseBackup() > * once since it can be relocated, and it will be checked before we do > * anything anyway. > */ > - if (basedir != NULL && i > 0) > + if (basedir != NULL && !PQgetisnull(res, i, 1)) > verify_dir_is_empty_or_create(PQgetvalue(res, i, 1)); > } > > - Should the 'i > 0' conditional above still be there..? No. That's a cheat-check that assumes the base directory is always sent first. Which is not true anymore - with this patch we always send it *last* so we can include the WAL in it. > So, that's my review from just reading the source code and the thread.. > Hope it's useful, sorry it's not more. :/ Thanks - it certainly is! -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Magnus Hagander <magnus@hagander.net> writes: >> - Why not initialize logid and logseg like so?: >> >> � � � �int logid = startptr.xlogid; >> � � � �int logseg = startptr.xrecoff / XLogSegSize; >> >> �Then use those in your elog? �Seems cleaner to me. > Hmm. Yes. Agreed. Marginal complaint here: int is the wrong type, I'm pretty sure. regards, tom lane
On Fri, Jan 21, 2011 at 12:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Magnus Hagander <magnus@hagander.net> writes: >>> - Why not initialize logid and logseg like so?: >>> >>> int logid = startptr.xlogid; >>> int logseg = startptr.xrecoff / XLogSegSize; >>> >>> Then use those in your elog? Seems cleaner to me. > >> Hmm. Yes. Agreed. > > Marginal complaint here: int is the wrong type, I'm pretty sure. And, we should use XLByteToPrevSeg here instead of just =, I think. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Mon, Jan 24, 2011 at 08:45, Fujii Masao <masao.fujii@gmail.com> wrote: > On Fri, Jan 21, 2011 at 12:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Magnus Hagander <magnus@hagander.net> writes: >>>> - Why not initialize logid and logseg like so?: >>>> >>>> int logid = startptr.xlogid; >>>> int logseg = startptr.xrecoff / XLogSegSize; >>>> >>>> Then use those in your elog? Seems cleaner to me. >> >>> Hmm. Yes. Agreed. >> >> Marginal complaint here: int is the wrong type, I'm pretty sure. > > And, we should use XLByteToPrevSeg here instead of just =, I think. Not XLByteToSeg? (I admit I missed the existance of both those macros, though) I plan to post a rebased version of this patch today or tomorrow, btw, that should work against all the changes that have been applied to git. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
On Mon, Jan 24, 2011 at 4:47 PM, Magnus Hagander <magnus@hagander.net> wrote: > On Mon, Jan 24, 2011 at 08:45, Fujii Masao <masao.fujii@gmail.com> wrote: >> On Fri, Jan 21, 2011 at 12:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Magnus Hagander <magnus@hagander.net> writes: >>>>> - Why not initialize logid and logseg like so?: >>>>> >>>>> int logid = startptr.xlogid; >>>>> int logseg = startptr.xrecoff / XLogSegSize; >>>>> >>>>> Then use those in your elog? Seems cleaner to me. >>> >>>> Hmm. Yes. Agreed. >>> >>> Marginal complaint here: int is the wrong type, I'm pretty sure. >> >> And, we should use XLByteToPrevSeg here instead of just =, I think. > > Not XLByteToSeg? Checking... yeah, you are right. We should use XLByteToSeg since the REDO starting WAL record doesn't exist in the previous WAL segment when the REDO starting location is a boundary byte. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Mon, Jan 24, 2011 at 09:03, Fujii Masao <masao.fujii@gmail.com> wrote: > On Mon, Jan 24, 2011 at 4:47 PM, Magnus Hagander <magnus@hagander.net> wrote: >> On Mon, Jan 24, 2011 at 08:45, Fujii Masao <masao.fujii@gmail.com> wrote: >>> On Fri, Jan 21, 2011 at 12:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>> Magnus Hagander <magnus@hagander.net> writes: >>>>>> - Why not initialize logid and logseg like so?: >>>>>> >>>>>> int logid = startptr.xlogid; >>>>>> int logseg = startptr.xrecoff / XLogSegSize; >>>>>> >>>>>> Then use those in your elog? Seems cleaner to me. >>>> >>>>> Hmm. Yes. Agreed. >>>> >>>> Marginal complaint here: int is the wrong type, I'm pretty sure. >>> >>> And, we should use XLByteToPrevSeg here instead of just =, I think. >> >> Not XLByteToSeg? > > Checking... yeah, you are right. We should use XLByteToSeg since > the REDO starting WAL record doesn't exist in the previous WAL > segment when the REDO starting location is a boundary byte. Here's an updated version of the patch: * rebased on the current git master (including changing the switch from -w to -x since -w was used now) * includes some documentation * use XLogByteToSeg and uint32 as mentioned here * refactored to remove SendBackupDirectory(), removing the ugly closetar boolean -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Attachment
On Tue, Jan 25, 2011 at 12:33 AM, Magnus Hagander <magnus@hagander.net> wrote: > Here's an updated version of the patch: > > * rebased on the current git master (including changing the switch > from -w to -x since -w was used now) > * includes some documentation > * use XLogByteToSeg and uint32 as mentioned here > * refactored to remove SendBackupDirectory(), removing the ugly closetar boolean I reviewed the patch. Here are comments. + {"xlog", no_argument, NULL, 'w'}, Typo: s/w/x /** BASE_BACKUP [LABEL <label>] [PROGRESS] [FAST]*/ In repl_gram.y, the above comment needs to be updated. Both this patch and the multiple-backup one removes SendBackupDirectory in the almost same way. So, how about committing that common part first? + XLByteToSeg(endptr, endlogid, endlogseg); XLByteToPrevSeg should be used for the backup end location. Because when it's a boundary byte, all the required WAL records exist in the previous WAL segment. This is why pg_stop_backup uses XLByteToPrevSeg for the backup end location. + elog(DEBUG1, "Going to send wal from %i.%i to %i.%i", + logid, logseg, + endlogid, endlogseg); %u should be used instead of %i. Or how about logging the filename? + char xlogname[64]; How about using MAXFNAMELEN instead of 64? + XLogFileName(xlogname, ThisTimeLineID, logid, logseg); + sprintf(fn, "./pg_xlog/%s", xlogname); How about using XLogFilePath? instead? + if (logid > endptr.xlogid || + (logid == endptr.xlogid && logseg >= endptr.xrecoff / XLogSegSize)) Why don't you use endlogseg? The estimated size of $PGDATA doesn't include WAL files, but the actual one does. This inconsistency messes up the progress report as follows: 33708/17323 kB (194%) 1/1 tablespaces So, what about sparating the progress report into two parts: one for $PGDATA and tablespaces, another for WAL files? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Tue, Jan 25, 2011 at 10:56, Fujii Masao <masao.fujii@gmail.com> wrote: > On Tue, Jan 25, 2011 at 12:33 AM, Magnus Hagander <magnus@hagander.net> wrote: >> Here's an updated version of the patch: >> >> * rebased on the current git master (including changing the switch >> from -w to -x since -w was used now) >> * includes some documentation >> * use XLogByteToSeg and uint32 as mentioned here >> * refactored to remove SendBackupDirectory(), removing the ugly closetar boolean > > I reviewed the patch. Here are comments. > > > + {"xlog", no_argument, NULL, 'w'}, > > Typo: s/w/x Ah, oops. thanks. > /* > * BASE_BACKUP [LABEL <label>] [PROGRESS] [FAST] > */ > > In repl_gram.y, the above comment needs to be updated. Same here - oops, thanks. It was also missing the quotes around <label>. > Both this patch and the multiple-backup one removes SendBackupDirectory > in the almost same way. So, how about committing that common part first? I think they're small enough that we'll just commit it along with whichever comes first and then have the other one merge with it. > + XLByteToSeg(endptr, endlogid, endlogseg); > > XLByteToPrevSeg should be used for the backup end location. Because > when it's a boundary byte, all the required WAL records exist in the > previous WAL segment. This is why pg_stop_backup uses XLByteToPrevSeg > for the backup end location. Well, it's just for debugging output, really, but see below. > + elog(DEBUG1, "Going to send wal from %i.%i to %i.%i", > + logid, logseg, > + endlogid, endlogseg); > > %u should be used instead of %i. Or how about logging the filename? I actually plan to remove that DEBUG output completely (sorry, forgot to mention that), I'm not sure it's useful to have it around once we apply. Or do you think it would be useful to keep? > + char xlogname[64]; > > How about using MAXFNAMELEN instead of 64? > Agreed. > + XLogFileName(xlogname, ThisTimeLineID, logid, logseg); > + sprintf(fn, "./pg_xlog/%s", xlogname); > > How about using XLogFilePath? instead? Can't do that, because we need the "./" part when we call sendFile() - it's structured around having that one, since we get it from the file name looping. > + if (logid > endptr.xlogid || > + (logid == endptr.xlogid && logseg >= endptr.xrecoff / XLogSegSize)) > > Why don't you use endlogseg? Honestly? Because I thought I added endlogseg just for the debugging output, didn't realize I had it down here. But if I do, then I should not use the XLByteToPrevSeg() per your suggestion above, right? Keep it the way it is. > The estimated size of $PGDATA doesn't include WAL files, but the > actual one does. > This inconsistency messes up the progress report as follows: > > 33708/17323 kB (194%) 1/1 tablespaces Yeah, that is definitely a potential problem. I think we're just going to have to document it - and in a big database, it's not going to be quite as bad... > So, what about sparating the progress report into two parts: one for $PGDATA and > tablespaces, another for WAL files? We can't really do that. We need to send the progress report before we begin the tar file, and we don't know how many xlog segments we will have at that time. And we don't want to send pg_xlog as a separate tar stream, because if we do we won't be able to get the single-tar-output in the simple case - thus no piping etc. I actually had the xlog data as it's own tar stream in the beginning, but Heikki convinced me of the advantage of keeping it in the main one... What we could do, I guess, is to code pg_basebackup to detect when it starts receiving xlog files and then stop increasing the percentage at that point, instead just doing a counter? (the discussed changse above have been applied and pushed to github) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
On Tue, Jan 25, 2011 at 8:21 PM, Magnus Hagander <magnus@hagander.net> wrote: >> + elog(DEBUG1, "Going to send wal from %i.%i to %i.%i", >> + logid, logseg, >> + endlogid, endlogseg); >> >> %u should be used instead of %i. Or how about logging the filename? > > I actually plan to remove that DEBUG output completely (sorry, forgot > to mention that), I'm not sure it's useful to have it around once we > apply. Or do you think it would be useful to keep? Nope. I'm OK to remove that. >> + XLogFileName(xlogname, ThisTimeLineID, logid, logseg); >> + sprintf(fn, "./pg_xlog/%s", xlogname); >> >> How about using XLogFilePath? instead? > > Can't do that, because we need the "./" part when we call sendFile() - > it's structured around having that one, since we get it from the file > name looping. Understood. >> + if (logid > endptr.xlogid || >> + (logid == endptr.xlogid && logseg >= endptr.xrecoff / XLogSegSize)) >> >> Why don't you use endlogseg? > > Honestly? Because I thought I added endlogseg just for the debugging > output, didn't realize I had it down here. > > But if I do, then I should not use the XLByteToPrevSeg() per your > suggestion above, right? Keep it the way it is. Yes. If we check "logseg >= endlogseg", we should use XLByteToSeg. >> So, what about sparating the progress report into two parts: one for $PGDATA and >> tablespaces, another for WAL files? > > We can't really do that. We need to send the progress report before we > begin the tar file, and we don't know how many xlog segments we will > have at that time. And we don't want to send pg_xlog as a separate tar > stream, because if we do we won't be able to get the single-tar-output > in the simple case - thus no piping etc. I actually had the xlog data > as it's own tar stream in the beginning, but Heikki convinced me of > the advantage of keeping it in the main one... > > What we could do, I guess, is to code pg_basebackup to detect when it > starts receiving xlog files and then stop increasing the percentage at > that point, instead just doing a counter? Umm.. not progressing the report during receiving WAL files seems also odd. But I don't have another good idea... For now, I'm OK if the behavior of the progress report is well documented. > (the discussed changse above have been applied and pushed to github) Thanks! I'll test and review that. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Tue, Jan 25, 2011 at 10:28 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >> (the discussed changse above have been applied and pushed to github) > > Thanks! I'll test and review that. WAL file might get recycled or removed while walsender is reading it. So the WAL file which pg_basebackup seemingly received successfully might be actually invalid. Shouldn't we need to check that what we read is valid as XLogRead does? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Tue, Jan 25, 2011 at 15:04, Fujii Masao <masao.fujii@gmail.com> wrote: > On Tue, Jan 25, 2011 at 10:28 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >>> (the discussed changse above have been applied and pushed to github) >> >> Thanks! I'll test and review that. > > WAL file might get recycled or removed while walsender is reading it. > So the WAL file which pg_basebackup seemingly received successfully > might be actually invalid. Shouldn't we need to check that what we read > is valid as XLogRead does? Yikes. Yes, you are correct. We do catch the case when it is removed, but not when it's recycled. I'll need to look at that. Not sure if I'll have time today, but I'll do it as soon as I can :-) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
On Tue, Jan 25, 2011 at 16:34, Magnus Hagander <magnus@hagander.net> wrote: > On Tue, Jan 25, 2011 at 15:04, Fujii Masao <masao.fujii@gmail.com> wrote: >> On Tue, Jan 25, 2011 at 10:28 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >>>> (the discussed changse above have been applied and pushed to github) >>> >>> Thanks! I'll test and review that. >> >> WAL file might get recycled or removed while walsender is reading it. >> So the WAL file which pg_basebackup seemingly received successfully >> might be actually invalid. Shouldn't we need to check that what we read >> is valid as XLogRead does? We should, and the easiest way is to actually use XLogRead() since the code is already there. How about the way in this patch? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Attachment
On Wed, Jan 26, 2011 at 5:17 AM, Magnus Hagander <magnus@hagander.net> wrote: > We should, and the easiest way is to actually use XLogRead() since the > code is already there. How about the way in this patch? Thanks for the update. I reread the patch. + MemSet(&statbuf, 0, sizeof(statbuf)); + statbuf.st_mode=S_IRUSR | S_IWUSR; +#ifndef WIN32 + statbuf.st_uid=getuid(); + statbuf.st_gid=getgid(); +#endif + statbuf.st_size=XLogSegSize; + statbuf.st_mtime=time(NULL); Can you put a space around "="? In the multiple-backups patch, Heikki uses geteuid and getegid for the same purpose instead of getuid and getgid, as follows. ! /* Windows doesn't have the concept of uid and gid */ ! #ifdef WIN32 ! statbuf.st_uid = 0; ! statbuf.st_gid = 0; ! #else ! statbuf.st_uid = geteuid(); ! statbuf.st_gid = getegid(); ! #endif ! statbuf.st_mtime = time(NULL); ! statbuf.st_mode = S_IRUSR | S_IWUSR; ! statbuf.st_size = len; Which is correct? Since we cannot start the PostgreSQL when getuid != geteuid, I don't think that there is really difference between getuid and geteuid. But other code always uses geteuid, so I think that geteuid should be used here instead of getuid for the sake of consistency. OTOH, I'm not sure if there is really difference between getgid and getegid in the backend (though I guess getgid == getegid because getuid == geteuid), and which should be used here. What is your opinion? + XLogFileName(xlogname, ThisTimeLineID, logid, logseg); + + sprintf(fn, "./pg_xlog/%s", xlogname); + _tarWriteHeader(fn, NULL, &statbuf); Can we use XLogFilePath? instead? Because sendFile has not been used. + XLByteToSeg(endptr, endlogid, endlogseg); <snip> + /* Have we reached our stop position yet? */ + if (logid > endlogid || + (logid == endlogid && logseg >= endlogseg)) + break; What I said in upthread is wrong. We should use XLByteToPrevSeg for endptr and check "logseg > endlogseg". Otherwise, if endptr is not a boundary byte, endlogid/endlogseg indicates the last necessary WAL file, but it's not sent. + if (percent > 100) + percent = 100; I'm not sure if this is good idea because the total received can go further than the estimated size though the percentage stops at 100. This looks more confusing than the previous behavior. Anyway, I think that we need to document about the combination of -P and -x in pg_basebackup. In pg_basebackup.sgml <term><option>--checkpoint <replaceable class="parameter">fast|spread</replaceable></option></term> Though this is not the problem of the patch, I found the inconsistency of the descriptions about options of pg_basebackup. We should s/--checkpoint/--checkpoint= Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On 27.01.2011 06:44, Fujii Masao wrote: > + XLByteToSeg(endptr, endlogid, endlogseg); > <snip> > + /* Have we reached our stop position yet? */ > + if (logid> endlogid || > + (logid == endlogid&& logseg>= endlogseg)) > + break; > > What I said in upthread is wrong. We should use XLByteToPrevSeg > for endptr and check "logseg> endlogseg". Otherwise, if endptr is > not a boundary byte, endlogid/endlogseg indicates the last > necessary WAL file, but it's not sent. We should use XLByteToPrevSeg, but I believe >= is still correct. logid/logseg is the last WAL segment we've successfully sent, and endlogif/endlogid is the last WAL segment we need to send. When they are the same, we're done. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Sat, Jan 29, 2011 at 6:02 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > On 27.01.2011 06:44, Fujii Masao wrote: >> >> + XLByteToSeg(endptr, endlogid, endlogseg); >> <snip> >> + /* Have we reached our stop position yet? */ >> + if (logid> endlogid || >> + (logid == endlogid&& logseg>= endlogseg)) >> + break; >> >> What I said in upthread is wrong. We should use XLByteToPrevSeg >> for endptr and check "logseg> endlogseg". Otherwise, if endptr is >> not a boundary byte, endlogid/endlogseg indicates the last >> necessary WAL file, but it's not sent. > > We should use XLByteToPrevSeg, but I believe >= is still correct. > logid/logseg is the last WAL segment we've successfully sent, and > endlogif/endlogid is the last WAL segment we need to send. When they are the > same, we're done. Really? logid/logseg is incremented just before the check as follows. So, when they are the same, the WAL file which logid/logseg indicates has not been sent yet. Am I missing something? + /* Advance to the next WAL file */ + NextLogSeg(logid, logseg); + + /* Have we reached our stop position yet? */ + if (logid > endlogid || + (logid == endlogid && logseg >= endlogseg)) + break; Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On 29.01.2011 09:10, Fujii Masao wrote: > On Sat, Jan 29, 2011 at 6:02 AM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >> On 27.01.2011 06:44, Fujii Masao wrote: >>> >>> + XLByteToSeg(endptr, endlogid, endlogseg); >>> <snip> >>> + /* Have we reached our stop position yet? */ >>> + if (logid> endlogid || >>> + (logid == endlogid&& logseg>= endlogseg)) >>> + break; >>> >>> What I said in upthread is wrong. We should use XLByteToPrevSeg >>> for endptr and check "logseg> endlogseg". Otherwise, if endptr is >>> not a boundary byte, endlogid/endlogseg indicates the last >>> necessary WAL file, but it's not sent. >> >> We should use XLByteToPrevSeg, but I believe>= is still correct. >> logid/logseg is the last WAL segment we've successfully sent, and >> endlogif/endlogid is the last WAL segment we need to send. When they are the >> same, we're done. > > Really? logid/logseg is incremented just before the check as follows. > So, when they are the same, the WAL file which logid/logseg indicates > has not been sent yet. Am I missing something? > > + /* Advance to the next WAL file */ > + NextLogSeg(logid, logseg); > + > + /* Have we reached our stop position yet? */ > + if (logid> endlogid || > + (logid == endlogid&& logseg>= endlogseg)) > + break; Ah, you're right, I misread it. Never mind.. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Thu, Jan 27, 2011 at 05:44, Fujii Masao <masao.fujii@gmail.com> wrote: > On Wed, Jan 26, 2011 at 5:17 AM, Magnus Hagander <magnus@hagander.net> wrote: >> We should, and the easiest way is to actually use XLogRead() since the >> code is already there. How about the way in this patch? > > Thanks for the update. I reread the patch. > > + MemSet(&statbuf, 0, sizeof(statbuf)); > + statbuf.st_mode=S_IRUSR | S_IWUSR; > +#ifndef WIN32 > + statbuf.st_uid=getuid(); > + statbuf.st_gid=getgid(); > +#endif > + statbuf.st_size=XLogSegSize; > + statbuf.st_mtime=time(NULL); > > Can you put a space around "="? I'll run pgindent, it'll take care of that. > Which is correct? Since we cannot start the PostgreSQL when > getuid != geteuid, I don't think that there is really difference between > getuid and geteuid. But other code always uses geteuid, so I think > that geteuid should be used here instead of getuid for the sake of > consistency. Yeah, changed for consistency. > + XLogFileName(xlogname, ThisTimeLineID, logid, logseg); > + > + sprintf(fn, "./pg_xlog/%s", xlogname); > + _tarWriteHeader(fn, NULL, &statbuf); > > Can we use XLogFilePath? instead? Because sendFile has not been > used. We can now, changed. > What I said in upthread is wrong. We should use XLByteToPrevSeg > for endptr and check "logseg > endlogseg". Otherwise, if endptr is > not a boundary byte, endlogid/endlogseg indicates the last > necessary WAL file, but it's not sent. Yeah, thanks for this - and thanks to Heikki for straightening it out for me. > + if (percent > 100) > + percent = 100; > > I'm not sure if this is good idea because the total received can go > further than the estimated size though the percentage stops at 100. > This looks more confusing than the previous behavior. Anyway, > I think that we need to document about the combination of -P and > -x in pg_basebackup. I found it less confusing - but still somewhat confusing. I'll add some docs. > In pg_basebackup.sgml > <term><option>--checkpoint <replaceable > class="parameter">fast|spread</replaceable></option></term> > > Though this is not the problem of the patch, I found the inconsistency > of the descriptions about options of pg_basebackup. We should > s/--checkpoint/--checkpoint= Agreed, changed. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
On Mon, Jan 31, 2011 at 4:22 AM, Magnus Hagander <magnus@hagander.net> wrote: >> In pg_basebackup.sgml >> <term><option>--checkpoint <replaceable >> class="parameter">fast|spread</replaceable></option></term> >> >> Though this is not the problem of the patch, I found the inconsistency >> of the descriptions about options of pg_basebackup. We should >> s/--checkpoint/--checkpoint= > > Agreed, changed. Thanks for all your efforts! pg_basebackup and this new option are VERY useful. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Sun, Jan 30, 2011 at 9:34 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Mon, Jan 31, 2011 at 4:22 AM, Magnus Hagander <magnus@hagander.net> wrote: >>> In pg_basebackup.sgml >>> <term><option>--checkpoint <replaceable >>> class="parameter">fast|spread</replaceable></option></term> >>> >>> Though this is not the problem of the patch, I found the inconsistency >>> of the descriptions about options of pg_basebackup. We should >>> s/--checkpoint/--checkpoint= >> >> Agreed, changed. > > Thanks for all your efforts! pg_basebackup and this new option are VERY useful. +1! -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company