Thread: Include WAL in base backup

Include WAL in base backup

From

Magnus Hagander

Date:

15 January 2011, 20:08:05

Here's a cutdown version of the idea about including WAL in the base
backup. What I initially wanted was to introduce a way to guarantee
that the required WAL (with some sort of cutoff of course) would be
available for the backup, but I ran out of time for that. We can
always add that later.

For now, you need to set wal_keep_segments to make it work properly,
but if you do the idea is that the tar file/stream generated in the
base backup will include all the required WAL files. That means that
you can start a postmaster directly against the directory extracted
from the tarball, and you no longer need to set up archive logging to
get a backup.

I've got some refactoring I want to do around the
SendBackupDirectory() function after this, but a review of the
functionality first would be good. And obviously, documentation is
still necessary.

The patch to pg_basebackup applies on top of the previously posted version.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Attachment

basebackup_wal.patch

Re: Include WAL in base backup

From

Dimitri Fontaine

Date:

15 January 2011, 21:32:55

Magnus Hagander <magnus@hagander.net> writes:
> Here's a cutdown version of the idea about including WAL in the base
> backup. What I initially wanted was to introduce a way to guarantee
> that the required WAL (with some sort of cutoff of course) would be
> available for the backup, but I ran out of time for that. We can
> always add that later.

What if you start a concurrent process that begins streaming the WAL
segments just before you start the backup, and you stop it after having
stopped the backup.  I would think that then, the local received files
would be complete.  We would only need a program able to stream the WAL
segments and build WAL files from that… do you know about one? :)

> For now, you need to set wal_keep_segments to make it work properly,

That's quite a big foot gun, isn't it?  You would have to at least offer
an option to check for your backup or to call it broken when you miss
some WAL files on the server.

The only other safe option I know about that's not a pg_streamrecv
subprocess would be to require archiving for the duration of the base
backup, but I think we agreed that it would be nice being able to bypass
that.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support

Re: Include WAL in base backup

From

Magnus Hagander

Date:

15 January 2011, 22:54:19

On Sat, Jan 15, 2011 at 23:32, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> Here's a cutdown version of the idea about including WAL in the base
>> backup. What I initially wanted was to introduce a way to guarantee
>> that the required WAL (with some sort of cutoff of course) would be
>> available for the backup, but I ran out of time for that. We can
>> always add that later.
>
> What if you start a concurrent process that begins streaming the WAL
> segments just before you start the backup, and you stop it after having
> stopped the backup.  I would think that then, the local received files
> would be complete.  We would only need a program able to stream the WAL
> segments and build WAL files from that… do you know about one? :)

Sure, if you stream the backups "on the side", then you don't need
this feature. This is for "very simple filesystem backups", without
the need to set up streaming of archiving.

If you're streaming the WAL separately, you'd use pg_basebackup in
non-wal mode. That's why it's optional.

>> For now, you need to set wal_keep_segments to make it work properly,
>
> That's quite a big foot gun, isn't it?  You would have to at least offer
> an option to check for your backup or to call it broken when you miss
> some WAL files on the server.

Oh, it will give you an ERROR if any of the required WAL files aren't
around anymore. So you will know that your backup is incomplete.

> The only other safe option I know about that's not a pg_streamrecv
> subprocess would be to require archiving for the duration of the base
> backup, but I think we agreed that it would be nice being able to bypass
> that.

No, the safe option I'm envisioning for this one is to just prevent
the server from recycling the logfiles until they have been sent off.
Witha cap so we don't prevent recycling forever if the client hangs.
This is the part I started working on, but didn't hav etime to finish
for the CF. I've not given up on it, it's just waiting for later.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Include WAL in base backup

From

Dimitri Fontaine

Date:

16 January 2011, 16:45:31

Magnus Hagander <magnus@hagander.net> writes:
>> What if you start a concurrent process that begins streaming the WAL
>> segments just before you start the backup, and you stop it after having
>> stopped the backup.  I would think that then, the local received files
>> would be complete.  We would only need a program able to stream the WAL
>> segments and build WAL files from that… do you know about one? :)
>
> Sure, if you stream the backups "on the side", then you don't need
> this feature. This is for "very simple filesystem backups", without
> the need to set up streaming of archiving.

What I meant is: why don't we provide an option to automate just that
behavior in pg_basebackup?  It looks like a fork() then calling code you
already wrote.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support

Re: Include WAL in base backup

From

Magnus Hagander

Date:

16 January 2011, 17:16:23

On Sun, Jan 16, 2011 at 18:45, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>>> What if you start a concurrent process that begins streaming the WAL
>>> segments just before you start the backup, and you stop it after having
>>> stopped the backup.  I would think that then, the local received files
>>> would be complete.  We would only need a program able to stream the WAL
>>> segments and build WAL files from that… do you know about one? :)
>>
>> Sure, if you stream the backups "on the side", then you don't need
>> this feature. This is for "very simple filesystem backups", without
>> the need to set up streaming of archiving.
>
> What I meant is: why don't we provide an option to automate just that
> behavior in pg_basebackup?  It looks like a fork() then calling code you
> already wrote.

Ah, I see. That's a good idea.

However, it's not quite that simple. "just adding a fork()" doesn't
work on all our platforms, and you need to deal with feedback and such
between them as well.

I think it's definitely something worth doing in the long run, but I
think we should start with the simpler way.

Oh, and this might be the use-case for integrating the streaming log
code as well :-) But if we plan to do that, perhaps we should pick a
different name for the binary? Or maybe just share code with another
one later..

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Include WAL in base backup

From

Dimitri Fontaine

Date:

16 January 2011, 18:15:44

Magnus Hagander <magnus@hagander.net> writes:
> However, it's not quite that simple. "just adding a fork()" doesn't
> work on all our platforms, and you need to deal with feedback and such
> between them as well.

I'd think client-side, we have an existing implementation with the
parallel pg_restore option.  Don't know (yet) how easy it is to reuse
that code…

> Oh, and this might be the use-case for integrating the streaming log
> code as well :-) But if we plan to do that, perhaps we should pick a
> different name for the binary? Or maybe just share code with another
> one later..

You're talking about the pg_streamrecv binary?  Then yes, my idea about
it is that it should become the default archive client we ship with
PostgreSQL.  And grow into offering a way to be the default restore
command too.  I'd see the current way that the standby can switch
between restoring from archive and from a live stream as a first step
into that direction.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support

Re: Include WAL in base backup

From

Magnus Hagander

Date:

17 January 2011, 08:59:25

On Sun, Jan 16, 2011 at 20:13, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> However, it's not quite that simple. "just adding a fork()" doesn't
>> work on all our platforms, and you need to deal with feedback and such
>> between them as well.
>
> I'd think client-side, we have an existing implementation with the
> parallel pg_restore option.  Don't know (yet) how easy it is to reuse
> that code…

True.

But however we do it, it will be significantly more complex than just
including the WAL. And I want to make sure we get *something* done in
time for 9.1, and then improve upon it. If we can get the improvement
into 9.1 that's great, but if not it will have to wait until 9.2 - and
I don't want to leave us without anything for 9.1.

>> Oh, and this might be the use-case for integrating the streaming log
>> code as well :-) But if we plan to do that, perhaps we should pick a
>> different name for the binary? Or maybe just share code with another
>> one later..
>
> You're talking about the pg_streamrecv binary?  Then yes, my idea about
> it is that it should become the default archive client we ship with
> PostgreSQL.  And grow into offering a way to be the default restore
> command too.  I'd see the current way that the standby can switch
> between restoring from archive and from a live stream as a first step
> into that direction.

Right. I did put both the base backup and the wal streaming in the
same binary earlier, and the only shared code was the one to connect
to the db. So I split them apart again. This is the reason to put them
back together perhaps - or just have the ability for pg_basebackup to
fork()+exec() the wal streamer.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Include WAL in base backup

From

Dimitri Fontaine

Date:

17 January 2011, 16:15:35

Magnus Hagander <magnus@hagander.net> writes:
> But however we do it, it will be significantly more complex than just
> including the WAL. And I want to make sure we get *something* done in
> time for 9.1, and then improve upon it. If we can get the improvement
> into 9.1 that's great, but if not it will have to wait until 9.2 - and
> I don't want to leave us without anything for 9.1.

+1.  The only point I'm not clear on is the complexity, and I trust you
to cut off at the right point here… meanwhile, I'm still asking for this
little more until you say your plate's full :)

> Right. I did put both the base backup and the wal streaming in the
> same binary earlier, and the only shared code was the one to connect
> to the db. So I split them apart again. This is the reason to put them
> back together perhaps - or just have the ability for pg_basebackup to
> fork()+exec() the wal streamer.

That would be awesome.

Then pg_streamrecv could somehow accept options that make it suitable
for use as an archive command, connecting to your (still?) third-party
daemon?  At this point it'd be pg_walsender :)

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support

Re: Include WAL in base backup

From

Stephen Frost

Date:

20 January 2011, 03:03:16

Greetings,

* Magnus Hagander (magnus@hagander.net) wrote:
> For now, you need to set wal_keep_segments to make it work properly,
> but if you do the idea is that the tar file/stream generated in the
> base backup will include all the required WAL files.

Is there some reason to not ERROR outright if we're asked to provide WAL
and wal_keep_segments isn't set..?  I'd rather do that than only ERROR
when a particular WAL is missing..  That could lead to transient backup
errors that an inexperienced sysadmin or DBA might miss or ignore.
They'll notice if it doesn't work the first time they try it and spits
out a hint about wal_keep_segments.

> That means that
> you can start a postmaster directly against the directory extracted
> from the tarball, and you no longer need to set up archive logging to
> get a backup.

Like that part.

> I've got some refactoring I want to do around the
> SendBackupDirectory() function after this, but a review of the
> functionality first would be good. And obviously, documentation is
> still necessary.

mkay, I'm not going to try to make this ready for committer, but will
provide my comments on it overall.  Bit difficult to review when someone
else is reviewing the base patch too. :/

Here goes:

- I'm not a huge fan of the whole 'closetar' option, that feels really rather wrong to me.  Why not just open it and
closeit in perform_base_backup(), unconditionally?
 

- I wonder if you're not getting to a level where you shold be using a struct to pass the relevant information to
perform_base_backup()instead of adding more arguments on..  That's going to get unwieldy at some point.
 

- Why not initialize logid and logseg like so?:
int logid = startptr.xlogid;int logseg = startptr.xrecoff / XLogSegSize;
 Then use those in your elog?  Seems cleaner to me.

- A #define right in the middle of a while loop...?  Really?

- The grammar changes strike me as..  odd.  Typically, you would have an 'option' production that you can then have a
listof and then let each option be whatever the OR'd set of options is.  Wouldn't the current grammar require that you
putthe options in a specific order?  That'd blow.
 

@@ -687,7 +690,7 @@ BaseBackup()         * once since it can be relocated, and it will be checked before we do
*anything anyway.         */
 
-        if (basedir != NULL && i > 0)
+        if (basedir != NULL && !PQgetisnull(res, i, 1))            verify_dir_is_empty_or_create(PQgetvalue(res, i,
1));   }
 

- Should the 'i > 0' conditional above still be there..?

So, that's my review from just reading the source code and the thread..
Hope it's useful, sorry it's not more. :/
Thanks,
    Stephen

Re: Include WAL in base backup

From

Stephen Frost

Date:

20 January 2011, 03:06:34

Magnus,

* Stephen Frost (sfrost@snowman.net) wrote:
> mkay, I'm not going to try to make this ready for committer, but will
> provide my comments on it overall.  Bit difficult to review when someone
> else is reviewing the base patch too. :/

I went ahead and marked it as 'waiting on author', since I'd like
feedback on my comments, but I'll try to make time in the next few days
to apply the two patches and test it out, unless I hear otherwise.
Thanks,
    Stephen

Re: Include WAL in base backup

From

Magnus Hagander

Date:

20 January 2011, 11:41:54

On Thu, Jan 20, 2011 at 05:03, Stephen Frost <sfrost@snowman.net> wrote:
> Greetings,
>
> * Magnus Hagander (magnus@hagander.net) wrote:
>> For now, you need to set wal_keep_segments to make it work properly,
>> but if you do the idea is that the tar file/stream generated in the
>> base backup will include all the required WAL files.
>
> Is there some reason to not ERROR outright if we're asked to provide WAL
> and wal_keep_segments isn't set..?  I'd rather do that than only ERROR
> when a particular WAL is missing..  That could lead to transient backup
> errors that an inexperienced sysadmin or DBA might miss or ignore.
> They'll notice if it doesn't work the first time they try it and spits
> out a hint about wal_keep_segments.

Well, in a "smaller:ish" database you can easily do the full backup
before you run out of segments in the data directory even when you
haven't set wal_keep_segments. If we error out, we force extra work on
the user in the trivial case.

I'd rather not change that, but instead (as Fujii-san has also
mentioned is needed anyway) put some more effort into documenting in
which cases you need to set it.

>> I've got some refactoring I want to do around the
>> SendBackupDirectory() function after this, but a review of the
>> functionality first would be good. And obviously, documentation is
>> still necessary.
>
> mkay, I'm not going to try to make this ready for committer, but will
> provide my comments on it overall.  Bit difficult to review when someone
> else is reviewing the base patch too. :/

Heh, yeah.

> Here goes:
>
> - I'm not a huge fan of the whole 'closetar' option, that feels really
>  rather wrong to me.  Why not just open it and close it in
>  perform_base_backup(), unconditionally?

Yeah, we could move the whole thing up there. Or, as I mentioned in an
IM conversation with Heikki, just get rid of SendBackupDirectory()
completely and inline it inside the loop in perform_base_backup().
Given that it's basically just 5 lines + a call to sendDir()..

> - I wonder if you're not getting to a level where you shold be using a
>  struct to pass the relevant information to perform_base_backup()
>  instead of adding more arguments on..  That's going to get unwieldy at
>  some point.

Yeah, probably.

We *could* pass the BaseBackupCmd struct from the parser all the way
in - or is that cheating too much on abstractions? It seems if we
don't, we're just going to hav ea copy of that struct without the
NodeTag member..

> - Why not initialize logid and logseg like so?:
>
>        int logid = startptr.xlogid;
>        int logseg = startptr.xrecoff / XLogSegSize;
>
>  Then use those in your elog?  Seems cleaner to me.

Hmm. Yes. Agreed.

> - A #define right in the middle of a while loop...?  Really?

Haha, yeah, that was a typo. I didn't remember the name of the
variable so I stuck it there for testing and forgot it. It should be
ThisTimeLineID, and no #define at all.

> - The grammar changes strike me as..  odd.  Typically, you would have an
>  'option' production that you can then have a list of and then let each
>  option be whatever the OR'd set of options is.  Wouldn't the current
>  grammar require that you put the options in a specific order?  That'd
>  blow.

It does require them in a specific order. The advantage is that it
makes the code easier. and it's not like end users are expected to run
them anyway...

Now, I'm no bison export, so it might be an easy fix. But the way I
could figure, to make them order indepdent I have to basically collect
them up together as a List instead of just as a struct, and then loop
through that list to build a struct later.

If someone who knows Bison better can tell me a neater way to do that,
I'll be happy to change :-)

> @@ -687,7 +690,7 @@ BaseBackup()
>                 * once since it can be relocated, and it will be checked before we do
>                 * anything anyway.
>                 */
> -               if (basedir != NULL && i > 0)
> +               if (basedir != NULL && !PQgetisnull(res, i, 1))
>                        verify_dir_is_empty_or_create(PQgetvalue(res, i, 1));
>        }
>
> - Should the 'i > 0' conditional above still be there..?

No. That's a cheat-check that assumes the base directory is always
sent first. Which is not true anymore - with this patch we always send
it *last* so we can include the WAL in it.

> So, that's my review from just reading the source code and the thread..
> Hope it's useful, sorry it's not more. :/

Thanks - it certainly is!

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Include WAL in base backup

From

Tom Lane

Date:

20 January 2011, 14:28:34

Magnus Hagander <magnus@hagander.net> writes:
>> - Why not initialize logid and logseg like so?:
>> 
>> � � � �int logid = startptr.xlogid;
>> � � � �int logseg = startptr.xrecoff / XLogSegSize;
>> 
>> �Then use those in your elog? �Seems cleaner to me.

> Hmm. Yes. Agreed.

Marginal complaint here: int is the wrong type, I'm pretty sure.
        regards, tom lane

Re: Include WAL in base backup

From

Fujii Masao

Date:

24 January 2011, 06:51:49

On Fri, Jan 21, 2011 at 12:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>>> - Why not initialize logid and logseg like so?:
>>>
>>>        int logid = startptr.xlogid;
>>>        int logseg = startptr.xrecoff / XLogSegSize;
>>>
>>>  Then use those in your elog?  Seems cleaner to me.
>
>> Hmm. Yes. Agreed.
>
> Marginal complaint here: int is the wrong type, I'm pretty sure.

And, we should use XLByteToPrevSeg here instead of just =, I think.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Include WAL in base backup

From

Magnus Hagander

Date:

24 January 2011, 06:52:05

On Mon, Jan 24, 2011 at 08:45, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Fri, Jan 21, 2011 at 12:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Magnus Hagander <magnus@hagander.net> writes:
>>>> - Why not initialize logid and logseg like so?:
>>>>
>>>>        int logid = startptr.xlogid;
>>>>        int logseg = startptr.xrecoff / XLogSegSize;
>>>>
>>>>  Then use those in your elog?  Seems cleaner to me.
>>
>>> Hmm. Yes. Agreed.
>>
>> Marginal complaint here: int is the wrong type, I'm pretty sure.
>
> And, we should use XLByteToPrevSeg here instead of just =, I think.

Not XLByteToSeg?

(I admit I missed the existance of both those macros, though)

I plan to post a rebased version of this patch today or tomorrow, btw,
that should work against all the changes that have been applied to
git.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Include WAL in base backup

From

Fujii Masao

Date:

24 January 2011, 07:03:40

On Mon, Jan 24, 2011 at 4:47 PM, Magnus Hagander <magnus@hagander.net> wrote:
> On Mon, Jan 24, 2011 at 08:45, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Fri, Jan 21, 2011 at 12:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Magnus Hagander <magnus@hagander.net> writes:
>>>>> - Why not initialize logid and logseg like so?:
>>>>>
>>>>>        int logid = startptr.xlogid;
>>>>>        int logseg = startptr.xrecoff / XLogSegSize;
>>>>>
>>>>>  Then use those in your elog?  Seems cleaner to me.
>>>
>>>> Hmm. Yes. Agreed.
>>>
>>> Marginal complaint here: int is the wrong type, I'm pretty sure.
>>
>> And, we should use XLByteToPrevSeg here instead of just =, I think.
>
> Not XLByteToSeg?

Checking... yeah, you are right. We should use XLByteToSeg since
the REDO starting WAL record doesn't exist in the previous WAL
segment when the REDO starting location is a boundary byte.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Include WAL in base backup

From

Magnus Hagander

Date:

24 January 2011, 14:36:51

On Mon, Jan 24, 2011 at 09:03, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Mon, Jan 24, 2011 at 4:47 PM, Magnus Hagander <magnus@hagander.net> wrote:
>> On Mon, Jan 24, 2011 at 08:45, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> On Fri, Jan 21, 2011 at 12:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> Magnus Hagander <magnus@hagander.net> writes:
>>>>>> - Why not initialize logid and logseg like so?:
>>>>>>
>>>>>>        int logid = startptr.xlogid;
>>>>>>        int logseg = startptr.xrecoff / XLogSegSize;
>>>>>>
>>>>>>  Then use those in your elog?  Seems cleaner to me.
>>>>
>>>>> Hmm. Yes. Agreed.
>>>>
>>>> Marginal complaint here: int is the wrong type, I'm pretty sure.
>>>
>>> And, we should use XLByteToPrevSeg here instead of just =, I think.
>>
>> Not XLByteToSeg?
>
> Checking... yeah, you are right. We should use XLByteToSeg since
> the REDO starting WAL record doesn't exist in the previous WAL
> segment when the REDO starting location is a boundary byte.

Here's an updated version of the patch:

* rebased on the current git master (including changing the switch
from -w to -x since -w was used now)
* includes some documentation
* use XLogByteToSeg and uint32 as mentioned here
* refactored to remove SendBackupDirectory(), removing the ugly closetar boolean

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Attachment

basebackup_wal.patch

Re: Include WAL in base backup

From

Fujii Masao

Date:

25 January 2011, 08:56:28

On Tue, Jan 25, 2011 at 12:33 AM, Magnus Hagander <magnus@hagander.net> wrote:
> Here's an updated version of the patch:
>
> * rebased on the current git master (including changing the switch
> from -w to -x since -w was used now)
> * includes some documentation
> * use XLogByteToSeg and uint32 as mentioned here
> * refactored to remove SendBackupDirectory(), removing the ugly closetar boolean

I reviewed the patch. Here are comments.

+        {"xlog", no_argument, NULL, 'w'},

Typo: s/w/x

/** BASE_BACKUP [LABEL <label>] [PROGRESS] [FAST]*/

In repl_gram.y, the above comment needs to be updated.

Both this patch and the multiple-backup one removes SendBackupDirectory
in the almost same way. So, how about committing that common part first?

+        XLByteToSeg(endptr, endlogid, endlogseg);

XLByteToPrevSeg should be used for the backup end location. Because
when it's a boundary byte, all the required WAL records exist in the
previous WAL segment. This is why pg_stop_backup uses XLByteToPrevSeg
for the backup end location.

+        elog(DEBUG1, "Going to send wal from %i.%i to %i.%i",
+             logid, logseg,
+             endlogid, endlogseg);

%u should be used instead of %i. Or how about logging the filename?

+            char    xlogname[64];

How about using MAXFNAMELEN instead of 64?

+            XLogFileName(xlogname, ThisTimeLineID, logid, logseg);
+            sprintf(fn, "./pg_xlog/%s", xlogname);

How about using XLogFilePath? instead?

+            if (logid > endptr.xlogid ||
+                (logid == endptr.xlogid && logseg >= endptr.xrecoff / XLogSegSize))

Why don't you use endlogseg?

The estimated size of $PGDATA doesn't include WAL files, but the
actual one does.
This inconsistency messes up the progress report as follows:
   33708/17323 kB (194%) 1/1 tablespaces

So, what about sparating the progress report into two parts: one for $PGDATA and
tablespaces, another for WAL files?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Include WAL in base backup

From

Magnus Hagander

Date:

25 January 2011, 10:21:19

On Tue, Jan 25, 2011 at 10:56, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Tue, Jan 25, 2011 at 12:33 AM, Magnus Hagander <magnus@hagander.net> wrote:
>> Here's an updated version of the patch:
>>
>> * rebased on the current git master (including changing the switch
>> from -w to -x since -w was used now)
>> * includes some documentation
>> * use XLogByteToSeg and uint32 as mentioned here
>> * refactored to remove SendBackupDirectory(), removing the ugly closetar boolean
>
> I reviewed the patch. Here are comments.
>
>
> +               {"xlog", no_argument, NULL, 'w'},
>
> Typo: s/w/x

Ah, oops. thanks.

> /*
>  * BASE_BACKUP [LABEL <label>] [PROGRESS] [FAST]
>  */
>
> In repl_gram.y, the above comment needs to be updated.

Same here - oops, thanks. It was also missing the quotes around <label>.

> Both this patch and the multiple-backup one removes SendBackupDirectory
> in the almost same way. So, how about committing that common part first?

I think they're small enough that we'll just commit it along with
whichever comes first and then have the other one merge with it.

> +               XLByteToSeg(endptr, endlogid, endlogseg);
>
> XLByteToPrevSeg should be used for the backup end location. Because
> when it's a boundary byte, all the required WAL records exist in the
> previous WAL segment. This is why pg_stop_backup uses XLByteToPrevSeg
> for the backup end location.

Well, it's just for debugging output, really, but see below.

> +               elog(DEBUG1, "Going to send wal from %i.%i to %i.%i",
> +                        logid, logseg,
> +                        endlogid, endlogseg);
>
> %u should be used instead of %i. Or how about logging the filename?

I actually plan to remove that DEBUG output completely (sorry, forgot
to mention that), I'm not sure it's useful to have it around once we
apply. Or do you think it would be useful to keep?

> +                       char    xlogname[64];
>
> How about using MAXFNAMELEN instead of 64?
>

Agreed.

> +                       XLogFileName(xlogname, ThisTimeLineID, logid, logseg);
> +                       sprintf(fn, "./pg_xlog/%s", xlogname);
>
> How about using XLogFilePath? instead?

Can't do that, because we need the "./" part when we call sendFile() -
it's structured around having that one, since we get it from the file
name looping.

> +                       if (logid > endptr.xlogid ||
> +                               (logid == endptr.xlogid && logseg >= endptr.xrecoff / XLogSegSize))
>
> Why don't you use endlogseg?

Honestly? Because I thought I added endlogseg just for the debugging
output, didn't realize I had it down here.

But if I do, then I should not use the XLByteToPrevSeg() per your
suggestion above, right? Keep it the way it is.

> The estimated size of $PGDATA doesn't include WAL files, but the
> actual one does.
> This inconsistency messes up the progress report as follows:
>
>    33708/17323 kB (194%) 1/1 tablespaces

Yeah, that is definitely a potential problem. I think we're just going
to have to document it - and in a big database, it's not going to be
quite as bad...

> So, what about sparating the progress report into two parts: one for $PGDATA and
> tablespaces, another for WAL files?

We can't really do that. We need to send the progress report before we
begin the tar file, and we don't know how many xlog segments we will
have at that time. And we don't want to send pg_xlog as a separate tar
stream, because if we do we won't be able to get the single-tar-output
in the simple case - thus no piping etc. I actually had the xlog data
as it's own tar stream in the beginning, but Heikki convinced me of
the advantage of keeping it in the main one...

What we could do, I guess, is to code pg_basebackup to detect when it
starts receiving xlog files and then stop increasing the percentage at
that point, instead just doing a counter?

(the discussed changse above have been applied and pushed to github)

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Include WAL in base backup

From

Fujii Masao

Date:

25 January 2011, 12:29:48

On Tue, Jan 25, 2011 at 8:21 PM, Magnus Hagander <magnus@hagander.net> wrote:
>> +               elog(DEBUG1, "Going to send wal from %i.%i to %i.%i",
>> +                        logid, logseg,
>> +                        endlogid, endlogseg);
>>
>> %u should be used instead of %i. Or how about logging the filename?
>
> I actually plan to remove that DEBUG output completely (sorry, forgot
> to mention that), I'm not sure it's useful to have it around once we
> apply. Or do you think it would be useful to keep?

Nope. I'm OK to remove that.

>> +                       XLogFileName(xlogname, ThisTimeLineID, logid, logseg);
>> +                       sprintf(fn, "./pg_xlog/%s", xlogname);
>>
>> How about using XLogFilePath? instead?
>
> Can't do that, because we need the "./" part when we call sendFile() -
> it's structured around having that one, since we get it from the file
> name looping.

Understood.

>> +                       if (logid > endptr.xlogid ||
>> +                               (logid == endptr.xlogid && logseg >= endptr.xrecoff / XLogSegSize))
>>
>> Why don't you use endlogseg?
>
> Honestly? Because I thought I added endlogseg just for the debugging
> output, didn't realize I had it down here.
>
> But if I do, then I should not use the XLByteToPrevSeg() per your
> suggestion above, right? Keep it the way it is.

Yes. If we check "logseg >= endlogseg", we should use XLByteToSeg.

>> So, what about sparating the progress report into two parts: one for $PGDATA and
>> tablespaces, another for WAL files?
>
> We can't really do that. We need to send the progress report before we
> begin the tar file, and we don't know how many xlog segments we will
> have at that time. And we don't want to send pg_xlog as a separate tar
> stream, because if we do we won't be able to get the single-tar-output
> in the simple case - thus no piping etc. I actually had the xlog data
> as it's own tar stream in the beginning, but Heikki convinced me of
> the advantage of keeping it in the main one...
>
> What we could do, I guess, is to code pg_basebackup to detect when it
> starts receiving xlog files and then stop increasing the percentage at
> that point, instead just doing a counter?

Umm.. not progressing the report during receiving WAL files seems
also odd. But I don't have another good idea... For now, I'm OK if the
behavior of the progress report is well documented.

> (the discussed changse above have been applied and pushed to github)

Thanks! I'll test and review that.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Include WAL in base backup

From

Fujii Masao

Date:

25 January 2011, 13:04:24

On Tue, Jan 25, 2011 at 10:28 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> (the discussed changse above have been applied and pushed to github)
>
> Thanks! I'll test and review that.

WAL file might get recycled or removed while walsender is reading it.
So the WAL file which pg_basebackup seemingly received successfully
might be actually invalid. Shouldn't we need to check that what we read
is valid as XLogRead does?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Include WAL in base backup

From

Magnus Hagander

Date:

25 January 2011, 14:35:05

On Tue, Jan 25, 2011 at 15:04, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Tue, Jan 25, 2011 at 10:28 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> (the discussed changse above have been applied and pushed to github)
>>
>> Thanks! I'll test and review that.
>
> WAL file might get recycled or removed while walsender is reading it.
> So the WAL file which pg_basebackup seemingly received successfully
> might be actually invalid. Shouldn't we need to check that what we read
> is valid as XLogRead does?

Yikes. Yes, you are correct. We do catch the case when it is removed,
but not when it's recycled.

I'll need to look at that. Not sure if I'll have time today, but I'll
do it as soon as I can :-)

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Include WAL in base backup

From

Magnus Hagander

Date:

25 January 2011, 19:17:25

On Tue, Jan 25, 2011 at 16:34, Magnus Hagander <magnus@hagander.net> wrote:
> On Tue, Jan 25, 2011 at 15:04, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Tue, Jan 25, 2011 at 10:28 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>>> (the discussed changse above have been applied and pushed to github)
>>>
>>> Thanks! I'll test and review that.
>>
>> WAL file might get recycled or removed while walsender is reading it.
>> So the WAL file which pg_basebackup seemingly received successfully
>> might be actually invalid. Shouldn't we need to check that what we read
>> is valid as XLogRead does?

We should, and the easiest way is to actually use XLogRead() since the
code is already there. How about the way in this patch?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Attachment

basebackup_wal.patch

Re: Include WAL in base backup

From

Fujii Masao

Date:

27 January 2011, 03:44:30

On Wed, Jan 26, 2011 at 5:17 AM, Magnus Hagander <magnus@hagander.net> wrote:
> We should, and the easiest way is to actually use XLogRead() since the
> code is already there. How about the way in this patch?

Thanks for the update. I reread the patch.

+        MemSet(&statbuf, 0, sizeof(statbuf));
+        statbuf.st_mode=S_IRUSR | S_IWUSR;
+#ifndef WIN32
+        statbuf.st_uid=getuid();
+        statbuf.st_gid=getgid();
+#endif
+        statbuf.st_size=XLogSegSize;
+        statbuf.st_mtime=time(NULL);

Can you put a space around "="?

In the multiple-backups patch, Heikki uses geteuid and getegid for
the same purpose instead of getuid and getgid, as follows.

!     /* Windows doesn't have the concept of uid and gid */
! #ifdef WIN32
!     statbuf.st_uid = 0;
!     statbuf.st_gid = 0;
! #else
!     statbuf.st_uid = geteuid();
!     statbuf.st_gid = getegid();
! #endif
!     statbuf.st_mtime = time(NULL);
!     statbuf.st_mode = S_IRUSR | S_IWUSR;
!     statbuf.st_size = len;

Which is correct? Since we cannot start the PostgreSQL when
getuid != geteuid, I don't think that there is really difference between
getuid and geteuid. But other code always uses geteuid, so I think
that geteuid should be used here instead of getuid for the sake of
consistency.

OTOH, I'm not sure if there is really difference between getgid and
getegid in the backend (though I guess getgid == getegid because
getuid == geteuid), and which should be used here.
What is your opinion?

+            XLogFileName(xlogname, ThisTimeLineID, logid, logseg);
+
+            sprintf(fn, "./pg_xlog/%s", xlogname);
+            _tarWriteHeader(fn, NULL, &statbuf);

Can we use XLogFilePath? instead? Because sendFile has not been
used.

+        XLByteToSeg(endptr, endlogid, endlogseg);
<snip>
+            /* Have we reached our stop position yet? */
+            if (logid > endlogid ||
+                (logid == endlogid && logseg >= endlogseg))
+                break;

What I said in upthread is wrong. We should use XLByteToPrevSeg
for endptr and check "logseg > endlogseg". Otherwise, if endptr is
not a boundary byte, endlogid/endlogseg indicates the last
necessary WAL file, but it's not sent.

+    if (percent > 100)
+        percent = 100;

I'm not sure if this is good idea because the total received can go
further than the estimated size though the percentage stops at 100.
This looks more confusing than the previous behavior. Anyway,
I think that we need to document about the combination of -P and
-x in pg_basebackup.

In pg_basebackup.sgml    <term><option>--checkpoint <replaceable
class="parameter">fast|spread</replaceable></option></term>

Though this is not the problem of the patch, I found the inconsistency
of the descriptions about options of pg_basebackup. We should
s/--checkpoint/--checkpoint=

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Include WAL in base backup

From

Heikki Linnakangas

Date:

28 January 2011, 20:02:26

On 27.01.2011 06:44, Fujii Masao wrote:
> +        XLByteToSeg(endptr, endlogid, endlogseg);
> <snip>
> +            /* Have we reached our stop position yet? */
> +            if (logid>  endlogid ||
> +                (logid == endlogid&&  logseg>= endlogseg))
> +                break;
>
> What I said in upthread is wrong. We should use XLByteToPrevSeg
> for endptr and check "logseg>  endlogseg". Otherwise, if endptr is
> not a boundary byte, endlogid/endlogseg indicates the last
> necessary WAL file, but it's not sent.

We should use XLByteToPrevSeg, but I believe >= is still correct. 
logid/logseg is the last WAL segment we've successfully sent, and 
endlogif/endlogid is the last WAL segment we need to send. When they are 
the same, we're done.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

Re: Include WAL in base backup

From

Fujii Masao

Date:

29 January 2011, 06:10:29

On Sat, Jan 29, 2011 at 6:02 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> On 27.01.2011 06:44, Fujii Masao wrote:
>>
>> +               XLByteToSeg(endptr, endlogid, endlogseg);
>> <snip>
>> +                       /* Have we reached our stop position yet? */
>> +                       if (logid>  endlogid ||
>> +                               (logid == endlogid&&  logseg>= endlogseg))
>> +                               break;
>>
>> What I said in upthread is wrong. We should use XLByteToPrevSeg
>> for endptr and check "logseg>  endlogseg". Otherwise, if endptr is
>> not a boundary byte, endlogid/endlogseg indicates the last
>> necessary WAL file, but it's not sent.
>
> We should use XLByteToPrevSeg, but I believe >= is still correct.
> logid/logseg is the last WAL segment we've successfully sent, and
> endlogif/endlogid is the last WAL segment we need to send. When they are the
> same, we're done.

Really? logid/logseg is incremented just before the check as follows.
So, when they are the same, the WAL file which logid/logseg indicates
has not been sent yet. Am I missing something?

+            /* Advance to the next WAL file */
+            NextLogSeg(logid, logseg);
+
+            /* Have we reached our stop position yet? */
+            if (logid > endlogid ||
+                (logid == endlogid && logseg >= endlogseg))
+                break;

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Include WAL in base backup

From

Heikki Linnakangas

Date:

30 January 2011, 17:58:44

On 29.01.2011 09:10, Fujii Masao wrote:
> On Sat, Jan 29, 2011 at 6:02 AM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com>  wrote:
>> On 27.01.2011 06:44, Fujii Masao wrote:
>>>
>>> +               XLByteToSeg(endptr, endlogid, endlogseg);
>>> <snip>
>>> +                       /* Have we reached our stop position yet? */
>>> +                       if (logid>    endlogid ||
>>> +                               (logid == endlogid&&    logseg>= endlogseg))
>>> +                               break;
>>>
>>> What I said in upthread is wrong. We should use XLByteToPrevSeg
>>> for endptr and check "logseg>    endlogseg". Otherwise, if endptr is
>>> not a boundary byte, endlogid/endlogseg indicates the last
>>> necessary WAL file, but it's not sent.
>>
>> We should use XLByteToPrevSeg, but I believe>= is still correct.
>> logid/logseg is the last WAL segment we've successfully sent, and
>> endlogif/endlogid is the last WAL segment we need to send. When they are the
>> same, we're done.
>
> Really? logid/logseg is incremented just before the check as follows.
> So, when they are the same, the WAL file which logid/logseg indicates
> has not been sent yet. Am I missing something?
>
> +            /* Advance to the next WAL file */
> +            NextLogSeg(logid, logseg);
> +
> +            /* Have we reached our stop position yet? */
> +            if (logid>  endlogid ||
> +                (logid == endlogid&&  logseg>= endlogseg))
> +                break;

Ah, you're right, I misread it. Never mind..

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

Re: Include WAL in base backup

From

Magnus Hagander

Date:

30 January 2011, 18:22:11

On Thu, Jan 27, 2011 at 05:44, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Wed, Jan 26, 2011 at 5:17 AM, Magnus Hagander <magnus@hagander.net> wrote:
>> We should, and the easiest way is to actually use XLogRead() since the
>> code is already there. How about the way in this patch?
>
> Thanks for the update. I reread the patch.
>
> +               MemSet(&statbuf, 0, sizeof(statbuf));
> +               statbuf.st_mode=S_IRUSR | S_IWUSR;
> +#ifndef WIN32
> +               statbuf.st_uid=getuid();
> +               statbuf.st_gid=getgid();
> +#endif
> +               statbuf.st_size=XLogSegSize;
> +               statbuf.st_mtime=time(NULL);
>
> Can you put a space around "="?

I'll run pgindent, it'll take care of that.


> Which is correct? Since we cannot start the PostgreSQL when
> getuid != geteuid, I don't think that there is really difference between
> getuid and geteuid. But other code always uses geteuid, so I think
> that geteuid should be used here instead of getuid for the sake of
> consistency.

Yeah, changed for consistency.


> +                       XLogFileName(xlogname, ThisTimeLineID, logid, logseg);
> +
> +                       sprintf(fn, "./pg_xlog/%s", xlogname);
> +                       _tarWriteHeader(fn, NULL, &statbuf);
>
> Can we use XLogFilePath? instead? Because sendFile has not been
> used.

We can now, changed.

> What I said in upthread is wrong. We should use XLByteToPrevSeg
> for endptr and check "logseg > endlogseg". Otherwise, if endptr is
> not a boundary byte, endlogid/endlogseg indicates the last
> necessary WAL file, but it's not sent.

Yeah, thanks for this - and thanks to Heikki for straightening it out for me.


> +       if (percent > 100)
> +               percent = 100;
>
> I'm not sure if this is good idea because the total received can go
> further than the estimated size though the percentage stops at 100.
> This looks more confusing than the previous behavior. Anyway,
> I think that we need to document about the combination of -P and
> -x in pg_basebackup.

I found it less confusing - but still somewhat confusing. I'll add some docs.


> In pg_basebackup.sgml
>     <term><option>--checkpoint <replaceable
> class="parameter">fast|spread</replaceable></option></term>
>
> Though this is not the problem of the patch, I found the inconsistency
> of the descriptions about options of pg_basebackup. We should
> s/--checkpoint/--checkpoint=

Agreed, changed.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Include WAL in base backup

From

Fujii Masao

Date:

31 January 2011, 01:41:46

On Mon, Jan 31, 2011 at 4:22 AM, Magnus Hagander <magnus@hagander.net> wrote:
>> In pg_basebackup.sgml
>>     <term><option>--checkpoint <replaceable
>> class="parameter">fast|spread</replaceable></option></term>
>>
>> Though this is not the problem of the patch, I found the inconsistency
>> of the descriptions about options of pg_basebackup. We should
>> s/--checkpoint/--checkpoint=
>
> Agreed, changed.

Thanks for all your efforts! pg_basebackup and this new option are VERY useful.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Include WAL in base backup

From

Robert Haas

Date:

31 January 2011, 01:47:43

On Sun, Jan 30, 2011 at 9:34 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Mon, Jan 31, 2011 at 4:22 AM, Magnus Hagander <magnus@hagander.net> wrote:
>>> In pg_basebackup.sgml
>>>     <term><option>--checkpoint <replaceable
>>> class="parameter">fast|spread</replaceable></option></term>
>>>
>>> Though this is not the problem of the patch, I found the inconsistency
>>> of the descriptions about options of pg_basebackup. We should
>>> s/--checkpoint/--checkpoint=
>>
>> Agreed, changed.
>
> Thanks for all your efforts! pg_basebackup and this new option are VERY useful.

+1!

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company