Thread: Using streaming replication as log archiving

Using streaming replication as log archiving

From: Magnus Hagander
As has been previously mentioned a couple of times, it should be
perfectly possible to use streaming replication to get around the
limitations of archive_command/archive_timeout to do log archiving for
PITR (being that you either keep archive_timeout high and risk data
loss or you set it very low and generate a huge log archive without
need).

I've put together a tool to do this. The basic idea is to just stream
down replication and write it to regular WAL files, which can then be
used for recovery. You'll still need to use archive_command together
with it to ensure that the backups are complete. Streaming replication
doesn't guarantee that - in fact, regular replication will fall back to
using whatever archive_command created when wal_keep_segments isn't
enough.
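
To give an idea of the shape of it: the heart of the tool is little
more than the following kind of loop (a minimal sketch assuming a
9.0-era walsender connection; the header length, connection string,
start position and file name are illustrative placeholders, and the
real tool also switches files at segment boundaries and handles
restarts):

    #include <stdio.h>
    #include <libpq-fe.h>

    /* Assumed: one message-type byte ('w') plus the walsender data
     * message header (dataStart, walEnd, sendTime); check the
     * protocol docs for your server version. */
    #define STREAM_HDR_LEN 25

    int main(void)
    {
        PGconn   *conn;
        PGresult *res;
        FILE     *walfile;

        /* "replication=true" requests a walsender connection */
        conn = PQconnectdb("host=master user=postgres replication=true");
        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "connection failed: %s",
                    PQerrorMessage(conn));
            return 1;
        }

        /* 0/0 is a placeholder; the real tool derives the start
         * position from pg_current_xlog_location() or from the WAL
         * files it already has. */
        res = PQexec(conn, "START_REPLICATION 0/0");
        if (PQresultStatus(res) != PGRES_COPY_OUT)
        {
            fprintf(stderr, "START_REPLICATION failed: %s",
                    PQerrorMessage(conn));
            return 1;
        }
        PQclear(res);

        walfile = fopen("inprogress/000000010000000000000001", "wb");
        if (walfile == NULL)
            return 1;

        for (;;)
        {
            char *copybuf;
            int   len = PQgetCopyData(conn, &copybuf, 0);

            if (len < 0)        /* -1: stream ended, -2: error */
                break;
            if (len > STREAM_HDR_LEN)   /* strip per-message header */
                fwrite(copybuf + STREAM_HDR_LEN, 1,
                       len - STREAM_HDR_LEN, walfile);
            PQfreemem(copybuf);
        }

        fclose(walfile);
        PQfinish(conn);
        return 0;
    }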

I've put up an early version of the tool at
http://github.com/mhagander/pg_streamrecv

Comments and contributions are most welcome. And frankly, a good
review is very much required before I'd trust it ;) Hopefully, I
didn't overlook something critical :D

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: Using streaming replication as log archiving

From: Fujii Masao
On Mon, Sep 27, 2010 at 9:07 PM, Magnus Hagander <magnus@hagander.net> wrote:
> As has been previously mentioned a couple of times, it should be
> perfectly possible to use streaming replication to get around the
> limitations of archive_command/archive_timeout to do log archiving for
> PITR (being that you either keep archive_timeout high and risk data
> loss or you set it very low and generate a huge log archive without
> need).
>
> I've put together a tool to do this. The basic idea is to just stream
> down replication and write it to regular WAL files, which can then be
> used for recovery. You'll still need to use archive_command together
> with it to ensure that the backups are complete. Streaming replication
> doesn't guarantee that - in fact, regular replication will fall back to
> using whatever archive_command created when wal_keep_segments isn't
> enough.
>
> I've put up an early version of the tool at
> http://github.com/mhagander/pg_streamrecv

Great! This also might be useful for users who want something like
Oracle redo log mirroring.

> Comments and contributions are most welcome. And frankly, a good
> review is very much required before I'd trust it ;) Hopefully, I
> didn't overlook something critical :D

When I ran that, the size of the WAL file in the inprogress
directory became more than 16MB. Obviously something isn't right.

When I requested an immediate shutdown of the master, a segmentation
fault occurred in pg_streamrecv. I guess that the return value 0
of PQgetCopyData is not handled correctly.

After I repeated Ctrl+C and restarted pg_streamrecv a few times,
I encountered the following error and pg_streamrecv would no longer
start. Is this intentional?

   In progress directory contains more than one file!

   $ ls foo/inprogress/
   00000001000000000000000D  00000001000000000000000D.save

When there is already an inprogress or archived WAL file, shouldn't
pg_streamrecv skip calling pg_current_xlog_location, since its result
is not used?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Using streaming replication as log archiving

From: Magnus Hagander
On Tue, Sep 28, 2010 at 06:25, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Mon, Sep 27, 2010 at 9:07 PM, Magnus Hagander <magnus@hagander.net> wrote:
>> As has been previously mentioned a couple of times, it should be
>> perfectly possible to use streaming replication to get around the
>> limitations of archive_command/archive_timeout to do log archiving for
>> PITR (being that you either keep archive_timeout high and risk data
>> loss or you set it very low and generate a huge log archive without
>> need).
>>
>> I've put together a tool to do this. The basic idea is to just stream
>> down replication and write it to regular WAL files, which can then be
>> used for recovery. You'll still need to use archive_command together
>> with it to ensure that the backups are complete. Streaming replication
>> doesn't guarantee that - in fact, regular replication will fall back to
>> using whatever archive_command created when wal_keep_segments isn't
>> enough.
>>
>> I've put up an early version of the tool at
>> http://github.com/mhagander/pg_streamrecv
>
> Great! This also might be useful for users who want something like
> Oracle redo log mirroring.

Thanks.

>> Comments and contributions are most welcome. And frankly, a good
>> review is very much required before I'd trust it ;) Hopefully, I
>> didn't overlook something critical :D
>
> When I ran that, the size of the WAL file in the inprogress
> directory became more than 16MB. Obviously something isn't right.

Wow, that's weird. I'm unable to reproduce that here - can you try to
figure out why that happened?


> When I requested an immediate shutdown of the master, a segmentation
> fault occurred in pg_streamrecv. I guess that the return value 0
> of PQgetCopyData is not handled correctly.

Almost right - it actually returns -2 - which isn't handled. I've
added a fix for that - and while at it, also covered anything so
small it doesn't contain the streaming replication header.
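
For reference, the return values to distinguish look like this (a
sketch; STREAM_HDR_LEN stands for the size of the per-message
streaming header, it's not the name used in the code):

    #include <stdio.h>
    #include <stdlib.h>
    #include <libpq-fe.h>

    #define STREAM_HDR_LEN 25   /* assumed header size, see above */

    /* Returns the number of WAL payload bytes available in *buf, or
     * -1 on clean end of stream.  Exits on error - including the -2
     * case that crashed the early version on immediate shutdown. */
    static int get_wal_message(PGconn *conn, char **buf)
    {
        int len = PQgetCopyData(conn, buf, 0);

        if (len == -2)          /* hard error, e.g. master went away */
        {
            fprintf(stderr, "copy failed: %s", PQerrorMessage(conn));
            exit(1);
        }
        if (len == -1)          /* COPY stream ended cleanly */
            return -1;
        if (len <= STREAM_HDR_LEN)  /* too small to hold the header */
        {
            fprintf(stderr, "unexpected short message: %d bytes\n", len);
            exit(1);
        }
        return len - STREAM_HDR_LEN;
    }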


> After I repeated Ctrl+C and restarted pg_streamrecv a few times,
> I encountered the following error and pg_streamrecv would no longer
> start. Is this intentional?
>
>    In progress directory contains more than one file!
>
>    $ ls foo/inprogress/
>    00000001000000000000000D  00000001000000000000000D.save

Yes, that is actually intentional.

When it finds the ..0D file there the first time, it gets renamed to
".save", and it retries the transmission from the beginning of that
segment. As soon as the retransmission has passed the point that 0D
was at, the .save file is removed. If you Ctrl-C the process again
*before* it has reached that point, it will leave both files around -
it's up to you to clean them up. This is to make sure we don't
overwrite a file that contains more log data than is currently
available on the master.
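
In outline, the startup logic is roughly this (condensed from the
behaviour described above, not copied from the code; error handling
trimmed):

    #include <dirent.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Decide how to resume when inprogress/ is not empty. */
    static void prepare_restart(const char *dir)
    {
        char partial[1024] = "";
        int  nfiles = 0;
        DIR *d = opendir(dir);
        struct dirent *de;

        while ((de = readdir(d)) != NULL)
        {
            if (de->d_name[0] == '.')   /* skip "." and ".." */
                continue;
            nfiles++;
            if (strstr(de->d_name, ".save") == NULL)
                snprintf(partial, sizeof(partial), "%s/%s",
                         dir, de->d_name);
        }
        closedir(d);

        if (nfiles > 1)
        {
            /* A partial file *and* a .save file: an earlier retry
             * was also interrupted, so refuse to guess and make the
             * admin clean up. */
            fprintf(stderr, "In progress directory contains more "
                    "than one file!\n");
            exit(1);
        }
        if (nfiles == 1 && partial[0] != '\0')
        {
            /* Set the partial segment aside and re-stream the whole
             * segment; the caller deletes the .save file once the
             * new stream has passed the point the old file reached. */
            char save[1040];

            snprintf(save, sizeof(save), "%s.save", partial);
            rename(partial, save);
        }
    }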


> When there is already an inprogress or archived WAL file, shouldn't
> pg_streamrecv skip calling pg_current_xlog_location, since its result
> is not used?

Yeah, that's just a bit of lazy programming that I should fix :-)

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: Using streaming replication as log archiving

From: Fujii Masao
On Tue, Sep 28, 2010 at 5:23 PM, Magnus Hagander <magnus@hagander.net> wrote:
>> When I ran that, the size of the WAL file in the inprogress
>> directory became more than 16MB. Obviously something isn't right.
>
> Wow, that's weird. I'm unable to reproduce that here - can you try to
> figure out why that happened?

Sorry, I miscounted the digits in the result of "ls -l".
To be exact, the size of the WAL file in the inprogress directory
can be less than 16MB. Here is the result of "ls -l inprogress":

$ ls -l inprogress/
total 1724
-rw-rw-r-- 1 postgres postgres 1757352 Sep 29 12:03 000000010000000000000003

This also would be a problem, since a WAL file smaller than 16MB
cannot be used for recovery. I think that pg_streamrecv should create
a 16MB zero-filled file at first, and write the received WAL records
into it, as walreceiver does.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Using streaming replication as log archiving

From: Magnus Hagander
On Wed, Sep 29, 2010 at 05:40, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Tue, Sep 28, 2010 at 5:23 PM, Magnus Hagander <magnus@hagander.net> wrote:
>>> When I ran that, the size of the WAL file in the inprogress
>>> directory became more than 16MB. Obviously something isn't right.
>>
>> Wow, that's weird. I'm unable to reproduce that here - can you try to
>> figure out why that happened?
>
> Sorry, I miscounted the digits in the result of "ls -l".

Aha, that explains it.

> To be exact, the size of the WAL file in the inprogress directory
> can be less than 16MB. Here is the result of "ls -l inprogress":
>
> $ ls -l inprogress/
> total 1724
> -rw-rw-r-- 1 postgres postgres 1757352 Sep 29 12:03 000000010000000000000003
>
> This also would be a problem, since a WAL file smaller than 16MB
> cannot be used for recovery. I think that pg_streamrecv should create
> a 16MB zero-filled file at first, and write the received WAL records
> into it, as walreceiver does.

It's actually intentional. If we create a file at first, there is no
way to figure out exactly how far through a partial segment we are
without parsing the details of the log. This is useful both for the
admin (who can look at the directory and watch the file grow) and the
tool itself (to know when the .save file can be rotated away, when
recovering from a partial segment receive).

My idea was to just have the admin pad the file when it's time to do
the restore. I could perhaps even add an option to the tool to do it -
the idea being it's a manual step still.
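
The padding itself would be trivial; something like this sketch
(pad_segment is a hypothetical helper, not a current option of the
tool, and 16MB assumes the default segment size):

    #include <stdio.h>
    #include <stdlib.h>

    #define WAL_SEG_SIZE (16 * 1024 * 1024)  /* default XLOG_SEG_SIZE */

    /* Pad a partial segment with zero bytes up to the full 16MB so
     * the server will accept it during recovery. */
    static void pad_segment(const char *path)
    {
        static const char zeros[8192];  /* zero-initialized block */
        FILE *f = fopen(path, "ab");    /* append: keeps existing data */
        long  pos;

        if (f == NULL)
        {
            perror(path);
            exit(1);
        }
        fseek(f, 0, SEEK_END);
        pos = ftell(f);
        while (pos < WAL_SEG_SIZE)
        {
            size_t chunk = sizeof(zeros);

            if ((long) chunk > WAL_SEG_SIZE - pos)
                chunk = (size_t) (WAL_SEG_SIZE - pos);
            fwrite(zeros, 1, chunk, f);
            pos += (long) chunk;
        }
        fclose(f);
    }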

Do you have another suggestion for how to provide those two things?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: Using streaming replication as log archiving

From: "Kevin Grittner"
Magnus Hagander <magnus@hagander.net> wrote:
> Comments and contributions are most welcome.

This is probably too esoteric to be worked on yet, but for this to
be useful for us we would need to pass the resulting files through
pg_clearxlogtail and gzip in an automated fashion.  And we would
need to do regular log file archiving in parallel with it.

As background, our databases around the state archive to a directory
which is then pushed via rsync to a "dumb" backup location in the
same room as the database server (we're lucky to have rsync on the
target of this copy; any other executable is out of the question),
and the same directory is pulled via rsync to a central location.
We would be interested in using streaming replication to a tool such
as you describe for the copy to the central location, but since we
would still be forcing a wal-file switch once per hour we would need
the current capability to shrink an "empty" file from 16MB to 16kB
using the above-mentioned tools.

Also, the ability to limit bandwidth would be a nice feature for
us, preferably in a way which could be changed on the fly.

If you could keep the development "friendly" to such features, I may
get around to adding them to support our needs....

-Kevin


Re: Using streaming replication as log archiving

From: Magnus Hagander
On Wed, Sep 29, 2010 at 23:45, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Magnus Hagander <magnus@hagander.net> wrote:
>
>> Comments and contributions are most welcome.
>
> This is probably too esoteric to be worked on yet, but for this to
> be useful for us we would need to pass the resulting files through
> pg_clearxlogtail and gzip in an automated fashion.  And we would
> need to do regular log file archiving in parallel with it.
>
> As background, our databases around the state archive to a directory
> which is then pushed via rsync to a "dumb" backup location in the
> same room as the database server (we're lucky to have rsync on the
> target of this copy; any other executable is out of the question),
> and the same directory is pulled via rsync to a central location.
> We would be interested in using streaming replication to a tool such
> as you describe for the copy to the central location, but since we
> would still be forcing a wal-file switch once per hour we would need
> the current capability to shrink an "empty" file from 16MB to 16kB
> using the above-mentioned tools.

You could just have one stream going local and one stream going to
the other location in parallel, though?

Or use the stream to the local directory and rsync that off? While I
haven't tested it, rsyncing the partial WAL files *should* be fine, I
think...


> Also, the ability to limit bandwidth would be a nice feature for
> us, preferably in a way which could be changed on the fly.
>
> If you could keep the development "friendly" to such features, I may
> get around to adding them to support our needs....

Would it be enough to have kind of an "archive_command" switch that
says "whenever you've finished a complete wal segment, run this
command on it"? Then that command could clear the tail, compress, and
send off?

And in that case, should it run inline or in the background with the
streaming? I would assume just forking it off and leaving it to its
own business would be best?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: Using streaming replication as log archiving

From: "Kevin Grittner"
Magnus Hagander <magnus@hagander.net> wrote:
>> If you could keep the development "friendly" to such features, I
>> may get around to adding them to support our needs....
> 
> Would it be enough to have kind of an "archive_command" switch
> that says "whenever you've finished a complete wal segment, run
> this command on it"?

That would allow some nice options.  I've been thinking what would
be the ideal use of this with our backup scheme, and the best I've
thought up would be that each WAL segment file would be a single
output stream, with the option of calling an executable (which could
be a script) with the target file name and then piping the stream to
it.  At 16MB or a forced xlog switch, it would close the stream and
call the executable again with a new file name.  You could have a
default executable for the default behavior, or just build in a
default if no executable is specified.

The reason I like this is that I could pipe the stream through
pg_clearxlogtail and gzip pretty much "as is" to the locations on
the database server currently used for rsync to the two targets, and
the rsync commands would send the incremental changes once per
minute to both targets.  I haven't thought of another solution which
provides incremental transmission of the WAL to the local backup
location, which would be a nice thing to have, since this is most
crucial when the WAN is down and not only is WAL data not coming
back to our central location, but our application framework based
replication stream isn't making it back, either.
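
Roughly, I picture the tool opening one output pipeline per segment,
along these lines (a sketch only; the filter chain shown is just our
particular case):

    #include <stdio.h>

    /* Open one output pipeline per WAL segment.  The received WAL
     * bytes are written into the returned stream, which is pclose()d
     * at 16MB or at a forced xlog switch, after which the next
     * segment's pipeline is opened with the new file name. */
    static FILE *open_segment_pipe(const char *walfilename)
    {
        char cmd[2048];

        /* our case: zero the unused tail, then compress */
        snprintf(cmd, sizeof(cmd),
                 "pg_clearxlogtail | gzip > archive/%s.gz",
                 walfilename);
        return popen(cmd, "w");
    }
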
-Kevin


Re: Using streaming replication as log archiving

From: Magnus Hagander
On Thu, Sep 30, 2010 at 15:45, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Magnus Hagander <magnus@hagander.net> wrote:
>
>>> If you could keep the development "friendly" to such features, I
>>> may get around to adding them to support our needs....
>>
>> Would it be enough to have kind of an "archive_command" switch
>> that says "whenever you've finished a complete wal segment, run
>> this command on it"?
>
> That would allow some nice options.  I've been thinking what would
> be the ideal use of this with our backup scheme, and the best I've
> thought up would be that each WAL segment file would be a single
> output stream, with the option of calling an executable (which could
> be a script) with the target file name and then piping the stream to
> it.  At 16MB or a forced xlog switch, it would close the stream and
> call the executable again with a new file name.  You could have a
> default executable for the default behavior, or just build in a
> default if no executable is specified.

The problem with that one (which I'm sure is solvable somehow) is how
to deal with restarts. Both restarts in the middle of a segment
(happens all the time if you don't have an archive_timeout set), and
really also restarts between segments. How would the tool know where
to begin streaming again? Right now, it looks at the files - but
with your suggestion there are no files to look at. We'd need a
second script/command to call to figure out where to restart from in
that case, no?


> The reason I like this is that I could pipe the stream through
> pg_clearxlogtail and gzip pretty much "as is" to the locations on
> the database server currently used for rsync to the two targets, and
> the rsync commands would send the incremental changes once per
> minute to both targets.  I haven't thought of another solution which
> provides incremental transmission of the WAL to the local backup
> location, which would be a nice thing to have, since this is most
> crucial when the WAN is down and not only is WAL data not coming
> back to our central location, but our application framework based
> replication stream isn't making it back, either.

It should be safe to just rsync the archive directory as it's being
written by pg_streamrecv. Doesn't that give you the property you're
looking for - local machine gets data streamed in live, remote machine
gets it rsynced every minute?


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: Using streaming replication as log archiving

From: Aidan Van Dyk
On Thu, Sep 30, 2010 at 10:24 AM, Magnus Hagander <magnus@hagander.net> wrote:

>> That would allow some nice options.  I've been thinking what would
>> be the ideal use of this with our backup scheme, and the best I've
>> thought up would be that each WAL segment file would be a single
>> output stream, with the option of calling an executable (which could
>> be a script) with the target file name and then piping the stream to
>> it.  At 16MB or a forced xlog switch, it would close the stream and
>> call the executable again with a new file name.  You could have a
>> default executable for the default behavior, or just build in a
>> default if no executable is specified.
>
> The problem with that one (which I'm sure is solvable somehow) is how
> to deal with restarts. Both restarts in the middle of a segment
> (happens all the time if you don't have an archive_timeout set), and
> really also restarts between segments. How would the tool know where
> to begin streaming again? Right now, it looks at the files - but
> with your suggestion there are no files to look at. We'd need a
> second script/command to call to figure out where to restart from in
> that case, no?

And then think of the future, when sync rep is in... I'm hoping to be
able to use something like this to do synchronous replication to my
archive (instead of to a live server).

> It should be safe to just rsync the archive directory as it's being
> written by pg_streamrecv. Doesn't that give you the property you're
> looking for - local machine gets data streamed in live, remote machine
> gets it rsynced every minute?

When the "being written to" segmnt copmletes moves to the final
location, he'll get an extra whole "copy" of the file.  But of the
"move" can be an exec of his scritpt, the compressed/gzipped final
result shouldn't be that bad.  Certainly no worse then what he's
currently getting with archive command ;-)  And he's got the
uncompressed incimental updates as they are happening.

a.


Re: Using streaming replication as log archiving

From: Magnus Hagander
On Thu, Sep 30, 2010 at 16:39, Aidan Van Dyk <aidan@highrise.ca> wrote:
> On Thu, Sep 30, 2010 at 10:24 AM, Magnus Hagander <magnus@hagander.net> wrote:
>
>>> That would allow some nice options.  I've been thinking what would
>>> be the ideal use of this with our backup scheme, and the best I've
>>> thought up would be that each WAL segment file would be a single
>>> output stream, with the option of calling an executable (which could
>>> be a script) with the target file name and then piping the stream to
>>> it.  At 16MB or a forced xlog switch, it would close the stream and
>>> call the executable again with a new file name.  You could have a
>>> default executable for the default behavior, or just build in a
>>> default if no executable is specified.
>>
>> The problem with that one (which I'm sure is solvable somehow) is how
>> to deal with restarts. Both restarts in the middle of a segment
>> (happens all the time if you don't have an archive_timeout set), and
>> really also restarts between segments. How would the tool know where
>> to begin streaming again? Right now, it looks at the files - but
>> with your suggestion there are no files to look at. We'd need a
>> second script/command to call to figure out where to restart from in
>> that case, no?
>
> And then think of the future, when sync rep is in... I'm hoping to be
> able to use something like this to do synchronous replication to my
> archive (instead of to a live server).

Right, that could be a future enhancement. Doesn't mean we shouldn't
still do our best with the async mode of course :P


>> It should be safe to just rsync the archive directory as it's being
>> written by pg_streamrecv. Doesn't that give you the property you're
>> looking for - local machine gets data streamed in live, remote machine
>> gets it rsynced every minute?
>
> When the "being written to" segmnt copmletes moves to the final
> location, he'll get an extra whole "copy" of the file.  But of the

Ah, good point.

> "move" can be an exec of his scritpt, the compressed/gzipped final
> result shouldn't be that bad.  Certainly no worse then what he's
> currently getting with archive command ;-)  And he's got the
> uncompressed incimental updates as they are happening.

Yeah, it would be trivial to replace the rename() call with a call to
a script that gets to do whatever is suitable to the file. Actually,
it'd probably be better to rename() it *and* call the script, so that
we can continue properly if the script fails.
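
i.e. roughly this shape (a sketch; how the command line is built and
how children are reaped are details still to settle):

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* On segment completion: move the file to the archive directory
     * first, then fire off the user's script in the background, so a
     * failing script doesn't stop the streaming.  Assumes SIGCHLD is
     * set to SIG_IGN at startup so children don't become zombies. */
    static void segment_complete(const char *inprogress_path,
                                 const char *archive_path,
                                 const char *script)
    {
        if (rename(inprogress_path, archive_path) != 0)
        {
            perror("rename");
            exit(1);
        }
        if (script != NULL && fork() == 0)
        {
            /* child: hand the finished segment to the script */
            execlp(script, script, archive_path, (char *) NULL);
            _exit(1);           /* only reached if exec failed */
        }
    }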

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: Using streaming replication as log archiving

From: "Kevin Grittner"
Magnus Hagander <magnus@hagander.net> wrote:

> We'd need a second script/command to call to figure out where to
> restart from in that case, no?

I see your point; I guess we would need that.

> It should be safe to just rsync the archive directory as it's
> being written by pg_streamrecv. Doesn't that give you the property
> you're looking for - local machine gets data streamed in live,
> remote machine gets it rsynced every minute?

Well, the local target can't run pg_streamrecv -- it's a backup
machine where we pretty much have rsync and nothing else.  We could
run pg_streamrecv on the database server itself and rsync to the
local machine every minute.

I just checked with the DBA who monitors space issues for such
things, and it would be OK to rsync the uncompressed file to the
local backup as it is written (we have enough space for it without
compression) as long as we compress it before sending it to the
central location.  For that, your idea to fire a script on
completion of the file would work -- we could maintain both raw and
compressed files on the database server for rsync to the two
locations.

You can probably see the appeal of filtering it as it is written,
though, if that is feasible.  :-)

-Kevin


Re: Using streaming replication as log archiving

From: "Kevin Grittner"
Aidan Van Dyk <aidan@highrise.ca> wrote:

> When the "being written to" segment completes and moves to the final
> location, he'll get an extra whole "copy" of the file.  But if the
> "move" can be an exec of his script, the compressed/gzipped final
> result shouldn't be that bad.  Certainly no worse than what he's
> currently getting with archive command ;-)  And he's got the
> uncompressed incremental updates as they are happening.

Hmmm...  As long as streaming replication doesn't send the "tail" of
an incomplete WAL segment file, the only thing we'd be missing on
the send to the central location is the compression.  That's
typically reducing the size of the transmission by 50% to 75% (e.g.,
the gzipped "full" files are usually in the range of 4MB to 8MB).
At our WAN speeds, that is significant.  I don't suppose that
streaming replication uses (or offers as an option) a compressed
stream?

-Kevin


Re: Using streaming replication as log archiving

From: Magnus Hagander
On Thu, Sep 30, 2010 at 17:25, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Aidan Van Dyk <aidan@highrise.ca> wrote:
>
>> When the "being written to" segmnt copmletes moves to the final
>> location, he'll get an extra whole "copy" of the file.  But of the
>> "move" can be an exec of his scritpt, the compressed/gzipped final
>> result shouldn't be that bad.  Certainly no worse then what he's
>> currently getting with archive command ;-)  And he's got the
>> uncompressed incimental updates as they are happening.
>
> Hmmm...  As long as streaming replication doesn't send the "tail" of
> an incomplete WAL segment file, the only thing we'd be missing on
> the send to the central location is the compression.  That's
> typically reducing the size of the transmission by 50% to 75% (e.g.,
> the gzipped "full" files are usually in the range of 4MB to 8MB).
> At our WAN speeds, that is significant.  I don't suppose that
> streaming replication uses (or offers as an option) a compressed
> stream?

No, it sends a regular COPY stream with the raw transaction log data.
So the files generated will eventually be exactly 16MB, just like
they are in pg_xlog.


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: Using streaming replication as log archiving

From: Fujii Masao
On Wed, Sep 29, 2010 at 5:47 PM, Magnus Hagander <magnus@hagander.net> wrote:
> It's actually intentional. If we create a file at first, there is no
> way to figure out exactly how far through a partial segment we are
> without parsing the details of the log. This is useful both for the
> admin (who can look at the directory and watch the file grow) and the
> tool itself (to know when the .save file can be rotated away, when
> recovering from a partial segment receive).
>
> My idea was to just have the admin pad the file when it's time to do
> the restore. I could perhaps even add an option to the tool to do it -
> the idea being it's a manual step still.
>
> Do you have another suggestion for how to provide those two things?

My idea is to implement something like xlogdump in contrib and use it
for those two things, though it's harder to implement than a padding
tool.

BTW, implementing something like xlogdump is already on the TODO list:

---
Create dump tool for write-ahead logs for use in determining transaction
id for point-in-time recovery. This is useful for checking PITR recovery.
http://wiki.postgresql.org/wiki/TODO#Point-In-Time_Recovery_.28PITR.29
---

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Using streaming replication as log archiving

From: Magnus Hagander
On Fri, Oct 1, 2010 at 11:13, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Wed, Sep 29, 2010 at 5:47 PM, Magnus Hagander <magnus@hagander.net> wrote:
>> It's actually intentional. If we create a file at first, there is no
>> way to figure out exactly how far through a partial segment we are
>> without parsing the details of the log. This is useful both for the
>> admin (who can look at the directory and watch the file grow) and the
>> tool itself (to know when the .save file can be rotated away, when
>> recovering from a partial segment receive).
>>
>> My idea was to just have the admin pad the file when it's time to do
>> the restore. I could perhaps even add an option to the tool to do it -
>> the idea being it's a manual step still.
>>
>> Do you have another suggestion for how to provide those two things?
>
> My idea is to implement something like xlogdump in contrib and use it
> for those two things, though it's harder to implement than a padding
> tool.

Yes, much harder. I prefer keeping the tool simple when possible ;)
Doesn't mean we couldn't do both, I guess.

> BTW, implementing something like xlogdump is already on the TODO list:

Yes, obviously such a tool would be very useful independent of the
streaming thing.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/