Thread: Streaming a base backup from master
It's been discussed before that it would be cool if you could stream a new
base backup from the master server, via libpq. That way you would not need
low-level filesystem access to initialize a new standby.

Magnus mentioned today that he started hacking on that, and coincidentally I
just started experimenting with it yesterday as well :-). So let's get this
out on the mailing list.

Here's a WIP patch. It adds a new "TAKE_BACKUP" command to the replication
command set. Upon receiving that command, the master starts a COPY, and
streams a tarred copy of the data directory to the client. The patch
includes a simple command-line tool, pg_streambackup, to connect to a server
and request a backup that you can then redirect to a .tar file or pipe to
"tar x".

TODO:

* We need a smarter way to do pg_start/stop_backup() with this. At the
moment, you can only have one backup running at a time, but we shouldn't
have that limitation with this built-in mechanism.

* The streamed backup archive should contain all the necessary WAL files
too, so that you don't need to set up archiving to use this. You could just
point the tiny client tool to the server, and get a backup archive
containing everything that's necessary to restore correctly.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
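To give a sense of how little the client side involves, here is a minimal
sketch of a pg_streambackup-style tool using plain libpq. Only the
"TAKE_BACKUP" command name comes from the WIP patch; the connection string
and everything else in the sketch are illustrative assumptions, not the
attached code.

/* Illustrative sketch only -- not the attached patch. Connect as a
 * replication client, issue the WIP TAKE_BACKUP command, and write the
 * resulting COPY stream (a tar archive) to stdout. */
#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    /* "replication=true" routes the connection to the walsender command set */
    PGconn *conn = PQconnectdb("host=master dbname=replication replication=true");

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    PGresult *res = PQexec(conn, "TAKE_BACKUP");
    if (PQresultStatus(res) != PGRES_COPY_OUT)
    {
        fprintf(stderr, "could not start backup: %s", PQerrorMessage(conn));
        return 1;
    }
    PQclear(res);

    /* Read CopyData messages until the server ends the COPY */
    char *buf;
    int   len;
    while ((len = PQgetCopyData(conn, &buf, 0)) > 0)
    {
        fwrite(buf, 1, len, stdout);
        PQfreemem(buf);
    }

    res = PQgetResult(conn);        /* collect the final command status */
    PQclear(res);
    PQfinish(conn);
    return 0;
}

The resulting stream can then be redirected to a .tar file or piped straight
to "tar x", as described above.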
On 3 September 2010 12:19, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > TODO: > > * We need a smarter way to do pg_start/stop_backup() with this. At the > moment, you can only have one backup running at a time, but we shouldn't > have that limitation with this built-in mechanism. Would it be possible to not require pg_start/stop_backup() for this new feature? (yes, I'm probably missing something obvious here) -- Thom Brown Twitter: @darkixion IRC (freenode): dark_ixion Registered Linux user: #516935
On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > Here's a WIP patch. It adds a new "TAKE_BACKUP" command to the replication > command set. Upon receiving that command, the master starts a COPY, and > streams a tarred copy of the data directory to the client. The patch > includes a simple command-line tool, pg_streambackup, to connect to a server > and request a backup that you can then redirect to a .tar file or pipe to > "tar x". Cool. Can you add a TODO to build in code to un-tar the archive? tar is not usually found on Windows systems, and as we already have tar extraction code in pg_restore it could presumably be added relatively painlessly. -- Dave Page Blog: http://pgsnake.blogspot.com Twitter: @pgsnake EnterpriseDB UK: http://www.enterprisedb.com The Enterprise Postgres Company
On Fri, Sep 3, 2010 at 13:19, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> It's been discussed before that it would be cool if you could stream a new
> base backup from the master server, via libpq. That way you would not need
> low-level filesystem access to initialize a new standby.
>
> Magnus mentioned today that he started hacking on that, and coincidentally I
> just started experimenting with it yesterday as well :-). So let's get this
> out on the mailing list.
>
> Here's a WIP patch. It adds a new "TAKE_BACKUP" command to the replication
> command set. Upon receiving that command, the master starts a COPY, and
> streams a tarred copy of the data directory to the client. The patch
> includes a simple command-line tool, pg_streambackup, to connect to a server
> and request a backup that you can then redirect to a .tar file or pipe to
> "tar x".
>
> TODO:
>
> * We need a smarter way to do pg_start/stop_backup() with this. At the
> moment, you can only have one backup running at a time, but we shouldn't
> have that limitation with this built-in mechanism.
>
> * The streamed backup archive should contain all the necessary WAL files
> too, so that you don't need to set up archiving to use this. You could just
> point the tiny client tool to the server, and get a backup archive
> containing everything that's necessary to restore correctly.

For this last point, this should of course be *optional*, but it would be
very good to have that option (and probably on by default).

A couple of quick comments on things I noticed that differ from the code I
have :-) We chatted some about it already, but it should be included for
others...

* It should be possible to pass the backup label through, not just hardcode
it to "basebackup".

* Needs support for tablespaces. We should either follow the symlinks and
pick up the files, or throw an error if one is present. Silently delivering
an incomplete backup is not a good thing :-)

* Is there a point in adapting the chunk size to the size of the libpq
buffers?

FWIW, my implementation was as a user-defined function, which has the
advantage that it can run on 9.0. But most likely this code can be ripped
out and provided as a separate backport project for 9.0 if necessary - no
need to have separate codebases.

Other than that, our code is remarkably similar.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
On Fri, Sep 3, 2010 at 13:25, Thom Brown <thom@linux.com> wrote: > On 3 September 2010 12:19, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >> TODO: >> >> * We need a smarter way to do pg_start/stop_backup() with this. At the >> moment, you can only have one backup running at a time, but we shouldn't >> have that limitation with this built-in mechanism. > > Would it be possible to not require pg_start/stop_backup() for this > new feature? (yes, I'm probably missing something obvious here) You don't need to run it *manually*, but the process needs to run it automatically in the background for you. Which it does already in the suggested patch. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
On 3 September 2010 12:30, Magnus Hagander <magnus@hagander.net> wrote: > On Fri, Sep 3, 2010 at 13:25, Thom Brown <thom@linux.com> wrote: >> On 3 September 2010 12:19, Heikki Linnakangas >> <heikki.linnakangas@enterprisedb.com> wrote: >>> TODO: >>> >>> * We need a smarter way to do pg_start/stop_backup() with this. At the >>> moment, you can only have one backup running at a time, but we shouldn't >>> have that limitation with this built-in mechanism. >> >> Would it be possible to not require pg_start/stop_backup() for this >> new feature? (yes, I'm probably missing something obvious here) > > You don't need to run it *manually*, but the process needs to run it > automatically in the background for you. Which it does already in the > suggested patch. Ah, clearly I didn't read the patch in any detail. Thanks :) -- Thom Brown Twitter: @darkixion IRC (freenode): dark_ixion Registered Linux user: #516935
On 03/09/10 14:25, Thom Brown wrote:
> On 3 September 2010 12:19, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> TODO:
>>
>> * We need a smarter way to do pg_start/stop_backup() with this. At the
>> moment, you can only have one backup running at a time, but we shouldn't
>> have that limitation with this built-in mechanism.
>
> Would it be possible to not require pg_start/stop_backup() for this
> new feature? (yes, I'm probably missing something obvious here)

Well, pg_start_backup() does several things:

1. It sets the forceFullPageWrites flag, so that we don't get partial pages
in the restored database.
2. It performs a checkpoint.
3. It creates a backup label file.

We certainly need 1 and 2. We don't necessarily need to write the backup
label file to the data directory when we're streaming the backup directly to
the client; we can just include it in the streamed archive.

pg_stop_backup() also does several things:

1. It clears the forceFullPageWrites flag.
2. It writes an end-of-backup WAL record.
3. It switches to a new WAL segment, to get the final WAL segment archived.
4. It writes a backup history file.
5. It removes the backup label file.
6. It waits for all the required WAL files to be archived.

We need 1, but the rest we could do in a smarter way. When we have more
control of the backup process, I don't think we need the end-of-backup WAL
record or the backup label anymore. We can add the pg_control file as the
last file in the archive, and set minRecoveryPoint in it to the last WAL
record needed to recover.

So no, we don't really need pg_start/stop_backup() per se, but we'll need
something similar...

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
On 03/09/10 14:28, Dave Page wrote: > On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >> Here's a WIP patch. It adds a new "TAKE_BACKUP" command to the replication >> command set. Upon receiving that command, the master starts a COPY, and >> streams a tarred copy of the data directory to the client. The patch >> includes a simple command-line tool, pg_streambackup, to connect to a server >> and request a backup that you can then redirect to a .tar file or pipe to >> "tar x". > > Cool. Can you add a TODO to build in code to un-tar the archive? tar > is not usually found on Windows systems, and as we already have tar > extraction code in pg_restore it could presumably be added relatively > painlessly. Ok. Another obvious thing that people will want is to gzip the tar file while sending it, to reduce network traffic. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Fri, Sep 3, 2010 at 13:48, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > On 03/09/10 14:28, Dave Page wrote: >> >> On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas >> <heikki.linnakangas@enterprisedb.com> wrote: >>> >>> Here's a WIP patch. It adds a new "TAKE_BACKUP" command to the >>> replication >>> command set. Upon receiving that command, the master starts a COPY, and >>> streams a tarred copy of the data directory to the client. The patch >>> includes a simple command-line tool, pg_streambackup, to connect to a >>> server >>> and request a backup that you can then redirect to a .tar file or pipe to >>> "tar x". >> >> Cool. Can you add a TODO to build in code to un-tar the archive? tar >> is not usually found on Windows systems, and as we already have tar >> extraction code in pg_restore it could presumably be added relatively >> painlessly. > > Ok. Another obvious thing that people will want is to gzip the tar file > while sending it, to reduce network traffic. Not necessarily obvious, needs to be configurable. There are a lot of cases where you might not want it. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
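As a rough idea of what optional compression could look like with the zlib
we already link against, here is a sketch of gzip-compressing one chunk of
the outgoing stream. The function name, the chunking, and the idea of
compressing per chunk are assumptions for illustration only; whether to
compress at all would be driven by a client-requested option.

#include <string.h>
#include <zlib.h>

/* Gzip-compress one chunk of the outgoing tar stream. Returns the number of
 * compressed bytes placed in 'out', or -1 on error. A real implementation
 * would keep a single z_stream open across the whole backup instead of one
 * per chunk. */
static int
gzip_chunk(const char *in, int inlen, char *out, int outlen)
{
    z_stream zs;

    memset(&zs, 0, sizeof(zs));

    /* windowBits = 15 + 16 asks zlib for a gzip wrapper rather than the
     * raw zlib format. */
    if (deflateInit2(&zs, Z_DEFAULT_COMPRESSION, Z_DEFLATED,
                     15 + 16, 8, Z_DEFAULT_STRATEGY) != Z_OK)
        return -1;

    zs.next_in = (Bytef *) in;
    zs.avail_in = (uInt) inlen;
    zs.next_out = (Bytef *) out;
    zs.avail_out = (uInt) outlen;

    if (deflate(&zs, Z_FINISH) != Z_STREAM_END)
    {
        deflateEnd(&zs);
        return -1;
    }

    deflateEnd(&zs);
    return outlen - (int) zs.avail_out;
}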
On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> * We need a smarter way to do pg_start/stop_backup() with this. At the
> moment, you can only have one backup running at a time, but we shouldn't
> have that limitation with this built-in mechanism.

Well, there's no particular reason we couldn't support having multiple
pg_start_backup() calls pending either. It's just not usually something
people have needed so far.

--
greg
On 03/09/10 15:16, Greg Stark wrote: > On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >> * We need a smarter way to do pg_start/stop_backup() with this. At the >> moment, you can only have one backup running at a time, but we shouldn't >> have that limitation with this built-in mechanism. > > Well there's no particular reason we couldn't support having multiple > pg_start_backup() pending either. It's just not usually something > people have need so far. The backup label file makes that hard. There can be only one at a time. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Fri, Sep 3, 2010 at 7:28 AM, Dave Page <dpage@pgadmin.org> wrote: > On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >> Here's a WIP patch. It adds a new "TAKE_BACKUP" command to the replication >> command set. Upon receiving that command, the master starts a COPY, and >> streams a tarred copy of the data directory to the client. The patch >> includes a simple command-line tool, pg_streambackup, to connect to a server >> and request a backup that you can then redirect to a .tar file or pipe to >> "tar x". > > Cool. Can you add a TODO to build in code to un-tar the archive? tar > is not usually found on Windows systems, and as we already have tar > extraction code in pg_restore it could presumably be added relatively > painlessly. It seems like the elephant in the room here is updating an existing backup without recopying the entire data directory. Perhaps that's phase two, but worth keeping in mind... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Fri, Sep 3, 2010 at 15:24, Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Sep 3, 2010 at 7:28 AM, Dave Page <dpage@pgadmin.org> wrote: >> On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas >> <heikki.linnakangas@enterprisedb.com> wrote: >>> Here's a WIP patch. It adds a new "TAKE_BACKUP" command to the replication >>> command set. Upon receiving that command, the master starts a COPY, and >>> streams a tarred copy of the data directory to the client. The patch >>> includes a simple command-line tool, pg_streambackup, to connect to a server >>> and request a backup that you can then redirect to a .tar file or pipe to >>> "tar x". >> >> Cool. Can you add a TODO to build in code to un-tar the archive? tar >> is not usually found on Windows systems, and as we already have tar >> extraction code in pg_restore it could presumably be added relatively >> painlessly. > > It seems like the elephant in the room here is updating an existing > backup without recopying the entire data directory. Perhaps that's > phase two, but worth keeping in mind... I'd say that's a very different use-case, but still a very useful one of course. It's probably going to be a lot more complex (it would require bi-directional traffic, I think)... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
On Fri, Sep 3, 2010 at 2:24 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Sep 3, 2010 at 7:28 AM, Dave Page <dpage@pgadmin.org> wrote: >> On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas >> <heikki.linnakangas@enterprisedb.com> wrote: >>> Here's a WIP patch. It adds a new "TAKE_BACKUP" command to the replication >>> command set. Upon receiving that command, the master starts a COPY, and >>> streams a tarred copy of the data directory to the client. The patch >>> includes a simple command-line tool, pg_streambackup, to connect to a server >>> and request a backup that you can then redirect to a .tar file or pipe to >>> "tar x". >> >> Cool. Can you add a TODO to build in code to un-tar the archive? tar >> is not usually found on Windows systems, and as we already have tar >> extraction code in pg_restore it could presumably be added relatively >> painlessly. > > It seems like the elephant in the room here is updating an existing > backup without recopying the entire data directory. Perhaps that's > phase two, but worth keeping in mind... rsync? Might be easier to use that from day 1 (well, day 2) than to retrofit later. -- Dave Page Blog: http://pgsnake.blogspot.com Twitter: @pgsnake EnterpriseDB UK: http://www.enterprisedb.com The Enterprise Postgres Company
On Fri, Sep 3, 2010 at 9:26 AM, Dave Page <dpage@pgadmin.org> wrote: > rsync? Might be easier to use that from day 1 (well, day 2) than to > retrofit later. I'm not sure we want to depend on an external utility like that, particularly one that users may not have installed. And I'm not sure if that can be made to work over a libpq channel, either. But certainly something with that functionality would be nice to have, whether it ends up sharing code or not. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Fri, Sep 3, 2010 at 2:29 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Sep 3, 2010 at 9:26 AM, Dave Page <dpage@pgadmin.org> wrote: >> rsync? Might be easier to use that from day 1 (well, day 2) than to >> retrofit later. > > I'm not sure we want to depend on an external utility like that, > particularly one that users may not have installed. And I'm not sure > if that can be made to work over a libpq channel, either. But > certainly something with that functionality would be nice to have, > whether it ends up sharing code or not. No, I agree we don't want an external dependency (I was just bleating about needing tar on Windows). I was assuming/hoping there's a librsync somewhere... -- Dave Page Blog: http://pgsnake.blogspot.com Twitter: @pgsnake EnterpriseDB UK: http://www.enterprisedb.com The Enterprise Postgres Company
On Fri, Sep 3, 2010 at 9:32 AM, Dave Page <dpage@pgadmin.org> wrote: > No, I agree we don't want an external dependency (I was just bleating > about needing tar on Windows). I was assuming/hoping there's a > librsync somewhere... The rsync code itself is not modular, I believe. I think the author thereof kind of took the approach of placing efficiency before all. See: http://www.samba.org/rsync/how-rsync-works.html ... especially the section on "The Rsync Protocol" I Googled librsync and got a hit, but that code is a rewrite of the source base and seems to have little or no activity since 2004. http://librsync.sourceforge.net/ That page writes: "librsync is not wire-compatible with rsync 2.x, and is not likely to be in the future." The current version of rsync is 3.0.7. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
* Robert Haas (robertmhaas@gmail.com) wrote:
> The rsync code itself is not modular, I believe. I think the author
> thereof kind of took the approach of placing efficiency before all.

Yeah, I looked into this when discussing this same concept at PGCon with
folks. There doesn't appear to be a good librsync and, even if there was,
there's a heck of a lot of complexity there that we *don't* need. rsync is a
great tool, don't get me wrong, but let's not try to go over our heads here.

We don't need permissions handling, as an example. I also don't think we
need the binary diff/partial file transfer capability - we already break
relations into 1G chunks (when/if they reach that size), so you won't
necessarily be copying the entire relation if you're just doing mtime-based
or per-file-checksum-based detection. We don't need device node handling, we
don't need auto-ignoring of files or pattern exclusion/inclusion, we don't
really need a progress bar (though it'd be nice.. :), etc, etc, etc.

Thanks,

Stephen
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > On 03/09/10 15:16, Greg Stark wrote: >> On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas >> <heikki.linnakangas@enterprisedb.com> wrote: >>> * We need a smarter way to do pg_start/stop_backup() with this. At the >>> moment, you can only have one backup running at a time, but we shouldn't >>> have that limitation with this built-in mechanism. >> >> Well there's no particular reason we couldn't support having multiple >> pg_start_backup() pending either. It's just not usually something >> people have need so far. > The backup label file makes that hard. There can be only one at a time. I don't actually see a use-case for streaming multiple concurrent backups. How many people are going to be able to afford that kind of load on the master's I/O bandwidth? Certainly for version 1, it would be sufficient to throw an error if someone tries to start a backup while another one is in progress. *Maybe*, down the road, we'd want to relax it. regards, tom lane
Stephen Frost <sfrost@snowman.net> wrote: > there's a heck of alot of complexity there that we *don't* need. > rsync is a great tool, don't get me wrong, but let's not try to go > over our heads here. Right -- among other things, it checks for portions of a new file which match the old file at a different location. For example, if you have a very large text file, and insert a line or two at the start, it will wind up only sending the new lines. (Well, that and all the checksums which help it determine that the rest of the file matches at a shifted location.) I would think that PostgreSQL could just check whether *corresponding* portions of a file matched, which is much simpler. > we already break relations into 1G chunks (when/if they reach that > size), so you won't necessairly be copying the entire relation if > you're just doing mtime based or per-file-checksum based > detection. While 1GB granularity would be OK, I doubt it's optimal; I think CRC checks for smaller chunks might be worthwhile. My gut feel is that somewhere in the 64kB to 1MB range would probably be optimal for us, although the "sweet spot" will depend on how the database is used. A configurable or self-adjusting size would be cool. -Kevin
Kevin, * Kevin Grittner (Kevin.Grittner@wicourts.gov) wrote: > While 1GB granularity would be OK, I doubt it's optimal; I think CRC > checks for smaller chunks might be worthwhile. My gut feel is that > somewhere in the 64kB to 1MB range would probably be optimal for us, > although the "sweet spot" will depend on how the database is used. > A configurable or self-adjusting size would be cool. We have something much better, called WAL. If people want to keep their backup current, they should use that after getting the base backup up and working. We don't need to support this for the base backup, imv. In any case, it's certainly not something required for an initial implementation.. Thanks, Stephen
On 3 September 2010 16:01, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: >> On 03/09/10 15:16, Greg Stark wrote: >>> On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas >>> <heikki.linnakangas@enterprisedb.com> wrote: >>>> * We need a smarter way to do pg_start/stop_backup() with this. At the >>>> moment, you can only have one backup running at a time, but we shouldn't >>>> have that limitation with this built-in mechanism. >>> >>> Well there's no particular reason we couldn't support having multiple >>> pg_start_backup() pending either. It's just not usually something >>> people have need so far. > >> The backup label file makes that hard. There can be only one at a time. > > I don't actually see a use-case for streaming multiple concurrent > backups. How many people are going to be able to afford that kind of > load on the master's I/O bandwidth? To make it affordable, could functionality be added to allow slaves to become chainable? (i.e. master streams to standby 1, which streams to standby 2 etc) This would help reduce bandwidth for normal streaming replication too, which would be useful on particularly busy databases. Obviously in synchronous replication this would be horribly slow so not feasible for that. -- Thom Brown Twitter: @darkixion IRC (freenode): dark_ixion Registered Linux user: #516935
Stephen Frost <sfrost@snowman.net> wrote:
> We have something much better, called WAL. If people want to keep
> their backup current, they should use that after getting the base
> backup up and working.

Unless you want to provide support for Point In Time Recovery without
excessive recovery times.

> We don't need to support this for the base backup, imv.

We found that making a hard-link copy of the previous base backup and using
rsync to bring it up to date used 1% of the WAN bandwidth of sending a
complete, compressed base backup. Just sending modified files in their
entirety would have bought the first order of magnitude; recognizing the
unchanged portions buys the second order of magnitude.

> In any case, it's certainly not something required for an initial
> implementation..

No disagreement there; but sometimes it pays to know where you might want to
go, so you don't do something to make further development in that direction
unnecessarily difficult.

-Kevin
On 03/09/10 18:01, Tom Lane wrote: > Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> writes: >> On 03/09/10 15:16, Greg Stark wrote: >>> On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas >>> <heikki.linnakangas@enterprisedb.com> wrote: >>>> * We need a smarter way to do pg_start/stop_backup() with this. At the >>>> moment, you can only have one backup running at a time, but we shouldn't >>>> have that limitation with this built-in mechanism. >>> >>> Well there's no particular reason we couldn't support having multiple >>> pg_start_backup() pending either. It's just not usually something >>> people have need so far. > >> The backup label file makes that hard. There can be only one at a time. > > I don't actually see a use-case for streaming multiple concurrent > backups. How many people are going to be able to afford that kind of > load on the master's I/O bandwidth? It's more a matter of convenience when you're setting up test environments with small databases or something like that. I don't see many people regularly using the streaming backup for anything larger than a few hundred gigabytes anyway. At that point you'll most likely want to use something more efficient. > Certainly for version 1, it would be sufficient to throw an error if > someone tries to start a backup while another one is in progress. > *Maybe*, down the road, we'd want to relax it. Yeah, it's OK for 1st version. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Fri, Sep 3, 2010 at 11:20 AM, Stephen Frost <sfrost@snowman.net> wrote: > Kevin, > > * Kevin Grittner (Kevin.Grittner@wicourts.gov) wrote: >> While 1GB granularity would be OK, I doubt it's optimal; I think CRC >> checks for smaller chunks might be worthwhile. My gut feel is that >> somewhere in the 64kB to 1MB range would probably be optimal for us, >> although the "sweet spot" will depend on how the database is used. >> A configurable or self-adjusting size would be cool. > > We have something much better, called WAL. If people want to keep their > backup current, they should use that after getting the base backup up > and working. We don't need to support this for the base backup, imv. > > In any case, it's certainly not something required for an initial > implementation.. While I'm certainly not knocking WAL, it's not difficult to think of cases where being able to incrementally update a backup saves you an awful lot of bandwidth. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: >> Stephen Frost <sfrost@snowman.net> wrote: >> In any case, it's certainly not something required for an initial >> implementation.. > No disagreement there; but sometimes it pays to know where you might > want to go, so you don't do something to make further development in > that direction unnecessarily difficult. I think that setting out to reimplement rsync, or to go down a design path where we're likely to do a lot of that eventually, is the height of folly. We should be standing on the shoulders of other projects, not rolling our own because of misguided ideas about people not having those projects installed. IOW, what I'd like to see is protocol extensions that allow an external copy of rsync to be invoked; not build in rsync, or tar, or anything else that we could get off-the-shelf. regards, tom lane
On Fri, Sep 3, 2010 at 11:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: >>> Stephen Frost <sfrost@snowman.net> wrote: >>> In any case, it's certainly not something required for an initial >>> implementation.. > >> No disagreement there; but sometimes it pays to know where you might >> want to go, so you don't do something to make further development in >> that direction unnecessarily difficult. > > I think that setting out to reimplement rsync, or to go down a design > path where we're likely to do a lot of that eventually, is the height > of folly. We should be standing on the shoulders of other projects, > not rolling our own because of misguided ideas about people not having > those projects installed. > > IOW, what I'd like to see is protocol extensions that allow an external > copy of rsync to be invoked; not build in rsync, or tar, or anything > else that we could get off-the-shelf. We used to use "cp" to create databases. Should we go back to that system? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Fri, Sep 3, 2010 at 11:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > IOW, what I'd like to see is protocol extensions that allow an external > copy of rsync to be invoked; not build in rsync, or tar, or anything > else that we could get off-the-shelf. Personally, I would love to see protocol-level compression added. (Yes, going over a compressed SSH tunnel works well, but in general isn't user-friendly.) Josh: we talked on IRC awhile back and you mentioned that CMD had added this in Mammoth? Would you be interested in having someone get that integrated back into the community? David Blewett
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > IOW, what I'd like to see is protocol extensions that allow an external > copy of rsync to be invoked; not build in rsync, or tar, or anything > else that we could get off-the-shelf. I'd much rather use an existing library to implement it than call out to some external utility. That said, I'm about as thrilled with libtar as librsync after a bit of googling around. :/ Thanks, Stephen
Tom Lane <tgl@sss.pgh.pa.us> wrote: > what I'd like to see is protocol extensions that allow an external > copy of rsync to be invoked; not build in rsync, or tar, or > anything else that we could get off-the-shelf. The complexities of dealing with properly invoking rsync externally could well require more code and be considerably more fragile than passing the data through the existing SR connection; particularly since to get the full benefits of rsync you need to be dealing with a daemon which has the appropriate modules configured -- the location of which you wouldn't easily know. If we were talking about re-implementing rsync, or doing more than a rough approximation, kinda, of 5% of what rsync does, I'd be with you. -Kevin
On 03/09/10 19:09, Stephen Frost wrote:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> IOW, what I'd like to see is protocol extensions that allow an external
>> copy of rsync to be invoked; not build in rsync, or tar, or anything
>> else that we could get off-the-shelf.
>
> I'd much rather use an existing library to implement it than call out to
> some external utility. That said, I'm about as thrilled with libtar as
> librsync after a bit of googling around. :/

The code to build a tar archive is about 200 lines of code, and the amount
of code for untar is about the same; that's about the amount of effort this
is worth. We could add zlib compression since we already link with that, but
that's about it. I'm not interested in adding more infrastructure for more
tools.

For more complicated scenarios, you can still use pg_start/stop_backup() as
usual; there's nothing wrong with that.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
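For a sense of what those ~200 lines amount to, here is a simplified sketch
of building a single ustar member header. It deliberately ignores long
names, directories and symlinks, and it is not the code from the patch.

#include <stdio.h>
#include <string.h>
#include <time.h>

/* Fill a 512-byte POSIX ustar header for one regular file. Long names,
 * directories and symlinks are ignored; uid/gid are left as placeholders. */
static void
fill_tar_header(char hdr[512], const char *name, size_t size, time_t mtime)
{
    unsigned long sum = 0;
    int i;

    memset(hdr, 0, 512);

    strncpy(&hdr[0], name, 99);                           /* file name       */
    sprintf(&hdr[100], "%07o", 0600);                     /* mode            */
    sprintf(&hdr[108], "%07o", 0);                        /* uid placeholder */
    sprintf(&hdr[116], "%07o", 0);                        /* gid placeholder */
    sprintf(&hdr[124], "%011lo", (unsigned long) size);   /* size, octal     */
    sprintf(&hdr[136], "%011lo", (unsigned long) mtime);  /* mtime, octal    */
    hdr[156] = '0';                                       /* regular file    */
    memcpy(&hdr[257], "ustar", 5);                        /* magic           */
    memcpy(&hdr[263], "00", 2);                           /* version         */

    /* Checksum: sum of all header bytes with the checksum field itself
     * counted as spaces, stored as six octal digits, NUL, space. */
    memset(&hdr[148], ' ', 8);
    for (i = 0; i < 512; i++)
        sum += (unsigned char) hdr[i];
    sprintf(&hdr[148], "%06lo", sum);
    hdr[155] = ' ';
}

The member's data then follows in 512-byte blocks padded with zeros, and the
whole archive ends with two all-zero 512-byte blocks.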
On 03/09/10 18:53, David Blewett wrote: > On Fri, Sep 3, 2010 at 11:47 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >> IOW, what I'd like to see is protocol extensions that allow an external >> copy of rsync to be invoked; not build in rsync, or tar, or anything >> else that we could get off-the-shelf. > > Personally, I would love to see protocol-level compression added. > (Yes, going over a compressed SSH tunnel works well, but in general > isn't user-friendly.) > > Josh: we talked on IRC awhile back and you mentioned that CMD had > added this in Mammoth? Would you be interested in having someone get > that integrated back into the community? There's a recent thread on pgsql-general about just that: http://archives.postgresql.org/pgsql-general/2010-08/msg00003.php I agree with Tom's comments there, I'd like to have something to enable/disable SSL compression rather than implement our own. There was some discussion that it might not be available on JDBC SSL implementations, but if it's done in our protocol, you'll need changes to the client to make it work anyway. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Fri, Sep 03, 2010 at 09:56:12AM -0400, Stephen Frost wrote: > * Robert Haas (robertmhaas@gmail.com) wrote: > > The rsync code itself is not modular, I believe. I think the author > > thereof kind of took the approach of placing efficiency before all. > > Yeah, I looked into this when discussing this same concept at PGCon with > folks. There doesn't appear to be a good librsync and, even if there > was, there's a heck of alot of complexity there that we *don't* need. > rsync is a great tool, don't get me wrong, but let's not try to go over > our heads here. rsync is not rocket science. All you need is for the receiving end to send a checksum for each block it has. The server side does the same checksum and for each block sends back "same" or "new data". The client and the server don't need to synchronise at all. If the client sends nothing, the server sends everything. The tricky part of rsync (finding block that have moved) is not needed here. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patriotism is when love of your own people comes first; nationalism, > when hate for people other than your own comes first. > - Charles de Gaulle
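A minimal sketch of that comparison, assuming one CRC per 8 kB block and
using zlib's crc32(); send_same() and send_block() are hypothetical
stand-ins for whatever the real protocol messages would be.

#include <stdio.h>
#include <stdint.h>
#include <zlib.h>

#define BLCKSZ 8192

/* Hypothetical stand-ins for the "same" vs. "new data" replies. */
static void
send_same(uint32_t blockno)
{
    printf("block %u: same\n", blockno);
}

static void
send_block(uint32_t blockno, const char *data, size_t len)
{
    (void) data;
    printf("block %u: sending %zu new bytes\n", blockno, len);
}

/* Compare the server's copy of one file against the per-block CRCs the
 * receiving end sent for its copy. */
static void
compare_file(FILE *fp, const uint32_t *client_crcs, uint32_t nblocks)
{
    char     buf[BLCKSZ];
    uint32_t blockno = 0;
    size_t   len;

    while ((len = fread(buf, 1, BLCKSZ, fp)) > 0)
    {
        uint32_t crc = (uint32_t) crc32(crc32(0L, Z_NULL, 0),
                                        (const Bytef *) buf, (uInt) len);

        /* Blocks the client doesn't have, or whose checksum differs, are
         * sent in full; matching blocks only get a "same" marker. */
        if (blockno >= nblocks || crc != client_crcs[blockno])
            send_block(blockno, buf, len);
        else
            send_same(blockno);

        blockno++;
    }
}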
On Fri, Sep 3, 2010 at 12:23 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > On 03/09/10 18:53, David Blewett wrote: >> >> On Fri, Sep 3, 2010 at 11:47 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >>> >>> IOW, what I'd like to see is protocol extensions that allow an external >>> copy of rsync to be invoked; not build in rsync, or tar, or anything >>> else that we could get off-the-shelf. >> >> Personally, I would love to see protocol-level compression added. >> (Yes, going over a compressed SSH tunnel works well, but in general >> isn't user-friendly.) >> >> Josh: we talked on IRC awhile back and you mentioned that CMD had >> added this in Mammoth? Would you be interested in having someone get >> that integrated back into the community? > > There's a recent thread on pgsql-general about just that: > http://archives.postgresql.org/pgsql-general/2010-08/msg00003.php > > I agree with Tom's comments there, I'd like to have something to > enable/disable SSL compression rather than implement our own. There was some > discussion that it might not be available on JDBC SSL implementations, but > if it's done in our protocol, you'll need changes to the client to make it > work anyway. While I agree that combining SSL with compression is a great win, I'm not sold on Tom's argument that compression is only needed in WAN situations. I've seen great benefit to using an SSH tunnel with compression over LAN connections (100 and 1000 mbps). At work, we do have a private WAN that it would be nice to be able to use compression with no encryption on. I think it's a general-use thing. While I know it's not the best argument, MySQL does provide compression at the connection level. David Blewett
On Fri, Sep 3, 2010 at 8:30 PM, Martijn van Oosterhout
<kleptog@svana.org> wrote:
>
> rsync is not rocket science. All you need is for the receiving end to
> send a checksum for each block it has. The server side does the same
> checksum and for each block sends back "same" or "new data".

Well, rsync is closer to rocket science than that. It does rolling checksums
and can handle data being moved around, which vacuum does do, so it's
probably worthwhile.

*However*, I think you're all headed in the wrong direction here. I don't
think rsync is what anyone should be doing with their backups at all. It
still requires scanning through *all* your data even if you've only changed
a small percentage (which it seems is the use case you're concerned about),
and it results in corrupting your backup while the rsync is in progress and
having a window with no usable backup. You could address that with rsync
--compare-dest, but then you're back to needing space and I/O for whole
backups every time, even if you're only changing small parts of the
database.

The industry standard solution that we're missing that we *should* be
figuring out how to implement is incremental backups.

I've actually been thinking about this recently and I think we could do it
fairly easily with our existing infrastructure. I was planning on doing it
as an external utility, but it would be tempting to be able to request an
incremental backup via the streaming protocol, so maybe it would be better a
bit more integrated.

The way I see it there are two alternatives. You need to start by figuring
out which blocks have been modified since the last backup (or selected
reference point). You can do this either by scanning every data file and
picking every block with an LSN > the reference LSN, or you can do it by
scanning the WAL since that point and accumulating a list of block numbers.

Either way you then need to archive all those blocks into a special file
format which includes meta-information to dictate which file and what block
number each block represents. Also it would be useful to include the
reference LSN and the beginning and ending LSN of the backup so that we can
verify when restoring it that we're starting with a recent enough database
and that we've replayed the right range of WAL to bring it to a consistent
state.

It's tempting to make the incremental backup file format just a regular WAL
file with a series of special WAL records which just contain a backup block.
That might be a bit confusing since it would be a second unrelated LSN
series, but I like the idea of being able to use the same bits of code to
handle the "holes" and maybe other code. On the whole I think it would be
just a little too weird though.

--
greg
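To make the first alternative concrete, here is a rough sketch of scanning
one relation segment and picking out pages whose LSN is newer than the
reference LSN. The page-header handling (pd_lsn read as an {xlogid, xrecoff}
pair at the start of each page) is simplified, and emit_block() is a
hypothetical stand-in for writing the block plus its metadata into the
incremental archive.

#include <stdio.h>
#include <stdint.h>
#include <string.h>

#define BLCKSZ 8192

/* Hypothetical stand-in: append (file, block number, page image) plus any
 * needed metadata to the incremental backup archive. */
static void
emit_block(const char *path, uint32_t blockno, const char *page)
{
    (void) page;
    printf("%s block %u changed since reference LSN\n", path, blockno);
}

static void
scan_segment(const char *path, uint64_t reference_lsn)
{
    FILE    *fp = fopen(path, "rb");
    char     page[BLCKSZ];
    uint32_t blockno = 0;

    if (fp == NULL)
        return;

    while (fread(page, 1, BLCKSZ, fp) == BLCKSZ)
    {
        /* pd_lsn sits at the start of the page header as {xlogid, xrecoff};
         * combining it into one 64-bit value is a simplification that
         * assumes we read the file on the machine that wrote it. */
        uint32_t xlogid, xrecoff;
        uint64_t page_lsn;

        memcpy(&xlogid, page, sizeof(xlogid));
        memcpy(&xrecoff, page + sizeof(xlogid), sizeof(xrecoff));
        page_lsn = ((uint64_t) xlogid << 32) | xrecoff;

        if (page_lsn > reference_lsn)
            emit_block(path, blockno, page);

        blockno++;
    }

    fclose(fp);
}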
On 4 September 2010 14:42, Greg Stark <gsstark@mit.edu> wrote:
> The industry standard solution that we're missing that we *should* be
> figuring out how to implement is incremental backups.

I'll buy you a crate of beer if this gets implemented... although you're in
Dublin, so it would be like buying Willy Wonka a Mars bar.

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935
On Sat, Sep 4, 2010 at 9:42 AM, Greg Stark <gsstark@mit.edu> wrote: > *However* I tihnk you're all headed in the wrong direction here. I > don't think rsync is what anyone should be doing with their backups at > all. It still requires scanning through *all* your data even if you've > only changed a small percentage (which it seems is the use case you're > concerned about) and it results in corrupting your backup while the > rsync is in progress and having a window with no usable backup. You > could address that with rsync --compare-dest but then you're back to > needing space and i/o for whole backups every time even if you're only > changing small parts of the database. It depends. If the use case is "I accidentally (or purposefully but temporarily) started up my slave as a master, and now I want it to go back to having it be the master" or "I lost the WAL files I need to roll this base backup forward (perhaps because wal_keep_segments wasn't set high enough)", rsync is what you need. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Sat, Sep 04, 2010 at 02:42:40PM +0100, Greg Stark wrote: > On Fri, Sep 3, 2010 at 8:30 PM, Martijn van Oosterhout > <kleptog@svana.org> wrote: > > > > rsync is not rocket science. All you need is for the receiving end to > > send a checksum for each block it has. The server side does the same > > checksum and for each block sends back "same" or "new data". > > Well rsync is closer to rocket science than that. It does rolling > checksums and can handle data being moved around, which vacuum does do > so it's probably worthwhile. Not sure. When vacuum moves rows around the chance that it will move rows as a block and that the line pointers will be the same is practically nil. I don't think rsync will pick up on blocks the size of a typical row. Vacuum changes the headers so you never have a copied block. > *However* I tihnk you're all headed in the wrong direction here. I > don't think rsync is what anyone should be doing with their backups at > all. It still requires scanning through *all* your data even if you've > only changed a small percentage (which it seems is the use case you're > concerned about) and it results in corrupting your backup while the > rsync is in progress and having a window with no usable backup. You > could address that with rsync --compare-dest but then you're back to > needing space and i/o for whole backups every time even if you're only > changing small parts of the database. If you're working from a known good version of the database at some point, yes you are right you have more interesting options. If you don't you want something that will fix it. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patriotism is when love of your own people comes first; nationalism, > when hate for people other than your own comes first. > - Charles de Gaulle
On Sun, Sep 5, 2010 at 4:51 PM, Martijn van Oosterhout
<kleptog@svana.org> wrote:
> If you're working from a known good version of the database at some
> point, yes you are right you have more interesting options. If you
> don't you want something that will fix it.

Sure, in that case you want to restore from backup. Whatever you use to do
that gives the same net result. I'm not sure rsync is actually going to be
much faster though, since it still has to read all of the existing database,
which a normal restore doesn't have to. If the database has changed
significantly that's a lot of extra I/O, and you're probably on a local
network with a lot of bandwidth available.

What I'm talking about is how you *take* backups. Currently you have to take
a full backup, which if you have a large data warehouse could be a big job.
If only a small percentage of the database is changing then you could use
rsync to reduce the network bandwidth to transfer your backup, but you still
have to read the entire database and write out the entire backup.

Incremental backups mean being able to read just the data blocks that have
been modified and write out a backup file with just those blocks. When it
comes time to restore, you restore the last full backup, then any
incremental backups since then, then replay any logs needed to bring it to a
consistent state.

I think that description pretty much settles the question in my mind. The
implementation choice of scanning the WAL to find all the changed blocks is
more relevant to the use cases where incremental backups are useful. If you
still have to read the entire database then there's not all that much to be
gained except storage space. If you scan the WAL then you can avoid reading
most of your large data warehouse to generate the incremental backup and
only read the busy portion.

In the use case where the database is extremely busy but writing and
rewriting the same small number of blocks over and over, even scanning the
WAL might not be ideal. For that use case it might be more useful to
generate a kind of WAL summary which lists all the blocks touched since the
last checkpoint, every checkpoint. But that could be a later optimization.

--
greg
On Mon, Sep 6, 2010 at 10:07 AM, Greg Stark <gsstark@mit.edu> wrote: > I think that description pretty much settles the question in my mind. > The implementation choice of scanning the WAL to find all the changed > blocks is more relevant to the use cases where incremental backups are > useful. If you still have to read the entire database then there's not > all that much to be gained except storage space. If you scan the WAL > then you can avoid reading most of your large data warehouse to > generate the incremental and only read the busy portion. If you can scan the WAL, why wouldn't you just replay it? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Greg Stark wrote: > The industry standard solution that we're missing that we *should* be > figuring out how to implement is incremental backups. > > I've actually been thinking about this recently and I think we could > do it fairly easily with our existing infrastructure. I was planning > on doing it as an external utility but it would be tempting to be able > to request an external backup via the streaming protocol so maybe it > would be better a bit more integrated. > > The way I see it there are two alternatives. You need to start by > figuring out which blocks have been modified since the last backup (or > selected reference point). You can do this either by scanning every > data file and picking every block with an LSN > the reference LSN. Or > you can do it by scanning the WAL since that point and accumulating a > list of block numbers. That's what pgrman does already: http://code.google.com/p/pg-rman/ Are you saying you want to do that over the libpq connection? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +