Re: Streaming a base backup from master

From: Kevin Grittner
Subject: Re: Streaming a base backup from master
Msg-id: 4C80C79E0200002500035171@gw.wicourts.gov
In response to: Re: Streaming a base backup from master (Stephen Frost <sfrost@snowman.net>)
List: pgsql-hackers

Stephen Frost <sfrost@snowman.net> wrote:
> there's a heck of a lot of complexity there that we *don't* need.
> rsync is a great tool, don't get me wrong, but let's not try to go
> over our heads here.
Right -- among other things, it checks for portions of a new file
which match the old file at a different location.  For example, if
you have a very large text file, and insert a line or two at the
start, it will wind up only sending the new lines.  (Well, that and
all the checksums which help it determine that the rest of the file
matches at a shifted location.)  I would think that PostgreSQL could
just check whether *corresponding* portions of a file matched, which
is much simpler.
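
As a minimal sketch of that simpler fixed-offset scheme (illustrative
only, not anything from PostgreSQL or rsync; the 8kB block size is an
assumption), read both copies of a file in lockstep and flag each block
whose bytes differ at the *same* offset:

    /*
     * Fixed-offset comparison: no rolling checksum, no shifted-match
     * search.  A line inserted at the start of a file would make every
     * later block look changed -- that's the capability being given up
     * relative to rsync, in exchange for much simpler code.
     */
    #include <stdio.h>
    #include <string.h>

    #define BLOCK_SIZE 8192         /* assumed size, for illustration */

    int
    main(int argc, char **argv)
    {
        FILE   *oldf, *newf;
        char    oldbuf[BLOCK_SIZE], newbuf[BLOCK_SIZE];
        size_t  oldn, newn;
        long    blockno = 0;

        if (argc != 3)
        {
            fprintf(stderr, "usage: %s oldfile newfile\n", argv[0]);
            return 1;
        }
        oldf = fopen(argv[1], "rb");
        newf = fopen(argv[2], "rb");
        if (oldf == NULL || newf == NULL)
        {
            perror("fopen");
            return 1;
        }

        for (;;)
        {
            oldn = fread(oldbuf, 1, BLOCK_SIZE, oldf);
            newn = fread(newbuf, 1, BLOCK_SIZE, newf);
            if (oldn == 0 && newn == 0)
                break;              /* both files exhausted */

            /* length or content mismatch => block must be resent */
            if (oldn != newn || memcmp(oldbuf, newbuf, newn) != 0)
                printf("block %ld differs\n", blockno);
            blockno++;
        }

        fclose(oldf);
        fclose(newf);
        return 0;
    }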
> we already break relations into 1G chunks (when/if they reach that
> size), so you won't necessarily be copying the entire relation if
> you're just doing mtime based or per-file-checksum based
> detection.
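
For illustration only, a hypothetical sketch of the mtime test
mentioned there; last_backup_time is an assumed input, not something
either tool actually provides:

    #include <stdbool.h>
    #include <sys/stat.h>
    #include <time.h>

    /*
     * A 1GB segment file is a candidate for re-copying only if it has
     * been modified since the last backup completed.
     */
    static bool
    segment_needs_copy(const char *path, time_t last_backup_time)
    {
        struct stat st;

        if (stat(path, &st) != 0)
            return true;        /* can't stat it; copy to be safe */
        return st.st_mtime >= last_backup_time;
    }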
While 1GB granularity would be OK, I doubt it's optimal; I think CRC
checks for smaller chunks might be worthwhile.  My gut feel is that
somewhere in the 64kB to 1MB range would probably be optimal for us,
although the "sweet spot" will depend on how the database is used.
A configurable or self-adjusting size would be cool.
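
A rough sketch of that per-chunk CRC idea, emitting one CRC per chunk
of a segment file.  zlib's crc32() (link with -lz) stands in here for
whatever checksum would actually be chosen, and the 256kB chunk size is
an arbitrary pick from the range above, not a recommendation:

    #include <stdio.h>
    #include <stdlib.h>
    #include <zlib.h>

    #define CHUNK_SIZE (256 * 1024)     /* assumed; would be configurable */

    int
    main(int argc, char **argv)
    {
        FILE   *f;
        unsigned char *buf;
        size_t  n;
        long    chunkno = 0;

        if (argc != 2)
        {
            fprintf(stderr, "usage: %s segment-file\n", argv[0]);
            return 1;
        }
        f = fopen(argv[1], "rb");
        if (f == NULL)
        {
            perror("fopen");
            return 1;
        }
        buf = malloc(CHUNK_SIZE);
        if (buf == NULL)
        {
            fclose(f);
            return 1;
        }

        /* one CRC per chunk, in file order */
        while ((n = fread(buf, 1, CHUNK_SIZE, f)) > 0)
        {
            uLong crc = crc32(0L, Z_NULL, 0);

            crc = crc32(crc, buf, (uInt) n);
            printf("chunk %ld: %08lx\n", chunkno++, crc);
        }

        free(buf);
        fclose(f);
        return 0;
    }

Master and standby would each compute such a list and resend only the
chunks whose CRCs differ; making CHUNK_SIZE a setting rather than a
constant would cover the configurable case.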
-Kevin

