On Jul 16, 2009, at 11:09 AM, Greg Stark wrote:
> On Thu, Jul 16, 2009 at 4:41 PM, Heikki
> Linnakangas<heikki.linnakangas@enterprisedb.com> wrote:
>> Rick Gigger wrote:
>>> If you use an rsync like algorithm for doing the base backups
>>> wouldn't
>>> that increase the size of the database for which it would still be
>>> practical to just re-sync? Couldn't you in fact sync a very large
>>> database if the amount of actual change in the files was a small
>>> percentage of the total size?
>>
>> It would certainly help to reduce the network traffic, though you'd
>> still have to scan all the data to see what has changed.
>
> The fundamental problem with pushing users to start over with a new
> base backup is that there's no relationship between the size of the
> WAL and the size of the database.
>
> You can plausibly have a system with extremely high transaction rate
> generating WAL very quickly, but where the whole database fits in a
> few hundred megabytes. In that case you could be behind by only a few
> minutes and have it be faster to take a new base backup.
>
> Or you could have a petabyte database which is rarely updated. In
> which case it might be faster to apply weeks' worth of logs than to
> try to take a base backup.
>
> Only the sysadmin is actually going to know which makes more sense.
> Unless we start tying WAL parameters to the database size or
> something like that.
Once again, wouldn't an rsync-like algorithm help here? Couldn't the
default be to just create a new base backup, but then allow you to
specify an existing base backup if you've already got one?
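To make the idea concrete, here is a minimal sketch of the fixed-block
checksum comparison at the heart of an rsync-like sync. It only shows the
"find which blocks changed" step, which is where the bandwidth savings come
from; real rsync also uses a weak rolling checksum so it can match blocks at
arbitrary offsets, and the block size and function names below are
illustrative, not from any actual tool. Note that, as Heikki points out, you
still have to read and checksum all the data on both sides even when little
has changed.

```python
import hashlib

BLOCK_SIZE = 4  # tiny block size for illustration; real tools use KB-sized blocks


def block_digests(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into fixed-size blocks and hash each one."""
    return [hashlib.md5(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE):
    """Return indices of blocks that differ, or exist only in `new`.

    Only these blocks would need to be shipped over the network; the
    receiver reuses the rest from its existing base backup.
    """
    old_digests = block_digests(old, block_size)
    new_digests = block_digests(new, block_size)
    return [i for i, digest in enumerate(new_digests)
            if i >= len(old_digests) or old_digests[i] != digest]


old = b"aaaabbbbccccdddd"
new = b"aaaaXXXXccccdddd"
print(changed_blocks(old, new))  # only block 1 differs
```

With a small fraction of blocks changed, only that fraction (plus the
checksum exchange) crosses the wire, which is why a large, rarely-updated
database could still re-sync cheaply.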