Re: File based Incremental backup v8 - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: File based Incremental backup v8
Date
Msg-id CAHGQGwHbya2tCg6r0gPPERjJQKWAYdTgS52H2gqEDe14p1_5ig@mail.gmail.com
In response to Re: File based Incremental backup v8  (Marco Nenciarini <marco.nenciarini@2ndquadrant.it>)
Responses Re: File based Incremental backup v8
List pgsql-hackers
On Tue, Mar 3, 2015 at 12:36 AM, Marco Nenciarini
<marco.nenciarini@2ndquadrant.it> wrote:
> On 02/03/15 14:21, Fujii Masao wrote:
>> On Thu, Feb 12, 2015 at 10:50 PM, Marco Nenciarini
>> <marco.nenciarini@2ndquadrant.it> wrote:
>>> Hi,
>>>
>>> I've attached an updated version of the patch.
>>
>> basebackup.c:1565: warning: format '%lld' expects type 'long long
>> int', but argument 8 has type '__off_t'
>> basebackup.c:1565: warning: format '%lld' expects type 'long long
>> int', but argument 8 has type '__off_t'
>> pg_basebackup.c:865: warning: ISO C90 forbids mixed declarations and code
>>
>
> I'll add an explicit cast at those two lines.
>
>> When I applied three patches and compiled the code, I got the above warnings.
>>
>> How can we get the full backup that we can use for the archive recovery, from
>> the first full backup and subsequent incremental backups? What commands should
>> we use for that, for example? It's better to document that.
>>
>
> I've sent a Python PoC that supports the plain format only (not the tar one).
> I'm currently rewriting it in C (with tar support as well) and I'll send a new patch containing it ASAP.

Yeah, if a special tool is required for that purpose, the patch should include it.

>> What does "1" of the heading line in backup_profile mean?
>>
>
> Nothing. It's a version number. If you think it's misleading I will remove it.

A version number for the file format of the backup profile? If it's required
for validating the backup profile file as a safeguard, it should be included
in the profile file. For example, it might be useful to check whether the
pg_basebackup executable is compatible with the "source" backup that
you specify. But more info might be needed for such validation.

>> Sorry if this has been already discussed so far. Why is a backup profile file
>> necessary? Maybe it's necessary in the future, but currently seems not.
>
> It's necessary because it's the only way to detect deleted files.

Maybe I'm missing something. Seems we can detect that even without a profile.
For example, please imagine the case where the file has been deleted since
the last full backup and then the incremental backup is taken. In this case,
that deleted file exists only in the full backup. We can detect the deletion of
the file by checking both full and incremental backups.
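
The comparison described above can be sketched as follows. This is only an
illustration, not code from the patch: the function names are hypothetical,
and it assumes both backups are in plain (directory) format and that the
incremental backup keeps some entry for every file that still exists. If
unchanged files are simply omitted from the incremental backup, this set
difference cannot tell a deleted file from an unchanged one.

```python
import os


def list_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found


def deleted_since_full(full_backup_dir, incremental_backup_dir):
    """Files present in the full backup but absent from the incremental one.

    Assumption: the incremental backup has an entry for every file that
    still existed when it was taken (changed or not).
    """
    full = list_files(full_backup_dir)
    incremental = list_files(incremental_backup_dir)
    return sorted(full - incremental)
```

Calling deleted_since_full() with the two backup directories returns the
relative paths that would be treated as deleted under that assumption.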

>> Have we really reached consensus on the current design, especially that
>> basically every file needs to be read to check whether it has been modified
>> since the last backup, even when *no* modification has happened since then?
>
> The real problem here is that there is currently no way to detect that a file is not changed since the last backup.
> We agreed to not use file system timestamps as they are not reliable for that purpose.

TBH I prefer a timestamp-based approach for the first version of incremental
backup, even if it's less reliable than an LSN-based one. I think that some
users who use timestamp-based rsync (i.e., the default mode) for backups would
be satisfied with a timestamp-based one.
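
For illustration, a timestamp-based check in the spirit of rsync's default
size-and-mtime comparison might look like the sketch below. The function name
and parameters are hypothetical and not part of the patch; the caller is
assumed to know the start time of the previous backup and, optionally, the
size the file had at that point.

```python
import os


def changed_since(path, last_backup_time, last_backup_size=None):
    """Guess whether a file changed since the last backup.

    last_backup_time is a Unix timestamp (seconds). The check is only as
    reliable as the file system's modification times.
    """
    st = os.stat(path)
    # A different size means the file changed, whatever its mtime says.
    if last_backup_size is not None and st.st_size != last_backup_size:
        return True
    # ">=" is deliberately conservative: a file modified in the same
    # instant the backup started is copied again.
    return st.st_mtime >= last_backup_time
```

Unlike the LSN-based check, this never reads the file contents, which is where
the I/O saving comes from.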

> Using LSN has a significant advantage over using checksum, as we can start the full copy as soon as we find a block
> with an LSN greater than the threshold.
> There are two cases: 1) the file is changed, so we can assume that we detect it after reading 50% of the file, then
> we send it taking advantage of file system cache; 2) the file is not changed, so we read it without sending anything.
> It will end up producing an I/O comparable to a normal backup.

Yeah, it might make the situation better than today. But I'm afraid that
many users might get disappointed about that behavior of an incremental
backup after the release...
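
To make the quoted LSN-based scan concrete, here is a rough sketch, not the
patch's implementation. It assumes the default 8192-byte block size and that
each page begins with pd_lsn stored as two native-endian 32-bit integers (high
half first), which applies to relation data files only. The function names
are hypothetical.

```python
import struct

BLCKSZ = 8192  # assumed: default PostgreSQL block size


def page_lsn(page):
    """Decode pd_lsn from the first 8 bytes of a page header."""
    xlogid, xrecoff = struct.unpack_from("=II", page, 0)
    return (xlogid << 32) | xrecoff


def file_changed_since(path, threshold_lsn):
    """Return True at the first block whose LSN exceeds threshold_lsn.

    A changed file stops the scan early; an unchanged file is read to the
    end without anything being sent.
    """
    with open(path, "rb") as f:
        while True:
            page = f.read(BLCKSZ)
            if len(page) < 8:  # end of file (or a truncated tail)
                return False
            if page_lsn(page) > threshold_lsn:
                return True
```

The early return covers case 1 in the quoted text; case 2 is the loop running
to end of file, which is the full read of an unchanged file that I'm
concerned about.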

Regards,

-- 
Fujii Masao


