Re: File based Incremental backup v8 - Mailing list pgsql-hackers

From Marco Nenciarini
Subject Re: File based Incremental backup v8
Date
Msg-id 54F48371.7050107@2ndquadrant.it
In response to Re: File based Incremental backup v8  (Fujii Masao <masao.fujii@gmail.com>)
Responses Re: File based Incremental backup v8
List pgsql-hackers
On 02/03/15 14:21, Fujii Masao wrote:
> On Thu, Feb 12, 2015 at 10:50 PM, Marco Nenciarini
> <marco.nenciarini@2ndquadrant.it> wrote:
>> Hi,
>>
>> I've attached an updated version of the patch.
>
> basebackup.c:1565: warning: format '%lld' expects type 'long long
> int', but argument 8 has type '__off_t'
> basebackup.c:1565: warning: format '%lld' expects type 'long long
> int', but argument 8 has type '__off_t'
> pg_basebackup.c:865: warning: ISO C90 forbids mixed declarations and code
>

I'll add an explicit cast at those two lines.

> When I applied three patches and compiled the code, I got the above warnings.
>
> How can we get the full backup that we can use for the archive recovery, from
> the first full backup and subsequent incremental backups? What commands should
> we use for that, for example? It's better to document that.
>

I've sent a Python PoC that supports the plain format only (not the tar one).
I'm currently rewriting it in C (with tar support as well) and I'll send a new patch containing it ASAP.

> What does "1" of the heading line in backup_profile mean?
>

Nothing special, it's just a version number. If you think it's misleading, I will remove it.

> Sorry if this has been already discussed so far. Why is a backup profile file
> necessary? Maybe it's necessary in the future, but currently seems not.

It's necessary because it's the only way to detect deleted files.
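
To illustrate the point, here is a minimal sketch (not the patch's code). It assumes the profile lists every file that existed at backup time, whether or not it was sent; the function name and the file lists are hypothetical. A file missing from the incremental backup directory could be either unchanged or deleted, and only the profile tells the two apart.

```python
def deleted_since_previous(previous_files, profile_files):
    """Return paths that were in the previous backup but are absent
    from the new backup's profile, i.e. files deleted in between."""
    return sorted(set(previous_files) - set(profile_files))


# Hypothetical file lists, for illustration only.
previous = ["base/1/1234", "base/1/5678", "global/pg_control"]
profile = ["base/1/1234", "global/pg_control"]

deleted = deleted_since_previous(previous, profile)
print(deleted)
```

A restore tool could use such a list to drop those files when it rebuilds the full backup.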

> Several infos like LSN, modification time, size, etc are tracked in a backup
> profile file for every backup files, but they are not used for now. If it's now
> not required, I'm inclined to remove it to simplify the code.

I've put the LSN there mainly for debugging purposes, but it can also be used to check the file during pg_restorebackup
execution. The sent field is probably redundant (if sent = False and the LSN is not set, we should probably simply avoid
writing a line about that file) and I'll remove it in the next patch.

>
> We've really gotten the consensus about the current design, especially that
> every files basically need to be read to check whether they have been modified
> since last backup even when *no* modification happens since last backup?

The real problem here is that there is currently no way to detect that a file has not changed since the last backup. We
agreed not to use file system timestamps, as they are not reliable for that purpose.
Using the LSN has a significant advantage over using a checksum, as we can start the full copy as soon as we find a block
with an LSN greater than the threshold.
There are two cases: 1) the file has changed, so we can assume that we detect it after reading 50% of the file on average, then we
send it, taking advantage of the file system cache; 2) the file has not changed, so we read it without sending anything.
Either way, it will end up producing an amount of I/O comparable to a normal backup.
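
The early-exit scan described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the patch: it assumes the default 8 kB block size, that the file is read on the same architecture that wrote it (native byte order), and that the page LSN is stored in the first 8 bytes of each page as two 32-bit words (high, then low). The names file_changed_since and make_page are hypothetical.

```python
import os
import struct
import tempfile

BLOCK_SIZE = 8192  # assumption: default PostgreSQL block size


def page_lsn(page):
    """Decode the LSN from the first 8 bytes of a page: two unsigned
    32-bit words (high part, low part) in native byte order."""
    high, low = struct.unpack_from("=II", page, 0)
    return (high << 32) | low


def file_changed_since(path, threshold_lsn):
    """Return True as soon as one block has an LSN above the threshold;
    return False only after the whole file has been read."""
    with open(path, "rb") as f:
        while True:
            page = f.read(BLOCK_SIZE)
            if len(page) < 8:  # end of file (or a truncated tail)
                return False
            if page_lsn(page) > threshold_lsn:
                return True  # stop scanning; the caller can send the file


def make_page(lsn):
    """Build a fake page carrying only an LSN (demo helper)."""
    header = struct.pack("=II", lsn >> 32, lsn & 0xFFFFFFFF)
    return header + b"\0" * (BLOCK_SIZE - len(header))


# Demo on a two-block fake relation file with LSNs 100 and 500.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(make_page(100) + make_page(500))
    demo_path = tmp.name

changed = file_changed_since(demo_path, 200)    # second block is newer
unchanged = file_changed_since(demo_path, 500)  # no block is newer
os.unlink(demo_path)
print(changed, unchanged)
```

A checksum would require reading the whole file before any decision could be made, while here a changed file is detected at the first newer block.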

Regards,
Marco

--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciarini@2ndQuadrant.it | www.2ndQuadrant.it

