Thread: Proposal: Incremental Backup

Proposal: Incremental Backup

From
Marco Nenciarini
Date:
0. Introduction:
=================================
This is a proposal for adding incremental backup support to the streaming
replication protocol and hence to the pg_basebackup command.

1. Proposal
=================================
Our proposal is to introduce the concept of a backup profile. The backup
profile consists of a file with one line per file, detailing tablespace,
path, modification time, size and checksum.
Using that file, the BASE_BACKUP command can decide which files need to
be sent again and which are unchanged. The algorithm would be very
similar to rsync, but since our files are never bigger than 1 GB each,
that is probably granular enough not to worry about copying parts of
files, just whole files.
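
To make the profile idea concrete, here is a rough sketch in Python of
how such a file could be produced. It is purely illustrative: the
proposal does not fix the on-disk format, and the field order, separator
and checksum algorithm below are assumptions, not part of the design.

    import hashlib
    import os

    def build_backup_profile(pgdata, tablespace="pg_default"):
        """Emit one line per file: tablespace, path, mtime, size, checksum."""
        lines = []
        for root, _dirs, files in os.walk(pgdata):
            for name in sorted(files):
                path = os.path.join(root, name)
                st = os.stat(path)
                sha1 = hashlib.sha1()
                with open(path, "rb") as f:
                    for chunk in iter(lambda: f.read(65536), b""):
                        sha1.update(chunk)
                lines.append("\t".join([tablespace,
                                        os.path.relpath(path, pgdata),
                                        str(int(st.st_mtime)),
                                        str(st.st_size),
                                        sha1.hexdigest()]))
        return "\n".join(lines) + "\n"

On the next BASE_BACKUP the server would recompute the same fields and
ship only the files whose entry differs from the previous profile.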

This way of operating also has some advantages over using rsync to take
a physical backup: it does not require the files from the previous
backup to be checksummed again, and they could even reside on some form
of long-term, not-directly-accessible storage, such as a tape cartridge
or somewhere in the cloud (e.g. Amazon S3 or Amazon Glacier).

It could also be used in 'refresh' mode, by allowing the pg_basebackup
command to 'refresh' an old backup directory with a new backup.

The final piece of this architecture is a new program called
pg_restorebackup, which is able to operate on a "chain of incremental
backups", allowing the user to build a usable PGDATA from them or to
execute maintenance operations such as verifying the checksums or
estimating the final size of the recovered PGDATA.

We created a wiki page with all implementation details at
https://wiki.postgresql.org/wiki/Incremental_backup

2. Goals
=================================
The main goal of incremental backup is to reduce the size of the backup.
A secondary goal is to reduce backup time as well.

3. Development plan
=================================
Our proposed development plan is articulated in four phases:

Phase 1: Add 'PROFILE' option to 'BASE_BACKUP'
Phase 2: Add 'INCREMENTAL' option to 'BASE_BACKUP'
Phase 3: Support of PROFILE and INCREMENTAL for pg_basebackup
Phase 4: pg_restorebackup

We would like to reach consensus on our design here before starting to
implement it.

Regards,
Marco

--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciarini@2ndQuadrant.it | www.2ndQuadrant.it


Re: Proposal: Incremental Backup

From
Michael Paquier
Date:
On Fri, Jul 25, 2014 at 10:14 PM, Marco Nenciarini
<marco.nenciarini@2ndquadrant.it> wrote:
> 0. Introduction:
> =================================
> This is a proposal for adding incremental backup support to streaming
> protocol and hence to pg_basebackup command.
Not sure that incremental is the right word, as the existing backup
methods using WAL archives are already like that. I recall others
calling that differential backup in some previous threads. Would
that sound better?

> 1. Proposal
> =================================
> Our proposal is to introduce the concept of a backup profile.
Sounds good. Thanks for looking at that.

> The backup
> profile consists of a file with one line per file detailing tablespace,
> path, modification time, size and checksum.
> Using that file the BASE_BACKUP command can decide which file needs to
> be sent again and which is not changed. The algorithm should be very
> similar to rsync, but since our files are never bigger than 1 GB per
> file that is probably granular enough not to worry about copying parts
> of files, just whole files.
There are actually two levels of differential backups: file-level,
which is the approach you are taking, and block-level. A block-level
backup requires a scan of all the blocks of all the relations, taking
only the data from the blocks newer than the LSN given to the
BASE_BACKUP command. In the case of the file-level approach, you could
back up a relation file as soon as you find at least one modified
block. Btw, the size of relation files depends on the segment size
defined by --with-segsize when running configure; 1GB is the default,
though, and the value usually used. Differential backups can reduce
the overall backup size depending on the application, at the cost of
some CPU to analyze which relation blocks need to be included in the
backup.
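
For illustration, the block-level check described here amounts to
comparing each page's LSN against a threshold. A minimal sketch,
assuming the standard 8kB block size, a little-endian platform, and
that each page begins with the two 32-bit halves of pd_lsn (no handling
of zeroed or torn pages):

    import struct

    BLCKSZ = 8192

    def blocks_newer_than(segment_path, threshold_lsn):
        """Yield block numbers whose page LSN is newer than threshold_lsn.

        threshold_lsn is the 64-bit form of an LSN such as 0/2000060,
        i.e. (xlogid << 32) | xrecoff.
        """
        with open(segment_path, "rb") as f:
            blkno = 0
            while True:
                page = f.read(BLCKSZ)
                if len(page) < BLCKSZ:
                    break
                # pd_lsn is the first field of the page header.
                xlogid, xrecoff = struct.unpack_from("<II", page, 0)
                if ((xlogid << 32) | xrecoff) > threshold_lsn:
                    yield blkno
                blkno += 1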

> It could also be used in 'refresh' mode, by allowing the pg_basebackup
> command to 'refresh' an old backup directory with a new backup.
I am not sure this is really helpful...

> The final piece of this architecture is a new program called
> pg_restorebackup which is able to operate on a "chain of incremental
> backups", allowing the user to build an usable PGDATA from them or
> executing maintenance operations like verify the checksums or estimate
> the final size of recovered PGDATA.
Yes, right. Taking a differential backup is not difficult, but
rebuilding a consistent base backup from a full backup and a set
of differential ones is the tricky part: you need to be sure that
all the pieces of the puzzle are there.

> We created a wiki page with all implementation details at
> https://wiki.postgresql.org/wiki/Incremental_backup
I had a look at that, and I think that you are off the mark on the way
differential backups should be taken. What would be necessary is to
pass a WAL position (or LSN, log sequence number, like 0/2000060) with
a new clause called DIFFERENTIAL (INCREMENTAL in your first proposal)
in the BASE_BACKUP command, and then have the server report back to the
client all the files that contain blocks newer than the given LSN for a
file-level backup, or only the blocks newer than the given LSN for a
block-level differential backup.
Note that we would also need a way to identify the type of the backup
taken in backup_label, together with the LSN position sent with the
DIFFERENTIAL clause of BASE_BACKUP, by adding a new field to it.

When taking a differential backup, the necessary LSN position would
simply be the value of START WAL LOCATION of the last differential or
full backup taken. This would also result in a new option for
pg_basebackup of the type --differential='0/2000060' to take a
differential backup directly.
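
As an illustration of that workflow, a client could pull the threshold
out of the previous backup's backup_label and pass it along. Note that
both the DIFFERENTIAL clause and the --differential option shown below
are only being proposed in this thread; neither exists today:

    def start_wal_location(backup_label_path):
        """Return the START WAL LOCATION LSN from an existing backup_label."""
        with open(backup_label_path) as f:
            for line in f:
                # e.g. "START WAL LOCATION: 0/2000060 (file 000000010000000000000002)"
                if line.startswith("START WAL LOCATION:"):
                    return line.split()[3]
        raise ValueError("no START WAL LOCATION line found")

    # Hypothetical usage against a previous full backup:
    lsn = start_wal_location("/backups/full/backup_label")
    replication_command = "BASE_BACKUP LABEL 'incr1' DIFFERENTIAL '%s'" % lsn
    pg_basebackup_cmdline = ["pg_basebackup", "-D", "/backups/incr1",
                             "--differential=%s" % lsn]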

Then, for the pg_restorebackup utility, what you would need to do is
simply pass a list of backups to it, have it validate whether they can
build a consistent backup, and then build it.

Btw, the file-based method would be simpler to implement, especially
for rebuilding the backups.

Regards,
-- 
Michael



Re: Proposal: Incremental Backup

From
Claudio Freire
Date:
On Fri, Jul 25, 2014 at 10:14 AM, Marco Nenciarini
<marco.nenciarini@2ndquadrant.it> wrote:
> 1. Proposal
> =================================
> Our proposal is to introduce the concept of a backup profile. The backup
> profile consists of a file with one line per file detailing tablespace,
> path, modification time, size and checksum.
> Using that file the BASE_BACKUP command can decide which file needs to
> be sent again and which is not changed. The algorithm should be very
> similar to rsync, but since our files are never bigger than 1 GB per
> file that is probably granular enough not to worry about copying parts
> of files, just whole files.

That wouldn't be nearly as useful as the LSN-based approach mentioned before.

I've had my share of rsyncing live databases (when resizing
filesystems, not for backup, but the anecdotal evidence applies
anyhow) and with moderately write-heavy databases, even if you only
modify a tiny portion of the records, you end up modifying a huge
portion of the segments, because the free space choice is random.

There have been patches going around to change the random nature of
that choice, but none are very likely to make a huge difference for
this application. In essence, file-level comparisons get you only a
mild speed-up, and are not worth the effort.

I'd go for the hybrid file+LSN method, or nothing. The hybrid avoids
the I/O of inspecting the LSNs of entire segments (a necessary
optimization for huge multi-TB databases) and backs up only the
portions modified when segments do contain changes, so it's the best
of both worlds. Any partial implementation would either require lots
of I/O (LSN only) or save very little (file only), unless it's an
almost read-only database.



Re: Proposal: Incremental Backup

From
Robert Haas
Date:
On Fri, Jul 25, 2014 at 2:21 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Fri, Jul 25, 2014 at 10:14 AM, Marco Nenciarini
> <marco.nenciarini@2ndquadrant.it> wrote:
>> 1. Proposal
>> =================================
>> Our proposal is to introduce the concept of a backup profile. The backup
>> profile consists of a file with one line per file detailing tablespace,
>> path, modification time, size and checksum.
>> Using that file the BASE_BACKUP command can decide which file needs to
>> be sent again and which is not changed. The algorithm should be very
>> similar to rsync, but since our files are never bigger than 1 GB per
>> file that is probably granular enough not to worry about copying parts
>> of files, just whole files.
>
> That wouldn't nearly as useful as the LSN-based approach mentioned before.
>
> I've had my share of rsyncing live databases (when resizing
> filesystems, not for backup, but the anecdotal evidence applies
> anyhow) and with moderately write-heavy databases, even if you only
> modify a tiny portion of the records, you end up modifying a huge
> portion of the segments, because the free space choice is random.
>
> There have been patches going around to change the random nature of
> that choice, but none are very likely to make a huge difference for
> this application. In essence, file-level comparisons get you only a
> mild speed-up, and are not worth the effort.
>
> I'd go for the hybrid file+lsn method, or nothing. The hybrid avoids
> the I/O of inspecting the LSN of entire segments (necessary
> optimization for huge multi-TB databases) and backups only the
> portions modified when segments do contain changes, so it's the best
> of both worlds. Any partial implementation would either require lots
> of I/O (LSN only) or save very little (file only) unless it's an
> almost read-only database.

I agree with much of that.  However, I'd question whether we can
really seriously expect to rely on file modification times for
critical data-integrity operations.  I wouldn't like it if somebody
ran ntpdate to fix the time while the base backup was running, and it
set the time backward, and the next differential backup consequently
omitted some blocks that had been modified during the base backup.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Proposal: Incremental Backup

From
Claudio Freire
Date:
On Fri, Jul 25, 2014 at 3:44 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jul 25, 2014 at 2:21 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
>> On Fri, Jul 25, 2014 at 10:14 AM, Marco Nenciarini
>> <marco.nenciarini@2ndquadrant.it> wrote:
>>> 1. Proposal
>>> =================================
>>> Our proposal is to introduce the concept of a backup profile. The backup
>>> profile consists of a file with one line per file detailing tablespace,
>>> path, modification time, size and checksum.
>>> Using that file the BASE_BACKUP command can decide which file needs to
>>> be sent again and which is not changed. The algorithm should be very
>>> similar to rsync, but since our files are never bigger than 1 GB per
>>> file that is probably granular enough not to worry about copying parts
>>> of files, just whole files.
>>
>> That wouldn't nearly as useful as the LSN-based approach mentioned before.
>>
>> I've had my share of rsyncing live databases (when resizing
>> filesystems, not for backup, but the anecdotal evidence applies
>> anyhow) and with moderately write-heavy databases, even if you only
>> modify a tiny portion of the records, you end up modifying a huge
>> portion of the segments, because the free space choice is random.
>>
>> There have been patches going around to change the random nature of
>> that choice, but none are very likely to make a huge difference for
>> this application. In essence, file-level comparisons get you only a
>> mild speed-up, and are not worth the effort.
>>
>> I'd go for the hybrid file+lsn method, or nothing. The hybrid avoids
>> the I/O of inspecting the LSN of entire segments (necessary
>> optimization for huge multi-TB databases) and backups only the
>> portions modified when segments do contain changes, so it's the best
>> of both worlds. Any partial implementation would either require lots
>> of I/O (LSN only) or save very little (file only) unless it's an
>> almost read-only database.
>
> I agree with much of that.  However, I'd question whether we can
> really seriously expect to rely on file modification times for
> critical data-integrity operations.  I wouldn't like it if somebody
> ran ntpdate to fix the time while the base backup was running, and it
> set the time backward, and the next differential backup consequently
> omitted some blocks that had been modified during the base backup.

I was thinking the same. But that timestamp could be saved in the file
itself, or in some other catalog, like "trusted metadata" implemented
by pg itself, and it could really be an LSN range instead of a
timestamp.



Re: Proposal: Incremental Backup

From
Josh Berkus
Date:
On 07/25/2014 11:49 AM, Claudio Freire wrote:
>> I agree with much of that.  However, I'd question whether we can
>> > really seriously expect to rely on file modification times for
>> > critical data-integrity operations.  I wouldn't like it if somebody
>> > ran ntpdate to fix the time while the base backup was running, and it
>> > set the time backward, and the next differential backup consequently
>> > omitted some blocks that had been modified during the base backup.
> I was thinking the same. But that timestamp could be saved on the file
> itself, or some other catalog, like a "trusted metadata" implemented
> by pg itself, and it could be an LSN range instead of a timestamp
> really.

What about requiring checksums to be on instead, and checking the
file-level checksums?   Hmmm, wait, do we have file-level checksums?  Or
just page-level?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Proposal: Incremental Backup

From
Claudio Freire
Date:
On Fri, Jul 25, 2014 at 7:38 PM, Josh Berkus <josh@agliodbs.com> wrote:
> On 07/25/2014 11:49 AM, Claudio Freire wrote:
>>> I agree with much of that.  However, I'd question whether we can
>>> > really seriously expect to rely on file modification times for
>>> > critical data-integrity operations.  I wouldn't like it if somebody
>>> > ran ntpdate to fix the time while the base backup was running, and it
>>> > set the time backward, and the next differential backup consequently
>>> > omitted some blocks that had been modified during the base backup.
>> I was thinking the same. But that timestamp could be saved on the file
>> itself, or some other catalog, like a "trusted metadata" implemented
>> by pg itself, and it could be an LSN range instead of a timestamp
>> really.
>
> What about requiring checksums to be on instead, and checking the
> file-level checksums?   Hmmm, wait, do we have file-level checksums?  Or
> just page-level?

It would be very computationally expensive to have up-to-date
file-level checksums, so I highly doubt it.



Re: Proposal: Incremental Backup

From
Marco Nenciarini
Date:
On 25/07/14 16:15, Michael Paquier wrote:
> On Fri, Jul 25, 2014 at 10:14 PM, Marco Nenciarini
> <marco.nenciarini@2ndquadrant.it> wrote:
>> 0. Introduction:
>> =================================
>> This is a proposal for adding incremental backup support to streaming
>> protocol and hence to pg_basebackup command.
> Not sure that incremental is a right word as the existing backup
> methods using WAL archives are already like that. I recall others
> calling that differential backup from some previous threads. Would
> that sound better?
>

"differential backup" is widely used to refer to a backup that is always
based on a "full backup". An "incremental backup" can be based either on
a "full backup" or on a previous "incremental backup". We picked that
name to emphasize this property.

>> 1. Proposal
>> =================================
>> Our proposal is to introduce the concept of a backup profile.
> Sounds good. Thanks for looking at that.
>
>> The backup
>> profile consists of a file with one line per file detailing tablespace,
>> path, modification time, size and checksum.
>> Using that file the BASE_BACKUP command can decide which file needs to
>> be sent again and which is not changed. The algorithm should be very
>> similar to rsync, but since our files are never bigger than 1 GB per
>> file that is probably granular enough not to worry about copying parts
>> of files, just whole files.
> There are actually two levels of differential backups: file-level,
> which is the approach you are taking, and block level. Block level
> backup makes necessary a scan of all the blocks of all the relations
> and take only the data from the blocks newer than the LSN given by the
> BASE_BACKUP command. In the case of file-level approach, you could
> already backup the relation file after finding at least one block
> already modified.

I like the idea of short-circuiting the checksum when you find a block
with an LSN newer than the previous backup's START WAL LOCATION; however,
I see it as a further optimization. In any case, it is worth storing the
backup start LSN in the header section of the backup_profile, together
with other useful information about the backup starting position.

As a first step we would have a simple and robust method to produce a
file-level incremental backup.

> Btw, the size of relation files depends on the size
> defined by --with-segsize when running configure. 1GB is the default
> though, and the value usually used. Differential backups can reduce
> the size of overall backups depending on the application, at the cost
> of some CPU to analyze the relation blocks that need to be included in
> the backup.

We tested the idea on several multi-terabyte installations using a
custom deduplication script which follows this approach. The result is
that it can reduce the backup size by more than 50%. Most databases in
the 50GB - 1TB range can also take big advantage of it.

>
>> It could also be used in 'refresh' mode, by allowing the pg_basebackup
>> command to 'refresh' an old backup directory with a new backup.
> I am not sure this is really helpful...

Could you please elaborate on that last sentence?

>
>> The final piece of this architecture is a new program called
>> pg_restorebackup which is able to operate on a "chain of incremental
>> backups", allowing the user to build an usable PGDATA from them or
>> executing maintenance operations like verify the checksums or estimate
>> the final size of recovered PGDATA.
> Yes, right. Taking a differential backup is not difficult, but
> rebuilding a constant base backup with a full based backup and a set
> of differential ones is the tricky part, but you need to be sure that
> all the pieces of the puzzle are here.

If we limit it to being file-based, the recovery procedure is conceptually
simple: read every involved manifest from the start and take the latest
available version of each file (or mark it for deletion, if the last place
it is named is in a backup_exceptions file). Keeping the algorithm as
simple as possible is, in our opinion, the best way to go.
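
A minimal sketch of that recovery procedure follows, assuming each
backup directory contains its backup_profile manifest, the files it
actually shipped, and an optional backup_exceptions list of removed
files; the file names and layout here are illustrative, not part of the
proposal:

    import os
    import shutil

    def rebuild_pgdata(backup_chain, target):
        """Rebuild a usable data directory from an ordered chain of backups,
        full backup first, newest incremental last."""
        latest = {}   # relative path -> backup dir holding its newest copy
        for backup_dir in backup_chain:
            with open(os.path.join(backup_dir, "backup_profile")) as profile:
                for line in profile:
                    _tbs, relpath, _mtime, _size, _checksum = \
                        line.rstrip("\n").split("\t")
                    # An incremental backup only ships changed files, so keep
                    # the newest backup that physically contains each one.
                    if os.path.exists(os.path.join(backup_dir, relpath)):
                        latest[relpath] = backup_dir
            exceptions = os.path.join(backup_dir, "backup_exceptions")
            if os.path.exists(exceptions):
                with open(exceptions) as f:
                    for relpath in f.read().split():
                        latest.pop(relpath, None)   # removed at this point
        for relpath, backup_dir in latest.items():
            dest = os.path.join(target, relpath)
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            shutil.copy2(os.path.join(backup_dir, relpath), dest)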

>
>> We created a wiki page with all implementation details at
>> https://wiki.postgresql.org/wiki/Incremental_backup
> I had a look at that, and I think that you are missing the shot in the
> way differential backups should be taken. What would be necessary is
> to pass a WAL position (or LSN, logical sequence number like
> 0/2000060) with a new clause called DIFFERENTIAL (INCREMENTAL in your
> first proposal) in the BASE BACKUP command, and then have the server
> report back to client all the files that contain blocks newer than the
> given LSN position given for file-level backup, or the blocks newer
> than the given LSN for the block-level differential backup.

In our proposal a file is skipped if, and only if, it has the same
size, the same mtime and *the same checksum* as the original file. We
intentionally want to keep it simple, so that files that are stored in
$PGDATA but don't follow any format known to Postgres are also easily
supported. However, even with more complex algorithms, all the required
information should be stored in the header part of the backup_profile file.

> Note that we would need a way to identify the type of the backup taken
> in backup_label, with the LSN position sent with DIFFERENTIAL clause
> of BASE_BACKUP, by adding a new field in it.

Good point, it definitely has to be reported in the backup_label file.

>
> When taking a differential backup, the LSN position necessary would be
> simply the value of START WAL LOCATION of the last differential or
> full backup taken. This results as well in a new option for
> pg_basebackup of the type --differential='0/2000060' to take directly
> a differential backup.

It's possible to use this approach, but I feel that relying on checksums
is more robust. In any case I'd want to have a file with all the
checksums to be able to validate it later.

>
> Then, for the utility pg_restorebackup, what you would need to do is
> simply to pass a list of backups to it, then validate if they can
> build a consistent backup, and build it.
>
> Btw, the file-based method would be simpler to implement, especially
> for rebuilding the backups.
>
> Regards,
>

Exactly. This is the bare minimum. More options can be added later.

Regards,
Marco

--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciarini@2ndQuadrant.it | www.2ndQuadrant.it


Re: Proposal: Incremental Backup

From
Marco Nenciarini
Date:
On 25/07/14 20:21, Claudio Freire wrote:
> On Fri, Jul 25, 2014 at 10:14 AM, Marco Nenciarini
> <marco.nenciarini@2ndquadrant.it> wrote:
>> 1. Proposal
>> =================================
>> Our proposal is to introduce the concept of a backup profile. The backup
>> profile consists of a file with one line per file detailing tablespace,
>> path, modification time, size and checksum.
>> Using that file the BASE_BACKUP command can decide which file needs to
>> be sent again and which is not changed. The algorithm should be very
>> similar to rsync, but since our files are never bigger than 1 GB per
>> file that is probably granular enough not to worry about copying parts
>> of files, just whole files.
>
> That wouldn't nearly as useful as the LSN-based approach mentioned before.
>
> I've had my share of rsyncing live databases (when resizing
> filesystems, not for backup, but the anecdotal evidence applies
> anyhow) and with moderately write-heavy databases, even if you only
> modify a tiny portion of the records, you end up modifying a huge
> portion of the segments, because the free space choice is random.
>
> There have been patches going around to change the random nature of
> that choice, but none are very likely to make a huge difference for
> this application. In essence, file-level comparisons get you only a
> mild speed-up, and are not worth the effort.
>
> I'd go for the hybrid file+lsn method, or nothing. The hybrid avoids
> the I/O of inspecting the LSN of entire segments (necessary
> optimization for huge multi-TB databases) and backups only the
> portions modified when segments do contain changes, so it's the best
> of both worlds. Any partial implementation would either require lots
> of I/O (LSN only) or save very little (file only) unless it's an
> almost read-only database.
>

From my experience, if a database is big enough and there is any kind of
historical data in the database, the "file only" approach works well.
Moreover it has the advantage of being simple and easily verifiable.

Regards,
Marco

--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciarini@2ndQuadrant.it | www.2ndQuadrant.it


Re: Proposal: Incremental Backup

From
Claudio Freire
Date:
On Tue, Jul 29, 2014 at 1:24 PM, Marco Nenciarini
<marco.nenciarini@2ndquadrant.it> wrote:
>> On Fri, Jul 25, 2014 at 10:14 AM, Marco Nenciarini
>> <marco.nenciarini@2ndquadrant.it> wrote:
>>> 1. Proposal
>>> =================================
>>> Our proposal is to introduce the concept of a backup profile. The backup
>>> profile consists of a file with one line per file detailing tablespace,
>>> path, modification time, size and checksum.
>>> Using that file the BASE_BACKUP command can decide which file needs to
>>> be sent again and which is not changed. The algorithm should be very
>>> similar to rsync, but since our files are never bigger than 1 GB per
>>> file that is probably granular enough not to worry about copying parts
>>> of files, just whole files.
>>
>> That wouldn't nearly as useful as the LSN-based approach mentioned before.
>>
>> I've had my share of rsyncing live databases (when resizing
>> filesystems, not for backup, but the anecdotal evidence applies
>> anyhow) and with moderately write-heavy databases, even if you only
>> modify a tiny portion of the records, you end up modifying a huge
>> portion of the segments, because the free space choice is random.
>>
>> There have been patches going around to change the random nature of
>> that choice, but none are very likely to make a huge difference for
>> this application. In essence, file-level comparisons get you only a
>> mild speed-up, and are not worth the effort.
>>
>> I'd go for the hybrid file+lsn method, or nothing. The hybrid avoids
>> the I/O of inspecting the LSN of entire segments (necessary
>> optimization for huge multi-TB databases) and backups only the
>> portions modified when segments do contain changes, so it's the best
>> of both worlds. Any partial implementation would either require lots
>> of I/O (LSN only) or save very little (file only) unless it's an
>> almost read-only database.
>>
>
> From my experience, if a database is big enough and there is any kind of
> historical data in the database, the "file only" approach works well.
> Moreover it has the advantage of being simple and easily verifiable.

I don't see how that would be true if it's not full of read-only or
append-only tables.

Furthermore, even in that case, you need to have the database locked
while performing the file-level backup, and computing all the
checksums means processing the whole thing. That's a huge amount of
time to be locked for multi-TB databases, so how is that good enough?



Re: Proposal: Incremental Backup

From
Marco Nenciarini
Date:
On 25/07/14 20:44, Robert Haas wrote:
> On Fri, Jul 25, 2014 at 2:21 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
>> On Fri, Jul 25, 2014 at 10:14 AM, Marco Nenciarini
>> <marco.nenciarini@2ndquadrant.it> wrote:
>>> 1. Proposal
>>> =================================
>>> Our proposal is to introduce the concept of a backup profile. The backup
>>> profile consists of a file with one line per file detailing tablespace,
>>> path, modification time, size and checksum.
>>> Using that file the BASE_BACKUP command can decide which file needs to
>>> be sent again and which is not changed. The algorithm should be very
>>> similar to rsync, but since our files are never bigger than 1 GB per
>>> file that is probably granular enough not to worry about copying parts
>>> of files, just whole files.
>>
>> That wouldn't nearly as useful as the LSN-based approach mentioned before.
>>
>> I've had my share of rsyncing live databases (when resizing
>> filesystems, not for backup, but the anecdotal evidence applies
>> anyhow) and with moderately write-heavy databases, even if you only
>> modify a tiny portion of the records, you end up modifying a huge
>> portion of the segments, because the free space choice is random.
>>
>> There have been patches going around to change the random nature of
>> that choice, but none are very likely to make a huge difference for
>> this application. In essence, file-level comparisons get you only a
>> mild speed-up, and are not worth the effort.
>>
>> I'd go for the hybrid file+lsn method, or nothing. The hybrid avoids
>> the I/O of inspecting the LSN of entire segments (necessary
>> optimization for huge multi-TB databases) and backups only the
>> portions modified when segments do contain changes, so it's the best
>> of both worlds. Any partial implementation would either require lots
>> of I/O (LSN only) or save very little (file only) unless it's an
>> almost read-only database.
>
> I agree with much of that.  However, I'd question whether we can
> really seriously expect to rely on file modification times for
> critical data-integrity operations.  I wouldn't like it if somebody
> ran ntpdate to fix the time while the base backup was running, and it
> set the time backward, and the next differential backup consequently
> omitted some blocks that had been modified during the base backup.
>

Our proposal doesn't rely on file modification times for data integrity.

We are using the file mtime only as a fast indication that the file has
changed, in which case we transfer it again without computing the checksum.
If timestamp and size match, we rely on *checksums* to decide whether it
has to be sent.
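
That decision rule is small enough to spell out; a sketch follows
(illustrative only, with profile_entry standing for the mtime, size and
checksum recorded for the file in the previous backup's profile):

    import hashlib
    import os

    def must_send(path, profile_entry):
        """Decide whether a file has to be shipped again in the incremental
        backup, given its entry from the previous backup profile."""
        st = os.stat(path)
        if (int(st.st_mtime) != profile_entry["mtime"]
                or st.st_size != profile_entry["size"]):
            # mtime or size changed: ship the file again without spending
            # CPU on a checksum.
            return True
        # mtime and size match: the checksum, not the timestamp, decides.
        sha1 = hashlib.sha1()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                sha1.update(chunk)
        return sha1.hexdigest() != profile_entry["checksum"]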

In "SMART MODE" we would use the file mtime to skip the checksum check
in some cases, but it wouldn't be the default operation mode and it will
have all the necessary warnings attached. However the "SMART MODE" isn't
a core part of our proposal, and can be delayed until we agree on the
safest way to bring it to the end user.

Regards,
Marco

--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciarini@2ndQuadrant.it | www.2ndQuadrant.it


Re: Proposal: Incremental Backup

From
Michael Paquier
Date:
On Wed, Jul 30, 2014 at 1:11 AM, Marco Nenciarini
<marco.nenciarini@2ndquadrant.it> wrote:
> "differential backup" is widely used to refer to a backup that is always
> based on a "full backup". An "incremental backup" can be based either on
> a "full backup" or on a previous "incremental backup". We picked that
> name to emphasize this property.

You can refer to this email:
http://www.postgresql.org/message-id/CABUevExZ-2NH6jxB5sjs_dsS7qbmoF0NOYpEEyayBKbUfKPbqw@mail.gmail.com

> As a first step we would have a simple and robust method to produce a
> file-level incremental backup.
An approach using Postgres internals, which we are sure we can rely
on, is more robust. An LSN is similar to a timestamp in pg internals, as
it refers to the point in time when a block was last modified.

>>> It could also be used in 'refresh' mode, by allowing the pg_basebackup
>>> command to 'refresh' an old backup directory with a new backup.
>> I am not sure this is really helpful...
>
> Could you please elaborate the last sentence?
This overlaps with the features you are proposing with
pg_restorebackup, where a backup is rebuilt. Why implement two
interfaces for the same thing?
-- 
Michael



Re: Proposal: Incremental Backup

From
desmodemone
Date:



2014-07-29 18:35 GMT+02:00 Marco Nenciarini <marco.nenciarini@2ndquadrant.it>:
On 25/07/14 20:44, Robert Haas wrote:
> On Fri, Jul 25, 2014 at 2:21 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
>> On Fri, Jul 25, 2014 at 10:14 AM, Marco Nenciarini
>> <marco.nenciarini@2ndquadrant.it> wrote:
>>> 1. Proposal
>>> =================================
>>> Our proposal is to introduce the concept of a backup profile. The backup
>>> profile consists of a file with one line per file detailing tablespace,
>>> path, modification time, size and checksum.
>>> Using that file the BASE_BACKUP command can decide which file needs to
>>> be sent again and which is not changed. The algorithm should be very
>>> similar to rsync, but since our files are never bigger than 1 GB per
>>> file that is probably granular enough not to worry about copying parts
>>> of files, just whole files.
>>
>> That wouldn't nearly as useful as the LSN-based approach mentioned before.
>>
>> I've had my share of rsyncing live databases (when resizing
>> filesystems, not for backup, but the anecdotal evidence applies
>> anyhow) and with moderately write-heavy databases, even if you only
>> modify a tiny portion of the records, you end up modifying a huge
>> portion of the segments, because the free space choice is random.
>>
>> There have been patches going around to change the random nature of
>> that choice, but none are very likely to make a huge difference for
>> this application. In essence, file-level comparisons get you only a
>> mild speed-up, and are not worth the effort.
>>
>> I'd go for the hybrid file+lsn method, or nothing. The hybrid avoids
>> the I/O of inspecting the LSN of entire segments (necessary
>> optimization for huge multi-TB databases) and backups only the
>> portions modified when segments do contain changes, so it's the best
>> of both worlds. Any partial implementation would either require lots
>> of I/O (LSN only) or save very little (file only) unless it's an
>> almost read-only database.
>
> I agree with much of that.  However, I'd question whether we can
> really seriously expect to rely on file modification times for
> critical data-integrity operations.  I wouldn't like it if somebody
> ran ntpdate to fix the time while the base backup was running, and it
> set the time backward, and the next differential backup consequently
> omitted some blocks that had been modified during the base backup.
>

Our proposal doesn't rely on file modification times for data integrity.

We are using the file mtime only as a fast indication that the file has
changed, and transfer it again without performing the checksum.
If timestamp and size match we rely on *checksums* to decide if it has
to be sent.

In "SMART MODE" we would use the file mtime to skip the checksum check
in some cases, but it wouldn't be the default operation mode and it will
have all the necessary warnings attached. However the "SMART MODE" isn't
a core part of our proposal, and can be delayed until we agree on the
safest way to bring it to the end user.

Regards,
Marco

--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciarini@2ndQuadrant.it | www.2ndQuadrant.it



Hello,
            I think an incremental/differential backup method is very useful;
however, the method has two drawbacks:
1)  In a database, normally, even if the percentage of modified rows is small compared to the total, the probability of changing only some files/tables is small, because rows are normally not ordered inside a table and updates are "random". If some tables are static, they are probably lookup tables or something like a registry, and normally these tables are small.
2)  Every changed file requires reading the whole file each time. So if point 1 is true, you are probably reading a large part of the database and then sending that part, instead of sending only a small part.

In my opinion, to solve these problems we need a different implementation of incremental backup.
I will try to show my idea about it.

I think we need a bitmap map in memory to track the changed "chunks" of the files/tables [by "chunk" I mean a group of X tracked pages, dividing every tracked file into "chunks"], so we could send only the chunks changed since the last incremental backup (which, for the first incremental backup, could be a full one). The map could have one submap for every tracked file, to keep it simple.

So, if we track with one bit a chunk of 8 page blocks (64KB) [a chunk of 8 blocks is only an example], a map of 1Mbit (1Mbit is about 125KB of memory) could track a table with a total size of 64GB. We could probably use a compression algorithm, because the map consists only of 1s and 0s. This is a very simple idea, but it shows that the map does not need too much memory if we track groups of blocks, i.e. "chunks"; obviously the problem is more complex, and probably there are better and more robust solutions.
We probably need some more space in the header of the map to track information about the file, the last backup and so on.

I think the map must be updated by the bgwriter, i.e. when it flushes the dirty buffers; fortunately we don't need this map for the consistency of the database, so we could create and manage it in memory to limit the impact on performance.
The drawback is that if the db crashes or someone shuts it down, the next incremental backup will be full; we could think about flushing the map to disk if PostgreSQL receives a shutdown signal or something similar.
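
A toy model of such a chunk map, purely illustrative (a real
implementation would live in shared memory and be updated from the
buffer-flushing code paths; the chunk size is just the example value
used above):

    BLOCKS_PER_CHUNK = 8            # 8 pages of 8kB = 64kB per tracked chunk
    # With one bit per 64kB chunk, a 1Mbit map (~125kB) covers roughly
    # 1M chunks * 64kB = 64GB of relation data.

    class ChunkMap:
        def __init__(self, total_blocks):
            nchunks = (total_blocks + BLOCKS_PER_CHUNK - 1) // BLOCKS_PER_CHUNK
            self.bits = bytearray((nchunks + 7) // 8)

        def mark_block(self, blkno):
            """Called whenever a dirty buffer for block blkno is flushed."""
            chunk = blkno // BLOCKS_PER_CHUNK
            self.bits[chunk // 8] |= 1 << (chunk % 8)

        def dirty_chunks(self):
            """Chunks the next incremental backup has to read and send."""
            for chunk in range(len(self.bits) * 8):
                if self.bits[chunk // 8] & (1 << (chunk % 8)):
                    yield chunk

        def reset(self):
            """Clear the map once an incremental backup has completed."""
            self.bits = bytearray(len(self.bits))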



In this way we obtain:
1) we read only a small part of the database (the probability of a chunk having changed is lower than that of the whole file having changed)
2) we do not need to calculate checksums, saving CPU
3) we save I/O in reading and writing (we will send only the chunks changed since the last incremental backup)
4) we save network bandwidth
5) we save time during backup: if we read and write less data, we reduce the time needed to do an incremental backup
6) I think the bitmap map in memory will not impact the performance of the bgwriter too much

What do you think about it?

Kind Regards

Mat

Re: Proposal: Incremental Backup

From
Robert Haas
Date:
On Tue, Jul 29, 2014 at 12:35 PM, Marco Nenciarini
<marco.nenciarini@2ndquadrant.it> wrote:
>> I agree with much of that.  However, I'd question whether we can
>> really seriously expect to rely on file modification times for
>> critical data-integrity operations.  I wouldn't like it if somebody
>> ran ntpdate to fix the time while the base backup was running, and it
>> set the time backward, and the next differential backup consequently
>> omitted some blocks that had been modified during the base backup.
>
> Our proposal doesn't rely on file modification times for data integrity.

Good.

> We are using the file mtime only as a fast indication that the file has
> changed, and transfer it again without performing the checksum.
> If timestamp and size match we rely on *checksums* to decide if it has
> to be sent.

So an incremental backup reads every block in the database and
transfers only those that have changed?  (BTW, I'm just asking.
That's OK with me for a first version; we can improve it, shall
we say, incrementally.)

Why checksums (which have an arbitrarily-small chance of indicating a
match that doesn't really exist) rather than LSNs (which have no
chance of making that mistake)?

> In "SMART MODE" we would use the file mtime to skip the checksum check
> in some cases, but it wouldn't be the default operation mode and it will
> have all the necessary warnings attached. However the "SMART MODE" isn't
> a core part of our proposal, and can be delayed until we agree on the
> safest way to bring it to the end user.

That's not a mode I'd feel comfortable calling "smart".  More like
"roulette mode".

IMV, the way to eventually make this efficient is to have a background
process that reads the WAL and figures out which data blocks have been
modified, and tracks that someplace.  Then we can send a precisely
accurate backup without relying on either modification times or
reading the full database.  If Heikki's patch to standardize the way
this kind of information is represented in WAL gets committed, this
should get a lot easier to implement.
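
For illustration, the bookkeeping such a background process would do
could look like the sketch below, assuming some WAL-decoding facility
(hypothetical here, and not shown) hands it the block references touched
by each record:

    from collections import defaultdict

    class ModifiedBlockTracker:
        """Accumulate the blocks touched since a given LSN, as reported by a
        hypothetical WAL-reading background worker."""

        def __init__(self, start_lsn):
            self.start_lsn = start_lsn
            self.last_lsn = start_lsn
            self.blocks = defaultdict(set)   # (dboid, relfilenode) -> {blkno}

        def record(self, lsn, dboid, relfilenode, blkno):
            # Called once per block reference decoded from a WAL record.
            self.blocks[(dboid, relfilenode)].add(blkno)
            self.last_lsn = max(self.last_lsn, lsn)

        def blocks_to_backup(self):
            """What an incremental backup starting at start_lsn needs to
            ship, without rescanning the whole cluster."""
            return {rel: sorted(blks) for rel, blks in self.blocks.items()}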

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Proposal: Incremental Backup

From
Amit Kapila
Date:
On Wed, Jul 30, 2014 at 11:32 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> IMV, the way to eventually make this efficient is to have a background
> process that reads the WAL and figures out which data blocks have been
> modified, and tracks that someplace.

Nice idea; however, I think to make this happen we need to ensure
that WAL doesn't get deleted/overwritten before this process reads
it (maybe by using some existing param or mechanism) and
wal_level has to be archive or higher.

One more thing: what will happen for unlogged tables with such a
mechanism?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Proposal: Incremental Backup

From
Amit Kapila
Date:
On Wed, Jul 30, 2014 at 7:00 PM, desmodemone <desmodemone@gmail.com> wrote:
> Hello,
>             I think it's very useful an incremental/differential backup method, by the way
> the method has two drawbacks:
> 1)  In a database normally, even if the percent of modify rows is small compared to total rows, the probability to change only some files /tables is small, because the rows are normally not ordered inside a tables and the update are "random". If some tables are static, probably they are lookup tables or something like a registry, and  normally these  tables are small .
> 2)  every time a file changed require every time to read all file. So if the point A is true, probably you are reading a large part of the databases and then send that part , instead of sending a small part.
>
> In my opinion to solve these problems we need a different implementation of incremental backup.
> I will try to show my idea about it.
>
> I think we need a bitmap map in memory to track the changed "chunks" of the file/s/table [ for "chunk" I mean an X number of tracked pages , to divide the every  tracked files in "chunks" ], so we could send only the changed blocks  from last incremental backup ( that could be a full for incremental backup ).The map could have one submaps for every tracked files, so it's more simple.
>
> So ,if we track with one bit a chunk of 8 page blocks ( 64KB) [ a chunk of 8 block is only an example]  , If  we use one map of 1Mbit ( 1Mbit are  125KB of memory ) we could track a table with a total size of 64Gb, probably we could use a compression algorithm because the map is done by   1 and 0 . This is a very simple idea, but it shows that the map  does not need too much memory if we track groups of blocks i.e. "chunk", obviously the problem is more complex, and probably there are better and more robust solutions.
> Probably we need  more space for the header of map to track the informations about file and the last backup and so on.
>
> I think the map must be updated by the bgwriter , i.e. when it flushes the dirty buffers,

Not only the bgwriter, but the checkpointer and backends as well, as
those also flush buffers.  Also, there are some writes which are
done outside shared buffers; you need to track those separately.

Another point is that to track the changes due to hint bit modifications,
you need to enable checksums or wal_log_hints, which will lead to
either more CPU or more I/O.

> fortunately  we don't  need this map for consistence of database, so we could create and manage it in memory to limit the impact on performance.
> The drawback is that If the db crashes or someone closes it , the next incremental backup will be full , we could think to flush the map to disk if the PostgreSQL will receive a signal of closing process or something similar.
>
>
>
> In this way we obtain :
> 1) we read only small part of a database ( the probability of a changed chunk are less the the changed of the whole file )
> 2) we do not need to calculate the checksum, saving cpu
> 3) we save i/o in reading and writing ( we will send only the changed block from last incremental backup )
> 4) we save network
> 5) we save time during backup. if we read and write less data, we reduce the time to do an incremental backup.
> 6) I think the bitmap map in memory will not impact too much on the performance of the bgwriter.
>
> What do you think about?

I think this method has 3 drawbacks compared to the proposed
method:
a.  you have to enable either checksums or wal_log_hints, so it will
     incur extra I/O if you enable wal_log_hints
b.  backends also need to update the map, which, though a small
     cost, is still a cost
c.  the map is not crash safe, due to which sometimes a full backup
     is needed.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Proposal: Incremental Backup

From
Michael Paquier
Date:
On Thu, Jul 31, 2014 at 3:00 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> One more thing, what will happen for unlogged tables with such a
> mechanism?
I imagine that you can safely bypass them as they are not accessible
during recovery and will start with empty relation files once recovery
ends. The same applies to temporary relations. Also this bgworker will
need access to the catalogs to look at the relation relkind.
Regards,
-- 
Michael



Re: Proposal: Incremental Backup

From
desmodemone
Date:



2014-07-31 8:26 GMT+02:00 Amit Kapila <amit.kapila16@gmail.com>:
On Wed, Jul 30, 2014 at 7:00 PM, desmodemone <desmodemone@gmail.com> wrote:
> Hello,
>             I think it's very useful an incremental/differential backup method, by the way
> the method has two drawbacks:
> 1)  In a database normally, even if the percent of modify rows is small compared to total rows, the probability to change only some files /tables is small, because the rows are normally not ordered inside a tables and the update are "random". If some tables are static, probably they are lookup tables or something like a registry, and  normally these  tables are small .
> 2)  every time a file changed require every time to read all file. So if the point A is true, probably you are reading a large part of the databases and then send that part , instead of sending a small part.
>
> In my opinion to solve these problems we need a different implementation of incremental backup.
> I will try to show my idea about it.
>
> I think we need a bitmap map in memory to track the changed "chunks" of the file/s/table [ for "chunk" I mean an X number of tracked pages , to divide the every  tracked files in "chunks" ], so we could send only the changed blocks  from last incremental backup ( that could be a full for incremental backup ).The map could have one submaps for every tracked files, so it's more simple.
>
> So ,if we track with one bit a chunk of 8 page blocks ( 64KB) [ a chunk of 8 block is only an example]  , If  we use one map of 1Mbit ( 1Mbit are  125KB of memory ) we could track a table with a total size of 64Gb, probably we could use a compression algorithm because the map is done by   1 and 0 . This is a very simple idea, but it shows that the map  does not need too much memory if we track groups of blocks i.e. "chunk", obviously the problem is more complex, and probably there are better and more robust solutions.
> Probably we need  more space for the header of map to track the informations about file and the last backup and so on.
>
> I think the map must be updated by the bgwriter , i.e. when it flushes the dirty buffers,

Not only bgwriter, but checkpointer and backends as well, as
those also flush buffers.  Also there are some writes which are
done outside shared buffers, you need to track those separately.

Another point is that to track the changes due to hint bit modification,
you need to enable checksums or wal_log_hints which will either
lead to more cpu or I/O.  

> fortunately  we don't  need this map for consistence of database, so we could create and manage it in memory to limit the impact on performance.
> The drawback is that If the db crashes or someone closes it , the next incremental backup will be full , we could think to flush the map to disk if the PostgreSQL will receive a signal of closing process or something similar.
>
>
>
> In this way we obtain :
> 1) we read only small part of a database ( the probability of a changed chunk are less the the changed of the whole file )
> 2) we do not need to calculate the checksum, saving cpu
> 3) we save i/o in reading and writing ( we will send only the changed block from last incremental backup )
> 4) we save network
> 5) we save time during backup. if we read and write less data, we reduce the time to do an incremental backup.
> 6) I think the bitmap map in memory will not impact too much on the performance of the bgwriter.
>
> What do you think about?

I think with this method has 3 drawbacks compare to method
proposed
a.  either enable checksum or wal_log_hints, so it will incur extra
     I/O if you enable wal_log_hints
b.  backends also need to update the map which though a small
     cost, but still ...
c.  map is not crash safe, due to which sometimes full back up
     is needed.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Hi Amit, thank you for your comments.
However, about the drawbacks:
a) It's not clear to me why the method needs checksums enabled. I mean, if the bgwriter or another process flushes a dirty buffer, it only has to signal in the map that the blocks have changed, by updating the value from 0 to 1. It does not need to verify the checksum of the block; we could assume that when a dirty buffer is flushed, the block has changed [or better, in my idea, the chunk of N blocks].
We could think about an advanced setting that verifies the checksum, but I think that would be heavier.
b) Yes, the backends need to update the map, but it's in memory and, as I showed, it could be very small if we use chunks of blocks. If we don't compress the map, I don't think it could be a bottleneck.
c) The map is not crash safe by design, because it is needed only for incremental backups, to track which blocks need to be backed up, not for the consistency or recovery of the whole cluster, so maintaining it is not a heavy cost for the whole cluster. We could think about an option (but it's heavy) to write it to a file at every flush to have a crash-safe map, but I don't think it's that useful. I think it's acceptable, and probably it's even better to force that, to say: "if your db crashes, you need a full backup". It's probably better to do a full backup after a crash anyway; the dba will want to verify whether something went wrong during the crash (corrupted blocks or something else), no?




Kind Regards


Mat

Re: Proposal: Incremental Backup

From
Bruce Momjian
Date:
On Thu, Jul 31, 2014 at 11:30:52AM +0530, Amit Kapila wrote:
> On Wed, Jul 30, 2014 at 11:32 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > IMV, the way to eventually make this efficient is to have a background
> > process that reads the WAL and figures out which data blocks have been
> > modified, and tracks that someplace.
> 
> Nice idea, however I think to make this happen we need to ensure
> that WAL doesn't get deleted/overwritten before this process reads
> it (may be by using some existing param or mechanism) and 
> wal_level has to be archive or more.

Well, you probably are going to have all the WAL files available because
you have not taken an incremental backup yet, and therefore you would
have no PITR backup at all.  Once the incremental backup is done, you
can delete the old WAL files if you don't need fine-grained restore
points.

Robert also suggested reading the block numbers from the WAL as they are
created and not needing them at incremental backup time.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +



Re: Proposal: Incremental Backup

From
Robert Haas
Date:
On Thu, Jul 31, 2014 at 2:00 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Jul 30, 2014 at 11:32 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> IMV, the way to eventually make this efficient is to have a background
>> process that reads the WAL and figures out which data blocks have been
>> modified, and tracks that someplace.
>
> Nice idea, however I think to make this happen we need to ensure
> that WAL doesn't get deleted/overwritten before this process reads
> it (may be by using some existing param or mechanism) and
> wal_level has to be archive or more.

That shouldn't be a problem; logical decoding added a mechanism for
retaining WAL until decoding is done with it, and if it needs to be
extended a bit further, so be it.

> One more thing, what will happen for unlogged tables with such a
> mechanism?

As Michael Paquier points out, it doesn't matter, because that data
will be gone anyway.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Proposal: Incremental Backup

From
Claudio Freire
Date:
On Thu, Jul 31, 2014 at 5:26 AM, desmodemone <desmodemone@gmail.com> wrote:
> b) yes the backends need to update the map, but it's in memory, and as I
> show, could be very small if we you chunk of blocks.If we not compress the
> map, I not think could be a bottleneck.

If it's in memory, it's not crash-safe. For something aimed at
backups, I think crash safety is a requirement. So it's at least one
extra I/O per commit, maybe less if many can be coalesced at
checkpoints, but I wouldn't count on it too much, because worst cases
are easy to come by (sparse enough updates).

I think this could be pegged on WAL replay / checkpoint stuff alone,
so it would be very asynchronous, but not free.



Re: Proposal: Incremental Backup

From
Amit Kapila
Date:
On Thu, Jul 31, 2014 at 1:56 PM, desmodemone <desmodemone@gmail.com> wrote:
>
> Hi Amit, thank you for your comments .
> However , about drawbacks:
> a) It's not clear to me why the method needs checksum enable, I mean, if the bgwriter or another process flushes a dirty buffer, it's only have to signal in the map that the blocks are changed with an update of the value from 0 to 1.They not need to verify the checksum of the block, we could assume that when a dirty buffers is flushed, the block is changed [ or better in my idea, the chunk of N blocks ].
> We could think an advanced setting that verify the checksum, but I think will be heavier.

I was thinking of enabling it for hint bit updates, if any operation
changes the page due to hint bit, then it will not mark the buffer
dirty unless wal_log_hints or checksum is enabled.  Now I think
if we don't want to track page changes due to hint bit updates, then
this will not be required.


> b) yes the backends need to update the map, but it's in memory, and as I show, could be very small if we you chunk of blocks.If we not compress the map, I not think could be a bottleneck.

This map has to reside in shared memory, so how will you
estimate the size of this map during startup? And even if you
have some way to do that, I think you still need to detail
how your chunk scheme will work in case multiple
backends are trying to flush pages which are part of the same chunk.

Also, as I mentioned previously, there are some operations which
are done without use of shared buffers, so you need to think about
how to track the changes done by those operations.

> c) the map is not crash safe by design, because it needs only for incremental backup to track what blocks needs to be backuped, not for consistency or recovery of the whole cluster, so it's not an heavy cost for the whole cluster to maintain it. we could think an option (but it's heavy) to write it at every flush  on file to have crash-safe map, but I not think it's so usefull . I think it's acceptable, and probably it's better to force that, to say: "if your db will crash, you need a fullbackup ",

I am not sure this assumption is right/acceptable; how can
we say that in such a case users will be okay with a full backup?
In general, taking a full backup is a very heavy operation and we should
try to avoid such a situation.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Proposal: Incremental Backup

From
Claudio Freire
Date:
On Fri, Aug 1, 2014 at 12:35 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> c) the map is not crash safe by design, because it needs only for
>> incremental backup to track what blocks needs to be backuped, not for
>> consistency or recovery of the whole cluster, so it's not an heavy cost for
>> the whole cluster to maintain it. we could think an option (but it's heavy)
>> to write it at every flush  on file to have crash-safe map, but I not think
>> it's so usefull . I think it's acceptable, and probably it's better to force
>> that, to say: "if your db will crash, you need a fullbackup ",
>
> I am not sure if your this assumption is right/acceptable, how can
> we say that in such a case users will be okay to have a fullbackup?
> In general, taking fullbackup is very heavy operation and we should
> try to avoid such a situation.


Besides, the one taking the backup (ie: script) may not be aware of
the need to take a full one.

It's a bad design to allow broken backups at all, IMNSHO.



Re: Proposal: Incremental Backup

From
desmodemone
Date:
2014-08-01 18:20 GMT+02:00 Claudio Freire <klaussfreire@gmail.com>:
> On Fri, Aug 1, 2014 at 12:35 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >> c) the map is not crash safe by design, because it needs only for
> >> incremental backup to track what blocks needs to be backuped, not for
> >> consistency or recovery of the whole cluster, so it's not an heavy cost for
> >> the whole cluster to maintain it. we could think an option (but it's heavy)
> >> to write it at every flush  on file to have crash-safe map, but I not think
> >> it's so usefull . I think it's acceptable, and probably it's better to force
> >> that, to say: "if your db will crash, you need a fullbackup ",
> >
> > I am not sure if your this assumption is right/acceptable, how can
> > we say that in such a case users will be okay to have a fullbackup?
> > In general, taking fullbackup is very heavy operation and we should
> > try to avoid such a situation.
>
> Besides, the one taking the backup (ie: script) may not be aware of
> the need to take a full one.
>
> It's a bad design to allow broken backups at all, IMNSHO.

Hi Claudio,
            thanks for your observation.
First: this is the case of a database crash, which is not something that
happens every day or every week; it happens in rare conditions, or at
least that is my experience. If it happens very often, there are
probably other problems.
Second: to avoid the problem of knowing whether the db needs a full
backup to rebuild the map, we could write in the map header the backup
reference (with an id and an LSN reference, for example), so if
someone/something tries to do an incremental backup after a crash, the
map header will not have any full backup listed [because it will be
empty] and will automatically switch to a full one. I think after a
crash it's good practice to do a full backup anyway, to see if there are
problems with the files or the filesystems, but if I am wrong I am happy
to learn :).

Remember that I propose a map in RAM to reduce the impact on
performance, but we could add an option to leave the choice to the user:
if you want a crash-safe map, a map file is also updated at every flush;
if not, the map stays in RAM.

Kind Regards

Mat

Re: Proposal: Incremental Backup

From
Claudio Freire
Date:
On Fri, Aug 1, 2014 at 1:43 PM, desmodemone <desmodemone@gmail.com> wrote:
>
>
>
> 2014-08-01 18:20 GMT+02:00 Claudio Freire <klaussfreire@gmail.com>:
>
>> On Fri, Aug 1, 2014 at 12:35 AM, Amit Kapila <amit.kapila16@gmail.com>
>> wrote:
>> >> c) the map is not crash safe by design, because it needs only for
>> >> incremental backup to track what blocks needs to be backuped, not for
>> >> consistency or recovery of the whole cluster, so it's not an heavy cost
>> >> for
>> >> the whole cluster to maintain it. we could think an option (but it's
>> >> heavy)
>> >> to write it at every flush  on file to have crash-safe map, but I not
>> >> think
>> >> it's so usefull . I think it's acceptable, and probably it's better to
>> >> force
>> >> that, to say: "if your db will crash, you need a fullbackup ",
>> >
>> > I am not sure if your this assumption is right/acceptable, how can
>> > we say that in such a case users will be okay to have a fullbackup?
>> > In general, taking fullbackup is very heavy operation and we should
>> > try to avoid such a situation.
>>
>>
>> Besides, the one taking the backup (ie: script) may not be aware of
>> the need to take a full one.
>>
>> It's a bad design to allow broken backups at all, IMNSHO.
>
>
> Hi Claudio,
>                  thanks for your observation
> First: the case it's after a crash of a database, and it's not something
> happens every day or every week. It's something that happens in rare
> conditions, or almost my experience is so. If it happens very often probably
> there are other problems.

Not so much. In this case, the software design isn't software-crash
safe, it's not that it's not hardware-crash safe.

What I mean, is that an in-memory bitmap will also be out of sync if
you kill -9 (or if one of the backends is killed by the OOM), or if it
runs out of disk space too.

Normally, a simple restart fixes it because pg will do crash recovery
just fine, but now the bitmap is out of sync, and further backups are
broken. It's not a situation I want to face unless there's a huge
reason to go for such design.

If you make it so that the commit includes flipping the bitmap, it can
be done cleverly enough to avoid too much overhead (though it will
have some), and you now have it so that any to-be-touched block is now
part of the backup. You just apply all the bitmap changes in batch
after a checkpoint, before syncing to disk, and before erasing the WAL
segments. Simple, relatively efficient, and far more robust than an
in-memory thing.

Still, it *can* double checkpoint I/O on the worst case, and it's not
an unfathomable case either.
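To make that batching concrete, here is a minimal sketch of the idea
(purely illustrative; the file name, record layout and hook points are
hypothetical, and Python is used only to keep the sketch short):

import os
import struct

MAP_FILE = "pg_blockmap"        # hypothetical on-disk map, appended at each checkpoint
RECORD = struct.Struct("<II")   # (relfilenode, block number), assumed 32-bit each

dirty_since_checkpoint = set()  # filled from the code path that dirties buffers

def note_block_dirty(relfilenode, blockno):
    # Cheap, purely in-memory bookkeeping on the write path.
    dirty_since_checkpoint.add((relfilenode, blockno))

def flush_map_at_checkpoint():
    # Batch-apply the accumulated changes and fsync them *before* the
    # checkpoint completes and old WAL segments become recyclable.
    if not dirty_since_checkpoint:
        return
    with open(MAP_FILE, "ab") as f:
        for relfilenode, blockno in sorted(dirty_since_checkpoint):
            f.write(RECORD.pack(relfilenode, blockno))
        f.flush()
        os.fsync(f.fileno())
    dirty_since_checkpoint.clear()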

> Second: to avoid the problem to know if the db needed to have a full backup
> to rebuild the map we could think to write in the map header the backup
> reference (with an id and LSN reference for example ) so  if the
> someone/something try to do an incremental backup after a crash, the map
> header will not have noone full backup listed [because it will be empty] ,
> and automaticcaly switch to a full one. I think after a crash it's a good
> practice to do a full backup, to see if there are some problems on files or
> on filesystems, but if I am wrong I am happy to know :) .

After a crash I do not do a backup, I do a verification of the data
(VACUUM and some data consistency checks usually), lest you have a
useless backup. The backup goes after that.

But, I'm not DBA guru.

> Remember that I propose a map in ram to reduce the impact on performances,
> but we could create an option to leave the choose to the user, if you want a
> crash safe map, at every flush will be updated also a map file , if not, the
> map will be in ram.

I think the performance impact of a WAL-linked map isn't so big as to
prefer the possibility of broken backups. I wouldn't even allow it.

It's not free, making it crash safe, but it's not that expensive
either. If you want to support incremental backups, you really really
need to make sure those backups are correct and usable, and IMV
anything short of full crash safety will be too fragile for that
purpose. I don't want to be in a position of needing the backup and
finding out it's inconsistent after the fact, and I don't want to
encourage people to set themselves up for that by adding that "faster
but unsafe backups" flag.

I'd do it either safe, or not at all.



Re: Proposal: Incremental Backup

From
Gabriele Bartolini
Date:
Hi guys,
 sorry if I jump in the middle of the conversation. I have been
reading with much interest all that's been said above. However, the
goal of this patch is to give users another possibility while
performing backups. Especially when large databases are in use.
  I really like the proposal of working on a block level incremental
backup feature and the idea of considering LSN. However, I'd suggest
to see block level as a second step and a goal to keep in mind while
working on the first step. I believe that file-level incremental
backup will bring a lot of benefits to our community and users anyway.
 I base this sentence on our daily experience. We have the honour (and
the duty) to manage - probably - some of the largest Postgres
databases in the world. We currently rely on rsync to copy database
pages. Performing a full backup in 2 days instead of 9 days completely
changes disaster recovery policies in a company. Or even 2 hours
instead of 6.

My 2 cents,
Gabriele
--
Gabriele Bartolini - 2ndQuadrant Italia
PostgreSQL Training, Services and Support
gabriele.bartolini@2ndQuadrant.it | www.2ndQuadrant.it


2014-08-01 19:05 GMT+02:00 Claudio Freire <klaussfreire@gmail.com>:
> On Fri, Aug 1, 2014 at 1:43 PM, desmodemone <desmodemone@gmail.com> wrote:
>>
>>
>>
>> 2014-08-01 18:20 GMT+02:00 Claudio Freire <klaussfreire@gmail.com>:
>>
>>> On Fri, Aug 1, 2014 at 12:35 AM, Amit Kapila <amit.kapila16@gmail.com>
>>> wrote:
>>> >> c) the map is not crash safe by design, because it needs only for
>>> >> incremental backup to track what blocks needs to be backuped, not for
>>> >> consistency or recovery of the whole cluster, so it's not an heavy cost
>>> >> for
>>> >> the whole cluster to maintain it. we could think an option (but it's
>>> >> heavy)
>>> >> to write it at every flush  on file to have crash-safe map, but I not
>>> >> think
>>> >> it's so usefull . I think it's acceptable, and probably it's better to
>>> >> force
>>> >> that, to say: "if your db will crash, you need a fullbackup ",
>>> >
>>> > I am not sure if your this assumption is right/acceptable, how can
>>> > we say that in such a case users will be okay to have a fullbackup?
>>> > In general, taking fullbackup is very heavy operation and we should
>>> > try to avoid such a situation.
>>>
>>>
>>> Besides, the one taking the backup (ie: script) may not be aware of
>>> the need to take a full one.
>>>
>>> It's a bad design to allow broken backups at all, IMNSHO.
>>
>>
>> Hi Claudio,
>>                  thanks for your observation
>> First: the case it's after a crash of a database, and it's not something
>> happens every day or every week. It's something that happens in rare
>> conditions, or almost my experience is so. If it happens very often probably
>> there are other problems.
>
> Not so much. In this case, the software design isn't software-crash
> safe, it's not that it's not hardware-crash safe.
>
> What I mean, is that an in-memory bitmap will also be out of sync if
> you kill -9 (or if one of the backends is killed by the OOM), or if it
> runs out of disk space too.
>
> Normally, a simple restart fixes it because pg will do crash recovery
> just fine, but now the bitmap is out of sync, and further backups are
> broken. It's not a situation I want to face unless there's a huge
> reason to go for such design.
>
> If you make it so that the commit includes flipping the bitmap, it can
> be done cleverly enough to avoid too much overhead (though it will
> have some), and you now have it so that any to-be-touched block is now
> part of the backup. You just apply all the bitmap changes in batch
> after a checkpoint, before syncing to disk, and before erasing the WAL
> segments. Simple, relatively efficient, and far more robust than an
> in-memory thing.
>
> Still, it *can* double checkpoint I/O on the worst case, and it's not
> an unfathomable case either.
>
>> Second: to avoid the problem to know if the db needed to have a full backup
>> to rebuild the map we could think to write in the map header the backup
>> reference (with an id and LSN reference for example ) so  if the
>> someone/something try to do an incremental backup after a crash, the map
>> header will not have noone full backup listed [because it will be empty] ,
>> and automaticcaly switch to a full one. I think after a crash it's a good
>> practice to do a full backup, to see if there are some problems on files or
>> on filesystems, but if I am wrong I am happy to know :) .
>
> After a crash I do not do a backup, I do a verification of the data
> (VACUUM and some data consistency checks usually), lest you have a
> useless backup. The backup goes after that.
>
> But, I'm not DBA guru.
>
>> Remember that I propose a map in ram to reduce the impact on performances,
>> but we could create an option to leave the choose to the user, if you want a
>> crash safe map, at every flush will be updated also a map file , if not, the
>> map will be in ram.
>
> I think the performance impact of a WAL-linked map isn't so big as to
> prefer the possibility of broken backups. I wouldn't even allow it.
>
> It's not free, making it crash safe, but it's not that expensive
> either. If you want to support incremental backups, you really really
> need to make sure those backups are correct and usable, and IMV
> anything short of full crash safety will be too fragile for that
> purpose. I don't want to be in a position of needing the backup and
> finding out it's inconsistent after the fact, and I don't want to
> encourage people to set themselves up for that by adding that "faster
> but unsafe backups" flag.
>
> I'd do it either safe, or not at all.
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers



Re: Proposal: Incremental Backup

From
Claudio Freire
Date:
On Mon, Aug 4, 2014 at 5:15 AM, Gabriele Bartolini
<gabriele.bartolini@2ndquadrant.it> wrote:
>    I really like the proposal of working on a block level incremental
> backup feature and the idea of considering LSN. However, I'd suggest
> to see block level as a second step and a goal to keep in mind while
> working on the first step. I believe that file-level incremental
> backup will bring a lot of benefits to our community and users anyway.

Thing is, I don't see how the LSN method is that much harder than an
on-disk bitmap. In-memory bitmap IMO is just a recipe for disaster.

Keeping a last-updated-LSN for each segment (or group of blocks) is
just as easy as keeping a bitmap, and far more flexible and robust.

The complexity and cost of safely keeping the map up-to-date is what's
in question here, but as was pointed out before, there's no really safe
alternative. Neither modification times nor checksums (nor in-memory
bitmaps IMV) are really safe enough for backups, so you really want to
use something like the LSN. It's extra work, but opens up a world of
possibilities.
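
For illustration only, a last-updated-LSN map per group of blocks could
be as small as the following sketch (the granularity, key layout and
names are assumptions, not part of any patch; Python just for brevity):

from collections import defaultdict

BLOCKS_PER_GROUP = 2048   # assumption: one LSN per 16 MB of a relation (2048 x 8 kB blocks)

# (relfilenode, block group) -> highest LSN that touched any block in that group
lsn_map = defaultdict(int)

def record_change(relfilenode, blockno, lsn):
    # Called whenever a block is modified at WAL position `lsn`.
    key = (relfilenode, blockno // BLOCKS_PER_GROUP)
    if lsn > lsn_map[key]:
        lsn_map[key] = lsn

def groups_to_backup(relfilenode, ngroups, reference_lsn):
    # Everything modified after the reference LSN of the previous backup.
    return [g for g in range(ngroups) if lsn_map[(relfilenode, g)] > reference_lsn]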



Re: Proposal: Incremental Backup

From
Gabriele Bartolini
Date:
Hi Claudio,
 I think there has been a misunderstanding. I agree with you (and I
think also Marco) that LSN is definitely a component to consider in
this process. We will come up with an alternate proposal which
considers LSNs either today or tomorrow. ;)

Thanks,
Gabriele
--
Gabriele Bartolini - 2ndQuadrant Italia
PostgreSQL Training, Services and Support
gabriele.bartolini@2ndQuadrant.it | www.2ndQuadrant.it


2014-08-04 20:30 GMT+02:00 Claudio Freire <klaussfreire@gmail.com>:
> On Mon, Aug 4, 2014 at 5:15 AM, Gabriele Bartolini
> <gabriele.bartolini@2ndquadrant.it> wrote:
>>    I really like the proposal of working on a block level incremental
>> backup feature and the idea of considering LSN. However, I'd suggest
>> to see block level as a second step and a goal to keep in mind while
>> working on the first step. I believe that file-level incremental
>> backup will bring a lot of benefits to our community and users anyway.
>
> Thing is, I don't see how the LSN method is that much harder than an
> on-disk bitmap. In-memory bitmap IMO is just a recipe for disaster.
>
> Keeping a last-updated-LSN for each segment (or group of blocks) is
> just as easy as keeping a bitmap, and far more flexible and robust.
>
> The complexity and cost of safely keeping the map up-to-date is what's
> in question here, but as was pointed before, there's no really safe
> alternative. Nor modification times nor checksums (nor in-memory
> bitmaps IMV) are really safe enough for backups, so you really want to
> use something like the LSN. It's extra work, but opens up a world of
> possibilities.



Re: Proposal: Incremental Backup

From
Simon Riggs
Date:
On 4 August 2014 19:30, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Mon, Aug 4, 2014 at 5:15 AM, Gabriele Bartolini
> <gabriele.bartolini@2ndquadrant.it> wrote:
>>    I really like the proposal of working on a block level incremental
>> backup feature and the idea of considering LSN. However, I'd suggest
>> to see block level as a second step and a goal to keep in mind while
>> working on the first step. I believe that file-level incremental
>> backup will bring a lot of benefits to our community and users anyway.
>
> Thing is, I don't see how the LSN method is that much harder than an
> on-disk bitmap. In-memory bitmap IMO is just a recipe for disaster.
>
> Keeping a last-updated-LSN for each segment (or group of blocks) is
> just as easy as keeping a bitmap, and far more flexible and robust.
>
> The complexity and cost of safely keeping the map up-to-date is what's
> in question here, but as was pointed before, there's no really safe
> alternative. Nor modification times nor checksums (nor in-memory
> bitmaps IMV) are really safe enough for backups, so you really want to
> use something like the LSN. It's extra work, but opens up a world of
> possibilities.

OK, some comments on all of this.

* Wikipedia thinks the style of backup envisaged should be called "Incremental"
https://en.wikipedia.org/wiki/Differential_backup

* Base backups are worthless without WAL right up to the *last* LSN
seen during the backup, which is why pg_stop_backup() returns an LSN.
This is the LSN that is the effective viewpoint of the whole base
backup. So if we wish to get all changes since the last backup, we
must re-quote this LSN. (Or put another way - file level LSNs don't
make sense - we just need one LSN for the whole backup).

* When we take an incremental backup we need the WAL from the backup
start LSN through to the backup stop LSN. We do not need the WAL
between the last backup stop LSN and the new incremental start LSN.
That is a huge amount of WAL in many cases and we'd like to avoid
that, I would imagine. (So the space savings aren't just the delta
from the main data files, we should also look at WAL savings).

* For me, file based incremental is a useful and robust feature.
Block-level incremental is possible, but requires either significant
persistent metadata (1 MB per GB file) or access to the original
backup. One important objective here is to make sure we do NOT have to
re-read the last backup when taking the next backup; this helps us to
optimize the storage costs for backups. Plus, block-level recovery
requires us to have a program that correctly re-writes data into the
correct locations in a file, which seems likely to be a slow and bug
ridden process to me. Nice, safe, solid file-level incremental backup
first please. Fancy, bug prone, block-level stuff much later.

* One purpose of this could be to verify the backup. rsync provides a
checksum, pg_basebackup does not. However, checksums are frequently
prohibitively expensive, so perhaps asking for that is impractical and
maybe only a secondary objective.

* If we don't want/have file checksums, then we don't need a profile
file and using just the LSN seems fine. I don't think we should
specify that manually - the correct LSN is written to the backup_label
file in a base backup and we should read it back from there. We should
also write a backup_label file to incremental base backups, then we
can have additional lines saying what the source backups were. So full
base backup backup_labels remain as they are now, but we add one
additional line per increment, so we have the full set of increments,
much like a history file.

Normal backup_label files look like this

START WAL LOCATION: %X/%X
CHECKPOINT LOCATION: %X/%X
BACKUP METHOD: streamed
BACKUP FROM: standby
START TIME: ....
LABEL: foo

so we would have a file that looks like this

START WAL LOCATION: %X/%X
CHECKPOINT LOCATION: %X/%X
BACKUP METHOD: streamed
BACKUP FROM: standby
START TIME: ....
LABEL: foo
INCREMENTAL 1
START WAL LOCATION: %X/%X
CHECKPOINT LOCATION: %X/%X
BACKUP METHOD: streamed
BACKUP FROM: standby
START TIME: ....
LABEL: foo incremental 1
INCREMENTAL 2
START WAL LOCATION: %X/%X
CHECKPOINT LOCATION: %X/%X
BACKUP METHOD: streamed
BACKUP FROM: standby
START TIME: ....
LABEL: foo incremental 2
... etc ...

which we interpret as showing the original base backup, then the first
increment, then the second increment etc.. which allows us to recover
the backups in the correct sequence.
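
Assuming a chained backup_label exactly like the example above,
recovering the restore order is then trivial; a rough sketch of the
parsing (hypothetical helper, not an existing tool; Python for brevity):

def parse_backup_chain(path):
    # Split a chained backup_label into the base backup plus its increments,
    # in the order in which they have to be restored.
    sections, current = [], {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line.startswith("INCREMENTAL"):
                sections.append(current)            # close the previous section
                current = {"INCREMENT": line.split()[1]}
            elif ": " in line:
                key, value = line.split(": ", 1)
                current[key] = value
    sections.append(current)
    return sections                                 # sections[0] is the full base backup

chain = parse_backup_chain("backup_label")
print("base starts at", chain[0]["START WAL LOCATION"])
for inc in chain[1:]:
    print("increment", inc["INCREMENT"], "starts at", inc["START WAL LOCATION"])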

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Proposal: Incremental Backup

From
Claudio Freire
Date:
On Tue, Aug 5, 2014 at 3:23 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 4 August 2014 19:30, Claudio Freire <klaussfreire@gmail.com> wrote:
>> On Mon, Aug 4, 2014 at 5:15 AM, Gabriele Bartolini
>> <gabriele.bartolini@2ndquadrant.it> wrote:
>>>    I really like the proposal of working on a block level incremental
>>> backup feature and the idea of considering LSN. However, I'd suggest
>>> to see block level as a second step and a goal to keep in mind while
>>> working on the first step. I believe that file-level incremental
>>> backup will bring a lot of benefits to our community and users anyway.
>>
>> Thing is, I don't see how the LSN method is that much harder than an
>> on-disk bitmap. In-memory bitmap IMO is just a recipe for disaster.
>>
>> Keeping a last-updated-LSN for each segment (or group of blocks) is
>> just as easy as keeping a bitmap, and far more flexible and robust.
>>
>> The complexity and cost of safely keeping the map up-to-date is what's
>> in question here, but as was pointed before, there's no really safe
>> alternative. Nor modification times nor checksums (nor in-memory
>> bitmaps IMV) are really safe enough for backups, so you really want to
>> use something like the LSN. It's extra work, but opens up a world of
>> possibilities.
>
> OK, some comments on all of this.
>
> * Wikipedia thinks the style of backup envisaged should be called "Incremental"
> https://en.wikipedia.org/wiki/Differential_backup
>
> * Base backups are worthless without WAL right up to the *last* LSN
> seen during the backup, which is why pg_stop_backup() returns an LSN.
> This is the LSN that is the effective viewpoint of the whole base
> backup. So if we wish to get all changes since the last backup, we
> must re-quote this LSN. (Or put another way - file level LSNs don't
> make sense - we just need one LSN for the whole backup).

File-level LSNs are an optimization. When you want to backup all files
modified since the last base or incremental backup (yes, you need the
previous backup label at least), you check the file-level LSN range.
That tells you which "changesets" touched that file, so you know
whether to process it or not.

Block-level LSNs (or, rather, block-segment-level) are just a
refinement of that.

> * When we take an incremental backup we need the WAL from the backup
> start LSN through to the backup stop LSN. We do not need the WAL
> between the last backup stop LSN and the new incremental start LSN.
> That is a huge amount of WAL in many cases and we'd like to avoid
> that, I would imagine. (So the space savings aren't just the delta
> from the main data files, we should also look at WAL savings).

Yes, probably something along the lines of removing redundant FPW and
stuff like that.

> * For me, file based incremental is a useful and robust feature.
> Block-level incremental is possible, but requires either significant
> persistent metadata (1 MB per GB file) or access to the original
> backup. One important objective here is to make sure we do NOT have to
> re-read the last backup when taking the next backup; this helps us to
> optimize the storage costs for backups. Plus, block-level recovery
> requires us to have a program that correctly re-writes data into the
> correct locations in a file, which seems likely to be a slow and bug
> ridden process to me. Nice, safe, solid file-level incremental backup
> first please. Fancy, bug prone, block-level stuff much later.

Ok. You could do incremental first without any kind of optimization,
then file-level optimization by keeping a file-level LSN range, and
then extend that to block-segment-level LSN ranges. That sounds like a
plan to me.

But, I don't see how you'd do the one without optimization without
reading the previous backup for comparing deltas. Remember checksums
are deemed not trustworthy, not just by me, so that (which was the
original proposition) doesn't work.

> * If we don't want/have file checksums, then we don't need a profile
> file and using just the LSN seems fine. I don't think we should
> specify that manually - the correct LSN is written to the backup_label
> file in a base backup and we should read it back from there.

Agreed



Re: Proposal: Incremental Backup

From
Simon Riggs
Date:
On 5 August 2014 22:38, Claudio Freire <klaussfreire@gmail.com> wrote:

>> * When we take an incremental backup we need the WAL from the backup
>> start LSN through to the backup stop LSN. We do not need the WAL
>> between the last backup stop LSN and the new incremental start LSN.
>> That is a huge amount of WAL in many cases and we'd like to avoid
>> that, I would imagine. (So the space savings aren't just the delta
>> from the main data files, we should also look at WAL savings).
>
> Yes, probably something along the lines of removing redundant FPW and
> stuff like that.

Not what I mean at all, sorry for confusing.

Each backup has a start LSN and a stop LSN. You need all the WAL
between those two points (-X option)

But if you have an incremental backup (b2), it depends upon an earlier
backup (b1).

You don't need the WAL between b1.stop_lsn and b2.start_lsn.

In typical cases, start to stop will be a few hours or less, whereas
we'd be doing backups at most daily. Which would mean we'd only need
to store at most 10% of the WAL files because we don't need WAL
between backups.

>> * For me, file based incremental is a useful and robust feature.
>> Block-level incremental is possible, but requires either significant
>> persistent metadata (1 MB per GB file) or access to the original
>> backup. One important objective here is to make sure we do NOT have to
>> re-read the last backup when taking the next backup; this helps us to
>> optimize the storage costs for backups. Plus, block-level recovery
>> requires us to have a program that correctly re-writes data into the
>> correct locations in a file, which seems likely to be a slow and bug
>> ridden process to me. Nice, safe, solid file-level incremental backup
>> first please. Fancy, bug prone, block-level stuff much later.
>
> Ok. You could do incremental first without any kind of optimization,

Yes, that is what makes sense to me. Fast, simple, robust and most of
the benefit.

We should call this INCREMENTAL FILE LEVEL

> then file-level optimization by keeping a file-level LSN range, and
> then extend that to block-segment-level LSN ranges. That sounds like a
> plan to me.

Thinking some more, there seems like this whole store-multiple-LSNs
thing is too much. We can still do block-level incrementals just by
using a single LSN as the reference point. We'd still need a complex
file format and a complex file reconstruction program, so I think that
is still "next release". We can call that INCREMENTAL BLOCK LEVEL

> But, I don't see how you'd do the one without optimization without
> reading the previous backup for comparing deltas. Remember checksums
> are deemed not trustworthy, not just by me, so that (which was the
> original proposition) doesn't work.

Every incremental backup refers to an earlier backup as a reference
point, which may then refer to an earlier one, in a chain.

Each backup has a single LSN associated with it, as stored in the
backup_label. (So we don't need the profile stage now, AFAICS)

To decide whether we need to re-copy the file, you read the file until
we find a block with a later LSN. If we read the whole file without
finding a later LSN then we don't need to re-copy. That means we read
each file twice, which is slower, but the file is at most 1GB in size,
which we can assume will be mostly in memory for the second read.

As Marco says, that can be optimized using filesystem timestamps instead.
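
A rough sketch of that per-file LSN scan (assuming the standard 8 kB
page layout where pd_lsn occupies the first 8 bytes of the page header,
and a little-endian platform; the helper names are made up):

import struct

BLCKSZ = 8192

def page_lsn(page):
    # pd_lsn is stored as two 32-bit words at the very start of the page header.
    xlogid, xrecoff = struct.unpack_from("<II", page, 0)
    return (xlogid << 32) | xrecoff

def file_needs_backup(path, reference_lsn):
    # Re-copy the file as soon as any page carries an LSN newer than the
    # reference; if no page does, the incremental backup can skip the file.
    with open(path, "rb") as f:
        while True:
            page = f.read(BLCKSZ)
            if not page:
                return False
            if len(page) == BLCKSZ and page_lsn(page) > reference_lsn:
                return True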

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Proposal: Incremental Backup

From
Michael Paquier
Date:
On Wed, Aug 6, 2014 at 9:04 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>
> On 5 August 2014 22:38, Claudio Freire <klaussfreire@gmail.com> wrote:
> Thinking some more, there seems like this whole store-multiple-LSNs
> thing is too much. We can still do block-level incrementals just by
> using a single LSN as the reference point. We'd still need a complex
> file format and a complex file reconstruction program, so I think that
> is still "next release". We can call that INCREMENTAL BLOCK LEVEL.

Yes, that's the approach taken by pg_rman for its block-level
incremental backup. Btw, I don't think that the CPU cost of scanning all
the relation files, added to the cost of rebuilding the backups, is worth
it on large instances. File-level backup would cover most of the use
cases that people face, and simplify the footprint on core code. With a
single LSN as the reference position, of course, to determine whether a
file needs to be backed up, i.e. whether it has at least one block that
has been modified with an LSN newer than the reference point.

Regards,
-- 
Michael



Re: Proposal: Incremental Backup

From
Claudio Freire
Date:
On Tue, Aug 5, 2014 at 9:17 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Wed, Aug 6, 2014 at 9:04 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>
>> On 5 August 2014 22:38, Claudio Freire <klaussfreire@gmail.com> wrote:
>> Thinking some more, there seems like this whole store-multiple-LSNs
>> thing is too much. We can still do block-level incrementals just by
>> using a single LSN as the reference point. We'd still need a complex
>> file format and a complex file reconstruction program, so I think that
>> is still "next release". We can call that INCREMENTAL BLOCK LEVEL.
>
> Yes, that's the approach taken by pg_rman for its block-level
> incremental backup. Btw, I don't think that the CPU cost to scan all
> the relation files added to the one to rebuild the backups is worth
> doing it on large instances. File-level backup would cover most of the
> use cases that people face, and simplify footprint on core code. With
> a single LSN as reference position of course to determine if a file
> needs to be backup up of course, if it has at least one block that has
> been modified with a LSN newer than the reference point.


It's the finding of that block that begs optimizing IMO.



Re: Proposal: Incremental Backup

From
Claudio Freire
Date:
On Tue, Aug 5, 2014 at 9:04 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 5 August 2014 22:38, Claudio Freire <klaussfreire@gmail.com> wrote:
>
>>> * When we take an incremental backup we need the WAL from the backup
>>> start LSN through to the backup stop LSN. We do not need the WAL
>>> between the last backup stop LSN and the new incremental start LSN.
>>> That is a huge amount of WAL in many cases and we'd like to avoid
>>> that, I would imagine. (So the space savings aren't just the delta
>>> from the main data files, we should also look at WAL savings).
>>
>> Yes, probably something along the lines of removing redundant FPW and
>> stuff like that.
>
> Not what I mean at all, sorry for confusing.
>
> Each backup has a start LSN and a stop LSN. You need all the WAL
> between those two points (-X option)
>
> But if you have an incremental backup (b2), it depends upon an earlier
> backup (b1).
>
> You don't need the WAL between b1.stop_lsn and b2.start_lsn.
>
> In typical cases, start to stop will be a few hours or less, whereas
> we'd be doing backups at most daily. Which would mean we'd only need
> to store at most 10% of the WAL files because we don't need WAL
> between backups.

I was assuming you wouldn't store that WAL. You might not even have it.



Re: Proposal: Incremental Backup

From
Bruce Momjian
Date:
On Wed, Aug  6, 2014 at 09:17:35AM +0900, Michael Paquier wrote:
> On Wed, Aug 6, 2014 at 9:04 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> >
> > On 5 August 2014 22:38, Claudio Freire <klaussfreire@gmail.com> wrote:
> > Thinking some more, there seems like this whole store-multiple-LSNs
> > thing is too much. We can still do block-level incrementals just by
> > using a single LSN as the reference point. We'd still need a complex
> > file format and a complex file reconstruction program, so I think that
> > is still "next release". We can call that INCREMENTAL BLOCK LEVEL.
> 
> Yes, that's the approach taken by pg_rman for its block-level
> incremental backup. Btw, I don't think that the CPU cost to scan all
> the relation files added to the one to rebuild the backups is worth
> doing it on large instances. File-level backup would cover most of the

Well, if you scan the WAL files from the previous backup, that will tell
you which pages need an incremental backup.

I am thinking we need a wiki page to outline all these options.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +



Re: Proposal: Incremental Backup

From
Simon Riggs
Date:
On 6 August 2014 03:16, Bruce Momjian <bruce@momjian.us> wrote:
> On Wed, Aug  6, 2014 at 09:17:35AM +0900, Michael Paquier wrote:
>> On Wed, Aug 6, 2014 at 9:04 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> >
>> > On 5 August 2014 22:38, Claudio Freire <klaussfreire@gmail.com> wrote:
>> > Thinking some more, there seems like this whole store-multiple-LSNs
>> > thing is too much. We can still do block-level incrementals just by
>> > using a single LSN as the reference point. We'd still need a complex
>> > file format and a complex file reconstruction program, so I think that
>> > is still "next release". We can call that INCREMENTAL BLOCK LEVEL.
>>
>> Yes, that's the approach taken by pg_rman for its block-level
>> incremental backup. Btw, I don't think that the CPU cost to scan all
>> the relation files added to the one to rebuild the backups is worth
>> doing it on large instances. File-level backup would cover most of the
>
> Well, if you scan the WAL files from the previous backup, that will tell
> you what pages that need incremental backup.

That would require you to store that WAL, which is something we hope
to avoid. Plus if you did store it, you'd need to retrieve it from
long term storage, which is what we hope to avoid.

> I am thinking we need a wiki page to outline all these options.

There is a Wiki page.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Proposal: Incremental Backup

From
Bruce Momjian
Date:
On Wed, Aug  6, 2014 at 06:48:55AM +0100, Simon Riggs wrote:
> On 6 August 2014 03:16, Bruce Momjian <bruce@momjian.us> wrote:
> > On Wed, Aug  6, 2014 at 09:17:35AM +0900, Michael Paquier wrote:
> >> On Wed, Aug 6, 2014 at 9:04 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> >> >
> >> > On 5 August 2014 22:38, Claudio Freire <klaussfreire@gmail.com> wrote:
> >> > Thinking some more, there seems like this whole store-multiple-LSNs
> >> > thing is too much. We can still do block-level incrementals just by
> >> > using a single LSN as the reference point. We'd still need a complex
> >> > file format and a complex file reconstruction program, so I think that
> >> > is still "next release". We can call that INCREMENTAL BLOCK LEVEL.
> >>
> >> Yes, that's the approach taken by pg_rman for its block-level
> >> incremental backup. Btw, I don't think that the CPU cost to scan all
> >> the relation files added to the one to rebuild the backups is worth
> >> doing it on large instances. File-level backup would cover most of the
> >
> > Well, if you scan the WAL files from the previous backup, that will tell
> > you what pages that need incremental backup.
> 
> That would require you to store that WAL, which is something we hope
> to avoid. Plus if you did store it, you'd need to retrieve it from
> long term storage, which is what we hope to avoid.

Well, for file-level backups we have:

        1) use file modtime (possibly inaccurate)
        2) use file modtime and checksums (heavy read load)

For block-level backups we have:

        3) accumulate block numbers as WAL is written
        4) read previous WAL at incremental backup time
        5) read data page LSNs (high read load)

The question is which of these do we want to implement?  #1 is very easy
to implement, but incremental _file_ backups are larger than block-level
backups.  If we have #5, would we ever want #2?  If we have #3, would we
ever want #4 or #5?

> > I am thinking we need a wiki page to outline all these options.
> 
> There is a Wiki page.

I would like to see that wiki page have a more open approach to
implementations.

I do think this is a very important topic for us.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +



Re: Proposal: Incremental Backup

From
Claudio Freire
Date:
On Wed, Aug 6, 2014 at 12:20 PM, Bruce Momjian <bruce@momjian.us> wrote:
>
> Well, for file-level backups we have:
>
>         1) use file modtime (possibly inaccurate)
>         2) use file modtime and checksums (heavy read load)
>
> For block-level backups we have:
>
>         3) accumulate block numbers as WAL is written
>         4) read previous WAL at incremental backup time
>         5) read data page LSNs (high read load)
>
> The question is which of these do we want to implement?  #1 is very easy
> to implement, but incremental _file_ backups are larger than block-level
> backups.  If we have #5, would we ever want #2?  If we have #3, would we
> ever want #4 or #5?

You may want to implement both #3 and #2. #3 would need a config
switch to enable updating the bitmap. That would make it optional to
incur the I/O cost of updating the bitmap. When the bitmap isn't
there, the backup would use #2. Slow, but effective. If slowness is a
problem for you, you enable the bitmap and do #3.

Sounds reasonable IMO, and it means you can start by implementing #2.



Re: Proposal: Incremental Backup

From
Bruce Momjian
Date:
On Wed, Aug  6, 2014 at 01:15:32PM -0300, Claudio Freire wrote:
> On Wed, Aug 6, 2014 at 12:20 PM, Bruce Momjian <bruce@momjian.us> wrote:
> >
> > Well, for file-level backups we have:
> >
> >         1) use file modtime (possibly inaccurate)
> >         2) use file modtime and checksums (heavy read load)
> >
> > For block-level backups we have:
> >
> >         3) accumulate block numbers as WAL is written
> >         4) read previous WAL at incremental backup time
> >         5) read data page LSNs (high read load)
> >
> > The question is which of these do we want to implement?  #1 is very easy
> > to implement, but incremental _file_ backups are larger than block-level
> > backups.  If we have #5, would we ever want #2?  If we have #3, would we
> > ever want #4 or #5?
> 
> You may want to implement both #3 and #2. #3 would need a config
> switch to enable updating the bitmap. That would make it optional to
> incur the I/O cost of updating the bitmap. When the bitmap isn't
> there, the backup would use #2. Slow, but effective. If slowness is a
> problem for you, you enable the bitmap and do #3.
> 
> Sounds reasonable IMO, and it means you can start by implementing #2.

Well, Robert Haas had the idea of a separate process that accumulates
the changed WAL block numbers, making it low overhead.  I question
whether we need #2 just to handle cases where they didn't enable #3
accounting earlier.  If that is the case, just do a full backup and
enable #3.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +



Re: Proposal: Incremental Backup

From
Simon Riggs
Date:
On 6 August 2014 17:27, Bruce Momjian <bruce@momjian.us> wrote:
> On Wed, Aug  6, 2014 at 01:15:32PM -0300, Claudio Freire wrote:
>> On Wed, Aug 6, 2014 at 12:20 PM, Bruce Momjian <bruce@momjian.us> wrote:
>> >
>> > Well, for file-level backups we have:
>> >
>> >         1) use file modtime (possibly inaccurate)
>> >         2) use file modtime and checksums (heavy read load)
>> >
>> > For block-level backups we have:
>> >
>> >         3) accumulate block numbers as WAL is written
>> >         4) read previous WAL at incremental backup time
>> >         5) read data page LSNs (high read load)
>> >
>> > The question is which of these do we want to implement?  #1 is very easy
>> > to implement, but incremental _file_ backups are larger than block-level
>> > backups.  If we have #5, would we ever want #2?  If we have #3, would we
>> > ever want #4 or #5?
>>
>> You may want to implement both #3 and #2. #3 would need a config
>> switch to enable updating the bitmap. That would make it optional to
>> incur the I/O cost of updating the bitmap. When the bitmap isn't
>> there, the backup would use #2. Slow, but effective. If slowness is a
>> problem for you, you enable the bitmap and do #3.
>>
>> Sounds reasonable IMO, and it means you can start by implementing #2.
>
> Well, Robert Haas had the idea of a separate process that accumulates
> the changed WAL block numbers, making it low overhead.  I question
> whether we need #2 just to handle cases where they didn't enable #3
> accounting earlier.  If that is the case, just do a full backup and
> enable #3.

Well, there is a huge difference between file-level and block-level backup.

Designing, writing and verifying block-level backup to the point that
it is acceptable is a huge effort. (Plus, I don't think accumulating
block numbers as they are written will be "low overhead". Perhaps
there was a misunderstanding there and what is being suggested is to
accumulate file names that change as they are written, since we
already do that in the checkpointer process, which would be an option
between 2 and 3 on the above list).

What is being proposed here is file-level incremental backup that
works in a general way for various backup management tools. It's the
80/20 first step on the road. We get most of the benefit, it can be
delivered in this release as robust, verifiable code. Plus, that is
all we have budget for, a fairly critical consideration.

Big features need to be designed incrementally across multiple
releases, delivering incremental benefit (or at least that is what I
have learned). Yes, working block-level backup would be wonderful, but
if we hold out for that as the first step then we'll get nothing
anytime soon.

I would also point out that the more specific we make our backup
solution the less likely it is to integrate with external backup
providers. Oracle's RMAN requires specific support in external
software. 10 years after Postgres PITR we still see many vendors
showing "PostgreSQL Backup Supported" as meaning pg_dump only.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Proposal: Incremental Backup

From
Fujii Masao
Date:
On Thu, Aug 7, 2014 at 12:20 AM, Bruce Momjian <bruce@momjian.us> wrote:
> On Wed, Aug  6, 2014 at 06:48:55AM +0100, Simon Riggs wrote:
>> On 6 August 2014 03:16, Bruce Momjian <bruce@momjian.us> wrote:
>> > On Wed, Aug  6, 2014 at 09:17:35AM +0900, Michael Paquier wrote:
>> >> On Wed, Aug 6, 2014 at 9:04 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> >> >
>> >> > On 5 August 2014 22:38, Claudio Freire <klaussfreire@gmail.com> wrote:
>> >> > Thinking some more, there seems like this whole store-multiple-LSNs
>> >> > thing is too much. We can still do block-level incrementals just by
>> >> > using a single LSN as the reference point. We'd still need a complex
>> >> > file format and a complex file reconstruction program, so I think that
>> >> > is still "next release". We can call that INCREMENTAL BLOCK LEVEL.
>> >>
>> >> Yes, that's the approach taken by pg_rman for its block-level
>> >> incremental backup. Btw, I don't think that the CPU cost to scan all
>> >> the relation files added to the one to rebuild the backups is worth
>> >> doing it on large instances. File-level backup would cover most of the
>> >
>> > Well, if you scan the WAL files from the previous backup, that will tell
>> > you what pages that need incremental backup.
>>
>> That would require you to store that WAL, which is something we hope
>> to avoid. Plus if you did store it, you'd need to retrieve it from
>> long term storage, which is what we hope to avoid.
>
> Well, for file-level backups we have:
>
>         1) use file modtime (possibly inaccurate)
>         2) use file modtime and checksums (heavy read load)
>
> For block-level backups we have:
>
>         3) accumulate block numbers as WAL is written
>         4) read previous WAL at incremental backup time
>         5) read data page LSNs (high read load)
>
> The question is which of these do we want to implement?

There are some data files which don't have LSNs, for example postgresql.conf.
When such data has been modified since the last backup, does it also need to
be included in the incremental backup? Probably yes. So implementing only
block-level backup is not a complete solution. It needs file-level backup as
an infrastructure for such data. This makes me think that it's more reasonable
to implement file-level backup first.

Regards,

-- 
Fujii Masao



Re: Proposal: Incremental Backup

From
Michael Paquier
Date:
On Thu, Aug 7, 2014 at 8:11 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> There are some data which don't have LSN, for example, postgresql.conf.
> When such data has been modified since last backup, they also need to
> be included in incremental backup? Probably yes.
Definitely yes. That's as well the case of paths like pg_clog,
pg_subtrans and pg_twophase.
-- 
Michael



Re: Proposal: Incremental Backup

From
Bruce Momjian
Date:
On Thu, Aug  7, 2014 at 08:35:53PM +0900, Michael Paquier wrote:
> On Thu, Aug 7, 2014 at 8:11 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> > There are some data which don't have LSN, for example, postgresql.conf.
> > When such data has been modified since last backup, they also need to
> > be included in incremental backup? Probably yes.
> Definitely yes. That's as well the case of paths like pg_clog,
> pg_subtrans and pg_twophase.

I assumed these would be unconditionally backed up during an incremental
backup because they are relatively small and you don't want to make a mistake.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +



Re: Proposal: Incremental Backup

From
Bruce Momjian
Date:
On Thu, Aug  7, 2014 at 11:03:40AM +0100, Simon Riggs wrote:
> Well, there is a huge difference between file-level and block-level backup.
> 
> Designing, writing and verifying block-level backup to the point that
> it is acceptable is a huge effort. (Plus, I don't think accumulating
> block numbers as they are written will be "low overhead". Perhaps
> there was a misunderstanding there and what is being suggested is to
> accumulate file names that change as they are written, since we
> already do that in the checkpointer process, which would be an option
> between 2 and 3 on the above list).
> 
> What is being proposed here is file-level incremental backup that
> works in a general way for various backup management tools. It's the
> 80/20 first step on the road. We get most of the benefit, it can be
> delivered in this release as robust, verifiable code. Plus, that is
> all we have budget for, a fairly critical consideration.
> 
> Big features need to be designed incrementally across multiple
> releases, delivering incremental benefit (or at least that is what I
> have learned). Yes, working block-level backup would be wonderful, but
> if we hold out for that as the first step then we'll get nothing
> anytime soon.

That is fine.  I just wanted to point out that as features are added,
file-level incremental backups might not be useful.  In fact, I think
there are a lot of users for which file-level incremental backups will
never be useful, i.e. you have to have a lot of frozen/static data for
file-level incremental backups to be useful.  

I am a little worried that many users will not realize this until they
try it and are disappointed, e.g. "Why is PG writing to my static data
so often?" --- then we get beaten up about our hint bits and freezing
behavior.  :-(

I am just trying to set realistic expectations.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +



Re: Proposal: Incremental Backup

From
Marco Nenciarini
Date:
Il 07/08/14 17:29, Bruce Momjian ha scritto:
> I am a little worried that many users will not realize this until they
> try it and are disappointed, e.g. "Why is PG writing to my static data
> so often?" --- then we get beaten up about our hint bits and freezing
> behavior.  :-(
>
> I am just trying to set realistic expectations.
>

Our experience is that for big databases (size over about 50GB) the
file-level approach is often enough to halve the size of the backup.

Users which run Postgres as Data Warehouse surely will benefit from it,
so we could present it as a DWH oriented feature.

Regards,
Marco

--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciarini@2ndQuadrant.it | www.2ndQuadrant.it


Re: Proposal: Incremental Backup

From
Marco Nenciarini
Date:
Il 07/08/14 17:25, Bruce Momjian ha scritto:
> On Thu, Aug  7, 2014 at 08:35:53PM +0900, Michael Paquier wrote:
>> On Thu, Aug 7, 2014 at 8:11 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> There are some data which don't have LSN, for example, postgresql.conf.
>>> When such data has been modified since last backup, they also need to
>>> be included in incremental backup? Probably yes.
>> Definitely yes. That's as well the case of paths like pg_clog,
>> pg_subtrans and pg_twophase.
>
> I assumed these would be unconditionally backed up during an incremental
> backup because they relatively small and you don't want to make a mistake.
>

You could decide to always copy files which don't have LSNs, but you
don't know what the user could put inside PGDATA. I would avoid any
assumption about files which are not owned by Postgres.

With the current full backup procedure they are backed up, so I think
that having them backed up with an rsync-like algorithm is what a user
would expect from an incremental backup.

Regards,
Marco

--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciarini@2ndQuadrant.it | www.2ndQuadrant.it


Re: Proposal: Incremental Backup

From
Gabriele Bartolini
Date:
Hi Marco,

> With the current full backup procedure they are backed up, so I think
> that having them backed up with a rsync-like algorithm is what an user
> would expect for an incremental backup.

Exactly. I think a simple, flexible and robust method for file based
incremental backup is all we need. I am confident it could be done for
9.5.

I would like to quote every single word Simon said. Block level
incremental backup (with Robert's proposal) is definitely the ultimate
goal for effective and efficient physical backups. I see file level
incremental backup as a very good "compromise", a sort of intermediate
release which could nonetheless produce a lot of benefits to our user
base, for years to come too.

Thanks,
Gabriele



Re: Proposal: Incremental Backup

From
Benedikt Grundmann
Date:



On Thu, Aug 7, 2014 at 6:29 PM, Gabriele Bartolini <gabriele.bartolini@2ndquadrant.it> wrote:
> Hi Marco,
>
>> With the current full backup procedure they are backed up, so I think
>> that having them backed up with a rsync-like algorithm is what an user
>> would expect for an incremental backup.
>
> Exactly. I think a simple, flexible and robust method for file based
> incremental backup is all we need. I am confident it could be done for
> 9.5.
>
> I would like to quote every single word Simon said. Block level
> incremental backup (with Robert's proposal) is definitely the ultimate
> goal for effective and efficient physical backups. I see file level
> incremental backup as a very good "compromise", a sort of intermediate
> release which could nonetheless produce a lot of benefits to our user
> base, for years to come too.
>
> Thanks,
> Gabriele

I haven't been following this discussion closely at all, but at
Janestreet we have been using pg_start_backup together with rsync
--link-dest (onto a big NFS) to achieve incrementally stored backups.
In our experience this works very well; it is however advisable to look
into whatever is used to serve the NFS, as we had to set some options to
increase the maximum number of hardlinks.
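
For reference, the shape of that approach is roughly the following
(illustrative only: the paths, label and exclusions are made up, and it
assumes WAL is archived separately):

import subprocess
from datetime import date

PGDATA = "/var/lib/postgresql/9.3/main"        # hypothetical paths
BACKUP_ROOT = "/backups/pg"
previous = BACKUP_ROOT + "/latest"             # points at the previous backup
target = "%s/%s" % (BACKUP_ROOT, date.today().isoformat())

subprocess.check_call(["psql", "-c", "SELECT pg_start_backup('nightly', true)"])
try:
    # Unchanged files become hard links into the previous backup, so only
    # changed files take new space on the NFS share; WAL is excluded
    # because it is archived separately.
    subprocess.check_call(["rsync", "-a", "--delete", "--exclude=pg_xlog",
                           "--link-dest=" + previous,
                           PGDATA + "/", target + "/"])
finally:
    subprocess.check_call(["psql", "-c", "SELECT pg_stop_backup()"])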

Cheers,

Bene




Re: Proposal: Incremental Backup

From
Robert Haas
Date:
On Tue, Aug 5, 2014 at 8:04 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> To decide whether we need to re-copy the file, you read the file until
> we find a block with a later LSN. If we read the whole file without
> finding a later LSN then we don't need to re-copy. That means we read
> each file twice, which is slower, but the file is at most 1GB in size,
> we we can assume will be mostly in memory for the second read.

That seems reasonable, although copying only the changed blocks
doesn't seem like it would be a whole lot harder.  Yes, you'd need a
tool to copy those blocks back into the places where they need to go,
but that's probably not a lot of work and the disk savings, in many
cases, would be enormous.

> As Marco says, that can be optimized using filesystem timestamps instead.

The idea of using filesystem timestamps gives me the creeps.  Those
aren't always very granular, and I don't know that (for example) they
are crash-safe.  Does every filesystem on every platform make sure
that the mtime update hits the disk before the data?  What about clock
changes made manually by users, or automatically by ntpd? I recognize
that there are people doing this today, because it's what we have, and
it must not suck too much, because people are still doing it ... but I
worry that if we do it this way, we'll end up with people saying
"PostgreSQL corrupted my data" and will have no way of tracking the
problem back to the filesystem or system clock event that was the true
cause of the problem, so they'll just blame the database.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Proposal: Incremental Backup

From
Claudio Freire
Date:
On Mon, Aug 11, 2014 at 12:27 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
>> As Marco says, that can be optimized using filesystem timestamps instead.
>
> The idea of using filesystem timestamps gives me the creeps.  Those
> aren't always very granular, and I don't know that (for example) they
> are crash-safe.  Does every filesystem on every platform make sure
> that the mtime update hits the disk before the data?  What about clock
> changes made manually by users, or automatically by ntpd? I recognize
> that there are people doing this today, because it's what we have, and
> it must not suck too much, because people are still doing it ... but I
> worry that if we do it this way, we'll end up with people saying
> "PostgreSQL corrupted my data" and will have no way of tracking the
> problem back to the filesystem or system clock event that was the true
> cause of the problem, so they'll just blame the database.

I have the same creeps. I only do it on a live system, after a first
full rsync, where mtime persistence is not an issue, and where I know
ntp updates have not happened.

I had a problem once where a differential rsync with timestamps didn't
work as expected, and corrupted a slave. It was a test system so I
didn't care much at the time, but if it were a backup, I'd be quite
pissed.

Basically, mtimes aren't trustworthy across reboots. Granted, this was
a very old system, debian 5 when it was new, IIRC, so it may be better
now. But it does illustrate just how bad things can get when one
trusts timestamps. This case was an old out-of-sync slave on a test
set up that got de-synchronized, and I tried to re-synchronize it with
a delta rsync to avoid the hours it would take to actually compare
everything (about a day). One segment that was modified after the sync
loss was not transferred, causing trouble at the slave, so I was forced
to re-synchronize with a full rsync (delta, but without timestamps).
This was either before pg_basebackup or before I heard of it ;-), but
in any case, if it happened on a test system with little activity, you
can be certain it can happen on a production system.

So I now only trust mtime when there has been neither a reboot nor an
ntpd running since the last mtime-less rsync. In those cases, the
optimization works and helps a lot. But I doubt you'll take many
incremental backups matching those conditions.

Say what you will about anecdotal evidence, but the issue is quite clear
theoretically as well: a segment can be modified without the change being
reflected within mtime granularity. There are many reasons why mtime
could lose precision; an old filesystem with second-precision
timestamps is just one of them, but not the only one.



Re: Proposal: Incremental Backup

From
Marco Nenciarini
Date:
As I already stated, timestamps will only be used to detect changed
files early. To declare two files identical they must have the same size,
same mtime and same *checksum*.
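
Just to make the intended check concrete, a rough Python sketch of how a
file could be compared against its entry in the previous backup profile
(the field names here are hypothetical, not the actual profile format):

    import hashlib
    import os

    def file_unchanged(path, profile_entry):
        # profile_entry: {'size': ..., 'mtime': ..., 'checksum': ...}
        # taken from the previous backup's profile (hypothetical layout).
        st = os.stat(path)
        if st.st_size != profile_entry['size']:
            return False   # size differs: the file has surely changed
        if int(st.st_mtime) != profile_entry['mtime']:
            return False   # mtime differs: resend without checksumming
        h = hashlib.sha256()
        with open(path, 'rb') as f:
            for chunk in iter(lambda: f.read(1024 * 1024), b''):
                h.update(chunk)
        return h.hexdigest() == profile_entry['checksum']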

Regards,
Marco

--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciarini@2ndQuadrant.it | www.2ndQuadrant.it


Re: Proposal: Incremental Backup

From
Claudio Freire
Date:
On Tue, Aug 12, 2014 at 6:41 AM, Marco Nenciarini
<marco.nenciarini@2ndquadrant.it> wrote:
> To declare two files identical they must have the same size,
> same mtime and same *checksum*.

Still not safe. Checksum collisions do happen, especially in big data sets.



Re: Proposal: Incremental Backup

From
Gabriele Bartolini
Date:
Hi Claudio,

2014-08-12 15:25 GMT+02:00 Claudio Freire <klaussfreire@gmail.com>:
> Still not safe. Checksum collisions do happen, especially in big data sets.

Can I ask you what you are currently using for backing up large data
sets with Postgres?

Thanks,
Gabriele



Re: Proposal: Incremental Backup

From
Marco Nenciarini
Date:
On 12/08/14 15:25, Claudio Freire wrote:
> On Tue, Aug 12, 2014 at 6:41 AM, Marco Nenciarini
> <marco.nenciarini@2ndquadrant.it> wrote:
>> To declare two files identical they must have the same size,
>> same mtime and same *checksum*.
>
> Still not safe. Checksum collisions do happen, especially in big data sets.
>

IMHO it is still good enough. We are not trying to protect against a
malicious attack; we are using it to protect against an *accidental* event.

Even cosmic rays have a non-zero probability of corrupting your database
in a way that goes unnoticed. And you can probably notice it better with a
checksum than with an LSN :-)

Given that, I think that whatever solution we choose, we should include
checksums in it.

Regards,
Marco

--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciarini@2ndQuadrant.it | www.2ndQuadrant.it


Re: Proposal: Incremental Backup

From
Andres Freund
Date:
On 2014-08-12 10:25:21 -0300, Claudio Freire wrote:
> On Tue, Aug 12, 2014 at 6:41 AM, Marco Nenciarini
> <marco.nenciarini@2ndquadrant.it> wrote:
> > To declare two files identical they must have the same size,
> > same mtime and same *checksum*.
> 
> Still not safe. Checksum collisions do happen, especially in big data sets.

If you use an appropriate algorithm for appropriate amounts of data
that's not a relevant concern. You can easily do different checksums for
every 1GB segment of data. If you do it right the likelihood of
conflicts doing that is so low it doesn't matter at all.
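
To illustrate the point, a small Python sketch that records one
independent SHA-256 per file; since relation segments are capped at 1GB,
no single checksum ever covers more than 1GB of data (the directory walk
and the output format are just assumptions for the example):

    import hashlib
    import os

    def per_segment_checksums(data_dir):
        # One independent checksum per file; relation segments never
        # exceed 1GB, so each digest covers at most 1GB of data.
        checksums = {}
        for root, _, files in os.walk(data_dir):
            for name in files:
                path = os.path.join(root, name)
                h = hashlib.sha256()
                with open(path, 'rb') as f:
                    for chunk in iter(lambda: f.read(1 << 20), b''):
                        h.update(chunk)
                checksums[os.path.relpath(path, data_dir)] = h.hexdigest()
        return checksums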

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Proposal: Incremental Backup

From
Claudio Freire
Date:
On Tue, Aug 12, 2014 at 11:17 AM, Gabriele Bartolini
<gabriele.bartolini@2ndquadrant.it> wrote:
>
> 2014-08-12 15:25 GMT+02:00 Claudio Freire <klaussfreire@gmail.com>:
>> Still not safe. Checksum collisions do happen, especially in big data sets.
>
> Can I ask you what you are currently using for backing up large data
> sets with Postgres?

Currently, a time-delayed WAL archive hot standby, pg_dump sparingly,
filesystem snapshots (incremental) of the standby more often, with the
standby down.

When I didn't have the standby, I did online filesystem snapshots of
the master with WAL archiving to prevent inconsistency due to
snapshots not being atomic.

On Tue, Aug 12, 2014 at 11:25 AM, Marco Nenciarini
<marco.nenciarini@2ndquadrant.it> wrote:
> Il 12/08/14 15:25, Claudio Freire ha scritto:
>> On Tue, Aug 12, 2014 at 6:41 AM, Marco Nenciarini
>> <marco.nenciarini@2ndquadrant.it> wrote:
>>> To declare two files identical they must have the same size,
>>> same mtime and same *checksum*.
>>
>> Still not safe. Checksum collisions do happen, especially in big data sets.
>>
>
> IMHO it is still good-enough. We are not trying to protect from a
> malicious attack, we are using it to protect against some *casual* event.

I'm not talking about malicious attacks; with big enough data sets,
checksum collisions are much more likely to happen than with smaller
ones, and incremental backups are supposed to work for the big sets.

You could use strong cryptographic checksums, but such strong
checksums still aren't perfect, and even if you accept the slim chance
of collision, they are quite expensive to compute, so it's bound to be
a bottleneck with good I/O subsystems. Checking the LSN is much
cheaper.

Still, do as you will. As everybody keeps saying, it's better than
nothing, so let's let usage have the final word.



Re: Proposal: Incremental Backup

From
Robert Haas
Date:
On Tue, Aug 12, 2014 at 10:30 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> Still not safe. Checksum collisions do happen, especially in big data sets.
>
> If you use an appropriate algorithm for appropriate amounts of data
> that's not a relevant concern. You can easily do different checksums for
> every 1GB segment of data. If you do it right the likelihood of
> conflicts doing that is so low it doesn't matter at all.

True, but if you use LSNs the likelihood is 0.  Comparing the LSN is
also most likely a heck of a lot faster than checksumming the entire
page.
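
For reference, a Python sketch of that early-exit LSN scan. It assumes
pd_lsn is the first 8 bytes of each 8kB page header, stored as two
32-bit values (xlogid, xrecoff) in the platform's native byte order, so
reading a file written on a different architecture would need adjusting:

    import struct

    BLCKSZ = 8192  # PostgreSQL block size

    def file_changed_since(path, backup_start_lsn):
        # Return True as soon as any page carries an LSN newer than the
        # previous backup's start LSN; otherwise scan the whole segment.
        with open(path, 'rb') as f:
            while True:
                page = f.read(BLCKSZ)
                if not page:
                    return False
                xlogid, xrecoff = struct.unpack_from('=II', page, 0)
                if ((xlogid << 32) | xrecoff) > backup_start_lsn:
                    return True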

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Proposal: Incremental Backup

From
Fujii Masao
Date:
On Wed, Aug 13, 2014 at 12:58 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Aug 12, 2014 at 10:30 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>>> Still not safe. Checksum collisions do happen, especially in big data sets.
>>
>> If you use an appropriate algorithm for appropriate amounts of data
>> that's not a relevant concern. You can easily do different checksums for
>> every 1GB segment of data. If you do it right the likelihood of
>> conflicts doing that is so low it doesn't matter at all.
>
> True, but if you use LSNs the likelihood is 0.  Comparing the LSN is
> also most likely a heck of a lot faster than checksumming the entire
> page.

If we use LSNs, a strong safeguard seems to be required to prevent a user
from taking an incremental backup against the "wrong" instance. For example,
consider the case where a first full backup is taken, PITR is performed to
a certain past location, and then an incremental backup is taken between
that first full backup and the current database after PITR. PITR rewinds
the LSN, so such an incremental backup might be corrupted. A safeguard for
those problematic cases would be needed; otherwise, I'm afraid that a user
could easily end up with a broken incremental backup.

Regards,

-- 
Fujii Masao



Re: Proposal: Incremental Backup

From
Stephen Frost
Date:
Claudio,

* Claudio Freire (klaussfreire@gmail.com) wrote:
> I'm not talking about malicious attacks, with big enough data sets,
> checksum collisions are much more likely to happen than with smaller
> ones, and incremental backups are supposed to work for the big sets.

This is an issue when you're talking about de-duplication, not when
you're talking about testing if two files are the same or not for
incremental backup purposes.  The size of the overall data set in this
case is not relevant, as you're only ever looking at the same (at most
1G) specific file in the PostgreSQL data directory.  Were you able to
actually produce a file with a checksum colliding with an existing PG
file, the chance that you'd be able to construct one which *also* has
a page layout valid enough that it wouldn't be obviously massively
corrupted is very quickly approaching zero.

> You could use strong cryptographic checksums, but such strong
> checksums still aren't perfect, and even if you accept the slim chance
> of collision, they are quite expensive to compute, so it's bound to be
> a bottleneck with good I/O subsystems. Checking the LSN is much
> cheaper.

For my 2c on this - I'm actually behind the idea of using the LSN (though
I have not followed this thread in any detail), but there are plenty of
existing incremental backup solutions (PG-specific and not) which work
just fine by doing checksums.  If you truly feel that this is a real
concern, I'd suggest you review the rsync binary diff protocol, which is
used extensively around the world, and show reports of it failing in the
field.
Thanks,
    Stephen

Re: Proposal: Incremental Backup

From
Claudio Freire
Date:
On Tue, Aug 12, 2014 at 8:26 PM, Stephen Frost <sfrost@snowman.net> wrote:
> * Claudio Freire (klaussfreire@gmail.com) wrote:
>> I'm not talking about malicious attacks, with big enough data sets,
>> checksum collisions are much more likely to happen than with smaller
>> ones, and incremental backups are supposed to work for the big sets.
>
> This is an issue when you're talking about de-duplication, not when
> you're talking about testing if two files are the same or not for
> incremental backup purposes.  The size of the overall data set in this
> case is not relevant, as you're only ever looking at the same (at most
> 1G) specific file in the PostgreSQL data directory.  Were you able to
> actually produce a file with a checksum colliding with an existing PG
> file, the chance that you'd be able to construct one which *also* has
> a page layout valid enough that it wouldn't be obviously massively
> corrupted is very quickly approaching zero.

True, but only with a strong hash, not an adler32 or something like that.