Re: Proposal: Incremental Backup - Mailing list pgsql-hackers

From Claudio Freire
Subject Re: Proposal: Incremental Backup
Date
Msg-id CAGTBQpY6rcrcurDtCcOGc7Ac8zjrizgz4tyNH4vyYLjXxNQ_0Q@mail.gmail.com
In response to Re: Proposal: Incremental Backup  (Gabriele Bartolini <gabriele.bartolini@2ndquadrant.it>)
Responses Re: Proposal: Incremental Backup
List pgsql-hackers
On Tue, Aug 12, 2014 at 11:17 AM, Gabriele Bartolini
<gabriele.bartolini@2ndquadrant.it> wrote:
>
> 2014-08-12 15:25 GMT+02:00 Claudio Freire <klaussfreire@gmail.com>:
>> Still not safe. Checksum collisions do happen, especially in big data sets.
>
> Can I ask you what you are currently using for backing up large data
> sets with Postgres?

Currently: a time-delayed hot standby fed from the WAL archive, pg_dump
sparingly, and, more often, incremental filesystem snapshots of the
standby, taken with the standby down.

When I didn't have the standby, I did online filesystem snapshots of
the master with WAL archiving to prevent inconsistency due to
snapshots not being atomic.

On Tue, Aug 12, 2014 at 11:25 AM, Marco Nenciarini
<marco.nenciarini@2ndquadrant.it> wrote:
> Il 12/08/14 15:25, Claudio Freire ha scritto:
>> On Tue, Aug 12, 2014 at 6:41 AM, Marco Nenciarini
>> <marco.nenciarini@2ndquadrant.it> wrote:
>>> To declare two files identical they must have same size,
>>> same mtime and same *checksum*.
>>
>> Still not safe. Checksum collisions do happen, especially in big data sets.
>>
>
> IMHO it is still good-enough. We are not trying to protect from a
> malicious attack, we are using it to protect against some *casual* event.

I'm not talking about malicious attacks. With big enough data sets,
checksum collisions are much more likely to happen than with smaller
ones, and incremental backups are supposed to work for the big sets.
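
To put rough numbers on that, here is a back-of-the-envelope sketch in
Python. It assumes checksums are uniformly distributed and independent
of the file contents, so a single changed file is wrongly judged
identical with probability 2^-bits. The function name and the sample
sizes are made up for illustration; nothing here comes from the
proposal itself.

```python
import math

def missed_change_probability(comparisons: int, bits: int) -> float:
    """Chance that at least one of `comparisons` changed files keeps the
    same `bits`-bit checksum as its previous version (uniform-checksum
    assumption; hypothetical helper, not part of any proposal)."""
    per_comparison = 2.0 ** -bits
    # 1 - (1 - p)^n, computed via log1p/expm1 to stay accurate for tiny p
    return -math.expm1(comparisons * math.log1p(-per_comparison))

if __name__ == "__main__":
    for n in (10**3, 10**6, 10**9):
        for bits in (32, 128):
            p = missed_change_probability(n, bits)
            print(f"{n:>13,} comparisons, {bits:>3}-bit checksum: {p:.3e}")
```

The risk grows roughly linearly with the number of comparisons until it
saturates, which is why the size of the data set matters.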

You could use strong cryptographic checksums, but even those aren't
perfect, and even if you accept the slim chance of collision, they are
quite expensive to compute, so they're bound to become a bottleneck on
good I/O subsystems. Checking the LSN is much cheaper.
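
For illustration, a minimal sketch of what an LSN check could look like,
again in Python. It relies on the page LSN (pd_lsn) being stored in the
first 8 bytes of every page header as two 32-bit halves, high half
first. The 8192-byte block size and little-endian byte order are
assumptions about the build and platform, and the function names and
the choice of ">=" as the cutoff are mine, not anything from the
proposal.

```python
import struct

BLCKSZ = 8192  # assumption: default PostgreSQL block size

def page_lsn(page: bytes) -> int:
    """Return pd_lsn from the start of a page header as a 64-bit integer.
    Assumes little-endian storage of the two 32-bit halves."""
    high, low = struct.unpack_from("<II", page, 0)
    return (high << 32) | low

def changed_blocks(data: bytes, since_lsn: int) -> list[int]:
    """Block numbers whose LSN is at or past `since_lsn` (hypothetical
    helper; ">=" is a deliberately conservative cutoff)."""
    blocks = []
    for blkno in range(len(data) // BLCKSZ):
        page = data[blkno * BLCKSZ:(blkno + 1) * BLCKSZ]
        if page_lsn(page) >= since_lsn:
            blocks.append(blkno)
    return blocks
```

The point is only that the check reads 8 bytes per page and does one
integer comparison, instead of hashing the whole page.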

Still, do as you will. As everybody keeps saying it's better than
nothing, let's let usage have the final word.


