Re: Proposal: Incremental Backup - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: Proposal: Incremental Backup
Msg-id 20140812232651.GG16422@tamriel.snowman.net
In response to Re: Proposal: Incremental Backup  (Claudio Freire <klaussfreire@gmail.com>)
Responses Re: Proposal: Incremental Backup
List pgsql-hackers
Claudio,

* Claudio Freire (klaussfreire@gmail.com) wrote:
> I'm not talking about malicious attacks; with big enough data sets,
> checksum collisions are much more likely to happen than with smaller
> ones, and incremental backups are supposed to work for the big sets.

This is an issue when you're talking about de-duplication, not when
you're talking about testing whether two files are the same for
incremental backup purposes.  The size of the overall data set is not
relevant in this case, as you're only ever comparing one specific file
(at most 1GB) in the PostgreSQL data directory against its own earlier
version.  Even if you were able to produce a file with the same checksum
as an existing PG file, the chance that you could construct one which
*also* has a valid page layout, so that it isn't obviously and massively
corrupted, very quickly approaches zero.
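A minimal sketch of the per-file comparison described above, in Python
for illustration only. The helper names (file_digest, needs_backup) and
the choice of SHA-256 are assumptions, not taken from any existing
backup tool. The point it shows is that each digest is only ever
compared with the digest recorded for the same path by the previous
backup, never against every other file in the data set.

```python
import hashlib
from typing import Optional

def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of one file, read in chunks (1 MiB by default)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def needs_backup(path: str, previous_digest: Optional[str]) -> bool:
    """Hypothetical helper: copy the file again unless its digest matches
    the one recorded for this same path by the previous backup."""
    if previous_digest is None:  # file is new since the last backup
        return True
    return file_digest(path) != previous_digest
```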

> You could use strong cryptographic checksums, but such strong
> checksums still aren't perfect, and even if you accept the slim chance
> of collision, they are quite expensive to compute, so it's bound to be
> a bottleneck with good I/O subsystems. Checking the LSN is much
> cheaper.
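The LSN check mentioned in the quote can be sketched as follows, again
in Python and only as an illustration. It assumes the default 8kB block
size and a little-endian server, and relies on pd_lsn being the first
field of every page header (stored as two 32-bit halves, xlogid and
xrecoff). The parameter since_lsn is hypothetical and stands for the
start LSN of the previous backup. A real tool would also have to handle
details this ignores, such as relation files created, truncated or
removed since then.

```python
import struct

BLCKSZ = 8192  # PostgreSQL's default block size (a build-time option)

def page_lsn(page: bytes) -> int:
    """Decode pd_lsn from the start of a page header.

    pd_lsn is two uint32 values (xlogid, xrecoff) in the server's native
    byte order; little-endian is assumed here."""
    xlogid, xrecoff = struct.unpack_from("<II", page, 0)
    return (xlogid << 32) | xrecoff

def changed_blocks(path: str, since_lsn: int, blcksz: int = BLCKSZ):
    """Yield (block_number, page) for every page whose LSN is newer than
    since_lsn. A short trailing read is treated as changed, to be safe."""
    with open(path, "rb") as f:
        blkno = 0
        while True:
            page = f.read(blcksz)
            if not page:
                break
            if len(page) < blcksz or page_lsn(page) > since_lsn:
                yield blkno, page
            blkno += 1
```

Only the first 8 bytes of each page are inspected, which is why this is
much cheaper than hashing every byte of the file.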

For my 2c on this: I'm actually behind the idea of using the LSN (though
I have not followed this thread in any detail), but there are plenty of
existing incremental backup solutions (PG-specific and not) which work
just fine by doing checksums.  If you truly feel that this is a real
concern, I'd suggest you review the rsync binary diff protocol, which is
used extensively around the world, and point to reports of it failing in
the field.

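For reference, the weak rolling checksum at the heart of the rsync
algorithm, as described in the rsync technical report, can be sketched
like this. The function names are hypothetical. rsync uses this weak
checksum only to find candidate block matches, and confirms each
candidate with a strong hash before reusing the block, so a collision
in the weak checksum alone does not corrupt the result.

```python
from typing import Tuple

M = 1 << 16  # both running sums are kept modulo 2^16

def weak_checksum(block: bytes) -> Tuple[int, int]:
    """Compute (a, b) for one block from scratch."""
    n = len(block)
    a = sum(block) % M
    # The first byte has weight n, the last byte weight 1.
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b

def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> Tuple[int, int]:
    """Slide an n-byte window forward by one byte in O(1):
    drop out_byte from the front, append in_byte at the end."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b

def combine(a: int, b: int) -> int:
    """The 32-bit value actually compared: s = a + 2^16 * b."""
    return a + (b << 16)
```

Because roll() costs O(1) per byte, the side holding the new file can
test every byte offset against the other side's block signatures
cheaply, and only pays for the strong hash on candidate matches.
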
Thanks,
    Stephen
