On Tue, Aug 12, 2014 at 8:26 PM, Stephen Frost <sfrost@snowman.net> wrote:
> * Claudio Freire (klaussfreire@gmail.com) wrote:
>> I'm not talking about malicious attacks; with big enough data sets,
>> checksum collisions are much more likely to happen than with smaller
>> ones, and incremental backups are supposed to work for the big sets.
>
> This is an issue when you're talking about de-duplication, not when
> you're talking about testing if two files are the same or not for
> incremental backup purposes. The size of the overall data set in this
> case is not relevant as you're only ever looking at the same (at most
> 1G) specific file in the PostgreSQL data directory. Were you able to
> actually produce a file whose checksum collides with an existing PG
> file's, the chance that you'd be able to construct one which *also*
> has a valid page layout sufficient that it wouldn't be obviously
> massively corrupted is very quickly approaching zero.
True, but that only holds for a strong hash, not adler32 or something like that.
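
Just to illustrate the distinction (this is only a sketch, not from any
existing patch): comparing a segment against its previous backup by a
strong digest, here SHA-256 via OpenSSL's EVP interface. The file paths
are made up; the point is that a 256-bit digest makes an accidental
collision something you can ignore, while a 32-bit adler32 does not.

    #include <stdio.h>
    #include <string.h>
    #include <openssl/evp.h>

    /* Compute the SHA-256 digest of a file; returns 0 on success. */
    static int
    file_sha256(const char *path, unsigned char digest[EVP_MAX_MD_SIZE],
                unsigned int *digest_len)
    {
        FILE       *fp = fopen(path, "rb");
        EVP_MD_CTX *ctx;
        unsigned char buf[8192];
        size_t      nread;

        if (fp == NULL)
            return -1;

        ctx = EVP_MD_CTX_new();
        EVP_DigestInit_ex(ctx, EVP_sha256(), NULL);
        while ((nread = fread(buf, 1, sizeof(buf), fp)) > 0)
            EVP_DigestUpdate(ctx, buf, nread);
        EVP_DigestFinal_ex(ctx, digest, digest_len);

        EVP_MD_CTX_free(ctx);
        fclose(fp);
        return 0;
    }

    int
    main(void)
    {
        unsigned char d_old[EVP_MAX_MD_SIZE];
        unsigned char d_new[EVP_MAX_MD_SIZE];
        unsigned int  len_old, len_new;

        /* hypothetical paths: the same 1GB segment in the previous
         * backup and in the live data directory */
        if (file_sha256("backup_prev/base/16384/16385", d_old, &len_old) == 0 &&
            file_sha256("data/base/16384/16385", d_new, &len_new) == 0)
        {
            if (len_old == len_new && memcmp(d_old, d_new, len_old) == 0)
                printf("segment unchanged, skip in incremental backup\n");
            else
                printf("segment changed, back it up\n");
        }
        return 0;
    }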