On 08/18/2014 07:33 PM, Alvaro Herrera wrote:
> Heikki Linnakangas wrote:
>> On 08/18/2014 08:05 AM, Alvaro Herrera wrote:
>
>>> We already have the FNV checksum implementation in the backend --
>>> can't we use that one for this and avoid messing with MD5?
>>>
>>> (I don't think we're looking for a cryptographic hash here. Am I
>>> wrong?)
>>
>> Hmm. Any user that can update a table can craft such an update
>> that its checksum matches an older backup. That may seem like an
>> onerous task; to correctly calculate the checksum of a file in a
>> previous backup, you need to know the LSNs and the exact data, including
>> deleted data, on every block in the table, and then construct a
>> suitable INSERT or UPDATE that modifies the table such that you get
>> a collision. But for some tables it could be trivial; you might
>> know that a table was bulk-loaded with a particular LSN and there
>> are no dead tuples.
>
> What would anybody obtain by doing that? The only benefit is that
> the file you so carefully crafted is not included in the next
> incremental backup. How is this of any interest?

You're not thinking evil enough ;-). Let's say that you have a table
that stores bank transfers. You can do a bank transfer to pay a
merchant, get the goods delivered to you, and then a second transfer to
yourself with a specially crafted message attached to it that makes the
checksum match the state before the first transfer. If the backup is
restored (e.g. by a daily batch job to a reporting system), it will
appear as if neither transfer happened, and you get to keep your money.

Far-fetched? Well, how about this scenario: a vandal just wants to cause
damage. Creating a situation where a restore from backup causes the
system to be inconsistent will certainly cause headaches to the admins,
and leave them wondering what else is corrupt.

Or how about this: you can do the trick to a system catalog, say
pg_attribute, to make it look like a column is of type varlena, when
it's actually since been ALTERed to be an integer. Now you can access
arbitrary memory in the server, and take over the whole system.

I'm sure any or all of those scenarios are highly impractical when you
actually sit down and try to do it, but you don't want to rely on that.
You have to be able to trust your backups.
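To illustrate why a short non-cryptographic checksum worries me, here is a
rough Python sketch of the "specially crafted message" search. It is a toy
under stated assumptions, not the backend's checksum code: it uses plain
32-bit FNV-1a folded down to 16 bits as a stand-in, and the block contents,
the memo field, and the names toy_checksum and forge_memo are all
hypothetical. A 16-bit value has only 65,536 possible outcomes, so a simple
counter search is expected to hit the target quickly.

```python
FNV_OFFSET = 0x811C9DC5  # standard 32-bit FNV-1a offset basis
FNV_PRIME = 0x01000193   # standard 32-bit FNV prime


def fnv1a_32(data: bytes, state: int = FNV_OFFSET) -> int:
    """Plain 32-bit FNV-1a; `state` lets a caller resume from a hashed prefix."""
    for byte in data:
        state = ((state ^ byte) * FNV_PRIME) & 0xFFFFFFFF
    return state


def fold16(h: int) -> int:
    """XOR-fold a 32-bit hash down to 16 bits."""
    return (h >> 16) ^ (h & 0xFFFF)


def toy_checksum(data: bytes) -> int:
    """Toy 16-bit block checksum (assumption: NOT the real backend algorithm)."""
    return fold16(fnv1a_32(data))


def forge_memo(old_block: bytes, new_prefix: bytes, max_tries: int = 1 << 22) -> bytes:
    """Find an 8-byte memo so that new_prefix + memo has old_block's checksum."""
    target = toy_checksum(old_block)
    prefix_state = fnv1a_32(new_prefix)  # hash the fixed part only once
    for counter in range(max_tries):
        memo = b"%08x" % counter          # attacker-controlled free-text field
        if fold16(fnv1a_32(memo, prefix_state)) == target:
            return memo
    raise RuntimeError("no collision found within max_tries")


# Hypothetical block contents: state before the transfers, and after them.
old_block = b"balance=1000;"
new_prefix = b"balance=1000;pay merchant 500;pay self 500;memo="

memo = forge_memo(old_block, new_prefix)
new_block = new_prefix + memo
print(memo, hex(toy_checksum(old_block)), hex(toy_checksum(new_block)))
```

The two blocks differ, yet a backup tool that compared only this checksum
would treat the file as unchanged. A real attack would also have to account
for LSNs and the exact page layout, as discussed above, so take this only as
a picture of how cheap the search itself is.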
- Heikki