Re: Using bytea field... - Mailing list pgsql-general

From Merlin Moncure
Subject Re: Using bytea field...
Date
Msg-id AANLkTi=CeuzdS20hESup0EeLAPaJpKrbSyDzTEW4q=Qa@mail.gmail.com
In response to Re: Using bytea field...  (Sim Zacks <sim@compulab.co.il>)
List pgsql-general
On Thu, Mar 10, 2011 at 2:09 AM, Sim Zacks <sim@compulab.co.il> wrote:
>
>> > The question is, if it screws up and says that an image already exists
>> > and then returns a different image when querying for it, how bad would
>> > that be.
>> >
>>
>>
>> It'll never happen:
>>
>>
>> http://stackoverflow.com/questions/862346/how-do-i-assess-the-hash-collision-probability
>>
>>
>> Sure you CAN go out of your way to generate collisions, but I'd bet
>> money you never see one from your setup.
>>
>> The probability is extremely slim.  And if that's too much of a chance,
>> use sha2; it's mind-numbingly slim.
>>
>> If you were doing cryptography it would be a problem, yes, but not
>> checking file equality.
>>
>> -Andy
>
> Never is a long time. The question that I asked is precisely: how much money
> you would bet that you'll never hit a collision. It depends on the use case.
> If you are talking about privacy issues, which can include lawsuits, loss of
> reputation and/or damages, then I wouldn't take that risk, even on sha2.
> Especially not with all the publicly available documentation explaining why
> not to do it.  If you are talking about a minor inconvenience or
> professional pride because the wrong image showed up, or the right image was
> never stored, then it may be worth the risk.

Regardless of the intended use, I would bet every dollar I've ever
made, will make, or could borrow, beg, or steal against one of your
dollars, and happily collect when I win the bet.  See the table of
odds vs. population size here:
http://en.wikipedia.org/wiki/Birthday_attack
Your statement is not in line with mathematical reality, and from a
risk standpoint there is a long list of things to worry about before
a sha2 collision, such as drive bit error rates, spontaneous
combustion, etc.
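
To put rough numbers on the birthday bound, here is a minimal Python
sketch. It uses the standard approximation p ~= 1 - exp(-n(n-1)/2^(b+1))
for n items under an ideal b-bit hash. The function name and the item
counts are illustrative assumptions, not anything from this thread.

```python
import math

def collision_probability(n_items: int, hash_bits: int) -> float:
    """Approximate chance that at least two of n_items share a hash value.

    Birthday-bound approximation for an ideal hash with hash_bits of output:
        p ~= 1 - exp(-n * (n - 1) / 2^(hash_bits + 1))
    """
    exponent = -n_items * (n_items - 1) / 2.0 ** (hash_bits + 1)
    # expm1 keeps precision when the probability is vanishingly small.
    return -math.expm1(exponent)

if __name__ == "__main__":
    # Hypothetical workload: one billion stored images.
    for name, bits in (("md5", 128), ("sha1", 160), ("sha256", 256)):
        print(name, collision_probability(10**9, bits))
```

For a billion images and a 256-bit hash this comes out on the order of
4e-60, which is the sense in which hardware faults dominate the risk.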

AFAIK, even sha1 collisions have never been found in the wild. The
zfs deduplication system uses sha256 to deduplicate disk blocks, and
bit torrent uses sha1 to verify pieces. In fact, many computing systems
you rely on make hash safety assumptions weaker than sha2.
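
If the residual risk still bothers you, a deduplicating store can
compare the actual bytes whenever a digest already exists. The sketch
below is an in-memory illustration only; the ImageStore class and its
put method are hypothetical names, and a real setup would keep the
digest and the bytea data in a table instead of a dict.

```python
import hashlib

class ImageStore:
    """Hypothetical in-memory store that deduplicates blobs by sha256."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}  # hex digest -> image bytes

    def put(self, data: bytes) -> str:
        """Store data once per distinct content and return its digest."""
        digest = hashlib.sha256(data).hexdigest()
        existing = self._blobs.get(digest)
        if existing is None:
            self._blobs[digest] = data
        elif existing != data:
            # Same digest, different bytes: a genuine collision.
            raise RuntimeError("sha256 collision detected for " + digest)
        return digest

    def __len__(self) -> int:
        return len(self._blobs)
```

The byte comparison only runs when the digest matches, so it costs
nothing for new images and turns a silent wrong answer into a loud
error in the (practically unreachable) collision case.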

Schneier speculates that we may see a sha1 collision soon; see here:
http://blog.valerieaurora.org/2009/06/25/sha-1-collision-expected-within-a-year/.
A small number of accidental duplicate md5 hashes have been found in
the wild.

merlin
