storing checksums in a database - Mailing list pgsql-general

From Reece Hart
Subject storing checksums in a database
Date
Msg-id 1176233057.4904.55.camel@snafu.site
Whole thread Raw
List pgsql-general
I'd like to store several kinds of checksums in a database and would appreciate some advice about the best way to do this. I will use these checksums for uniqueness constraints and for searching.

I currently store md5s as text and I could store other checksums as text also, but it occurs to me that there are at least two other options.

One option is to store binary blobs instead. This would save 2x on storage over hex. For example, sha1 is a 160 bit hash; that's 20 bytes as binary or 40 (8-bit) chars as hex. As longer hashes are used, I imagine that the space savings might have significant impact on index building and query optimization decisions. The downside is that one would need to encode/decode for lookups based on the hash.

Another option is to create a custom type for hex digests that would provide type conversion as necessary. These would provide the space benefit of storing as a binary and the usability of having postgresql Do The Right Thing depending on the type of the representations being compared. I can't think of any downsides.

So, the questions:

1) Am I worrying about nothing, or is the size of a hash significant? In my case, the rows consist of a protein sequence (~350 chars, as text), a timestamp, an integer PK, and an integer length. The text md5 is 32 text chars, of course, to which I might add sha1 (40 text chars). Thus, the size of hashes is a sizable fraction of the rest of the row.

2) Does anyone know of postgresql digest data types that provide these conversions? If not, I'll do it myself... but I'd rather steal of course.

Thanks,
Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0
./universe -G 6.672e-11 -e 1.602e-19 -protonmass 1.673e-27 -uspres bush
kernel warning: universe consuming too many resources. Killing.
universe killed due to catastrophic leadership. Try -uspres carter.

pgsql-general by date:

Previous
From: "Gary Winslow"
Date:
Subject: Re: Postgres Crash Running PLPGSQL Function on 8.2.3
Next
From: "sami jan"
Date:
Subject: Re: plperl "set-valued function" problem