Yet another binary vs. text question (with a twist?) - Mailing list pgsql-novice

From Jan Danielsson
Subject Yet another binary vs. text question (with a twist?)
Date
Msg-id 5490F4B2.5030405@gmail.com
Whole thread Raw
Responses Re: Yet another binary vs. text question (with a twist?)  (David G Johnston <david.g.johnston@gmail.com>)
List pgsql-novice
Hello,

   I searched google for some answers but the questions and answers I
found related to "files", while I have potentially binary *non-files*
I'd like to store in a database.

   Background:  I'm building a system which tracks files using their
hashes, and there's a database which contains all the hashes (the actual
files are stored in the filesystem outside the database).  Systems need
to connect to each other and synchronize content and does so by
comparing the hashes.

   The hashes really have no reason to be stored in any other format
than their raw binary format, except I've read that SQL99 doesn't allow
binary columns to be indexed -- which would be a problem because the
hashes most definitely will need to be indexed.  Is it correct that
blob's can't be indexed?

   On a more general note:  Considering this isn't about files the in
the range of a few K or M, this is fixed length 32 byte buffers --
always.  While I would prefer to store them as binary entities in the
database, I'd like to know if there are compelling reasons to convert
them to some other format, like base64, and store them as strings.

   In some installations there be a *huge* number of hashes, which is
why they'll need to be indexed but also why I'd prefer to stick with the
non-transformed binary hashes (the extra space it takes will add up).

   Before I started searching around I was essentially looking to
validate my assertion that raw binary is definitely better (fixed
length, no risk of the db attempting to recode character sets/encodings,
less data to transfer, etc), but the reason I'm writing this is because
what I found got me to start thinking about base64-encoding the hashes
instead.

   Any input is appreciated; what are the pros and cons of using binary
vs text?  And I'm not referring to storing large objects, these are only
fixed-length hashes.

--
Kind Regards,
Jan


pgsql-novice by date:

Previous
From: Scott Robertson
Date:
Subject: work copying over to closed servers and databases
Next
From: David G Johnston
Date:
Subject: Re: Yet another binary vs. text question (with a twist?)