Thread: Calculating a hash / fingerprint for a row / tuple

Calculating a hash / fingerprint for a row / tuple

From
Randall Lucas
Date:
I'm looking for a way to calculate a hashcode or fingerprint for a row /
tuple of arbitrary width.  The goal is to be able to store, in an audit
table, a fingerprint which could be compared against a later fingerprint
to detect changes (this application does not call for cryptographic
level security, so a simple checksum would be adequate for now).
Something like:

SELECT id, row_hashcode(*) FROM my_table;

id   |   row_hashcode
 1    a3843587d893589ef321d31
 2    934983497f9083298cb0843

etc.

I've considered implementing it in a procedural language, but the
inability of passing in ROW datatypes seems to make it impossible for
dynamic tuples (I suppose for static tables it could be done by a
process of iteration over the system catalog column lists for a given
table and PK).  Also, for static tables, it seems like for a 1 .. 32
column table, creating a multicolumn index would have this effect, but I
can't seem to find a function to get at the calculated value of the index.

It seems like implementing this in C would be feasible, but I don't want
to reimplement it if it exists.  Merely concatenating the values would
fail for nulls, so it seems to me the best thing to do would be to take
the effectively-serialized "COPY" output and run MD5 (or your
less-collision-prone hash of choice) on that.

I've done a fairly thorough search of the docs, contrib, and mailing
list archives (although the frequency in other contexts of the terms
"hash," "fingerprint," and "hashcode" are lowering the S/N).  Does
anybody have any pointers or ideas either on existing implementations,
or writing a new C function (such as generating COPY output from an
arbitrary tuple)?

Best,

Randall

Re: Calculating a hash / fingerprint for a row / tuple

From
Tom Lane
Date:
Randall Lucas <rlucas@tercent.com> writes:
> I'm looking for a way to calculate a hashcode or fingerprint for a row /
> tuple of arbitrary width.  The goal is to be able to store, in an audit
> table, a fingerprint which could be compared against a later fingerprint
> to detect changes (this application does not call for cryptographic
> level security, so a simple checksum would be adequate for now).
> Something like:

> SELECT id, row_hashcode(*) FROM my_table;

> id   |   row_hashcode
>  1    a3843587d893589ef321d31
>  2    934983497f9083298cb0843

Try "select md5(textin(record_out(mytbl.*))) from mytbl;" ... a bit
klugy, but it works.

            regards, tom lane