Thread: md5 hash on table row

md5 hash on table row

From
Jon Lapham
Date:
Hello all,

I would like a generic way to generate an md5 hash for each row of a
table.  Currently I do it thusly:

select id, md5(col_a || col_b || col_c || col_d) from mytable;
   id  |               md5
------+----------------------------------
  1    | 75acee3133f19d1a81ab2e7c1c32eb29
  2    | 496f5e8bc945a922fcdd487e1ddde5c5
  3    | ace10f4b1408d179da2e93267b300108
  4    | bd029a826a98c21ec4c3661cc34657f8
  5    | 4bacd2b0f34213a32f911ed5c1240c09

As you can see, I place each field inside the md5() function call.
This, however, requires that I know what the table fields are, and it is
not very robust (ie: boolean and any other column type that can't use
"||" concatenation fails).  Is there a better, robust, or more generic
way to do this?  I have tables with columns of type bytea, boolean,
binary, etc... all the difficult column types.

I would love something like this:
select id, md5(*) from mytable;

...which of course does not work.

Thanks for any ideas!
-Jon

--
-**-*-*---*-*---*-*---*-----*-*-----*---*-*---*-----*-----*-*-----*---
  Jon Lapham  <lapham@jandr.org>                Rio de Janeiro, Brasil
  Personal: http://www.jandr.org/
***-*--*----*-------*------------*--------------------*---------------


Re: md5 hash on table row

From
Michael Fuhr
Date:
On Wed, Nov 02, 2005 at 11:38:46AM -0200, Jon Lapham wrote:
> I would love something like this:
> select id, md5(*) from mytable;

Is it acceptable to have some decoration around the data being
hashed?  If so then this example might be useful:

test=> SELECT * FROM foo;
 id | integer |      text      |    date    | boolean |    bytea
----+---------+----------------+------------+---------+--------------
  1 |     123 | this is a test | 2005-11-02 | t       | \000\001\002
(1 row)

test=> SELECT id, foo FROM foo;
 id |                           foo
----+---------------------------------------------------------
  1 | (1,123,"this is a test",2005-11-02,t,"\\000\\001\\002")
(1 row)

test=> SELECT id, md5(textin(record_out(foo))) FROM foo;
 id |               md5
----+----------------------------------
  1 | b1cbe3d5ed304f31da57b85258f20c8f
(1 row)

--
Michael Fuhr

Re: md5 hash on table row

From
Jon Lapham
Date:
Michael Fuhr wrote:
> test=> SELECT id, foo FROM foo;
>  id |                           foo
> ----+---------------------------------------------------------
>   1 | (1,123,"this is a test",2005-11-02,t,"\\000\\001\\002")

Perfect!  Wow, in all these years of using PostgreSQL, I did not know
you can "SELECT tablename FROM tablename".

Thanks Michael.

PS: Does anyone know if this very portable to other databases?

-Jon

--
-**-*-*---*-*---*-*---*-----*-*-----*---*-*---*-----*-----*-*-----*---
  Jon Lapham  <lapham@jandr.org>                Rio de Janeiro, Brasil
  Personal: http://www.jandr.org/
***-*--*----*-------*------------*--------------------*---------------


Re: md5 hash on table row

From
Michael Fuhr
Date:
On Wed, Nov 02, 2005 at 09:35:33AM -0700, Michael Fuhr wrote:
> test=> SELECT id, md5(textin(record_out(foo))) FROM foo;
>  id |               md5
> ----+----------------------------------
>   1 | b1cbe3d5ed304f31da57b85258f20c8f

I just noticed that record_out(foo) works only in 8.1.  When I have
more time I'll see if it's possible in earlier versions.

--
Michael Fuhr

Re: md5 hash on table row

From
"A. Kretschmer"
Date:
am  02.11.2005, um  9:35:33 -0700 mailte Michael Fuhr folgendes:
> test=> SELECT id, md5(textin(record_out(foo))) FROM foo;

is 'record_out()' new in 8.1?


Regards, Andreas
--
Andreas Kretschmer    (Kontakt: siehe Header)
Heynitz:  035242/47212,      D1: 0160/7141639
GnuPG-ID 0x3FFF606C http://wwwkeys.de.pgp.net
 ===    Schollglas Unternehmensgruppe    ===

Re: md5 hash on table row

From
Tom Lane
Date:
Michael Fuhr <mike@fuhr.org> writes:
> I just noticed that record_out(foo) works only in 8.1.  When I have
> more time I'll see if it's possible in earlier versions.

Probably not :-(

2005-05-04 20:19  tgl

    * src/backend/parser/parse_coerce.c: Allow implicit cast from any
    named composite type to RECORD.  At the moment this has no
    particular use except to allow table rows to be passed to
    record_out(), but that case seems to be useful in itself per recent
    example from Elein.  Further down the road we could look at letting
    PL functions be declared to accept RECORD parameters.

            regards, tom lane

Re: md5 hash on table row

From
Michael Fuhr
Date:
On Wed, Nov 02, 2005 at 02:49:57PM -0200, Jon Lapham wrote:
> Michael Fuhr wrote:
> >test=> SELECT id, foo FROM foo;
> > id |                           foo
> >----+---------------------------------------------------------
> >  1 | (1,123,"this is a test",2005-11-02,t,"\\000\\001\\002")
>
> Perfect!  Wow, in all these years of using PostgreSQL, I did not know
> you can "SELECT tablename FROM tablename".

It only works in 8.0 and later.

> PS: Does anyone know if this very portable to other databases?

Dunno, but it doesn't work in MySQL 5.0.15.

--
Michael Fuhr

Re: md5 hash on table row

From
Michael Fuhr
Date:
On Wed, Nov 02, 2005 at 12:18:15PM -0500, Tom Lane wrote:
> Michael Fuhr <mike@fuhr.org> writes:
> > I just noticed that record_out(foo) works only in 8.1.  When I have
> > more time I'll see if it's possible in earlier versions.
>
> Probably not :-(

This works in 8.0.4, although it gives even more decoration:

test=> SELECT id, textin(record_out(row(foo), tableoid)) FROM foo;
 id |                                textin
----+-----------------------------------------------------------------------
  1 | ("(1,123,""this is a test"",2005-11-02,t,""\\\\000\\\\001\\\\002"")")
(1 row)

test=> SELECT id, md5(textin(record_out(row(foo), tableoid))) FROM foo;
 id |               md5
----+----------------------------------
  1 | 3e66ee01b83eeb1a2444df326b75ffe0
(1 row)

--
Michael Fuhr

Re: md5 hash on table row

From
Michael Fuhr
Date:
On Wed, Nov 02, 2005 at 06:07:45PM +0100, A. Kretschmer wrote:
> am  02.11.2005, um  9:35:33 -0700 mailte Michael Fuhr folgendes:
> > test=> SELECT id, md5(textin(record_out(foo))) FROM foo;
>
> is 'record_out()' new in 8.1?

The signature has changed over time:

7.3  record_out(record)
7.4  record_out(record)
8.0  record_out(record,oid)
8.1  record_out(record)

As I already discovered and as Tom pointed out, the cast from foo
to record works only in 8.1.  In 8.0 you can use row(foo), but that
adds even more decoration to the output.

--
Michael Fuhr

Re: md5 hash on table row

From
Bruce Momjian
Date:
Michael Fuhr wrote:
> On Wed, Nov 02, 2005 at 06:07:45PM +0100, A. Kretschmer wrote:
> > am  02.11.2005, um  9:35:33 -0700 mailte Michael Fuhr folgendes:
> > > test=> SELECT id, md5(textin(record_out(foo))) FROM foo;
> >
> > is 'record_out()' new in 8.1?
>
> The signature has changed over time:
>
> 7.3  record_out(record)
> 7.4  record_out(record)
> 8.0  record_out(record,oid)
> 8.1  record_out(record)
>
> As I already discovered and as Tom pointed out, the cast from foo
> to record works only in 8.1.  In 8.0 you can use row(foo), but that
> adds even more decoration to the output.

Where should we document this?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: md5 hash on table row

From
Tom Lane
Date:
Michael Fuhr <mike@fuhr.org> writes:
> On Wed, Nov 02, 2005 at 06:07:45PM +0100, A. Kretschmer wrote:
>> is 'record_out()' new in 8.1?

> The signature has changed over time:

> 7.3  record_out(record)
> 7.4  record_out(record)
> 8.0  record_out(record,oid)
> 8.1  record_out(record)

BTW, the addition of OID to the signature was a bad idea (read security
hole), and the more recent 8.0.* subreleases ignore it.  So you can just
pass a zero instead of worrying about figuring out the right type OID.
Indeed, Michael's example is formally wrong:

SELECT id, textin(record_out(row(foo), tableoid)) FROM foo;

What record_out is getting here is not a foo; it's a record type that
contains one column that is a foo.  Since tableoid is the type of foo,
it does not correctly describe the record.  This example would probably
crash an 8.0.0 server, because it would believe the OID argument :-(

Here's the CVS log entry:

2005-04-30 16:04  tgl

    * src/backend/utils/adt/rowtypes.c (REL8_0_STABLE): Make record_out
    and record_send extract type information from the passed record
    object itself, rather than relying on a second OID argument to be
    correct.  This patch just changes the function behavior and not the
    catalogs, so it's OK to back-patch to 8.0.  Will remove the
    now-redundant second argument in pg_proc in a separate patch in
    HEAD only.

and for 8.1 we did this:

2005-05-01 14:56  tgl

    * doc/src/sgml/ref/create_type.sgml,
    src/backend/access/common/printtup.c,
    src/backend/bootstrap/bootstrap.c, src/backend/commands/copy.c,
    src/backend/commands/typecmds.c, src/backend/executor/spi.c,
    src/backend/nodes/print.c, src/backend/tcop/fastpath.c,
    src/backend/utils/adt/arrayfuncs.c,
    src/backend/utils/adt/rowtypes.c,
    src/backend/utils/adt/ruleutils.c, src/backend/utils/adt/varlena.c,
    src/backend/utils/cache/lsyscache.c, src/backend/utils/misc/guc.c,
    src/include/utils/lsyscache.h, src/pl/plperl/plperl.c,
    src/pl/plpgsql/src/pl_exec.c, src/pl/tcl/pltcl.c: Change CREATE
    TYPE to require datatype output and send functions to have only one
    argument.  (Per recent discussion, the option to accept multiple
    arguments is pretty useless for user-defined types, and would be a
    likely source of security holes if it was used.)  Simplify call
    sites of output/send functions to not bother passing more than one
    argument.

            regards, tom lane