Thread: md5 hash on table row
Hello all, I would like a generic way to generate an md5 hash for each row of a table. Currently I do it thusly: select id, md5(col_a || col_b || col_c || col_d) from mytable; id | md5 ------+---------------------------------- 1 | 75acee3133f19d1a81ab2e7c1c32eb29 2 | 496f5e8bc945a922fcdd487e1ddde5c5 3 | ace10f4b1408d179da2e93267b300108 4 | bd029a826a98c21ec4c3661cc34657f8 5 | 4bacd2b0f34213a32f911ed5c1240c09 As you can see, I place each field inside the md5() function call. This, however, requires that I know what the table fields are, and it is not very robust (ie: boolean and any other column type that can't use "||" concatenation fails). Is there a better, robust, or more generic way to do this? I have tables with columns of type bytea, boolean, binary, etc... all the difficult column types. I would love something like this: select id, md5(*) from mytable; ...which of course does not work. Thanks for any ideas! -Jon -- -**-*-*---*-*---*-*---*-----*-*-----*---*-*---*-----*-----*-*-----*--- Jon Lapham <lapham@jandr.org> Rio de Janeiro, Brasil Personal: http://www.jandr.org/ ***-*--*----*-------*------------*--------------------*---------------
On Wed, Nov 02, 2005 at 11:38:46AM -0200, Jon Lapham wrote: > I would love something like this: > select id, md5(*) from mytable; Is it acceptable to have some decoration around the data being hashed? If so then this example might be useful: test=> SELECT * FROM foo; id | integer | text | date | boolean | bytea ----+---------+----------------+------------+---------+-------------- 1 | 123 | this is a test | 2005-11-02 | t | \000\001\002 (1 row) test=> SELECT id, foo FROM foo; id | foo ----+--------------------------------------------------------- 1 | (1,123,"this is a test",2005-11-02,t,"\\000\\001\\002") (1 row) test=> SELECT id, md5(textin(record_out(foo))) FROM foo; id | md5 ----+---------------------------------- 1 | b1cbe3d5ed304f31da57b85258f20c8f (1 row) -- Michael Fuhr
Michael Fuhr wrote: > test=> SELECT id, foo FROM foo; > id | foo > ----+--------------------------------------------------------- > 1 | (1,123,"this is a test",2005-11-02,t,"\\000\\001\\002") Perfect! Wow, in all these years of using PostgreSQL, I did not know you can "SELECT tablename FROM tablename". Thanks Michael. PS: Does anyone know if this very portable to other databases? -Jon -- -**-*-*---*-*---*-*---*-----*-*-----*---*-*---*-----*-----*-*-----*--- Jon Lapham <lapham@jandr.org> Rio de Janeiro, Brasil Personal: http://www.jandr.org/ ***-*--*----*-------*------------*--------------------*---------------
On Wed, Nov 02, 2005 at 09:35:33AM -0700, Michael Fuhr wrote: > test=> SELECT id, md5(textin(record_out(foo))) FROM foo; > id | md5 > ----+---------------------------------- > 1 | b1cbe3d5ed304f31da57b85258f20c8f I just noticed that record_out(foo) works only in 8.1. When I have more time I'll see if it's possible in earlier versions. -- Michael Fuhr
am 02.11.2005, um 9:35:33 -0700 mailte Michael Fuhr folgendes: > test=> SELECT id, md5(textin(record_out(foo))) FROM foo; is 'record_out()' new in 8.1? Regards, Andreas -- Andreas Kretschmer (Kontakt: siehe Header) Heynitz: 035242/47212, D1: 0160/7141639 GnuPG-ID 0x3FFF606C http://wwwkeys.de.pgp.net === Schollglas Unternehmensgruppe ===
Michael Fuhr <mike@fuhr.org> writes: > I just noticed that record_out(foo) works only in 8.1. When I have > more time I'll see if it's possible in earlier versions. Probably not :-( 2005-05-04 20:19 tgl * src/backend/parser/parse_coerce.c: Allow implicit cast from any named composite type to RECORD. At the moment this has no particular use except to allow table rows to be passed to record_out(), but that case seems to be useful in itself per recent example from Elein. Further down the road we could look at letting PL functions be declared to accept RECORD parameters. regards, tom lane
On Wed, Nov 02, 2005 at 02:49:57PM -0200, Jon Lapham wrote: > Michael Fuhr wrote: > >test=> SELECT id, foo FROM foo; > > id | foo > >----+--------------------------------------------------------- > > 1 | (1,123,"this is a test",2005-11-02,t,"\\000\\001\\002") > > Perfect! Wow, in all these years of using PostgreSQL, I did not know > you can "SELECT tablename FROM tablename". It only works in 8.0 and later. > PS: Does anyone know if this very portable to other databases? Dunno, but it doesn't work in MySQL 5.0.15. -- Michael Fuhr
On Wed, Nov 02, 2005 at 12:18:15PM -0500, Tom Lane wrote: > Michael Fuhr <mike@fuhr.org> writes: > > I just noticed that record_out(foo) works only in 8.1. When I have > > more time I'll see if it's possible in earlier versions. > > Probably not :-( This works in 8.0.4, although it gives even more decoration: test=> SELECT id, textin(record_out(row(foo), tableoid)) FROM foo; id | textin ----+----------------------------------------------------------------------- 1 | ("(1,123,""this is a test"",2005-11-02,t,""\\\\000\\\\001\\\\002"")") (1 row) test=> SELECT id, md5(textin(record_out(row(foo), tableoid))) FROM foo; id | md5 ----+---------------------------------- 1 | 3e66ee01b83eeb1a2444df326b75ffe0 (1 row) -- Michael Fuhr
On Wed, Nov 02, 2005 at 06:07:45PM +0100, A. Kretschmer wrote: > am 02.11.2005, um 9:35:33 -0700 mailte Michael Fuhr folgendes: > > test=> SELECT id, md5(textin(record_out(foo))) FROM foo; > > is 'record_out()' new in 8.1? The signature has changed over time: 7.3 record_out(record) 7.4 record_out(record) 8.0 record_out(record,oid) 8.1 record_out(record) As I already discovered and as Tom pointed out, the cast from foo to record works only in 8.1. In 8.0 you can use row(foo), but that adds even more decoration to the output. -- Michael Fuhr
Michael Fuhr wrote: > On Wed, Nov 02, 2005 at 06:07:45PM +0100, A. Kretschmer wrote: > > am 02.11.2005, um 9:35:33 -0700 mailte Michael Fuhr folgendes: > > > test=> SELECT id, md5(textin(record_out(foo))) FROM foo; > > > > is 'record_out()' new in 8.1? > > The signature has changed over time: > > 7.3 record_out(record) > 7.4 record_out(record) > 8.0 record_out(record,oid) > 8.1 record_out(record) > > As I already discovered and as Tom pointed out, the cast from foo > to record works only in 8.1. In 8.0 you can use row(foo), but that > adds even more decoration to the output. Where should we document this? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Michael Fuhr <mike@fuhr.org> writes: > On Wed, Nov 02, 2005 at 06:07:45PM +0100, A. Kretschmer wrote: >> is 'record_out()' new in 8.1? > The signature has changed over time: > 7.3 record_out(record) > 7.4 record_out(record) > 8.0 record_out(record,oid) > 8.1 record_out(record) BTW, the addition of OID to the signature was a bad idea (read security hole), and the more recent 8.0.* subreleases ignore it. So you can just pass a zero instead of worrying about figuring out the right type OID. Indeed, Michael's example is formally wrong: SELECT id, textin(record_out(row(foo), tableoid)) FROM foo; What record_out is getting here is not a foo; it's a record type that contains one column that is a foo. Since tableoid is the type of foo, it does not correctly describe the record. This example would probably crash an 8.0.0 server, because it would believe the OID argument :-( Here's the CVS log entry: 2005-04-30 16:04 tgl * src/backend/utils/adt/rowtypes.c (REL8_0_STABLE): Make record_out and record_send extract type information from the passed record object itself, rather than relying on a second OID argument to be correct. This patch just changes the function behavior and not the catalogs, so it's OK to back-patch to 8.0. Will remove the now-redundant second argument in pg_proc in a separate patch in HEAD only. and for 8.1 we did this: 2005-05-01 14:56 tgl * doc/src/sgml/ref/create_type.sgml, src/backend/access/common/printtup.c, src/backend/bootstrap/bootstrap.c, src/backend/commands/copy.c, src/backend/commands/typecmds.c, src/backend/executor/spi.c, src/backend/nodes/print.c, src/backend/tcop/fastpath.c, src/backend/utils/adt/arrayfuncs.c, src/backend/utils/adt/rowtypes.c, src/backend/utils/adt/ruleutils.c, src/backend/utils/adt/varlena.c, src/backend/utils/cache/lsyscache.c, src/backend/utils/misc/guc.c, src/include/utils/lsyscache.h, src/pl/plperl/plperl.c, src/pl/plpgsql/src/pl_exec.c, src/pl/tcl/pltcl.c: Change CREATE TYPE to require datatype output and send functions to have only one argument. (Per recent discussion, the option to accept multiple arguments is pretty useless for user-defined types, and would be a likely source of security holes if it was used.) Simplify call sites of output/send functions to not bother passing more than one argument. regards, tom lane