Re: Is there an md5sum for tables? - Mailing list pgsql-general

From Steve Atkins
Subject Re: Is there an md5sum for tables?
Date
Msg-id 818D0EEB-0F3C-4C08-B096-B0539528A4E2@blighty.com
Whole thread Raw
In response to Is there an md5sum for tables?  (Michael Enke <michael.enke@wincor-nixdorf.com>)
List pgsql-general
On Apr 1, 2008, at 8:56 PM, Michael Enke wrote:
> Hi all,
> I need to know if multiple tables have (may have most probably)
> identical content.
> Since I want a fast solution (which means not comparing tables row
> by row),
> I thought it would be a good idea to have an sql function operating
> on a table or view
> similar to md5sum on a file and only compare the generated hashes.
> I did not find such a function in the documentation.
>
> Is such a function already available?

No, but you could probably build it fairly easily from parts that
are already there.

>
> If not, does it make no sense to use such an approach?

Possibly, but generating the hash might be time consuming,
as you'd need to iterate through the whole table, probably
in a specific order, to generate the hash. So every time you
modified the table you'd have to do something like
"select * from table order by <something unique>" to
maintain a current hash. (If you used a weak hash, something
like XOR of the md5 of each row, then you could do it cheaply
incrementally - and that might even be a usable hash if there's
a unique constraint on some field).

Overall, though, this is a very non-database thing to want to
do. Iit sounds like you may have made some bad
decisions mapping your data into the database. Usually
you'd abstract out common data and refer to it via a foreign
key, or somesuch.

What's the actual application?

Cheers,
   Steve




pgsql-general by date:

Previous
From: carty mc
Date:
Subject: Re: dblink ,dblink_exec not participating in a Transaction??
Next
From: "Ian Sillitoe"
Date:
Subject: (FAQ?) JOIN condition - 'WHERE NULL = NULL'