Thread: md5 of table
Is there a way to get an md5 or other hash of an entire table? I want to be able to easily compare 2 tables in different databases. I thought about using dblink and the EXCEPT query, but then I need to know the field list of each query result, which is a pain in the butt. If I could return an md5 of the entire table, then I could check if the tables have the same hash and be confident enough that the tables were identical. Thanks Sim
On Thu, Sep 01, 2011 at 11:47:24AM +0300, Sim Zacks wrote: > Is there a way to get an md5 or other hash of an entire table? > > I want to be able to easily compare 2 tables in different databases. > > I thought about using dblink and the EXCEPT query, but then I need to > know the field list of each query result, which is a pain in the > butt. > > If I could return an md5 of the entire table, then I could check if > the tables have the same hash and be confident enough that the tables > were identical. One option might be to pg_dump in an appropriate format and md5-compare the output ? Another option might be to - cross-check columns/column types - query from information_schema - compare row counts - may need a lock - compare table sizes - may need vaccum ? If all three match that may be good enough ? Karsten -- GPG key ID E4071346 @ gpg-keyserver.de E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
md5 has size limitations, the second approach seems more practical. Στις Thursday 01 September 2011 12:30:45 ο/η Karsten Hilbert έγραψε: > On Thu, Sep 01, 2011 at 11:47:24AM +0300, Sim Zacks wrote: > > > Is there a way to get an md5 or other hash of an entire table? > > > > I want to be able to easily compare 2 tables in different databases. > > > > I thought about using dblink and the EXCEPT query, but then I need to > > know the field list of each query result, which is a pain in the > > butt. > > > > If I could return an md5 of the entire table, then I could check if > > the tables have the same hash and be confident enough that the tables > > were identical. > > One option might be to pg_dump in an appropriate format and > md5-compare the output ? > > Another option might be to > > - cross-check columns/column types > - query from information_schema > - compare row counts > - may need a lock > - compare table sizes > - may need vaccum ? > > If all three match that may be good enough ? > > Karsten > -- > GPG key ID E4071346 @ gpg-keyserver.de > E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346 > -- Achilleas Mantzios
On 09/01/2011 12:26 PM, Pavel Stehule wrote: > Hello > > postgres=# create table tt(a int, b varchar); > CREATE TABLE > postgres=# insert into tt values(10,'hello'); > INSERT 0 1 > > postgres=# select md5(array_to_string(array_agg(md5(tt::text)),'')) from tt; > md5 > ---------------------------------- > 20a92a676f52699e613da1bb114bd6f0 > (1 row) > > Regards > > Pavel Stehule Would be perfect, but 8.2 can't cast a UDT to text. ERROR: cannot cast type tt to text LINE 1: select tt::text from tt Thanks Sim
On Thu, Sep 1, 2011 at 11:14 AM, Sim Zacks <sim@compulab.co.il> wrote: > On 09/01/2011 12:26 PM, Pavel Stehule wrote: >> >> Hello >> >> postgres=# create table tt(a int, b varchar); >> CREATE TABLE >> postgres=# insert into tt values(10,'hello'); >> INSERT 0 1 >> >> postgres=# select md5(array_to_string(array_agg(md5(tt::text)),'')) from I do that as well, but it might have questionable performance when your table has 16M rows, and is 50GB + -- GJ
body p { margin-bottom: 0cm; margin-top: 0pt; } <body style="direction: ltr;" bidimailui-detected-decoding-type="UTF-8" bgcolor="#ffffff" text="#000000"> <blockquote cite="mid:CAFj8pRCf0tsUQs307Xoj1SiPjbdm8yjhKQkn=X=bQGUAUUMpGA@mail.gmail.com" type="cite"> I am not sure if this will work, but you can try it http://www.pgsql.cz/index.php/PostgreSQL_SQL_Tricks#Cast_to_varchar Pavel I appreciate your help, but UDTs don't have input/ouput functions unless you define them manually and I need this for all of my tables. Thanks Sim
<meta content=3D"text/html; charset=3DUTF-8" http-equiv=3D"Content-Type= "> body p { margin-bottom: 0cm; margin-top: 0pt; } <body style=3D"direction: ltr;" bidimailui-detected-decoding-type=3D"UTF-8" bgcolor=3D"#ffffff" text=3D"#000000"> On 09/01/2011 01:35 PM, Grzegorz Ja=C5=9Bkiewicz wrote: <blockquote cite=3D"mid:CAJY59_h6TqWfh3NKxfDCq1Pb9hBcYGBqxb5h+czWcgLj4ETeGQ@mail.gmail.= com" type=3D"cite"> On Thu, Sep 1, 2011 at 11:14 AM, Sim Zacks <a class=3D= "moz-txt-link-rfc2396E" href=3D"mailto:sim@compulab.co.il"><sim@compulab= .co.il> wrote: On 09/01/2011 12:26 PM, Pavel Stehule wrote: Hello postgres=3D# create table tt(a int, b varchar); CREATE TABLE postgres=3D# insert into tt values(10,'hello'); INSERT 0 1 postgres=3D# select md5(array_to_string(array_agg(md5(tt::text)),'')) from I do that as well, but it might have questionable perf= ormance when your table has 16M rows, and is 50GB + I don't need performance. This is for regression testing for a new database version. I want to run my functions in the old db and the new db and when it modifies a table, I want to be able to check that the tables are the same. Sim
On Thu, Sep 1, 2011 at 3:48 AM, Achilleas Mantzios <achill@matrix.gatewaynet.com> wrote: > md5 has size limitations, the second approach seems more practical. Really? I was not aware of size limits of md5, what are they?
Στις Thursday 01 September 2011 15:50:21 ο/η Scott Marlowe έγραψε: > On Thu, Sep 1, 2011 at 3:48 AM, Achilleas Mantzios > <achill@matrix.gatewaynet.com> wrote: > > md5 has size limitations, the second approach seems more practical. > > Really? I was not aware of size limits of md5, what are they? > sorry, i was wrong. i dont know why i had this impression, just checked with a 43GB table on a freebsd machine and went fine. -- Achilleas Mantzios
2011/9/1 Grzegorz Jaśkiewicz <gryzman@gmail.com>: > On Thu, Sep 1, 2011 at 11:14 AM, Sim Zacks <sim@compulab.co.il> wrote: >> On 09/01/2011 12:26 PM, Pavel Stehule wrote: >>> >>> Hello >>> >>> postgres=# create table tt(a int, b varchar); >>> CREATE TABLE >>> postgres=# insert into tt values(10,'hello'); >>> INSERT 0 1 >>> >>> postgres=# select md5(array_to_string(array_agg(md5(tt::text)),'')) from > I do that as well, but it might have questionable performance when > your table has 16M rows, and is 50GB + you need order by for that to work. I would do it like this: select md5(array(select foo from foo order by foo_pkey)::text); it's great quick'n'dirty, but not much scalable beyond millions. OP: > I thought about using dblink and the EXCEPT query, but then I need to know the field list of each query result, which isa pain in the butt. That is not correct. As long as the table definitions are precisely the same, you can move records across dblink without specifying fields. You do this by using record type for the composite which dblink sends across as text. merlin
> OP: >> I thought about using dblink and the EXCEPT query, but then I need to know the field list of each query result, whichis a pain in the butt. > That is not correct. As long as the table definitions are precisely > the same, you can move records across dblink without specifying > fields. You do this by using record type for the composite which > dblink sends across as text. > > merlin Do you have a quick example? This is what I have tried: select * from tbla except select * from dblink('host=dbhost dbname=otherdb user=myuser password=mypwd'::text, 'select * from tbla') The error I get back is: ERROR: a column definition list is required for functions returning "record" Sim
2011/9/1 Merlin Moncure <mmoncure@gmail.com>: > 2011/9/1 Grzegorz Jaśkiewicz <gryzman@gmail.com>: >> On Thu, Sep 1, 2011 at 11:14 AM, Sim Zacks <sim@compulab.co.il> wrote: >>> On 09/01/2011 12:26 PM, Pavel Stehule wrote: >>>> >>>> Hello >>>> >>>> postgres=# create table tt(a int, b varchar); >>>> CREATE TABLE >>>> postgres=# insert into tt values(10,'hello'); >>>> INSERT 0 1 >>>> >>>> postgres=# select md5(array_to_string(array_agg(md5(tt::text)),'')) from >> I do that as well, but it might have questionable performance when >> your table has 16M rows, and is 50GB + > > you need order by for that to work. I would do it like this: > select md5(array(select foo from foo order by foo_pkey)::text); > > it's great quick'n'dirty, but not much scalable beyond millions. I've always liked doing this with my pager: [robert@client-168] export PAGER=md5 -=[11:40:25 Thu Sep 01]=---=[ pagila-0.10.1 ]=- [robert@client-168] psql -hlocalhost -dpagila psql (9.0.4, server 9.1beta3) WARNING: psql version 9.0, server version 9.1. Some psql features might not work. Type "help" for help. pagila=# select * from actor order by actor_id; f381ebdefe0aada9c0bc14e657962c1f Robert Treat conjecture: xzilla.net consulting: omniti.com
2011/9/1 Sim Zacks <sim@compulab.co.il>: > >> OP: >>> >>> I thought about using dblink and the EXCEPT query, but then I need to >>> know the field list of each query result, which is a pain in the butt. >> >> That is not correct. As long as the table definitions are precisely >> the same, you can move records across dblink without specifying >> fields. You do this by using record type for the composite which >> dblink sends across as text. >> >> merlin > > Do you have a quick example? This is what I have tried: > > select * from tbla > except > select * from dblink('host=dbhost dbname=otherdb user=myuser > password=mypwd'::text, 'select * from tbla') > > The error I get back is: > ERROR: a column definition list is required for functions returning > "record" sure: select tbla from tbla except select t::tbla from dblink('host=dbhost dbname=otherdb user=myuser password=mypwd'::text, 'select tbla::text from tbla') R(t text); there's a bunch of ways to do that -- you can also do the md5 on the remote side so you can just send the digests. select * from tbla except select (t::tbla).* from dblink('host=dbhost dbname=otherdb user=myuser password=mypwd'::text, 'select tbla::text from tbla') R(t text); should also work. This *might* work -- I didn't try. It's been a while since I've used stock dblink. select * from tbla except select (t).* from dblink('host=dbhost dbname=otherdb user=myuser password=mypwd'::text, 'select tbla from tbla') R(t tbla); merlin
On Thu, Sep 1, 2011 at 7:56 AM, Achilleas Mantzios <achill@matrix.gatewaynet.com> wrote: > Στις Thursday 01 September 2011 15:50:21 ο/η Scott Marlowe έγραψε: >> Really? I was not aware of size limits of md5, what are they? >> > > sorry, i was wrong. i dont know why i had this impression, > just checked with a 43GB table on a freebsd machine and went fine. Well, it might have been an older version or a 32 bit version or something you were thinking of. Sometimes a long memory helps us get the wrong answer. :)
On Thursday 01 September 2011 11:47:24 Sim Zacks wrote: > Is there a way to get an md5 or other hash of an entire table? > > I want to be able to easily compare 2 tables in different databases. > > I thought about using dblink and the EXCEPT query, but then I need to > know the field list of each query result, which is a pain in the butt. > > If I could return an md5 of the entire table, then I could check if the > tables have the same hash and be confident enough that the tables were > identical. > > > Thanks > Sim You might also want to take a look at http://pgfoundry.org/projects/pg-comparator/ which can give a more nuanced view of db differences and tries to be smart about performance. It looks a bit stale; I haven't used it in ages, but it used to be a trusty part of our test suite. -- Vincent de Phily
>>>> I thought about using dblink and the EXCEPT query, but then I need to >>>> know the field list of each query result, which is a pain in the butt. >>> That is not correct. As long as the table definitions are precisely >>> the same, you can move records across dblink without specifying >>> fields. You do this by using record type for the composite which >>> dblink sends across as text. >>> >>> merlin >> Do you have a quick example? This is what I have tried: >> >> select * from tbla >> except >> select * from dblink('host=dbhost dbname=otherdb user=myuser >> password=mypwd'::text, 'select * from tbla') >> >> The error I get back is: >> ERROR: a column definition list is required for functions returning >> "record" > sure: > select tbla from tbla > except > select t::tbla from dblink('host=dbhost dbname=otherdb user=myuser > password=mypwd'::text, 'select tbla::text from tbla') R(t text); We tried something like that. Unfortunately, in 8.2 you can't cast a row type as text. > there's a bunch of ways to do that -- you can also do the md5 on the > remote side so you can just send the digests. > select * from tbla > except > select (t::tbla).* from dblink('host=dbhost dbname=otherdb user=myuser > password=mypwd'::text, 'select tbla::text from tbla') R(t text); > > should also work. > > This *might* work -- I didn't try. It's been a while since I've used > stock dblink. > select * from tbla > except > select (t).* from dblink('host=dbhost dbname=otherdb user=myuser > password=mypwd'::text, 'select tbla from tbla') R(t tbla); > > merlin This looks like it might work for us. At least I would only need the table name for the field list instead of the entire column list.