Thread: Postgres schema comparison.
Hi all, I've got a master database with many other databases that all have (or is supposed to have) the same exact same schema as the master database (the master database is basically an empty template database containing the schema definition). The problem is that none of the schemas actually match the master schema. e.g. missing columns, columns not in the correct order (attnum), missing indexes and primary keys, and in severe cases, missing sequences and tables. I have the wonderful job of re-synch'ing all the schemas out there not conforming to the master. I've looked everywhere for something that will help doing this. I'm specifically looking for a way to do a sumcheck or something similar on tables and/or schema as a whole to be able to do a table comparison with the master database. It will be a bonus to pick up exactly what is missing, but for now, just identifying differences is what I want to achieve. I'm using postgres 7.3 mostly, but I may want to use this for 7.4 and 8.0 databases as well. Has anybody got some suggestions of what I can do or use to do this. TIA Kind Regards Stefan
Hi, Stef, Stef schrieb: > It will be a bonus to pick up exactly what is missing, but for now, just identifying > differences is what I want to achieve. I'm using postgres 7.3 mostly, but > I may want to use this for 7.4 and 8.0 databases as well. > > Has anybody got some suggestions of what I can do or use to do this. There are (at least) two independently developed pgdiff applications, they can be found at: http://pgdiff.sourceforge.net/ http://gborg.postgresql.org/project/pgdiff/projdisplay.php I did not try the first one, but the latter one worked on some of my datas, but fails on others. I filed a bug report some time ago, but got no answer, so I'm afraid this tool currently is unmaintained: http://gborg.postgresql.org/project/pgdiff/bugs/bugupdate.php?895 But maybe a pg_dump --schema-only on all the databases, and then manually diffing the files may already fulfil your needs. Markus -- Markus Schaber | Logical Tracking&Tracing International AG Dipl. Inf. | Software Development GIS Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org
On Mar 7, 2005, at 4:33 AM, Stef wrote: > I have the wonderful job of re-synch'ing all the schemas out there not > conforming to the master. I've looked everywhere for something that > will help doing this. I'm specifically looking for a way to do a > sumcheck > or something similar on tables and/or schema as a whole to be able to > do a table comparison with the master database. > Develop a function that builds a string describing the tables/schemas you want to compare. Then have your function return the md5 sum of the string as the result. This will give you a 32 character value you can use to determine if there is a mismatch. John DeSoi, Ph.D. http://pgedit.com/ Power Tools for PostgreSQL
Markus Schaber mentioned : => There are (at least) two independently developed pgdiff applications, => they can be found at: => => http://pgdiff.sourceforge.net/ => => http://gborg.postgresql.org/project/pgdiff/projdisplay.php Thanks a lot! => I did not try the first one, but the latter one worked on some of my => datas, but fails on others. I filed a bug report some time ago, but got => no answer, so I'm afraid this tool currently is unmaintained: => http://gborg.postgresql.org/project/pgdiff/bugs/bugupdate.php?895 => => But maybe a pg_dump --schema-only on all the databases, and then => manually diffing the files may already fulfil your needs. I've tested something similar, that seems to work ok for me for now : pg_dump -s -t [TABLE] [DBNAME] | grep -v "^--" | md5sum The problem I have with this, is that I have to run the command per table, and seeing that I have over 500 tables in each database, this takes quite a long time. I'll test some of the above pgdiffs, and see if either can do it better. Kind Regards Stefan
Hi, Stef, Stef schrieb: > The problem I have with this, is that I have to run the command per table, > and seeing that I have over 500 tables in each database, this takes quite a > long time. Some weeks ago, I posted here a script that uses psql to create split dumps. Maybe you can reuse some of its logics to create per-table md5sums for all tables in a database automatically. Markus -- Markus Schaber | Logical Tracking&Tracing International AG Dipl. Inf. | Software Development GIS Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org
Attachment
Stef <svb@ucs.co.za> writes: > Markus Schaber mentioned : > => But maybe a pg_dump --schema-only on all the databases, and then > => manually diffing the files may already fulfil your needs. > I've tested something similar, that seems to work ok for me for now : > pg_dump -s -t [TABLE] [DBNAME] | grep -v "^--" | md5sum > The problem I have with this, is that I have to run the command per table, Why? If the problem is varying order of table declarations, try 8.0's pg_dump. regards, tom lane
John DeSoi mentioned : => Develop a function that builds a string describing the tables/schemas => you want to compare. Then have your function return the md5 sum of the => string as the result. This will give you a 32 character value you can => use to determine if there is a mismatch. OK, this may be exactly what I need. I've compiled and installed contrib/pgcrypto and I want to use either one of : Result data type | Schema | Name | Argument data types ------------------+--------+--------+--------------------- bytea | public | digest | bytea, text bytea | public | digest | text, text Is it possible to somehow pass the output of : "\d [TABLE NAME]" to this function? If not, what would return me consistent text that will describe the columns, indexes and primary keys of a table? Kind Regards Stefan
Markus Schaber mentioned : => Some weeks ago, I posted here a script that uses psql to create split => dumps. Maybe you can reuse some of its logics to create per-table => md5sums for all tables in a database automatically. Thanks, but I've got something very similar to this already. I almost thought you managed to split the output of the single schema dump of "pg_dump --schema-only" onto portions belonging to the various tables. That would be very impressive :) Kind Regards Stefan
Tom Lane mentioned : => > The problem I have with this, is that I have to run the command per table, => => Why? => => If the problem is varying order of table declarations, try 8.0's => pg_dump. Yes, this will solve the global schema check, but I will still need to split it into "per table" dumps , to do "per table" comparisons. Kind Regards Stefan
On Mar 7, 2005, at 10:09 AM, Stef wrote: > Is it possible to somehow pass the output of : "\d [TABLE NAME]" > to this function? If not, what would return me consistent text > that will describe the columns, indexes and primary keys of a table? > I'm not sure you can use \d directly, but if you startup psql with the -E option it will show you all the SQL it is using to run the \d command. It should be fairly easy to get the strings you need from the results of running a similar query. The psql source is a good place to look also. John DeSoi, Ph.D. http://pgedit.com/ Power Tools for PostgreSQL
John DeSoi mentioned : => I'm not sure you can use \d directly, but if you startup psql with the => -E option it will show you all the SQL it is using to run the \d => command. It should be fairly easy to get the strings you need from the => results of running a similar query. The psql source is a good place to => look also. Sometimes you just need to see things from a different perspective. Thanks! Here's my final solution that runs in less than a minute for +- 543 tables : for x in $(psql -tc "select relname from pg_class where relkind = 'r' and relname not like 'pg_%'") do echo "$(psql -tc "select encode(digest('$(psql -c '\d '${x}'' mer9188_test | tr -d \"\'\")', 'md5'), 'hex')" mer9188_test| grep -v "^$"|tr -d " "):${x}" done > compare_list.lst
Stef, I use dblink to attach to both databases and query pg_namespace, pg_class, pg_attribute ... to get the diffs. See attached as an example. look for the dblink_connect lines to specify your database. You will need to install contrib/dblink. I used this with 7.4.X series and have NOT tested yet with 8.0.X. You can adjust the output to fit your needs. Jim ---------- Original Message ----------- From: Stef <svb@ucs.co.za> To: Tom Lane <tgl@sss.pgh.pa.us> Cc: pgsql-ADMIN@postgresql.org, pgsql-sql@postgresql.org Sent: Mon, 7 Mar 2005 17:31:55 +0200 Subject: Re: [ADMIN] [SQL] Postgres schema comparison. > Tom Lane mentioned : > => > The problem I have with this, is that I have to run the command per table, > => > => Why? > => > => If the problem is varying order of table declarations, try 8.0's > => pg_dump. > > Yes, this will solve the global schema check, but I will still need to split > it into "per table" dumps , to do "per table" comparisons. > > Kind Regards > Stefan > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly ------- End of Original Message -------
Attachment
Jim Buttafuoco mentioned : => I use dblink to attach to both databases and query pg_namespace, pg_class, pg_attribute ... to get the diffs. See => attached as an example. look for the dblink_connect lines to specify your database. You will need to install => contrib/dblink. I used this with 7.4.X series and have NOT tested yet with 8.0.X. Thanks! This is something I haven't even thought of. Only some of the machines have dblink installed at the moment, but that's the same work as having to install pgcrypto everywhere. This is actually more thorough. It seems to be working with some minor changes on 7.3 (The dblink functions don't allow multiple connections, and take only one argument, so I created temp tables in stead). This is actually very fast. Thanks again. Kind Regards Stefan
Hi all, If anyone is interested, here's the final solution that I'm using to build a list of tables and their md5sums based on what the psql interface queries when you do '\d [TABLE NAME]' I attached the function I created, and this is the SQL I run : select relname||':'||get_table_checksum(relname) from pg_class where relkind = 'r' and relname not like ('pg_%') and relnamenot like ('sql_%') order by relname; This gives the same result for a specific table across all versions of postgres >= 7.3, and runs for a minute or so for +- 450 tables on my machine. It may break if you have some exotic definitions that I didn't test for, but I think it's pretty solid as it is here. Kind Regards Stefan Stef mentioned : => Here's my final solution that runs in less than a minute for +- 543 tables : => for x in $(psql -tc "select relname from pg_class where relkind = 'r' and relname not like 'pg_%'") => do => echo "$(psql -tc "select encode(digest('$(psql -c '\d '${x}'' mer9188_test | tr -d \"\'\")', 'md5'), 'hex')" mer9188_test| grep -v "^$"|tr -d " "):${x}" => done > compare_list.lst
Attachment
Hi, > I have the wonderful job of re-synch'ing all the schemas out there not > conforming to the master. I've looked everywhere for something that > will help doing this. I'm specifically looking for a way to do a sumcheck > or something similar on tables and/or schema as a whole to be able to > do a table comparison with the master database. > > It will be a bonus to pick up exactly what is missing, but for now, just identifying > differences is what I want to achieve. I'm using postgres 7.3 mostly, but > I may want to use this for 7.4 and 8.0 databases as well. > > Has anybody got some suggestions of what I can do or use to do this. I've made a simple PHP script which compares the schemas of two databases. It was made in a hurry and is far from being complete, but it works for my purposes :) If you want give it a try, let me know Best regards -- Matteo Beccati http://phpadsnew.com http://phppgads.com