Thread: Comparing two PostgreSQL databases -- order of pg_dump output
Hi, In order to compare the schema of two presumably identical databases, I've been diffing the output of pg_dump -Osx. However, I've found that the order of the output is not very reliable. For example, after recreating the Pagila sample database, I find the following: --- pagila.dmp 2011-08-26 14:34:48.000000000 -0400 +++ pagila.dev-dmp 2011-08-26 14:34:47.000000000 -0400 @@ -1140,7 +1140,7 @@ -- CREATE TRIGGER last_updated - BEFORE UPDATE ON city + BEFORE UPDATE ON actor FOR EACH ROW EXECUTE PROCEDURE last_updated(); @@ -1160,7 +1160,7 @@ -- CREATE TRIGGER last_updated - BEFORE UPDATE ON customer + BEFORE UPDATE ON category FOR EACH ROW EXECUTE PROCEDURE last_updated(); ... The same triggers exist on both databases, it's just that the order is different (apparently they're output in creation order). This even more crucial with PostGIS databases, which have several hundred function and operator pairs where the only difference is one takes arguments of type geometry and the other uses type geography. There the pg_dump diff approach is nearly useless. I thought that comparing database schemas would be quite desirable, e.g., between development/test and production databases. Is there perhaps some mechanism or tool that people use for this purpose, or is this not a requirement? Incidentally, these comparisons are for the Pyrseas tools I'm developing. The output of dbtoyaml is predictable (not because of anything I wrote, but because pyyaml outputs everything in alphabetical order), and I can compare the YAML outputs quite nicely (however, it doesn't show me things I haven't implemented yet, e.g., OPERATOR CLASSes in the case of PostGIS). Joe
On Tue, Aug 30, 2011 at 2:07 PM, Joe Abbate <jma@freedomcircle.com> wrote: > Hi, > > In order to compare the schema of two presumably identical databases, I've > been diffing the output of pg_dump -Osx. However, I've found that the order > of the output is not very reliable. what about using pg_dump -Fc -Osx and use pg_restore -l to list objects. then you can sort and compare objects and then a script that compare schema of objects extracting them with -P, -T or -t -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte 24x7 y capacitación
Hola Jaime, On 08/30/2011 03:24 PM, Jaime Casanova wrote: > what about using pg_dump -Fc -Osx and use pg_restore -l to list > objects. then you can sort and compare objects and then a script that > compare schema of objects extracting them with -P, -T or -t That appears to be of limited use (i.e., it would only work for functions, triggers and tables). pg_restore -L/--use_list is more comprehensive. So the script would have to do something like the following: $ pg_dump -Fc -Osx postgis > postgis.dump $ pg_restore -l postgis.dump | sort -k4 > postgis.list $ pg_restore -L postgis.list postgis.dump > postgis.sorted Rinse and repeat on the second database and then diff the .sorted files. Tried it and although it doesn't completely dothe trick it's much better than diffing the plain text pg_dump outputs (3000+ diff lines vs. less than 200 and about half of that are actual differences). Thanks, Joe
On Tue, Aug 30, 2011 at 3:14 PM, Joe Abbate <jma@freedomcircle.com> wrote: > Hola Jaime, > > On 08/30/2011 03:24 PM, Jaime Casanova wrote: >> >> what about using pg_dump -Fc -Osx and use pg_restore -l to list >> objects. then you can sort and compare objects and then a script that >> compare schema of objects extracting them with -P, -T or -t > > That appears to be of limited use (i.e., it would only work for functions, > triggers and tables). pg_restore -L/--use_list is more comprehensive. > So the script would have to do something like the following: > > $ pg_dump -Fc -Osx postgis > postgis.dump > $ pg_restore -l postgis.dump | sort -k4 > postgis.list why not "sort -k4,5"? -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte 24x7 y capacitación
On 08/30/2011 05:33 PM, Jaime Casanova wrote: > On Tue, Aug 30, 2011 at 3:14 PM, Joe Abbate<jma@freedomcircle.com> wrote: >> Hola Jaime, >> >> On 08/30/2011 03:24 PM, Jaime Casanova wrote: >>> >>> what about using pg_dump -Fc -Osx and use pg_restore -l to list >>> objects. then you can sort and compare objects and then a script that >>> compare schema of objects extracting them with -P, -T or -t >> >> That appears to be of limited use (i.e., it would only work for functions, >> triggers and tables). pg_restore -L/--use_list is more comprehensive. >> So the script would have to do something like the following: >> >> $ pg_dump -Fc -Osx postgis> postgis.dump >> $ pg_restore -l postgis.dump | sort -k4> postgis.list > > why not "sort -k4,5"? sort -k4 sorts from the fourth field, the object type, to the end of line. -k4,5 would sort on the type and schema name. I want to sort on object name/attributes as well. BTW, I figured out why it doesn't fully work. For functions, the arguments are listed, e.g., 82; 1255 700618 FUNCTION public _st_covers(geography, geography) jma 459; 1255 700259 FUNCTION public _st_covers(geometry, geometry) jma Unfortunately, for operators, the operand types are not included: 843; 2617 699799 OPERATOR public < jma 1861; 2617 700565 OPERATOR public < jma so the pg_restore -L still keeps the original dump order (geometry before geography). Joe
Joe Abbate <jma@freedomcircle.com> writes: > In order to compare the schema of two presumably identical databases, > I've been diffing the output of pg_dump -Osx. However, I've found that > the order of the output is not very reliable. Yeah, we've been around on that before. pg_dump does actually sort the output items (modulo dependency requirements), but it sorts by the same "tag" values that are printed by pg_restore -l, and those aren't currently designed to be unique. It's not too clear if we could get away with changing the definitions of the tag strings. regards, tom lane
On 08/30/2011 06:07 PM, Tom Lane wrote: > Yeah, we've been around on that before. pg_dump does actually sort the > output items (modulo dependency requirements), but it sorts by the same > "tag" values that are printed by pg_restore -l, and those aren't currently > designed to be unique. It's not too clear if we could get away with > changing the definitions of the tag strings. The approach suggested by Jaime works fairly well. The only change I would make is to add OPERATOR args to the pg_restore -l output, e.g., 1843; 2617 699799 OPERATOR public <(geometry, geometry) jma 1861; 2617 700565 OPERATOR public <(geography, geography) jma Joe
* Joe Abbate (jma@freedomcircle.com) wrote: > In order to compare the schema of two presumably identical > databases, I've been diffing the output of pg_dump -Osx. I'm not sure exactly how it does it, but check_postgres.pl offers this. http://bucardo.org/wiki/Check_postgres It also offers a whole slew of other useful things to monitor. Thanks, Stephen
Hi Stephen, On 08/30/2011 07:11 PM, Stephen Frost wrote: > * Joe Abbate (jma@freedomcircle.com) wrote: >> In order to compare the schema of two presumably identical >> databases, I've been diffing the output of pg_dump -Osx. > > I'm not sure exactly how it does it, but check_postgres.pl offers this. > > http://bucardo.org/wiki/Check_postgres > > It also offers a whole slew of other useful things to monitor. Note that what I'm looking for is something to compare just about EVERYTHING DDL under the PostgreSQL sun: tables, types, functions, operators, etc. The description of same_schema appears to imply only a subset of objects are compared (in fact, looking at the code, I can confirm that limitation). BTW, I tried installing check_postgres, but not being much into Perl and not knowing what dependencies it has, "make test" failed 38/42 tests. Joe
-----BEGIN PGP SIGNED MESSAGE----- Hash: RIPEMD160 > Note that what I'm looking for is something to compare just about > EVERYTHING DDL under the PostgreSQL sun: tables, types, functions, > operators, etc. The description of same_schema appears to imply only a > subset of objects are compared (in fact, looking at the code, I can > confirm that limitation). You should try the latest version in git (which will soon be released as 2.18.0). The same_schema check has been overhauled, and now can also store a copy of a databases state to allow checking the same database over time to see what has changed. It doesn't check *everything* yet, but the only things missing are some of the more obscure items such as custom conversions. It should be pretty easy to add in anything that is not already covered, even for someone not versed in Perl. > BTW, I tried installing check_postgres, but not being much into Perl and > not knowing what dependencies it has, "make test" failed 38/42 tests. That's not much to worry about. It's a pretty straightforward script, in that it is very easy to determine if it is working for you or not, even if some of the tests fail. :) >> I'm not exactly sure how it does it check_postgres queries the system catalogs, normalizes some things based on the version, and creates a Perl object representation of the database. It then compares that to the same thing from a different database/server, or to a frozen version of an earlier scan. - -- Greg Sabino Mullane greg@turnstep.com End Point Corporation http://www.endpoint.com/ PGP Key: 0x14964AC8 201108302203 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -----BEGIN PGP SIGNATURE----- iEYEAREDAAYFAk5dl28ACgkQvJuQZxSWSsidhwCeMGEx8eVeaPlyRALuh8VuQ+rN ynYAoLDGLOFNVbj3+NnRvZpLfgmh6Mgu =w1eI -----END PGP SIGNATURE-----
On tis, 2011-08-30 at 19:11 -0400, Stephen Frost wrote: > * Joe Abbate (jma@freedomcircle.com) wrote: > > In order to compare the schema of two presumably identical > > databases, I've been diffing the output of pg_dump -Osx. > > I'm not sure exactly how it does it, but check_postgres.pl offers this. > > http://bucardo.org/wiki/Check_postgres That tool is also not without bugs in this regard. Also, the interface it works with necessarily doesn't offer a good way to examine the differences in detail; it only shows you that there are differences.
On tis, 2011-08-30 at 18:07 -0400, Tom Lane wrote: > Joe Abbate <jma@freedomcircle.com> writes: > > In order to compare the schema of two presumably identical databases, > > I've been diffing the output of pg_dump -Osx. However, I've found that > > the order of the output is not very reliable. > > Yeah, we've been around on that before. pg_dump does actually sort the > output items (modulo dependency requirements), but it sorts by the same > "tag" values that are printed by pg_restore -l, and those aren't currently > designed to be unique. It's not too clear if we could get away with > changing the definitions of the tag strings. It's a bit strange that the tag for a trigger is "name" but the tag for the trigger's comment is "name ON table". Not having the table name in the trigger tag sounds wrong, because it makes the tag not very useful for selecting the trigger from the TOC.
Peter Eisentraut <peter_e@gmx.net> writes: > On tis, 2011-08-30 at 18:07 -0400, Tom Lane wrote: >> Yeah, we've been around on that before. pg_dump does actually sort the >> output items (modulo dependency requirements), but it sorts by the same >> "tag" values that are printed by pg_restore -l, and those aren't currently >> designed to be unique. It's not too clear if we could get away with >> changing the definitions of the tag strings. > It's a bit strange that the tag for a trigger is "name" but the tag for > the trigger's comment is "name ON table". Not having the table name in > the trigger tag sounds wrong, because it makes the tag not very useful > for selecting the trigger from the TOC. I don't think changing that would be a problem. What gets unpleasant is trying to guarantee that pg_dump object tags are unconditionally unique. That would, for example, mean that every argument type of every function would have to be written out fully-schema-qualified. Short of that sort of anal-retentiveness, there are going to be cases where the dump order is a bit unpredictable. IMO what we need is a reasonable compromise between verbosity and uniqueness, such that in normal cases (ie, where you *didn't* intentionally create near-identical functions in different schemas) you get a unique ordering. To get to that, somebody's got to go through all the tag writing code and identify where the trouble spots are. So far we've heard triggers and operators nominated ... what else? regards, tom lane
On 08/31/2011 10:17 AM, Tom Lane wrote: > Short of that sort of anal-retentiveness, there are going to be cases > where the dump order is a bit unpredictable. IMO what we need is a > reasonable compromise between verbosity and uniqueness, such that in > normal cases (ie, where you *didn't* intentionally create near-identical > functions in different schemas) you get a unique ordering. To get to > that, somebody's got to go through all the tag writing code and identify > where the trouble spots are. So far we've heard triggers and operators > nominated ... what else? So far, for Pyrseas, I've tested aggregates, casts, constraint triggers, conversions, domains, functions, indexes, languages, operators, rules, schemas, sequences, tables (including check constraints, primary keys, foreign keys, unique constraints and inherited tables), triggers, types (base and composite), views and comments on the various objects. I'll be testing operator classes and operator families in the coming weeks. So far, triggers and operators are the only ones that have caused an issue when using the technique suggested by Jaime (pg_dump -Fc followed by pg_restore -l). Functions also caused problems in the plain text pg_dump, e.g., because funcx(geography) sorts after funcx(geometry) if the latter is created first. Joe
Hi Joe, I have run into what seems to be a similar issue with pg_dump --schema-only in its trigger ordering. Did you ever find a satisfactory solution to this? I posted my specific problem on DBA.StackExchange <http://dba.stackexchange.com/questions/123691/output-from-pg-dump-schema-only-has-inconsistent-order> , and based on some research I did, it seems like it could be an issue related to the Collate setting of the DB. I was wondering if you had come across anything supporting or refuting that. Thanks, -Randall -- View this message in context: http://postgresql.nabble.com/Comparing-two-PostgreSQL-databases-order-of-pg-dump-output-tp4751332p5877720.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.