Thread: Comparing two PostgreSQL databases -- order of pg_dump output

Comparing two PostgreSQL databases -- order of pg_dump output

From
Joe Abbate
Date:
Hi,

In order to compare the schema of two presumably identical databases, 
I've been diffing the output of pg_dump -Osx.  However, I've found that 
the order of the output is not very reliable.  For example, after 
recreating the Pagila sample database, I find the following:

--- pagila.dmp    2011-08-26 14:34:48.000000000 -0400
+++ pagila.dev-dmp    2011-08-26 14:34:47.000000000 -0400
@@ -1140,7 +1140,7 @@ --
 CREATE TRIGGER last_updated
-    BEFORE UPDATE ON city
+    BEFORE UPDATE ON actor     FOR EACH ROW     EXECUTE PROCEDURE last_updated();

@@ -1160,7 +1160,7 @@ --
 CREATE TRIGGER last_updated
-    BEFORE UPDATE ON customer
+    BEFORE UPDATE ON category     FOR EACH ROW     EXECUTE PROCEDURE last_updated();
...

The same triggers exist on both databases, it's just that the order is 
different (apparently they're output in creation order).  This even more 
crucial with PostGIS databases, which have several hundred function and 
operator pairs where the only difference is one takes arguments of type 
geometry and the other uses type geography.  There the pg_dump diff 
approach is nearly useless.

I thought that comparing database schemas would be quite desirable, 
e.g., between development/test and production databases.  Is there 
perhaps some mechanism or tool that people use for this purpose, or is 
this not a requirement?

Incidentally, these comparisons are for the Pyrseas tools I'm 
developing.  The output of dbtoyaml is predictable (not because of 
anything I wrote, but because pyyaml outputs everything in alphabetical 
order), and I can compare the YAML outputs quite nicely (however, it 
doesn't show me things I haven't implemented yet, e.g., OPERATOR CLASSes 
in the case of PostGIS).

Joe


Re: Comparing two PostgreSQL databases -- order of pg_dump output

From
Jaime Casanova
Date:
On Tue, Aug 30, 2011 at 2:07 PM, Joe Abbate <jma@freedomcircle.com> wrote:
> Hi,
>
> In order to compare the schema of two presumably identical databases, I've
> been diffing the output of pg_dump -Osx.  However, I've found that the order
> of the output is not very reliable.

what about using pg_dump -Fc -Osx and use pg_restore -l to list
objects. then you can sort and compare objects and then a script that
compare schema of objects extracting them with -P, -T or -t

--
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación


Re: Comparing two PostgreSQL databases -- order of pg_dump output

From
Joe Abbate
Date:
Hola Jaime,

On 08/30/2011 03:24 PM, Jaime Casanova wrote:
> what about using pg_dump -Fc -Osx and use pg_restore -l to list
> objects. then you can sort and compare objects and then a script that
> compare schema of objects extracting them with -P, -T or -t

That appears to be of limited use (i.e., it would only work for 
functions, triggers and tables).  pg_restore -L/--use_list is more 
comprehensive.    So the script would have to do something like the following:

$ pg_dump -Fc -Osx postgis > postgis.dump
$ pg_restore -l postgis.dump | sort -k4  > postgis.list
$ pg_restore -L postgis.list postgis.dump > postgis.sorted

Rinse and repeat on the second database and then diff the .sorted files.  Tried it and although it doesn't completely
dothe trick it's much 
 
better than diffing the plain text pg_dump outputs (3000+ diff lines vs. 
less than 200 and about half of that are actual differences).

Thanks,

Joe


Re: Comparing two PostgreSQL databases -- order of pg_dump output

From
Jaime Casanova
Date:
On Tue, Aug 30, 2011 at 3:14 PM, Joe Abbate <jma@freedomcircle.com> wrote:
> Hola Jaime,
>
> On 08/30/2011 03:24 PM, Jaime Casanova wrote:
>>
>> what about using pg_dump -Fc -Osx and use pg_restore -l to list
>> objects. then you can sort and compare objects and then a script that
>> compare schema of objects extracting them with -P, -T or -t
>
> That appears to be of limited use (i.e., it would only work for functions,
> triggers and tables).  pg_restore -L/--use_list is more comprehensive.
> So the script would have to do something like the following:
>
> $ pg_dump -Fc -Osx postgis > postgis.dump
> $ pg_restore -l postgis.dump | sort -k4  > postgis.list

why not "sort -k4,5"?

--
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación


Re: Comparing two PostgreSQL databases -- order of pg_dump output

From
Joe Abbate
Date:
On 08/30/2011 05:33 PM, Jaime Casanova wrote:
> On Tue, Aug 30, 2011 at 3:14 PM, Joe Abbate<jma@freedomcircle.com>  wrote:
>> Hola Jaime,
>>
>> On 08/30/2011 03:24 PM, Jaime Casanova wrote:
>>>
>>> what about using pg_dump -Fc -Osx and use pg_restore -l to list
>>> objects. then you can sort and compare objects and then a script that
>>> compare schema of objects extracting them with -P, -T or -t
>>
>> That appears to be of limited use (i.e., it would only work for functions,
>> triggers and tables).  pg_restore -L/--use_list is more comprehensive.
>> So the script would have to do something like the following:
>>
>> $ pg_dump -Fc -Osx postgis>  postgis.dump
>> $ pg_restore -l postgis.dump | sort -k4>  postgis.list
>
> why not "sort -k4,5"?

sort -k4 sorts from the fourth field, the object type, to the end of 
line.  -k4,5 would sort on the type and schema name.  I want to sort on 
object name/attributes as well.  BTW, I figured out why it doesn't fully 
work.  For functions, the arguments are listed, e.g.,

82; 1255 700618 FUNCTION public _st_covers(geography, geography) jma
459; 1255 700259 FUNCTION public _st_covers(geometry, geometry) jma

Unfortunately, for operators, the operand types are not included:

843; 2617 699799 OPERATOR public < jma
1861; 2617 700565 OPERATOR public < jma

so the pg_restore -L still keeps the original dump order (geometry 
before geography).

Joe


Re: Comparing two PostgreSQL databases -- order of pg_dump output

From
Tom Lane
Date:
Joe Abbate <jma@freedomcircle.com> writes:
> In order to compare the schema of two presumably identical databases, 
> I've been diffing the output of pg_dump -Osx.  However, I've found that 
> the order of the output is not very reliable.

Yeah, we've been around on that before.  pg_dump does actually sort the
output items (modulo dependency requirements), but it sorts by the same
"tag" values that are printed by pg_restore -l, and those aren't currently
designed to be unique.  It's not too clear if we could get away with
changing the definitions of the tag strings.
        regards, tom lane


Re: Comparing two PostgreSQL databases -- order of pg_dump output

From
Joe Abbate
Date:
On 08/30/2011 06:07 PM, Tom Lane wrote:
> Yeah, we've been around on that before.  pg_dump does actually sort the
> output items (modulo dependency requirements), but it sorts by the same
> "tag" values that are printed by pg_restore -l, and those aren't currently
> designed to be unique.  It's not too clear if we could get away with
> changing the definitions of the tag strings.

The approach suggested by Jaime works fairly well.  The only change I 
would make is to add OPERATOR args to the pg_restore -l output, e.g.,

1843; 2617 699799 OPERATOR public <(geometry, geometry) jma
1861; 2617 700565 OPERATOR public <(geography, geography) jma

Joe


Re: Comparing two PostgreSQL databases -- order of pg_dump output

From
Stephen Frost
Date:
* Joe Abbate (jma@freedomcircle.com) wrote:
> In order to compare the schema of two presumably identical
> databases, I've been diffing the output of pg_dump -Osx.

I'm not sure exactly how it does it, but check_postgres.pl offers this.

http://bucardo.org/wiki/Check_postgres

It also offers a whole slew of other useful things to monitor.
Thanks,
    Stephen

Re: Comparing two PostgreSQL databases -- order of pg_dump output

From
Joe Abbate
Date:
Hi Stephen,

On 08/30/2011 07:11 PM, Stephen Frost wrote:
> * Joe Abbate (jma@freedomcircle.com) wrote:
>> In order to compare the schema of two presumably identical
>> databases, I've been diffing the output of pg_dump -Osx.
>
> I'm not sure exactly how it does it, but check_postgres.pl offers this.
>
> http://bucardo.org/wiki/Check_postgres
>
> It also offers a whole slew of other useful things to monitor.

Note that what I'm looking for is something to compare just about 
EVERYTHING DDL under the PostgreSQL sun: tables, types, functions, 
operators, etc. The description of same_schema appears to imply only a 
subset of objects are compared (in fact, looking at the code, I can 
confirm that limitation).

BTW, I tried installing check_postgres, but not being much into Perl and 
not knowing what dependencies it has, "make test" failed 38/42 tests.

Joe


Re: Comparing two PostgreSQL databases -- order of pg_dump output

From
"Greg Sabino Mullane"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160


> Note that what I'm looking for is something to compare just about 
> EVERYTHING DDL under the PostgreSQL sun: tables, types, functions, 
> operators, etc. The description of same_schema appears to imply only a 
> subset of objects are compared (in fact, looking at the code, I can 
> confirm that limitation).

You should try the latest version in git (which will soon be released 
as 2.18.0). The same_schema check has been overhauled, and now can also 
store a copy of a databases state to allow checking the same database 
over time to see what has changed. It doesn't check *everything* yet, 
but the only things missing are some of the more obscure items such 
as custom conversions. It should be pretty easy to add in anything 
that is not already covered, even for someone not versed in Perl.

> BTW, I tried installing check_postgres, but not being much into Perl and 
> not knowing what dependencies it has, "make test" failed 38/42 tests.

That's not much to worry about. It's a pretty straightforward script, 
in that it is very easy to determine if it is working for you or not, 
even if some of the tests fail. :)

>> I'm not exactly sure how it does it

check_postgres queries the system catalogs, normalizes some things based 
on the version, and creates a Perl object representation of the database. 
It then compares that to the same thing from a different database/server, 
or to a frozen version of an earlier scan.

- -- 
Greg Sabino Mullane greg@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201108302203
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAk5dl28ACgkQvJuQZxSWSsidhwCeMGEx8eVeaPlyRALuh8VuQ+rN
ynYAoLDGLOFNVbj3+NnRvZpLfgmh6Mgu
=w1eI
-----END PGP SIGNATURE-----



Re: Comparing two PostgreSQL databases -- order of pg_dump output

From
Peter Eisentraut
Date:
On tis, 2011-08-30 at 19:11 -0400, Stephen Frost wrote:
> * Joe Abbate (jma@freedomcircle.com) wrote:
> > In order to compare the schema of two presumably identical
> > databases, I've been diffing the output of pg_dump -Osx.  
> 
> I'm not sure exactly how it does it, but check_postgres.pl offers this.
> 
> http://bucardo.org/wiki/Check_postgres

That tool is also not without bugs in this regard.

Also, the interface it works with necessarily doesn't offer a good way
to examine the differences in detail; it only shows you that there are
differences.




Re: Comparing two PostgreSQL databases -- order of pg_dump output

From
Peter Eisentraut
Date:
On tis, 2011-08-30 at 18:07 -0400, Tom Lane wrote:
> Joe Abbate <jma@freedomcircle.com> writes:
> > In order to compare the schema of two presumably identical databases, 
> > I've been diffing the output of pg_dump -Osx.  However, I've found that 
> > the order of the output is not very reliable.
> 
> Yeah, we've been around on that before.  pg_dump does actually sort the
> output items (modulo dependency requirements), but it sorts by the same
> "tag" values that are printed by pg_restore -l, and those aren't currently
> designed to be unique.  It's not too clear if we could get away with
> changing the definitions of the tag strings.

It's a bit strange that the tag for a trigger is "name" but the tag for
the trigger's comment is "name ON table".  Not having the table name in
the trigger tag sounds wrong, because it makes the tag not very useful
for selecting the trigger from the TOC.




Re: Comparing two PostgreSQL databases -- order of pg_dump output

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> On tis, 2011-08-30 at 18:07 -0400, Tom Lane wrote:
>> Yeah, we've been around on that before.  pg_dump does actually sort the
>> output items (modulo dependency requirements), but it sorts by the same
>> "tag" values that are printed by pg_restore -l, and those aren't currently
>> designed to be unique.  It's not too clear if we could get away with
>> changing the definitions of the tag strings.

> It's a bit strange that the tag for a trigger is "name" but the tag for
> the trigger's comment is "name ON table".  Not having the table name in
> the trigger tag sounds wrong, because it makes the tag not very useful
> for selecting the trigger from the TOC.

I don't think changing that would be a problem.  What gets unpleasant is
trying to guarantee that pg_dump object tags are unconditionally unique.
That would, for example, mean that every argument type of every function
would have to be written out fully-schema-qualified.

Short of that sort of anal-retentiveness, there are going to be cases
where the dump order is a bit unpredictable.  IMO what we need is a
reasonable compromise between verbosity and uniqueness, such that in
normal cases (ie, where you *didn't* intentionally create near-identical
functions in different schemas) you get a unique ordering.  To get to
that, somebody's got to go through all the tag writing code and identify
where the trouble spots are.  So far we've heard triggers and operators
nominated ... what else?
        regards, tom lane


Re: Comparing two PostgreSQL databases -- order of pg_dump output

From
Joe Abbate
Date:
On 08/31/2011 10:17 AM, Tom Lane wrote:
> Short of that sort of anal-retentiveness, there are going to be cases
> where the dump order is a bit unpredictable.  IMO what we need is a
> reasonable compromise between verbosity and uniqueness, such that in
> normal cases (ie, where you *didn't* intentionally create near-identical
> functions in different schemas) you get a unique ordering.  To get to
> that, somebody's got to go through all the tag writing code and identify
> where the trouble spots are.  So far we've heard triggers and operators
> nominated ... what else?

So far, for Pyrseas, I've tested aggregates, casts, constraint triggers, 
conversions, domains, functions, indexes, languages, operators, rules, 
schemas, sequences, tables (including check constraints, primary keys, 
foreign keys, unique constraints and inherited tables), triggers, types 
(base and composite), views and comments on the various objects.  I'll 
be testing operator classes and operator families in the coming weeks. 
So far, triggers and operators are the only ones that have caused an 
issue when using the technique suggested by Jaime (pg_dump -Fc followed 
by pg_restore -l).  Functions also caused problems in the plain text 
pg_dump, e.g., because funcx(geography) sorts after funcx(geometry) if 
the latter is created first.

Joe


Re: Comparing two PostgreSQL databases -- order of pg_dump output

From
rsindlin
Date:
Hi Joe,

I have run into what seems to be a similar issue with pg_dump --schema-only
in its trigger ordering.  Did you ever find a satisfactory solution to this? 
I posted my specific problem on  DBA.StackExchange
<http://dba.stackexchange.com/questions/123691/output-from-pg-dump-schema-only-has-inconsistent-order> 
, and based on some research I did, it seems like it could be an issue
related to the Collate setting of the DB.  I was wondering if you had come
across anything supporting or refuting that.

Thanks,
-Randall



--
View this message in context:
http://postgresql.nabble.com/Comparing-two-PostgreSQL-databases-order-of-pg-dump-output-tp4751332p5877720.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.