Thread: pg_dump formatting

pg_dump formatting

From

Caleb Cushing

Date:

20 March 2009, 01:35:44

is there any way to format the output of pg_dump? in a way that is
more diff-able? like line wrap on each row, record or something? that
way you can easily find the differences of each dump if need be?

--
Caleb Cushing

http://xenoterracide.blogspot.com

Re: pg_dump formatting

From

Bruce Momjian

Date:

20 March 2009, 11:31:41

Caleb Cushing wrote:
> is there any way to format the output of pg_dump? in a way that is
> more diff-able? like line wrap on each row, record or something? that
> way you can easily find the differences of each dump if need be?

No, but there are database comparison tools to do that, pgdiff being one
of them:

    http://pgdiff.sourceforge.net/

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: pg_dump formatting

From

Caleb Cushing

Date:

20 March 2009, 16:47:27

On Fri, Mar 20, 2009 at 10:31 AM, Bruce Momjian <bruce@momjian.us> wrote:
>
> No, but there are database comparison tools to do that, pgdiff being one
> of them:

after reading a bit on it (docs seem a bit light but I also looked at
the mysql one), I don't think it solves my problem at all. it solves a
similar one. the way it reads suggests I have access to both
databases, I don't. I only have access to 1 (set up on my box from the
dump) and the dump from the other.

at work we dump our databases and add them to the git repository. and
pass the source back and forth, (I think it's a bit inefficient, but
I'm not the boss).

the problem is 2 fold, 1 it's impossible to get a human readable dump
diff, 2 the same reason is going to 'cause the git repository to get
large than it should because it stores blobs and these long lines
would each end up as a blob even though the character difference might
be like 1 character.

I suppose I could write a regex script that reformats the dump,
personally that scares me a bit.

it'd be great if pg could have the option to dump in a more 'diff-able' way.
--
Caleb Cushing

http://xenoterracide.blogspot.com

Re: pg_dump formatting

From

Tom Lane

Date:

21 March 2009, 01:58:16

Caleb Cushing <xenoterracide@gmail.com> writes:
> it'd be great if pg could have the option to dump in a more 'diff-able' way.

What exactly do you find un-diffable about it?

            regards, tom lane

Re: pg_dump formatting

From

Caleb Cushing

Date:

22 March 2009, 01:26:11

On Sat, Mar 21, 2009 at 12:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> What exactly do you find un-diffable about it?

after doing a bit of testing to be more explicit... pg isn't half as
bad as mysql, so I've mostly been asking about something that doesn't
apply...

however, it seems that it creates 1 line per row, and although this is
much better than mysql's behavior. for larger tables or fields this
would still be a bit harder to diff.


COPY blarg (blah, bleh, blerg) FROM stdin;
1       random character data

djfa;djjf;sdjl;afkjeoiuoiejk,cxjueiojiojeef98hkjdyf98y92hvniay8syfkdnf38932hrhf9e83uifnskjjjjjj3h9r83hhjnnn2iyfhkjsndfi7y938hnksu879hf089h3n299ssdjfh923

the above in pg_dump is one line

it'd be easier to diff if it were say on 3, or best yet with the
exception of say, binary data or things that can't be wrapped, have
the data (for like text records) wrapped at 79 characters or maybe 78
with \ at the end or something...

mysql is much more abysmal, and I believe it puts the entire table on
one line... or some such insanity.
--
Caleb Cushing

http://xenoterracide.blogspot.com

Re: pg_dump formatting

From

Peter Eisentraut

Date:

22 March 2009, 07:40:54

On Sunday 22 March 2009 06:26:02 Caleb Cushing wrote:
> COPY blarg (blah, bleh, blerg) FROM stdin;
> 1       random character data
> djfa;djjf;sdjl;afkjeoiuoiejk,cxjueiojiojeef98hkjdyf98y92hvniay8syfkdnf38932
>hrhf9e83uifnskjjjjjj3h9r83hhjnnn2iyfhkjsndfi7y938hnksu879hf089h3n299ssdjfh92
>3
>
> the above in pg_dump is one line
>
> it'd be easier to diff if it were say on 3, or best yet with the
> exception of say, binary data or things that can't be wrapped, have
> the data (for like text records) wrapped at 79 characters or maybe 78
> with \ at the end or something...

Maybe wdiff will do what you want.

Re: pg_dump formatting

From

Tom Lane

Date:

22 March 2009, 11:44:10

Peter Eisentraut <peter_e@gmx.net> writes:
> On Sunday 22 March 2009 06:26:02 Caleb Cushing wrote:
>> COPY blarg (blah, bleh, blerg) FROM stdin;
>> 1       random character data
>> djfa;djjf;sdjl;afkjeoiuoiejk,cxjueiojiojeef98hkjdyf98y92hvniay8syfkdnf38932
>> hrhf9e83uifnskjjjjjj3h9r83hhjnnn2iyfhkjsndfi7y938hnksu879hf089h3n299ssdjfh92
>> 3
>>
>> the above in pg_dump is one line
>>
>> it'd be easier to diff if it were say on 3, or best yet with the
>> exception of say, binary data or things that can't be wrapped, have
>> the data (for like text records) wrapped at 79 characters or maybe 78
>> with \ at the end or something...

> Maybe wdiff will do what you want.

I don't think we'd consider making the kind of changes to the COPY
specification that would be needed to allow this sort of thing in COPY
mode.  However, it'd be pretty trivial to put \n instead of space
between fields in the INSERT dump formats.  I wonder whether that would
answer the need ...

            regards, tom lane

Re: pg_dump formatting

From

Tom Lane

Date:

22 March 2009, 14:09:52

I wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
>> Maybe wdiff will do what you want.

> I don't think we'd consider making the kind of changes to the COPY
> specification that would be needed to allow this sort of thing in COPY
> mode.  However, it'd be pretty trivial to put \n instead of space
> between fields in the INSERT dump formats.  I wonder whether that would
> answer the need ...

On further reflection I think Peter's suggestion is better.  It's not
hard to think of cases where a data-value-per-line format is *less*
useful not more so for diff-ing, because you lose all sense that a
table row is one logical unit.  What the OP seems to be concerned about
is insertion/modification/deletion of words within fairly long data
fields.  wdiff is ideal for that problem, while simply breaking at field
boundaries wouldn't help all that much.

            regards, tom lane