
Re: pg_dump --split patch - Mailing list pgsql-hackers

From	Joel Jacobson
Subject	Re: pg_dump --split patch
Date	December 28, 2010 21:18:55
Msg-id	AANLkTim+sFO7N539V5C+yZFx7_fTFQxdHxtCUyhPD-3V@mail.gmail.com
In response to	Re: pg_dump --split patch (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: pg_dump --split patch Re: pg_dump --split patch
List	pgsql-hackers


2010/12/29 Tom Lane <tgl@sss.pgh.pa.us> wrote:

> If you've solved the deterministic-ordering problem, then this entire
> patch is quite useless. You can just run a normal dump and diff it.

No, that's only half true.

Diff will do a good job of minimizing the "size" of the diff output, yes, but such a diff is still quite useless if you want to quickly grasp the context of a change.

If you have hundreds of functions, just looking at the changed source code is not enough to figure out which functions were modified, unless you have the brain power to memorize every single line of code and can work out the function name just by looking at the old and new lines of code.

To understand a change to my database functions, I would start at the top level, focusing only on the names of the functions that were modified, added or removed.

At this stage, you want as little information as possible about each change, such as only the names of the functions.

To get such a list of changed functions, you cannot simply compare two full plain-text schema dumps using diff, as that would only reveal the changed lines, not the names of the functions, unless you are lucky enough to get the function name within the (by default) 3 lines of context.

You could increase the number of context lines until the function name is guaranteed to appear in the diff, but that defeats the purpose if you want to quickly "get a picture" of which code areas were modified, since you would then have to read through even more lines of diff output.
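
The context problem can be reproduced with Python's standard `difflib` module. The function `transfer_funds` and its body are made up for illustration: the only change sits 12 lines below the `CREATE FUNCTION` line, so a unified diff with the default 3 lines of context shows the edited line without the function name, and the context has to be widened to 12 lines before the name appears.

```python
import difflib
from typing import List, Optional


def make_function(name: str, body_lines: int, changed_line: Optional[int] = None) -> List[str]:
    """Build a made-up plain-text function definition with a long body.

    If changed_line is given, that body statement gets a different value,
    simulating a one-line edit deep inside the function.
    """
    lines = [f"CREATE FUNCTION {name}() RETURNS integer AS $$", "BEGIN"]
    for i in range(body_lines):
        value = i + 1000 if i == changed_line else i
        lines.append(f"    PERFORM {value};")
    lines += ["    RETURN 0;", "END;", "$$ LANGUAGE plpgsql;"]
    return lines


# Hypothetical "before" and "after" versions of one function in a schema dump.
old = make_function("transfer_funds", body_lines=20)
new = make_function("transfer_funds", body_lines=20, changed_line=10)


def diff_text(context: int) -> str:
    """Unified diff of old vs. new with the given number of context lines."""
    return "\n".join(
        difflib.unified_diff(old, new, "old.sql", "new.sql", n=context, lineterm="")
    )


# With the default 3 lines of context the hunk shows the edit, not the function name.
print(diff_text(3))
print("name visible with n=3: ", "transfer_funds" in diff_text(3))   # False
print("name visible with n=12:", "transfer_funds" in diff_text(12))  # True
```

With hundreds of functions, that context width has to cover the longest distance between any edit and its function header, which is exactly the extra reading described above.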

For a less database-centric system where you don't have hundreds of stored procedures, I would agree it's not an issue to keep track of changes by diffing entire schema files, but for extremely database-centric systems, such as the one we have developed at my company, it's not possible to "get the whole picture" of a change by analyzing diffs of entire schema dumps.
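
The top-level summary described above (only the names of added, removed and modified objects) can be sketched in Python. This is not how the patch works; it is a post-processing sketch that assumes pg_dump's plain-text format, which writes a `-- Name: ...; Type: ...; Schema: ...` comment before each object, and groups the dump by those headers. The helpers `split_dump` and `changed_objects` and the abbreviated sample dumps are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Assumed header layout of pg_dump's plain format, e.g.
# "-- Name: add(integer, integer); Type: FUNCTION; Schema: public; Owner: joel"
HEADER = re.compile(r"^-- Name: (?P<name>.*?); Type: (?P<type>.*?); Schema: (?P<schema>[^;]*)")

Key = Tuple[str, str, str]  # (schema, type, name)


def split_dump(text: str) -> Dict[Key, str]:
    """Group a plain-text dump into sections keyed by (schema, type, name)."""
    sections: Dict[Key, List[str]] = {}
    current = None
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            current = (match["schema"], match["type"], match["name"])
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
    return {key: "\n".join(lines) for key, lines in sections.items()}


def changed_objects(old: str, new: str) -> Dict[str, List[Key]]:
    """Report which objects were added, removed or modified, by name only."""
    before, after = split_dump(old), split_dump(new)
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "modified": sorted(k for k in before.keys() & after.keys() if before[k] != after[k]),
    }


# Abbreviated, made-up dumps: add() is edited, neg() is dropped, mul() is new.
OLD = """--
-- Name: add(integer, integer); Type: FUNCTION; Schema: public; Owner: joel
--

CREATE FUNCTION add(integer, integer) RETURNS integer AS $$ SELECT $1 + $2 $$ LANGUAGE sql;

--
-- Name: neg(integer); Type: FUNCTION; Schema: public; Owner: joel
--

CREATE FUNCTION neg(integer) RETURNS integer AS $$ SELECT -$1 $$ LANGUAGE sql;
"""

NEW = """--
-- Name: add(integer, integer); Type: FUNCTION; Schema: public; Owner: joel
--

CREATE FUNCTION add(integer, integer) RETURNS integer AS $$ SELECT $1 + $2 + 0 $$ LANGUAGE sql;

--
-- Name: mul(integer, integer); Type: FUNCTION; Schema: public; Owner: joel
--

CREATE FUNCTION mul(integer, integer) RETURNS integer AS $$ SELECT $1 * $2 $$ LANGUAGE sql;
"""

# Print one line per changed object: the "top-level" view, with no source lines.
for kind, keys in changed_objects(OLD, NEW).items():
    for schema, objtype, name in keys:
        print(f"{kind:<9}{objtype} {schema}.{name}")
```

The summary lists object names only; drilling down into a particular change is then a matter of diffing that one section.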

The patch has been updated:

*) Only split objects with a non-null namespace (schema)

*) Append all objects with the same tag (name), type (desc) and namespace (schema) to the same file (i.e., no longer append -2, -3, as before) (Suggested by David Wilson, thanks.)
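
The two rules above can be sketched as a mapping from a dump object to an output file. The `schema/TYPE/name.sql` layout, the `DumpObject` type and the helper names are hypothetical, since the message does not show the patch's actual file layout. The sketch also assumes objects are keyed by their bare name, without the argument list, so overloaded functions land in one file.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple


class DumpObject(NamedTuple):
    namespace: Optional[str]  # schema name; None for objects without a schema
    desc: str                 # object type, e.g. "FUNCTION"
    tag: str                  # object name, assumed here to exclude the argument list


def split_path(obj: DumpObject) -> Optional[str]:
    """Hypothetical output file for an object, or None if it is not split out."""
    if obj.namespace is None:
        return None  # rule 1: only objects with a non-null namespace are split
    return f"{obj.namespace}/{obj.desc}/{obj.tag}.sql"


def group_by_file(
    objects: Iterable[Tuple[DumpObject, str]]
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Collect each object's SQL under its file; keep unsplit objects separately."""
    files: Dict[str, List[str]] = {}
    unsplit: List[str] = []
    for obj, sql in objects:
        path = split_path(obj)
        if path is None:
            unsplit.append(sql)
        else:
            # rule 2: same (namespace, desc, tag) is appended to the same file
            # instead of getting a -2, -3 suffix as before
            files.setdefault(path, []).append(sql)
    return files, unsplit


objects = [
    (DumpObject(None, "SCHEMA", "accounting"), "CREATE SCHEMA accounting;"),
    (DumpObject("accounting", "FUNCTION", "transfer"), "CREATE FUNCTION accounting.transfer(integer) ..."),
    (DumpObject("accounting", "FUNCTION", "transfer"), "CREATE FUNCTION accounting.transfer(integer, integer) ..."),
]
files, unsplit = group_by_file(objects)
for path, statements in files.items():
    print(path, len(statements))
```

Keying on the bare name is what makes overloads share a file; it also means the order of statements within that file simply follows the order in which the objects arrive.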

I also experimented with adding "ORDER BY pronargs" and "ORDER BY pronargs DESC" to the queries in getFuncs() in pg_dump.c, but it had no effect on the order in which functions with the same name but different numbers of arguments were dumped.

Perhaps functions are already sorted?

Anyway, it doesn't matter that much; keeping all functions with the same name in the same file is a fair trade-off, I think. The main advantage is the ability to quickly get a picture of the names of all changed functions; the secondary one is optimizing the actual diff output.

--
Best regards,

Joel Jacobson
Glue Finance

E: jj@gluefinance.com
T: +46 70 360 38 01

Postal address:
Glue Finance AB
Box 549
114 11 Stockholm
Sweden

Visiting address:
Glue Finance AB
Birger Jarlsgatan 14
114 34 Stockholm
Sweden

Attachment
