Thread: pg_dump - sort data?

pg_dump - sort data?

From
Michael Swierczek
Date:
Ladies and Gentlemen:
We have several Postgres 7.4.x and 8.x databases running on different machines which are not able to connect directly over IP (including VPN).  So periodically, data from tables is extracted in .csv format, zipped, encrypted, and transferred to other locations.   I have a set of Java JDBC apps that perform the integration of the extracted data.  At this point in time, the largest database creates a 25MB file from pg_dump.

When I am adding new features to the JDBC apps, I need to run tests periodically to ensure that the integration worked completely.  For very small test data sets, this is easy - just query the test database afterwards to make sure the new data is in place.   For larger test data sets with hundreds or thousands of inserts, it's not feasible.  

What I have been doing is running pg_dump on the test database, running my application with a large test data set, running pg_dump again, and trying to compare the result.   The only problem is, pg_dump does not seem to sort the contents of the "Copy <table> ... " sections.  So the before and after files can have the exact same data in totally different order.  What I do is sort the contents of the before and after dump files using Gnu sort on cygwin.   That puts the contents in alphanumeric order, but naturally it does not keep them seperate by table. 

So my questions are:
1. Does anyone see an easier way for me to test what I have been doing? 
2. Is there a way to make pg_dump dump the whole database and then sort the results by table?
3. If not 1 & 2, it looks like my best solution is a cygwin shell script, windows batch file, or executable that automates dumping the 10 tables I need to 10 different files and then sorting each file individually.  Does that sound correct?  Are there any existing Postgres tools or add-on tools that could do this for me?

Thanks very much,
Mike

PS While I'm on the list, I have a somewhat off-topic question.   Our product automatically generates PDF reports from sets of data using Crystal Reports.  Other than the Crystal Reports software, everything we use can be run on Linux.   Is anyone aware of a Linux-compatible alternative to Crystal Reports?  Ideally we would like to use open source, but even a proprietary system that ran on Linux would be acceptable.

Re: pg_dump - sort data?

From
Darren R
Date:
Michael,

On your question about Crystal Reports (*GAG*COUGH* -
excuse me, I had a sudden hairball experience):

As you're already using Java, you might consider
implementing an Eclipse deployment - it'll run on
Windoze and Linux. There are plug-ins for most
everything. And there's BIRT - Business Intelligence
Reporting Tool.

http://www.eclipse.org/birt/phoenix/

HTH,

Ren

--- Michael Swierczek <mike.swierczek@gmail.com>
wrote:

> Ladies and Gentlemen:
> We have several Postgres 7.4.x and 8.x databases
> running on different
> machines which are not able to connect directly over
> IP (including VPN).  So
> periodically, data from tables is extracted in .csv
> format, zipped,
> encrypted, and transferred to other locations.   I
> have a set of Java JDBC
> apps that perform the integration of the extracted
> data.  At this point in
> time, the largest database creates a 25MB file from
> pg_dump.
>
> When I am adding new features to the JDBC apps, I
> need to run tests
> periodically to ensure that the integration worked
> completely.  For very
> small test data sets, this is easy - just query the
> test database afterwards
> to make sure the new data is in place.   For larger
> test data sets with
> hundreds or thousands of inserts, it's not feasible.
>
> What I have been doing is running pg_dump on the
> test database, running my
> application with a large test data set, running
> pg_dump again, and trying to
> compare the result.   The only problem is, pg_dump
> does not seem to sort the
> contents of the "Copy <table> ... " sections.  So
> the before and after files
> can have the exact same data in totally different
> order.  What I do is sort
> the contents of the before and after dump files
> using Gnu sort on cygwin.
> That puts the contents in alphanumeric order, but
> naturally it does not keep
> them seperate by table.
>
> So my questions are:
> 1. Does anyone see an easier way for me to test what
> I have been doing?
> 2. Is there a way to make pg_dump dump the whole
> database and then sort the
> results by table?
> 3. If not 1 & 2, it looks like my best solution is a
> cygwin shell script,
> windows batch file, or executable that automates
> dumping the 10 tables I
> need to 10 different files and then sorting each
> file individually.  Does
> that sound correct?  Are there any existing Postgres
> tools or add-on tools
> that could do this for me?
>
> Thanks very much,
> Mike
>
> PS While I'm on the list, I have a somewhat
> off-topic question.   Our
> product automatically generates PDF reports from
> sets of data using Crystal
> Reports.  Other than the Crystal Reports software,
> everything we use can be
> run on Linux.   Is anyone aware of a
> Linux-compatible alternative to Crystal
> Reports?  Ideally we would like to use open source,
> but even a proprietary
> system that ran on Linux would be acceptable.
>


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around
http://mail.yahoo.com