Re: xml import/export tools and performance - Mailing list pgsql-general

From Ries van Twisk
Subject Re: xml import/export tools and performance
Date
Msg-id FF8197B1-A645-489C-9DC1-CE2BE801C181@rvt.dds.nl
Whole thread Raw
In response to xml import/export tools and performance  (Ivan Sergio Borgonovo <mail@webthatworks.it>)
List pgsql-general

On Nov 6, 2009, at 2:52 AM, Ivan Sergio Borgonovo wrote:

I need to import/export through xml files.

I was looking for tools/examples/HOWTO for postgresql.

Right now I still don't know the details of the requirements.

I know I won't need a GUI.

I know one of the exporting parties will be a MS SQL 2005 server, so
it would be nice if there was an easy way to import xml generated
with the FOR XML AUTO, XMLSCHEMA ('forpg').

I'd like to have a tool that can write XSD from queries
automatically.

I may need to strictly specify one of the xml output format since
one of the importing parties pretend to support xml with something
that is more like a tagged csv.

Currently I'm mostly looking around to see where it would be
convenient to move the boundaries of the problem.

I'd be curious about what kind of performance impact they may have
compared to COPY.

thanks


Ivan,

I have been using JasperETL for this purpose. But like what others say,
XML is very in efficient for this task.

Currently I am in the process if importing XML feeds into PostgreSQL.
At first I used the ETL tool to write to the correct table by creating joins
into the ETL tool.

However, this slows things down quite a bit (but the process is VERY manageable).
Currently I keep teh route within JasperETL as short as possible in teh following steps

- load XML
- remove columns I don't need
- cleanup some data using a javarow 
- load into a staging table

From there I execute a stored procedure to normalize the data (create additional records where needed in
other foreign tables) and push data into production. Triggers on my production table
do some additional work for tsearch2 and some other small stuff.

In my case, on a 32Bit system loading a 200Mb XML file consumes 1Gb real memory.
On 64Bit systems with Java 1.6 you can push this further down the road though.


if you can export to CSV and import using copy into PG, I would go for that route
if you can control the format of the CSV well and reliable, because detecting errors
in a CSV is much harder then in XML (missing column in CSV can mess up data)

Ries





pgsql-general by date:

Previous
From: Garry Saddington
Date:
Subject: pgcrypto
Next
From: "Albe Laurenz"
Date:
Subject: Re: MD5 Authentication