Home > mailing lists

Re: xml-file as foreign table? - Mailing list pgsql-general

From	Arjen Nienhuis
Subject	Re: xml-file as foreign table?
Date	April 29, 2016 08:01:31
Msg-id	CAG6W84KcdJaDjJ_CDRnvTQc2JBdSzHJQQgbQ8fyqaLeBOvaLyQ@mail.gmail.com Whole thread Raw
In response to	xml-file as foreign table? (Johann Spies <johann.spies@gmail.com>)
List	pgsql-general

Tree view

On Apr 28, 2016 14:33, "Johann Spies" <johann.spies@gmail.com> wrote:
>
> I have several large (7GB+) xml files to get into an SQL database.
>
> The xml-files can contain up to 500 000 subrecords which I want to be able to query in the database.
>
> They are too large to do something like this:
>
>
> insert into rawxml (xml) select XMLPARSE (DOCUMENT CONVERT_FROM(PG_READ_BINARY_FILE('FOO.xml' ), 'UTF8'));
>
> If it were possible, each file would be one huge record in the table which can then be unpacked using XPATH.
>
>
> The options I am considering is :
>
> 1. Unpack the individual records (will be more than 50 million) using something like python with lxml and psycopg2 and insert them after dropping all indexes and triggers on the table
>
> 2. Unpack the individual records and write a (very) large tsv-file and then insert it using 'copy'
>

The fastest way I found is to combine these two. Using iterparse from lxml combined with load_rows and COPY from py-postgresql:

http://python.projects.pgfoundry.org/docs/1.1/driver.html#copy-statements

That way you can stream the data.

> It would be convenient If I could use the present xml files as 'foreign tables' and parse them using the xpath-capabilities of PostgreSQL.
>
> Is this possible?
>

There is a multicorn fdw for that:

https://github.com/Kozea/Multicorn/blob/master/python /multicorn /xmlfdw.py

But I never tried it. It looks like it loads all rows in a python list.

Groeten, Arjen

pgsql-general by date:

From: Tim van der Linden
Date: 29 April 2016, 03:19:22
Subject: Re: Slow join over three tables

From: Yogesh Sharma
Date: 29 April 2016, 13:27:13
Subject: Re: Issue during postgresql startup

Re: xml-file as foreign table? - Mailing list pgsql-general

Previous

Next