Re: Using COPY to import large xml file - Mailing list pgsql-general

From Christoph Moench-Tegeder
Subject Re: Using COPY to import large xml file
Date
Msg-id 20180624174543.qj6yvp5x7koaql6k@squirrel.exwg.net
Whole thread Raw
In response to Re: Using COPY to import large xml file  (Adrien Nayrat <adrien.nayrat@anayrat.info>)
List pgsql-general
## Adrien Nayrat (adrien.nayrat@anayrat.info):

> I used this tool :
> https://github.com/Networks-Learning/stackexchange-dump-to-postgres

That will be awfully slow: this tool commits each INSERT on it's own,
see loop in
https://github.com/Networks-Learning/stackexchange-dump-to-postgres/blob/master/load_into_pg.py#L83

With only small changes - prepare the INSERT, execute for all (or at
least a huge lot of) rows, COMMIT at the end - you can safe quite a lot
of time (500 rows is not "a huge lot"). And when you do that, for
heaven's sake, do not try to create the INSERT statement as a string
with the values - Bobby Tables will eat you. See psycopg documentation
on how it's done (especially watch the warnings):
http://initd.org/psycopg/docs/usage.html#passing-parameters-to-sql-queries
On prepared statements with psycopg2, see
http://initd.org/psycopg/articles/2012/10/01/prepared-statements-psycopg/
(python makes stuff rather hard, but I'll spare you the snark and wait until
the language has matured to at least version 5).

Using the COPY protocol with psycopg2 seems to require some hoop-jumping,
but could improve matters even more.

Regards,
Christoph

-- 
Spare Space.


pgsql-general by date:

Previous
From: Adrian Klaver
Date:
Subject: Re: Using COPY to import large xml file
Next
From: Tim Cross
Date:
Subject: Re: Using COPY to import large xml file