Re: [GENERAL] Insertion of large xml files into PostgreSQL 10beta1 - Mailing list pgsql-general

From David G. Johnston
Subject Re: [GENERAL] Insertion of large xml files into PostgreSQL 10beta1
Msg-id CAKFQuwZ5UAtSjq=hR6rXfq6V++xOhomADEvOf+7i45D8DD_1sA@mail.gmail.com
In response to [GENERAL] Insertion of large xml files into PostgreSQL 10beta1  (Alain Toussaint <atoussaint1976@gmail.com>)
Responses Re: [GENERAL] Insertion of large xml files into PostgreSQL 10beta1  (Alain Toussaint <atoussaint1976@gmail.com>)
List pgsql-general
On Fri, Jun 23, 2017 at 8:19 AM, Alain Toussaint <atoussaint1976@gmail.com> wrote:
Hello,

I am building a PostgreSQL server into which I intend to load the
entirety of the pubmed database data (23GB bzip2 compressed, 216GB
unpacked), which is available in xml form. Here is an example:

https://www.ncbi.nlm.nih.gov/pubmed/21833294?report=xml&format=text

I looked at the documentation as well as several code examples for
loading the data, and the one example that nearly succeeded is this
procedure:

/usr/bin/psql medline

\set :largexmlfile: 'cat /srv/pgsql/pubmed/medline17n0001.xml'
INSERT INTO samples (xmldata) VALUES :largexmlfile:

I'll assume you've just mis-keyed this from memory, since the syntax of the above doesn't look right.

(from reading the list post here:
https://www.postgresql.org/message-id/20160624083757.GA5459%40msg.df7cb.de)

In which, about 334MB of data from medline17n0001.xml will flood the
monitor.

If the above general command sequence is done right, and echoing of commands is turned off, you should not see any of the XML file content echoed to the output.
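
For reference, here is a minimal sketch of what I would expect that sequence to look like in psql. It assumes the samples table has a column named xmldata, as in your INSERT; the variable name "content" is just a placeholder of mine, and I have not run this against your file:

```sql
-- Suppress psql's informational messages and make sure commands are not echoed back.
\set QUIET on
\set ECHO none

-- Backticks run the shell command and store its output in the psql variable "content".
\set content `cat /srv/pgsql/pubmed/medline17n0001.xml`

-- :'content' interpolates the variable as a properly quoted SQL literal.
INSERT INTO samples (xmldata) VALUES (:'content');
```

Note the parentheses around the value and the :'content' form. A bare :content would paste the raw file text into the statement instead of a quoted literal.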
 

it is possible to turn off validation of the content between the xml
tags of the files.


You can either turn off validation for the entire file or leave it on; PostgreSQL isn't recognizing individual tags here (you haven't shown us the definition of the samples table...).
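
Since the table definition wasn't posted, here is a purely hypothetical pair of definitions to illustrate the all-or-nothing distinction, assuming a server built with libxml support. The table samples_raw and the columns id and rawdata are names I made up for illustration; only samples and xmldata come from your post:

```sql
-- Hypothetical definition: an xml column is checked for well-formedness on insert.
CREATE TABLE samples (
    id      bigserial PRIMARY KEY,
    xmldata xml
);

-- Hypothetical alternative: a text column stores the file as-is, with no XML parsing.
CREATE TABLE samples_raw (
    id      bigserial PRIMARY KEY,
    rawdata text
);

-- The stored text can still be checked later, one whole value at a time.
SELECT id, xml_is_well_formed_document(rawdata) AS well_formed
FROM samples_raw;
```

Either way the check applies to the whole stored value, not to selected tags within it.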

Narrowing down the entire file to a small problem region and posting a self-contained example, or at least providing the error messages and content, might help elicit good responses. Even if you could load the data without incident, using it may end up proving problematic. That said, character encodings and sets are not my strong suit.

David J.
