Using XML_PARSE_HUGE in operations on xml fields? - Mailing list pgsql-general

From patrick mc allister
Subject Using XML_PARSE_HUGE in operations on xml fields?
Date
Msg-id 20140304155421.GM23803@rdorte.org
List pgsql-general
Hello,

I ran into trouble with an xpath expression on a large XML file:

SELECT id, xpath('//tei:div/descendant::tei:head/text()', x, ARRAY[ARRAY['tei', 'http://www.tei-c.org/ns/1.0']]) AS stuff
FROM test WHERE id=1;

returns:

ERROR:  could not parse XML document
DETAIL:  line 491482: internal error: Huge input lookup
                        स्थापितः।। </p>
                                                       ^
line 491482: Extra content at the end of the document
                        स्थापितः।। </p>
                                                       ^

I think this is an error that comes from the libxml2 library, since
pretty much the same thing happens with xmllint if you pass the
`--memory' option.
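
For comparison outside the database, here is a rough sketch of the xmllint check. It assumes a local copy of the large file (the FILE default below is only a placeholder name) and uses xmllint's `--huge' option, which its manual describes as removing internal parser limits. I have not confirmed here that `--huge' clears this particular error on this file.

```shell
#!/bin/sh
# Sketch: compare xmllint's in-memory parse with and without --huge.
# FILE is an assumed local copy of the large document; adjust as needed.
FILE="${1:-mahabharata-devanagari.xml}"

check_parse() {
    # $1: label to print; remaining arguments: extra xmllint options
    label="$1"; shift
    if xmllint --noout "$@" "$FILE" 2>/dev/null; then
        echo "$label: parsed"
    else
        echo "$label: failed"
    fi
}

if command -v xmllint >/dev/null 2>&1 && [ -r "$FILE" ]; then
    check_parse "memory" --memory
    check_parse "memory+huge" --memory --huge
else
    echo "skipped: need xmllint and a readable $FILE"
fi
```

If the second line reports "parsed" where the first reports "failed", that would point to the parser's safety limits rather than to the document itself.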

This is the libxml2 documentation that I think is relevant here:

"""
#define XML_MAX_DICTIONARY_LIMIT

Maximum size allowed by the parser for a dictionary by default This is
not a limitation of the parser but a safety boundary feature, use
XML_PARSE_HUGE option to override it. Introduced in 2.9.0
"""

(see http://xmlsoft.org/html/libxml-parserInternals.html#XML_MAX_LOOKUP_LIMIT)

So I was wondering whether, and how, I could set the XML_PARSE_HUGE
option in PostgreSQL. I couldn't find anything in the docs or in this
list's archives.

To reproduce the problem quickly, using one XML file of roughly 37 MB
and one smaller one:

createdb xmlpost
psql xmlpost

CREATE TABLE test (id integer, x xml);
\set content `curl http://sarit.indology.info/downloads/mahabharata-devanagari.xml`
INSERT INTO test (SELECT 1, (SELECT XMLPARSE (DOCUMENT :'content')));
\set content `curl http://sarit.indology.info/downloads/ratnakIrti-nibandhAvali.xml`
INSERT INTO test (SELECT 2, (SELECT XMLPARSE (DOCUMENT :'content')));

SELECT id, xpath('count(//tei:div/descendant::tei:head)', x, ARRAY[ARRAY['tei', 'http://www.tei-c.org/ns/1.0']]) AS stuff
FROM test WHERE id=1; -- fails

SELECT id, xpath('count(//tei:div/descendant::tei:head)', x, ARRAY[ARRAY['tei', 'http://www.tei-c.org/ns/1.0']]) AS stuff
FROM test WHERE id=2; -- works: 13

Thanks for any hints,

--
patrick

