Thread: PostgreSQL 8.3 XML parser seems not to recognize the DOCTYPE element in XML files
PostgreSQL 8.3 XML parser seems not to recognize the DOCTYPE element in XML files
From
"Lawrence Oluyede"
Date:
As specified in the W3C Recommendation for XML, the DOCTYPE element is perfectly valid in a document. I have a bunch of XML files generated by the Boost library which contain a doctype like this:

    <!DOCTYPE boost_serialization>

which lies within the bounds of the recommendation (http://www.w3.org/TR/xml/#sec-prolog-dtd):

"Note that it is possible to construct a well-formed document containing a doctypedecl that neither points to an external subset nor contains an internal subset."

PostgreSQL 8.3, however, doesn't allow the insertion of XML with a doctype into its new native data type, and returns this error message:

"""
ERROR: invalid XML content
DETAIL: Entity: line 2: parser error : StartTag: invalid element name
<!DOCTYPE foo>
 ^

********** Error **********

ERROR: invalid XML content
SQL state: 2200N
Detail: Entity: line 2: parser error : StartTag: invalid element name
<!DOCTYPE foo>
"""

This behavior surprises me because pgsql has been compiled with the following flags on the development machine:

    ./configure --with-python --with-openssl --with-pam --with-libxml --with-libxslt --enable-thread-safety --enable-debug

During the configuration stage it creates a Makefile binding the system version of the libxml2 library, which is 2.6.30, the same library I use through Python (which correctly parses the XML file with the doctype).

Any hints?
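The well-formedness claim above can be checked independently of PostgreSQL. The sketch below is only an illustration: it uses Python's standard-library xml.dom.minidom (an expat-based parser, not the libxml2 binding mentioned in the post), and the sample document is a made-up, minimal stand-in for a Boost serialization file.

```python
from xml.dom import minidom

# Hypothetical, minimal stand-in for a Boost-generated file: a DOCTYPE that
# names the root element but has neither an external nor an internal subset.
SAMPLE = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b"<!DOCTYPE boost_serialization>\n"
    b"<boost_serialization/>\n"
)


def describe_doctype(xml_bytes):
    """Parse xml_bytes and return (doctype name, system id, root tag)."""
    # parseString raises xml.parsers.expat.ExpatError if the input is not
    # well-formed, so reaching the return statement means parsing succeeded.
    doc = minidom.parseString(xml_bytes)
    doctype = doc.doctype
    return doctype.name, doctype.systemId, doc.documentElement.tagName


if __name__ == "__main__":
    print(describe_doctype(SAMPLE))
```

If the parser accepts the sample, the doctype name and the root element name are reported and no system identifier is present, which is the case the quoted note in the recommendation describes.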
Re: PostgreSQL 8.3 XML parser seems not to recognize the DOCTYPE element in XML files
From
Bruce Momjian
Date:
Added to TODO:

* Allow XML to accept more liberal DOCTYPE specifications
  http://archives.postgresql.org/pgsql-general/2008-02/msg00347.php

---------------------------------------------------------------------------

Lawrence Oluyede wrote:
> As specified in the W3C Recommendation for XML the DOCTYPE element is
> perfectly valid in a document.
> I have a bunch of XML files generated by the boost library which
> contains a doctype like this:
>
> <!DOCTYPE boost_serialization>
>
> which lies within the bound of the recommendation
> (http://www.w3.org/TR/xml/#sec-prolog-dtd):
>
> "Note that it is possible to construct a well-formed document
> containing a doctypedecl that neither points to an external subset nor
> contains an internal subset."
>
> PostgreSQL 8.3 instead doesn't allow the insertion of XML with doctype
> in its new native data type returning this error message:
>
> """
> ERROR: invalid XML content
> DETAIL: Entity: line 2: parser error : StartTag: invalid element name
> <!DOCTYPE foo>
> ^
>
> ********** Error **********
>
> ERROR: invalid XML content
> SQL state: 2200N
> Detail: Entity: line 2: parser error : StartTag: invalid element name
> <!DOCTYPE foo>
> """
>
> This kind of behavior surprises me because pgsql has been compiled
> with the following flags on the development machine:
> ./configure --with-python --with-openssl --with-pam --with-libxml
> --with-libxslt --enable-thread-safety --enable-debug
>
> During the configuration stage it creates a Makefile binding the
> system version of the libxml2 library which is 2.6.30, the same
> library I use through Python (which parses correctly the XML file with
> the doctype).
>
> Any hints?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Re: PostgreSQL 8.3 XML parser seems not to recognize the DOCTYPE element in XML files
From
Kevin Grittner
Date:
Bruce Momjian wrote:
> Added to TODO:
>
> * Allow XML to accept more liberal DOCTYPE specifications

Is any form of DOCTYPE accepted?

We're getting errors on the second line like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE DOT_OFFICER_CITATION SYSTEM "http://host.domain/dtd/dotdisposition0_02.dtd">

The actual host.domain value is resolved by DNS, and wget of the url works on the machine.

Attempts to cast the document to type xml give:

ERROR: invalid XML content
DETAIL: Entity: line 2: parser error : StartTag: invalid element name
<!DOCTYPE DOT_OFFICER_CITATION SYSTEM "http://host.domain/dtd/dot
 ^

It would be nice to use the xml type, but we always have DOCTYPE....

-Kevin
Re: PostgreSQL 8.3 XML parser seems not to recognize the DOCTYPE element in XML files
From
Kevin Grittner
Date:
Bruce Momjian wrote:
> Added to TODO:
>
> * Allow XML to accept more liberal DOCTYPE specifications

Is any form of DOCTYPE accepted?

We're getting errors on a second line in an XML document that starts like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE DOT_OFFICER_CITATION SYSTEM "http://host.domain/dtd/dotdisposition0_02.dtd">

The actual host.domain value is resolved by DNS, and wget of the url works on the server running PostgreSQL.

Attempts to cast the document to type xml give:

ERROR: invalid XML content
DETAIL: Entity: line 2: parser error : StartTag: invalid element name
<!DOCTYPE DOT_OFFICER_CITATION SYSTEM "http://host.domain/dtd/dot
 ^

It would be nice to use the xml type, but we always have DOCTYPE. I understand that PostgreSQL won't validate against the specified DOCTYPE, but it shouldn't error out on it, either.

-Kevin
Re: PostgreSQL 8.3 XML parser seems not to recognize the DOCTYPE element in XML files
From
Peter Eisentraut
Date:
On Thursday, 7 February 2008, Lawrence Oluyede wrote:
> PostgreSQL 8.3 instead doesn't allow the insertion of XML with doctype
> in its new native data type returning this error message:
>
> """
> ERROR: invalid XML content
> DETAIL: Entity: line 2: parser error : StartTag: invalid element name
> <!DOCTYPE foo>
> ^

It turns out that this behavior is entirely correct. It depends on the XML option. If you set the XML option to DOCUMENT, you can parse documents including DOCTYPE declarations. If you set the XML option to CONTENT, then what you can parse is defined by the production

    XMLDecl? content

which does not allow for a DOCTYPE. The default XML option is CONTENT, which explains the behavior.

Now, the supercorrect way to parse XML values would be using the XMLPARSE() function, which requires you to specify the XML option inline. That way, everything works.
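The difference between the two options can be mimicked outside the database. The following is a rough analogy only, assuming Python's standard-library xml.etree.ElementTree; it is not how PostgreSQL implements the XML option. The function names and the wrapper element are made up for the illustration, and the optional XMLDecl is ignored. DOCUMENT is modelled as parsing a complete document, and CONTENT as parsing the text as the content of an enclosing element, where a DOCTYPE declaration is not allowed.

```python
import xml.etree.ElementTree as ET


def parses_as_document(text):
    """True if text is a well-formed XML document (one root, DOCTYPE allowed)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def parses_as_content(text):
    """True if text is well-formed element content (fragments allowed, no DOCTYPE).

    Wrapping the text in a hypothetical <wrapper> element forces the parser
    to treat it as element content rather than as a prolog plus root element.
    """
    try:
        ET.fromstring("<wrapper>" + text + "</wrapper>")
        return True
    except ET.ParseError:
        return False


# Illustrative inputs (not taken from the thread's actual files).
WITH_DOCTYPE = "<!DOCTYPE foo><foo/>"
FRAGMENT = "<a/>some text<b/>"

if __name__ == "__main__":
    for label, text in (("with DOCTYPE", WITH_DOCTYPE), ("fragment", FRAGMENT)):
        print(label, parses_as_document(text), parses_as_content(text))
```

Under this analogy, a value with a DOCTYPE is accepted only in document mode, while a fragment with several top-level nodes is accepted only in content mode. That mirrors the trade-off described above between XMLPARSE(DOCUMENT ...) and the default CONTENT option.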