Thread: review: xml_is_well_formed
Hello I looked on patch https://commitfest.postgresql.org/action/patch_view?id=334 .This patch moves function xml_is_well_formed from contrib xm2 to core. * Is the patch in context diff format? yes * Does it apply cleanly to the current CVS HEAD? yes * Does it include reasonable tests, necessary doc patches, etc? yes * Does the patch actually implement that? yes * Do we want that? yes * Do we already have it? yes - simplified version in core * Does it follow SQL spec, or the community-agreed behavior? no - I didn't find any resources about conformance with SQL spec, but it has same behave like original contrib function * Does it include pg_dump support (if applicable)? not related * Are there dangers? no *Are there any assertion failures or crashes? not found I have a few issues: * broken regress test (fedora 13 - xmllint: using libxml version 20707) postgres=# SELECT xml_is_well_formed('<pg:foo xmlns:pg="http://postgresql.org/stuff";>bar</pg:foo>');xml_is_well_formed --------------------f (1 row) this xml is broken - but in regress tests is ok [pavel@pavel-stehule ~]$ xmllint xxx xxx:1: parser error : error parsing attribute name <pg:foo xmlns:pg="http://postgresql.org/stuff";>bar</pg:foo> * xml_is_well_formed returns true for simple text postgres=# SELECT xml_is_well_formed('ssss');xml_is_well_formed --------------------t (1 row) it is probably wrong result - is it ok?? * I don't understand to this fragment PG_TRY(); + { + size_t count; + xmlChar *version = NULL; + int standalone = -1; +. + res_code = parse_xml_decl(string, &count, &version, NULL, &standalone); + if (res_code != 0) + xml_ereport_by_code(ERROR, ERRCODE_INVALID_XML_CONTENT, + "invalid XML content: invalid XML declaration", + res_code); +. + doc = xmlNewDoc(version); + doc->encoding = xmlStrdup((const xmlChar *) "UTF-8"); + doc->standalone = 1; why? This function can raise exception when declaration is wrong. It is wrong - I think, this function should returns false instead exception. Regards Pavel Stehule postgres=# select version(); version -------------------------------------------------------------------------------------------------------------- --------PostgreSQL 9.1devel on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.4 20100630 (Red Hat 4.4.4-10),64-bit (1 row)
Hi Pavel, Thanks for taking the time to review my patch. Attached is a new version addressing your concerns. On 29/07/10 14:21, Pavel Stehule wrote: > I have a few issues: > * broken regress test (fedora 13 - xmllint: using libxml version 20707) > > postgres=# SELECT xml_is_well_formed('<pg:foo > xmlns:pg="http://postgresql.org/stuff";>bar</pg:foo>'); > xml_is_well_formed > -------------------- > f > (1 row) > > this xml is broken - but in regress tests is ok > > [pavel@pavel-stehule ~]$ xmllint xxx > xxx:1: parser error : error parsing attribute name > <pg:foo xmlns:pg="http://postgresql.org/stuff";>bar</pg:foo> > This is a problem that was observered recently by Thom Brown - the commit fest webapp adds the semicolon after the quote. If you look at the attachment I sent in a email client you'll find there is no semicolon. The patch attached here will also not have the semicolon. > * xml_is_well_formed returns true for simple text > > postgres=# SELECT xml_is_well_formed('ssss'); > xml_is_well_formed > -------------------- > t > (1 row) > > it is probably wrong result - is it ok?? > Yes this is OK, pure text is valid XML content. > * I don't understand to this fragment > > PG_TRY(); > + { > + size_t count; > + xmlChar *version = NULL; > + int standalone = -1; > +. > + res_code = parse_xml_decl(string,&count,&version, > NULL,&standalone); > + if (res_code != 0) > + xml_ereport_by_code(ERROR, ERRCODE_INVALID_XML_CONTENT, > + "invalid XML > content: invalid XML declaration", > + res_code); > +. > + doc = xmlNewDoc(version); > + doc->encoding = xmlStrdup((const xmlChar *) "UTF-8"); > + doc->standalone = 1; > > why? This function can raise exception when declaration is wrong. It > is wrong - I think, this function should returns false instead > exception. > > You're quite right, I should be returning false here and not causing an exception. I have corrected this in the attached patch. Regards, -- Mike Fowler Registered Linux user: 379787
Attachment
Hello 2010/7/30 Mike Fowler <mike@mlfowler.com>: > Hi Pavel, > > Thanks for taking the time to review my patch. Attached is a new version > addressing your concerns. > > On 29/07/10 14:21, Pavel Stehule wrote: >> >> I have a few issues: >> * broken regress test (fedora 13 - xmllint: using libxml version 20707) ok - main regress test is ok now, next I checked a contrib test for xml2 inside contrib/xml2 make installcheck, and there is a problem SET client_min_messages = warning; \set ECHO none + psql:pgxml.sql:10: ERROR: could not find function "xml_is_well_formed" in file "/usr/local/pgsql.xwf/lib/pgxml.so" + psql:pgxml.sql:15: ERROR: could not find function "xml_is_well_formed" in file "/usr/local/pgsql.xwf/lib/pgxml.so" RESET client_min_messages; select query_to_xml('select 1as x',true,false,''); you have to remove declaration from pgxml.sql.in and uninstall_pgxml.sql, and other related files in contrib/xml2/ regress test >> >> postgres=# SELECT xml_is_well_formed('<pg:foo >> xmlns:pg="http://postgresql.org/stuff";>bar</pg:foo>'); >> xml_is_well_formed >> -------------------- >> f >> (1 row) >> >> this xml is broken - but in regress tests is ok >> >> [pavel@pavel-stehule ~]$ xmllint xxx >> xxx:1: parser error : error parsing attribute name >> <pg:foo xmlns:pg="http://postgresql.org/stuff";>bar</pg:foo> >> > > This is a problem that was observered recently by Thom Brown - the commit > fest webapp adds the semicolon after the quote. If you look at the > attachment I sent in a email client you'll find there is no semicolon. The > patch attached here will also not have the semicolon. > ok - attached patch is correct, ... please, can you remove a broken patch? >> * xml_is_well_formed returns true for simple text >> >> postgres=# SELECT xml_is_well_formed('ssss'); >> xml_is_well_formed >> -------------------- >> t >> (1 row) >> >> it is probably wrong result - is it ok?? >> > > Yes this is OK, pure text is valid XML content. It interesting for me - is somewhere some documentation about it? My colleagues speak some else :) http://zvon.org/comp/r/tut-XML.html#Pages~MinimalQ20XnumberQ20XofQ20Xelements http://www.w3.org/TR/REC-xml/#sec-prolog-dtd I am not a specialist on XML - so just don't know > >> * I don't understand to this fragment >> >> PG_TRY(); >> + { >> + size_t count; >> + xmlChar *version = NULL; >> + int standalone = -1; >> +. >> + res_code = parse_xml_decl(string,&count,&version, >> NULL,&standalone); >> + if (res_code != 0) >> + xml_ereport_by_code(ERROR, >> ERRCODE_INVALID_XML_CONTENT, >> + "invalid XML >> content: invalid XML declaration", >> + res_code); >> +. >> + doc = xmlNewDoc(version); >> + doc->encoding = xmlStrdup((const xmlChar *) "UTF-8"); >> + doc->standalone = 1; >> >> why? This function can raise exception when declaration is wrong. It >> is wrong - I think, this function should returns false instead >> exception. >> >> > > You're quite right, I should be returning false here and not causing an > exception. I have corrected this in the attached patch. > ok > Regards, > > -- > Mike Fowler > Registered Linux user: 379787 > >
On fre, 2010-07-30 at 12:50 +0100, Mike Fowler wrote: > > * xml_is_well_formed returns true for simple text > > > > postgres=# SELECT xml_is_well_formed('ssss'); > > xml_is_well_formed > > -------------------- > > t > > (1 row) > > > > it is probably wrong result - is it ok?? > > > > Yes this is OK, pure text is valid XML content. Are you speaking of XML content fragments that SQL/XML defines? Well-formedness should probably only allow XML documents.
On Sat, Jul 31, 2010 at 8:10 AM, Peter Eisentraut <peter_e@gmx.net> wrote: > On fre, 2010-07-30 at 12:50 +0100, Mike Fowler wrote: >> > * xml_is_well_formed returns true for simple text >> > >> > postgres=# SELECT xml_is_well_formed('ssss'); >> > xml_is_well_formed >> > -------------------- >> > t >> > (1 row) >> > >> > it is probably wrong result - is it ok?? >> > >> >> Yes this is OK, pure text is valid XML content. > > Are you speaking of XML content fragments that SQL/XML defines? > > Well-formedness should probably only allow XML documents. I think the point of this function is to determine whether a cast to xml will throw an error. The behavior should probably match exactly whatever test would be applied there. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
2010/7/31 Robert Haas <robertmhaas@gmail.com>: > On Sat, Jul 31, 2010 at 8:10 AM, Peter Eisentraut <peter_e@gmx.net> wrote: >> On fre, 2010-07-30 at 12:50 +0100, Mike Fowler wrote: >>> > * xml_is_well_formed returns true for simple text >>> > >>> > postgres=# SELECT xml_is_well_formed('ssss'); >>> > xml_is_well_formed >>> > -------------------- >>> > t >>> > (1 row) >>> > >>> > it is probably wrong result - is it ok?? >>> > >>> >>> Yes this is OK, pure text is valid XML content. >> >> Are you speaking of XML content fragments that SQL/XML defines? >> >> Well-formedness should probably only allow XML documents. > > I think the point of this function is to determine whether a cast to > xml will throw an error. The behavior should probably match exactly > whatever test would be applied there. I agree with this idea - so I am able to do: postgres=# select 'xxx'::xml;xml -----xxx (1 row) I have not any suggestions now - so I'll change flag to "ready to commit" Regards Pavel Stehule > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise Postgres Company >
2010/8/2 Pavel Stehule <pavel.stehule@gmail.com>: > 2010/7/31 Robert Haas <robertmhaas@gmail.com>: >> On Sat, Jul 31, 2010 at 8:10 AM, Peter Eisentraut <peter_e@gmx.net> wrote: >>> On fre, 2010-07-30 at 12:50 +0100, Mike Fowler wrote: >>>> > * xml_is_well_formed returns true for simple text >>>> > >>>> > postgres=# SELECT xml_is_well_formed('ssss'); >>>> > xml_is_well_formed >>>> > -------------------- >>>> > t >>>> > (1 row) >>>> > >>>> > it is probably wrong result - is it ok?? >>>> > >>>> >>>> Yes this is OK, pure text is valid XML content. >>> >>> Are you speaking of XML content fragments that SQL/XML defines? >>> >>> Well-formedness should probably only allow XML documents. >> >> I think the point of this function is to determine whether a cast to >> xml will throw an error. The behavior should probably match exactly >> whatever test would be applied there. > > I agree with this idea - so I am able to do: > > postgres=# select 'xxx'::xml; > xml > ----- > xxx > (1 row) > > I have not any suggestions now - so I'll change flag to "ready to commit" sorry - contrib module should be a fixed patch attached > > Regards > > Pavel Stehule > >> >> -- >> Robert Haas >> EnterpriseDB: http://www.enterprisedb.com >> The Enterprise Postgres Company >> >
Attachment
On 02/08/10 07:46, Pavel Stehule wrote: >> >> I have not any suggestions now - so I'll change flag to "ready to commit" > > sorry - contrib module should be a fixed > > patch attached > Thanks Pavel, you saved me some time! Regards, -- Mike Fowler Registered Linux user: 379787
On lör, 2010-07-31 at 13:40 -0400, Robert Haas wrote: > > Well-formedness should probably only allow XML documents. > > I think the point of this function is to determine whether a cast to > xml will throw an error. The behavior should probably match exactly > whatever test would be applied there. Maybe there should be xml_is_well_formed() xml_is_well_formed_document() xml_is_well_formed_content() I agree that consistency with SQL/XML is desirable, but for someone coming from the outside, the unqualified claim that 'foo' is well-formed XML might sound suspicious.
Hello 2010/8/3 Peter Eisentraut <peter_e@gmx.net>: > On lör, 2010-07-31 at 13:40 -0400, Robert Haas wrote: >> > Well-formedness should probably only allow XML documents. >> >> I think the point of this function is to determine whether a cast to >> xml will throw an error. The behavior should probably match exactly >> whatever test would be applied there. > > Maybe there should be > > xml_is_well_formed() > xml_is_well_formed_document() > xml_is_well_formed_content() > > I agree that consistency with SQL/XML is desirable, but for someone > coming from the outside, the unqualified claim that 'foo' is well-formed > XML might sound suspicious. > yes, it is little bit curious - but it can be just documented. Now, I don't think, so we need more functions. Regards Pavel
On 03/08/10 16:15, Peter Eisentraut wrote: > On lör, 2010-07-31 at 13:40 -0400, Robert Haas wrote: >>> Well-formedness should probably only allow XML documents. >> >> I think the point of this function is to determine whether a cast to >> xml will throw an error. The behavior should probably match exactly >> whatever test would be applied there. > > Maybe there should be > > xml_is_well_formed() > xml_is_well_formed_document() > xml_is_well_formed_content() > > I agree that consistency with SQL/XML is desirable, but for someone > coming from the outside, the unqualified claim that 'foo' is well-formed > XML might sound suspicious. What about making the function sensitive to the XML OPTION, such that: test=# SET xmloption TO DOCUMENT; SET text=# SELECT xml_is_well_formed('foo'); xml_is_well_formed -------------------- f (1 row) test=# SET xmloption TO CONTENT; SET text=# SELECT xml_is_well_formed('foo'); xml_is_well_formed -------------------- t (1 row) with the inverse for DOCUMENTS? To me this makes the most sense as it makes the function behave much more like the otherxml functions. -- Mike Fowler Registered Linux user: 379787
On Fri, Aug 6, 2010 at 4:28 AM, Mike Fowler <mike@mlfowler.com> wrote: > On 03/08/10 16:15, Peter Eisentraut wrote: >> >> On lör, 2010-07-31 at 13:40 -0400, Robert Haas wrote: >>>> >>>> Well-formedness should probably only allow XML documents. >>> >>> I think the point of this function is to determine whether a cast to >>> xml will throw an error. The behavior should probably match exactly >>> whatever test would be applied there. >> >> Maybe there should be >> >> xml_is_well_formed() >> xml_is_well_formed_document() >> xml_is_well_formed_content() >> >> I agree that consistency with SQL/XML is desirable, but for someone >> coming from the outside, the unqualified claim that 'foo' is well-formed >> XML might sound suspicious. > > What about making the function sensitive to the XML OPTION, such that: > > test=# SET xmloption TO DOCUMENT; > SET > text=# SELECT xml_is_well_formed('foo'); > > xml_is_well_formed > -------------------- > f > (1 row) That will make using this function a huge hassle, won't it? Functions that do different things depending on GUC settings are usually troublesome. Having three functions would be more sensible if we need all three behaviors, but I don't see why we do. Or perhaps it could return a string instead of a boolean: content, document, or NULL if it's neither. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On 06/08/10 12:31, Robert Haas wrote: >>> >>> Maybe there should be >>> >>> xml_is_well_formed() >>> xml_is_well_formed_document() >>> xml_is_well_formed_content() >>> >>> I agree that consistency with SQL/XML is desirable, but for someone >>> coming from the outside, the unqualified claim that 'foo' is well-formed >>> XML might sound suspicious. >>> >> What about making the function sensitive to the XML OPTION, such that: >> >> test=# SET xmloption TO DOCUMENT; >> SET >> text=# SELECT xml_is_well_formed('foo'); >> >> xml_is_well_formed >> -------------------- >> f >> (1 row) >> > That will make using this function a huge hassle, won't it? Functions > that do different things depending on GUC settings are usually > troublesome. Having three functions would be more sensible if we need > all three behaviors, but I don't see why we do. > > Or perhaps it could return a string instead of a boolean: content, > document, or NULL if it's neither. > I like the sound of that. In fact this helps workaround the IS DOCUMENT and IS CONTENT limitations such that you can you can select only content, only documents or both is you use IS NOT NULL. Unless anyone sees a reason that this function needs to remain a boolean function, I'll rework the patch over the weekend. Regards, -- Mike Fowler Registered Linux user: 379787
On fre, 2010-08-06 at 07:31 -0400, Robert Haas wrote: > > What about making the function sensitive to the XML OPTION, such > that: > > > > test=# SET xmloption TO DOCUMENT; > > SET > > text=# SELECT xml_is_well_formed('foo'); > > > > xml_is_well_formed > > -------------------- > > f > > (1 row) > > That will make using this function a huge hassle, won't it? Functions > that do different things depending on GUC settings are usually > troublesome. Having three functions would be more sensible if we need > all three behaviors, but I don't see why we do. Upthread you opined that this function should essentially indicate whether a cast to type xml would succeed, and observing the xmloption is an essential part of that. I had assumed the original patch actually did that.
On fre, 2010-08-06 at 14:43 +0100, Mike Fowler wrote: > > Or perhaps it could return a string instead of a boolean: content, > > document, or NULL if it's neither. > > > > I like the sound of that. In fact this helps workaround the IS > DOCUMENT > and IS CONTENT limitations such that you can you can select only > content, only documents or both is you use IS NOT NULL. > > Unless anyone sees a reason that this function needs to remain a > boolean function, I'll rework the patch over the weekend. What is the actual use case for this function? Is the above behavior actually useful? One reason to stick with boolean is backward compatibility.
On Fri, Aug 6, 2010 at 4:52 PM, Peter Eisentraut <peter_e@gmx.net> wrote: > On fre, 2010-08-06 at 07:31 -0400, Robert Haas wrote: >> > What about making the function sensitive to the XML OPTION, such >> that: >> > >> > test=# SET xmloption TO DOCUMENT; >> > SET >> > text=# SELECT xml_is_well_formed('foo'); >> > >> > xml_is_well_formed >> > -------------------- >> > f >> > (1 row) >> >> That will make using this function a huge hassle, won't it? Functions >> that do different things depending on GUC settings are usually >> troublesome. Having three functions would be more sensible if we need >> all three behaviors, but I don't see why we do. > > Upthread you opined that this function should essentially indicate > whether a cast to type xml would succeed, and observing the xmloption is > an essential part of that. I had assumed the original patch actually > did that. Oh. I didn't realize the casting behavior was already dependent on the GUC. Yuck. Maybe we should following the GUC after all, then. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On 06/08/10 21:55, Peter Eisentraut wrote: > On fre, 2010-08-06 at 14:43 +0100, Mike Fowler wrote: >>> Or perhaps it could return a string instead of a boolean: content, >>> document, or NULL if it's neither. >>> >> >> I like the sound of that. In fact this helps workaround the IS >> DOCUMENT >> and IS CONTENT limitations such that you can you can select only >> content, only documents or both is you use IS NOT NULL. >> >> Unless anyone sees a reason that this function needs to remain a >> boolean function, I'll rework the patch over the weekend. > > What is the actual use case for this function? Is the above behavior > actually useful? The idea is to be able to filter a table that contains XML in TEXT that might not be well formed. Knowing that you're only dealing with well formed XML prevents you blowing up when you attempt the cast. > > One reason to stick with boolean is backward compatibility. > To be honest I'm happiest with returning a boolean, even if there is some confusion over content only being valid. Though changing the return value to DOCUMENT/CONTENT/NULL makes things a touch more explicit, the same results can be achieved by simply running: SELECT data::xml FROM mixed WHERE xml_is_well_formed(data) AND data::xml IS DOCUMENT; Regards, -- Mike Fowler Registered Linux user: 379787
Peter Eisentraut <peter_e@gmx.net> writes: > On lör, 2010-07-31 at 13:40 -0400, Robert Haas wrote: >> I think the point of this function is to determine whether a cast to >> xml will throw an error. The behavior should probably match exactly >> whatever test would be applied there. > Maybe there should be > xml_is_well_formed() > xml_is_well_formed_document() > xml_is_well_formed_content() > I agree that consistency with SQL/XML is desirable, but for someone > coming from the outside, the unqualified claim that 'foo' is well-formed > XML might sound suspicious. I think I agree with the later discussion that xml_is_well_formed() should tell you whether a cast to xml would succeed (and hence it has to pay attention to XMLOPTION). However, it seems to also make sense to provide the other two functions suggested here, both to satify people who know XML and to offer tests that will tell you whether XMLPARSE ( { DOCUMENT | CONTENT } value ) will succeed. Merging the three cases into one function doesn't seem like a win for either compatibility or usability. regards, tom lane
On Sun, Aug 8, 2010 at 1:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Peter Eisentraut <peter_e@gmx.net> writes: >> On lör, 2010-07-31 at 13:40 -0400, Robert Haas wrote: >>> I think the point of this function is to determine whether a cast to >>> xml will throw an error. The behavior should probably match exactly >>> whatever test would be applied there. > >> Maybe there should be > >> xml_is_well_formed() >> xml_is_well_formed_document() >> xml_is_well_formed_content() > >> I agree that consistency with SQL/XML is desirable, but for someone >> coming from the outside, the unqualified claim that 'foo' is well-formed >> XML might sound suspicious. > > I think I agree with the later discussion that xml_is_well_formed() > should tell you whether a cast to xml would succeed (and hence it has to > pay attention to XMLOPTION). However, it seems to also make sense to > provide the other two functions suggested here, both to satify people > who know XML and to offer tests that will tell you whether > XMLPARSE ( { DOCUMENT | CONTENT } value ) will succeed. > > Merging the three cases into one function doesn't seem like a win > for either compatibility or usability. +1. I didn't realize how funky the XMLOPTION stuff was at the start of this discussion; I think your analysis here is spot-on. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On lör, 2010-08-07 at 16:47 +0100, Mike Fowler wrote: > To be honest I'm happiest with returning a boolean, even if there is > some confusion over content only being valid. Though changing the > return > value to DOCUMENT/CONTENT/NULL makes things a touch more explicit, > the > same results can be achieved by simply running: > > SELECT data::xml FROM mixed WHERE xml_is_well_formed(data) AND > data::xml IS DOCUMENT; Note that this wouldn't necessarily work because it is not guaranteed that the well-formedness test is executed before the cast to xml. SQL doesn't short-circuit left to right. (A CASE expression could work.)
On Mon, Aug 9, 2010 at 10:20 AM, Peter Eisentraut <peter_e@gmx.net> wrote: > On lör, 2010-08-07 at 16:47 +0100, Mike Fowler wrote: >> To be honest I'm happiest with returning a boolean, even if there is >> some confusion over content only being valid. Though changing the >> return >> value to DOCUMENT/CONTENT/NULL makes things a touch more explicit, >> the >> same results can be achieved by simply running: >> >> SELECT data::xml FROM mixed WHERE xml_is_well_formed(data) AND >> data::xml IS DOCUMENT; > > Note that this wouldn't necessarily work because it is not guaranteed > that the well-formedness test is executed before the cast to xml. SQL > doesn't short-circuit left to right. (A CASE expression could work.) There's also the fact that it would probably end up parsing the data twice. Given xmloption, I'm inclined to think Tom has it right: provided xml_is_well_formed() that follows xmloption, plus a specific version for each of content and document. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Mon, Aug 9, 2010 at 10:41 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Aug 9, 2010 at 10:20 AM, Peter Eisentraut <peter_e@gmx.net> wrote: >> On lör, 2010-08-07 at 16:47 +0100, Mike Fowler wrote: >>> To be honest I'm happiest with returning a boolean, even if there is >>> some confusion over content only being valid. Though changing the >>> return >>> value to DOCUMENT/CONTENT/NULL makes things a touch more explicit, >>> the >>> same results can be achieved by simply running: >>> >>> SELECT data::xml FROM mixed WHERE xml_is_well_formed(data) AND >>> data::xml IS DOCUMENT; >> >> Note that this wouldn't necessarily work because it is not guaranteed >> that the well-formedness test is executed before the cast to xml. SQL >> doesn't short-circuit left to right. (A CASE expression could work.) > > There's also the fact that it would probably end up parsing the data > twice. Given xmloption, I'm inclined to think Tom has it right: > provided xml_is_well_formed() that follows xmloption, plus a specific > version for each of content and document. Another reasonable option here would be to forget about having xml_is_well_formed() per se and ONLY offer xml_is_well_formed_content() and xml_is_well_formed_document(). As a project management note, this CommitFest is over in 4 days, so unless we have a new version of this patch real soon now we need to defer it to the September 15th CommitFest (of course not precluding the possibility that someone will pick it up and commit it sooner, but we're not going to postpone 9.1alpha1 for this patch). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Aug 9, 2010 at 10:41 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> There's also the fact that it would probably end up parsing the data >> twice. �Given xmloption, I'm inclined to think Tom has it right: >> provided xml_is_well_formed() that follows xmloption, plus a specific >> version for each of content and document. > Another reasonable option here would be to forget about having > xml_is_well_formed() per se and ONLY offer > xml_is_well_formed_content() and xml_is_well_formed_document(). We already have xml_is_well_formed(); just dropping it doesn't seem like a helpful choice. > As a project management note, this CommitFest is over in 4 days, so > unless we have a new version of this patch real soon now we need to > defer it to the September 15th CommitFest Yes. Mike, are you expecting to submit a new version before the end of the week? regards, tom lane
On 11/08/10 21:27, Tom Lane wrote: > Robert Haas<robertmhaas@gmail.com> writes: >> On Mon, Aug 9, 2010 at 10:41 AM, Robert Haas<robertmhaas@gmail.com> wrote: >>> There's also the fact that it would probably end up parsing the data >>> twice. Given xmloption, I'm inclined to think Tom has it right: >>> provided xml_is_well_formed() that follows xmloption, plus a specific >>> version for each of content and document. > >> Another reasonable option here would be to forget about having >> xml_is_well_formed() per se and ONLY offer >> xml_is_well_formed_content() and xml_is_well_formed_document(). > > We already have xml_is_well_formed(); just dropping it doesn't seem like > a helpful choice. > >> As a project management note, this CommitFest is over in 4 days, so >> unless we have a new version of this patch real soon now we need to >> defer it to the September 15th CommitFest > > Yes. Mike, are you expecting to submit a new version before the end of > the week? > Yes and here it is, apologies for the delay. I have re-implemented xml_is_well_formed such that it is sensitive to the XMLOPTION. The additional _document and _content methods are now present. Tests and documentation adjusted to suit. Regards, -- Mike Fowler Registered Linux user: 379787
Attachment
Hello I checked last version: * there are not a problem with regress and contrib regress tests * the design is simple and clean now - well documented notes: * don't get a patch via copy/paste from mailing list archive - there are a broken xml2 tests via this access! * I didn't find a sentence so default for xml_is_well_formed a content checking - some like "without change of xmloption, the behave is same as xml_is_well_formed_content" Regards Pavel Stehule 2010/8/11 Mike Fowler <mike@mlfowler.com>: > On 11/08/10 21:27, Tom Lane wrote: >> >> Robert Haas<robertmhaas@gmail.com> writes: >>> >>> On Mon, Aug 9, 2010 at 10:41 AM, Robert Haas<robertmhaas@gmail.com> >>> wrote: >>>> >>>> There's also the fact that it would probably end up parsing the data >>>> twice. Given xmloption, I'm inclined to think Tom has it right: >>>> provided xml_is_well_formed() that follows xmloption, plus a specific >>>> version for each of content and document. >> >>> Another reasonable option here would be to forget about having >>> xml_is_well_formed() per se and ONLY offer >>> xml_is_well_formed_content() and xml_is_well_formed_document(). >> >> We already have xml_is_well_formed(); just dropping it doesn't seem like >> a helpful choice. >> >>> As a project management note, this CommitFest is over in 4 days, so >>> unless we have a new version of this patch real soon now we need to >>> defer it to the September 15th CommitFest >> >> Yes. Mike, are you expecting to submit a new version before the end of >> the week? >> > > Yes and here it is, apologies for the delay. I have re-implemented > xml_is_well_formed such that it is sensitive to the XMLOPTION. The > additional _document and _content methods are now present. Tests and > documentation adjusted to suit. > > Regards, > > -- > Mike Fowler > Registered Linux user: 379787 >
Mike Fowler <mike@mlfowler.com> writes: > On 11/08/10 21:27, Tom Lane wrote: >> Yes. Mike, are you expecting to submit a new version before the end of >> the week? > Yes and here it is, apologies for the delay. I have re-implemented > xml_is_well_formed such that it is sensitive to the XMLOPTION. The > additional _document and _content methods are now present. Tests and > documentation adjusted to suit. Applied with minor cleanups, mostly in the documentation. The only thing that seems worth remarking on is that since xml_is_well_formed now depends on a GUC variable, it *cannot* be marked IMMUTABLE. The right marking in such cases is STABLE. regards, tom lane