Thread: Re: [PATCHES] XML ouput for psql
greg@turnstep.com writes: > I don't agree with this: XML and XHTML are two different things. No one claimed anything to the contrary. > We could certainly upgrade the HTML portion, but I am pretty sure that > the XML standard calls for this format: > > <columnname>data here</columnname> The XML standard does not call for any table format. But a number of table formats have been established within the XML framework. Some of them are formatting-oriented (e.g., the HTML model, or CALS which is used in DocBook) and some of them are processing-oriented (e.g., SQL/XML). Which do we need? And which do we need from psql in particular (keeping in mind that psql is primarily for interactive use and shell-scripting)? In any case, it should most likely be a standard table model and not a hand-crafted one. (If, for whatever reason, we go the "processing-oriented" route, then I claim that there should not be a different output with and without \x mode.) -- Peter Eisentraut peter_e@gmx.net
> greg@turnstep.com writes: > > > I don't agree with this: XML and XHTML are two different things. > > No one claimed anything to the contrary. > > > We could certainly upgrade the HTML portion, but I am pretty sure that > > the XML standard calls for this format: > > > > <columnname>data here</columnname> > > The XML standard does not call for any table format. But a number of > table formats have been established within the XML framework. Some of > them are formatting-oriented (e.g., the HTML model, or CALS which is used > in DocBook) and some of them are processing-oriented (e.g., SQL/XML). > Which do we need? And which do we need from psql in particular (keeping > in mind that psql is primarily for interactive use and shell-scripting)? > In any case, it should most likely be a standard table model and not a > hand-crafted one. I would expect XML output to be based on whatever the tree of data contained. If the tree is to be rewritten, then this would mean having some sort of transformation engine in PostgreSQL that you would have to program. If I want a CALS table, then I'll push CALS table data into the database. If I'm storing a GnuCash chart of accounts in PostgreSQL, I am ludicrously uninterested in seeing it rewritten for some sort of physical layout. Spit out the tags that are stored in the database, not some rewriting of it. -- (reverse (concatenate 'string "moc.enworbbc@" "enworbbc")) http://cbbrowne.com/info/linuxdistributions.html (1) Sigs are preceded by the "sigdashes" line, ie "\n-- \n" (dash-dash-space). (2) Sigs contain at least the name and address of the sender in the first line. (3) Sigs are at most four lines and at most eighty characters per line.
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 > The XML standard does not call for any table format. But a number of > table formats have been established within the XML framework. Some of > them are formatting-oriented (e.g., the HTML model, or CALS which is used > in DocBook) and some of them are processing-oriented (e.g., SQL/XML). > Which do we need? And which do we need from psql in particular (keeping > in mind that psql is primarily for interactive use and shell-scripting)? > In any case, it should most likely be a standard table model and not a > hand-crafted one. I think all psql needs is a simple output, similar to the ones used by Oracle, Sybase, and MySQL; the calling application should then process it in some way as needed (obviously this is not for interactive use). Where can one find a "standard table model?" All of the DBs I mentioned (and the perl module DBIx:XML_RDB) all share a similar theme, with subtle differences (i.e. some use <row>, some <row num="x">, some have <rowset>). I'd be happy to write whatever format we can find or develop. My personal vote is the DBIx::XML_RDB format, perhaps with the row number that Oracle uses, producing this: <?xml version="1.0"?> <RESULTSET statement="select * from xmltest"> <ROW num="1"><scoops>3</scoops><flavor>chocolate</flavor> </ROW> <ROW num="2"><scoops>2</scoops><flavor>vanilla</flavor> </ROW> </RESULTSET> > (If, for whatever reason, we go the "processing-oriented" route, then I > claim that there should not be a different output with and without \x > mode.) I agree with this. - -- Greg Sabino Mullane greg@turnstep.com PGP Key: 0x14964AC8 200303041444 -----BEGIN PGP SIGNATURE----- Comment: http://www.turnstep.com/pgp.html iD8DBQE+ZQJNvJuQZxSWSsgRArGEAKD4xs+4Ns3syG175T3k80B6MvNJvgCbBkvF hCkf5SMjLzMJ84uMl1w4tMY= =a2Uq -----END PGP SIGNATURE-----
* greg@turnstep.com <greg@turnstep.com> [2003-03-04 14:21]: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > > > The XML standard does not call for any table format. But a number of > > table formats have been established within the XML framework. Some of > > them are formatting-oriented (e.g., the HTML model, or CALS which is used > > in DocBook) and some of them are processing-oriented (e.g., SQL/XML). > > Which do we need? And which do we need from psql in particular (keeping > > in mind that psql is primarily for interactive use and shell-scripting)? > > In any case, it should most likely be a standard table model and not a > > hand-crafted one. > > I think all psql needs is a simple output, similar to the ones used by > Oracle, Sybase, and MySQL; the calling application should then process > it in some way as needed (obviously this is not for interactive use). > Where can one find a "standard table model?" > > All of the DBs I mentioned (and the perl module DBIx:XML_RDB) all share > a similar theme, with subtle differences (i.e. some use <row>, some > <row num="x">, some have <rowset>). I'd be happy to write whatever > format we can find or develop. My personal vote is the DBIx::XML_RDB > format, perhaps with the row number that Oracle uses, producing this: > > <?xml version="1.0"?> > <RESULTSET statement="select * from xmltest"> > <ROW num="1"> > <scoops>3</scoops> > <flavor>chocolate</flavor> > </ROW> > <ROW num="2"> > <scoops>2</scoops> > <flavor>vanilla</flavor> > </ROW> > </RESULTSET> > > > > (If, for whatever reason, we go the "processing-oriented" route, then I > > claim that there should not be a different output with and without \x > > mode.) > > I agree with this. I'm interested in creating XML documents that have heirarcy. I can produce the above with Perl. Acually, the difficult part has been getting the information back into the database. Getting it out is a very simple query. I imagine that every language/environment has an SQL->XML library somewhere, but I wasn't able to find something that would go from XML to SQL. I wrote a utility that takes an xml document, and xml configuration file, and writes the document to a PostgerSQL data base using the configuration file to figure out what goes where. The configuration file makes some use of XPath to pluck the correct values out of the xml doucment. I suppose the same code could generate a document, but it is so easy to do using Perl and cgi, I've not bothered. It has some constraints, but it is a very useful utility. I've been able to abosorb XML documents into my PostgreSQL db just by tweeking the configuration file. Currently, I am porting it to C++ from Perl. -- Alan Gutierrez - ajglist@izzy.net http://khtml-win32.sourceforge.net/ - KHTML on Windows
* greg@turnstep.com <greg@turnstep.com> [2003-03-04 14:21]: > > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > The XML standard does not call for any table format. But a > > number of table formats have been established within the XML > > framework. Some of them are formatting-oriented (e.g., the HTML > > model, or CALS which is used in DocBook) and some of them are > > processing-oriented (e.g., SQL/XML). Which do we need? And > > which do we need from psql in particular (keeping in mind that > > psql is primarily for interactive use and shell-scripting)? In > > any case, it should most likely be a standard table model and > > not a hand-crafted one. > > I think all psql needs is a simple output, similar to the ones used by > Oracle, Sybase, and MySQL; the calling application should then process > it in some way as needed (obviously this is not for interactive use). > Where can one find a "standard table model?" > > All of the DBs I mentioned (and the perl module DBIx:XML_RDB) all share > a similar theme, with subtle differences (i.e. some use <row>, some > <row num="x">, some have <rowset>). I'd be happy to write whatever > format we can find or develop. My personal vote is the DBIx::XML_RDB > format, perhaps with the row number that Oracle uses, producing this: > > <?xml version="1.0"?> > <RESULTSET statement="select * from xmltest"> > <ROW num="1"> > <scoops>3</scoops> > <flavor>chocolate</flavor> > </ROW> > <ROW num="2"> > <scoops>2</scoops> > <flavor>vanilla</flavor> > </ROW> > </RESULTSET> > > > > (If, for whatever reason, we go the "processing-oriented" route, then I > > claim that there should not be a different output with and without \x > > mode.) > > I agree with this. I'm interested in creating XML documents that have heirarcy. I can produce the above with Perl. I wrote a utility that takes an xml document, and xml configuration file, and writes the document to a PostgerSQL data base using the configuration file to figure out what goes where. The configuration file makes some use of XPath to pluck the correct values out of the xml doucment. I suppose the same code could generate a document, but it is so easy to do using Perl and cgi, I've not bothered. This util has been very helpful to me in developing a document mangement application. Rather than writing insert/update logic every time the db or xml schema changes, I just tweak the config file and it will generated the inserts, updates, and deletes by comparing the XML document with the tables to which the XML elements are mapped. I've been able to handle tree structures tolerably well. I am currently rewriting the code in C++ from Perl. -- Alan Gutierrez - ajglist@izzy.net http://khtml-win32.sourceforge.net/ - KHTML on Windows
I've done a lot with XML lately, so I'll throw in my $0.02 worth. One thing I have noticed about the schemes that are being advanced is that they seem to be inherently unspecifiable, formally, because column names are being used as tags. An alternative might look something like this: <?xml version="1.0"?> <RESULTSET statement="select * from xmltest"> <COLUMNS> <COLUMN name="scoops" type="int" /> <COLUMN name="flavor" type="varchar(40)" /> </COLUMNS> <ROW><FIELD name="scoops" isNull="false">3</FIELD><FIELD name="flavor" isNull="false">chocolate</FIELD> </ROW> <ROW><FIELD name="scoops" isNull="false">2</FIELD><FIELD name="flavor" isNull="false">vanilla</FIELD> </ROW> </RESULTSET> Numbering the rows should be redundant (XPath will give it to you using "position()", for example). OTOH, reporting out a null value as opposed to an empty one is probably a good idea. The formal DTD would be something like this (courtesy of the wonderful tools at http://www.hitsw.com/xml_utilites/: <!ELEMENT RESULTSET ( COLUMNS, ROW* ) > <!ATTLIST RESULTSET statement CDATA #REQUIRED > <!ELEMENT COLUMNS ( COLUMN+ ) > <!ELEMENT COLUMN EMPTY > <!ATTLIST COLUMN name NMTOKEN #REQUIRED > <!ATTLIST COLUMN type CDATA #REQUIRED > <!ELEMENT ROW ( FIELD+ ) ><!ELEMENT FIELD ( #PCDATA ) > <!ATTLIST FIELD isNull ( false| true ) "false" > <!ATTLIST FIELD name NMTOKEN #REQUIRED >or the equivalent in a schema:<?xml version="1.0" encoding="UTF-8" ?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:element name="COLUMN"> <xs:complexType> <xs:attribute name="type"type="xs:string" use="required" /> <xs:attribute name="name" type="xs:NMTOKEN" use="required" /> </xs:complexType></xs:element> <xs:element name="COLUMNS"> <xs:complexType> <xs:sequence> <xs:element ref="COLUMN" minOccurs="1" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="FIELD"> <xs:complexType mixed="true"> <xs:attribute name="isNull" use="optional" default="false"> <xs:simpleType> <xs:restriction base="xs:NMTOKEN"> <xs:enumeration value="false"/> <xs:enumeration value="true" /> </xs:restriction> </xs:simpleType> </xs:attribute> <xs:attribute name="name" type="xs:NMTOKEN" use="required" /> </xs:complexType> </xs:element> <xs:element name="RESULTSET"> <xs:complexType> <xs:sequence> <xs:element ref="COLUMNS" minOccurs="1" maxOccurs="1"/> <xs:element ref="ROW" minOccurs="0" maxOccurs="unbounded" /> </xs:sequence> <xs:attribute name="statement"type="xs:string" use="required" /> </xs:complexType> </xs:element> <xs:element name="ROW"> <xs:complexType> <xs:sequence> <xs:element ref="FIELD" minOccurs="1" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType> </xs:element> </xs:schema>
greg@turnstep.com writes: > I think all psql needs is a simple output, similar to the ones used by > Oracle, Sybase, and MySQL; the calling application should then process > it in some way as needed (obviously this is not for interactive use). > Where can one find a "standard table model?" I think for processing-oriented output, the system described in the SQL/XML standard draft is the way to go. Considering the people who wrote it, it's probably pulled from, or bound to appear in, a major commercial database. I also think that psql is not the place to implement something like this. It's most likely best put in the backend, as a function like xmlfoo('select * from t1;') Then any interface and application that likes it, not just psql-based ones, can use it. -- Peter Eisentraut peter_e@gmx.net
Andrew Dunstan writes: > One thing I have noticed about the schemes that are being advanced is that > they seem to be inherently unspecifiable, formally, because column names are > being used as tags. The SQL/XML draft addresses this by specifying that a mapping from SQL things to XML things spits out both the specification (XML Schema, IIRC) and the data in one operation. -- Peter Eisentraut peter_e@gmx.net
Peter Eisentraut <peter_e@gmx.net> writes: > I also think that psql is not the place to implement something like this. Agreed. > It's most likely best put in the backend, as a function like > xmlfoo('select * from t1;') That seems a little bizarre. Wouldn't we want to have a switch that just flips the SELECT output format from one style to the other? This is also a good time to stop and ask whether the frontend/backend protocol needs to change to support this. Not having read the spec, I have no idea what the low-level transport needs are for XML output, but I suspect our present protocol is not it ... regards, tom lane
Tom Lane wrote: > This is also a good time to stop and ask whether the frontend/backend > protocol needs to change to support this. Not having read the spec, > I have no idea what the low-level transport needs are for XML output, > but I suspect our present protocol is not it ... It might be interesting to modify the protocol (and the backend at the point of projection to the front end) so that a user defined formating function could be applied and either accepted or rejected by the front end. Perhaps one flavor of XML output is a start, but I could imagine wanting a custom or even different "standard" output format. Joe
> Peter Eisentraut <peter_e@gmx.net> writes: > > I also think that psql is not the place to implement something like this. > > Agreed. > > > It's most likely best put in the backend, as a function like > > xmlfoo('select * from t1;') > That seems a little bizarre. Wouldn't we want to have a switch that > just flips the SELECT output format from one style to the other? Ah, but this approach has the merit that it doesn't require pushing out a completely new set of tools. > This is also a good time to stop and ask whether the frontend/backend > protocol needs to change to support this. Not having read the spec, I > have no idea what the low-level transport needs are for XML output, > but I suspect our present protocol is not it ... That could be; there's enough variation in what one might want to do with XML that it is not trivial to suggest an 'ideal' answer. We have already seen the proposal of: <record a="b" c="d" e="f"> <record a="c" c="e" e="g"> <record a="d" c="f" e="h"> <record a="e" c="g" e="i"> I would rather prefer something like: <tablea><record> <a>b</a> <c>d</c> <e>f</e></record> <record> <a>c</a> <c>d</c> <e>f</e></record> <record> <a>d</a><c>d</c> <e>f</e></record> <tablea> (Note that both approaches are quite rational possibilities.) I'd think that the "protocol" would involve passing back a row-as-string for each row in the result set. -- output = ("cbbrowne" "@cbbrowne.com") http://www.ntlug.org/~cbbrowne/xml.html "There are two major products that come out of Berkeley: LSD and Unix. We don't believe this to be a coincidence." - Jeremy S. Anderson
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 > I think for processing-oriented output, the system described in the > SQL/XML standard draft is the way to go. Considering the people who wrote > it, it's probably pulled from, or bound to appear in, a major commercial > database. Do you have a link to the exact section? I've found conflicting versions of what constitutes the "standard" for xml output of SQL data. > I also think that psql is not the place to implement something like this. > It's most likely best put in the backend, as a function like > > xmlfoo('select * from t1;') > > Then any interface and application that likes it, not just psql-based > ones, can use it. I think that is a good long-term solution, but I still think we need to address the TODO item in the short run, and allow for a simple reformatting of the query results from psql. If not, we should remove that TODO item form psql and add a different one to the backend section. - -- Greg Sabino Mullane greg@turnstep.com PGP Key: 0x14964AC8 200303061020 -----BEGIN PGP SIGNATURE----- Comment: http://www.turnstep.com/pgp.html iD8DBQE+Z2jHvJuQZxSWSsgRAj7IAJ4hLEos9OlE67O02gVrrqxwT9n3AQCeJxto N2LFyvXPfGY2whPUs5k+PQA= =PYfs -----END PGP SIGNATURE-----
Peter Eisentraut kirjutas N, 06.03.2003 kell 00:37: > greg@turnstep.com writes: > > > I think all psql needs is a simple output, similar to the ones used by > > Oracle, Sybase, and MySQL; the calling application should then process > > it in some way as needed (obviously this is not for interactive use). > > Where can one find a "standard table model?" > > I think for processing-oriented output, the system described in the > SQL/XML standard draft is the way to go. Considering the people who wrote > it, it's probably pulled from, or bound to appear in, a major commercial > database. > > I also think that psql is not the place to implement something like this. > It's most likely best put in the backend, as a function like > > xmlfoo('select * from t1;') > > Then any interface and application that likes it, not just psql-based > ones, can use it. I have written an aggregate function in pl/python for my own needs that returns underlying query fomatted as XML, but it has some problems: 1) both the row-to-xml-fragment and collect-the-fragments-to-wellformed-xml-doc have to be defined for each and every different query (the actual function text is the same). 2) it is unneccesaryly hard to define a function that takes a record as argument - the record type is lost: for even simple things like this select * from (select * from mytable) mtab; the result of inner query is _not_ of rowtype mytable, i.e. you can do select xmlfrag(mytable) from mytable; but not select xmlfrag(mytable) from (select * from mytable) mytable; ---------------- Hannu
Tom Lane writes: > This is also a good time to stop and ask whether the frontend/backend > protocol needs to change to support this. Not having read the spec, > I have no idea what the low-level transport needs are for XML output, > but I suspect our present protocol is not it ... The spec defines "mappings" between tables, schemas, and catalogs on the one side and each time a pair of XML documents on the other side, one of which is an XML schema document (sort of a document type declaration) and the other is an XML data document that follows the constraints of the schema document and contains the actual data. A table could of course more or less be interpreted to mean a query result. That means, this functionality provides both query result retrieval via XML and a pg_dump type mechanism with XML output. So I imagine, if this is done fully with changes in the protocol layer, then certain commands like "get table schema in XML" would have to exist in the protocol, which doesn't seem right. Also, the XML output isn't a sibling of the current text/binary tuples, since an XML result is always a whole document, not tuple data. What we could perhaps consider is a family of functions like I illustrated, but then provide a fast-path-driven layer on the client side, like for large objects. Initially, the development of these mapping functions could take place totally in user-space. -- Peter Eisentraut peter_e@gmx.net
Peter Eisentraut <peter_e@gmx.net> writes: > So I imagine, if this is done fully with changes in the protocol layer, > then certain commands like "get table schema in XML" would have to exist > in the protocol, which doesn't seem right. Also, the XML output isn't a > sibling of the current text/binary tuples, since an XML result is always > a whole document, not tuple data. I would envision a distinction comparable to the existing one between T and D messages (RowDescription and AsciiRow, using the documentation's names): you send the table schema first, then the data. Also note that there is no "command" to get the T message; it comes for free whenever a SELECT result is sent to the frontend. > What we could perhaps consider is a family of functions like I > illustrated, but then provide a fast-path-driven layer on the client side, > like for large objects. Initially, the development of these mapping > functions could take place totally in user-space. I don't object to that as a quick-and-dirty context for prototyping work, but I'd sure hate to see it as the production version. The fastpath protocol is a mess, and until/unless we get it cleaned up, we ought not increase dependency on it. A larger point is that this is still a protocol revision; pretending it ain't is just willful obscurantism. You can tell it's a protocol revision because you will need to rewrite client-side libraries to take advantage of it. If we try to look the other way and pretend it isn't one, then we'll just be incurring pain --- the most obvious pain being that it will be hard for those client libraries to tell whether the protocol extension is supported or not. The way I'd prefer to see this handled is by providing alternatives to the printtup.c DestReceiver routines. The backend could be switched to any desired output representation just by invoking different sets of receiver routines. What we seem to need first is a context for doing that, in particular a way to understand how different output formats can be fit into the FE/BE protocol. regards, tom lane