Thread: Re: [PATCHES] XML ouput for psql

Re: [PATCHES] XML ouput for psql

From

Peter Eisentraut

Date:

03 March 2003, 12:55:06

greg@turnstep.com writes:

> I don't agree with this: XML and XHTML are two different things.

No one claimed anything to the contrary.

> We could certainly upgrade the HTML portion, but I am pretty sure that
> the XML standard calls for this format:
>
> <columnname>data here</columnname>

The XML standard does not call for any table format.  But a number of
table formats have been established within the XML framework.  Some of
them are formatting-oriented (e.g., the HTML model, or CALS which is used
in DocBook) and some of them are processing-oriented (e.g., SQL/XML).
Which do we need?  And which do we need from psql in particular (keeping
in mind that psql is primarily for interactive use and shell-scripting)?
In any case, it should most likely be a standard table model and not a
hand-crafted one.

(If, for whatever reason, we go the "processing-oriented" route, then I
claim that there should not be a different output with and without \x
mode.)

-- 
Peter Eisentraut   peter_e@gmx.net

Re: [PATCHES] XML ouput for psql

From

cbbrowne@cbbrowne.com

Date:

03 March 2003, 18:59:17

> greg@turnstep.com writes:
> 
> > I don't agree with this: XML and XHTML are two different things.
> 
> No one claimed anything to the contrary.
> 
> > We could certainly upgrade the HTML portion, but I am pretty sure that
> > the XML standard calls for this format:
> >
> > <columnname>data here</columnname>
> 
> The XML standard does not call for any table format.  But a number of
> table formats have been established within the XML framework.  Some of
> them are formatting-oriented (e.g., the HTML model, or CALS which is used
> in DocBook) and some of them are processing-oriented (e.g., SQL/XML).
> Which do we need?  And which do we need from psql in particular (keeping
> in mind that psql is primarily for interactive use and shell-scripting)?
> In any case, it should most likely be a standard table model and not a
> hand-crafted one.

I would expect XML output to be based on whatever the tree of data
contained.

If the tree is to be rewritten, then this would mean having some sort of
transformation engine in PostgreSQL that you would have to program.

If I want a CALS table, then I'll push CALS table data into the
database.

If I'm storing a GnuCash chart of accounts in PostgreSQL, I am
ludicrously uninterested in seeing it rewritten for some sort of
physical layout.  Spit out the tags that are stored in the database, not
some rewriting of it.
--
(reverse (concatenate 'string "moc.enworbbc@" "enworbbc"))
http://cbbrowne.com/info/linuxdistributions.html
(1) Sigs are preceded by the "sigdashes" line, ie "\n-- \n" (dash-dash-space).
(2) Sigs contain at least the name and address of the sender in the first line.
(3) Sigs are at most four lines and at most eighty characters per line.

Re: XML ouput for psql

From

greg@turnstep.com

Date:

04 March 2003, 14:50:18

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


> The XML standard does not call for any table format.  But a number of
> table formats have been established within the XML framework.  Some of
> them are formatting-oriented (e.g., the HTML model, or CALS which is used
> in DocBook) and some of them are processing-oriented (e.g., SQL/XML).
> Which do we need?  And which do we need from psql in particular (keeping
> in mind that psql is primarily for interactive use and shell-scripting)?
> In any case, it should most likely be a standard table model and not a
> hand-crafted one.
I think all psql needs is a simple output, similar to the ones used by 
Oracle, Sybase, and MySQL; the calling application should then process 
it in some way as needed (obviously this is not for interactive use).
Where can one find a "standard table model?"

All of the DBs I mentioned (and the perl module DBIx:XML_RDB) all share 
a similar theme, with subtle differences (i.e. some use <row>, some 
<row num="x">, some have <rowset>). I'd be happy to write whatever 
format we can find or develop. My personal vote is the DBIx::XML_RDB 
format, perhaps with the row number that Oracle uses, producing this:

<?xml version="1.0"?>
<RESULTSET statement="select * from xmltest">
<ROW num="1"><scoops>3</scoops><flavor>chocolate</flavor>
</ROW>
<ROW num="2"><scoops>2</scoops><flavor>vanilla</flavor>
</ROW>
</RESULTSET>


> (If, for whatever reason, we go the "processing-oriented" route, then I
> claim that there should not be a different output with and without \x
> mode.)

I agree with this.

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200303041444
-----BEGIN PGP SIGNATURE-----
Comment: http://www.turnstep.com/pgp.html

iD8DBQE+ZQJNvJuQZxSWSsgRArGEAKD4xs+4Ns3syG175T3k80B6MvNJvgCbBkvF
hCkf5SMjLzMJ84uMl1w4tMY=
=a2Uq
-----END PGP SIGNATURE-----

Re: XML ouput for psql

From

Alan Gutierrez

Date:

04 March 2003, 19:25:30

* greg@turnstep.com <greg@turnstep.com> [2003-03-04 14:21]:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> 
> > The XML standard does not call for any table format.  But a number of
> > table formats have been established within the XML framework.  Some of
> > them are formatting-oriented (e.g., the HTML model, or CALS which is used
> > in DocBook) and some of them are processing-oriented (e.g., SQL/XML).
> > Which do we need?  And which do we need from psql in particular (keeping
> > in mind that psql is primarily for interactive use and shell-scripting)?
> > In any case, it should most likely be a standard table model and not a
> > hand-crafted one.
>  
> I think all psql needs is a simple output, similar to the ones used by 
> Oracle, Sybase, and MySQL; the calling application should then process 
> it in some way as needed (obviously this is not for interactive use).
> Where can one find a "standard table model?"
> 
> All of the DBs I mentioned (and the perl module DBIx:XML_RDB) all share 
> a similar theme, with subtle differences (i.e. some use <row>, some 
> <row num="x">, some have <rowset>). I'd be happy to write whatever 
> format we can find or develop. My personal vote is the DBIx::XML_RDB 
> format, perhaps with the row number that Oracle uses, producing this:
> 
> <?xml version="1.0"?>
> <RESULTSET statement="select * from xmltest">
> <ROW num="1">
>  <scoops>3</scoops>
>  <flavor>chocolate</flavor>
> </ROW>
> <ROW num="2">
>  <scoops>2</scoops>
>  <flavor>vanilla</flavor>
> </ROW>
> </RESULTSET>
> 
> 
> > (If, for whatever reason, we go the "processing-oriented" route, then I
> > claim that there should not be a different output with and without \x
> > mode.)
> 
> I agree with this.

I'm interested in creating XML documents that have heirarcy.
I can produce the above with Perl.

Acually, the difficult part has been getting the information back
into the database. Getting it out is a very simple query. I imagine
that every language/environment has an SQL->XML library somewhere,
but I wasn't able to find something that would go from XML to SQL.

I wrote a utility that takes an xml document, and xml configuration
file, and writes the document to a PostgerSQL data base using the
configuration file to figure out what goes where. The configuration
file makes some use of XPath to pluck the correct values out of the
xml doucment.

I suppose the same code could generate a document, but it is so easy
to do using Perl and cgi, I've not bothered.

It has some constraints, but it is a very useful utility. I've been
able to abosorb XML documents into my PostgreSQL db just by tweeking
the configuration file.

Currently, I am porting it to C++ from Perl.

-- 
Alan Gutierrez - ajglist@izzy.net
http://khtml-win32.sourceforge.net/ - KHTML on Windows

Re: XML ouput for psql

From

Alan Gutierrez

Date:

05 March 2003, 08:10:50

* greg@turnstep.com <greg@turnstep.com> [2003-03-04 14:21]:
> 
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1

> > The XML standard does not call for any table format.  But a
> > number of table formats have been established within the XML
> > framework.  Some of them are formatting-oriented (e.g., the HTML
> > model, or CALS which is used in DocBook) and some of them are
> > processing-oriented (e.g., SQL/XML).  Which do we need?  And
> > which do we need from psql in particular (keeping in mind that
> > psql is primarily for interactive use and shell-scripting)?  In
> > any case, it should most likely be a standard table model and
> > not a hand-crafted one.
>  
> I think all psql needs is a simple output, similar to the ones used by 
> Oracle, Sybase, and MySQL; the calling application should then process 
> it in some way as needed (obviously this is not for interactive use).
> Where can one find a "standard table model?"
> 
> All of the DBs I mentioned (and the perl module DBIx:XML_RDB) all share 
> a similar theme, with subtle differences (i.e. some use <row>, some 
> <row num="x">, some have <rowset>). I'd be happy to write whatever 
> format we can find or develop. My personal vote is the DBIx::XML_RDB 
> format, perhaps with the row number that Oracle uses, producing this:
> 
> <?xml version="1.0"?>
> <RESULTSET statement="select * from xmltest">
> <ROW num="1">
>  <scoops>3</scoops>
>  <flavor>chocolate</flavor>
> </ROW>
> <ROW num="2">
>  <scoops>2</scoops>
>  <flavor>vanilla</flavor>
> </ROW>
> </RESULTSET>
> 
> 
> > (If, for whatever reason, we go the "processing-oriented" route, then I
> > claim that there should not be a different output with and without \x
> > mode.)
> 
> I agree with this.

I'm interested in creating XML documents that have heirarcy.
I can produce the above with Perl.

I wrote a utility that takes an xml document, and xml configuration
file, and writes the document to a PostgerSQL data base using the
configuration file to figure out what goes where. The configuration
file makes some use of XPath to pluck the correct values out of the
xml doucment.

I suppose the same code could generate a document, but it is so easy
to do using Perl and cgi, I've not bothered.

This util has been very helpful to me in developing a document
mangement application. Rather than writing insert/update logic every
time the db or xml schema changes, I just tweak the config file and
it will generated the inserts, updates, and deletes by comparing the
XML document with the tables to which the XML elements are mapped.

I've been able to handle tree structures tolerably well.

I am currently rewriting the code in C++ from Perl.

-- 
Alan Gutierrez - ajglist@izzy.net
http://khtml-win32.sourceforge.net/ - KHTML on Windows

Re: XML ouput for psql

From

"Andrew Dunstan"

Date:

05 March 2003, 10:50:03

I've done a lot with XML lately, so I'll throw in my $0.02 worth.

One thing I have noticed about the schemes that are being advanced is that
they seem to be inherently unspecifiable, formally, because column names are
being used as tags.

An alternative might look something like this:

<?xml version="1.0"?>
<RESULTSET statement="select * from xmltest">
<COLUMNS> <COLUMN name="scoops" type="int" /> <COLUMN name="flavor" type="varchar(40)" />
</COLUMNS>
<ROW><FIELD name="scoops" isNull="false">3</FIELD><FIELD name="flavor" isNull="false">chocolate</FIELD>
</ROW>
<ROW><FIELD name="scoops" isNull="false">2</FIELD><FIELD name="flavor" isNull="false">vanilla</FIELD>
</ROW>
</RESULTSET>


Numbering the rows should be redundant (XPath will give it to you using
"position()", for example). OTOH, reporting out a null value as opposed to
an empty one is probably a good idea.

The formal DTD would be something like this (courtesy of the wonderful tools
at http://www.hitsw.com/xml_utilites/:

<!ELEMENT RESULTSET ( COLUMNS, ROW* ) >
<!ATTLIST RESULTSET statement CDATA #REQUIRED >
<!ELEMENT COLUMNS ( COLUMN+ ) >

<!ELEMENT COLUMN EMPTY >
<!ATTLIST COLUMN name NMTOKEN #REQUIRED >
<!ATTLIST COLUMN type CDATA #REQUIRED >

<!ELEMENT ROW ( FIELD+ ) ><!ELEMENT FIELD ( #PCDATA ) >
<!ATTLIST FIELD isNull ( false| true ) "false" >
<!ATTLIST FIELD name NMTOKEN #REQUIRED >or the equivalent in a schema:<?xml version="1.0" encoding="UTF-8" ?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:element name="COLUMN">   <xs:complexType>     <xs:attribute
name="type"type="xs:string" use="required" />     <xs:attribute name="name" type="xs:NMTOKEN" use="required" />
</xs:complexType></xs:element>
 
 <xs:element name="COLUMNS">   <xs:complexType>     <xs:sequence>       <xs:element ref="COLUMN" minOccurs="1"
maxOccurs="unbounded"/>     </xs:sequence>   </xs:complexType> </xs:element>
 
 <xs:element name="FIELD">   <xs:complexType mixed="true">     <xs:attribute name="isNull" use="optional"
default="false">      <xs:simpleType>         <xs:restriction base="xs:NMTOKEN">           <xs:enumeration
value="false"/>           <xs:enumeration value="true" />         </xs:restriction>       </xs:simpleType>
</xs:attribute>    <xs:attribute name="name" type="xs:NMTOKEN" use="required" />   </xs:complexType> </xs:element>
 
 <xs:element name="RESULTSET">   <xs:complexType>     <xs:sequence>       <xs:element ref="COLUMNS" minOccurs="1"
maxOccurs="1"/>       <xs:element ref="ROW" minOccurs="0" maxOccurs="unbounded" />     </xs:sequence>     <xs:attribute
name="statement"type="xs:string" use="required" />   </xs:complexType> </xs:element>
 
 <xs:element name="ROW">   <xs:complexType>     <xs:sequence>       <xs:element ref="FIELD" minOccurs="1"
maxOccurs="unbounded"/>     </xs:sequence>   </xs:complexType> </xs:element>
 

</xs:schema>

Re: XML ouput for psql

From

Peter Eisentraut

Date:

05 March 2003, 17:37:26

greg@turnstep.com writes:

> I think all psql needs is a simple output, similar to the ones used by
> Oracle, Sybase, and MySQL; the calling application should then process
> it in some way as needed (obviously this is not for interactive use).
> Where can one find a "standard table model?"

I think for processing-oriented output, the system described in the
SQL/XML standard draft is the way to go.  Considering the people who wrote
it, it's probably pulled from, or bound to appear in, a major commercial
database.

I also think that psql is not the place to implement something like this.
It's most likely best put in the backend, as a function like
   xmlfoo('select * from t1;')

Then any interface and application that likes it, not just psql-based
ones, can use it.

-- 
Peter Eisentraut   peter_e@gmx.net

Re: XML ouput for psql

From

Peter Eisentraut

Date:

05 March 2003, 17:38:22

Andrew Dunstan writes:

> One thing I have noticed about the schemes that are being advanced is that
> they seem to be inherently unspecifiable, formally, because column names are
> being used as tags.

The SQL/XML draft addresses this by specifying that a mapping from SQL
things to XML things spits out both the specification (XML Schema, IIRC)
and the data in one operation.

-- 
Peter Eisentraut   peter_e@gmx.net

Re: XML ouput for psql

From

Tom Lane

Date:

05 March 2003, 19:16:57

Peter Eisentraut <peter_e@gmx.net> writes:
> I also think that psql is not the place to implement something like this.

Agreed.

> It's most likely best put in the backend, as a function like
>     xmlfoo('select * from t1;')

That seems a little bizarre.  Wouldn't we want to have a switch that
just flips the SELECT output format from one style to the other?

This is also a good time to stop and ask whether the frontend/backend
protocol needs to change to support this.  Not having read the spec,
I have no idea what the low-level transport needs are for XML output,
but I suspect our present protocol is not it ...
        regards, tom lane

Re: XML ouput for psql

From

Joe Conway

Date:

05 March 2003, 20:34:20

Tom Lane wrote:
> This is also a good time to stop and ask whether the frontend/backend
> protocol needs to change to support this.  Not having read the spec,
> I have no idea what the low-level transport needs are for XML output,
> but I suspect our present protocol is not it ...

It might be interesting to modify the protocol (and the backend at the 
point of projection to the front end) so that a user defined formating 
function could be applied and either accepted or rejected by the front 
end. Perhaps one flavor of XML output is a start, but I could imagine 
wanting a custom or even different "standard" output format.

Joe

Re: XML ouput for psql

From

cbbrowne@cbbrowne.com

Date:

05 March 2003, 21:11:04

> Peter Eisentraut <peter_e@gmx.net> writes:
> > I also think that psql is not the place to implement something like this.
> 
> Agreed.
> 
> > It's most likely best put in the backend, as a function like
> >     xmlfoo('select * from t1;')

> That seems a little bizarre.  Wouldn't we want to have a switch that
> just flips the SELECT output format from one style to the other?

Ah, but this approach has the merit that it doesn't require pushing out
a completely new set of tools.

> This is also a good time to stop and ask whether the frontend/backend
> protocol needs to change to support this.  Not having read the spec, I
> have no idea what the low-level transport needs are for XML output,
> but I suspect our present protocol is not it ...

That could be; there's enough variation in what one might want to do
with XML that it is not trivial to suggest an 'ideal' answer.

We have already seen the proposal of:
<record a="b" c="d" e="f">
<record a="c" c="e" e="g">
<record a="d" c="f" e="h">
<record a="e" c="g" e="i">

I would rather prefer something like:
<tablea><record>  <a>b</a> <c>d</c> <e>f</e></record>  <record>  <a>c</a> <c>d</c> <e>f</e></record>  <record>
<a>d</a><c>d</c> <e>f</e></record>  
 
<tablea>

(Note that both approaches are quite rational possibilities.)

I'd think that the "protocol" would involve passing back a row-as-string
for each row in the result set.
--
output = ("cbbrowne" "@cbbrowne.com")
http://www.ntlug.org/~cbbrowne/xml.html
"There are two major products that come out of Berkeley: LSD and Unix.
We don't believe this to be a coincidence." - Jeremy S. Anderson

Re: XML ouput for psql

From

greg@turnstep.com

Date:

06 March 2003, 10:31:38

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


> I think for processing-oriented output, the system described in the
> SQL/XML standard draft is the way to go.  Considering the people who wrote
> it, it's probably pulled from, or bound to appear in, a major commercial
> database.

Do you have a link to the exact section? I've found conflicting versions 
of what constitutes the "standard" for xml output of SQL data.

> I also think that psql is not the place to implement something like this.
> It's most likely best put in the backend, as a function like
> 
>     xmlfoo('select * from t1;')
> 
> Then any interface and application that likes it, not just psql-based
> ones, can use it.

I think that is a good long-term solution, but I still think we need 
to address the TODO item in the short run, and allow for a simple 
reformatting of the query results from psql. If not, we should remove 
that TODO item form psql and add a different one to the backend section.

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200303061020

-----BEGIN PGP SIGNATURE-----
Comment: http://www.turnstep.com/pgp.html

iD8DBQE+Z2jHvJuQZxSWSsgRAj7IAJ4hLEos9OlE67O02gVrrqxwT9n3AQCeJxto
N2LFyvXPfGY2whPUs5k+PQA=
=PYfs
-----END PGP SIGNATURE-----

Re: XML ouput for psql

From

Hannu Krosing

Date:

06 March 2003, 16:06:25

Peter Eisentraut kirjutas N, 06.03.2003 kell 00:37:
> greg@turnstep.com writes:
> 
> > I think all psql needs is a simple output, similar to the ones used by
> > Oracle, Sybase, and MySQL; the calling application should then process
> > it in some way as needed (obviously this is not for interactive use).
> > Where can one find a "standard table model?"
> 
> I think for processing-oriented output, the system described in the
> SQL/XML standard draft is the way to go.  Considering the people who wrote
> it, it's probably pulled from, or bound to appear in, a major commercial
> database.
> 
> I also think that psql is not the place to implement something like this.
> It's most likely best put in the backend, as a function like
> 
>     xmlfoo('select * from t1;')
> 
> Then any interface and application that likes it, not just psql-based
> ones, can use it.

I have written an aggregate function in pl/python for my own needs that
returns underlying query fomatted as XML, but it has some problems:

1) both the row-to-xml-fragment and
collect-the-fragments-to-wellformed-xml-doc have to be defined for each
and every different query (the actual function text is the same).

2) it is unneccesaryly hard to define a function that takes a record as
argument - the record type is lost: for even simple things like this

select * from (select * from mytable) mtab; 

the result of inner query is _not_ of rowtype mytable, i.e.

you can do

select xmlfrag(mytable) from mytable;

but not

select xmlfrag(mytable) from (select * from mytable) mytable;


----------------
Hannu

Re: XML ouput for psql

From

Peter Eisentraut

Date:

06 March 2003, 18:08:09

Tom Lane writes:

> This is also a good time to stop and ask whether the frontend/backend
> protocol needs to change to support this.  Not having read the spec,
> I have no idea what the low-level transport needs are for XML output,
> but I suspect our present protocol is not it ...

The spec defines "mappings" between tables, schemas, and catalogs on the
one side and each time a pair of XML documents on the other side, one of
which is an XML schema document (sort of a document type declaration) and
the other is an XML data document that follows the constraints of the
schema document and contains the actual data.  A table could of course
more or less be interpreted to mean a query result.  That means, this
functionality provides both query result retrieval via XML and a pg_dump
type mechanism with XML output.

So I imagine, if this is done fully with changes in the protocol layer,
then certain commands like "get table schema in XML" would have to exist
in the protocol, which doesn't seem right.  Also, the XML output isn't a
sibling of the current text/binary tuples, since an XML result is always
a whole document, not tuple data.

What we could perhaps consider is a family of functions like I
illustrated, but then provide a fast-path-driven layer on the client side,
like for large objects.  Initially, the development of these mapping
functions could take place totally in user-space.

-- 
Peter Eisentraut   peter_e@gmx.net

Re: XML ouput for psql

From

Tom Lane

Date:

06 March 2003, 18:33:32

Peter Eisentraut <peter_e@gmx.net> writes:
> So I imagine, if this is done fully with changes in the protocol layer,
> then certain commands like "get table schema in XML" would have to exist
> in the protocol, which doesn't seem right.  Also, the XML output isn't a
> sibling of the current text/binary tuples, since an XML result is always
> a whole document, not tuple data.

I would envision a distinction comparable to the existing one between T
and D messages (RowDescription and AsciiRow, using the documentation's
names): you send the table schema first, then the data.  Also note that
there is no "command" to get the T message; it comes for free whenever
a SELECT result is sent to the frontend.

> What we could perhaps consider is a family of functions like I
> illustrated, but then provide a fast-path-driven layer on the client side,
> like for large objects.  Initially, the development of these mapping
> functions could take place totally in user-space.

I don't object to that as a quick-and-dirty context for prototyping work,
but I'd sure hate to see it as the production version.  The fastpath
protocol is a mess, and until/unless we get it cleaned up, we ought not
increase dependency on it.

A larger point is that this is still a protocol revision; pretending it
ain't is just willful obscurantism.  You can tell it's a protocol revision
because you will need to rewrite client-side libraries to take advantage
of it.  If we try to look the other way and pretend it isn't one, then
we'll just be incurring pain --- the most obvious pain being that it
will be hard for those client libraries to tell whether the protocol
extension is supported or not.

The way I'd prefer to see this handled is by providing alternatives to
the printtup.c DestReceiver routines.  The backend could be switched to
any desired output representation just by invoking different sets of
receiver routines.  What we seem to need first is a context for doing
that, in particular a way to understand how different output formats can
be fit into the FE/BE protocol.
        regards, tom lane