Thread: returning PGresult as xml
hello,

this note is intended to describe my work on beginning to further integrate xml into postgresql. first, i'd like to thank the contributors of contrib/xml, as their work was instrumental in helping me understand what support exists and where i wanted to go. thanks.

my first requirement is to export data from the database into a format which can be read not only by existing (postgresql) clients but by people and programs that don't know how to use a PGresult. xml is very verbose, but its popularity makes it closer to universal than anything else of which i could think. in addition, ideas like XSL/XSLT make an export to xml very attractive to me.

it's been said that converting a PGresult into xml is "trivial" and that's why it hasn't been done in the codebase as of yet. i have seen much code that writes xml, and many mistakes are made -- most often improper escaping, or writing to a schema/DTD that has not been well thought out. the transformation into xml is not difficult, but it does require attention to detail.

i feel bad that i have not been able to use any existing standards. xmlrpc, i found, was not type-rich enough, and that made it difficult or impossible to use. in particular, the only way to represent a matrix is as a struct of structs. this makes it very verbose for one to encode a PGresult. i found SOAP too difficult for compliance. so my result was to create a schema, which results in a DTD. an example of what my code generates can be found below.

the following xml is the result of the query "SELECT 1 as foo, 2 as bar":

<?xml version='1.0' encoding='ISO-8859-1'?>
<!DOCTYPE PGresult [
<!ELEMENT PGresult (col_desc*, row*)>
<!ATTLIST PGresult num_rows CDATA #REQUIRED
                   num_cols CDATA #REQUIRED>
<!ELEMENT col_desc EMPTY>
<!ATTLIST col_desc num CDATA #REQUIRED
                   format (text | binary) #REQUIRED
                   type CDATA #REQUIRED
                   name CDATA #REQUIRED>
<!ELEMENT row (col*)>
<!ATTLIST row num CDATA #REQUIRED>
<!ELEMENT col (#PCDATA)>
<!ATTLIST col num CDATA #REQUIRED>
<!ENTITY NULL ''>
]>
<PGresult num_rows='1' num_cols='2'>
<col_desc num='0' type='int4' format='text' name='foo' />
<col_desc num='1' type='int4' format='text' name='bar' />
<row num='0'>
<col num='0'>1</col>
<col num='1'>2</col>
</row>
</PGresult>

a slightly more complicated example:

template1=# select oid,typname,typlen,typtype from pg_type where oid<20;
 oid | typname | typlen | typtype
-----+---------+--------+---------
  16 | bool    |      1 | b
  17 | bytea   |     -1 | b
  18 | char    |      1 | b
  19 | name    |     32 | b
(4 rows)

<!DOCTYPE PGresult [
<!ELEMENT PGresult (col_desc*, row*)>
<!ATTLIST PGresult num_rows CDATA #REQUIRED
                   num_cols CDATA #REQUIRED>
<!ELEMENT col_desc EMPTY>
<!ATTLIST col_desc num CDATA #REQUIRED
                   format (text | binary) #REQUIRED
                   type CDATA #REQUIRED
                   name CDATA #REQUIRED>
<!ELEMENT row (col*)>
<!ATTLIST row num CDATA #REQUIRED>
<!ELEMENT col (#PCDATA)>
<!ATTLIST col num CDATA #REQUIRED>
<!ENTITY NULL ''>
]>
<PGresult num_rows='4' num_cols='4'>
<col_desc num='0' type='oid' format='text' name='oid' />
<col_desc num='1' type='name' format='text' name='typname' />
<col_desc num='2' type='int2' format='text' name='typlen' />
<col_desc num='3' type='char' format='text' name='typtype' />
<row num='0'>
<col num='0'>16</col>
<col num='1'>bool</col>
<col num='2'>1</col>
<col num='3'>b</col>
</row>
<row num='1'>
<col num='0'>17</col>
<col num='1'>bytea</col>
<col num='2'>-1</col>
<col num='3'>b</col>
</row>
<row num='2'>
<col num='0'>18</col>
<col num='1'>char</col>
<col num='2'>1</col>
<col num='3'>b</col>
</row>
<row num='3'>
<col num='0'>19</col>
<col num='1'>name</col>
<col num='2'>32</col>
<col num='3'>b</col>
</row>
</PGresult>

i have done this work for myself and my own needs, so i fully understand if this work is not interesting to the postgresql group in general. however, if there is some chance that the changes could be incorporated into the tree, i would be interested in contributing, as integration into a proper version of postgresql will make my build easier. ;)

i would expect that integration would look something like exposing from libpq a function that looks something like:

const char *PGresult_as_xml(PGresult *result, int include_dtd);

i would also expect that psql would be modified to take a \X and to call the above function. there is some strangeness now, as psql doesn't call methods defined in libpq to print; it has its own printer. i, of course, would do this work. :) i just need to know that people are interested.

also, if integration is going to happen, i will need to replace calls to my hashtables with calls to postgresql's hashtables. i saw dynamic hashtables in the backend, but not in the interfaces. i wasn't exactly sure how i should go about introducing another module to the frontend; there could be problems of which i remain blissfully unaware.

i look forward to feedback, and i hope this note finds you well,

b
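(For concreteness, here is a minimal sketch of the kind of client-side conversion being proposed. This is not the code from the post: escape_xml() is a hypothetical helper that expands the predefined XML entities, and the type attribute is left out because mapping a column's type Oid to a name like 'int4' would require a pg_type lookup.)

#include <stdio.h>
#include <libpq-fe.h>

/* hypothetical helper: writes s with &, <, >, ', " expanded to entities */
extern void escape_xml(FILE *out, const char *s);

static void
pgresult_as_xml(FILE *out, const PGresult *res)
{
    int nrows = PQntuples(res);
    int ncols = PQnfields(res);
    int r, c;

    fprintf(out, "<PGresult num_rows='%d' num_cols='%d'>\n", nrows, ncols);
    for (c = 0; c < ncols; c++)
    {
        /* the type attribute would need a pg_type lookup; omitted here */
        fprintf(out, "<col_desc num='%d' format='%s' name='",
                c, PQfformat(res, c) == 0 ? "text" : "binary");
        escape_xml(out, PQfname(res, c));
        fprintf(out, "' />\n");
    }
    for (r = 0; r < nrows; r++)
    {
        fprintf(out, "<row num='%d'>\n", r);
        for (c = 0; c < ncols; c++)
        {
            fprintf(out, "<col num='%d'>", c);
            if (PQgetisnull(res, r, c))
                fputs("&NULL;", out);  /* presumably how the DTD's NULL entity is used */
            else
                escape_xml(out, PQgetvalue(res, r, c));
            fputs("</col>\n", out);
        }
        fputs("</row>\n", out);
    }
    fputs("</PGresult>\n", out);
}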
Brian Moore wrote on Sun, 25.01.2004 at 11:07:
> hello,
>
> this note is intended to describe my work on beginning to further
> integrate xml into postgresql. first, i'd like to thank the
> contributors of contrib/xml, as their work was instrumental in helping
> me understand what support exists and where i wanted to go. thanks.

First, IMHO having unified XML support is a good thing for Postgres. I still have some questions and suggestions:

At what place do you intend to add your converter? I remember someone started abstracting out the FE/BE protocol calls in the server code with the aim of supporting multiple protocols, but stopped without making too much progress (?)

Also, I would suggest that XML Schema datatypes should be used, preferably together with either a RelaxNG schema or something from the SQL/XML spec.

-------------
Hannu
Brian Moore wrote:
> i feel bad that i have not been able to use any existing
> standards. xmlrpc, i found, was not type-rich enough, and that made
> it difficult or impossible to use. in particular, the only way to
> represent a matrix is as a struct of structs. this makes it very
> verbose for one to encode a PGresult. i found SOAP too difficult for
> compliance. so my result was to create a schema, which results in a
> DTD.

Let me point out an implementation I made the last time this subject was discussed:

http://developer.postgresql.org/~petere/xmltable.tar.bz2

This package contains server-side functions that convert a table (more generally, a query result) to an XML document and/or an XSL schema, both mimicking the SQL/XML standard. Additionally, it contains a function to convert such an XML document back to a table source. I also threw in an XSLT stylesheet to convert an SQL/XML table to an HTML table, so you can more easily view the results.

I also have some code in development that adds cursor interfaces, an XML data type, and some integration with the existing XPath functionality. I think that for processing XML in the database, and as far as following the existing standards goes, this is the direction to take.

Also, the last time this subject was discussed, I believe it was Mike Mascari who proposed and implemented another solution which is more client-side oriented. He wrote a piece of code that took a normal libpq result set and shipped it off as SQL/XML wrapped in SOAP. And it had streaming capabilities for large result sets.

These are two complementary approaches that already exist, more or less. Of course, this only covers the C API and would need sensible extensions for other programming languages. But I invite you to look at them and see whether they fit your needs (the ideas, not necessarily the state of the code).
Peter Eisentraut wrote:
> Brian Moore wrote:
>
>> i feel bad that i have not been able to use any existing
>> standards. xmlrpc, i found, was not type-rich enough, and that made
>> it difficult or impossible to use. in particular, the only way to
>> represent a matrix is as a struct of structs. this makes it very
>> verbose for one to encode a PGresult. i found SOAP too difficult for
>> compliance. so my result was to create a schema, which results in a
>> DTD.
>
> Let me point out an implementation I made the last time this subject
> was discussed:
>
> http://developer.postgresql.org/~petere/xmltable.tar.bz2
>
> Also, the last time this subject was discussed, I believe it was Mike
> Mascari who proposed and implemented another solution which is more
> client-side oriented.

I humbly confess it wasn't me. We use CORBA....

Mike Mascari
Peter Eisentraut wrote:
> Let me point out an implementation I made the last time this subject
> was discussed:
>
> http://developer.postgresql.org/~petere/xmltable.tar.bz2
>
> This package contains server-side functions that convert a table (more
> generally, a query result) to an XML document and/or an XSL schema,
> both mimicking the SQL/XML standard.
>
> Additionally, it contains a function to convert such an XML document
> back to a table source. I also threw in an XSLT stylesheet to convert
> an SQL/XML table to an HTML table, so you can more easily view the
> results.
>
> I also have some code in development that adds cursor interfaces, an
> XML data type, and some integration with the existing XPath
> functionality. I think that for processing XML in the database, and as
> far as following the existing standards goes, this is the direction to
> take.

Peter: this looks very nice. What are your intentions with this code? Put it in contrib? Also, do you intend to implement the SQL/XML functions XMLElement, XMLForest, XMLAttributes, XMLConcat and XMLAgg?

cheers

andrew
On Jan 25, 2004, at 3:07 AM, Brian Moore wrote:
> it's been said that converting a PGresult into xml is "trivial" and
> that's why it hasn't been done in the codebase as of yet. i have seen
> much code that writes xml, and many mistakes are made -- most often
> improper escaping, or writing to a schema/DTD that has not been
> well thought out. the transformation into xml is not difficult, but it
> does require attention to detail.

The escaping, at any rate, is trivial if you use a proper API. It sounds like your code is not using any XML API, given that you have not mentioned adding dependencies to libpq and that you've mentioned your own hashtable algorithm. It would be much easier if you did so, though I imagine the additional dependency would mean it would not be accepted into libpq.

> <PGresult num_rows='1' num_cols='2'>
> <col_desc num='0' type='int4' format='text' name='foo' />
> <col_desc num='1' type='int4' format='text' name='bar' />
> <row num='0'>
> <col num='0'>1</col>
> <col num='1'>2</col>
> </row>
> </PGresult>

How would you filter for a column in XSLT based on column name with this schema? It's certainly not trivial. I have similar code, and I included the column name as an attribute in each column element for this reason. I also used the java.sql type names rather than PostgreSQL ones, as my code is not specific to PostgreSQL.

> i would expect that integration would look something like exposing
> from libpq a function that looks something like:
>
> const char *PGresult_as_xml(PGresult *result, int include_dtd);

Ugh. So it returns the whole thing as one big string? That won't hold up well if your result set is large. A better way would be to pump out SAX events. This is what I did, for three reasons:

1) The escaping becomes trivial, as mentioned above. In fact, not only does SAX escape things correctly, but it makes you explicitly specify that the string you're giving it is character data, an element name, an attribute name, an attribute value, etc., and handles everything properly based on that. So you'd really have to work to screw it up, unlike code that just does something like

    printf("<elem foo='%s' bar='%s'>%s</elem>",
           xml_attr_escape(foo_val),
           xml_attr_escape(bar_val),
           xml_char_escape(elem_val));

where it would be quite easy to lose track of what needs to be escaped how, which variables are already escaped, etc.

2) It can stream large result sets, provided that the next stage supports doing so. A raw SAX serializer certainly would, as would some XSLT stylesheets with Xalan, and STX/Joost is designed for streaming transformations.

3) If the next stage is a transformation, this makes it unnecessary to serialize and parse the data in between. So the SAX way is faster.

You're welcome to take a look at my code. I imagine it will not be directly useful to you, as it is written in Java, but I have a live example which puts this stuff to use. Designing an acceptable API and schema is always much easier when you see how it is put to use.

<http://www.slamb.org/projects/xmldb/> - my (so far poorly-named) xmldb project, which includes the org.slamb.xmldb.ResultSetProducer class to transform a java.sql.ResultSet to SAX events in my resultset schema.
<http://www.slamb.org/svn/repos/projects/xmldb/src/java/org/slamb/xmldb/ResultSetProducer.java> - source code for said class

<http://www.slamb.org/projects/mb/> - a message board which uses this code and some XSLT

<https://www.slamb.org/mb/> - a live example of said message board

<http://www.slamb.org/svn/repos/projects/mb/src/WEB-INF/xsl/resultset.xsl> - simple XSLT to take an arbitrary resultset and convert it to an HTML table

<http://www.slamb.org/svn/repos/projects/mb/src/WEB-INF/xsl/main.xsl> - an example XSLT file that inherits this and then provides exceptions for a couple of columns (not displaying the id column, instead including it as a hyperlink in the name column).

Good luck.

Scott Lamb
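(As a footnote to Scott's first point: an escaping helper like the hypothetical xml_attr_escape() in his printf example has to expand the five predefined XML entities. A minimal C sketch, returning a malloc'd string the caller must free:)

#include <stdio.h>
#include <stdlib.h>

/* expand the five predefined XML entities; caller frees the result */
static char *
xml_escape(const char *s)
{
    const char *p;
    char *out, *q;
    size_t len = 0;

    for (p = s; *p; p++)        /* first pass: measure the escaped length */
        len += (*p == '&') ? 5 :
               (*p == '<' || *p == '>') ? 4 :
               (*p == '"' || *p == '\'') ? 6 : 1;

    out = q = malloc(len + 1);
    if (out == NULL)
        return NULL;

    for (p = s; *p; p++)        /* second pass: copy, expanding entities */
        switch (*p)
        {
            case '&':  q += sprintf(q, "&amp;");  break;
            case '<':  q += sprintf(q, "&lt;");   break;
            case '>':  q += sprintf(q, "&gt;");   break;
            case '"':  q += sprintf(q, "&quot;"); break;
            case '\'': q += sprintf(q, "&apos;"); break;
            default:   *q++ = *p;                 break;
        }
    *q = '\0';
    return out;
}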
Scott Lamb wrote:
> On Jan 25, 2004, at 3:07 AM, Brian Moore wrote:
>
>> <PGresult num_rows='1' num_cols='2'>
>> <col_desc num='0' type='int4' format='text' name='foo' />
>> <col_desc num='1' type='int4' format='text' name='bar' />
>> <row num='0'>
>> <col num='0'>1</col>
>> <col num='1'>2</col>
>> </row>
>> </PGresult>
>
> How would you filter for a column in XSLT based on column name with
> this schema? It's certainly not trivial. I have similar code, and I
> included the column name as an attribute in each column element for
> this reason.

Close to trivial if you set up a key on the col_desc elements, I should think. Maybe something like:

<xsl:key name="coldesc" match="col_desc" use="@num" />
...
<xsl:for-each select="col[key('coldesc', @num)/@name = 'colname']">
...

Alternatively you can get there using the parent and preceding-sibling axes, but it's less clear.

cheers

andrew
I tried to build plperl on 7.4.1. On my system,

perl -MConfig -e 'print $Config{ccdlflags}'

returns

-rdynamic -Wl,-rpath,/usr/lib/perl5/5.8.0/i386-linux-thread-multi/CORE

however the build ends up using

-rpath,$prefix/lib

Dave
--
Dave Cramer
519 939 0336
ICQ # 1467551
I'm curious what a reverse index does on a table with URL-like data, so I did the following:

create function fn_strrev(text) returns text as
  'return reverse($_[0])'
language 'plperl' with (iscachable);

create index r_url_idx on url( fn_strrev(url) );

vacuum analyze;

explain select * from url where url like fn_strrev('%beta12.html');
                        QUERY PLAN
---------------------------------------------------------
 Seq Scan on url  (cost=0.00..13281.70 rows=1 width=454)
   Filter: ((url)::text ~~ 'lmth.21ateb%'::text)

Is it possible to get the planner to use an index scan? How?

The db is using locale 'C'.

--
Dave Cramer
519 939 0336
ICQ # 1467551
Dave Cramer <pg@fastcrypt.com> writes:
> create index r_url_idx on url( fn_strrev(url) );

> explain select * from url where url like fn_strrev('%beta12.html');
>                         QUERY PLAN
> ---------------------------------------------------------
>  Seq Scan on url  (cost=0.00..13281.70 rows=1 width=454)
>    Filter: ((url)::text ~~ 'lmth.21ateb%'::text)

> Is it possible to get the planner to use an index scan?

Sure, but not that way. Try "fn_strrev(url) like something". You have to compare the indexed value to something...

			regards, tom lane
same answer

davec=# show enable_seqscan;
 enable_seqscan
----------------
 off
(1 row)

davec=# explain analyze select * from url where fn_strrev(url) like '%beta12.html';
                                                       QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Seq Scan on url  (cost=100000000.00..100013533.04 rows=503 width=454) (actual time=3851.636..3851.636 rows=0 loops=1)
   Filter: (fn_strrev((url)::text) ~~ '%beta12.html'::text)
 Total runtime: 3851.712 ms
(3 rows)

On Tue, 2004-01-27 at 12:33, Tom Lane wrote:
> Dave Cramer <pg@fastcrypt.com> writes:
> > create index r_url_idx on url( fn_strrev(url) );
>
> > explain select * from url where url like fn_strrev('%beta12.html');
> >                         QUERY PLAN
> > ---------------------------------------------------------
> >  Seq Scan on url  (cost=0.00..13281.70 rows=1 width=454)
> >    Filter: ((url)::text ~~ 'lmth.21ateb%'::text)
>
> > Is it possible to get the planner to use an index scan?
>
> Sure, but not that way. Try "fn_strrev(url) like something".
> You have to compare the indexed value to something...
>
> 			regards, tom lane

--
Dave Cramer
519 939 0336
ICQ # 1467551
On Tue, 27 Jan 2004, Dave Cramer wrote:

> same answer
>
> davec=# show enable_seqscan;
>  enable_seqscan
> ----------------
>  off
> (1 row)
>
> davec=# explain analyze select * from url where fn_strrev(url) like
> '%beta12.html';

That's still an unanchored like clause; besides, I think that would get urls that begin with lmth.21ateb. I think the condition you want would be:

fn_strrev(url) like 'lmth.21ateb%'
On Tue, Jan 27, 2004 at 12:41:41PM -0500, Dave Cramer wrote:

> davec=# explain analyze select * from url where fn_strrev(url) like
> '%beta12.html';

Reverse the constant too:

davec=# explain analyze select * from url where fn_strrev(url) like fn_strrev('%beta12.html');

You won't get an indexscan if you have a % in front of the string.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Some people acquire the bad habit of being unhappy" (M. A. Evans)
Dave Cramer <pg@fastcrypt.com> writes:
> davec=# explain analyze select * from url where fn_strrev(url) like
> '%beta12.html';

Don't you need the % at the right end to have an indexable plan? I suspect that both of your tries so far are actually semantically wrong, and that what you intend is

select * from url where fn_strrev(url) like fn_strrev('%beta12.html');

			regards, tom lane
Tried all the suggestions.

--dc--

davec=# explain analyze select * from url where fn_strrev(url) like fn_strrev('%beta12.html');
                                                        QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Seq Scan on url  (cost=100000000.00..100013533.04 rows=503 width=454) (actual time=1416.448..3817.221 rows=12 loops=1)
   Filter: (fn_strrev((url)::text) ~~ 'lmth.21ateb%'::text)
 Total runtime: 3817.315 ms
(3 rows)

davec=# explain analyze select * from url where fn_strrev(url) like 'lmth.21ateb%';
                                                        QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Seq Scan on url  (cost=100000000.00..100013533.04 rows=503 width=454) (actual time=1412.181..3843.998 rows=12 loops=1)
   Filter: (fn_strrev((url)::text) ~~ 'lmth.21ateb%'::text)
 Total runtime: 3844.106 ms
(3 rows)

davec=# explain analyze select * from url where fn_strrev(url) like '%lmth.21ateb';
                                                       QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Seq Scan on url  (cost=100000000.00..100013533.04 rows=503 width=454) (actual time=3853.501..3853.501 rows=0 loops=1)
   Filter: (fn_strrev((url)::text) ~~ '%lmth.21ateb'::text)
 Total runtime: 3853.583 ms
(3 rows)

On Tue, 2004-01-27 at 13:02, Tom Lane wrote:
> Dave Cramer <pg@fastcrypt.com> writes:
> > davec=# explain analyze select * from url where fn_strrev(url) like
> > '%beta12.html';
>
> Don't you need the % at the right end to have an indexable plan?
> I suspect that both of your tries so far are actually semantically
> wrong, and that what you intend is
>
> select * from url where fn_strrev(url) like fn_strrev('%beta12.html');
>
> 			regards, tom lane

--
Dave Cramer
519 939 0336
ICQ # 1467551
Dave Cramer <pg@fastcrypt.com> writes:
> Tried all the suggestions.

Mph. It works for me... what PG version are you using exactly, and are you certain you've selected C locale? (Do you get LIKE optimization on plain indexes?)

			regards, tom lane
I'm using 7.4.1; the db was initdb'd with --locale='C', and no, I don't get LIKE optimization on plain indexes ????

Dave

On Tue, 2004-01-27 at 13:28, Tom Lane wrote:
> Dave Cramer <pg@fastcrypt.com> writes:
> > Tried all the suggestions.
>
> Mph. It works for me... what PG version are you using exactly,
> and are you certain you've selected C locale? (Do you get LIKE
> optimization on plain indexes?)
>
> 			regards, tom lane

--
Dave Cramer
519 939 0336
ICQ # 1467551
Dave Cramer <pg@fastcrypt.com> writes:
> I'm using 7.4.1; the db was initdb'd with --locale='C', and no, I
> don't get LIKE optimization on plain indexes ????

Oh? If it's 7.4 then you can confirm the locale selection with "show lc_collate" and "show lc_ctype" (I think the first of these is what the LIKE optimization checks).

			regards, tom lane
Interesting -- it works now, and the good news is it is *WAY* faster. This might be able to speed up marc's doc search by orders of magnitude.

This is searching 100536 rows:

select * from url where fn_strrev(url) like fn_strrev('%beta12.html');
    1.57 ms

explain select * from url where url like '%beta12.html';
    3310.38 ms

Dave

On Tue, 2004-01-27 at 13:48, Tom Lane wrote:
> Dave Cramer <pg@fastcrypt.com> writes:
> > I'm using 7.4.1; the db was initdb'd with --locale='C', and no, I
> > don't get LIKE optimization on plain indexes ????
>
> Oh? If it's 7.4 then you can confirm the locale selection with
> "show lc_collate" and "show lc_ctype" (I think the first of these
> is what the LIKE optimization checks).
>
> 			regards, tom lane

--
Dave Cramer
519 939 0336
ICQ # 1467551
Andrew Dunstan wrote:
> Peter: this looks very nice. What are your intentions with this code?

Once we figure out how to handle the on-the-wire character set recoding when faced with XML documents (see the separate thread a few weeks ago), I would like to finish it.

> Put it in contrib? Also, do you intend to implement the SQL/XML
> functions XMLElement, XMLForest, XMLAttributes, XMLConcat and XMLAgg?

You have to implement these directly in the parser, which I'm not yet excited about.
Peter Eisentraut wrote on Thu, 29.01.2004 at 19:31:
> Andrew Dunstan wrote:
> > Peter: this looks very nice. What are your intentions with this code?
>
> Once we figure out how to handle the on-the-wire character set recoding
> when faced with XML documents (see the separate thread a few weeks ago),
> I would like to finish it.
>
> > Put it in contrib? Also, do you intend to implement the SQL/XML
> > functions XMLElement, XMLForest, XMLAttributes, XMLConcat and XMLAgg?
>
> You have to implement these directly in the parser, which I'm not yet
> excited about.

Why not use some standard parser? libxml2 (www.xmlsoft.org) seems nice, and it is either available as a separate dynamic library or can also be (IIRC) configured to build with just the minimal needed functionality.

---------------
Hannu
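(To illustrate Hannu's suggestion on the output side: with libxml2's xmlwriter API, the library does all entity escaping itself. A rough, untested sketch, using the element names from Brian's schema, with error checking omitted:)

#include <stdio.h>
#include <libxml/xmlwriter.h>

/* emit one row through libxml2's writer, which escapes everything itself */
static void
write_row(xmlTextWriterPtr writer, int rownum, char **values, int ncols)
{
    char buf[32];
    int c;

    xmlTextWriterStartElement(writer, BAD_CAST "row");
    snprintf(buf, sizeof(buf), "%d", rownum);
    xmlTextWriterWriteAttribute(writer, BAD_CAST "num", BAD_CAST buf);

    for (c = 0; c < ncols; c++)
    {
        xmlTextWriterStartElement(writer, BAD_CAST "col");
        snprintf(buf, sizeof(buf), "%d", c);
        xmlTextWriterWriteAttribute(writer, BAD_CAST "num", BAD_CAST buf);
        /* &, <, >, and quotes in values[c] become entities automatically */
        xmlTextWriterWriteString(writer, BAD_CAST values[c]);
        xmlTextWriterEndElement(writer);    /* </col> */
    }
    xmlTextWriterEndElement(writer);        /* </row> */
}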
Dave Cramer wrote:
> Interesting -- it works now, and the good news is it is *WAY* faster.
> This might be able to speed up marc's doc search by orders of magnitude.
>
> This is searching 100536 rows:
>
> select * from url where fn_strrev(url) like fn_strrev('%beta12.html');
>     1.57 ms
>
> explain select * from url where url like '%beta12.html';
>     3310.38 ms

The nice thing about this is that you can create your query thusly:

SELECT * FROM table WHERE column LIKE 'string'
    AND fn_strrev(column) LIKE fn_strrev('string')

and, if you have both a standard index on column and a functional index on fn_strrev(column), the query will be fast (well, as fast as the pattern in question allows) as long as 'string' is anchored on either end.

I've implemented the 'locate' utility in Perl using a PG backend instead of the standard locate database. I internally convert globs given as arguments into LIKE strings, and with a functional index like that the searches are now blazingly fast -- faster than the original 'locate' utility. It has the added advantage that you can specify a file type to further narrow the search (thus 'locate --type file "core"' will find all regular files named 'core' in the database).

I'll be happy to share my code with anyone who's interested.

--
Kevin Brown   kevin@sysexperts.com