Re: Dreaming About Redesigning SQL - Mailing list pgsql-hackers

From Josh Berkus
Subject Re: Dreaming About Redesigning SQL
Date
Msg-id 200310192219.00118.josh@agliodbs.com
In response to Re: Dreaming About Redesigning SQL  (Sailesh Krishnamurthy <sailesh@cs.berkeley.edu>)
List pgsql-hackers

Sailesh,

Warning:  I get carried away in this response.   I'm afraid that I'm a fond 
reader of Fabian Pascal and CJ Date, so I have far too much to say on the 
topic.   So if you really care about XML databases, you should probably hold 
off on reading the rest until you're well-caffeinated and in a cheerful frame 
of mind.

Also, let me clarify that there is a distinction between using XML *as a* 
database, and putting XML documents into databases of other types.   I find 
the latter obvious and sensible, but the former a silly and wrong-headed 
idea, and it's the pure-XML-database which I attack below.

If you want to really have this out, I live in San Francisco and I love to 
argue.  Coffee at Intermezzo?  I'll buy.

-------------------------------
> If you look at the academic research work, there have been gazillions
> of recent papers on XML database technology.

Point me to one which presents an algebra, calculus, or other mathematical 
underpinning of XML databases, and I will be happy to eat my words on this 
list.   I can easily find lots of papers using Google, but all of them are 
about *technical implementation* and do not provide a theoretical 
underpinning for XML databases.  

A few (such as Dan Suciu's paper) present some theory to back XQuery but it is 
presented entirely as an XML-based data access extension to SQL ... a role 
which seems fine to me.  

Others, even authors cited by xmldb.org, have rather lukewarm things to say 
on the topic.  Take David Mertz, PhD:
(http://www-106.ibm.com/developerworks/library/x-matters8/index.html)

"XML is an extremely versatile data transport format, but despite high hopes 
for it, XML is mediocre to poor as a data storage and access format. ..." 
<snip>
" ...XML has no inherent mechanism for enforcing constraints of this sort 
(DTDs and schemas are constraints of a different, more limited sort). Without 
constraints, you just have data, not a data model (to slightly oversimplify 
matters). ..." <snip>
" ... In other words, go ahead and be excited by XML's promise of a universal 
data transport mechanism, but keep your backend data on something designed 
for it, like DB2 or Oracle (or on Postgres or MySQL for smaller-scale 
systems)."

And this guy is cited by XMLDB.org?   Perhaps not surprising, as among the 5 
goals of XMLDB.org, development of a standard theory of XML databases is not 
present.  
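
To make the constraint point in the Mertz quote concrete, here is a minimal sketch of what "enforcing constraints" looks like in a relational engine. It uses Python's bundled sqlite3 module purely as a convenient stand-in; the table names, column names, and the function name `try_bad_inserts` are hypothetical and do not come from Mertz's article or from this thread.

```python
import sqlite3


def try_bad_inserts():
    """Attempt two invalid inserts; return the rejected rows and the final row count."""
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign-key enforcement off unless asked; turn it on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE invoice ("
        " id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL REFERENCES customer(id),"
        " amount NUMERIC NOT NULL CHECK (amount >= 0))"
    )
    conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Acme')")
    conn.execute("INSERT INTO invoice (customer_id, amount) VALUES (1, 10)")

    rejected = []
    # First row points at a customer that does not exist;
    # second row has a negative amount.
    for row in [(99, 5), (1, -3)]:
        try:
            conn.execute(
                "INSERT INTO invoice (customer_id, amount) VALUES (?, ?)", row
            )
        except sqlite3.IntegrityError:
            rejected.append(row)

    count = conn.execute("SELECT COUNT(*) FROM invoice").fetchone()[0]
    conn.close()
    return rejected, count
```

Both bad rows are refused by the engine itself, so only the one valid invoice remains. The rules are declared once with the data and apply to every client, which is roughly the "data model, not just data" distinction the quote draws.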

>  All the major database
> vendors (Oracle, IBM and Microsoft) are investing fairly heavily in
> core-engine XMLDB technology.

So?   Oracle, IBM and Microsoft also have SQL databases that do a terrible job 
of upholding the SQL standard, and their (at least Oracle's and Microsoft's) 
adherence is getting worse with successive versions rather than better. I 
wouldn't look to them for guidance.  

If they're spending millions on XML databases, it's because the idea, however 
wrong-headed, is a fad, and fads mean sales, and they don't want to take a 
chance on missing out.  And these companies have backed plenty of useless 
technologies before; remember Microsoft's "Periodicals on CD"?  

Not that I'm against XML; as far as I'm concerned, for interchangeable, 
searchable, and archival documents, XML is the greatest thing since sliced 
Beatles.   I love XML-RPC for pushing data through HTTP, and I will happily 
be in the cheering squad for anyone who writes a set of OSS tools to extract 
data from XML docs stored in a PostgreSQL database, or to automate 
some-standard-XML-to-relational-data-and-back conversion.   That is a good 
application of XML+Database ideas.
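
As a rough illustration of the XML-to-relational-and-back conversion mentioned above, here is a minimal sketch. Everything in it is an assumption made for the example: the `<orders>` document layout, the two-table schema, and the function names are invented, and Python's bundled sqlite3 stands in for PostgreSQL so the sketch stays self-contained.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document layout, invented for this sketch.
SAMPLE = """<orders>
  <order id="1" customer="Acme">
    <item sku="A1" qty="2"/>
    <item sku="B7" qty="1"/>
  </order>
  <order id="2" customer="Globex">
    <item sku="A1" qty="5"/>
  </order>
</orders>"""


def xml_to_rows(xml_text):
    """Flatten the nested document into two row lists: orders and items."""
    root = ET.fromstring(xml_text)
    orders, items = [], []
    for order in root.findall("order"):
        order_id = int(order.get("id"))
        orders.append((order_id, order.get("customer")))
        for item in order.findall("item"):
            items.append((order_id, item.get("sku"), int(item.get("qty"))))
    return orders, items


def rows_to_xml(orders, items):
    """Rebuild the nested document from the two row lists."""
    root = ET.Element("orders")
    for order_id, customer in orders:
        node = ET.SubElement(root, "order", id=str(order_id), customer=customer)
        for item_order_id, sku, qty in items:
            if item_order_id == order_id:
                ET.SubElement(node, "item", sku=sku, qty=str(qty))
    return root


def load(orders, items):
    """Store the rows in an in-memory relational database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE items ("
        " order_id INTEGER NOT NULL REFERENCES orders(id),"
        " sku TEXT NOT NULL,"
        " qty INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)
    conn.executemany("INSERT INTO items VALUES (?, ?, ?)", items)
    return conn


def qty_by_sku(conn):
    """A query that is trivial once the data is relational."""
    return conn.execute(
        "SELECT sku, SUM(qty) FROM items GROUP BY sku ORDER BY sku"
    ).fetchall()
```

Calling `load(*xml_to_rows(SAMPLE))` and then `qty_by_sku` on the result aggregates quantities across all orders with one declarative query, while `rows_to_xml` regenerates the document for interchange. XML serves as the transport format and the relational tables serve as the store.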

XML databases, on the other hand, are an example of taking a good idea too 
far.  XML is a great data transmission tool; it's a great document 
transformation tool; it's a good way to store documents.   It is not, 
however, a good database.
------------------------------------------------------
-- 
Josh Berkus
Aglio Database Solutions
San Francisco

