Re: XML and postgres - Mailing list pgsql-hackers

From John Gray
Subject Re: XML and postgres
Date
Msg-id 1054506966.1456.25.camel@adzuki.azuli.co.uk
Whole thread Raw
In response to XML and postgres  (Oleg Bartunov <oleg@sai.msu.su>)
List pgsql-hackers
On Fri, 2003-05-30 at 17:06, Oleg Bartunov wrote:
> Hello,
> 
> Is there interest to storing and indexed access methods for xml in
> postgresql ? While I don't use xml in my applications but I see

Yes. But my GiST skills were never up to it!

> possible directions to develop contrib module with indexed access methods
> to xml-like data type. We have already contrib/ltree for tree-like
> structures and recently we developed (not released yet) hstore module,
> which implements hash data type like in perl with indexed AM to keys, values.
> Motivation for this modules is need to store data with weak structure
> (semi-structured data), i.e. we have several obligatory fields and a bunch
> of optional data. Obligatory fields could be stored as usual, while
> for optional columns we use special data type - hstore, which serves as
> a storage of (key,value) pairs. There are could be many (key,value) pairs and
> hstore provides AM to them. We've realized that combination of
> ltree, hstore could be used for xml.

This is a possible way to approach it, but because my interest has come
from the direction of doing XPath queries, I'm particularly interested
in how you can efficiently index XML documents for use with XPath. The
arrival of expressional indexes is good because you can now optimise

select article_xml from table where
xpath(article_xml,'/article/author[@surname]') = 'Smith';

which is great for single-valued expressions, but it's quite possible to
have a multi-valued result, for which we want a different operator:

select article_xml from table where
xpath(article_xml,'/article/author[@surname]') =# 'Smith';

which returns the case where one of the values is Smith. 

Now, this is the case that interests me for indexing - how feasible is
an index that records the fact that xpath(...) as above is multi-valued
and has (say) three entries, and records them in the appropriate places
in the index.

2. We could alternatively say "We want to optimise XPath queries" - and
produce an index 'tree' which enumerated possible paths. From a root
node, we would have an edge labelled 'descendant' to 'article' and so on
(the XPath spec describes this model of document paths in great detail. 

Obviously, not all queries are rooted to the top of the hierarchy, which
means that you might end up with an index with 'bypass' nodes - making
it not a binary tree (in other words, I'm not sure it could actually
work...). This is all speculative stuff from the perspective of someone
who tried to get to grips with GiST and struggled!

Obviously this kind of index would require some integration with XPath
query support - which at present is (in contrib/xml) just passed over to
libxml to do. In test cases it's proved to be sufficiently fast even
doing full parsing, and I suspect that the index technique above could
help.

Why Xpath? Well, it seems to be getting adopted more widely (it is also
part of XSLT) and it represents a fairly easy syntax for navigating
hierarchical XML documents.

> 
> We have no spare time to elaborate this,  so if someone could work on
> this, we could provide hstore module and help with developing.
> 
> 

Oh, I sympathise completely. I have been able to do a little more work
on contrib/xml for a company in my spare time (work which I hope we'll
be able to release back) but my day job has no XPath in it now! (I am
now a broadcast engineer, rather than an IT consultant...)

Regards

John 

-- 
John Gray <jgray@azuli.co.uk>




pgsql-hackers by date:

Previous
From: "Francisco Figueiredo Jr."
Date:
Subject: initdb problems handling super username with spaces.
Next
From: Rod Taylor
Date:
Subject: Re: Table-driven SHOW (was Re: Are we losing momentum?)