Re: Native XML - Mailing list pgsql-hackers

From Andrew Dunstan
Subject Re: Native XML
Date
Msg-id 4D6BCC1E.3010406@dunslane.net
Whole thread Raw
In response to Re: Native XML  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers

On 02/28/2011 10:51 AM, Tom Lane wrote:
> Andrew Dunstan<andrew@dunslane.net>  writes:
>> xpath_table is severely broken by design IMNSHO. We need a new design,
>> but I'm reluctant to work on that until someone does LATERAL, because a
>> replacement would be much nicer to design with it than without it.
> Well, maybe I'm missing something, but I don't really understand why
> xpath_table's design is so unreasonable.  Also, what would a better
> solution look like exactly?  (Feel free to assume LATERAL is available.)
>


What's unreasonable about it is that the supplied paths are independent 
of each other, and evaluated in the context of the entire XML document.

Let's take the given example in the docs, changed slightly to assume 
each piece of XML can have more than one article listing in it (i.e,. 
'article' is not the root node of the document):
   SELECT * FROM   xpath_table('article_id',                'article_xml',                'articles',
'//article/author|//article/pages|//article/title',               'date_entered>  ''2003-01-01'' ')   AS t(article_id
integer,author text, page_count integer, title text);
 

There is nothing that says that the author has to come from the same 
article as the title, nor is there any way of saying that they must. If 
an article node is missing author or pages or title, or has more than 
one where its siblings do not, they will line up wrongly.

An alternative would be to supply a single xpath expression that would 
specify the context nodes to be iterated over (in this case that would 
be '//article') and a set of xpath expressions to be evaluated in the 
context of those nodes (in this case 'article|pages|title' ort better 
yet, supply these as a text array). We'd produce exactly one row for 
each node found by the context expression, and take the first value 
found by each of the column expressions in that context (or we could 
error out if we found more than one, or supply an array if the result 
field is an array). So with LATERAL taking care of the rest, the 
function signature could be something like:
   xpath_table_new(        doc xml,        context_xpath text,        column_xpath text[])   returns setof record


Given this, you could not get a row with title and author from different 
article nodes in the source document like you can now.

cheers

andrew



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Native XML
Next
From: Tom Lane
Date:
Subject: EXPLAIN doesn't show sufficient info for wCTE cases