Re: Native XML - Mailing list pgsql-hackers

From Andrew Dunstan
Subject Re: Native XML
Msg-id 4D6D4D15.9060206@dunslane.net
In response to Re: Native XML  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Responses Re: Native XML
List pgsql-hackers

On 03/01/2011 02:15 PM, Kevin Grittner wrote:
>
>>> Given that there were similar issues for other hierarchical data
>>> types, perhaps we need something similar to tsvector, but for
>>> hierarchical data.  The extra layer of abstraction might not cost
>>> much when used for XML compared to the possible benefit with
>>> other data.  It seems likely to be a very nice fit with GiST
>>> indexes.
>>>
>>> So under this idea, you would always have the text (or maybe byte
>>> array?) version of the XML, and you could "shard" it to a
>>> separate column for fast searches.
>
>> Tsearch should be able to handle XML now. It certainly knows how
>> to recognize XML tags.
>
> I apparently didn't express myself very well, since you seem to have
> *completely* missed my point.  I know we can do tsearch2 searches
> against XML, or JSON, or YAML, or (insert next week's new favorite
> format here).  What we can't currently do efficiently is search for
> particular values in some particular place in the hierarchy of a
> document.  I've had loads of fun approximating it with regular
> expressions, but some days I'd like life to be easier.
>
> What I was arguing for is a new type which would represent the
> structure in a fashion which was independent of the particular text
> format and was efficient to traverse hierarchically.  Done right,
> that would map well to GiST.  Although, thinking about that some
> more, perhaps there would be a way to create a GiST index suitable
> for that straight from the XML text, and avoid the sharded column.
> A GiST index actually seems pretty close to what such a structure
> would look like anyway....
>


I probably didn't read your suggestion closely enough.


I think hierarchical data really only scratches the surface of the 
problem. It would be nice to be able to specify all sorts of context for 
searches:
   * foo after bar
   * foo near bar
   * foo and bar in the same paragraph
   * foo as a parent/child/ancestor/descendent/sibling/cousin of bar
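
To make the list above a bit more concrete, here is a rough sketch of what a few of those predicates might mean, written in plain Python against an in-memory document. Nothing here is an existing PostgreSQL or tsearch facility: the function names, the `distance` parameter, the `para` element name and the sample document are all made up for illustration. The positional tests treat foo and bar as words; the structural test treats them as element names.

```python
import xml.etree.ElementTree as ET

def positions(words, term):
    """Return the indexes at which term occurs in the word list."""
    return [i for i, w in enumerate(words) if w == term]

def after(words, a, b):
    """True if some occurrence of a comes later than some occurrence of b."""
    pa, pb = positions(words, a), positions(words, b)
    return bool(pa and pb) and max(pa) > min(pb)

def near(words, a, b, distance=3):
    """True if a and b occur within `distance` words of each other."""
    return any(abs(i - j) <= distance
               for i in positions(words, a)
               for j in positions(words, b))

def same_paragraph(root, a, b, para_tag="para"):
    """True if a and b both occur as words inside one para_tag element."""
    for para in root.iter(para_tag):
        ws = " ".join(para.itertext()).split()
        if a in ws and b in ws:
            return True
    return False

def is_descendant(root, child_tag, ancestor_tag):
    """True if a child_tag element sits anywhere below an ancestor_tag element."""
    for anc in root.iter(ancestor_tag):
        # iter() also yields the starting element itself, so exclude it
        if any(el is not anc for el in anc.iter(child_tag)):
            return True
    return False

# Hypothetical sample document, for illustration only.
doc = ET.fromstring(
    "<doc><para>bar baz foo</para><para>qux <bar><foo/></bar></para></doc>"
)
words = " ".join(doc.itertext()).split()

print(after(words, "foo", "bar"))
print(near(words, "foo", "bar", distance=1))
print(same_paragraph(doc, "foo", "bar"))
print(is_descendant(doc, "foo", "bar"))
```

Each of these scans the whole document, which is exactly the cost an index would be meant to avoid; the sketch only pins down the semantics of the queries, not how they could be answered efficiently.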
 


cheers

andrew

