On 03/01/2011 02:15 PM, Kevin Grittner wrote:
>
>>> Given that there were similar issues for other hierarchical data
>>> types, perhaps we need something similar to tsvector, but for
>>> hierarchical data. The extra layer of abstraction might not cost
>>> much when used for XML compared to the possible benefit with
>>> other data. It seems likely to be a very nice fit with GiST
>>> indexes.
>>>
>>> So under this idea, you would always have the text (or maybe byte
>>> array?) version of the XML, and you could "shard" it to a
>>> separate column for fast searches.
>
>> Tsearch should be able to handle XML now. It certainly knows how
>> to recognize XML tags.
>
> I apparently didn't express myself very well, since you seem to have
> *completely* missed my point. I know we can do tsearch2 searches
> against XML, or JSON, or YAML, or (insert next week's new favorite
> format here). What we can't currently do efficiently is search for
> particular values in some particular place in the hierarchy of a
> document. I've had loads of fun approximating it with regular
> expressions, but some days I'd like life to be easier.
>
> What I was arguing for is a new type which would represent the
> structure in a fashion which was independent of the particular text
> format and was efficient to traverse hierarchically. Done right,
> that would map well to GiST. Although, thinking about that some
> more, perhaps there would be a way to create a GiST index suitable
> for that straight from the XML text, and avoid the sharded column.
> A GiST index actually seems pretty close to what such a structure
> would look like anyway....
>
I probably didn't read your suggestion closely enough.
I think hierarchical data really only scratches the surface of the
problem. It would be nice to be able to specify all sorts of context for
searches:
  * foo after bar
  * foo near bar
  * foo and bar in the same paragraph
  * foo as a parent/child/ancestor/descendent/sibling/cousin of bar
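
To make a couple of those concrete, here is a rough Python sketch of what
"foo as an ancestor of bar" and "foo and bar in the same paragraph" might
mean for an XML document. This is only an illustration of the semantics,
not a proposal for how an index or a new type would implement them; the
function names (parent_map, is_ancestor, same_paragraph) and the "p"
paragraph tag are made up for the example.

```python
import xml.etree.ElementTree as ET


def parent_map(root):
    # ElementTree keeps no parent pointers, so build child -> parent once.
    return {child: parent for parent in root.iter() for child in parent}


def is_ancestor(root, anc_tag, desc_tag):
    """True if some <desc_tag> element has an <anc_tag> ancestor."""
    parents = parent_map(root)
    for node in root.iter(desc_tag):
        cur = parents.get(node)
        while cur is not None:
            if cur.tag == anc_tag:
                return True
            cur = parents.get(cur)
    return False


def same_paragraph(root, word_a, word_b, para_tag="p"):
    """True if both words occur in the text of a single <para_tag> element."""
    for para in root.iter(para_tag):
        # itertext() also picks up text nested in child elements.
        words = "".join(para.itertext()).split()
        if word_a in words and word_b in words:
            return True
    return False


# Hypothetical sample document, for illustration only.
doc = ET.fromstring(
    "<doc><foo><p>alpha <b>beta</b></p><bar/></foo><p>gamma</p></doc>"
)
```

With that sample document, is_ancestor(doc, "foo", "bar") holds while the
reverse does not, and "alpha" and "beta" share a paragraph while "alpha"
and "gamma" do not. Both functions walk the whole tree on every call,
which is exactly the cost an index would be there to avoid.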
cheers
andrew