Re: Native XML - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Native XML
Msg-id 24401.1299007456@sss.pgh.pa.us
In response to Re: Native XML  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
List pgsql-hackers
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> I apparently didn't express myself very well, since you seem to have
> *completely* missed my point.  I know we can do tsearch2 searches
> against XML, or JSON, or YAML, or (insert next week's new favorite
> format here).  What we can't currently do efficiently is search for
> particular values in some particular place in the hierarchy of a
> document.  I've had loads of fun approximating it with regular
> expressions, but some days I'd like life to be easier.

Check.

> What I was arguing for is a new type which would represent the
> structure in a fashion which was independent of the particular text
> format and was efficient to traverse hierarchically.  Done right,
> that would map well to GiST.  Although, thinking about that some
> more, perhaps there would be a way to create a GiST index suitable
> for that straight from the XML text, and avoid the sharded column. 
> A GiST index actually seems pretty close to what such a structure
> would look like anyway....

FWIW, GIN might be a more natural match, at least for the cases where
"place in the document" has a scalar value.  If you need to search for
"place" with something other than equality or prefix match semantics,
maybe not.
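
To make the "scalar place" case concrete, here is a minimal sketch in Python of the inverted-index idea. It is an illustration only, not how GIN is implemented and not an existing PostgreSQL API: each document is reduced to (path, value) keys, and each key maps to the set of documents containing it. The names path_keys, build_index, lookup_exact, lookup_path_prefix and the sample DOCS are all hypothetical. Equality lookup is a single key probe; the prefix lookup here is a linear scan for brevity, where a real index would keep its keys ordered.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample data: document id -> XML text.
DOCS = {
    1: "<case><party><name>Smith</name></party>"
       "<judge><name>Jones</name></judge></case>",
    2: "<case><party><name>Jones</name></party></case>",
}


def path_keys(xml_text):
    """Yield a (path, value) pair for every element with non-blank text."""
    root = ET.fromstring(xml_text)

    def walk(elem, prefix):
        path = prefix + "/" + elem.tag
        text = (elem.text or "").strip()
        if text:
            yield path, text
        for child in elem:
            yield from walk(child, path)

    yield from walk(root, "")


def build_index(docs):
    """Map each (path, value) key to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, xml_text in docs.items():
        for key in path_keys(xml_text):
            index[key].add(doc_id)
    return index


def lookup_exact(index, path, value):
    """Equality on the place: one probe of the inverted index."""
    return set(index.get((path, value), set()))


def lookup_path_prefix(index, path_prefix, value):
    """Prefix match on the place (linear scan here, for brevity only)."""
    hits = set()
    for (path, val), ids in index.items():
        if val == value and path.startswith(path_prefix):
            hits |= ids
    return hits
```

With the sample DOCS, lookup_exact(build_index(DOCS), "/case/party/name", "Jones") finds only document 2, while lookup_path_prefix with the prefix "/case/" finds both documents. Anything beyond equality or prefix semantics on the path does not reduce to key probes like this, which is the caveat above.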

But in any case I think your point is that this is an indexing problem,
and whether the full document in the table column is pre-parsed or not
isn't all that relevant for performance.  I agree.  tsearch2 is really a
precedent for your argument, not a distinct approach, because it doesn't
expect pre-parsed text columns either.
        regards, tom lane

