Home > mailing lists

Multi-entry indexes (with a view to XPath queries) - Mailing list pgsql-hackers

From	John Gray
Subject	Multi-entry indexes (with a view to XPath queries)
Date	June 25, 2001 15:41:38
Msg-id	io28h9.nk1.ln@adzuki Whole thread Raw
Responses	Re: Multi-entry indexes (with a view to XPath queries)
List	pgsql-hackers

Tree view

Firstly, I appreciate this may be a hare-brained scheme, but I've been
thinking about indexes in which the tuple pointer is not unique.

The reason for my interest is storing XML documents in text fields in the
database. (It could also help with particular kinds of full-text search?)

I would like to be able to construct indexes on a collection of XML
documents, based on the "value" of certain "fields" within the document.
(In jargon terms, producing an index whose key is the CDATA content of a
particular XML element). This could tie in with the Xpath and XQuery
proposals

A simplified example (from an archaeological site classification system):

<site><name>Glebe Farm, Long Itchington</name><location scheme="osgb">SU41793684</location><feature>
<type>Agricultural:StockControl</type>    <date scheme="code">med</date></feature><feature>
<type>Unassigned:Ditch</type>   <size type="depth" unit="m">1.5</size></feature>
 
</site> 

I'd like to produce an index on feature types so that I could type
(roughly):

SELECT siteid, xpath(doc,'//site/name'), xpath(doc,'//site/location') FROM
documents WHERE xpath(doc,'feature/type') = 'Agricultural: Stock Control';

[create table documents (integer siteid, text doc)]

Obviously I need to write a basic XML parser that can support such an
xpath function, but it would also be good to index by the results of that
function-i.e. to have an index containing feature type values. As each
document could have any number of these instances, the number of index
tuples would differ from the number of heap tuples.

As far as I can see, there is no particular reason why a btree index could
not be used[*]. However, vacuum.c makes assertions about number of index
tuples == number of heap tuples. I realise this is a useful consistency
check, but would it be possible to have a field in pg_index
(indnoidentity?) that indicates that a given index hasn't got a 1:1
index:heap relationship.

I have tried the approach of decomposing documents into cdata, element and
attribute tables, and I can use joins to extract a list of feature types
etc. (and could use triggers to update this) but the idea of not having to
parse a document to enter it into the database and not requiring
application logic to reconstruct it again seems a potential win for a
system which might store complex documents but usually searches on limited
criteria.

pgsql-hackers by date:

From: Bruce Momjian
Date: 25 June 2001, 15:01:42
Subject: Re: Re: [COMMITTERS] pgsql/src/bin/initdb initdb.sh

From: Hannu Krosing
Date: 25 June 2001, 16:30:57
Subject: Re: Multi-entry indexes (with a view to XPath queries)

Multi-entry indexes (with a view to XPath queries) - Mailing list pgsql-hackers

Previous

Next