Re: Multi-entry indexes (with a view to XPath queries) - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Multi-entry indexes (with a view to XPath queries)
Date
Msg-id 28692.993502132@sss.pgh.pa.us
In response to Multi-entry indexes (with a view to XPath queries)  ("John Gray" <jgray@beansindustry.co.uk>)
List pgsql-hackers
"John Gray" <jgray@beansindustry.co.uk> writes:
> Firstly, I appreciate this may be a hare-brained scheme, but I've been
> thinking about indexes in which the tuple pointer is not unique.

It sounds pretty hare-brained to me all right ;-).  What's wrong with
the normal approach of one index tuple per heap tuple, i.e., multiple
index tuples with the same key?  It seems to me that your idea will just
make index maintenance a lot more difficult.  For example, what happens
when one of the referenced rows is deleted?  We'd have to actually
change, not just remove, the index tuple, since it'd also be pointing at
undeleted rows.  That'll create a whole bunch of concurrency problems.
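
To make the maintenance difference concrete, here is a toy sketch in
Python.  It is only an illustration: the containers, keys and function
names are invented for the example and are not PostgreSQL's actual
index structures.  Layout A keeps one entry per heap row, so deleting a
row just drops entries.  Layout B keeps one entry per key pointing at
several rows, so deleting a row means rewriting an entry that other
rows still depend on.

```python
# Toy model only: plain Python containers standing in for index entries.

# Layout A: one index entry per heap row; equal keys are separate entries.
per_row_index = [("road", 1), ("road", 2), ("river", 3)]


def delete_row_per_row(index, row_id):
    """Drop the entries for row_id; entries for other rows are untouched."""
    return [(key, rid) for (key, rid) in index if rid != row_id]


# Layout B: one index entry per key, pointing at several heap rows.
multi_pointer_index = {"road": [1, 2], "river": [3]}


def delete_row_multi(index, row_id):
    """Rewrite every entry that mentions row_id; return how many were rewritten."""
    rewritten = 0
    for key in list(index):
        if row_id in index[key]:
            # The entry must be changed in place, because it still
            # points at rows that have not been deleted.
            index[key] = [rid for rid in index[key] if rid != row_id]
            rewritten += 1
            if not index[key]:
                del index[key]
    return rewritten


after_a = delete_row_per_row(per_row_index, 1)
rewritten_b = delete_row_multi(multi_pointer_index, 1)
```

In layout B the "road" entry is modified while it is still needed for
row 2, which is where concurrent readers and writers would get in each
other's way.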

> Obviously I need to write a basic XML parser that can support such an
> xpath function, but it would also be good to index by the results of that
> function, i.e. to have an index containing feature type values. As each
> document could have any number of these instances, the number of index
> tuples would differ from the number of heap tuples.

Why would you want multiple index entries for the same key (never mind
whether they are in a single index tuple or multiple tuples) pointing to
the same row?

Actually, after thinking a little more, I suspect the idea you are
really trying to describe here is index entries with finer-than-tuple
granularity.  This is not silly, but it is sufficiently outside the
normal domain of SQL that I think you are fighting an uphill battle.
You'd be *much* better off creating a table that has one row per
indexable entity, whatever that is.

> I have tried the approach of decomposing documents into cdata, element and
> attribute tables, and I can use joins to extract a list of feature types
> etc. (and could use triggers to update this) but the idea of not having to
> parse a document to enter it into the database

How do you expect that to happen, when you will have to parse it to get
the index terms?

You might be able to address your problem with two tables, one holding
the original documents and one with a row for each indexable entity
(document section).  The index on the field of interest would then be
built on this second table.
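
A minimal sketch of that two-table layout follows, written in Python
against an in-memory SQLite database purely so that it runs standalone;
it is not PostgreSQL code.  The table names, column names, the
<feature type="..."> document shape and the helper functions are all
assumptions made up for the example.  Note that the document is parsed
once, at insert time, to produce the rows of the second table.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Table 1: the original documents, stored whole.
    CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT NOT NULL);
    -- Table 2: one row per indexable entity (hypothetical: a feature type).
    CREATE TABLE doc_features (
        doc_id       INTEGER NOT NULL REFERENCES documents(id),
        feature_type TEXT NOT NULL
    );
    -- An ordinary index on the second table: one index entry per row.
    CREATE INDEX doc_features_type_idx ON doc_features (feature_type);
""")


def store_document(conn, body):
    """Store the document and one doc_features row per feature found in it."""
    cur = conn.execute("INSERT INTO documents (body) VALUES (?)", (body,))
    doc_id = cur.lastrowid
    root = ET.fromstring(body)  # the parse happens here, once per insert
    rows = [
        (doc_id, el.get("type"))
        for el in root.iter("feature")
        if el.get("type") is not None
    ]
    conn.executemany(
        "INSERT INTO doc_features (doc_id, feature_type) VALUES (?, ?)", rows
    )
    return doc_id


def documents_with_feature(conn, feature_type):
    """Return ids of documents containing at least one such feature."""
    cur = conn.execute(
        "SELECT DISTINCT doc_id FROM doc_features "
        "WHERE feature_type = ? ORDER BY doc_id",
        (feature_type,),
    )
    return [row[0] for row in cur]


first = store_document(
    conn, '<map><feature type="road"/><feature type="river"/></map>'
)
second = store_document(conn, '<map><feature type="road"/></map>')
road_docs = documents_with_feature(conn, "road")
river_docs = documents_with_feature(conn, "river")
```

In a real installation the extraction step could equally live in a
trigger on the documents table; the sketch keeps it in application code
only for brevity.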
        regards, tom lane

