GiST: Need ideas on how to minimise data in a GiST index - Mailing list pgsql-hackers

From Mark Cave-Ayland
Subject GiST: Need ideas on how to minimise data in a GiST index
Date
Msg-id 8F4A22E017460A458DB7BBAB65CA6AE502644E@webbased9
Whole thread Raw
Responses Re: [GENERAL] GiST: Need ideas on how to minimise data in a GiST index
List pgsql-hackers
Hi everyone,

I'm looking for someone with experience with GiST and PostgreSQL to
offer some advice on the following problem:

I'm currently looking at improving the R-Tree implementation of PostGIS
(http://postgis.refractions.net) by consolidating the indexable and
non-indexable operators so that they use the same functions. Each
geometry in the database is indexed by its bounding box as normal.
However, each database column also has a spatial reference system
identifier (SRID) that specifies the coordinate system the geometry is
in. What should happen is that given two geometries or bounding boxes,
an error should be generated whenever the SRIDs do not match.

The problem I am wrestling with at the moment is that the indexable
operators do not check the SRID between two geometries; this is because
a standard PostgreSQL BOX is used to specify the bounding boxes in the
index and so by extracting the key from the GISTENTRY structure, it is
impossible to determine whether the SRIDs match because the SRID is
stored in the geometry and not the index key.

The current line of thinking is to modify the index to use a structure
like below as the index key:


typedef struct BOX2D
{
    double LLB;        // Lower left bottom
    double URT;        // Upper right top
    long srid;        // Spatial ref sys identifier
}


However we know that for a given column (and index) the SRID will be the
same so it seems pointless to store the SRID in the index itself as it
is redundant information. The problem is therefore how is the best
method to store the SRID for a given column while keeping the index as
small as possible?

The only thing I can think of is to use the Relation, Page and Offset
pointers in the GISTENTRY structure to somehow return a reference to the
geometry and return the SRID as part of the Decompress() function for
leaf nodes. But this seems like a lot of extra work, even if it only has
to be done for leaf nodes, and I don't have a clue which routines I
would need to do this. Does anyone have a better idea as to how this
could be done, perhaps some way of storing/retrieving metadata about an
index through GiST?


Cheers,

Mark.


---

Mark Cave-Ayland
Webbased Ltd.
Tamar Science Park
Derriford
Plymouth
PL6 8BX
England

Tel: +44 (0)1752 764445
Fax: +44 (0)1752 764446


This email and any attachments are confidential to the intended
recipient and may also be privileged. If you are not the intended
recipient please delete it from your system and notify the sender. You
should not copy it or use it for any purpose nor disclose or distribute
its contents to any other person.


pgsql-hackers by date:

Previous
From: Karel Zak
Date:
Subject: Re: to_char(interval) --- done?
Next
From: Alvaro Herrera
Date:
Subject: Re: to_char(interval) --- done?