Thread: GiST: Need ideas on how to minimise data in a GiST index

GiST: Need ideas on how to minimise data in a GiST index

From

"Mark Cave-Ayland"

Date:

24 March 2003, 06:30:06

Hi everyone,

I'm looking for someone with experience with GiST and PostgreSQL to
offer some advice on the following problem:

I'm currently looking at improving the R-Tree implementation of PostGIS
(http://postgis.refractions.net) by consolidating the indexable and
non-indexable operators so that they use the same functions. Each
geometry in the database is indexed by its bounding box as normal.
However, each database column also has a spatial reference system
identifier (SRID) that specifies the coordinate system the geometry is
in. What should happen is that given two geometries or bounding boxes,
an error should be generated whenever the SRIDs do not match.

The problem I am wrestling with at the moment is that the indexable
operators do not check the SRID between two geometries; this is because
a standard PostgreSQL BOX is used to specify the bounding boxes in the
index and so by extracting the key from the GISTENTRY structure, it is
impossible to determine whether the SRIDs match because the SRID is
stored in the geometry and not the index key.

The current line of thinking is to modify the index to use a structure
like below as the index key:


typedef struct BOX2D
{
    double LLB;        // Lower left bottom
    double URT;        // Upper right top
    long srid;        // Spatial ref sys identifier
}


However we know that for a given column (and index) the SRID will be the
same so it seems pointless to store the SRID in the index itself as it
is redundant information. The problem is therefore how is the best
method to store the SRID for a given column while keeping the index as
small as possible?

The only thing I can think of is to use the Relation, Page and Offset
pointers in the GISTENTRY structure to somehow return a reference to the
geometry and return the SRID as part of the Decompress() function for
leaf nodes. But this seems like a lot of extra work, even if it only has
to be done for leaf nodes, and I don't have a clue which routines I
would need to do this. Does anyone have a better idea as to how this
could be done, perhaps some way of storing/retrieving metadata about an
index through GiST?


Cheers,

Mark.


---

Mark Cave-Ayland
Webbased Ltd.
Tamar Science Park
Derriford
Plymouth
PL6 8BX
England

Tel: +44 (0)1752 764445
Fax: +44 (0)1752 764446


This email and any attachments are confidential to the intended
recipient and may also be privileged. If you are not the intended
recipient please delete it from your system and notify the sender. You
should not copy it or use it for any purpose nor disclose or distribute
its contents to any other person.

Re: GiST: Need ideas on how to minimise data in a GiST index

From

Tom Lane

Date:

24 March 2003, 10:34:15

"Mark Cave-Ayland" <m.cave-ayland@webbased.co.uk> writes:
> However, each database column also has a spatial reference system
> identifier (SRID) that specifies the coordinate system the geometry is
> in. What should happen is that given two geometries or bounding boxes,
> an error should be generated whenever the SRIDs do not match.

Offhand I'm not sure why you think the index should solve this.  It
seems more like grist for a CHECK constraint applied to the table.

One way to attack it without bloating the index is to consider the index
as lossy, meaning that the actual geometric operator must be reapplied
on each row identified by the index.  Then the operator can compare
SRIDs, but you need not store SRIDs in the index.

            regards, tom lane

Re: GiST: Need ideas on how to minimise data in a GiST index

From

"Mark Cave-Ayland"

Date:

24 March 2003, 11:17:27

Hi Tom,

> "Mark Cave-Ayland" <m.cave-ayland@webbased.co.uk> writes:
> > However, each database column also has a spatial reference system
> > identifier (SRID) that specifies the coordinate system the
> geometry is
> > in. What should happen is that given two geometries or
> bounding boxes,
> > an error should be generated whenever the SRIDs do not match.
>
> Offhand I'm not sure why you think the index should solve
> this.  It seems more like grist for a CHECK constraint
> applied to the table.

Yes, there is already a check constraint on the column that quite nicely
handles inserts/deletions etc. The problem occurs when someone tries to
query an area in a form like:

select * from geomtable where geom && BOX2D('1.0000 1.000, 2.000 2.000',
-2)

(where say -2 is the SRID and geom has a SRID of -1). In this case the
BOX2D (which is the area you are trying to query) gets cast into a BOX
based on its bounding box to perform the query with no way of
recognising the SRIDs are different :(

> One way to attack it without bloating the index is to
> consider the index as lossy, meaning that the actual
> geometric operator must be reapplied on each row identified
> by the index.  Then the operator can compare SRIDs, but you
> need not store SRIDs in the index.

This sounds exactly as if it would solve the problem. Have you got any
pointers to documentation on this, and more specifically on the
PostgreSQL implementation of GiST? (I'm guessing this is a GiST only
extension). Has it been available since PostgreSQL 7.1 aswell?

Cheers,

Mark.

---

Mark Cave-Ayland
Webbased Ltd.
Tamar Science Park
Derriford
Plymouth
PL6 8BX
England

Tel: +44 (0)1752 764445
Fax: +44 (0)1752 764446

This email and any attachments are confidential to the intended
recipient and may also be privileged. If you are not the intended
recipient please delete it from your system and notify the sender. You
should not copy it or use it for any purpose nor disclose or distribute
its contents to any other person.

Re: GiST: Need ideas on how to minimise data in a GiST index

From

Tom Lane

Date:

24 March 2003, 11:27:00

"Mark Cave-Ayland" <m.cave-ayland@webbased.co.uk> writes:
> This sounds exactly as if it would solve the problem. Have you got any
> pointers to documentation on this, and more specifically on the
> PostgreSQL implementation of GiST? (I'm guessing this is a GiST only
> extension). Has it been available since PostgreSQL 7.1 aswell?

No, lossiness is a generic index property.  There's been some level of
support for it since Berkeley days, although we improved the flexibility
in 7.2 --- we now think of lossiness as a property associated with
specific operators rather than a whole index.  That would probably not
matter for this particular situation, since you'd be wanting to mark all
the operators lossy anyway.

You can read about this stuff in the PG Programmer's Guide,
"Interfacing Extensions To Indexes".  I'd start with the 7.3 version
and then work backwards.

            regards, tom lane