Thread: GiST: Need ideas on how to minimise data in a GiST index
Hi everyone, I'm looking for someone with experience with GiST and PostgreSQL to offer some advice on the following problem: I'm currently looking at improving the R-Tree implementation of PostGIS (http://postgis.refractions.net) by consolidating the indexable and non-indexable operators so that they use the same functions. Each geometry in the database is indexed by its bounding box as normal. However, each database column also has a spatial reference system identifier (SRID) that specifies the coordinate system the geometry is in. What should happen is that given two geometries or bounding boxes, an error should be generated whenever the SRIDs do not match. The problem I am wrestling with at the moment is that the indexable operators do not check the SRID between two geometries; this is because a standard PostgreSQL BOX is used to specify the bounding boxes in the index and so by extracting the key from the GISTENTRY structure, it is impossible to determine whether the SRIDs match because the SRID is stored in the geometry and not the index key. The current line of thinking is to modify the index to use a structure like below as the index key: typedef struct BOX2D { double LLB; // Lower left bottom double URT; // Upper right top long srid; // Spatial ref sys identifier } However we know that for a given column (and index) the SRID will be the same so it seems pointless to store the SRID in the index itself as it is redundant information. The problem is therefore how is the best method to store the SRID for a given column while keeping the index as small as possible? The only thing I can think of is to use the Relation, Page and Offset pointers in the GISTENTRY structure to somehow return a reference to the geometry and return the SRID as part of the Decompress() function for leaf nodes. But this seems like a lot of extra work, even if it only has to be done for leaf nodes, and I don't have a clue which routines I would need to do this. Does anyone have a better idea as to how this could be done, perhaps some way of storing/retrieving metadata about an index through GiST? Cheers, Mark. --- Mark Cave-Ayland Webbased Ltd. Tamar Science Park Derriford Plymouth PL6 8BX England Tel: +44 (0)1752 764445 Fax: +44 (0)1752 764446 This email and any attachments are confidential to the intended recipient and may also be privileged. If you are not the intended recipient please delete it from your system and notify the sender. You should not copy it or use it for any purpose nor disclose or distribute its contents to any other person.
"Mark Cave-Ayland" <m.cave-ayland@webbased.co.uk> writes: > However, each database column also has a spatial reference system > identifier (SRID) that specifies the coordinate system the geometry is > in. What should happen is that given two geometries or bounding boxes, > an error should be generated whenever the SRIDs do not match. Offhand I'm not sure why you think the index should solve this. It seems more like grist for a CHECK constraint applied to the table. One way to attack it without bloating the index is to consider the index as lossy, meaning that the actual geometric operator must be reapplied on each row identified by the index. Then the operator can compare SRIDs, but you need not store SRIDs in the index. regards, tom lane
Hi Tom, > "Mark Cave-Ayland" <m.cave-ayland@webbased.co.uk> writes: > > However, each database column also has a spatial reference system > > identifier (SRID) that specifies the coordinate system the > geometry is > > in. What should happen is that given two geometries or > bounding boxes, > > an error should be generated whenever the SRIDs do not match. > > Offhand I'm not sure why you think the index should solve > this. It seems more like grist for a CHECK constraint > applied to the table. Yes, there is already a check constraint on the column that quite nicely handles inserts/deletions etc. The problem occurs when someone tries to query an area in a form like: select * from geomtable where geom && BOX2D('1.0000 1.000, 2.000 2.000', -2) (where say -2 is the SRID and geom has a SRID of -1). In this case the BOX2D (which is the area you are trying to query) gets cast into a BOX based on its bounding box to perform the query with no way of recognising the SRIDs are different :( > One way to attack it without bloating the index is to > consider the index as lossy, meaning that the actual > geometric operator must be reapplied on each row identified > by the index. Then the operator can compare SRIDs, but you > need not store SRIDs in the index. This sounds exactly as if it would solve the problem. Have you got any pointers to documentation on this, and more specifically on the PostgreSQL implementation of GiST? (I'm guessing this is a GiST only extension). Has it been available since PostgreSQL 7.1 aswell? Cheers, Mark. --- Mark Cave-Ayland Webbased Ltd. Tamar Science Park Derriford Plymouth PL6 8BX England Tel: +44 (0)1752 764445 Fax: +44 (0)1752 764446 This email and any attachments are confidential to the intended recipient and may also be privileged. If you are not the intended recipient please delete it from your system and notify the sender. You should not copy it or use it for any purpose nor disclose or distribute its contents to any other person.
"Mark Cave-Ayland" <m.cave-ayland@webbased.co.uk> writes: > This sounds exactly as if it would solve the problem. Have you got any > pointers to documentation on this, and more specifically on the > PostgreSQL implementation of GiST? (I'm guessing this is a GiST only > extension). Has it been available since PostgreSQL 7.1 aswell? No, lossiness is a generic index property. There's been some level of support for it since Berkeley days, although we improved the flexibility in 7.2 --- we now think of lossiness as a property associated with specific operators rather than a whole index. That would probably not matter for this particular situation, since you'd be wanting to mark all the operators lossy anyway. You can read about this stuff in the PG Programmer's Guide, "Interfacing Extensions To Indexes". I'd start with the 7.3 version and then work backwards. regards, tom lane