gist index build produces corrupt result on first access to table. - Mailing list pgsql-general

From Eric Davies
Subject gist index build produces corrupt result on first access to table.
Date
Msg-id 5.2.1.1.0.20040303092437.025db698@marilyn.barrodale.com
Whole thread Raw
Responses Re: gist index build produces corrupt result on first access to table.
List pgsql-general
We've implemented a 5D box data type and have implemented both RTree and GiST access methods under PostgresSQL 7.4 and PostgresSQL 7.4.1. The 5D box internally looks like:
   struct Box5D{
        float minBounds[5];
          float maxBounds[5];
    };
and so takes up 40 bytes and is of fixed length. The GiST access methods are basically a generalization of the 2D box indexing access methods from Refraction Research's PostGIS.

We've tested this by building an index on a table that contains 30000 rows (no nulls) representing small non-overlapping boxes on a 2D plane (ie, all the values for indices 0 and 1 vary; the values for indices 2,3, and 4 are constant). Then we try a select on the table with an overlaps test in the where clause that should result in 66 rows being returned.
We've used an "explain select ..... "  to verify that the index is used during the search.

An RTree index built on the table works fine, always. The ability to build a GiST index is less reliable; sometimes the index is built properly and subsequent searchs on it work perfectly. Sometimes the index is built incorrectly, and subsequent searchs on it fail. We have examined the keys passed to our GiST consistent access method during the select and have observed that when the index hasn't been built properly, the first key passed to the consistent method is gibberish (strange values for indices 0 and 1, strange nonzero values for indices 2,3 and 4). When the index has been built properly, the first key passed to the consistent method has reasonable looking values.

We've been able to isolate this behavior somewhat: the index build only seems to produce corrupt results if the data connection that sent the command did not perform any preceding statements that would have caused the boxes to have been read from disk. For example, the sequence:
    <open connection to db>
    create index box_index on box_table using gist(box_column);

would produce a corrupt index.
However, the sequence

    <open connection to db>
    select count(*) from box_table where box_column && '0 0 0 0 0 1 1 1 1 1';
    create index box_index on box_table using gist(box_column);

produces a valid index,
as does the sequence:

    <open connection to db>
    create index box_index on box_table using gist(box_column);
    drop index box_index;
    create index box_index on box_table using gist(box_column);

In short, it appears that an operation that forces the boxes to read from a table
needs to be done to "prime the index building pump".

We've tried installing PostGIS and it does not seem to be affected by any similar problems.
The notable differences between our code and Refraction's are:
  • our input geometries are fixed length, rather than variable length
  • our compress method just returns the GISTENTRY * it was passed as an argument. We've tried returning a deep copy of the GISENTRY *, but that made no difference.
We've looked at the contrib/cube code, but it's also a variable length parameter, and it uses a very different parameter passing mechanism.


In the short term, we can work around the issue, but it would be nice to be able to fix the problem. Any ideas are welcome.

Thank you.
Eric.

**********************************************
Eric Davies, M.Sc.
Barrodale Computing Services Ltd.
Tel: (250) 472-4372 Fax: (250) 472-4373
Web: http://www.barrodale.com
Email: eric@barrodale.com
**********************************************
Mailing Address:
P.O. Box 3075 STN CSC
Victoria BC Canada V8W 3W2

Shipping Address:
Hut R, McKenzie Avenue
University of Victoria
Victoria BC Canada V8W 3W2
**********************************************


pgsql-general by date:

Previous
From: Paulovič Michal
Date:
Subject: Re: Moving from MySQL to PGSQL....some questions (multilevel
Next
From: Bruce Momjian
Date:
Subject: Re: libpq API for PQcmdTuples()