Thread: GiST index on data types that require compression
I'm trying to get my geometric type to spatially index. I tried RTrees, but they dont like objects that are bigger than 8k. I'm now trying to get a GiST index to index based on the bounding box thats contained inside the geometry. So the index is on a GEOMETRY type, but the index is only acting on the GEOMETRY->bvolume (which is a BOX3D). So far, it doesnt work. Only one of my GiST support functions is called (the compress function), after that I get the error message: # create index qq on tp3 using gist (the_geom gist_geometry_ops) with (islossy); ERROR: index_formtuple: data takes 8424504 bytes, max is 8191 I simplified the all the geometry in the test (tp3) table so they contain only one point - each object is only a few hundred bytes, and there's only 100 rows. I'm obviously doing something very wrong. My compress function looks like: GISTENTRY *ggeometry_compress(GISTENTRY *entry) { BOX3D *tmp; GISTENTRY *retval; if (entry->leafkey) { tmp = (BOX3D *) palloc(sizeof(BOX3D)); memcpy((char *) tmp, (char *) &(((GEOMETRY *)(entry->pred))->bvol), sizeof(BOX3D)); retval = palloc(sizeof(GISTENTRY)); gistentryinit(*retval, (char *)tmp, entry->rel, entry->page, entry->offset, sizeof(BOX3D),FALSE); return(retval); } else return(entry); } On its first (and only) call, the geometry ("entry->pred") really is the first row in the tp3 table. Does anyone have any ideas where to start tracking this problem down? Am I writing code for a very old version of GiST? I've tried to find other examples of GiST using compression, but none of them work. "contrib/intarray" in the standard distribution just spins (cpu 100%) when you try to build an index, and "http://s2k-ftp.cs.berkeley.edu:8000/gist/pggist/" has an example using the standard built-in polygon type (I based my code on it) - but its really really old and I spent a few hours trying to get it to compile, then gave up. Any ideas or examples? dave ps. I'm using postgresql 7.1.1 with the gist.c 7.1 patch. I get the exact result with out-of-the-box-7.1.1. pps. My code is available at ftp://ftp.refractions.net/pub/refractions/postgis.c ftp://ftp.refractions.net/pub/refractions/postgis.h andthe sql definitions are at ftp://ftp.refractions.net/pub/refractions/def.sql and a dump of the tp3 table is at ftp://ftp.refractions.net/pub/refractions/tp3.sql
Dave Blasby <dblasby@refractions.net> writes: > So far, it doesnt work. Only one of my GiST support functions is called > (the compress function), after that I get the error message: > # create index qq on tp3 using gist (the_geom gist_geometry_ops) with > (islossy); > ERROR: index_formtuple: data takes 8424504 bytes, max is 8191 It looks like the GIST code expects your compress function to give back a varlena datatype, not the fixed-length type you are actually handing back. The ridiculous length comes from interpreting the first word of your BOX3D as a length. There are/were provisions in the GIST code for having the compress function emit a different datatype than it takes in, but I think they are incomplete or broken. Might be easiest to produce a varlena result for now. regards, tom lane
On Fri, 25 May 2001, Tom Lane wrote: > Dave Blasby <dblasby@refractions.net> writes: > > So far, it doesnt work. Only one of my GiST support functions is called > > (the compress function), after that I get the error message: > > # create index qq on tp3 using gist (the_geom gist_geometry_ops) with > > (islossy); > > ERROR: index_formtuple: data takes 8424504 bytes, max is 8191 > > It looks like the GIST code expects your compress function to give back > a varlena datatype, not the fixed-length type you are actually handing > back. The ridiculous length comes from interpreting the first word > of your BOX3D as a length. > > There are/were provisions in the GIST code for having the compress > function emit a different datatype than it takes in, but I think they > are incomplete or broken. Might be easiest to produce a varlena result > for now. compress fully supports fixed-length and varlena types. The problem is index_formtuple - types of key and column could be different (example - polygon, where column has varlena type but key is fixed-length) As a workaround one could use the samy type for key and column. 1st integer field in strcuture BOX3D should be length of this structure in bytes. Tom, do you have an idea how to fix this problem ? Oleg > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster > Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
Oleg Bartunov <oleg@sai.msu.su> writes: > compress fully supports fixed-length and varlena types. The problem is > index_formtuple - types of key and column could be different > (example - polygon, where column has varlena type but key is fixed-length) Right. There used to be a horrible kluge whereby the user could specify the type to be assumed for the key in the CREATE INDEX command (the "haskeytype" stuff is the remaining traces of this). This is brain dead of course ... the correct way is to look at the pg_proc definition of the compress function and see what type it's declared to return, not rely on the user to get it right. What I find just about as objectionable as the old haskeytype hack is that the user has to tell you whether the index is lossy or not. This should be a property available from the system catalogs. Not sure where to put it; do we need another column in pg_opclass, or is someplace other than the opclass needed? regards, tom lane
> What I find just about as objectionable as the old haskeytype hack is > that the user has to tell you whether the index is lossy or not. This > should be a property available from the system catalogs. Not sure where > to put it; do we need another column in pg_opclass, or is someplace > other than the opclass needed? > So, may by add to pg_opclass two fields? bool is_varlena_key bool is_lossy_compress Then index_formtuple must look at is_varlena_key and 'with (islossy)' could be determined automatically by view at used ops. -- Teodor Sigaev teodor@stack.net
> So, may by add to pg_opclass two fields?> bool is_varlena_key> bool is_lossy_compress Sorry, I was wrong. In GiST, index_formtuple doesn't know about size of fixed-length type of keys, because only loadable module has information about structure of key. So, may be it needs to have function which return size of key or index_formtuple must looks at GISTENTRY.bytes( Note: A lot of currrent implementation of GiST modules don't set value GISTENTRY.bytes ). Or fields in pg_opclass: int len_key bool is_lossy_compress if len_key==-1 then key is varlena type. -- Teodor Sigaev teodor@stack.net
I took your (Tom, Oleg, and Teodor's) advice and changed my GiST code so it compresses a normal GEOMETRY into a BOX3D-only-GEOMETRY by striping out the actual geometric information. This way, everything is consistent. I now appear to be able to create and use a GiST index (well, it works on my 1 test case ;^) ). Unfortunately, when I issue the CREATE INDEX command, it takes a really long time. The system goes to about 80% iowait (according to top), or mostly idle. There's lots of memory free. Any ideas why? dave
Teodor Sigaev <teodor@stack.net> writes: > So, may by add to pg_opclass two fields? > bool is_varlena_key > bool is_lossy_compress Certainly not 'is_varlena_key', since that's not all the info you need about the key datatype --- a type OID would be more appropriate. But it seems to me that we should be able to learn the key type OID by examining the signature of the compression function. The real question is whether the opclass is the right place for this info. After thinking some more, I'm inclined to think not, since the opclass isn't tied to a single index type. For example, poly_ops might be lossy for GIST but not for another index type such as rtree. It occurs to me that the structure of pg_opclass/pg_amop/pg_amproc might be wrong. Perhaps pg_opclass should be indexed by (AM OID, opclass name) not just opclass name, and then we could remove the amopid columns from pg_amop and pg_amproc, since the OID of a pg_opclass row would be sufficient to identify the access method. This would allow us to put access-method-specific information into pg_opclass. It would also be clearer whether a given AM supports a given opclass name or not (right now, one has to see whether there are matching entries in the other tables, which is pretty iffy considering that one doesn't know how many there should be). regards, tom lane
I'm trying to keep a variable around for the duration of a transaction. Unfortunately, the "SET LOCAL" command doesnt allow me to create my own variable. Also, the "CREATE TEMP TABLE ... ON COMMIT DELETE ROWS" isnt yet implemented. But, I believe I can implement it using a TEMP table like this: CREATE TEMP TABLE my_variable (transId xid, value in); INSERT INTO my_variable ( getTransactionID(), 25); And I can read from the table with: SELECT value FROM my_variable WHERE transId = getTransactionID(); The question is, how to write the getTransactionID() function. I'm comfortable writing "C" extensions to postgresql, but I'm not sure where to actually get the current transaction id. Could someone give me pointers to where I can find this magic value? dave