Thread: GiST index on data types that require compression

GiST index on data types that require compression

From
Dave Blasby
Date:
I'm trying to get my geometric type to spatially index.  I tried RTrees,
but they dont like objects that are bigger than 8k.

I'm now trying to get a GiST index to index based on the bounding box
thats contained inside the geometry.  So the index is on a GEOMETRY
type, but the index is only acting on the GEOMETRY->bvolume (which is a
BOX3D).

So far, it doesnt work.  Only one of my GiST support functions is called
(the compress function), after that I get the error message:
# create index qq on tp3 using gist (the_geom gist_geometry_ops) with
(islossy);
ERROR:  index_formtuple: data takes 8424504 bytes, max is 8191

I simplified the all the geometry in the test (tp3) table so they
contain only one point - each object is only a few hundred bytes, and
there's only 100 rows.

I'm obviously doing something very wrong.

My compress function looks like:

GISTENTRY *ggeometry_compress(GISTENTRY *entry)
{   BOX3D *tmp;   GISTENTRY *retval;
   if (entry->leafkey)    {    tmp = (BOX3D *) palloc(sizeof(BOX3D));    memcpy((char *) tmp, (char *) &(((GEOMETRY
*)(entry->pred))->bvol),
sizeof(BOX3D));
    retval = palloc(sizeof(GISTENTRY));    gistentryinit(*retval, (char *)tmp, entry->rel, entry->page,
   entry->offset, sizeof(BOX3D),FALSE);    return(retval);   }   else return(entry);
 
}


On its first (and only) call, the geometry ("entry->pred") really is the
first row in the tp3 table.

Does anyone have any ideas where to start tracking this problem down? 
Am I writing code for a very old version of GiST?

I've tried to find other examples of GiST using compression, but none of
them work.  "contrib/intarray" in the standard distribution just spins
(cpu 100%) when you try to build an index, and
"http://s2k-ftp.cs.berkeley.edu:8000/gist/pggist/" has an example using
the standard built-in polygon type (I based my code on it) - but its
really really old and I spent a few hours trying to get it to compile,
then gave up.


Any ideas or examples?

dave
ps. I'm using postgresql 7.1.1 with the gist.c 7.1 patch.  I get the
exact result with out-of-the-box-7.1.1.
pps. My code is available at
ftp://ftp.refractions.net/pub/refractions/postgis.c             ftp://ftp.refractions.net/pub/refractions/postgis.h
andthe sql definitions are at
 
ftp://ftp.refractions.net/pub/refractions/def.sql    and a dump of the tp3 table is at
ftp://ftp.refractions.net/pub/refractions/tp3.sql


Re: GiST index on data types that require compression

From
Tom Lane
Date:
Dave Blasby <dblasby@refractions.net> writes:
> So far, it doesnt work.  Only one of my GiST support functions is called
> (the compress function), after that I get the error message:
>  # create index qq on tp3 using gist (the_geom gist_geometry_ops) with
> (islossy);
> ERROR:  index_formtuple: data takes 8424504 bytes, max is 8191

It looks like the GIST code expects your compress function to give back
a varlena datatype, not the fixed-length type you are actually handing
back.  The ridiculous length comes from interpreting the first word
of your BOX3D as a length.

There are/were provisions in the GIST code for having the compress
function emit a different datatype than it takes in, but I think they
are incomplete or broken.  Might be easiest to produce a varlena result
for now.
        regards, tom lane


Re: GiST index on data types that require compression

From
Oleg Bartunov
Date:
On Fri, 25 May 2001, Tom Lane wrote:

> Dave Blasby <dblasby@refractions.net> writes:
> > So far, it doesnt work.  Only one of my GiST support functions is called
> > (the compress function), after that I get the error message:
> >  # create index qq on tp3 using gist (the_geom gist_geometry_ops) with
> > (islossy);
> > ERROR:  index_formtuple: data takes 8424504 bytes, max is 8191
>
> It looks like the GIST code expects your compress function to give back
> a varlena datatype, not the fixed-length type you are actually handing
> back.  The ridiculous length comes from interpreting the first word
> of your BOX3D as a length.
>
> There are/were provisions in the GIST code for having the compress
> function emit a different datatype than it takes in, but I think they
> are incomplete or broken.  Might be easiest to produce a varlena result
> for now.

compress fully supports fixed-length and varlena types. The problem is
index_formtuple - types of key and column could be different
(example - polygon, where column has varlena type but key is fixed-length)
As a workaround one could use the samy type for key and column.
1st integer field in strcuture BOX3D should be length of this structure
in bytes.

Tom, do you have an idea how to fix this problem ?
Oleg
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>
Regards,    Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83



Re: GiST index on data types that require compression

From
Tom Lane
Date:
Oleg Bartunov <oleg@sai.msu.su> writes:
> compress fully supports fixed-length and varlena types. The problem is
> index_formtuple - types of key and column could be different
> (example - polygon, where column has varlena type but key is fixed-length)

Right.  There used to be a horrible kluge whereby the user could specify
the type to be assumed for the key in the CREATE INDEX command (the
"haskeytype" stuff is the remaining traces of this).  This is brain dead
of course ... the correct way is to look at the pg_proc definition of
the compress function and see what type it's declared to return, not
rely on the user to get it right.

What I find just about as objectionable as the old haskeytype hack is
that the user has to tell you whether the index is lossy or not.  This
should be a property available from the system catalogs.  Not sure where
to put it; do we need another column in pg_opclass, or is someplace
other than the opclass needed?
        regards, tom lane


Re: GiST index on data types that require compression

From
Teodor Sigaev
Date:
> What I find just about as objectionable as the old haskeytype hack is
> that the user has to tell you whether the index is lossy or not.  This
> should be a property available from the system catalogs.  Not sure where
> to put it; do we need another column in pg_opclass, or is someplace
> other than the opclass needed?
> 

So, may by add to pg_opclass two fields?
bool is_varlena_key
bool is_lossy_compress

Then index_formtuple must look at is_varlena_key and 'with (islossy)' 
could be determined automatically by view at used ops.



-- 
Teodor Sigaev
teodor@stack.net




Re: GiST index on data types that require compression

From
Teodor Sigaev
Date:
 > So, may by add to pg_opclass two fields?> bool is_varlena_key> bool is_lossy_compress


Sorry, I was wrong. In GiST, index_formtuple doesn't know about size of 
fixed-length type of keys, because only loadable module has information 
about structure of key. So, may be it needs to have function which 
return size of key or index_formtuple must looks at GISTENTRY.bytes( 
Note: A lot of currrent implementation of GiST modules don't set value 
GISTENTRY.bytes ). Or fields in pg_opclass:
int len_key
bool is_lossy_compress

if len_key==-1 then key is varlena type.

-- 
Teodor Sigaev
teodor@stack.net




Re: GiST index on data types that require compression

From
Dave Blasby
Date:
I took your (Tom, Oleg, and Teodor's) advice and changed my GiST code so
it compresses a normal GEOMETRY into a BOX3D-only-GEOMETRY by striping
out the actual geometric information.  This way, everything is
consistent.

I now appear to be able to create and use a GiST index (well, it works
on my 1 test case ;^) ).

Unfortunately, when I issue the CREATE INDEX command, it takes a really
long time.  The system goes to about 80% iowait (according to top), or
mostly idle.  There's lots of memory free.   Any ideas why?

dave


Re: GiST index on data types that require compression

From
Tom Lane
Date:
Teodor Sigaev <teodor@stack.net> writes:
> So, may by add to pg_opclass two fields?
> bool is_varlena_key
> bool is_lossy_compress

Certainly not 'is_varlena_key', since that's not all the info you
need about the key datatype --- a type OID would be more appropriate.
But it seems to me that we should be able to learn the key type OID
by examining the signature of the compression function.

The real question is whether the opclass is the right place for this
info.  After thinking some more, I'm inclined to think not, since the
opclass isn't tied to a single index type.  For example, poly_ops
might be lossy for GIST but not for another index type such as rtree.

It occurs to me that the structure of pg_opclass/pg_amop/pg_amproc
might be wrong.  Perhaps pg_opclass should be indexed by (AM OID,
opclass name) not just opclass name, and then we could remove the
amopid columns from pg_amop and pg_amproc, since the OID of a
pg_opclass row would be sufficient to identify the access method.
This would allow us to put access-method-specific information into
pg_opclass.  It would also be clearer whether a given AM supports
a given opclass name or not (right now, one has to see whether there
are matching entries in the other tables, which is pretty iffy
considering that one doesn't know how many there should be).
        regards, tom lane


Getting the current transaction's xid

From
David Blasby
Date:
I'm trying to keep a variable around for the duration of a transaction.

Unfortunately, the "SET LOCAL" command doesnt allow me to create my own 
variable.  Also, the "CREATE TEMP TABLE ... ON COMMIT DELETE ROWS" isnt 
yet implemented.

But, I believe I can implement it using a TEMP table like this:

CREATE TEMP TABLE my_variable (transId  xid, value in);

INSERT INTO my_variable ( getTransactionID(), 25);

And I can read from the table with:

SELECT value FROM my_variable WHERE transId = getTransactionID();

The question is, how to write the getTransactionID() function.

I'm comfortable writing "C" extensions to postgresql, but I'm not sure 
where to actually get the current transaction id.  Could someone give me 
pointers to where I can find this magic value?

dave