Thread: GiST consistent function, expected arguments; multi-dimensional indexes

In GiST, for each new data type we support we're expected to provide
(among other things) a function to determine whether a query is
consistent with a particular index entry (given an operator/
strategy).  I haven't been able to figure out when the query value
being passed (arg 1 in the <datatype>_consistent function) is the
actual value (say the value "6"), or a pointer to the value.  In the
btree_gist contrib examples, I see both cases (int4, a value; text, a
pointer).  Does anyone know how to tell when the GiST consistent
function should expect a pointer or a value? (Apparently this is set up
in the scan key in a generic fashion before control reaches the
index-specific implementation, but I can't tell exactly how.)  Does this depend
on the storage clause in CREATE TYPE?

Also, a broader question.  GiST is set up to evaluate each column in
the index in the order it was specified (e.g. if the index has colX,
colY, then colY is checked only if colX satisfies the search; otherwise
colY is ignored), as in traditional btree indexes.  I would like to
set up a multi-dimensional index that doesn't require all data to be
contained in a single column (like an rtree index where the X and Y
coordinates would be in different columns rather than a single
composite column).  I have taken the approach of hacking some of the
GiST code to do this (basically, my code looks at each itup on a page
and evaluates on multiple columns at once, depending on strategy and
according to my algorithm).  This bypasses some of the existing GiST
code but takes advantage of a lot of it (recovery features, etc.), but
it's a reluctant and probably temporary hack.  So my question is: is
this type of initiative already in the works in some project out
there?  I'd hate to duplicate effort.  GiST takes you a long way, but not
quite all the way, for these types of scenarios.
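Eric's idea of evaluating several columns at once can be sketched in isolation. The sketch assumes, hypothetically, that each internal index entry keeps one bounding range per indexed column (the way an R-tree keeps one per dimension); `ToyRange` and `toy_multicol_consistent` are invented names and not part of GiST's API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-column bounding range stored in an index entry
 * (for a leaf entry, lo == hi). */
typedef struct
{
    double lo;
    double hi;
} ToyRange;

/* Multi-dimensional consistent check: the subtree can contain a match only
 * if, for EVERY indexed column, the entry's range overlaps the query range
 * for that column. All columns are tested in one call. */
bool toy_multicol_consistent(const ToyRange *key, const ToyRange *query,
                             int ncols)
{
    assert(ncols > 0);
    for (int i = 0; i < ncols; i++)
    {
        /* two closed intervals overlap iff each starts before the other ends */
        if (key[i].lo > query[i].hi || query[i].lo > key[i].hi)
            return false;
    }
    return true;
}
```

With X and Y stored as two such ranges, this is the same rectangle-overlap test an R-tree performs, just spread over separate columns.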

Thanks
Eric



Re: GiST consistent function, expected arguments; multi-dimensional indexes

From: Martijn van Oosterhout
On Wed, Jun 27, 2007 at 09:32:13AM -0700, Eric wrote:
> In GiST, for each new data type we support we're expected to provide
> (among other things) a function to determine whether a query is
> consistent with a particular index entry (given an operator/
> strategy).  I haven't been able to figure out when the query value
> being passed (arg 1 in the <datatype>_consistent function) is the
> actual value (say the value "6"), or a pointer to the value.  In the
> btree_gist contrib examples, I see both cases (int4, a value; text, a
> pointer).  Does anyone know how to tell when the GiST consistent
> function should expect a pointer or a value? (Apparently this is set up
> in the scan key in a generic fashion before control reaches the
> index-specific implementation, but I can't tell exactly how.)  Does this depend
> on the storage clause in CREATE TYPE?

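Everything is always passed as a Datum, so yes, it is determined by the
type's declaration in CREATE TYPE: strictly, by the PASSEDBYVALUE option
(STORAGE itself only controls TOAST handling), which decides whether the
Datum holds the value itself or a pointer to it.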
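To make the by-value/by-reference distinction concrete, here is a toy model in plain C. It assumes only that a Datum is an unsigned integer as wide as a pointer (which is how PostgreSQL defines it); the helper names are hypothetical stand-ins for the real `Int32GetDatum`/`DatumGetPointer` family in postgres.h, not copies of them.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for PostgreSQL's Datum: an unsigned integer wide enough to
 * hold a pointer. Illustrative only; these are not the postgres.h macros. */
typedef uintptr_t ToyDatum;

static_assert(sizeof(ToyDatum) >= sizeof(int32_t),
              "a by-value int32 must fit inside a Datum");

/* Pass-by-value (e.g. int4): the bits of the value ARE the Datum. */
ToyDatum toy_int32_get_datum(int32_t value)
{
    return (ToyDatum) (uint32_t) value;
}

int32_t toy_datum_get_int32(ToyDatum d)
{
    return (int32_t) (uint32_t) d;
}

/* Pass-by-reference (e.g. text): the Datum holds only the address. */
ToyDatum toy_pointer_get_datum(const void *ptr)
{
    return (ToyDatum) ptr;
}

const void *toy_datum_get_pointer(ToyDatum d)
{
    return (const void *) d;
}
```

A consistent function for a by-value type therefore decodes argument 1 directly as the value, while one for a by-reference type decodes it as a pointer, which matches the two btree_gist cases in the question.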

> Also, a broader question.  GiST is set up to evaluate each column in
> the index in the order it was specified (e.g. if the index has colX,
> colY, then colY is checked only if colX satisfies the search; otherwise
> colY is ignored), as in traditional btree indexes.  I would like to
> set up a multi-dimensional index that doesn't require all data to be
> contained in a single column (like an rtree index where the X and Y
> coordinates would be in different columns rather than a single
> composite column).

The usual approach to this is to define the index on a composite of
the values. For example, if you have a table with two points that you
want to index together, you do:

CREATE INDEX foo ON bar USING gist ((box(point1,point2)));

i.e. a functional index on the result of combining the points. It does
mean your queries need to use the same expression, but it
works without modifying any internal code at all...

Given that you can use row types more easily these days, it's quite possible
you could build an operator class on a row type...
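For reference, here is a plain-C model of what the composite key amounts to: box(point1, point2) treats the two points as opposite corners, and the GiST box operator class can then answer overlap (&&) queries on the result. The types are toy copies shaped like PostgreSQL's Point and BOX (a BOX stores a high and a low corner); the function names are hypothetical, and PostgreSQL's fuzzy floating-point comparisons are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { double x, y; } ToyPoint;
typedef struct { ToyPoint high, low; } ToyBox;

/* Model of box(point, point): the two points are opposite corners,
 * normalized so that `high` holds the larger coordinates. */
ToyBox toy_box_from_points(ToyPoint a, ToyPoint b)
{
    ToyBox box;
    box.high.x = a.x > b.x ? a.x : b.x;
    box.low.x  = a.x > b.x ? b.x : a.x;
    box.high.y = a.y > b.y ? a.y : b.y;
    box.low.y  = a.y > b.y ? b.y : a.y;
    assert(box.low.x <= box.high.x && box.low.y <= box.high.y);
    return box;
}

/* Model of the && (overlaps) test: overlap on both axes at once. */
bool toy_box_overlaps(const ToyBox *a, const ToyBox *b)
{
    return a->low.x <= b->high.x && b->low.x <= a->high.x &&
           a->low.y <= b->high.y && b->low.y <= a->high.y;
}
```

Because both coordinates live in one key, the index can test X and Y together without any change to GiST itself.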

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.



Re: GiST consistent function, expected arguments; multi-dimensional indexes

From: Eric

>
> Everything is always passed as a Datum, so yes, it is determined by
> the storage clause in CREATE TYPE.

I'm still not sure what to do in some scenarios.  One example is the GiST
example code for btree (btree_gist).  If you look at the int4
consistent function, it gets an int32 value (argument 1).  For other
data types, it would get a pointer to a value.  Is the rule that anything
<= 4 bytes is passed as a value and anything larger as a pointer?  See the code
below:

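Datum
gbt_int4_consistent(PG_FUNCTION_ARGS)
{
    /* argument 0: the index entry, always passed by reference */
    GISTENTRY  *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
    /* argument 1: the query; int4 is pass-by-value, so the Datum is the value */
    int32       query = PG_GETARG_INT32(1);
    /* the stored key (lower/upper bounds) is reached through a pointer */
    int32KEY   *kkk = (int32KEY *) DatumGetPointer(entry->key);

    /* ... remainder of the function omitted ... */
}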
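To see how a by-value query is compared against a by-reference key, here is a simplified standalone model of an int4 consistent check. It assumes, as in btree_gist, that each key stores the lower and upper bound of the values beneath it and that the operators use the btree strategy numbers 1 to 5 (<, <=, =, >=, >); the type and function names are hypothetical and this is not btree_gist's actual code. A leaf has lower == upper, so the same tests work at both levels.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of btree_gist's int32KEY: the smallest and largest
 * int4 values found under an index entry. */
typedef struct { int32_t lower, upper; } ToyInt32Key;

/* B-tree strategy numbers: <, <=, =, >=, > */
enum { TOY_LT = 1, TOY_LE = 2, TOY_EQ = 3, TOY_GE = 4, TOY_GT = 5 };

/* Could any value v in [key->lower, key->upper] satisfy "v <op> query"?
 * The query is a plain int32 (by value); the key is read via a pointer. */
bool toy_int4_consistent(const ToyInt32Key *key, int32_t query, int strategy)
{
    switch (strategy)
    {
        case TOY_LT: return key->lower < query;
        case TOY_LE: return key->lower <= query;
        case TOY_EQ: return key->lower <= query && query <= key->upper;
        case TOY_GE: return key->upper >= query;
        case TOY_GT: return key->upper > query;
        default:     assert(!"unknown strategy"); return false;
    }
}
```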
 

>
> The usual approach to this is to define the index on a composite of
> the values. For example, if you have a table with two points that you
> want to index together, you do:
>
> CREATE INDEX foo ON bar USING gist ((box(point1,point2)));
>
> i.e. a functional index on the result of combining the points. It does
> mean your queries need to use the same expression, but it
> works without modifying any internal code at all...
>
> Given that you can use row types more easily these days, it's quite possible
> you could build an operator class on a row type...

Thanks Martijn.  I will consider that approach.




Re: GiST consistent function, expected arguments; multi-dimensional indexes

From: Martijn van Oosterhout
On Sun, Jul 01, 2007 at 07:20:08PM -0700, Eric wrote:
>
> >
> > Everything is always passed as a Datum, so yes, it is determined by
> > the storage clause in CREATE TYPE.
>
> I'm still not sure what to do in some scenarios.  One example is the GiST
> example code for btree (btree_gist).  If you look at the int4
> consistent function, it gets an int32 value (argument 1).  For other
> data types, it would get a pointer to a value.  Is the rule that anything
> <= 4 bytes is passed as a value and anything larger as a pointer?  See the code
> below...

Why guess? You know the type OID of what you're manipulating, right?
Then the function:

get_typlenbyval(Oid typid, int16 *typlen, bool *typbyval);

It is declared in utils/lsyscache.h and fills in typbyval (true if passed
by value, false if passed by reference) and typlen, which is either a
positive integer giving the number of bytes, or negative for
variable-length types (-1 for varlena types such as text, -2 for
null-terminated C strings).
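With typlen and typbyval in hand, code that handles a Datum of an arbitrary type can work out what it is holding. The sketch below is modelled on the logic of PostgreSQL's datumGetSize(), but it is a simplified standalone toy: `ToyDatum`, the 4-byte `ToyVarlenaHeader` and `toy_datum_size` are assumptions for illustration, and real varlena headers (short 1-byte headers, TOAST pointers) are more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for Datum: an unsigned integer as wide as a pointer. */
typedef uintptr_t ToyDatum;

/* Hypothetical 4-byte varlena header holding the total size in bytes,
 * header included; the payload follows it in memory. */
typedef struct
{
    int32_t total_len;
} ToyVarlenaHeader;

/* Size in bytes of the value a Datum represents, given the typbyval and
 * typlen that get_typlenbyval() reports. */
size_t toy_datum_size(ToyDatum d, bool typbyval, int16_t typlen)
{
    if (typbyval)
    {
        /* the value is stored inside the Datum itself */
        assert(typlen > 0 && (size_t) typlen <= sizeof(ToyDatum));
        return (size_t) typlen;
    }
    if (typlen > 0)
        return (size_t) typlen;     /* fixed length, reached by pointer */
    if (typlen == -1)               /* varlena: read the length header */
        return (size_t) ((const ToyVarlenaHeader *) d)->total_len;
    assert(typlen == -2);           /* C string: count up to the NUL */
    return strlen((const char *) d) + 1;
}
```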

Have a nice day,



Re: GiST consistent function, expected arguments; multi-dimensional indexes

From: Eric

I guess you can also get this before writing code from

select typbyval from pg_type where typname='mytype'

...thanks again.



Re: GiST consistent function, expected arguments; multi-dimensional indexes

From: Martijn van Oosterhout
On Mon, Jul 02, 2007 at 10:44:55AM -0700, Eric wrote:
> I guess you can also get this before writing code from
>
> select typbyval from pg_type where typname='mytype'

Note that the flag might not be constant. For example, int8 is currently
not passed by value, although it could be on a 64-bit architecture.
However, variable-length values are always passed by reference.
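Martijn's caveat follows from the rule stated in the CREATE TYPE documentation: a pass-by-value type must be fixed-length and no wider than a Datum. The toy check below (hypothetical names; `ToyDatum` assumed pointer-sized, as in PostgreSQL) shows why the answer for an 8-byte type depends on the platform, while variable-length types can never qualify.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t ToyDatum;

static_assert(sizeof(ToyDatum) >= sizeof(void *),
              "a Datum must be able to hold a pointer");

/* Necessary (not sufficient) condition for declaring a type
 * PASSEDBYVALUE: fixed length (typlen > 0) and small enough to fit in a
 * Datum. Variable-length types (typlen -1 or -2) never qualify, and an
 * 8-byte type such as int8 can only qualify where a Datum is 8 bytes. */
bool toy_byval_possible(int16_t typlen)
{
    return typlen > 0 && (size_t) typlen <= sizeof(ToyDatum);
}
```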

Have a nice day,