Thread: GiST Comparing IndexTuples/Datums

GiST Comparing IndexTuples/Datums

From

"Matthew Campbell"

Date:

12 February 2007, 16:11:48

Hey folks:<br /><br />    I posted this to the pgsql-novice mailing list but was told that it'd probably be better to
reposthere.  I've been working with a group trying to implement UNIQUE index functionality in GiST (we started with
hash,and have branched out to try and understand some of the other indexing structures as well).  We installed the
btree-gistcontrib module and have gist indexes working fine and understanding when it's supposed to be unique
(regardlessof which contrib module is installed).  We currently can walk over all the IndexTuples in a page and we were
hopingto compare the IndexTuple passed into the gistdoinsert() function (the new itup being added to the index) to each
ofthe IndexTuples in said page.  The idea we've been playing with right now is using 'datum = index_getattr(itup, 1,
RelationGetDescr(r),&isnull);' as its used in hashinsert.c, but we can't figure out how to compare the two datums
then. The actual data type of the value being inserted would be different depending on the type of column you created
theindex on.  Since you provide an opclass when creating a gist index, are we supposed to use one of the user defined
functionsto compare items?  (One of them is 'same', so in btree-gist, the function is gbt_text_same() for indexes on
charand text columns)  Is there a general way to access these functions without knowing specifically what contrib
moduleis specified, something like OpClassFunction7()?  Thanks so much for humoring me, and I apologize as I get myself
familiarwith PostgreSQL and database concepts in general. <br /><br /><br />-Matt

Re: GiST Comparing IndexTuples/Datums

From

Tom Lane

Date:

13 February 2007, 00:55:17

"Matthew Campbell" <mtthw.cmpbll@gmail.com> writes:
> I've been working with a group trying to
> implement UNIQUE index functionality in GiST
> ... but we can't figure out how to
> compare the two datums then.  The actual data type of the value being
> inserted would be different depending on the type of column you created the
> index on.  Since you provide an opclass when creating a gist index, are we
> supposed to use one of the user defined functions to compare items?

Yes; otherwise you have no datatype independence, not to mention that
different opclasses could legitimately have different definitions of
equality for the same datatype.  (Ye olde standard example is a
complex-number datatype with a sort-by-real-part opclass and a
sort-by-absolute-value opclass.)

I think the missing link here is that you'd need some convention for
having the opclass identify which of its functions/operators is an
equality check --- or perhaps even more to the point, having a way for
it to identify whether it supports equality at all.  It doesn't seem
unreasonable to me to legislate that if a GiST opclass supports unique
indexes, then it must use operator number so-and-so for equality.  But
there are lots of GiST opclasses that don't include equality at all; we
can't break that case.
        regards, tom lane

Re: GiST Comparing IndexTuples/Datums

From

Teodor Sigaev

Date:

13 February 2007, 03:54:07

> indexes, then it must use operator number so-and-so for equality.  But
> there are lots of GiST opclasses that don't include equality at all; we
> can't break that case.

There is a GiST support function for equality of keys, in btree_gist it's named 
as gbt_*_same. Equality function has support number 7 and is used for stored keys.

But the real issue in unique GiST index is unique :). First, the algorithm of 
insertion doesn't compare indexed keys on leaf page at all. Values on the same 
page are compared only when page is splitting (picksplit support method). 
Second, GiST implementation supports only unordered trees (btree_gist is a some 
kind of emulation) and it cannot guarantee that equal keys will be close in 
index. That's related to picksplit and gistpenalty method problem/optimization 
and data set.

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/

Re: GiST Comparing IndexTuples/Datums

From

"Matthew Campbell"

Date:

17 February 2007, 15:28:35

Good news:

I think we've got GiST working (somewhat anyways), as we found gistKeyIsEQ(giststate, 0, datum, currdatum) in gistutil.c does the trick of comparing two datums. I swear most of our trouble is just finding our way around the postgres codebase, but we're getting there little by little. We've gone back to revisit hash to see if we can figure it out now that we understand a little bit about GiST, but we can't find an equivelent function in hash for the KeyIsEQ().
So two questions really. The first is if such a function exists for hash. The second is that nbtree and Neil Conways work a few years ago ( http://archives.postgresql.org/pgsql-patches/2003-09/msg00252.php) use the scan and scankey stuff, but we're having trouble understanding how these work. Is there some documentation on using these correctly (outside of just looking at nbtree code)? Thanks so much for the help folks!

-Matt

On 2/13/07, Teodor Sigaev <teodor@sigaev.ru> wrote:

> indexes, then it must use operator number so-and-so for equality.  But
> there are lots of GiST opclasses that don't include equality at all; we
> can't break that case.

There is a GiST support function for equality of keys, in btree_gist it's named
as gbt_*_same. Equality function has support number 7 and is used for stored keys.

But the real issue in unique GiST index is unique :). First, the algorithm of
insertion doesn't compare indexed keys on leaf page at all. Values on the same
page are compared only when page is splitting (picksplit support method).
Second, GiST implementation supports only unordered trees (btree_gist is a some
kind of emulation) and it cannot guarantee that equal keys will be close in
index. That's related to picksplit and gistpenalty method problem/optimization
and data set.

--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Re: GiST Comparing IndexTuples/Datums

From

Tom Lane

Date:

17 February 2007, 16:24:53

"Matthew Campbell" <mtthw.cmpbll@gmail.com> writes:
> revisit hash to see if we can figure it out now that we understand a little
> bit about GiST, but we can't find an equivelent function in hash for the
> KeyIsEQ().
>     So two questions really.  The first is if such a function exists for
> hash.

hash indexes don't have any need to compare two entries so you're
unlikely to find a bit of code that does exactly that, but the
HTEqualStrategyNumber member of the index's operator class is the
relevant equality operator, so it's surely possible to invoke it.
You could just do that directly instead of bothering with scankeys.
The only reason _bt_check_unique uses a scankey is that that's the
information it's already got, because btree generates a scankey for use
in searching the index for the correct insertion position.  There's
no comparable need in hash and hence no infrastructure for it.

It'd go something like
Oid    cmp_op = indexrel->rd_operator[HTEqualStrategyNumber-1];
RegProcedure cmp_proc = get_opcode(cmp_op);
if (!RegProcedureIsValid(cmp_proc))    elog...
result = DatumGetBool(OidFunctionCall2(cmp_proc, datum1, datum2));

although you'd probably want to avoid looking up the function again
for each index entry, so plan on an fmgr_info call in there.
        regards, tom lane