Thread: GiST Comparing IndexTuples/Datums
Hey folks:<br /><br /> I posted this to the pgsql-novice mailing list but was told that it'd probably be better to reposthere. I've been working with a group trying to implement UNIQUE index functionality in GiST (we started with hash,and have branched out to try and understand some of the other indexing structures as well). We installed the btree-gistcontrib module and have gist indexes working fine and understanding when it's supposed to be unique (regardlessof which contrib module is installed). We currently can walk over all the IndexTuples in a page and we were hopingto compare the IndexTuple passed into the gistdoinsert() function (the new itup being added to the index) to each ofthe IndexTuples in said page. The idea we've been playing with right now is using 'datum = index_getattr(itup, 1, RelationGetDescr(r),&isnull);' as its used in hashinsert.c, but we can't figure out how to compare the two datums then. The actual data type of the value being inserted would be different depending on the type of column you created theindex on. Since you provide an opclass when creating a gist index, are we supposed to use one of the user defined functionsto compare items? (One of them is 'same', so in btree-gist, the function is gbt_text_same() for indexes on charand text columns) Is there a general way to access these functions without knowing specifically what contrib moduleis specified, something like OpClassFunction7()? Thanks so much for humoring me, and I apologize as I get myself familiarwith PostgreSQL and database concepts in general. <br /><br /><br />-Matt
"Matthew Campbell" <mtthw.cmpbll@gmail.com> writes: > I've been working with a group trying to > implement UNIQUE index functionality in GiST > ... but we can't figure out how to > compare the two datums then. The actual data type of the value being > inserted would be different depending on the type of column you created the > index on. Since you provide an opclass when creating a gist index, are we > supposed to use one of the user defined functions to compare items? Yes; otherwise you have no datatype independence, not to mention that different opclasses could legitimately have different definitions of equality for the same datatype. (Ye olde standard example is a complex-number datatype with a sort-by-real-part opclass and a sort-by-absolute-value opclass.) I think the missing link here is that you'd need some convention for having the opclass identify which of its functions/operators is an equality check --- or perhaps even more to the point, having a way for it to identify whether it supports equality at all. It doesn't seem unreasonable to me to legislate that if a GiST opclass supports unique indexes, then it must use operator number so-and-so for equality. But there are lots of GiST opclasses that don't include equality at all; we can't break that case. regards, tom lane
> indexes, then it must use operator number so-and-so for equality. But > there are lots of GiST opclasses that don't include equality at all; we > can't break that case. There is a GiST support function for equality of keys, in btree_gist it's named as gbt_*_same. Equality function has support number 7 and is used for stored keys. But the real issue in unique GiST index is unique :). First, the algorithm of insertion doesn't compare indexed keys on leaf page at all. Values on the same page are compared only when page is splitting (picksplit support method). Second, GiST implementation supports only unordered trees (btree_gist is a some kind of emulation) and it cannot guarantee that equal keys will be close in index. That's related to picksplit and gistpenalty method problem/optimization and data set. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
Good news:
I think we've got GiST working (somewhat anyways), as we found gistKeyIsEQ(giststate, 0, datum, currdatum) in gistutil.c does the trick of comparing two datums. I swear most of our trouble is just finding our way around the postgres codebase, but we're getting there little by little. We've gone back to revisit hash to see if we can figure it out now that we understand a little bit about GiST, but we can't find an equivelent function in hash for the KeyIsEQ().
So two questions really. The first is if such a function exists for hash. The second is that nbtree and Neil Conways work a few years ago ( http://archives.postgresql.org/pgsql-patches/2003-09/msg00252.php) use the scan and scankey stuff, but we're having trouble understanding how these work. Is there some documentation on using these correctly (outside of just looking at nbtree code)? Thanks so much for the help folks!
-Matt
I think we've got GiST working (somewhat anyways), as we found gistKeyIsEQ(giststate, 0, datum, currdatum) in gistutil.c does the trick of comparing two datums. I swear most of our trouble is just finding our way around the postgres codebase, but we're getting there little by little. We've gone back to revisit hash to see if we can figure it out now that we understand a little bit about GiST, but we can't find an equivelent function in hash for the KeyIsEQ().
So two questions really. The first is if such a function exists for hash. The second is that nbtree and Neil Conways work a few years ago ( http://archives.postgresql.org/pgsql-patches/2003-09/msg00252.php) use the scan and scankey stuff, but we're having trouble understanding how these work. Is there some documentation on using these correctly (outside of just looking at nbtree code)? Thanks so much for the help folks!
-Matt
On 2/13/07, Teodor Sigaev <teodor@sigaev.ru> wrote:
> indexes, then it must use operator number so-and-so for equality. But
> there are lots of GiST opclasses that don't include equality at all; we
> can't break that case.
There is a GiST support function for equality of keys, in btree_gist it's named
as gbt_*_same. Equality function has support number 7 and is used for stored keys.
But the real issue in unique GiST index is unique :). First, the algorithm of
insertion doesn't compare indexed keys on leaf page at all. Values on the same
page are compared only when page is splitting (picksplit support method).
Second, GiST implementation supports only unordered trees (btree_gist is a some
kind of emulation) and it cannot guarantee that equal keys will be close in
index. That's related to picksplit and gistpenalty method problem/optimization
and data set.
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
"Matthew Campbell" <mtthw.cmpbll@gmail.com> writes: > revisit hash to see if we can figure it out now that we understand a little > bit about GiST, but we can't find an equivelent function in hash for the > KeyIsEQ(). > So two questions really. The first is if such a function exists for > hash. hash indexes don't have any need to compare two entries so you're unlikely to find a bit of code that does exactly that, but the HTEqualStrategyNumber member of the index's operator class is the relevant equality operator, so it's surely possible to invoke it. You could just do that directly instead of bothering with scankeys. The only reason _bt_check_unique uses a scankey is that that's the information it's already got, because btree generates a scankey for use in searching the index for the correct insertion position. There's no comparable need in hash and hence no infrastructure for it. It'd go something like Oid cmp_op = indexrel->rd_operator[HTEqualStrategyNumber-1]; RegProcedure cmp_proc = get_opcode(cmp_op); if (!RegProcedureIsValid(cmp_proc)) elog... result = DatumGetBool(OidFunctionCall2(cmp_proc, datum1, datum2)); although you'd probably want to avoid looking up the function again for each index entry, so plan on an fmgr_info call in there. regards, tom lane