Re: [HACKERS] How to implement a SP-GiST index as a extension module? - Mailing list pgsql-hackers
From | Connor Wolf |
---|---|
Subject | Re: [HACKERS] How to implement a SP-GiST index as a extension module? |
Date | |
Msg-id | CAAVqP=qMg4bVU9f-EaShcwsMMpHYeQmP4LBzB86HsfOQJ9Xxpw@mail.gmail.com |
In response to | Re: [HACKERS] How to implement a SP-GiST index as a extension module? (Connor Wolf <connorw@imaginaryindustries.com>) |
Responses | Re: [HACKERS] How to implement a SP-GiST index as a extension module? |
List | pgsql-hackers |
Ok, I've managed to get my custom index working.
It's all on GitHub here: https://github.com/fake-name/pg-spgist_hamming, if anyone else needs a fuzzy image search system that can integrate into PostgreSQL.
It should be a pretty good basis for anyone else to use if they want to implement an SP-GiST index too.
Thanks!
On Sun, Nov 5, 2017 at 8:10 PM, Connor Wolf <connorw@imaginaryindustries.com> wrote:
Never mind, it turns out the issue boiled down to me declaring the wrong prefixType in my config function.

TL;DR - PEBKAC

On Sun, Nov 5, 2017 at 1:09 AM, Connor Wolf <connorw@imaginaryindustries.com> wrote:

Ok, I've got everything compiling and it installs properly, but I'm running into problems that I think are either a side-effect of implementing picksplit incorrectly (likely), or a bug in SP-GiST(?).

Program received signal SIGSEGV, Segmentation fault.
__memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:159
159 ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: No such file or directory.
(gdb) bt
#0  __memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:159
#1  0x00000000004ecd66 in memcpy (__len=16, __src=<optimized out>, __dest=0x13c9dd8) at /usr/include/x86_64-linux-gnu/bits/string3.h:53
#2  memcpyDatum (target=target@entry=0x13c9dd8, att=att@entry=0x7fff327325f4, datum=datum@entry=18445692987396472528) at spgutils.c:587
#3  0x00000000004ee06b in spgFormInnerTuple (state=state@entry=0x7fff327325e0, hasPrefix=<optimized out>, prefix=18445692987396472528, nNodes=8, nodes=nodes@entry=0x13bd340) at spgutils.c:741
#4  0x00000000004f508b in doPickSplit (index=index@entry=0x7f2cf9de7f98, state=state@entry=0x7fff327325e0, current=current@entry=0x7fff32732020, parent=parent@entry=0x7fff32732040, newLeafTuple=newLeafTuple@entry=0x13b9f00, level=level@entry=0, isNulls=0 '\000', isNew=0 '\000') at spgdoinsert.c:913
#5  0x00000000004f6976 in spgdoinsert (index=index@entry=0x7f2cf9de7f98, state=state@entry=0x7fff327325e0, heapPtr=heapPtr@entry=0x12e672c, datum=12598555199787281, isnull=0 '\000') at spgdoinsert.c:2053
#6  0x00000000004ee5cc in spgistBuildCallback (index=index@entry=0x7f2cf9de7f98, htup=htup@entry=0x12e6728, values=values@entry=0x7fff327321e0, isnull=isnull@entry=0x7fff32732530 "", tupleIsAlive=tupleIsAlive@entry=1 '\001', state=state@entry=0x7fff327325e0) at spginsert.c:56
#7  0x0000000000534e8d in IndexBuildHeapRangeScan (heapRelation=heapRelation@entry=0x7f2cf9ddc6c8, indexRelation=indexRelation@entry=0x7f2cf9de7f98, indexInfo=indexInfo@entry=0x1390ad8, allow_sync=allow_sync@entry=1 '\001', anyvisible=anyvisible@entry=0 '\000', start_blockno=start_blockno@entry=0, numblocks=4294967295, callback=0x4ee573 <spgistBuildCallback>, callback_state=0x7fff327325e0) at index.c:2609
#8  0x0000000000534f52 in IndexBuildHeapScan (heapRelation=heapRelation@entry=0x7f2cf9ddc6c8, indexRelation=indexRelation@entry=0x7f2cf9de7f98, indexInfo=indexInfo@entry=0x1390ad8, allow_sync=allow_sync@entry=1 '\001', callback=callback@entry=0x4ee573 <spgistBuildCallback>, callback_state=callback_state@entry=0x7fff327325e0) at index.c:2182
#9  0x00000000004eeb74 in spgbuild (heap=0x7f2cf9ddc6c8, index=0x7f2cf9de7f98, indexInfo=0x1390ad8) at spginsert.c:140
#10 0x0000000000535e55 in index_build (heapRelation=heapRelation@entry=0x7f2cf9ddc6c8, indexRelation=indexRelation@entry=0x7f2cf9de7f98, indexInfo=indexInfo@entry=0x1390ad8, isprimary=isprimary@entry=0 '\000', isreindex=isreindex@entry=0 '\000') at index.c:2043
#11 0x0000000000536ee8 in index_create (heapRelation=heapRelation@entry=0x7f2cf9ddc6c8, indexRelationName=indexRelationName@entry=0x12dd600 "int8idx_2", indexRelationId=16416, indexRelationId@entry=0, relFileNode=0, indexInfo=indexInfo@entry=0x1390ad8, indexColNames=indexColNames@entry=0x1390f40, accessMethodObjectId=4000, tableSpaceId=0, collationObjectId=0x12e6b18, classObjectId=0x12e6b38, coloptions=0x12e6b58, reloptions=0, isprimary=0 '\000', isconstraint=0 '\000', deferrable=0 '\000', initdeferred=0 '\000', allow_system_table_mods=0 '\000', skip_build=0 '\000', concurrent=0 '\000', is_internal=0 '\000', if_not_exists=0 '\000') at index.c:1116
#12 0x00000000005d8fe6 in DefineIndex (relationId=relationId@entry=16413, stmt=stmt@entry=0x12dd568, indexRelationId=indexRelationId@entry=0, is_alter_table=is_alter_table@entry=0 '\000', check_rights=check_rights@entry=1 '\001', check_not_in_use=check_not_in_use@entry=1 '\001', skip_build=0 '\000', quiet=0 '\000') at indexcmds.c:667
#13 0x0000000000782057 in ProcessUtilitySlow (pstate=pstate@entry=0x12dd450, pstmt=pstmt@entry=0x12db108, queryString=queryString@entry=0x12da0a0 "CREATE INDEX int8idx_2 ON int8tmp_2 USING spgist ( a vptree_ops );", context=context@entry=PROCESS_UTILITY_TOPLEVEL, params=params@entry=0x0, queryEnv=queryEnv@entry=0x0, dest=0x12db200, completionTag=0x7fff32732ed0 "") at utility.c:1326
#14 0x00000000007815ef in standard_ProcessUtility (pstmt=0x12db108, queryString=0x12da0a0 "CREATE INDEX int8idx_2 ON int8tmp_2 USING spgist ( a vptree_ops );", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x12db200, completionTag=0x7fff32732ed0 "") at utility.c:928
#15 0x00000000007816a7 in ProcessUtility (pstmt=pstmt@entry=0x12db108, queryString=<optimized out>, context=context@entry=PROCESS_UTILITY_TOPLEVEL, params=<optimized out>, queryEnv=<optimized out>, dest=dest@entry=0x12db200, completionTag=0x7fff32732ed0 "") at utility.c:357
#16 0x000000000077de2e in PortalRunUtility (portal=portal@entry=0x1391a80, pstmt=pstmt@entry=0x12db108, isTopLevel=isTopLevel@entry=1 '\001', setHoldSnapshot=setHoldSnapshot@entry=0 '\000', dest=dest@entry=0x12db200, completionTag=completionTag@entry=0x7fff32732ed0 "") at pquery.c:1178
#17 0x000000000077e98e in PortalRunMulti (portal=portal@entry=0x1391a80, isTopLevel=isTopLevel@entry=1 '\001', setHoldSnapshot=setHoldSnapshot@entry=0 '\000', dest=dest@entry=0x12db200, altdest=altdest@entry=0x12db200, completionTag=completionTag@entry=0x7fff32732ed0 "") at pquery.c:1324
#18 0x000000000077f782 in PortalRun (portal=portal@entry=0x1391a80, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', run_once=run_once@entry=1 '\001', dest=dest@entry=0x12db200, altdest=altdest@entry=0x12db200, completionTag=0x7fff32732ed0 "") at pquery.c:799
#19 0x000000000077bc12 in exec_simple_query (query_string=query_string@entry=0x12da0a0 "CREATE INDEX int8idx_2 ON int8tmp_2 USING spgist ( a vptree_ops );") at postgres.c:1120
#20 0x000000000077d95c in PostgresMain (argc=<optimized out>, argv=argv@entry=0x12e9948, dbname=0x12bca10 "contrib_regression", username=<optimized out>) at postgres.c:4139
#21 0x00000000006fecf4 in BackendRun (port=port@entry=0x12de030) at postmaster.c:4364
#22 0x0000000000700e32 in BackendStartup (port=port@entry=0x12de030) at postmaster.c:4036
#23 0x0000000000701112 in ServerLoop () at postmaster.c:1755
#24 0x00000000007023af in PostmasterMain (argc=argc@entry=8, argv=argv@entry=0x12ba7c0) at postmaster.c:1363
#25 0x00000000006726c1 in main (argc=8, argv=0x12ba7c0) at main.c:228

It's segfaulting when trying to build the inner tuple after the picksplit operation.

Adding debugging output to the print function, I see:

NOTICE: Memcopying from 0000000000000000 to 00000000013d7938 with len 16

The first item in my input data file is zero, and if I change it to 1:

NOTICE: Memcopying from 0000000000000001 to 0000000001b45938 with len 16

So pretty clearly, I'm trying to copy from the literal data representation of the data as an address.

Following the data, this is the value I'm assigning to out->prefixDatum in my picksplit call. I can confirm this by hard-coding the value of out->prefixDatum in my picksplit call to a known value, it shows up as the address in the memcopy call.

However, as far as I can tell, I'm assigning it correctly:

out->prefixDatum = Int64GetDatum(val);

This is similar to how the other spgist implementations work. spgkdtreeproc.c does

out->prefixDatum = Float8GetDatum(coord);

for example.

I think this is the SP-GiST core failing to handle certain types being pass-by-value? I'm not totally certain.

As I understand it, the "maybe-pass-by-reference" parameter is a global flag (USE_FLOAT8_BYVAL), but I'd like to keep that enabled.

What's the proper approach for adding support for this in the SP-GiST core?

My (somewhat messy) extension module is here, if it's relevant.

Connor

On Fri, Nov 3, 2017 at 3:12 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:

On Fri, Nov 3, 2017 at 12:37 PM, Connor Wolf <connorw@imaginaryindustries.com> wrote:

EDIT: That's actually exactly how the example I'm working off of works. DERP. The SQL is

CREATE TYPE vptree_area AS
(
    center _int4,
    distance float8
);

CREATE OR REPLACE FUNCTION vptree_area_match(_int4, vptree_area) RETURNS boolean AS
'MODULE_PATHNAME','vptree_area_match'
LANGUAGE C IMMUTABLE STRICT;

CREATE OPERATOR <@ (
    LEFTARG = _int4,
    RIGHTARG = vptree_area,
    PROCEDURE = vptree_area_match,
    RESTRICT = contsel,
    JOIN = contjoinsel);

so I just need to understand how to parse out the custom type in my index operator.

You can see the implementation of vptree_area_match function located in vptree.c. It just calls GetAttributeByNum() function.
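
For readers following along, the operator approach quoted above comes down to a predicate that receives the indexed value together with a (center, distance) pair. Below is a minimal standalone sketch in plain C. It is not the code from vptree.c: the struct, the function names, and the squared-Euclidean metric are all assumptions made for illustration, and a real implementation would unpack the composite argument with GetAttributeByNum() instead of taking a C struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the vptree_area composite type. */
typedef struct {
    const int32_t *center;   /* plays the role of "center _int4" */
    size_t         ndims;    /* number of elements in center */
    double         distance; /* plays the role of "distance float8" */
} vptree_area_sketch;

/* Squared Euclidean distance. The metric is an assumption; the thread
 * does not say which one the real module uses. */
static double squared_distance(const int32_t *a, const int32_t *b, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = (double) a[i] - (double) b[i];
        sum += d * d;
    }
    return sum;
}

/* Returns true when point lies within area->distance of area->center. */
bool area_match_sketch(const int32_t *point, size_t ndims,
                       const vptree_area_sketch *area)
{
    assert(point != NULL && area != NULL);
    if (ndims != area->ndims || area->distance < 0.0)
        return false;
    /* Compare squared values so no square root is needed. */
    return squared_distance(point, area->center, ndims)
           <= area->distance * area->distance;
}
```

Because the radius travels with each right-hand operand, two predicates in the same query can use different distances.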
There is also an alternative approach for that, implemented in the pg_trgm contrib module. It has a "text % text" operator which checks whether two strings are similar enough. The similarity threshold is defined by the pg_trgm.similarity_threshold GUC. Thus, you could also define a GUC holding the threshold distance value. However, that would impose some limitations. For instance, you wouldn't be able to use different distance thresholds in the same query.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
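
To illustrate the prefixType mix-up reported earlier in the thread: when a value is stored in a pass-by-value datum but the type is declared as pass-by-reference, copy code ends up treating the value itself as an address, which matches the "Memcopying from 0000000000000001" debug output. The sketch below is a simplified standalone model and not the PostgreSQL source; DatumSketch, TypeDescSketch, copy_source and copy_byval are hypothetical names, and the real type descriptor carries more information than shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a datum: one machine word that holds either
 * the value itself or a pointer to it. */
typedef uintptr_t DatumSketch;

/* Hypothetical, simplified type descriptor. */
typedef struct {
    bool   attbyval; /* true: the value lives inside the datum */
    size_t attlen;   /* size of the value in bytes */
} TypeDescSketch;

/* Returns the address a copy routine would read from for this type. */
const void *copy_source(const TypeDescSketch *att, const DatumSketch *datum)
{
    assert(att != NULL && datum != NULL);
    if (att->attbyval)
        return datum;             /* read the datum's own bytes */
    return (const void *) *datum; /* interpret the datum as a pointer */
}

/* Copies only in the by-value case and returns the bytes copied. The
 * by-reference branch is deliberately not executed here, since it would
 * dereference whatever the datum happens to contain. */
size_t copy_byval(void *target, const TypeDescSketch *att, DatumSketch datum)
{
    assert(target != NULL && att != NULL);
    if (!att->attbyval)
        return 0;
    memcpy(target, &datum, sizeof datum);
    return sizeof datum;
}
```

With a datum holding the integer 1, a by-value descriptor makes copy_source return the address of the datum itself, while a by-reference descriptor makes it return address 0x1, which a real memcpy would then try to read from.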