Re: [HACKERS] How to implement a SP-GiST index as a extension module? - Mailing list pgsql-hackers

From Connor Wolf
Subject Re: [HACKERS] How to implement a SP-GiST index as a extension module?
Date
Msg-id CAAVqP=qMg4bVU9f-EaShcwsMMpHYeQmP4LBzB86HsfOQJ9Xxpw@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] How to implement a SP-GiST index as a extension module?  (Connor Wolf <connorw@imaginaryindustries.com>)
Responses Re: [HACKERS] How to implement a SP-GiST index as a extension module?  (Alexander Korotkov <a.korotkov@postgrespro.ru>)
List pgsql-hackers
Ok, I've managed to get my custom index working. 

It's all on github here: https://github.com/fake-name/pg-spgist_hamming, if anyone else needs a fuzzy-image searching system 
that can integrate into postgresql..

It should be a pretty good basis for anyone else to use if they want to implement a SP-GiST index too. 

Thanks!

On Sun, Nov 5, 2017 at 8:10 PM, Connor Wolf <connorw@imaginaryindustries.com> wrote:
Never mind, it turns out the issue boiled down to me declaring the wrong prefixType in my config function.

TL;DR - PEBKAC

On Sun, Nov 5, 2017 at 1:09 AM, Connor Wolf <connorw@imaginaryindustries.com> wrote:
Ok, I've got everything compiling and it installs properly, but I'm running into problems that I think are either a side-effect of implementing picksplit incorrectly (likely), or a bug in SP-GiST(?).

Program received signal SIGSEGV, Segmentation fault.
__memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:159
159     ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: No such file or directory.
(gdb) bt
#0  __memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:159
#1  0x00000000004ecd66 in memcpy (__len=16, __src=<optimized out>, __dest=0x13c9dd8) at /usr/include/x86_64-linux-gnu/bits/string3.h:53
#2  memcpyDatum (target=target@entry=0x13c9dd8, att=att@entry=0x7fff327325f4, datum=datum@entry=18445692987396472528) at spgutils.c:587
#3  0x00000000004ee06b in spgFormInnerTuple (state=state@entry=0x7fff327325e0, hasPrefix=<optimized out>, prefix=18445692987396472528, nNodes=8,
    nodes=nodes@entry=0x13bd340) at spgutils.c:741
#4  0x00000000004f508b in doPickSplit (index=index@entry=0x7f2cf9de7f98, state=state@entry=0x7fff327325e0, current=current@entry=0x7fff32732020,
    parent=parent@entry=0x7fff32732040, newLeafTuple=newLeafTuple@entry=0x13b9f00, level=level@entry=0, isNulls=0 '\000', isNew=0 '\000') at spgdoinsert.c:913
#5  0x00000000004f6976 in spgdoinsert (index=index@entry=0x7f2cf9de7f98, state=state@entry=0x7fff327325e0, heapPtr=heapPtr@entry=0x12e672c, datum=12598555199787281,
    isnull=0 '\000') at spgdoinsert.c:2053
#6  0x00000000004ee5cc in spgistBuildCallback (index=index@entry=0x7f2cf9de7f98, htup=htup@entry=0x12e6728, values=values@entry=0x7fff327321e0,
    isnull=isnull@entry=0x7fff32732530 "", tupleIsAlive=tupleIsAlive@entry=1 '\001', state=state@entry=0x7fff327325e0) at spginsert.c:56
#7  0x0000000000534e8d in IndexBuildHeapRangeScan (heapRelation=heapRelation@entry=0x7f2cf9ddc6c8, indexRelation=indexRelation@entry=0x7f2cf9de7f98,
    indexInfo=indexInfo@entry=0x1390ad8, allow_sync=allow_sync@entry=1 '\001', anyvisible=anyvisible@entry=0 '\000', start_blockno=start_blockno@entry=0,
    numblocks=4294967295, callback=0x4ee573 <spgistBuildCallback>, callback_state=0x7fff327325e0) at index.c:2609
#8  0x0000000000534f52 in IndexBuildHeapScan (heapRelation=heapRelation@entry=0x7f2cf9ddc6c8, indexRelation=indexRelation@entry=0x7f2cf9de7f98,
    indexInfo=indexInfo@entry=0x1390ad8, allow_sync=allow_sync@entry=1 '\001', callback=callback@entry=0x4ee573 <spgistBuildCallback>,
    callback_state=callback_state@entry=0x7fff327325e0) at index.c:2182
#9  0x00000000004eeb74 in spgbuild (heap=0x7f2cf9ddc6c8, index=0x7f2cf9de7f98, indexInfo=0x1390ad8) at spginsert.c:140
#10 0x0000000000535e55 in index_build (heapRelation=heapRelation@entry=0x7f2cf9ddc6c8, indexRelation=indexRelation@entry=0x7f2cf9de7f98,
    indexInfo=indexInfo@entry=0x1390ad8, isprimary=isprimary@entry=0 '\000', isreindex=isreindex@entry=0 '\000') at index.c:2043
#11 0x0000000000536ee8 in index_create (heapRelation=heapRelation@entry=0x7f2cf9ddc6c8, indexRelationName=indexRelationName@entry=0x12dd600 "int8idx_2",
    indexRelationId=16416, indexRelationId@entry=0, relFileNode=0, indexInfo=indexInfo@entry=0x1390ad8, indexColNames=indexColNames@entry=0x1390f40,
    accessMethodObjectId=4000, tableSpaceId=0, collationObjectId=0x12e6b18, classObjectId=0x12e6b38, coloptions=0x12e6b58, reloptions=0, isprimary=0 '\000',
    isconstraint=0 '\000', deferrable=0 '\000', initdeferred=0 '\000', allow_system_table_mods=0 '\000', skip_build=0 '\000', concurrent=0 '\000',
    is_internal=0 '\000', if_not_exists=0 '\000') at index.c:1116
#12 0x00000000005d8fe6 in DefineIndex (relationId=relationId@entry=16413, stmt=stmt@entry=0x12dd568, indexRelationId=indexRelationId@entry=0,
    is_alter_table=is_alter_table@entry=0 '\000', check_rights=check_rights@entry=1 '\001', check_not_in_use=check_not_in_use@entry=1 '\001', skip_build=0 '\000',
    quiet=0 '\000') at indexcmds.c:667
#13 0x0000000000782057 in ProcessUtilitySlow (pstate=pstate@entry=0x12dd450, pstmt=pstmt@entry=0x12db108,
    queryString=queryString@entry=0x12da0a0 "CREATE INDEX int8idx_2 ON int8tmp_2 USING spgist ( a vptree_ops );", context=context@entry=PROCESS_UTILITY_TOPLEVEL,
    params=params@entry=0x0, queryEnv=queryEnv@entry=0x0, dest=0x12db200, completionTag=0x7fff32732ed0 "") at utility.c:1326
#14 0x00000000007815ef in standard_ProcessUtility (pstmt=0x12db108, queryString=0x12da0a0 "CREATE INDEX int8idx_2 ON int8tmp_2 USING spgist ( a vptree_ops );",
    context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x12db200, completionTag=0x7fff32732ed0 "") at utility.c:928
#15 0x00000000007816a7 in ProcessUtility (pstmt=pstmt@entry=0x12db108, queryString=<optimized out>, context=context@entry=PROCESS_UTILITY_TOPLEVEL,
    params=<optimized out>, queryEnv=<optimized out>, dest=dest@entry=0x12db200, completionTag=0x7fff32732ed0 "") at utility.c:357
#16 0x000000000077de2e in PortalRunUtility (portal=portal@entry=0x1391a80, pstmt=pstmt@entry=0x12db108, isTopLevel=isTopLevel@entry=1 '\001',
    setHoldSnapshot=setHoldSnapshot@entry=0 '\000', dest=dest@entry=0x12db200, completionTag=completionTag@entry=0x7fff32732ed0 "") at pquery.c:1178
#17 0x000000000077e98e in PortalRunMulti (portal=portal@entry=0x1391a80, isTopLevel=isTopLevel@entry=1 '\001', setHoldSnapshot=setHoldSnapshot@entry=0 '\000',
    dest=dest@entry=0x12db200, altdest=altdest@entry=0x12db200, completionTag=completionTag@entry=0x7fff32732ed0 "") at pquery.c:1324
#18 0x000000000077f782 in PortalRun (portal=portal@entry=0x1391a80, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001',
    run_once=run_once@entry=1 '\001', dest=dest@entry=0x12db200, altdest=altdest@entry=0x12db200, completionTag=0x7fff32732ed0 "") at pquery.c:799
#19 0x000000000077bc12 in exec_simple_query (query_string=query_string@entry=0x12da0a0 "CREATE INDEX int8idx_2 ON int8tmp_2 USING spgist ( a vptree_ops );")
    at postgres.c:1120
#20 0x000000000077d95c in PostgresMain (argc=<optimized out>, argv=argv@entry=0x12e9948, dbname=0x12bca10 "contrib_regression", username=<optimized out>)
    at postgres.c:4139
#21 0x00000000006fecf4 in BackendRun (port=port@entry=0x12de030) at postmaster.c:4364
#22 0x0000000000700e32 in BackendStartup (port=port@entry=0x12de030) at postmaster.c:4036
#23 0x0000000000701112 in ServerLoop () at postmaster.c:1755
#24 0x00000000007023af in PostmasterMain (argc=argc@entry=8, argv=argv@entry=0x12ba7c0) at postmaster.c:1363
#25 0x00000000006726c1 in main (argc=8, argv=0x12ba7c0) at main.c:228



It's segfaulting when trying to build the inner tuple after the picksplit operation.

Adding debugging output to the print function, I see: 

NOTICE:  Memcopying from 0000000000000000 to 00000000013d7938 with len 16

The first item in my input data file is zero, and if I change it to 1:

NOTICE:  Memcopying from 0000000000000001 to 0000000001b45938 with len 16

So pretty clearly, I'm trying to copy from the literal data representation of the data as an address. 
Following the data, this is the value I'm assigning to out->prefixDatum in my picksplit call. I can confirm this by hard-coding the
value of out->prefixDatum in my picksplit call to a known value, it shows up as the address in the memcopy call.

However, as far as I can tell, I'm assigning it correctly:  out->prefixDatum = Int64GetDatum(val);

This is similar to how the other spgist implementations work. spgkdtreeproc.c does out->prefixDatum = Float8GetDatum(coord);
for example.

I think this is the SP-GiST core failing to handle certain types being pass-by-value? I'm not totally certain.

As I understand it, the "maybe-pass-by-reference" parameter is a global flag (USE_FLOAT8_BYVAL), but I'd like to 
keep that enabled. What's the proper approach for adding support for this in the SP-GiST core?

My (somewhat messy) extension module is here, if it's relevant.

Connor




On Fri, Nov 3, 2017 at 3:12 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
On Fri, Nov 3, 2017 at 12:37 PM, Connor Wolf <connorw@imaginaryindustries.com> wrote:
EDIT: That's actually exactly how the example I'm working off of works. DERP. The SQL is 

CREATE TYPE vptree_area AS
(
    center _int4,
    distance float8
);

CREATE OR REPLACE FUNCTION vptree_area_match(_int4, vptree_area) RETURNS boolean AS
'MODULE_PATHNAME','vptree_area_match'
LANGUAGE C IMMUTABLE STRICT;

CREATE OPERATOR <@ (
LEFTARG = _int4,
RIGHTARG = vptree_area, 
PROCEDURE = vptree_area_match,
RESTRICT = contsel,
JOIN = contjoinsel);

so I just need to understand how to parse out the custom type in my index operator.

You can see the implementation of vptree_area_match function located in vptree.c.  It just calls GetAttributeByNum() function.

There is also alternative approach for that implemented in pg_trgm contrib module.  It has "text % text" operator which checks if two strings are similar enough.  The similarity threshold is defined by pg_trgm.similarity_threshold GUC.  Thus, you can also define GUC with threshold distance value.  However, it would place some limitations.  For instance, you wouldn't be able to use different distance threshold in the same query.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
 



pgsql-hackers by date:

Previous
From: Etsuro Fujita
Date:
Subject: Re: [HACKERS] Incorrect comment for build_child_join_rel
Next
From: Dilip Kumar
Date:
Subject: Re: [HACKERS] Proposal: Improve bitmap costing for lossy pages