Thread: OffsetNumber, picksplit, and GiST
Is the GiST examples I've looked through, in the picksplit functions, I see code that looks roughly like this: bytea *entryvec = (bytea *) PG_GETARG_POINTER(0); OffsetNumber i, maxoff; maxoff = ((VARSIZE(entryvec) - VARHDRSZ) / sizeof(GISTENTRY)) - 1; for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i)) { ... DatumGetPointer((((GISTENTRY *) (VARDATA(entryvec)))[i].key))... } I'm wondering, since FirstOffsetNumber is 1, what about the 0'th index in entryvec? Is it just not set? Also, is there some rationalle for using OffsetNumber in GIST_SPLITVEC? I see in off.h: /* * OffsetNumber: * * this is a 1-based index into the linp (ItemIdData) array in the * header of each disk page. */ and I don't get the connection with what picksplit is doing (keeping in mind that I don't know what "linp", "ItemIdData" or "disk page" mean). I have a very tenuous understanding of all this, so please bear with me. -- Itai Zukerman <http://www.math-hat.com/~zukerman/>
> Is the GiST examples I've looked through, in the picksplit functions, > I see code that looks roughly like this: > > bytea *entryvec = (bytea *) PG_GETARG_POINTER(0); > OffsetNumber i, maxoff; > > maxoff = ((VARSIZE(entryvec) - VARHDRSZ) / sizeof(GISTENTRY)) - 1; > > for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i)) > { > ... DatumGetPointer((((GISTENTRY *) (VARDATA(entryvec)))[i].key)) ... > } > > I'm wondering, since FirstOffsetNumber is 1, what about the 0'th index > in entryvec? Is it just not set? In backend/access/gist/gist.c I see how entryvec is built: storage = palloc(MAXALIGN(VARHDRSZ) + (*len + 1) * sizeof(GISTENTRY)); entryvec = (bytea *) (storage + MAXALIGN(VARHDRSZ)- VARHDRSZ); decompvec = (bool *) palloc((*len + 1) * sizeof(bool)); VARATT_SIZEP(entryvec) = (*len +1) * sizeof(GISTENTRY) + VARHDRSZ; for (i = 1; i <= *len; i++) { [...] gistdentryinit(giststate, 0, &((GISTENTRY *)VARDATA(entryvec))[i], [...] } So it looks like the 0'th entry indeed is empty. Why? Also, in gist.c the index "i" has type "int". No mention of OffsetNumber anywhere. Any reason for that? Finally, should I be asking this on pgsql-hackers? -- Itai Zukerman <http://www.math-hat.com/~zukerman/>
I've defined this: [...some definitions...] CREATE OPERATOR CLASS sig_ops FOR TYPE sig USING gist AS OPERATOR 7 ~>=, FUNCTION 1 g_sig_consistent (internal, internal,int2), FUNCTION 2 g_sig_union (bytea, internal), FUNCTION 3 g_sig_compress (internal), FUNCTION 4 g_sig_decompress(internal), FUNCTION 5 g_sig_penalty (internal, internal), FUNCTION 6 g_sig_picksplit (bytea, internal), FUNCTION 7 g_sig_same (internal, internal, internal); CREATE TABLE x (z sig); CREATE INDEX xx ON x USING gist (z sig_ops); [...insert 20000 rows into x...] VACUUM ANALYZE x; But I can't seem to get PG to use the index. Shouldn't this "explain verbose" at least attempt to check the cost of using the index? Am I not interpreting the output correctly? What am I doing wrong? # explain verbose # select z from x where z ~>= sig_in('{1,2,3}'::int4[]);; QUERY PLAN -------------------------------------------------------------------------------- { SEQSCAN :startup_cost 0.00 :total_cost2826.01 :rows 24334 :width 8 :qptargetlist ( { TARGETENTRY :resdom { RESDOM :resno 1 :restype 1802511 :restypmod -1 :resname z :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 1 :varattno 2 :vartype 1802511 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno2 } } ) :qpqual ( { EXPR :typeOid 16 :opType op :oper { OPER :opno 1836035 :opid 1802521 :opresulttype 16 :opretset false } :args ( { VAR :varno 1 :varattno 2 :vartype 1802511 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 2 } { EXPR :typeOid 1802511 :opType func :oper { FUNC :funcid 1802514 :funcresulttype 1802511 :funcretset false :funcformat 0 } :args ( { CONST :consttype 1007 :constlen -1 :constbyval false :constisnull false :constvalue 36 [ 36 0 0 0 1 0 0 0 0 0 0 0 23 0 0 0 3 0 0 0 1 0 0 0 1 0 0 0 2 0 0 0 3 0 0 0 ] } ) } ) } ) :lefttree <> :righttree <> :extprm () :locprm () :initplan <> :nprm 0 :scanrelid 1 }Seq Scan on x (cost=0.00..2826.01 rows=24334 width=8) Filter: (z ~>= sig_in('{1,2,3}'::integer[])) (94 rows) -- Itai Zukerman <http://www.math-hat.com/~zukerman/>
Itai Zukerman <zukerman@math-hat.com> writes: > [...some definitions...] Never leave out the "unimportant" stuff --- that's usually where you went wrong ;-) My guess is that you didn't supply a restriction selectivity estimator for the ~>= operator. I forget what the default assumptions are with no estimator, but they're probably not optimistic enough to prompt use of an indexscan. > [...insert 20000 rows into x...] > Seq Scan on x (cost=0.00..2826.01 rows=24334 width=8) > Filter: (z ~>= sig_in('{1,2,3}'::integer[])) *How* many rows did you insert? The planner thinks 24334 will be selected here ... one hopes the default restrictivity estimate is less than 1.0, at least ... regards, tom lane
>> [...some definitions...] > > Never leave out the "unimportant" stuff --- that's usually where you > went wrong ;-) OK, I've attached below the SQL I use to generate stuff and an interactive session with the "explain" etc. The table has 40000 rows, I do a VACUUM before doing a select, and (what I really don't understand) the "explain verbose" doesn't mention anything about an index. > My guess is that you didn't supply a restriction selectivity estimator > for the ~>= operator. I'm using contsel and contjoinsel, which I pulled from the rtree_gist example. Anyway, hoping you can help... -- Itai Zukerman <http://www.math-hat.com/~zukerman/>
Attachment
> Ah. Your problem is you need to mark sig_in() as IMMUTABLE, or at > least [...] That was it! Thanks! For your enjoyment: test=# select count(*) from x;count --------200000 (1 row) test=# explain analyze select * from x where y ~>= sig_in('{1,2,3,4,5}'::int4[]); QUERY PLAN ------------------------------------------------------------------------------------------------------Index Scan using xxon x (cost=0.00..4.82 rows=1 width=8) (actual time=5.07..93.15 rows=29 loops=1) Index Cond: (y ~>= '.*****..........................................................'::sig)Totalruntime: 93.31 msec (3 rows) test=# select * from x where y ~>= sig_in('{1,2,3,4,5}'::int4[]); y ------------------------------------------------------------------.*****.*..*..**.....*.*...*.*.*.....*....................*...*...*******.*..**.*.*.***...*....*.....**..*..*.*...**..**....**....*****..*...**......*.....*.....*....*..*...*...*..............*.*****..*...**.....*....*.....*..*......**....*.......**...*....******........*.**.....*...............**.........*.......*......*****.*.**.*...***.............*...........*.**.*..*..**...*....******.*.*...*..*.****.*.*..**..*....**.*...***.*..*...****.**.******....*..*.*.*.**.*..*....*..................*..*...*.......*******..**..***.*..*...*...*.*.***.*.*...**...***.......*.......*****..................*.*..*.*.....*...**.......*....**..**....******......*.*.**.*.*.*..*.....***.............**......**.....******.*.....*.*...*......*...*.**.......*........***..*.*.....*.******....*...*....***.*.*..*.*.....*..*......*..*.......*.....*********.*..*.*...*...*.*.*.......................*.*....*...*..*****..............*.....................*....*...*...*....*..*.*******.**.*..*....**.*.....*.....*..*.*.*..**....*...**..*...*.*****............**..*..*.***....*...*.**.............*.........*******...*.**.***..*.****.*..**..*....*.*.**.**.*..*........********.*...**..*..*.****..*...*..*.*..******.....*.*..*.**..*....*****.*....*..****.........*....*..*...*............**..*....*********.**.......*..*........*..*.***....*........**...**.....*.******...*...........*.*.....**...**...*...**.....*...*.**...*...*****..............***.**.*...*....*...**.*...***......*..**...******..**....*......**.*................*.*...*.**......*.*.**..*****....*.*.*...*.**..*......*...*..**.**.*...*...**....****...*****..*...*........*....*....*....*....................*...*...*****..**.*........*..*...*..*.*.*..*..*....**.*..***.*......*..*****...*.......................................*...........*..*********........*.......*.......*..*.*...*....*.......*........ (29 rows) test=# drop index xx; DROP INDEX test=# explain analyze select * from x where y ~>= sig_in('{1,2,3,4,5}'::int4[]); QUERY PLAN -----------------------------------------------------------------------------------------------Seq Scan on x (cost=0.00..22.50rows=1 width=8) (actual time=35.16..2231.98 rows=29 loops=1) Filter: (y ~>= '.*****..........................................................'::sig)Totalruntime: 2232.18 msec (3 rows) -- Itai Zukerman <http://www.math-hat.com/~zukerman/>
Itai Zukerman <zukerman@math-hat.com> writes: >> Never leave out the "unimportant" stuff --- that's usually where you >> went wrong ;-) > OK, I've attached below the SQL I use to generate stuff and an > interactive session with the "explain" etc. Ah. Your problem is you need to mark sig_in() as IMMUTABLE, or at least STABLE, for an expression involving sig_in() to be considered safe to index. If it's in C, more than likely it'd best be marked STRICT too (else you'd better have explicit NULL checks in the function code). Don't forget to look at your other custom functions too, to see if they're correctly marked. regards, tom lane