Thread: OffsetNumber, picksplit, and GiST

OffsetNumber, picksplit, and GiST

From
Itai Zukerman
Date:
Is the GiST examples I've looked through, in the picksplit functions,
I see code that looks roughly like this:
 bytea *entryvec = (bytea *) PG_GETARG_POINTER(0); OffsetNumber i, maxoff;
 maxoff = ((VARSIZE(entryvec) - VARHDRSZ) / sizeof(GISTENTRY)) - 1;
 for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i))   {     ... DatumGetPointer((((GISTENTRY *)
(VARDATA(entryvec)))[i].key))...   }
 

I'm wondering, since FirstOffsetNumber is 1, what about the 0'th index
in entryvec?  Is it just not set?

Also, is there some rationalle for using OffsetNumber in
GIST_SPLITVEC?  I see in off.h:
 /*  * OffsetNumber:  *  * this is a 1-based index into the linp (ItemIdData) array in the  * header of each disk page.
*/
 

and I don't get the connection with what picksplit is doing (keeping
in mind that I don't know what "linp", "ItemIdData" or "disk page"
mean).

I have a very tenuous understanding of all this, so please bear with me.

-- 
Itai Zukerman  <http://www.math-hat.com/~zukerman/>


Re: OffsetNumber, picksplit, and GiST

From
Itai Zukerman
Date:
> Is the GiST examples I've looked through, in the picksplit functions,
> I see code that looks roughly like this:
>
>   bytea *entryvec = (bytea *) PG_GETARG_POINTER(0);
>   OffsetNumber i, maxoff;
>
>   maxoff = ((VARSIZE(entryvec) - VARHDRSZ) / sizeof(GISTENTRY)) - 1;
>
>   for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i))
>     {
>       ... DatumGetPointer((((GISTENTRY *) (VARDATA(entryvec)))[i].key)) ...
>     }
>
> I'm wondering, since FirstOffsetNumber is 1, what about the 0'th index
> in entryvec?  Is it just not set?

In backend/access/gist/gist.c I see how entryvec is built:
  storage = palloc(MAXALIGN(VARHDRSZ) + (*len + 1) * sizeof(GISTENTRY));  entryvec = (bytea *) (storage +
MAXALIGN(VARHDRSZ)- VARHDRSZ);  decompvec = (bool *) palloc((*len + 1) * sizeof(bool));  VARATT_SIZEP(entryvec) = (*len
+1) * sizeof(GISTENTRY) + VARHDRSZ;  for (i = 1; i <= *len; i++)  {    [...] gistdentryinit(giststate, 0, &((GISTENTRY
*)VARDATA(entryvec))[i], [...]  }
 

So it looks like the 0'th entry indeed is empty.  Why?

Also, in gist.c the index "i" has type "int".  No mention of
OffsetNumber anywhere.  Any reason for that?

Finally, should I be asking this on pgsql-hackers?

-- 
Itai Zukerman  <http://www.math-hat.com/~zukerman/>


GiST, Not Using Index

From
Itai Zukerman
Date:
I've defined this:
 [...some definitions...]
 CREATE OPERATOR CLASS sig_ops FOR TYPE sig USING gist AS   OPERATOR 7 ~>=,   FUNCTION 1 g_sig_consistent (internal,
internal,int2),   FUNCTION 2 g_sig_union (bytea, internal),   FUNCTION 3 g_sig_compress (internal),   FUNCTION 4
g_sig_decompress(internal),   FUNCTION 5 g_sig_penalty (internal, internal),   FUNCTION 6 g_sig_picksplit (bytea,
internal),  FUNCTION 7 g_sig_same (internal, internal, internal);
 
 CREATE TABLE x (z sig); CREATE INDEX xx ON x USING gist (z sig_ops);
 [...insert 20000 rows into x...]
 VACUUM ANALYZE x;

But I can't seem to get PG to use the index.  Shouldn't this "explain
verbose" at least attempt to check the cost of using the index?  Am I
not interpreting the output correctly?  What am I doing wrong?

# explain verbose
# select z from x where z ~>= sig_in('{1,2,3}'::int4[]);;
                                  QUERY PLAN                                   
--------------------------------------------------------------------------------   { SEQSCAN    :startup_cost 0.00
:total_cost2826.01    :rows 24334    :width 8    :qptargetlist (      { TARGETENTRY       :resdom          { RESDOM
    :resno 1          :restype 1802511          :restypmod -1          :resname z          :reskey 0          :reskeyop
0         :ressortgroupref 0          :resjunk false          }             :expr          { VAR          :varno 1
   :varattno 2          :vartype 1802511          :vartypmod -1           :varlevelsup 0          :varnoold 1
:varoattno2         }      }   )       :qpqual (      { EXPR       :typeOid 16        :opType op       :oper          {
OPER         :opno 1836035          :opid 1802521          :opresulttype 16          :opretset false          }
   :args (         { VAR          :varno 1          :varattno 2          :vartype 1802511          :vartypmod -1
  :varlevelsup 0          :varnoold 1          :varoattno 2         }                { EXPR          :typeOid 1802511
       :opType func          :oper             { FUNC             :funcid 1802514             :funcresulttype 1802511
         :funcretset false             :funcformat 0             }                   :args (            { CONST
   :consttype 1007             :constlen -1             :constbyval false             :constisnull false
:constvalue 36 [ 36 0 0 0 1 0 0 0 0 0 0 0 23 0 0 0 3 0 0 0 1 0 0             0 1 0 0 0 2 0 0 0 3 0 0 0 ]             }
      )         }      )      }   )       :lefttree <>    :righttree <>    :extprm ()       :locprm ()       :initplan
<>   :nprm 0     :scanrelid 1    }Seq Scan on x  (cost=0.00..2826.01 rows=24334 width=8)  Filter: (z ~>=
sig_in('{1,2,3}'::integer[]))
(94 rows)

-- 
Itai Zukerman  <http://www.math-hat.com/~zukerman/>


Re: GiST, Not Using Index

From
Tom Lane
Date:
Itai Zukerman <zukerman@math-hat.com> writes:
>   [...some definitions...]

Never leave out the "unimportant" stuff --- that's usually where you
went wrong ;-)

My guess is that you didn't supply a restriction selectivity estimator
for the ~>= operator.  I forget what the default assumptions are with
no estimator, but they're probably not optimistic enough to prompt use
of an indexscan.

>   [...insert 20000 rows into x...]
>  Seq Scan on x  (cost=0.00..2826.01 rows=24334 width=8)
>    Filter: (z ~>= sig_in('{1,2,3}'::integer[]))

*How* many rows did you insert?  The planner thinks 24334 will be
selected here ... one hopes the default restrictivity estimate is
less than 1.0, at least ...
        regards, tom lane


Re: GiST, Not Using Index

From
Itai Zukerman
Date:
>>   [...some definitions...]
>
> Never leave out the "unimportant" stuff --- that's usually where you
> went wrong ;-)

OK, I've attached below the SQL I use to generate stuff and an
interactive session with the "explain" etc.  The table has 40000 rows,
I do a VACUUM before doing a select, and (what I really don't
understand) the "explain verbose" doesn't mention anything about an index.

> My guess is that you didn't supply a restriction selectivity estimator
> for the ~>= operator.

I'm using contsel and contjoinsel, which I pulled from the rtree_gist
example.

Anyway, hoping you can help...


--
Itai Zukerman  <http://www.math-hat.com/~zukerman/>

Attachment

Re: GiST, Not Using Index

From
Itai Zukerman
Date:
> Ah.  Your problem is you need to mark sig_in() as IMMUTABLE, or at
> least [...]

That was it!  Thanks!

For your enjoyment:

test=# select count(*) from x;count  
--------200000
(1 row)

test=# explain analyze select * from x where y ~>= sig_in('{1,2,3,4,5}'::int4[]);
     QUERY PLAN                                              
 
------------------------------------------------------------------------------------------------------Index Scan using
xxon x  (cost=0.00..4.82 rows=1 width=8) (actual time=5.07..93.15 rows=29 loops=1)  Index Cond: (y ~>=
'.*****..........................................................'::sig)Totalruntime: 93.31 msec
 
(3 rows)

test=# select * from x where y ~>= sig_in('{1,2,3,4,5}'::int4[]);                               y
         
 

------------------------------------------------------------------.*****.*..*..**.....*.*...*.*.*.....*....................*...*...*******.*..**.*.*.***...*....*.....**..*..*.*...**..**....**....*****..*...**......*.....*.....*....*..*...*...*..............*.*****..*...**.....*....*.....*..*......**....*.......**...*....******........*.**.....*...............**.........*.......*......*****.*.**.*...***.............*...........*.**.*..*..**...*....******.*.*...*..*.****.*.*..**..*....**.*...***.*..*...****.**.******....*..*.*.*.**.*..*....*..................*..*...*.......*******..**..***.*..*...*...*.*.***.*.*...**...***.......*.......*****..................*.*..*.*.....*...**.......*....**..**....******......*.*.**.*.*.*..*.....***.............**......**.....******.*.....*.*...*......*...*.**.......*........***..*.*.....*.******....*...*....***.*.*..*.*.....*..*......*..*.......*.....*********.*..*.*...*...*.*.*.......................*.*....*...*..*****..............*.....................*....*...*...*....*..*.*******.**.*..*....**.*.....*.....*..*.*.*..**....*...**..*...*.*****............**..*..*.***....*...*.**.............*.........*******...*.**.***..*.****.*..**..*....*.*.**.**.*..*........********.*...**..*..*.****..*...*..*.*..******.....*.*..*.**..*....*****.*....*..****.........*....*..*...*............**..*....*********.**.......*..*........*..*.***....*........**...**.....*.******...*...........*.*.....**...**...*...**.....*...*.**...*...*****..............***.**.*...*....*...**.*...***......*..**...******..**....*......**.*................*.*...*.**......*.*.**..*****....*.*.*...*.**..*......*...*..**.**.*...*...**....****...*****..*...*........*....*....*....*....................*...*...*****..**.*........*..*...*..*.*.*..*..*....**.*..***.*......*..*****...*.......................................*...........*..*********........*.......*.......*..*.*...*....*.......*........
(29 rows)

test=# drop index xx;
DROP INDEX
test=# explain analyze select * from x where y ~>= sig_in('{1,2,3,4,5}'::int4[]);
 QUERY PLAN                                           
 
-----------------------------------------------------------------------------------------------Seq Scan on x
(cost=0.00..22.50rows=1 width=8) (actual time=35.16..2231.98 rows=29 loops=1)  Filter: (y ~>=
'.*****..........................................................'::sig)Totalruntime: 2232.18 msec
 
(3 rows)

-- 
Itai Zukerman  <http://www.math-hat.com/~zukerman/>


Re: GiST, Not Using Index

From
Tom Lane
Date:
Itai Zukerman <zukerman@math-hat.com> writes:
>> Never leave out the "unimportant" stuff --- that's usually where you
>> went wrong ;-)

> OK, I've attached below the SQL I use to generate stuff and an
> interactive session with the "explain" etc.

Ah.  Your problem is you need to mark sig_in() as IMMUTABLE, or at least
STABLE, for an expression involving sig_in() to be considered safe to
index.  If it's in C, more than likely it'd best be marked STRICT too
(else you'd better have explicit NULL checks in the function code).

Don't forget to look at your other custom functions too, to see if
they're correctly marked.
        regards, tom lane