Re: WIP: SP-GiST, Space-Partitioned GiST - Mailing list pgsql-hackers

From Teodor Sigaev
Subject Re: WIP: SP-GiST, Space-Partitioned GiST
Date
Msg-id 4EE77EA1.6030503@sigaev.ru
In response to Re: WIP: SP-GiST, Space-Partitioned GiST  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: WIP: SP-GiST, Space-Partitioned GiST
List pgsql-hackers
> I wrote:
>> ... the leaf tuple datatype is hard-wired to be
> After another day's worth of hacking, I now understand the reason for
> the above: when an index is less than a page and an incoming value would
> still fit on the root page, the incoming value is simply dumped into a
> leaf tuple without ever calling any opclass-specific function at all.
Exactly.
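
The shortcut described above can be pictured with a toy Python model. This is
only a sketch, not the patch's code: `PAGE_CAPACITY`, `ToyIndex`, and
`split_by_median` are hypothetical names, and descent below the root is left
out. The point is that the opclass stand-in is consulted only once the single
root page overflows.

```python
PAGE_CAPACITY = 4  # hypothetical: number of leaf tuples that fit on one page


class ToyIndex:
    """Toy model: a root page of leaf tuples that is split only on overflow."""

    def __init__(self, picksplit):
        self.picksplit = picksplit   # stand-in for an opclass support function
        self.root_leaves = []        # raw values stored directly as leaf tuples
        self.children = None         # set once the root has been partitioned
        self.opclass_calls = 0

    def insert(self, value):
        if self.children is None and len(self.root_leaves) < PAGE_CAPACITY:
            # The value still fits on the root page: store it as-is,
            # without calling any opclass function.
            self.root_leaves.append(value)
            return
        if self.children is None:
            # The root page is full: only now is the opclass asked to partition.
            self.opclass_calls += 1
            self.children = self.picksplit(self.root_leaves + [value])
            self.root_leaves = []
            return
        raise NotImplementedError("descent below the root is not modeled")


def split_by_median(values):
    """Hypothetical picksplit: two halves around the median position."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    return [ordered[:mid], ordered[mid:]]


index = ToyIndex(split_by_median)
for v in [5, 1, 9, 3]:
    index.insert(v)
calls_before_overflow = index.opclass_calls
index.insert(7)  # the fifth value overflows the page and triggers the split
```

Because the first values are stored with no opclass involvement, nothing in
this path could convert them to a different leaf datatype.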

> To allow the leaf datatype to be different from the indexed column,
> we'd need at least one more opclass support function, and it's not clear
> that the potential gain is worth any extra complexity.
Agreed: every opclass I can imagine for SP-GiST uses the same type.
Without a clear example I don't like the idea of adding one more support
function, and it could easily be added later as an optional support function,
as was already done for the distance function in GiST.


> However, I now have another question: what is the point of the
> allTheSame mechanism?  It seems to add quite a great deal of complexity,
I thought about two options: a separate code path in core to support
a-lot-of-the-same-values with minimal help from the support functions, or
moving all logic about this case into the support functions. The second option
is demonstrated in the k-d-tree implementation, where the split axis is
contained by each half-plane. Maybe it is a simpler solution, although it moves
the responsibility to opclass developers.
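
One reading of the second option, as a one-dimensional Python sketch (the
function names are hypothetical and this is not the k-d-tree opclass code):
the split is made by position in sorted order rather than by comparison, so
values equal to the split coordinate may land on either side, and a search
visits both sides when the query lies on the split line.

```python
def picksplit_1d(coords):
    """Split by position, so duplicates of the split value may go either way."""
    ordered = sorted(coords)
    mid = len(ordered) // 2
    split = ordered[mid]
    # Invariant: everything in the left part is <= split,
    # everything in the right part is >= split.
    return split, ordered[:mid], ordered[mid:]


def sides_to_visit(split, query):
    """The split line belongs to both half-planes: visit both if query == split."""
    sides = []
    if query <= split:
        sides.append("left")
    if query >= split:
        sides.append("right")
    return sides


# Six identical values still divide evenly instead of piling into one node.
split, left, right = picksplit_1d([7, 7, 7, 7, 7, 7])
```

With this scheme a run of identical values needs no special case in core, at
the cost of the opclass having to check both sides for boundary queries.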


> one thing, it's giving me fits while attempting to fix the limitation
> on storing long indexed values.  There's no reason why a suffix tree
> representation shouldn't work for long strings, but you have to be
> willing to cap the length of any given inner tuple's prefix to something
I don't see a clear interface for now: suppose we have an empty index and we
need to insert a long string (longer than even several pages). Then a support
function is needed to split the input value into several pieces. I assumed that
SP-GiST is already complex enough for a first step without adding support for
this not very useful case.
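
What such a splitting function would have to do can be sketched in a few lines
of Python. This is purely illustrative: `MAX_PIECE_LEN` and `split_long_value`
are hypothetical names, no such support function exists in the patch, and a
real cap would have to be derived from the page size.

```python
MAX_PIECE_LEN = 8  # hypothetical cap on the length of one piece


def split_long_value(value, cap=MAX_PIECE_LEN):
    """Cut a string into consecutive pieces of at most `cap` characters."""
    if cap <= 0:
        raise ValueError("cap must be positive")
    return [value[i:i + cap] for i in range(0, len(value), cap)]


original = "abcdefghijklmnopqrstuvwxyz"
chunks = split_long_value(original)
```

The open interface question is how core would learn the cap and how the pieces
would be threaded through a chain of inner tuples, which the sketch does not
address.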

Of course, for the future we have plans to add support for long values,
NULLs/IS NULL, and KNN search at least.

> I'm also still wondering what your thoughts are on storing null values
> versus full-index-scan capability.  I'm leaning towards getting rid of
> the dead code, but if you have an idea how to remove the limitation,
> maybe we should do that instead.

I didn't plan to support NULLs in the first stage, because it's not clear to
me how and where to store them. It seems to me that they should be fully
separated from the normal path, like a linked list of pages with only
ItemPointer data (similar to leaf data pages in GIN).
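
The storage idea can be modeled in Python as a chain of pages that hold nothing
but heap pointers. This is a sketch under stated assumptions, not a proposed
on-disk layout: `NullPage`, `NullList`, and `POINTERS_PER_PAGE` are
hypothetical names, and an item pointer is modeled as a plain
(block, offset) pair.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple

ItemPointer = Tuple[int, int]  # modeled as (heap block number, offset)
POINTERS_PER_PAGE = 3          # hypothetical, kept tiny for demonstration


@dataclass
class NullPage:
    pointers: List[ItemPointer] = field(default_factory=list)
    next_page: Optional["NullPage"] = None


class NullList:
    """Linked list of pages storing only item pointers of NULL entries."""

    def __init__(self) -> None:
        self.head = NullPage()
        self.tail = self.head

    def add(self, tid: ItemPointer) -> None:
        if len(self.tail.pointers) >= POINTERS_PER_PAGE:
            # The last page is full: chain a new page onto the list.
            new_page = NullPage()
            self.tail.next_page = new_page
            self.tail = new_page
        self.tail.pointers.append(tid)

    def scan(self) -> Iterator[ItemPointer]:
        """An IS NULL scan simply walks the chain, bypassing the tree."""
        page: Optional[NullPage] = self.head
        while page is not None:
            yield from page.pointers
            page = page.next_page


nulls = NullList()
tids = [(1, i) for i in range(1, 8)]
for tid in tids:
    nulls.add(tid)
```

Keeping the chain separate means the normal insert and search paths never see
a NULL, so no opclass function has to handle one.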

I missed that the planner will not create a qual-free scan, because I thought
it was still possible with NOT NULL columns. If not, this code could be
removed/commented/ifdefed.


-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 

