Thread: GSoC proposal. Index-only scans for GIST
Project name
Support for index-only scans for GIST index
Brief review
Currently GiST index don’t have index-only scan functionality. Index-only scans are a major performance feature added to Postgres 9.2. They allow certain types of queries to be satisfied just by retrieving data from indexes, and not from tables. This feature for B-trees (implemented since PostgreSQL-9.2) allows doing certain types of queries significantly faster. This is achieved by reduction in the amount of I/O necessary to satisfy queries. I think it will be useful to implement this feature for user defined data types that use GiST index.
Benefits to the PostgreSQL Community
Faster GiST index search would be actual for many PostgreSQL applications (for example some GIS systems).
Quantifiable results
Acceleration of GiST index search.
Project details
1. The GiST is a balanced tree structure like a B-tree, containing <key, pointer> pairs. GiST key is a member of a user-defined class, and represents some property that is true of all data items reachable from the pointer associated with the key. The GiST provides a possibility to create custom data types with indexed access methods and extensible set of queries.
There are seven methods that an index operator class for GiST must provide, and an eighth that is optional.
-consistent
-union
-compress
-decompress
-penalty
-picksplit
-equal
-distance (optional)
I’m going to create new optional method fetch() in addition to existing. Thus user can create a method of retrieving data from the index without loss. It will be used when performing search queries to speed data retrieving.
2. gistget algorithm (several parts omitted):
Check the key
gistindex_keytest() – does this index tuple satisfy the scan key(s)?
Scan all items on the GiST index page and insert them into the queue (or directly to output areas)
plain scan
Heap tuple TIDs are returned into so->pageData[]
ordered scan
Heap tuple TIDs are pushed into individual search queue items
If the fetch() is specified by the developer, then using it, algorithm can retrieve the data directly to output areas at this stage, without reference to the heap.
3. Create function gistcanreturn to check whether fetch() is specified by user.
Amcanreturn – Function to check whether index supports index-only scans, or zero if none
There is the question of support index-only scans for multicolumn indexes. Probably it would require extend the interface to add separate columns checking.
To solve this question I need the community’s help.
4. Add details for index only scans into gistcostestimate function
Links
1) Hellerstein J. M., Naughton J. F., Pfeffer A. Generalized search trees for database systems. – September, 1995.
2) http://www.sai.msu.su/~megera/postgres/gist/
3) PostgreSQL 9.3.3 Documentation: chapters 11. Indexes, 55. GiST Indexes.
--
On Tue, Mar 18, 2014 at 9:12 AM, Anastasia Lubennikova <lubennikovaav@gmail.com> wrote: > Support for index-only scans for GIST index This is a cool idea, if it can be made to work. > If the fetch() is specified by the developer, then using it, algorithm can > retrieve the data directly to output areas at this stage, without reference > to the heap. This seems to be the crux of your proposal, but it seems vague: what exactly do you mean by "retrieve the data directly to output areas"? What data are you going to retrieve and where are you going to put it? Another question to consider is: which operator classes do you anticipate that this will work for and which ones do you anticipate that it will not work for? Any operator class that lossifies that input data before storing it in the index is presumably doomed, but which ones do that, and which do not? Tom Lane previously proposed extending SP-GiST to support index-only scans. You might find that discussing worth reading, or perhaps consider it as an alternative if GiST doesn't work out: http://www.postgresql.org/message-id/10839.1323885644@sss.pgh.pa.us -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > Tom Lane previously proposed extending SP-GiST to support index-only > scans. You might find that discussing worth reading, or perhaps > consider it as an alternative if GiST doesn't work out: > http://www.postgresql.org/message-id/10839.1323885644@sss.pgh.pa.us That wasn't just a proposal, see commits 3695a555136a6d179cac8ae48d5f90171d5b30e9 and 92203624934095163f8b57b5b3d7bbd2645da2c8. But yeah, that might be a useful reference for what is likely to be involved with making GIST do it. regards, tom lane
Alexander, Can you comment on the below proposal? I'd like your opinion on how difficult it will be. On 03/18/2014 06:12 AM, Anastasia Lubennikova wrote: > Hello! > Here is the text of my proposal which I've applied to GSoC. > (and link > http://www.google-melange.com/gsoc/proposal/public/google/gsoc2014/lubennikovaav/5629499534213120) > Any suggestions and comments are welcome. > > *Project name* > > Support for index-only scans for GIST index > > > > *Brief review* > > Currently GiST index don't have index-only scan functionality. Index-only > scans are a major performance feature added to Postgres 9.2. They allow > certain types of queries to be satisfied just by retrieving data from > indexes, and not from tables. This feature for B-trees (implemented since > PostgreSQL-9.2) allows doing certain types of queries significantly faster. > This is achieved by reduction in the amount of I/O necessary to satisfy > queries. I think it will be useful to implement this feature for user > defined data types that use GiST index. > > > > *Benefits to the PostgreSQL Community* > > Faster GiST index search would be actual for many PostgreSQL applications > (for example some GIS systems). > > > *Quantifiable results* > > Acceleration of GiST index search. > > > *Project details* > > 1. The GiST is a balanced tree structure like a B-tree, containing <key, > pointer> pairs. GiST key is a member of a user-defined class, and > represents some property that is true of all data items reachable from the > pointer associated with the key. The GiST provides a possibility to create > custom data types with indexed access methods and extensible set of queries. > > There are seven methods that an index operator class for GiST must provide, > and an eighth that is optional. > > -consistent > > -union > > -compress > > -decompress > > -penalty > > -picksplit > > -equal > > -distance (optional) > > I'm going to create new optional method fetch() in addition to existing. > Thus user can create a method of retrieving data from the index without > loss. It will be used when performing search queries to speed data > retrieving. > > > > 2. gistget algorithm (several parts omitted): > > Check the key > gistindex_keytest() - does this index tuple satisfy the scan key(s)? > > Scan all items on the GiST index page and insert them into the queue (or > directly to output areas) > > plain scan > > Heap tuple TIDs are returned into so->pageData[] > > ordered scan > > Heap tuple TIDs are pushed into individual search queue items > > If the fetch() is specified by the developer, then using it, algorithm can > retrieve the data directly to output areas at this stage, without reference > to the heap. > > > 3. Create function gistcanreturn to check whether fetch() is specified by > user. > > Amcanreturn - Function to check whether index supports index-only scans, or > zero if none > > There is the question of support index-only scans for multicolumn indexes. > Probably it would require extend the interface to add separate columns > checking. > > To solve this question I need the community's help. > > > 4. Add details for index only scans into gistcostestimate function > > > > *Links* > > 1) Hellerstein J. M., Naughton J. F., Pfeffer A. Generalized search > trees for database systems. - September, 1995. > > 2) http://www.sai.msu.su/~megera/postgres/gist/ > > 3) PostgreSQL 9.3.3 Documentation: chapters 11. Indexes, 55. GiST > Indexes. > -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
2. gistget algorithm (several parts omitted):
Check the key
gistindex_keytest() – does this index tuple satisfy the scan key(s)?Scan all items on the GiST index page and insert them into the queue (or directly to output areas)
plain scan
Heap tuple TIDs are returned into so->pageData[]
ordered scan
Heap tuple TIDs are pushed into individual search queue items
If the fetch() is specified by the developer, then using it, algorithm can retrieve the data directly to output areas at this stage, without reference to the heap.
3. Create function gistcanreturn to check whether fetch() is specified by user.
Amcanreturn – Function to check whether index supports index-only scans, or zero if none
There is the question of support index-only scans for multicolumn indexes. Probably it would require extend the interface to add separate columns checking.
To solve this question I need the community’s help.
4. Add details for index only scans into gistcostestimate function
With best regards,
Alexander Korotkov.
----
With best regards,
Alexander Korotkov.
Alexander,
Can you comment on the below proposal? I'd like your opinion on how
difficult it will be.> *Project name*
On 03/18/2014 06:12 AM, Anastasia Lubennikova wrote:
> Hello!
> Here is the text of my proposal which I've applied to GSoC.
> (and link
> http://www.google-melange.com/gsoc/proposal/public/google/gsoc2014/lubennikovaav/5629499534213120)
> Any suggestions and comments are welcome.
>>> *Brief review*
> Support for index-only scans for GIST index
>
>
>>> *Benefits to the PostgreSQL Community*
> Currently GiST index don't have index-only scan functionality. Index-only
> scans are a major performance feature added to Postgres 9.2. They allow
> certain types of queries to be satisfied just by retrieving data from
> indexes, and not from tables. This feature for B-trees (implemented since
> PostgreSQL-9.2) allows doing certain types of queries significantly faster.
> This is achieved by reduction in the amount of I/O necessary to satisfy
> queries. I think it will be useful to implement this feature for user
> defined data types that use GiST index.
>
>
>>> *Quantifiable results*
> Faster GiST index search would be actual for many PostgreSQL applications
> (for example some GIS systems).
>
>>> *Project details*
> Acceleration of GiST index search.
>
>> *Links*>
> 1. The GiST is a balanced tree structure like a B-tree, containing <key,
> pointer> pairs. GiST key is a member of a user-defined class, and
> represents some property that is true of all data items reachable from the
> pointer associated with the key. The GiST provides a possibility to create
> custom data types with indexed access methods and extensible set of queries.
>
> There are seven methods that an index operator class for GiST must provide,
> and an eighth that is optional.
>
> -consistent
>
> -union
>
> -compress
>
> -decompress
>
> -penalty
>
> -picksplit
>
> -equal
>
> -distance (optional)
>
> I'm going to create new optional method fetch() in addition to existing.
> Thus user can create a method of retrieving data from the index without
> loss. It will be used when performing search queries to speed data
> retrieving.
>
>
>
> 2. gistget algorithm (several parts omitted):
>
> Check the key
> gistindex_keytest() - does this index tuple satisfy the scan key(s)?
>
> Scan all items on the GiST index page and insert them into the queue (or
> directly to output areas)
>
> plain scan
>
> Heap tuple TIDs are returned into so->pageData[]
>
> ordered scan
>
> Heap tuple TIDs are pushed into individual search queue items
>
> If the fetch() is specified by the developer, then using it, algorithm can
> retrieve the data directly to output areas at this stage, without reference
> to the heap.
>
>
> 3. Create function gistcanreturn to check whether fetch() is specified by
> user.
>
> Amcanreturn - Function to check whether index supports index-only scans, or
> zero if none
>
> There is the question of support index-only scans for multicolumn indexes.
> Probably it would require extend the interface to add separate columns
> checking.
>
> To solve this question I need the community's help.
>
>
> 4. Add details for index only scans into gistcostestimate function
>
>
>Josh Berkus>
> 1) Hellerstein J. M., Naughton J. F., Pfeffer A. Generalized search
> trees for database systems. - September, 1995.
>
> 2) http://www.sai.msu.su/~megera/postgres/gist/
>
> 3) PostgreSQL 9.3.3 Documentation: chapters 11. Indexes, 55. GiST
> Indexes.
>
--
PostgreSQL Experts Inc.
http://pgexperts.com
On 03/18/2014 12:11 PM, Alexander Korotkov wrote: > Josh, > > Anastasia has already consulted to me in person. It is not big proposal. > But for newbie who is not familiar with PostgreSQL code base and especially > GiST it seems fair enough. > Thanks, that's what I wanted to know. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
This seems to be the crux of your proposal, but it seems vague: what
> If the fetch() is specified by the developer, then using it, algorithm can
> retrieve the data directly to output areas at this stage, without reference
> to the heap.
exactly do you mean by "retrieve the data directly to output areas"?
What data are you going to retrieve and where are you going to put it?
Another question to consider is: which operator classes do you
anticipate that this will work for and which ones do you anticipate
that it will not work for? Any operator class that lossifies that
input data before storing it in the index is presumably doomed, but
which ones do that, and which do not?
I'm going to create function gistcanreturn() = If fetch() is defined for all indexed columns?
And last point of my project is to implement fetch() for existing opclasses based on GIST.
--