Thread: SP-GiST versus index-only scans
It would not take very much at all to make the SP-GiST stuff support index-only scans, except for one thing: it's conceivable that an opclass might choose to store only a lossy version of the original indexed datum, in which case it'd not be able to reconstruct the datum on demand. I'm not sure how likely this is, because the design of SP-GiST forces root-level leaf tuples to be stored with the original datum, but in principle leaf tuples at lower levels might have lossy representations. None of the current opclasses do that, but Oleg and Teodor evidently foresee such cases, else they'd not have bothered with the recheck support that's in there. (If the original datum is reconstructable then the opclass can surely deliver a correct answer for every leaf tuple.) So the problem is that we have to either disallow such opclass designs, or support per-opclass rather than per-index-AM decisions about whether index-only scans are possible. We discussed that idea when the index-only-scans patch went in a few months ago, but blew it off on the grounds that there didn't seem to be any immediate need for it. Well, the need is here. I think we should get rid of the pg_am.amcanreturn boolean column and replace it with an AM support function. btree's would just return constant true, but I envision SP-GiST's version interrogating the opclass config function. (If we make the config function return this info, we don't need to add anything to CREATE OPERATOR CLASS, which seems like a good thing to me.) regards, tom lane
On 2011-12-14 19:00, Tom Lane wrote: > So the problem is that we have to either disallow such opclass designs, > or support per-opclass rather than per-index-AM decisions about whether > index-only scans are possible. Just a quick comment, for some queries like the famous select count(*) from table where column @@ to_tsquery('something'); I do think index-only-scans does make sense even on indices where the original tuple cannot be re-constructed. This also goes for gin indices as well. .. and yes, I do have a real-world application that would utillize this. (and love it) Jesper -- Jesper
Jesper Krogh <jesper@krogh.cc> writes: > On 2011-12-14 19:00, Tom Lane wrote: >> So the problem is that we have to either disallow such opclass designs, >> or support per-opclass rather than per-index-AM decisions about whether >> index-only scans are possible. > Just a quick comment, for some queries like the famous > select count(*) from table where column @@ to_tsquery('something'); > I do think index-only-scans does make sense even on indices > where the original tuple cannot be re-constructed. This also goes > for gin indices as well. I think this is somewhat wishful thinking unfortunately. The difficulty is that if the index isn't capable of reconstructing the original value, then it's probably giving only an approximate (lossy) answer, which means we'll have to visit the heap to recheck each result, which pretty much defeats the purpose of an index-only scan. So I can't get excited about contorting things to support this. regards, tom lane
On 2011-12-14 19:48, Tom Lane wrote: > Jesper Krogh<jesper@krogh.cc> writes: >> On 2011-12-14 19:00, Tom Lane wrote: >>> So the problem is that we have to either disallow such opclass designs, >>> or support per-opclass rather than per-index-AM decisions about whether >>> index-only scans are possible. >> Just a quick comment, for some queries like the famous >> select count(*) from table where column @@ to_tsquery('something'); >> I do think index-only-scans does make sense even on indices >> where the original tuple cannot be re-constructed. This also goes >> for gin indices as well. > I think this is somewhat wishful thinking unfortunately. The difficulty > is that if the index isn't capable of reconstructing the original value, > then it's probably giving only an approximate (lossy) answer, which > means we'll have to visit the heap to recheck each result, which > pretty much defeats the purpose of an index-only scan. So I can't get > excited about contorting things to support this. I can see that it is hard to generalize, but in the tsvector case the we are indeed not capable of reconstructing the row since the positions are not stored in the index, the actual lookup is not a lossy and I'm fairly sure (based on experience) that pg dont revisit heap-tuples for checking (only for visibillity). -- Jesper -- Jesper
Jesper Krogh <jesper@krogh.cc> writes: > On 2011-12-14 19:48, Tom Lane wrote: >> I think this is somewhat wishful thinking unfortunately. The difficulty >> is that if the index isn't capable of reconstructing the original value, >> then it's probably giving only an approximate (lossy) answer, which >> means we'll have to visit the heap to recheck each result, which >> pretty much defeats the purpose of an index-only scan. > I can see that it is hard to generalize, but in the tsvector case the > we are indeed not capable of reconstructing the row since the > positions are not stored in the index, the actual lookup is not a > lossy and I'm fairly sure (based on experience) that pg dont > revisit heap-tuples for checking (only for visibillity). Well, the way the tsvector code handles this stuff is that it reports the result as lossy only if the query actually poses a constraint on position (some do, some don't). That case was actually what made us move the determination of lossiness from plan time to execution time, since in the case of a non-constant tsquery, there's no way for the planner to know about it (and even with the constant case, you'd need a helper function that doesn't exist today). But this behavior is problematic for index-only scans because the planner can't tell whether a query will be lossy or not, and it makes a heck of a lot bigger difference than it used to. regards, tom lane
Tom, There seems to exist some opportunities now with GIST which relate to geometry/geography types (but not only...): 1. Index-only scans on geometry columns with SP-GIST (being able to do a "SELECT id FROM my_table WHERE mygeom...;"). 2. Index clustering incuding NULL values (i.e. being able to do a "CLUSTER mygeom_index ON mytable;" ). This discussion suggests that at least 1. is close to be implemented. The problem of 2. has to do with handling NULL values; it's mentioned in the PostGIS manual [1]. I'm aware of kd-tree index development [2]. Don't know if clustering and index-only scans would be resolved there. But I can't find neither in the Todo List [3] ? What do you think? Yours, Stefan [2] http://postgis.refractions.net/docs/ch06.html#id2635907 [3] http://old.nabble.com/IMPORTANT%3A-%28Still%29-Seeking-Funding-for-Faster-PostGIS-Indexes-td32633545.html [3] http://wiki.postgresql.org/wiki/Todo#Indexes 2011/12/14 Tom Lane <tgl@sss.pgh.pa.us>: > Jesper Krogh <jesper@krogh.cc> writes: >> On 2011-12-14 19:48, Tom Lane wrote: >>> I think this is somewhat wishful thinking unfortunately. The difficulty >>> is that if the index isn't capable of reconstructing the original value, >>> then it's probably giving only an approximate (lossy) answer, which >>> means we'll have to visit the heap to recheck each result, which >>> pretty much defeats the purpose of an index-only scan. > >> I can see that it is hard to generalize, but in the tsvector case the >> we are indeed not capable of reconstructing the row since the >> positions are not stored in the index, the actual lookup is not a >> lossy and I'm fairly sure (based on experience) that pg dont >> revisit heap-tuples for checking (only for visibillity). > > Well, the way the tsvector code handles this stuff is that it reports > the result as lossy only if the query actually poses a constraint on > position (some do, some don't). That case was actually what made us > move the determination of lossiness from plan time to execution time, > since in the case of a non-constant tsquery, there's no way for the > planner to know about it (and even with the constant case, you'd need a > helper function that doesn't exist today). But this behavior is > problematic for index-only scans because the planner can't tell whether > a query will be lossy or not, and it makes a heck of a lot bigger > difference than it used to. > > regards, tom lane > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers
Stefan Keller <sfkeller@gmail.com> writes: > There seems to exist some opportunities now with GIST which relate to > geometry/geography types (but not only...): > 1. Index-only scans on geometry columns with SP-GIST (being able to do > a "SELECT id FROM my_table WHERE mygeom...;"). Well, if somebody builds an SP-GiST opclass for geometry, that should work. > 2. Index clustering incuding NULL values (i.e. being able to do a > "CLUSTER mygeom_index ON mytable;" ). Huh? GiST has supported nulls, and CLUSTER, since 8.2 or so. The section of the PostGIS manual you're referring to seems to be years out of date. regards, tom lane