Thread: GiST indexing tuples
Hi all. I'm wanting to write a new GiST index system, to improve performance on some queries I am running. I have had quite a look through the docs and code, and I'm not convinced that it is possible to do what I want. This is what I am wanting to index: CREATE INDEX range_index ON table(a, b) USING fancy_new_index; and then: SELECT * FROM table WHERE a > 1 AND b < 4; and have that answered by the index. Now, generating an index format that can answer that particular arrangement of constraints is easy. I can do that. However, getting multiple values into the GiST functions is something I don't know how to do. As far as I can see, I would need to create a composite type and index that, like the contrib package seg does. This would change my SQL to: CREATE INDEX range_index ON table(fancy_type(a, b)) USING fancy_index; SELECT * FROM table WHERE fancy_type(a, b) &^£@! fancy_type(1, 4); which I don't want to do. So, has this problem been solved before? Is there an already-existing index that will speed up my query? Is there a way to get more than one value into a GiST index? Thanks, Matthew -- If you let your happiness depend upon how somebody else feels about you, now you have to control how somebody else feels about you. -- Abraham Hicks
On Tue, Nov 27, 2007 at 06:28:23PM +0000, Matthew wrote: > SELECT * FROM table WHERE a > 1 AND b < 4; This sounds like something an R-tree can do. /* Steinar */ -- Homepage: http://www.sesse.net/
On Tue, 27 Nov 2007, Steinar H. Gunderson wrote: > On Tue, Nov 27, 2007 at 06:28:23PM +0000, Matthew wrote: > > SELECT * FROM table WHERE a > 1 AND b < 4; > > This sounds like something an R-tree can do. I *know* that. However, Postgres (as far as I can see) doesn't provide a simple R-tree index that will index two integers. This means I have to write one myself. I'm asking whether it is possible to get two values into a GiST index, which would allow me to implement this. Matthew -- It is better to keep your mouth closed and let people think you are a fool than to open it and remove all doubt. -- Mark Twain
"Matthew" <matthew@flymine.org> writes: > On Tue, 27 Nov 2007, Steinar H. Gunderson wrote: >> On Tue, Nov 27, 2007 at 06:28:23PM +0000, Matthew wrote: >> > SELECT * FROM table WHERE a > 1 AND b < 4; >> >> This sounds like something an R-tree can do. > > I *know* that. However, Postgres (as far as I can see) doesn't provide a > simple R-tree index that will index two integers. This means I have to > write one myself. I'm asking whether it is possible to get two values into > a GiST index, which would allow me to implement this. The database is capable of determining that a>1 and b<4 are both conditions which a single index can satisfy. However GIST itself treats each column of the index independently applying the first column then the second one and so on like a traditional btree index, so it doesn't really do what you would want. I did propose a while back that GIST should consider all columns simultaneously in the same style as rtree. However this would require making GIST somewhat less general in another sense. Currently page splits can be handled arbitrarily but if you have to be able to combine different datatypes it would mean having to come up with a standard algorithm which would work everywhere. (I suggested making everything work in terms of "distance" and then using the n-space vector distance (ie sqrt((a1-b1)^2+(a2-b2)^2+...).) This means GIST wouldn't be as general as it is now but it would allow us to handle cases like yours automatically. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services!
Matthew <matthew@flymine.org> writes: >> This sounds like something an R-tree can do. > I *know* that. However, Postgres (as far as I can see) doesn't provide a > simple R-tree index that will index two integers. This means I have to > write one myself. I'm asking whether it is possible to get two values into > a GiST index, which would allow me to implement this. Have you looked at contrib/seg/ ? regards, tom lane
On Wed, 28 Nov 2007, Tom Lane wrote: > Have you looked at contrib/seg/ ? Yes, I had a pretty good look at that. However, I believe that in order to use seg's indexes, I would need to put my data into seg's data type, and reformat my query, as I stated in my original message. What I'm looking for is a general R-tree (or similar) index that will index multiple columns of normal data types. For instance, the normal B-tree index on (a, b) is able to answer queries like "a = 5 AND b > 1" or "a > 5". An R-tree would be able to index these, plus queries like "a > 5 AND b < 1". As far as I can see, it is not possible at the moment to write such an index system for GiST, which is a shame because the actual R-tree algorithm is very simple. It's just a matter of communicating both values from the query to the index code. Matthew -- I have an inferiority complex. But it's not a very good one.
Matthew wrote: > For instance, the normal B-tree index on (a, b) is able to answer queries > like "a = 5 AND b > 1" or "a > 5". An R-tree would be able to index these, > plus queries like "a > 5 AND b < 1". Sorry in advance if this is a stupid question, but how is this better than two index, one on "a" and one on "b"? I supposed there could be a space savings but beyond that?
On Thu, Nov 29, 2007 at 03:23:10PM -0500, Matthew T. O'Connor wrote: > Sorry in advance if this is a stupid question, but how is this better than > two index, one on "a" and one on "b"? I supposed there could be a space > savings but beyond that? You could index on both columns simultaneously without a bitmap index scan. /* Steinar */ -- Homepage: http://www.sesse.net/
On Thu, 29 Nov 2007, Matthew T. O'Connor wrote: > Matthew wrote: > > For instance, the normal B-tree index on (a, b) is able to answer queries > > like "a = 5 AND b > 1" or "a > 5". An R-tree would be able to index these, > > plus queries like "a > 5 AND b < 1". > > Sorry in advance if this is a stupid question, but how is this better > than two index, one on "a" and one on "b"? I supposed there could be a > space savings but beyond that? Imagine you have a table with columns "a" and "b". The table has a bazillion rows, and the values of "a" and "b" both range from a negative bazillion to a positive bazillion. (Note this is exactly the case in our database, for some value of a bazillion.) Then, you run the query: SELECT * FROM table WHERE a > 5 AND b < 1; So, an index on "a" will return half a bazillion results for the constraint "a > 5". Likewise, the index on "b" will return half a bazillion results for the constraint "b < 1". However, the intersection of these two constraints could be just a few rows. (Note this is exactly the case in our database.) Now, Postgres has two options. It could use just the one index and filter half a bazillion rows (which is what it used to do), or it could create a bitmap with a bazillion bits from each index, and do a logical AND operation on them to create a new bitmap with just a few bits set (which it now can do under some circumstances). Either way, it's going to be a heavy operation. An R-tree index on "a, b" would instantly return just those few rows, without using significant amounts of memory or time. Hope that helps, Matthew -- Patron: "I am looking for a globe of the earth." Librarian: "We have a table-top model over here." Patron: "No, that's not good enough. Don't you have a life-size?" Librarian: (pause) "Yes, but it's in use right now."