Thread: GiST indexing tuples

GiST indexing tuples

From
Matthew
Date:
Hi all.

I'm wanting to write a new GiST index system, to improve performance on
some queries I am running. I have had quite a look through the docs and
code, and I'm not convinced that it is possible to do what I want. This is
what I am wanting to index:

CREATE INDEX range_index ON table(a, b) USING fancy_new_index;

and then:

SELECT * FROM table WHERE a > 1 AND b < 4;

and have that answered by the index.

Now, generating an index format that can answer that particular
arrangement of constraints is easy. I can do that. However, getting
multiple values into the GiST functions is something I don't know how to
do. As far as I can see, I would need to create a composite type and index
that, like the contrib package seg does. This would change my SQL to:

CREATE INDEX range_index ON table(fancy_type(a, b)) USING fancy_index;

SELECT * FROM table WHERE fancy_type(a, b) &^£@! fancy_type(1, 4);

which I don't want to do.

So, has this problem been solved before? Is there an already-existing
index that will speed up my query? Is there a way to get more than one
value into a GiST index?

Thanks,

Matthew

--
If you let your happiness depend upon how somebody else feels about you,
now you have to control how somebody else feels about you. -- Abraham Hicks

Re: GiST indexing tuples

From
"Steinar H. Gunderson"
Date:
On Tue, Nov 27, 2007 at 06:28:23PM +0000, Matthew wrote:
> SELECT * FROM table WHERE a > 1 AND b < 4;

This sounds like something an R-tree can do.

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: GiST indexing tuples

From
Matthew
Date:
On Tue, 27 Nov 2007, Steinar H. Gunderson wrote:
> On Tue, Nov 27, 2007 at 06:28:23PM +0000, Matthew wrote:
> > SELECT * FROM table WHERE a > 1 AND b < 4;
>
> This sounds like something an R-tree can do.

I *know* that. However, Postgres (as far as I can see) doesn't provide a
simple R-tree index that will index two integers. This means I have to
write one myself. I'm asking whether it is possible to get two values into
a GiST index, which would allow me to implement this.

Matthew

--
It is better to keep your mouth closed and let people think you are a fool
than to open it and remove all doubt.                  -- Mark Twain

Re: GiST indexing tuples

From
Gregory Stark
Date:
"Matthew" <matthew@flymine.org> writes:

> On Tue, 27 Nov 2007, Steinar H. Gunderson wrote:
>> On Tue, Nov 27, 2007 at 06:28:23PM +0000, Matthew wrote:
>> > SELECT * FROM table WHERE a > 1 AND b < 4;
>>
>> This sounds like something an R-tree can do.
>
> I *know* that. However, Postgres (as far as I can see) doesn't provide a
> simple R-tree index that will index two integers. This means I have to
> write one myself. I'm asking whether it is possible to get two values into
> a GiST index, which would allow me to implement this.

The database is capable of determining that a>1 and b<4 are both conditions
which a single index can satisfy.

However GIST itself treats each column of the index independently applying the
first column then the second one and so on like a traditional btree index, so
it doesn't really do what you would want.

I did propose a while back that GIST should consider all columns
simultaneously in the same style as rtree.

However this would require making GIST somewhat less general in another sense.
Currently page splits can be handled arbitrarily but if you have to be able to
combine different datatypes it would mean having to come up with a standard
algorithm which would work everywhere. (I suggested making everything work in
terms of "distance" and then using the n-space vector distance (ie
sqrt((a1-b1)^2+(a2-b2)^2+...).) This means GIST wouldn't be as general as
it is now but it would allow us to handle cases like yours automatically.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

Re: GiST indexing tuples

From
Tom Lane
Date:
Matthew <matthew@flymine.org> writes:
>> This sounds like something an R-tree can do.

> I *know* that. However, Postgres (as far as I can see) doesn't provide a
> simple R-tree index that will index two integers. This means I have to
> write one myself. I'm asking whether it is possible to get two values into
> a GiST index, which would allow me to implement this.

Have you looked at contrib/seg/ ?

            regards, tom lane

Re: GiST indexing tuples

From
Matthew
Date:
On Wed, 28 Nov 2007, Tom Lane wrote:
> Have you looked at contrib/seg/ ?

Yes, I had a pretty good look at that. However, I believe that in order to
use seg's indexes, I would need to put my data into seg's data type, and
reformat my query, as I stated in my original message. What I'm looking
for is a general R-tree (or similar) index that will index multiple
columns of normal data types.

For instance, the normal B-tree index on (a, b) is able to answer queries
like "a = 5 AND b > 1" or "a > 5". An R-tree would be able to index these,
plus queries like "a > 5 AND b < 1".

As far as I can see, it is not possible at the moment to write such an
index system for GiST, which is a shame because the actual R-tree
algorithm is very simple. It's just a matter of communicating both values
from the query to the index code.

Matthew

--
I have an inferiority complex. But it's not a very good one.

Re: GiST indexing tuples

From
"Matthew T. O'Connor"
Date:
Matthew wrote:
> For instance, the normal B-tree index on (a, b) is able to answer queries
> like "a = 5 AND b > 1" or "a > 5". An R-tree would be able to index these,
> plus queries like "a > 5 AND b < 1".

Sorry in advance if this is a stupid question, but how is this better
than two index, one on "a" and one on "b"?  I supposed there could be a
space savings but beyond that?


Re: GiST indexing tuples

From
"Steinar H. Gunderson"
Date:
On Thu, Nov 29, 2007 at 03:23:10PM -0500, Matthew T. O'Connor wrote:
> Sorry in advance if this is a stupid question, but how is this better than
> two index, one on "a" and one on "b"?  I supposed there could be a space
> savings but beyond that?

You could index on both columns simultaneously without a bitmap index scan.

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: GiST indexing tuples

From
Matthew
Date:
On Thu, 29 Nov 2007, Matthew T. O'Connor wrote:
> Matthew wrote:
> > For instance, the normal B-tree index on (a, b) is able to answer queries
> > like "a = 5 AND b > 1" or "a > 5". An R-tree would be able to index these,
> > plus queries like "a > 5 AND b < 1".
>
> Sorry in advance if this is a stupid question, but how is this better
> than two index, one on "a" and one on "b"?  I supposed there could be a
> space savings but beyond that?

Imagine you have a table with columns "a" and "b". The table has a
bazillion rows, and the values of "a" and "b" both range from a negative
bazillion to a positive bazillion. (Note this is exactly the case in our
database, for some value of a bazillion.) Then, you run the query:

SELECT * FROM table WHERE a > 5 AND b < 1;

So, an index on "a" will return half a bazillion results for the
constraint "a > 5". Likewise, the index on "b" will return half a
bazillion results for the constraint "b < 1". However, the intersection of
these two constraints could be just a few rows. (Note this is exactly the
case in our database.)

Now, Postgres has two options. It could use just the one index and filter
half a bazillion rows (which is what it used to do), or it could create a
bitmap with a bazillion bits from each index, and do a logical AND
operation on them to create a new bitmap with just a few bits set (which
it now can do under some circumstances). Either way, it's going to be a
heavy operation.

An R-tree index on "a, b" would instantly return just those few rows,
without using significant amounts of memory or time.

Hope that helps,

Matthew

--
Patron: "I am looking for a globe of the earth."
Librarian: "We have a table-top model over here."
Patron: "No, that's not good enough. Don't you have a life-size?"
Librarian: (pause) "Yes, but it's in use right now."