Thread: What's the best way to index this table for speed?

What's the best way to index this table for speed?

From
"Carl Lerche"
Date:
Hello,

I have what seems to be a massively complicated problem. If I were to
try to write it out, I think the description of it would go on for
pages. Before I go into it, I think I will ask a much simplified
version of my question and hopefully work my way out from there.

How can I index 2 dimensional data (latitude / longitude) with a
status_id column too (integer) so that I can perform the following
query as fast as possible:

    SELECT * FROM profiles WHERE status_id = 1 AND latitude BETWEEN
<y_1> AND <y_2> AND longitude BETWEEN <x_1> AND <x_2>;

Obviously a btree index wouldn't work well and in the documentations
it said an rtree index works for 2 dimensional queries, but I would
like to filter first by status_id since that will probably eliminate
50%+ of the rows in the table. There are currently over 600 000 rows
in the table so far, and it will be growing to well over a million.

As an extra bonus, how can I setup tsearch2 to perform a full text
query on just the results of the previous query (as in, do a text
search on all profiles that are active and in a certain area)?

Thanks,
-carl


--
EPA Rating: 3000 Lines of Code / Gallon (of coffee)

Re: What's the best way to index this table for speed?

From
Steve Wormley
Date:
On 2/1/07 10:42 AM, "Carl Lerche" <carl.lerche@gmail.com> wrote:

> How can I index 2 dimensional data (latitude / longitude) with a
> status_id column too (integer) so that I can perform the following
> query as fast as possible:
>
>     SELECT * FROM profiles WHERE status_id = 1 AND latitude BETWEEN
> <y_1> AND <y_2> AND longitude BETWEEN <x_1> AND <x_2>;

Depending on what your needs are, I'd look into PostGIS and make your data
points and doing bounding box queries using a GIST index either with a
partial index or indexes depending on how many status ids:

create index xyz on profiles using gist(mypoint) where status_id = 1;

Or a multicolumn index using the btree_gist contrib module:

create index xyz on profiles using gist (status_id,mypoint);

I use the first format as I only have 2 statuses, valid and not valid and it
works quite well. If PostGIS won't work for you you still can possibly use a
partial index if it works for your status_ids.

-Steve Wormley
Senior Consultant
inCode Wireless
a Verisign Company



Re: What's the best way to index this table for speed?

From
Bruno Wolff III
Date:
On Thu, Feb 01, 2007 at 10:42:30 -0800,
  Carl Lerche <carl.lerche@gmail.com> wrote:
>
> How can I index 2 dimensional data (latitude / longitude) with a
> status_id column too (integer) so that I can perform the following
> query as fast as possible:
>
>    SELECT * FROM profiles WHERE status_id = 1 AND latitude BETWEEN
> <y_1> AND <y_2> AND longitude BETWEEN <x_1> AND <x_2>;
>
> Obviously a btree index wouldn't work well and in the documentations
> it said an rtree index works for 2 dimensional queries, but I would
> like to filter first by status_id since that will probably eliminate
> 50%+ of the rows in the table. There are currently over 600 000 rows
> in the table so far, and it will be growing to well over a million.

The earth distance contrib will use gist indexes based on the underlying
cube data type (also in contrib).

The the location is at all selective, then you probably don't need to worry
about status as it isn't very selective. If your data is clustered on the
disk mostly by location, it might not by you much at all since the data will
like be read from disk anyway.