Thread: What's the best way to index this table for speed?
Hello, I have what seems to be a massively complicated problem. If I were to try to write it out, I think the description of it would go on for pages. Before I go into it, I think I will ask a much simplified version of my question and hopefully work my way out from there. How can I index 2 dimensional data (latitude / longitude) with a status_id column too (integer) so that I can perform the following query as fast as possible: SELECT * FROM profiles WHERE status_id = 1 AND latitude BETWEEN <y_1> AND <y_2> AND longitude BETWEEN <x_1> AND <x_2>; Obviously a btree index wouldn't work well and in the documentations it said an rtree index works for 2 dimensional queries, but I would like to filter first by status_id since that will probably eliminate 50%+ of the rows in the table. There are currently over 600 000 rows in the table so far, and it will be growing to well over a million. As an extra bonus, how can I setup tsearch2 to perform a full text query on just the results of the previous query (as in, do a text search on all profiles that are active and in a certain area)? Thanks, -carl -- EPA Rating: 3000 Lines of Code / Gallon (of coffee)
On 2/1/07 10:42 AM, "Carl Lerche" <carl.lerche@gmail.com> wrote: > How can I index 2 dimensional data (latitude / longitude) with a > status_id column too (integer) so that I can perform the following > query as fast as possible: > > SELECT * FROM profiles WHERE status_id = 1 AND latitude BETWEEN > <y_1> AND <y_2> AND longitude BETWEEN <x_1> AND <x_2>; Depending on what your needs are, I'd look into PostGIS and make your data points and doing bounding box queries using a GIST index either with a partial index or indexes depending on how many status ids: create index xyz on profiles using gist(mypoint) where status_id = 1; Or a multicolumn index using the btree_gist contrib module: create index xyz on profiles using gist (status_id,mypoint); I use the first format as I only have 2 statuses, valid and not valid and it works quite well. If PostGIS won't work for you you still can possibly use a partial index if it works for your status_ids. -Steve Wormley Senior Consultant inCode Wireless a Verisign Company
On Thu, Feb 01, 2007 at 10:42:30 -0800, Carl Lerche <carl.lerche@gmail.com> wrote: > > How can I index 2 dimensional data (latitude / longitude) with a > status_id column too (integer) so that I can perform the following > query as fast as possible: > > SELECT * FROM profiles WHERE status_id = 1 AND latitude BETWEEN > <y_1> AND <y_2> AND longitude BETWEEN <x_1> AND <x_2>; > > Obviously a btree index wouldn't work well and in the documentations > it said an rtree index works for 2 dimensional queries, but I would > like to filter first by status_id since that will probably eliminate > 50%+ of the rows in the table. There are currently over 600 000 rows > in the table so far, and it will be growing to well over a million. The earth distance contrib will use gist indexes based on the underlying cube data type (also in contrib). The the location is at all selective, then you probably don't need to worry about status as it isn't very selective. If your data is clustered on the disk mostly by location, it might not by you much at all since the data will like be read from disk anyway.