Hi All,
I'm attempting to implement an auto-complete text field on a site I
run. The plan is to return a maximum of 5 rows where the name of the
location *begins* with the contents of the text field
(case-insensitive).
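In rough terms, the query I have in mind is something like the
following, with $1 standing in for whatever the user has typed so far
(this is just a sketch of the intended shape, not the final code):

    SELECT name
      FROM gis_regions
     WHERE lower(name) LIKE lower($1) || '%'
     LIMIT 5;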
The location table contains data for 2,182,293 locations. I'm matching
based on the "name" column. Before we dive in, this is Postgres 8.2.4.
Table description:
db=> \d gis_regions
                                Table "public.gis_regions"
     Column      |          Type          |                         Modifiers
-----------------+------------------------+-----------------------------------------------------------
 id              | integer                | not null default nextval('gis_regions_id_seq'::regclass)
 name            | character varying(200) | not null
 realname        | character varying(200) |
 parent_id       | integer                | not null
 lft             | integer                |
 rgt             | integer                |
 town_id         | integer                |
 fipscode        | character(2)           |
 anomolytown     | boolean                |
 property_count  | integer                | default 0
 link_count      | integer                | default 0
 hierachy_string | text                   |
 hash            | character varying(32)  |
 regen_map       | boolean                |
 the_geom        | geometry               |
Indexes:
    "idx_gisregions_id" PRIMARY KEY, btree (id)
    "idx_gis_regions_hash" UNIQUE, btree (hash)
    "gis_regions_idx_lftrgt" btree (lft, rgt)
    "gis_regions_idx_linkcount" btree (link_count)
    "gis_regions_idx_parentid" btree (parent_id, property_count)
    "gis_regions_idx_rgtlft" btree (rgt, lft)
    "gis_regions_idx_townid" btree (town_id)
    "idx_dmetaphone_name" btree (dmetaphone(name::text))
    "idx_gis_regions" gist (the_geom)
    "idx_name" btree (lower(name::text))
Check constraints:
    "enforce_dims_the_geom" CHECK (ndims(the_geom) = 2)
    "enforce_srid_the_geom" CHECK (srid(the_geom) = -1)
I added it with the aim of optimising the following query:
SELECT name from gis_regions where lower(name) LIKE 'teign%' LIMIT 5;
     name
---------------
 Teign Village
 Teigngrace
 Teigny
 Teignmouth
(4 rows)
Time: 45318.544 ms
45 seconds is much longer than I'd anticipated. No problem; on to the
explain plan:
db=> EXPLAIN SELECT name from gis_regions where lower(name) LIKE 'teign%' LIMIT 5;
                              QUERY PLAN
----------------------------------------------------------------------
 Limit  (cost=0.00..87606.40 rows=1 width=13)
   ->  Seq Scan on gis_regions  (cost=0.00..87606.40 rows=1 width=13)
         Filter: (lower((name)::text) ~~ 'teign%'::text)
(3 rows)
So we're seq scanning. I unsuccessfully attempted to tweak the
optimiser costs to see if I could get it to use the index, and then
fell back to disabling sequential scans altogether.
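The sort of thing I tried looked roughly like this (random_page_cost
is just an example of the kind of cost parameter I experimented with,
not an exhaustive list):

    SET random_page_cost = 2;    -- example cost tweak; made no difference to the plan
    SET enable_seqscan = false;  -- rule out sequential scans entirely

With enable_seqscan = false in place, re-running the EXPLAIN gives: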
db=> EXPLAIN SELECT name from gis_regions where lower(name) LIKE 'teign%' LIMIT 5;
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Limit  (cost=100000000.00..100087606.39 rows=1 width=13)
   ->  Seq Scan on gis_regions  (cost=100000000.00..100087606.39 rows=1 width=13)
         Filter: (lower((name)::text) ~~ 'teign%'::text)
(3 rows)
And we're still running a sequential scan. My question is: why?
Both of the examples above were run with all the default optimiser
costs. The only conclusion I can come to is that it's something to do
with the datatype of "name" (VARCHAR(200)), but I haven't been able to
find an explanation with the usual googling.
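One sanity check I can think of, though I haven't run it yet, is an
equality lookup on the same expression; if that uses idx_name, then
the index itself is usable and the problem is specific to the LIKE
prefix match:

    EXPLAIN SELECT name FROM gis_regions WHERE lower(name) = 'teignmouth';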
As an aside, I'd like to upgrade to 8.3 to take advantage of the new
index ordering, but in the meantime I'd settle for making use of an
index :)
Any assistance gratefully received.
Kind Regards,
Neil.