fast case-insensitive sort - Mailing list pgsql-sql

From Sigi Jekabsons
Subject fast case-insensitive sort
Date
Msg-id 002e01c302e8$db771ab0$7c01a8c0@microvac
Whole thread Raw
Responses Re: fast case-insensitive sort
List pgsql-sql
I'm having trouble getting postgres to use an index when doing an ORDER BY
UPPER(surname), for instance. I can create an index on UPPER(surname), but
it doesn't use it in the query - is there a better way of doing a fast case
insensitive sort?

This is the query plan of ORDER BY surname, which uses the index I've
created on surname:
asp_employ=# EXPLAIN ANALYSE SELECT * FROM cands ORDER BY surname;
      QUERY PLAN
 
----------------------------------------------------------------------------
-----------------------------------------------------------Index Scan using cands_surname_idx on cands
(cost=0.00..3983.55rows=31348
 
width=1971) (actual time=0.27..647.09 rows=31348 loops=1)Total runtime: 664.62 msec
(2 rows)


Note the time taken.
This is the query plan of sorting by UPPER(surname) before and after
creating the index:
asp_employ=# EXPLAIN ANALYSE SELECT * FROM cands ORDER BY UPPER(surname);
   QUERY PLAN
 
----------------------------------------------------------------------------
---------------------------------------Sort  (cost=39701.56..39779.93 rows=31348 width=1971) (actual
time=3384.08..3444.77 rows=31348 loops=1)  Sort Key: upper((surname)::text)  ->  Seq Scan on cands  (cost=0.00..1165.48
rows=31348width=1971) (actual
 
time=0.06..628.09 rows=31348 loops=1)Total runtime: 3875.90 msec
(4 rows)

asp_employ=# CREATE INDEX cand_sur_up_idx ON cands (UPPER(surname));
CREATE INDEX
asp_employ=# EXPLAIN ANALYSE SELECT * FROM cands ORDER BY UPPER(surname);
   QUERY PLAN
 
----------------------------------------------------------------------------
---------------------------------------Sort  (cost=39701.56..39779.93 rows=31348 width=1971) (actual
time=3368.43..3429.51 rows=31348 loops=1)  Sort Key: upper((surname)::text)  ->  Seq Scan on cands  (cost=0.00..1165.48
rows=31348width=1971) (actual
 
time=0.07..621.78 rows=31348 loops=1)Total runtime: 3859.33 msec
(4 rows)

The Seq Scan is much slower.  It may be interesting to note that this uses
the index:
asp_employ=# explain analyse select * from cands where
upper(surname)='SMITH';                                                          QUERY PLAN
----------------------------------------------------------------------------
----------------------------------------------------Index Scan using cand_sur_up_idx on cands  (cost=0.00..578.66
rows=157
width=1972) (actual time=54.52..60.06 rows=224 loops=1)  Index Cond: (upper((surname)::text) = 'SMITH'::text)Total
runtime:60.36 msec
 
(3 rows)

I created a thread on this topic on the Ars Technica forums, but there was
not a solution - the discussion for reference only is here:
http://arstechnica.infopop.net/OpenTopic/page?a=tpc&s=50009562&f=6330927813&
m=1070953065

So should I be using a specific type of index, or is there a better way?



pgsql-sql by date:

Previous
From: Tom Lane
Date:
Subject: Re: help with query!!!
Next
From: Stephan Szabo
Date:
Subject: Re: fast case-insensitive sort