Thread: fast case-insensitive sort
I'm having trouble getting postgres to use an index when doing an ORDER BY UPPER(surname), for instance. I can create an index on UPPER(surname), but it doesn't use it in the query - is there a better way of doing a fast case insensitive sort? This is the query plan of ORDER BY surname, which uses the index I've created on surname: asp_employ=# EXPLAIN ANALYSE SELECT * FROM cands ORDER BY surname; QUERY PLAN ---------------------------------------------------------------------------- -----------------------------------------------------------Index Scan using cands_surname_idx on cands (cost=0.00..3983.55rows=31348 width=1971) (actual time=0.27..647.09 rows=31348 loops=1)Total runtime: 664.62 msec (2 rows) Note the time taken. This is the query plan of sorting by UPPER(surname) before and after creating the index: asp_employ=# EXPLAIN ANALYSE SELECT * FROM cands ORDER BY UPPER(surname); QUERY PLAN ---------------------------------------------------------------------------- ---------------------------------------Sort (cost=39701.56..39779.93 rows=31348 width=1971) (actual time=3384.08..3444.77 rows=31348 loops=1) Sort Key: upper((surname)::text) -> Seq Scan on cands (cost=0.00..1165.48 rows=31348width=1971) (actual time=0.06..628.09 rows=31348 loops=1)Total runtime: 3875.90 msec (4 rows) asp_employ=# CREATE INDEX cand_sur_up_idx ON cands (UPPER(surname)); CREATE INDEX asp_employ=# EXPLAIN ANALYSE SELECT * FROM cands ORDER BY UPPER(surname); QUERY PLAN ---------------------------------------------------------------------------- ---------------------------------------Sort (cost=39701.56..39779.93 rows=31348 width=1971) (actual time=3368.43..3429.51 rows=31348 loops=1) Sort Key: upper((surname)::text) -> Seq Scan on cands (cost=0.00..1165.48 rows=31348width=1971) (actual time=0.07..621.78 rows=31348 loops=1)Total runtime: 3859.33 msec (4 rows) The Seq Scan is much slower. It may be interesting to note that this uses the index: asp_employ=# explain analyse select * from cands where upper(surname)='SMITH'; QUERY PLAN ---------------------------------------------------------------------------- ----------------------------------------------------Index Scan using cand_sur_up_idx on cands (cost=0.00..578.66 rows=157 width=1972) (actual time=54.52..60.06 rows=224 loops=1) Index Cond: (upper((surname)::text) = 'SMITH'::text)Total runtime:60.36 msec (3 rows) I created a thread on this topic on the Ars Technica forums, but there was not a solution - the discussion for reference only is here: http://arstechnica.infopop.net/OpenTopic/page?a=tpc&s=50009562&f=6330927813& m=1070953065 So should I be using a specific type of index, or is there a better way?
On Tue, 15 Apr 2003, Sigi Jekabsons wrote: > I'm having trouble getting postgres to use an index when doing an ORDER BY > UPPER(surname), for instance. I can create an index on UPPER(surname), but > it doesn't use it in the query - is there a better way of doing a fast case > insensitive sort? What's the table look like (what type is surname)? I can get behavior like this using a varchar, but it uses the index when the field is declared as text.
The table 'cands' has a 'cand_id' int4 primary key, indexed and specified unique. 'surname' is varchar(255), indexed in the manner shown on my previous post. You're quite right, though, I just tried indexing and sorting by a column of type text and that worked just fine, it used the index in the order by. Why is that? > What's the table look like (what type is surname)? I can get behavior > like this using a varchar, but it uses the index when the field is > declared as text. >
"Sigi Jekabsons" <sigi.j@workskillsprofessionals.com.au> writes: > 'surname' is varchar(255), indexed in the manner shown on my > previous post. You're quite right, though, I just tried indexing and > sorting by a column of type text and that worked just fine, it used the > index in the order by. Why is that? I would have said that the presence of the implicit varchar-to-text cast prevented the system from detecting that the expression matches the index value. But if it's able to detect the match in one case and not the other, that might be a garden-variety bug. Will look at it ... regards, tom lane
Thanks for that. Is it recommended, then, that I use type of text on columns that I intend to sort with an index on UPPER(column), or is there something else I should be doing? > I would have said that the presence of the implicit varchar-to-text cast > prevented the system from detecting that the expression matches the > index value. But if it's able to detect the match in one case and not > the other, that might be a garden-variety bug. Will look at it ... > > regards, tom lane > >
"Sigi Jekabsons" <sigi.j@workskillsprofessionals.com.au> writes: > Thanks for that. Is it recommended, then, that I use type of text on > columns that I intend to sort with an index on UPPER(column), That's the best immediate workaround, for sure. regards, tom lane