Thread: fast case-insensitive sort

fast case-insensitive sort

From
"Sigi Jekabsons"
Date:
I'm having trouble getting postgres to use an index when doing an ORDER BY
UPPER(surname), for instance. I can create an index on UPPER(surname), but
it doesn't use it in the query - is there a better way of doing a fast case
insensitive sort?

This is the query plan of ORDER BY surname, which uses the index I've
created on surname:
asp_employ=# EXPLAIN ANALYSE SELECT * FROM cands ORDER BY surname;
      QUERY PLAN
 
----------------------------------------------------------------------------
-----------------------------------------------------------Index Scan using cands_surname_idx on cands
(cost=0.00..3983.55rows=31348
 
width=1971) (actual time=0.27..647.09 rows=31348 loops=1)Total runtime: 664.62 msec
(2 rows)


Note the time taken.
This is the query plan of sorting by UPPER(surname) before and after
creating the index:
asp_employ=# EXPLAIN ANALYSE SELECT * FROM cands ORDER BY UPPER(surname);
   QUERY PLAN
 
----------------------------------------------------------------------------
---------------------------------------Sort  (cost=39701.56..39779.93 rows=31348 width=1971) (actual
time=3384.08..3444.77 rows=31348 loops=1)  Sort Key: upper((surname)::text)  ->  Seq Scan on cands  (cost=0.00..1165.48
rows=31348width=1971) (actual
 
time=0.06..628.09 rows=31348 loops=1)Total runtime: 3875.90 msec
(4 rows)

asp_employ=# CREATE INDEX cand_sur_up_idx ON cands (UPPER(surname));
CREATE INDEX
asp_employ=# EXPLAIN ANALYSE SELECT * FROM cands ORDER BY UPPER(surname);
   QUERY PLAN
 
----------------------------------------------------------------------------
---------------------------------------Sort  (cost=39701.56..39779.93 rows=31348 width=1971) (actual
time=3368.43..3429.51 rows=31348 loops=1)  Sort Key: upper((surname)::text)  ->  Seq Scan on cands  (cost=0.00..1165.48
rows=31348width=1971) (actual
 
time=0.07..621.78 rows=31348 loops=1)Total runtime: 3859.33 msec
(4 rows)

The Seq Scan is much slower.  It may be interesting to note that this uses
the index:
asp_employ=# explain analyse select * from cands where
upper(surname)='SMITH';                                                          QUERY PLAN
----------------------------------------------------------------------------
----------------------------------------------------Index Scan using cand_sur_up_idx on cands  (cost=0.00..578.66
rows=157
width=1972) (actual time=54.52..60.06 rows=224 loops=1)  Index Cond: (upper((surname)::text) = 'SMITH'::text)Total
runtime:60.36 msec
 
(3 rows)

I created a thread on this topic on the Ars Technica forums, but there was
not a solution - the discussion for reference only is here:
http://arstechnica.infopop.net/OpenTopic/page?a=tpc&s=50009562&f=6330927813&
m=1070953065

So should I be using a specific type of index, or is there a better way?



Re: fast case-insensitive sort

From
Stephan Szabo
Date:
On Tue, 15 Apr 2003, Sigi Jekabsons wrote:

> I'm having trouble getting postgres to use an index when doing an ORDER BY
> UPPER(surname), for instance. I can create an index on UPPER(surname), but
> it doesn't use it in the query - is there a better way of doing a fast case
> insensitive sort?

What's the table look like (what type is surname)?  I can get behavior
like this using a varchar, but it uses the index when the field is
declared as text.



Re: fast case-insensitive sort

From
"Sigi Jekabsons"
Date:
The table 'cands' has a 'cand_id' int4 primary key, indexed and specified
unique.  'surname' is varchar(255), indexed in the manner shown on my
previous post.  You're quite right, though, I just tried indexing and
sorting by a column of type text and that worked just fine, it used the
index in the order by.  Why is that?

> What's the table look like (what type is surname)?  I can get behavior
> like this using a varchar, but it uses the index when the field is
> declared as text.
>



Re: fast case-insensitive sort

From
Tom Lane
Date:
"Sigi Jekabsons" <sigi.j@workskillsprofessionals.com.au> writes:
> 'surname' is varchar(255), indexed in the manner shown on my
> previous post.  You're quite right, though, I just tried indexing and
> sorting by a column of type text and that worked just fine, it used the
> index in the order by.  Why is that?

I would have said that the presence of the implicit varchar-to-text cast
prevented the system from detecting that the expression matches the
index value.  But if it's able to detect the match in one case and not
the other, that might be a garden-variety bug.  Will look at it ...
        regards, tom lane



Re: fast case-insensitive sort

From
"Sigi Jekabsons"
Date:
Thanks for that.  Is it recommended, then, that I use type of text on
columns that I intend to sort with an index on UPPER(column), or is there
something else I should be doing?

> I would have said that the presence of the implicit varchar-to-text cast
> prevented the system from detecting that the expression matches the
> index value.  But if it's able to detect the match in one case and not
> the other, that might be a garden-variety bug.  Will look at it ...
>
> regards, tom lane
>
>



Re: fast case-insensitive sort

From
Tom Lane
Date:
"Sigi Jekabsons" <sigi.j@workskillsprofessionals.com.au> writes:
> Thanks for that.  Is it recommended, then, that I use type of text on
> columns that I intend to sort with an index on UPPER(column),

That's the best immediate workaround, for sure.
        regards, tom lane