index on lower(column) is very slow - Mailing list pgsql-general

From valerian
Subject index on lower(column) is very slow
Date
Msg-id 20030306204939.GA13707@hotpop.com
Whole thread Raw
Responses Re: index on lower(column) is very slow
List pgsql-general
I have the following table, with 4000 rows of random data:

              Table "public.test"
  Column  |          Type          | Modifiers
----------+------------------------+-----------
 id       | integer                | not null
 email    | character varying(255) |
 password | character varying(255) |
Indexes: test_pkey primary key btree (id),
         test_email_idx btree (email),
         test_email_lc_idx btree (lower(email))

In my application, I often have to find the id and password
corresponding to a particular email address (in effect, a user's login
name is their email address).  I want to preserve the case of the email
address the user submitted to the system, so converting them all to
lowercase before storing them in the DB is not an option.  Therefore I
decided to build an index on LOWER(email), and although pgsql is using this
index in my queries, it seems very slow!  Please look at the following
queries (I ran 'VACUUM FULL ANALYZE' immediately beforehand):

test=> EXPLAIN SELECT id, password FROM test WHERE LOWER(email) = 'pwcm6@pgaxd6hhuteforp966cz';
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Index Scan using test_email_lc_idx on test  (cost=0.00..64.98 rows=20 width=16)
   Index Cond: (lower((email)::text) = 'pwcm6@pgaxd6hhuteforp966cz'::text)
(2 rows)

Now I drop the index, to see how much benefit it's really providing:

test=> DROP INDEX test_email_lc_idx;
DROP INDEX
test=> EXPLAIN SELECT id, password FROM test WHERE LOWER(email) = 'pwcm6@pgaxd6hhuteforp966cz';
                              QUERY PLAN
-----------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..97.81 rows=20 width=16)
   Filter: (lower((email)::text) = 'pwcm6@pgaxd6hhuteforp966cz'::text)
(2 rows)

Now I do a query using test_email_idx instead of test_email_lc_idx:

test=> EXPLAIN SELECT id, password FROM test WHERE email = 'pwcm6@pgaxd6hhuteforp966cz';
                                 QUERY PLAN
----------------------------------------------------------------------------
 Index Scan using test_email_idx on test  (cost=0.00..5.96 rows=1 width=16)
   Index Cond: (email = 'pwcm6@pgaxd6hhuteforp966cz'::character varying)
(2 rows)

So it appears that althouth the DB is using the test_email_lc_idx index
in the first query, and although it's faster than a straight sequencial
scan, it's still extremely slow compared to the third query.

Is there a way to speed up my first query, or am I better of having two
columns, one containing the email address exactly as submitted by the
user, and the other containing its lowercase equivalent (which I would
then do all my queries on)?

BTW, this is PostgreSQL 7.3.1 running on Debian/Linux 3.0 (i386).


pgsql-general by date:

Previous
From: Dennis Gearon
Date:
Subject: Re: triggers
Next
From: Tom Lane
Date:
Subject: Re: first crash