index on lower(column) is very slow - Mailing list pgsql-general
From | valerian |
---|---|
Subject | index on lower(column) is very slow |
Date | |
Msg-id | 20030306204939.GA13707@hotpop.com Whole thread Raw |
Responses |
Re: index on lower(column) is very slow
|
List | pgsql-general |
I have the following table, with 4000 rows of random data: Table "public.test" Column | Type | Modifiers ----------+------------------------+----------- id | integer | not null email | character varying(255) | password | character varying(255) | Indexes: test_pkey primary key btree (id), test_email_idx btree (email), test_email_lc_idx btree (lower(email)) In my application, I often have to find the id and password corresponding to a particular email address (in effect, a user's login name is their email address). I want to preserve the case of the email address the user submitted to the system, so converting them all to lowercase before storing them in the DB is not an option. Therefore I decided to build an index on LOWER(email), and although pgsql is using this index in my queries, it seems very slow! Please look at the following queries (I ran 'VACUUM FULL ANALYZE' immediately beforehand): test=> EXPLAIN SELECT id, password FROM test WHERE LOWER(email) = 'pwcm6@pgaxd6hhuteforp966cz'; QUERY PLAN --------------------------------------------------------------------------------- Index Scan using test_email_lc_idx on test (cost=0.00..64.98 rows=20 width=16) Index Cond: (lower((email)::text) = 'pwcm6@pgaxd6hhuteforp966cz'::text) (2 rows) Now I drop the index, to see how much benefit it's really providing: test=> DROP INDEX test_email_lc_idx; DROP INDEX test=> EXPLAIN SELECT id, password FROM test WHERE LOWER(email) = 'pwcm6@pgaxd6hhuteforp966cz'; QUERY PLAN ----------------------------------------------------------------------- Seq Scan on test (cost=0.00..97.81 rows=20 width=16) Filter: (lower((email)::text) = 'pwcm6@pgaxd6hhuteforp966cz'::text) (2 rows) Now I do a query using test_email_idx instead of test_email_lc_idx: test=> EXPLAIN SELECT id, password FROM test WHERE email = 'pwcm6@pgaxd6hhuteforp966cz'; QUERY PLAN ---------------------------------------------------------------------------- Index Scan using test_email_idx on test (cost=0.00..5.96 rows=1 width=16) Index Cond: (email = 'pwcm6@pgaxd6hhuteforp966cz'::character varying) (2 rows) So it appears that althouth the DB is using the test_email_lc_idx index in the first query, and although it's faster than a straight sequencial scan, it's still extremely slow compared to the third query. Is there a way to speed up my first query, or am I better of having two columns, one containing the email address exactly as submitted by the user, and the other containing its lowercase equivalent (which I would then do all my queries on)? BTW, this is PostgreSQL 7.3.1 running on Debian/Linux 3.0 (i386).
pgsql-general by date: