Thread: index on lower(column) is very slow
I have the following table, with 4000 rows of random data: Table "public.test" Column | Type | Modifiers ----------+------------------------+----------- id | integer | not null email | character varying(255) | password | character varying(255) | Indexes: test_pkey primary key btree (id), test_email_idx btree (email), test_email_lc_idx btree (lower(email)) In my application, I often have to find the id and password corresponding to a particular email address (in effect, a user's login name is their email address). I want to preserve the case of the email address the user submitted to the system, so converting them all to lowercase before storing them in the DB is not an option. Therefore I decided to build an index on LOWER(email), and although pgsql is using this index in my queries, it seems very slow! Please look at the following queries (I ran 'VACUUM FULL ANALYZE' immediately beforehand): test=> EXPLAIN SELECT id, password FROM test WHERE LOWER(email) = 'pwcm6@pgaxd6hhuteforp966cz'; QUERY PLAN --------------------------------------------------------------------------------- Index Scan using test_email_lc_idx on test (cost=0.00..64.98 rows=20 width=16) Index Cond: (lower((email)::text) = 'pwcm6@pgaxd6hhuteforp966cz'::text) (2 rows) Now I drop the index, to see how much benefit it's really providing: test=> DROP INDEX test_email_lc_idx; DROP INDEX test=> EXPLAIN SELECT id, password FROM test WHERE LOWER(email) = 'pwcm6@pgaxd6hhuteforp966cz'; QUERY PLAN ----------------------------------------------------------------------- Seq Scan on test (cost=0.00..97.81 rows=20 width=16) Filter: (lower((email)::text) = 'pwcm6@pgaxd6hhuteforp966cz'::text) (2 rows) Now I do a query using test_email_idx instead of test_email_lc_idx: test=> EXPLAIN SELECT id, password FROM test WHERE email = 'pwcm6@pgaxd6hhuteforp966cz'; QUERY PLAN ---------------------------------------------------------------------------- Index Scan using test_email_idx on test (cost=0.00..5.96 rows=1 width=16) Index Cond: (email = 'pwcm6@pgaxd6hhuteforp966cz'::character varying) (2 rows) So it appears that althouth the DB is using the test_email_lc_idx index in the first query, and although it's faster than a straight sequencial scan, it's still extremely slow compared to the third query. Is there a way to speed up my first query, or am I better of having two columns, one containing the email address exactly as submitted by the user, and the other containing its lowercase equivalent (which I would then do all my queries on)? BTW, this is PostgreSQL 7.3.1 running on Debian/Linux 3.0 (i386).
I just realized that I was behind by one release, so I upgraded to pgsql 7.3.2, then added some more random data (18000 rows in the table now) and ran 'VACUUM FULL ANALYZE' again. Here's the subsequent results: test=> EXPLAIN SELECT id, password FROM test WHERE LOWER(email) = 'pwcm6@pgaxd6hhuteforp966cz'; QUERY PLAN ---------------------------------------------------------------------------------- Index Scan using test_email_lc_idx on test (cost=0.00..292.25 rows=91 width=16) Index Cond: (lower((email)::text) = 'pwcm6@pgaxd6hhuteforp966cz'::text) (2 rows) test=> EXPLAIN SELECT id, password FROM test WHERE email = 'pwcm6@pgaxd6hhuteforp966cz'; QUERY PLAN ---------------------------------------------------------------------------- Index Scan using test_email_idx on test (cost=0.00..6.00 rows=1 width=16) Index Cond: (email = 'pwcm6@pgaxd6hhuteforp966cz'::character varying) (2 rows) test=> drop INDEX test_email_lc_idx ; DROP INDEX test=> EXPLAIN SELECT id, password FROM test WHERE LOWER(email) = 'pwcm6@pgaxd6hhuteforp966cz'; QUERY PLAN ----------------------------------------------------------------------- Seq Scan on test (cost=0.00..449.71 rows=91 width=16) Filter: (lower((email)::text) = 'pwcm6@pgaxd6hhuteforp966cz'::text) (2 rows) But the index on lower(email) still seems abnormally slow. Does anyone have any insight on what may be causing this?
valerian <valerian2@hotpop.com> writes: > I just realized that I was behind by one release, so I upgraded to pgsql > 7.3.2, then added some more random data (18000 rows in the table now) > and ran 'VACUUM FULL ANALYZE' again. Here's the subsequent results: > > test=> EXPLAIN SELECT id, password FROM test WHERE LOWER(email) = 'pwcm6@pgaxd6hhuteforp966cz'; > QUERY PLAN > ---------------------------------------------------------------------------------- > Index Scan using test_email_lc_idx on test (cost=0.00..292.25 rows=91 width=16) > Index Cond: (lower((email)::text) = 'pwcm6@pgaxd6hhuteforp966cz'::text) > (2 rows) Try "explain analyze" which will actually run the query and print timing information. Also, note that the number of records returned is probably a big factor here. The case-sensitive version is only returning 1 record whereas postgres expects the case=insensitive version to return 91 records. Try the case-sensitive version on a value that has a comparable number of records to for a better test. -- greg
On Fri, Mar 07, 2003 at 09:08:34AM -0500, Greg Stark wrote: > Try "explain analyze" which will actually run the query and print timing > information. > > Also, note that the number of records returned is probably a big factor here. > The case-sensitive version is only returning 1 record whereas postgres expects > the case=insensitive version to return 91 records. Try the case-sensitive > version on a value that has a comparable number of records to for a better > test. I inserted a row with the unique value 'asdf@asdf.com', and here are the results (after running 'VACUUM ANALYZE'): test=> EXPLAIN ANALYZE SELECT id, password FROM test WHERE email = 'asdf@asdf.com'; QUERY PLAN -------------------------------------------------------------------------------------------------------------------- Index Scan using test_email_idx on test (cost=0.00..6.00 rows=1 width=16) (actual time=0.48..0.49 rows=1 loops=1) Index Cond: (email = 'asdf@asdf.com'::character varying) Total runtime: 0.65 msec (3 rows) test=> EXPLAIN ANALYZE SELECT id, password FROM test WHERE lower(email) = 'asdf@asdf.com'; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------- Index Scan using test_email_lc_idx on test (cost=0.00..292.28 rows=91 width=16) (actual time=0.47..0.47 rows=1 loops=1) Index Cond: (lower((email)::text) = 'asdf@asdf.com'::text) Total runtime: 0.63 msec (3 rows) test=> DROP INDEX test_email_lc_idx; DROP INDEX test=> EXPLAIN ANALYZE SELECT id, password FROM test WHERE lower(email) = 'asdf@asdf.com'; QUERY PLAN ---------------------------------------------------------------------------------------------------- Seq Scan on test (cost=0.00..449.74 rows=91 width=16) (actual time=109.09..109.39 rows=1 loops=1) Filter: (lower((email)::text) = 'asdf@asdf.com'::text) Total runtime: 109.60 msec (3 rows) I'm not sure why the planner thinks there are 91 rows? But now I can see that the index is working, and that's all that matters in the end. Thanks for the tip!