Thread: index on lower(column) is very slow

index on lower(column) is very slow

From
valerian
Date:
I have the following table, with 4000 rows of random data:

              Table "public.test"
  Column  |          Type          | Modifiers
----------+------------------------+-----------
 id       | integer                | not null
 email    | character varying(255) |
 password | character varying(255) |
Indexes: test_pkey primary key btree (id),
         test_email_idx btree (email),
         test_email_lc_idx btree (lower(email))

In my application, I often have to find the id and password
corresponding to a particular email address (in effect, a user's login
name is their email address).  I want to preserve the case of the email
address the user submitted to the system, so converting them all to
lowercase before storing them in the DB is not an option.  Therefore I
decided to build an index on LOWER(email), and although pgsql is using this
index in my queries, it seems very slow!  Please look at the following
queries (I ran 'VACUUM FULL ANALYZE' immediately beforehand):

test=> EXPLAIN SELECT id, password FROM test WHERE LOWER(email) = 'pwcm6@pgaxd6hhuteforp966cz';
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Index Scan using test_email_lc_idx on test  (cost=0.00..64.98 rows=20 width=16)
   Index Cond: (lower((email)::text) = 'pwcm6@pgaxd6hhuteforp966cz'::text)
(2 rows)

Now I drop the index, to see how much benefit it's really providing:

test=> DROP INDEX test_email_lc_idx;
DROP INDEX
test=> EXPLAIN SELECT id, password FROM test WHERE LOWER(email) = 'pwcm6@pgaxd6hhuteforp966cz';
                              QUERY PLAN
-----------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..97.81 rows=20 width=16)
   Filter: (lower((email)::text) = 'pwcm6@pgaxd6hhuteforp966cz'::text)
(2 rows)

Now I do a query using test_email_idx instead of test_email_lc_idx:

test=> EXPLAIN SELECT id, password FROM test WHERE email = 'pwcm6@pgaxd6hhuteforp966cz';
                                 QUERY PLAN
----------------------------------------------------------------------------
 Index Scan using test_email_idx on test  (cost=0.00..5.96 rows=1 width=16)
   Index Cond: (email = 'pwcm6@pgaxd6hhuteforp966cz'::character varying)
(2 rows)

So it appears that althouth the DB is using the test_email_lc_idx index
in the first query, and although it's faster than a straight sequencial
scan, it's still extremely slow compared to the third query.

Is there a way to speed up my first query, or am I better of having two
columns, one containing the email address exactly as submitted by the
user, and the other containing its lowercase equivalent (which I would
then do all my queries on)?

BTW, this is PostgreSQL 7.3.1 running on Debian/Linux 3.0 (i386).


Re: index on lower(column) is very slow

From
valerian
Date:
I just realized that I was behind by one release, so I upgraded to pgsql
7.3.2, then added some more random data (18000 rows in the table now)
and ran 'VACUUM FULL ANALYZE' again.  Here's the subsequent results:

test=> EXPLAIN SELECT id, password FROM test WHERE LOWER(email) = 'pwcm6@pgaxd6hhuteforp966cz';
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Index Scan using test_email_lc_idx on test  (cost=0.00..292.25 rows=91 width=16)
   Index Cond: (lower((email)::text) = 'pwcm6@pgaxd6hhuteforp966cz'::text)
(2 rows)

test=> EXPLAIN SELECT id, password FROM test WHERE email = 'pwcm6@pgaxd6hhuteforp966cz';
                                 QUERY PLAN
----------------------------------------------------------------------------
 Index Scan using test_email_idx on test  (cost=0.00..6.00 rows=1 width=16)
   Index Cond: (email = 'pwcm6@pgaxd6hhuteforp966cz'::character varying)
(2 rows)

test=> drop INDEX test_email_lc_idx ;
DROP INDEX
test=> EXPLAIN SELECT id, password FROM test WHERE LOWER(email) = 'pwcm6@pgaxd6hhuteforp966cz';
                              QUERY PLAN
-----------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..449.71 rows=91 width=16)
   Filter: (lower((email)::text) = 'pwcm6@pgaxd6hhuteforp966cz'::text)
(2 rows)

But the index on lower(email) still seems abnormally slow.  Does anyone
have any insight on what may be causing this?


Re: index on lower(column) is very slow

From
Greg Stark
Date:
valerian <valerian2@hotpop.com> writes:

> I just realized that I was behind by one release, so I upgraded to pgsql
> 7.3.2, then added some more random data (18000 rows in the table now)
> and ran 'VACUUM FULL ANALYZE' again.  Here's the subsequent results:
>
> test=> EXPLAIN SELECT id, password FROM test WHERE LOWER(email) = 'pwcm6@pgaxd6hhuteforp966cz';
>                                     QUERY PLAN
> ----------------------------------------------------------------------------------
>  Index Scan using test_email_lc_idx on test  (cost=0.00..292.25 rows=91 width=16)
>    Index Cond: (lower((email)::text) = 'pwcm6@pgaxd6hhuteforp966cz'::text)
> (2 rows)

Try "explain analyze" which will actually run the query and print timing
information.

Also, note that the number of records returned is probably a big factor here.
The case-sensitive version is only returning 1 record whereas postgres expects
the case=insensitive version to return 91 records. Try the case-sensitive
version on a value that has a comparable number of records to for a better
test.

--
greg

Re: index on lower(column) is very slow

From
valerian
Date:
On Fri, Mar 07, 2003 at 09:08:34AM -0500, Greg Stark wrote:
> Try "explain analyze" which will actually run the query and print timing
> information.
>
> Also, note that the number of records returned is probably a big factor here.
> The case-sensitive version is only returning 1 record whereas postgres expects
> the case=insensitive version to return 91 records. Try the case-sensitive
> version on a value that has a comparable number of records to for a better
> test.

I inserted a row with the unique value 'asdf@asdf.com', and here are the
results (after running 'VACUUM ANALYZE'):

test=> EXPLAIN ANALYZE SELECT id, password FROM test WHERE email = 'asdf@asdf.com';
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Index Scan using test_email_idx on test  (cost=0.00..6.00 rows=1 width=16) (actual time=0.48..0.49 rows=1 loops=1)
   Index Cond: (email = 'asdf@asdf.com'::character varying)
 Total runtime: 0.65 msec
(3 rows)

test=> EXPLAIN ANALYZE SELECT id, password FROM test WHERE lower(email) = 'asdf@asdf.com';
                                                        QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------
 Index Scan using test_email_lc_idx on test  (cost=0.00..292.28 rows=91 width=16) (actual time=0.47..0.47 rows=1
loops=1)
   Index Cond: (lower((email)::text) = 'asdf@asdf.com'::text)
 Total runtime: 0.63 msec
(3 rows)

test=> DROP INDEX test_email_lc_idx;
DROP INDEX
test=> EXPLAIN ANALYZE SELECT id, password FROM test WHERE lower(email) = 'asdf@asdf.com';
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..449.74 rows=91 width=16) (actual time=109.09..109.39 rows=1 loops=1)
   Filter: (lower((email)::text) = 'asdf@asdf.com'::text)
 Total runtime: 109.60 msec
(3 rows)

I'm not sure why the planner thinks there are 91 rows?  But now I can
see that the index is working, and that's all that matters in the end.
Thanks for the tip!