Re: PATCH: CITEXT 2.0 v2 - Mailing list pgsql-hackers

From David E. Wheeler
Subject Re: PATCH: CITEXT 2.0 v2
Date
Msg-id 551B62DD-77F5-4CA3-9C6E-14E38A64EF26@kineticode.com
Whole thread Raw
In response to Re: PATCH: CITEXT 2.0 v2  (Andrew Dunstan <andrew@dunslane.net>)
Responses Re: PATCH: CITEXT 2.0 v2  ("David E. Wheeler" <david@kineticode.com>)
Re: PATCH: CITEXT 2.0 v2  ("David E. Wheeler" <david@kineticode.com>)
List pgsql-hackers
On Jul 7, 2008, at 08:01, Andrew Dunstan wrote:

> What does still bother me is its performance. I'd like to know if  
> any measurement has been done of using citext vs. a functional index  
> on lower(foo).

Okay, here's a start. The attached script inserts random strings of  
1-10 space-delimited words into text and citext columns, and then  
compares the performance of queries with and without indexes. The  
output for me is as follows:

Loading words from dictionary.
Inserting into the table.

Test =.
SELECT * FROM try WHERE LOWER(text) = LOWER('food');
Time: 254.254 ms
SELECT * FROM try WHERE citext = 'food';
Time: 288.535 ms

Test LIKE and ILIKE
SELECT * FROM try WHERE LOWER(text) LIKE LOWER('C%');
Time: 209.385 ms
SELECT * FROM try WHERE citext ILIKE 'C%';
Time: 236.186 ms
SELECT * FROM try WHERE citext LIKE 'C%';
Time: 235.818 ms

Adding indexes...

Test =.
SELECT * FROM try WHERE LOWER(text) = LOWER('food');
Time: 1.260 ms
SELECT * FROM try WHERE citext = 'food';
Time: 277.755 ms

Test LIKE and ILIKE
SELECT * FROM try WHERE LOWER(text) LIKE LOWER('C%');
Time: 209.073 ms
SELECT * FROM try WHERE citext ILIKE 'C%';
Time: 238.430 ms
SELECT * FROM try WHERE citext LIKE 'C%';
Time: 238.685 ms
benedict%

So for some reason, after adding the indexes, the queries against the  
CITEXT column aren't using them. Furthermore, the `lower(text) LIKE  
lower(?)` query isn't using *its* index. Huh?

So this leaves me with two questions:

1. For what reason would the query against the citext column *not* use  
the index?

2. Is there some way to get the CITEXT index to behave like a LOWER()  
index, that is, so that its value is stored using the result of the  
str_tolower() function, thus removing some of the overhead of  
converting the values for each row fetched from the index? (Does this  
question make any sense?)

Thanks,

David


pgsql-hackers by date:

Previous
From: "David E. Wheeler"
Date:
Subject: Re: \SET QUIET and \timing
Next
From: "David E. Wheeler"
Date:
Subject: Re: PATCH: CITEXT 2.0 v2