Re: PATCH: CITEXT 2.0 v2 - Mailing list pgsql-hackers

From David E. Wheeler
Subject Re: PATCH: CITEXT 2.0 v2
Date
Msg-id 910F6761-133F-4F67-9CA2-1E8771E638EF@kineticode.com
Whole thread Raw
In response to Re: PATCH: CITEXT 2.0 v2  ("David E. Wheeler" <david@kineticode.com>)
List pgsql-hackers
Thanks to help from RhodiumToad on IRC, I got some things improved here:

On Jul 7, 2008, at 16:24, David E. Wheeler wrote:

> So for some reason, after adding the indexes, the queries against  
> the CITEXT column aren't using them. Furthermore, the `lower(text)  
> LIKE lower(?)` query isn't using *its* index. Huh?

I never knew what one needed to use the text_pattern_ops operator  
class to index a column for use with LIKE! I had no clue. Would that  
work for a citext column, too, since it's essentially the same as TEXT?

> So this leaves me with two questions:
>
> 1. For what reason would the query against the citext column *not*  
> use the index?

It turns out that it did use the index if I put `SET enable_seqscan =  
false;` into my script. So with RhodiumToad's direction, I added some  
`RESTRICT` and `JOIN` clauses to my comparison operators (copying them  
from ip4r). So now I have:

CREATE OPERATOR = (    LEFTARG    = CITEXT,    RIGHTARG   = CITEXT,    COMMUTATOR = =,    NEGATOR    = <>,    PROCEDURE
= citext_eq,RESTRICT   = eqsel,JOIN       = eqjoinsel,    HASHES,    MERGES
 
);

CREATE OPERATOR <> (    LEFTARG    = CITEXT,    RIGHTARG   = CITEXT,    NEGATOR    = =,    COMMUTATOR = <>,
PROCEDURE = citext_ne,RESTRICT   = neqsel,JOIN       = neqjoinsel
 
);

CREATE OPERATOR < (    LEFTARG    = CITEXT,    RIGHTARG   = CITEXT,    NEGATOR    = >=,    COMMUTATOR = >,    PROCEDURE
= citext_lt,RESTRICT   = scalarltsel,JOIN       = scalarltjoinsel
 
);

CREATE OPERATOR <= (    LEFTARG    = CITEXT,    RIGHTARG   = CITEXT,    NEGATOR    = >,    COMMUTATOR = <=,
PROCEDURE = citext_le,RESTRICT   = scalarltsel,JOIN       = scalarltjoinsel
 
);

CREATE OPERATOR >= (    LEFTARG    = CITEXT,    RIGHTARG   = CITEXT,    NEGATOR    = <,    COMMUTATOR = <=,
PROCEDURE = citext_ge,RESTRICT   = scalargtsel,JOIN       = scalargtjoinsel
 
);

CREATE OPERATOR > (    LEFTARG    = CITEXT,    RIGHTARG   = CITEXT,    NEGATOR    = <=,    COMMUTATOR = <,    PROCEDURE
= citext_gt,RESTRICT   = scalargtsel,JOIN       = scalargtjoinsel
 
);

With this change, the index was used:

Loading words from dictionary.
Inserting into the table.

Test =.
SELECT * FROM try WHERE LOWER(text) = LOWER('food');
Time: 261.295 ms
SELECT * FROM try WHERE citext = 'food';
Time: 289.304 ms
Time: 1228.961 ms

Adding indexes...

Test =.
SELECT * FROM try WHERE LOWER(text) = LOWER('food');
Time: 2.018 ms
SELECT * FROM try WHERE citext = 'food';
Time: 0.788 ms

Seems to be faster than the LOWER() version, too, which makes me  
happy. The output from EXPLAIN ANALYZE:

try=# EXPLAIN ANALYZE SELECT * FROM try WHERE citext = 'food';
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
IndexScan using idx_try_citext on try  (cost=0.00..8.31 rows=1  
 
width=119) (actual time=0.324..0.324 rows=0 loops=1)   Index Cond: (citext = 'food'::citext) Total runtime: 0.377 ms
(3 rows)

try=# EXPLAIN ANALYZE SELECT * FROM try WHERE LOWER(text) =  
LOWER('food');
                                                       QUERY PLAN

------------------------------------------------------------------------------------------------------------------------
BitmapHeap Scan on try  (cost=28.17..1336.10 rows=500 width=119)  
 
(actual time=0.170..0.170 rows=0 loops=1)   Recheck Cond: (lower(text) = 'food'::text)   ->  Bitmap Index Scan on
idx_try_text (cost=0.00..28.04 rows=500  
 
width=0) (actual time=0.168..0.168 rows=0 loops=1)         Index Cond: (lower(text) = 'food'::text) Total runtime:
0.211ms
 
(5 rows)

So my only other question related to this is:

* Are the above RESTRICT and JOIN functions the ones to use, or is  
there some way to make use of those used by the TEXT type that would  
be more appropriate?

> 2. Is there some way to get the CITEXT index to behave like a  
> LOWER() index, that is, so that its value is stored using the result  
> of the str_tolower() function, thus removing some of the overhead of  
> converting the values for each row fetched from the index? (Does  
> this question make any sense?)

Given the performance with an index, I think that this is moot, yes?  
There is, of course, much more overhead for a table scan.

Best,

David



pgsql-hackers by date:

Previous
From: "David E. Wheeler"
Date:
Subject: Re: PATCH: CITEXT 2.0
Next
From: imad
Date:
Subject: Re: Data type OID numbers fixed?