Latest on CITEXT 2.0 - Mailing list pgsql-hackers

From David E. Wheeler
Subject Latest on CITEXT 2.0
Date
Msg-id 0E9C2D67-5F27-4747-B6F2-346BF8C2D81E@kineticode.com
Responses Re: Latest on CITEXT 2.0  (Martijn van Oosterhout <kleptog@svana.org>)
List pgsql-hackers
Howdy,

I just wanted to report the latest on my pet project: implementing a  
new case-insensitive text type, "citext", to be locale-aware and to  
build and run on PostgreSQL 8.3. I'm not much of a C programmer (this  
is only the second time I've written *anything* in C), so I also have  
a few questions about my code, best practices, coverage, etc. You can  
grab the latest here:
  https://svn.kineticode.com/citext/trunk/

BTW, the tests in sql/citext.sql use the pgtap.sql file to run TAP  
regression tests. So you can run them using `make installcheck` or  
`make test`. The latter requires that pg_prove be installed; you can  
get it here:
  https://svn.kineticode.com/pgtap/trunk/

Anyway, I think I've got it pretty close to done. The tests cover a  
lot of stuff -- nearly everything I could figure out, anyway. But  
there are a few gaps.

As a result, I'd appreciate a little help with these questions, all in  
the name of making this a solid data type suitable for use on  
production systems:

* There seem to still be some implicit CASTS to text that I'd like to  
duplicate. For example, `select '192.168.1.2'::cidr::text;` works, but  
`select '192.168.1.2'::cidr::citext;` does not. Where can I find the C  
functions that do these casts for TEXT so that I can put them to work  
for citext, too? The internal cast functions used in the old citext  
distribution don't exist at all on 8.3.

* There are casts from text that I'd also like to harness for use by  
citext, like `cidr(text)`. Where can I find these C functions as well?  
(The upshot of this and the previous points is that I'd like citext to  
be as compatible with TEXT as possible, and I just need to figure out  
how to fill in the gaps in that compatibility.)

* Regular expression and LIKE comparisons using the operators  
properly work case-insensitively, but functions like replace() and  
regexp_replace() do not. Should they? And if so, how can I make them  
do so?

* The tests assume that LC_COLLATE is set to en_US.UTF-8. Does that  
work well for standard PostgreSQL regression tests? How are  
locale-sensitive tests run in core regression tests?

* As for my C programming, well, what's broken? I'm especially  
concerned that I pfree variables appropriately, but I'm not at all  
clear on what needs to be freed. Martijn mentioned before that btree  
comparison functions free memory, but I'm such a C n00b that I don't  
know what that actually means for my implementation. I'd actually  
appreciate a bit of pedantry here. :-)

* Am I in fact getting an appropriate nul-terminated string in my  
cilower() function using this code?
    char * str = DatumGetCString(
        DirectFunctionCall1( textout, PointerGetDatum( arg ) )
    );
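
To make the NUL-termination question concrete, here is a minimal standalone sketch of the copy-and-terminate step followed by lowercasing. It is not PostgreSQL code: `counted_text`, `counted_to_cstring()` and `lower_in_place()` are hypothetical names invented for illustration, and it uses malloc/free where backend code would use palloc/pfree. It only shows why a length-counted value has to be copied into a buffer one byte longer before C string functions can safely walk it.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a length-counted text value.
 * This is NOT PostgreSQL's varlena layout; it only models the idea. */
typedef struct {
    size_t      len;    /* number of bytes in data */
    const char *data;   /* bytes, not NUL-terminated */
} counted_text;

/* Copy the bytes into a fresh buffer and append the terminating NUL.
 * The caller owns the result and must free() it. */
char *counted_to_cstring(const counted_text *t)
{
    char *s = malloc(t->len + 1);

    if (s == NULL)
        return NULL;
    memcpy(s, t->data, t->len);
    s[t->len] = '\0';
    return s;
}

/* Lowercase a NUL-terminated string in place, one byte at a time,
 * according to the current LC_CTYPE setting. */
void lower_in_place(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char) tolower((unsigned char) *s);
}
```

Note that byte-at-a-time tolower() is only correct for single-byte encodings; a multibyte encoding such as UTF-8 would need a wide-character pass instead.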

Those are all the questions I had about my implementation. I'd like to  
get this thing done and released soon, so that I can be done with this  
particular Yak and get back to what I'm *supposed* to be doing with my  
time.

BTW, would there be any interest in this code going into contrib/ in  
the distribution? I think that, if we can ensure that it works just  
like LOWER() = LOWER(), but without requiring that code, then it would  
be a great type to point people to instead of that SQL hack  
(with all the usual caveats about it being locale-sensitive and not  
canonically case-insensitive in the Unicode sense). If so, I'd be  
happy to make whatever changes are necessary to make it fit in with  
the coding and organization standards of the core and to submit it.
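
For anyone who wants the LOWER() = LOWER() idea spelled out, the following is a minimal standalone sketch of a "lowercase both sides, then compare with the locale's collation" comparison. It is an illustration under stated assumptions, not the citext implementation: `lower_copy()` and `ci_compare()` are hypothetical names, it handles single-byte encodings only, and it uses malloc/free rather than the backend's palloc/pfree. It also shows the temporary copies being released before the comparison function returns.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated lowercase copy of s (hypothetical helper).
 * Lowercasing follows the current LC_CTYPE; the caller must free() it. */
static char *lower_copy(const char *s)
{
    size_t n = strlen(s);
    char  *out = malloc(n + 1);
    size_t i;

    if (out == NULL)
        return NULL;
    for (i = 0; i < n; i++)
        out[i] = (char) tolower((unsigned char) s[i]);
    out[n] = '\0';
    return out;
}

/* Compare a and b case-insensitively: lowercase both, then let
 * strcoll() order them according to the current LC_COLLATE.
 * Returns <0, 0 or >0 like strcmp(). Aborts if memory runs out. */
int ci_compare(const char *a, const char *b)
{
    char *la = lower_copy(a);
    char *lb = lower_copy(b);
    int   result;

    if (la == NULL || lb == NULL) {
        free(la);
        free(lb);
        abort();
    }
    result = strcoll(la, lb);

    /* Release the temporary copies before returning. */
    free(la);
    free(lb);
    return result;
}
```

A program that wants locale-aware ordering would call setlocale(LC_ALL, "") first; without that call the process runs in the "C" locale and strcoll() orders strings the same way strcmp() does.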

But please, don't expect a civarchar type from me anytime soon. ;-)

Many thanks,

David

