Re: PATCH: CITEXT 2.0 v2 - Mailing list pgsql-hackers

From David E. Wheeler
Subject Re: PATCH: CITEXT 2.0 v2
Msg-id 2FB38675-96B8-4AF6-88F0-9EE271016A38@kineticode.com
In response to Re: PATCH: CITEXT 2.0 v2  (Zdenek Kotala <Zdenek.Kotala@Sun.COM>)
Responses Re: PATCH: CITEXT 2.0 v2  (Zdenek Kotala <Zdenek.Kotala@Sun.COM>)
List pgsql-hackers
On Jul 7, 2008, at 07:41, Zdenek Kotala wrote:

> However, it seems to me that the code is OK now (excluding
> citext_eq). I think there are two open problems/questions:
>
> 1) regression test -
>
>  a) I think that the regression test is not correct. It depends on
> the en_US locale, but it uses characters which are not in the related
> character repertoire. That means the comparison is not defined, and I
> guess it could generate different results on different OSes,
> depending on the locale implementation.

That I don't know about. The test requires en_US.UTF-8, at least at  
this point. How are tests run on the build farm? And how else could I  
ensure that comparisons are case-insensitive for non-ASCII characters  
other than requiring a Unicode locale? Or is it just an issue for the  
sort order tests? For those, I could potentially remove accented  
characters, just as long as I'm verifying in other tests that  
comparisons are case-insensitive (without worrying about collation).

>  b) pgTAP is something new. We need to decide whether this framework
> is acceptable or not.

Well, from the point of view of `make installcheck`, it's invisible.  
I've submitted a talk proposal for pgDay.US on pgTAP. I'm happy to
discuss it further here though, if folks are interested.

> 2) contrib vs. pgFoundry
>
> The question of whether we want to have this in contrib or not is
> still unresolved. It is worth mentioning that the citext type will be
> made obsolete by a full collation implementation in the future. I
> personally prefer to keep it on pgFoundry because it is a temporary
> workaround (in my opinion), but I can live with a contrib module as
> well.

I second what Andrew has said in reply to this issue. I'll also say
that, since people *so* often end up using `WHERE LOWER(col) =
LOWER(?)`, it'd be very valuable to have citext in contrib,
especially since so few folks seem to even know about pgFoundry, let  
alone be able to find it. I mean, look at these search results:
  http://www.google.com/search?q=PostgreSQL%20case-insensitive%20text

My blog entry about this patch is hit #3. pgFoundry (and CITEXT 1) is  
#7. Last time I did a query like this, it didn't turn up at all.
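
For anyone who hasn't run into it, the `LOWER()` idiom mentioned above
looks roughly like the sketch below. It is only an illustration, not
anything from the patch: it uses Python's built-in sqlite3 module as a
stand-in so that it runs without a PostgreSQL server, and the `users`
table, the `email` column, and the `find_user` helper are all
hypothetical names.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [("Alice@Example.com",), ("bob@example.com",)],
)


def find_user(conn, email):
    """Case-insensitive lookup using the LOWER(col) = LOWER(?) idiom."""
    rows = conn.execute(
        "SELECT email FROM users WHERE LOWER(email) = LOWER(?)",
        (email,),
    ).fetchall()
    return [row[0] for row in rows]


# The stored value differs in case from the search term but still matches.
matches = find_user(conn, "ALICE@example.COM")
```

With a citext column the same lookup would just be `WHERE email = ?`,
with no function call wrapped around the column in every query.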

Believe me, I'd love for pgFoundry (or something like it) to become the
CPAN for PostgreSQL. But without some love and SEO, I don't think  
that's gonna happen.

Besides, CITEXT 2 would be a PITA to maintain for both 8.3 and 8.4,  
given the changes in the string comparison API in CVS.

Thanks,

David

