Re: case insensitive match in unicode - Mailing list pgsql-general

From	SunWuKung
Subject	Re: case insensitive match in unicode
Date	March 27, 2006 06:45:16
Msg-id	MPG.1e91e09fca9819ea989693@news.postgresql.org
In response to	case insensitive match in unicode (SunWuKung <Balazs.Klein@axelero.hu>)
Responses	Re: case insensitive match in unicode
List	pgsql-general

In article <20060327094829.GA30791@svana.org>, kleptog@svana.org says...
> On Mon, Mar 27, 2006 at 11:31:17AM +0200, SunWuKung wrote:
> > I would need to do case insensitive match against a field that contains
> > text in different languages - Greek, Hungarian, Arabic etc.
> > The db encoding is UTF8.
> >
> > So far I found no way to achieve that. I tried converting both strings
> > to the same case and using ~* , but neither worked.
>
> Oh, tricky. Firstly, case-insensitive means different things to
> different locales. For example, in Turkish 'i' is not the lowercase
> version of 'I'. Maybe you've chosen a locale that doesn't do UTF-8? You
> don't specify a platform either. Locale support varies wildly by
> platform.
>
> What you probably want is some kind of accent-insensitive match, which
> means that é, è, ë, e, É, È, E and Ë are all considered to match
> each other. The way you do that is by converting Unicode to a particular
> normal form and then comparing. Unfortunately, I don't think PostgreSQL
> supplies such a function right now.
>
> However, some server-side procedural languages can do this. If you can
> find one (possibly Perl) that can do the conversion, you can create a
> function to do the mapping.
>
> Have a nice day,
>
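For illustration, the normal-form idea quoted above can be sketched in
Python (used here only as an example language; the quoted message
mentions Perl as one possibility). The helper names fold and
insensitive_equal are made up for this sketch, and it assumes that
dropping combining marks after NFKD decomposition is an acceptable
definition of "accent-insensitive", which may not hold for every language:

```python
import unicodedata


def fold(text: str) -> str:
    """Return a case- and accent-insensitive comparison key (hypothetical helper)."""
    # NFKD splits precomposed characters such as 'é' into 'e' + a combining accent.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks so only the base characters remain.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is a locale-independent, more aggressive form of lower().
    return stripped.casefold()


def insensitive_equal(a: str, b: str) -> bool:
    """Compare two strings after folding both to the same key."""
    return fold(a) == fold(b)
```

Because casefold() ignores the locale, this sketch does not handle the
Turkish 'i' / 'I' case mentioned above, and stripping combining marks
also removes vowel marks in scripts such as Arabic, which may or may not
be what is wanted.
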
This sounds like a very interesting concept.
It wouldn't be 'case insensitive', just 'insensitive'.

The way I imagine it now is as a special case of the ~ operator.
I would create match groups in a table and check whether each character
of the search term belongs to a group. If it does, I would replace the
character with its group, e.g. [éÉE] or [oóOÓ??], and do a regexp match
with the result.
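
To make the idea concrete, here is a rough sketch in Python rather than
in the database itself. Everything in it is an assumption for
illustration: the MATCH_GROUPS list stands in for the table of match
groups, the groups contain only the characters that are readable above
(plus a plain 'e'), and the names to_pattern and matches are made up:

```python
import re

# Hypothetical match groups; in the proposal these would be rows in a table.
MATCH_GROUPS = ["eéEÉ", "oóOÓ"]

# Map every member character to the group that contains it.
CHAR_TO_GROUP = {ch: group for group in MATCH_GROUPS for ch in group}


def to_pattern(term: str) -> str:
    """Rewrite a search term so each grouped character matches its whole group."""
    parts = []
    for ch in term:
        group = CHAR_TO_GROUP.get(ch)
        if group is not None:
            # Replace the character with a bracket expression for its group.
            parts.append("[" + group + "]")
        else:
            # Escape everything else so it is matched literally.
            parts.append(re.escape(ch))
    return "".join(parts)


def matches(term: str, text: str) -> bool:
    """Return True if the rewritten term is found anywhere in text."""
    return re.search(to_pattern(term), text) is not None
```

With these groups the term "ego" becomes the pattern [eéEÉ]g[oóOÓ], so it
would match "Égó". Characters that are in no group (the 'g' here) are
still matched exactly, so every letter that should be "insensitive" needs
its own group.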

What do you think?
B.
