Re: Character Conversions Handling - Mailing list pgsql-hackers

From	Martijn van Oosterhout
Subject	Re: Character Conversions Handling
Date	October 18, 2005 19:45:55
Msg-id	20051018213924.GC13902@svana.org
In response to	Character Conversions Handling (Volkan YAZICI <volkan.yazici@gmail.com>)
List	pgsql-hackers

On Tue, Oct 18, 2005 at 10:29:30PM +0300, Volkan YAZICI wrote:
> Hi,
>
> I'm trying to understand the scheme behind
> backend/utils/adt/like.c for downcasing letters [1]. When I look at the
> other tolower() implementations, there are lots of them spread around
> (in interfaces/libpq, backend/regex, backend/utils/adt/like, etc.).
> For example, despite the pg_wc_tolower() function in regc_locale.c,
> iwchareq() in like.c achieves the same thing by hand.
>
> I'd appreciate it if somebody could point me to the places where I should
> start looking to understand character handling with different
> encodings. Also, I wonder why we didn't use any btowc/mbsrtowcs/wctomb
> like functions. Is this for portability with other compilers?

PostgreSQL has to be compatible across many platforms, including those
that don't have any multibyte support, and there are a few of those.
Just as PostgreSQL includes a complete copy of the timezone library,
various other bits usually handled by system libraries have been
incorporated into the backend. This includes encoding support.

> [1] iwchareq() uses pg_mb2wchar_with_len(), which picks the right
> mb2wchar function from pg_wchar_table. When I look at
> backend/mb/wchar.c, there are some other locale-specific mblen and
> mb2wchar routines. For example, EUC_KR is handled by the
> pg_euc2wchar_with_len() function, but LATIN5 is handled by the
> pg_latin12wchar_with_len() function. Would we write a new function for
> latin5, like pg_latin52wchar_with_len(), if we ran into a new
> problem with latin5?

In this particular case it's not an issue: the Latin-N encodings are
all single-byte encodings, so they don't have to be handled
separately. But yes, this means that PostgreSQL's behaviour may vary
from that of the surrounding system.
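
To illustrate why one routine can serve every Latin-N encoding, here is
a rough sketch of a per-encoding dispatch table. It is not the actual
wchar.c code: all the sketch_* names are made up for illustration, and
the EUC-style converter is a simplification that only shows a multibyte
case needing different logic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the backend's wide-character type. */
typedef unsigned int sketch_wchar;

/*
 * Single-byte encodings: each input byte becomes exactly one wide
 * character, so the same routine works for LATIN1, LATIN5, and so on.
 * Returns the number of wide characters written (excluding the 0).
 */
static int
sketch_single_byte_to_wchar(const unsigned char *from, sketch_wchar *to, int len)
{
    int cnt = 0;

    assert(from != NULL && to != NULL);
    while (len > 0 && *from)
    {
        *to++ = *from++;
        len--;
        cnt++;
    }
    *to = 0;
    return cnt;
}

/*
 * Simplified two-byte EUC-style encoding (an assumption, not the real
 * rules): a byte with the high bit set starts a two-byte character.
 */
static int
sketch_euc_to_wchar(const unsigned char *from, sketch_wchar *to, int len)
{
    int cnt = 0;

    assert(from != NULL && to != NULL);
    while (len > 0 && *from)
    {
        if ((*from & 0x80) && len >= 2 && from[1])
        {
            /* pack lead and trail byte into one wide character */
            *to = ((sketch_wchar) from[0] << 8) | from[1];
            from += 2;
            len -= 2;
        }
        else
        {
            *to = *from++;
            len--;
        }
        to++;
        cnt++;
    }
    *to = 0;
    return cnt;
}

typedef int (*sketch_mb2wchar_fn) (const unsigned char *, sketch_wchar *, int);

enum sketch_encoding
{
    SKETCH_EUC_KR, SKETCH_LATIN1, SKETCH_LATIN5, SKETCH_ENC_COUNT
};

/* One slot per encoding; the single-byte ones share an entry. */
static const sketch_mb2wchar_fn sketch_wchar_table[SKETCH_ENC_COUNT] = {
    [SKETCH_EUC_KR] = sketch_euc_to_wchar,
    [SKETCH_LATIN1] = sketch_single_byte_to_wchar,
    [SKETCH_LATIN5] = sketch_single_byte_to_wchar,
};

/* Look up the converter for the given encoding and call it. */
static int
sketch_mb2wchar_with_len(enum sketch_encoding enc, const unsigned char *from,
                         sketch_wchar *to, int len)
{
    return sketch_wchar_table[enc] (from, to, len);
}
```

A separate LATIN5 routine would only be needed if its byte-to-character
mapping had to differ from this one-byte-in, one-character-out loop.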

The current plan is to use a cross-platform library (ICU) to handle
all the locale and encoding related issues. This is a large task and I
wouldn't be surprised if it takes a release or two. Hopefully it will
clean all these issues up...

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
