multi-byte aware char_length() etc. - Mailing list pgsql-hackers

From t-ishii@sra.co.jp
Subject multi-byte aware char_length() etc.
Date
Msg-id 199803190409.NAA25741@srapc451.sra.co.jp
List pgsql-hackers
I'm planning to modify some string functions so that they are
aware of multi-byte strings when compiled with multi-byte
support.  The following are the functions I'm going to modify. I would
like to hear your opinions if you have any.

o character_length()

It seems that the function is implemented as textlen() in
utils/adt/varlena.c or as varcharlen() in varchar.c. The current
implementation returns an octet length rather than a character length,
so I will change them. However, some applications may still need an
octet length. Maybe this is a good chance to add SQL92's
octet_length().
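
To illustrate the difference between the two lengths, here is a minimal
sketch. It is not the textlen() code: it works on NUL-terminated C
strings rather than varlena values, it assumes the data is valid UTF-8
(only one of the possible multi-byte encodings), and the names
sketch_octet_length() and sketch_char_length() are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length in octets (bytes). */
static size_t
sketch_octet_length(const char *s)
{
    return strlen(s);
}

/*
 * Hypothetical helper: length in characters.
 * Assumption: s is valid UTF-8, where continuation bytes have the
 * bit pattern 10xxxxxx. Every other byte starts a new character.
 */
static size_t
sketch_char_length(const char *s)
{
    const unsigned char *p = (const unsigned char *) s;
    size_t      n = 0;

    for (; *p != '\0'; p++)
    {
        if ((*p & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For a string holding two three-byte characters, sketch_octet_length()
reports 6 while sketch_char_length() reports 2. Other encodings (EUC,
for instance) would need a different rule for finding character
boundaries.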

o lower()/upper()

These are implemented in oracle_compat.c. One thing I have noticed is
that they use toupper()/tolower(). For ASCII, these are fine. But on some
platforms (I guess SysV) they might have problems with 8-bit characters:

    char c;        /* c holds an 8-bit letter, and this platform
                      treats char as signed char, so c is negative */
    toupper(c);    /* may cause a segfault or any other bad thing */

So I will change the calls to something like:

    toupper((unsigned char)c);
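
As a small sketch of that cast applied to a whole string: the function
name sketch_upper_in_place() is hypothetical and this is not the
oracle_compat.c code. It only shows that every byte is converted to
unsigned char before it reaches toupper(), so a byte with the high bit
set is never passed as a negative value.

```c
#include <ctype.h>

/*
 * Hypothetical helper: upper-case a NUL-terminated string in place,
 * one byte at a time. The cast keeps the argument inside the range
 * that toupper() accepts, even where plain char is signed.
 */
static void
sketch_upper_in_place(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char) toupper((unsigned char) *s);
}
```

This still works byte by byte, so it fixes only the signed-char
problem. What happens to bytes above 0x7F depends on the current
locale.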

o position()

Implemented as textpos() in varlena.c.
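
A multi-byte aware position() would report the match position in
characters rather than octets. The sketch below is only an
illustration, not the textpos() code: it assumes NUL-terminated, valid
UTF-8 input (where a byte-wise search cannot match in the middle of a
character, which does not hold for every encoding), and the name
sketch_char_position() is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: 1-based character position of needle in
 * haystack, or 0 if needle does not occur.
 * Assumption: both strings are valid UTF-8.
 */
static size_t
sketch_char_position(const char *haystack, const char *needle)
{
    const char *hit = strstr(haystack, needle);
    const unsigned char *p;
    size_t      pos = 1;

    if (hit == NULL)
        return 0;

    /* Count the characters that start before the match. */
    for (p = (const unsigned char *) haystack;
         p < (const unsigned char *) hit; p++)
    {
        if ((*p & 0xC0) != 0x80)    /* not a continuation byte */
            pos++;
    }
    return pos;
}
```

With two three-byte characters in front of the match, this returns 3,
where a purely octet-based count would give 7.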

o substring()

Implemented as text_substr() in varlena.c.
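
For substring(), being multi-byte aware means that the start and length
arguments count characters, so a character is never cut in half. Here
is a rough sketch under the same kind of assumptions: NUL-terminated,
valid UTF-8 input instead of varlena values, and the hypothetical names
sketch_skip_chars() and sketch_char_substr(). It is not the
text_substr() code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: advance past up to n characters (not bytes). */
static const char *
sketch_skip_chars(const char *s, size_t n)
{
    while (*s != '\0' && n > 0)
    {
        s++;                        /* lead byte */
        while (((unsigned char) *s & 0xC0) == 0x80)
            s++;                    /* UTF-8 continuation bytes */
        n--;
    }
    return s;
}

/*
 * Hypothetical helper: copy count characters, starting at the 1-based
 * character position start, into out. Returns the number of bytes
 * copied. If the result does not fit, out is set to "" and 0 is
 * returned rather than truncating in the middle of a character.
 */
static size_t
sketch_char_substr(const char *s, size_t start, size_t count,
                   char *out, size_t outsize)
{
    const char *from;
    const char *to;
    size_t      nbytes;

    if (outsize == 0)
        return 0;

    from = sketch_skip_chars(s, start > 0 ? start - 1 : 0);
    to = sketch_skip_chars(from, count);
    nbytes = (size_t) (to - from);

    if (nbytes >= outsize)
    {
        out[0] = '\0';
        return 0;
    }
    memcpy(out, from, nbytes);
    out[nbytes] = '\0';
    return nbytes;
}
```

Asking for two characters starting at character 2 of a string that
begins with two three-byte characters copies four bytes: the second
multi-byte character plus the ASCII character after it.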

--
Tatsuo Ishii
t-ishii@sra.co.jp
