Re: [HACKERS] multi-byte aware char_length() etc. - Mailing list pgsql-hackers

From Thomas G. Lockhart
Subject Re: [HACKERS] multi-byte aware char_length() etc.
Msg-id 3510B230.4A815C51@alumni.caltech.edu
In response to multi-byte aware char_length() etc.  (t-ishii@sra.co.jp)
Responses Re: [HACKERS] multi-byte aware char_length() etc.  (t-ishii@sra.co.jp)
List pgsql-hackers
> I'm planning to modify some string functions so that they are
> aware of multi-byte strings when compiled with the multi-byte
> capability. The following is what I'm going to modify. I would like to
> hear your opinions if you have any.
>
> o character_length()
>
> It seems that the function is implemented as textlen() in
> utils/adt/varlena.c or as varcharlen() in varchar.c. The current
> implementation returns an octet length rather than a character length,
> so I will change them. However, some applications might need an
> octet length. Maybe this is a good chance to add SQL92's
> octet_length().

Yes.
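
To make the distinction concrete, here is a minimal sketch of the two
lengths. It assumes the string is valid UTF-8, which is only an assumption
for illustration (the multi-byte build may use other encodings), and the
names octet_len() and char_len_utf8() are hypothetical, not existing
Postgres routines.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Octet length: the number of bytes before the terminating NUL. */
static size_t
octet_len(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}

/*
 * Character length, ASSUMING valid UTF-8 input (an assumption of this
 * sketch). Every byte that is not a continuation byte (bit pattern
 * 10xxxxxx) starts a new character, so we count only those bytes.
 */
static size_t
char_len_utf8(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++)
    {
        if (((unsigned char) *s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For a pure ASCII string the two functions agree; for a string containing
multi-byte characters the octet length is larger than the character length.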

> o lower()/upper()
>
> Implemented in oracle_compat.c. One thing I have noticed is that it
> uses toupper()/tolower(). For ASCII, they are fine. But on some
> platforms (I guess SysV) they might have some problems:
>
>         char c; /* c is an 8-bit letter and this platform uses char as
>                    signed char */
>         toupper(c);     /* may cause segfault or any other bad thing */
>
> So I will change it to:
>
>         toupper((unsigned char)c);
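
As a rough sketch of how the cast above might look inside a whole routine
(upper_in_place() is a hypothetical name, not the function in
oracle_compat.c), the loop below converts every byte through unsigned char
before calling toupper(), so a byte with the high bit set is never passed
as a negative value. It works one byte at a time, so it is only meaningful
for single-byte encodings.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Upper-case a NUL-terminated string in place (hypothetical helper).
 * toupper() requires an argument representable as unsigned char (or EOF),
 * so each byte is cast before the call, even where plain char is signed.
 */
static void
upper_in_place(char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; s++)
        *s = (char) toupper((unsigned char) *s);
}
```

The same pattern applies to tolower(). What an 8-bit byte maps to depends
on the current locale; in the default "C" locale such bytes are left
unchanged.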

I would like to move these routines, as you clean them up, to varlena.c
or whatever Postgres-specific source file is appropriate. Let's leave
oracle_compat.c for non-standard, Oracle-specific functions. Perhaps
eventually we can move any of those which remain to the contrib
directory, assuming that there are good equivalent functions available
in SQL92.

Sort of annoying having oracle_compat when Oracle doesn't return the
favor by having a "postgres_compat". Well, maybe DataBlades are the same
thing?? :)

> o position()
>
> Implemented as textpos() in varlena.c.
>
> o substring()
>
> Implemented as text_substr() in varlena.c.

These two are OK. I'm not yet clear on where in the parser these varlena
functions are matched up with both text and varchar() types. We may need
to do something different as we keep working on getting the
text/varchar/char behavior improved.
