Re: PostgreSQL 8.3.7: soundex function returns UTF-16 characters - Mailing list pgsql-bugs

From Tom Lane
Subject Re: PostgreSQL 8.3.7: soundex function returns UTF-16 characters
Msg-id 8241.1239038328@sss.pgh.pa.us
In response to Re: PostgreSQL 8.3.7: soundex function returns UTF-16 characters  (Frans <frans@geodan.nl>)
Responses Re: PostgreSQL 8.3.7: soundex function returns UTF-16 characters  (Frans <frans@geodan.nl>)
List pgsql-bugs
Frans <frans@geodan.nl> writes:
> Tom Lane wrote:
>> The fuzzystrmatch module doesn't really work with utf8 (nor any other
>> multibyte encoding), because it depends on the <ctype.h> functions.
>> What you'll probably get when applying it to non-ascii utf8 is
>> an invalidly encoded string.
>>
> Well, in 8.2.6 the result for non-ASCII UTF-8 was an empty string (ASCII
> code 0).

A comparison of the 8.2 and 8.3 fuzzystrmatch sources shows no
difference.  The behavior of the ascii() function has indeed changed,
but soundex() is no more nor less broken than it was before.

[ thinks for a bit... ]  If you are seeing a difference in what soundex
itself does, the most likely explanation is a difference in the behavior
of isalpha() or perhaps toupper().  Are you running on the same
underlying C library as before?  Are you quite sure you have the same
encoding and locale selected?

            regards, tom lane
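
The point about isalpha() and toupper() can be checked outside the
database. Below is a minimal sketch, not the fuzzystrmatch code itself:
the two helper names are hypothetical, and it assumes the input is a
NUL-terminated UTF-8 byte string. It counts how many non-ASCII bytes the
<ctype.h> functions would treat as letters, or would alter, under
whatever LC_CTYPE locale is currently in effect.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper: count bytes >= 0x80 that isalpha() accepts
 * under the current LC_CTYPE locale. Every such byte is part of a
 * multibyte UTF-8 sequence, so a byte-wise caller would be treating
 * a fragment of a character as a letter.
 */
int count_high_bytes_alpha(const char *s)
{
    int n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        /* Cast first: passing a negative char to isalpha() is undefined. */
        unsigned char c = (unsigned char) *s;

        if (c >= 0x80 && isalpha(c))
            n++;
    }
    return n;
}

/*
 * Hypothetical helper: count bytes >= 0x80 that toupper() would
 * change. Changing one byte of a multibyte sequence can leave an
 * invalidly encoded string behind.
 */
int count_high_bytes_changed_by_toupper(const char *s)
{
    int n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char) *s;

        if (c >= 0x80 && toupper(c) != c)
            n++;
    }
    return n;
}
```

At program start the locale is "C", where only the 26 ASCII letters of
each case are alphabetic, so both helpers return 0 for the UTF-8 bytes
of "été" ("\xC3\xA9" "t" "\xC3\xA9"). After setlocale(LC_CTYPE, "") the
counts depend on the C library and on the locale selected, which is why
the same input can give different results on two installations.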
