Re: invalidly encoded strings - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: invalidly encoded strings
Date	September 11, 2007 16:32:02
Msg-id	17382.1189539075@sss.pgh.pa.us
In response to	Re: invalidly encoded strings (Alvaro Herrera <alvherre@commandprompt.com>)
List	pgsql-hackers

Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane wrote:
>> I think really the technically cleanest solution would be to make
>> convert() return bytea instead of text; then we'd not have to put
>> restrictions on what encoding or locale it's working inside of.
>> However, it's not clear to me whether there are valid usages that
>> that would foreclose.  Tatsuo mentioned length() but bytea has that.

> But length(bytea) cannot count characters, only bytes.

So what?  If you want characters, just count the original text string.
Encoding conversion won't change that.

> Hmm, I wonder if counting chars is consistent regardless of the
> encoding the string is in.  To me it sounds like it should, in which
> case it works to convert to the DB encoding and count chars there.

A conversion that isn't one-for-one is not merely an encoding conversion
IMHO.
        regards, tom lane
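
[Editorial illustration, not part of the original message.] The point about counting characters versus bytes can be sketched outside the database. The snippet below is a minimal Python illustration, not PostgreSQL code: the helper name `describe` and the sample string are assumptions made for the example. It shows that a one-for-one encoding conversion changes the byte length but leaves the character count of the original text unchanged.

```python
def describe(text: str, encoding: str) -> tuple[int, int]:
    """Return (character count, byte count) of text in the given encoding.

    Hypothetical helper for illustration only.
    """
    encoded = text.encode(encoding)           # analogous to a bytea result
    decoded = encoded.decode(encoding)        # convert back to text
    return len(decoded), len(encoded)


# Sample string (an assumption): "café" with a precomposed e-acute.
sample = "caf\u00e9"

latin1_chars, latin1_bytes = describe(sample, "latin-1")
utf8_chars, utf8_bytes = describe(sample, "utf-8")
utf16_chars, utf16_bytes = describe(sample, "utf-16-le")

# Character counts agree with the original string in every encoding;
# only the byte counts differ.
print(len(sample), latin1_chars, utf8_chars, utf16_chars)
print(latin1_bytes, utf8_bytes, utf16_bytes)
```

Counting on the original text, rather than on the converted bytes, gives the same answer whichever target encoding is chosen, which is why a bytes-only length on the converted value loses nothing here.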
