Thread: Re: [GENERAL] Re: [GENERAL] Re: [GENERAL] Different encoding for string values and identifier strings? Or (select 'tést' as tést) returns different values for string and identifier...

Re: [GENERAL] Re: [GENERAL] Re: [GENERAL] Different encoding for string values and identifier strings? Or (select 'tést' as tést) returns different values for string and identifier...

From

"Francisco Figueiredo Jr."

Date:

18 March 2011, 10:44:23

Oh, sorry for that.

My client code is Npgsql. I pulled those bytes from a debugging session, directly from the network stream, because I wanted to know exactly which bytes Npgsql was receiving.

This is the method which reads the data:

public static String ReadString(Stream network_stream)
{
    NpgsqlEventLog.LogMethodEnter(LogLevel.Debug, CLASSNAME, "ReadString");

    // Collect bytes up to, but not including, the terminating zero byte.
    List<byte> buffer = new List<byte>();
    for (int bRead = network_stream.ReadByte(); bRead != 0; bRead = network_stream.ReadByte())
    {
        if (bRead == -1)
        {
            // ReadByte() returns -1 at end of stream: the string was never terminated.
            throw new IOException();
        }
        else
        {
            buffer.Add((byte) bRead);
        }
    }

    if (NpgsqlEventLog.Level >= LogLevel.Debug)
        NpgsqlEventLog.LogMsg(resman, "Log_StringRead", LogLevel.Debug, ENCODING_UTF8.GetString(buffer.ToArray()));

    // The collected bytes are always decoded as UTF-8.
    return ENCODING_UTF8.GetString(buffer.ToArray());
}
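For anyone who wants to inspect the raw bytes outside Npgsql, the same read-until-zero logic can be sketched in Python. This is only an illustration, not Npgsql code: the function name `read_cstring_bytes` and the in-memory `wire` stream are assumptions made for the example, and the stream simply holds the UTF-8 bytes of 'tést' followed by a zero byte.

```python
import io


def read_cstring_bytes(stream) -> bytes:
    """Read bytes up to (not including) the terminating zero byte."""
    buffer = bytearray()
    while True:
        chunk = stream.read(1)
        if chunk == b"":
            # End of stream before the terminator, like ReadByte() == -1.
            raise EOFError("stream ended before the terminating zero byte")
        if chunk == b"\x00":
            return bytes(buffer)
        buffer += chunk


# Hypothetical wire data: UTF-8 'tést', a zero terminator, then unrelated bytes.
wire = io.BytesIO("tést".encode("utf-8") + b"\x00" + b"rest")

raw = read_cstring_bytes(wire)
hex_dump = " ".join(f"{b:02x}" for b in raw)  # inspect before decoding
text = raw.decode("utf-8")                    # raises if the bytes are not valid UTF-8
```

Looking at `hex_dump` before calling `decode` keeps the check independent of whatever the decoder does with invalid input.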

My database has its encoding set to UTF-8, although my lc_collate is pt.BR.UTF-8. Could this lc setting have caused some trouble?

I also have problems with the psql client, where the character doesn't appear at all. Andreas could see the character, though...

I hope it helps.

Thanks in advance.
--
Sent from my Android phone

Francisco Figueiredo Jr.
Npgsql lead developer
fxjr.blogspot.com
twitter.com/franciscojunior

On 18/03/2011 at 01:29, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

Re: Re: [GENERAL] Re: [GENERAL] Re: [GENERAL] Different encoding for string values and identifier strings? Or (select 'tést' as tést) returns different values for string and identifier...

From

Tom Lane

Date:

18 March 2011, 22:02:06

"Francisco Figueiredo Jr." <francisco@npgsql.org> writes:
> My database has its encoding set to UTF-8, although my lc_collate is pt.BR.UTF-8.
> Could this lc setting have caused some trouble?

Hmmm ... actually, it strikes me that this may be a downcasing problem.
PG will try to feed an unquoted identifier through tolower(), and that
basically can't work on multibyte characters.  Most implementations of
tolower() are smart enough to not change high-bit-set bytes in UTF8
locales, but maybe your platform's is not ...

            regards, tom lane
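To make the suspected failure mode concrete, here is a small Python sketch of what byte-wise downcasing could do to the identifier. It is an illustration under stated assumptions, not PostgreSQL code and not a claim about what any particular platform's tolower() does: `naive_tolower_byte` is a hypothetical single-byte (Latin-1 style) lowercase rule, and `downcase_ascii_only` stands in for an implementation that leaves high-bit-set bytes alone.

```python
def naive_tolower_byte(b: int) -> int:
    """Hypothetical tolower() that treats every byte as a Latin-1 character."""
    if 0x41 <= b <= 0x5A:                  # ASCII 'A'..'Z'
        return b + 0x20
    if 0xC0 <= b <= 0xDE and b != 0xD7:    # Latin-1 uppercase letters, except the multiplication sign
        return b + 0x20
    return b


def downcase_bytewise(identifier: str) -> bytes:
    """Downcase each UTF-8 byte independently with the naive rule."""
    return bytes(naive_tolower_byte(b) for b in identifier.encode("utf-8"))


def downcase_ascii_only(identifier: str) -> bytes:
    """Downcase only ASCII letters; bytes with the high bit set are untouched."""
    return bytes(b + 0x20 if 0x41 <= b <= 0x5A else b
                 for b in identifier.encode("utf-8"))


def is_valid_utf8(data: bytes) -> bool:
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


literal = "tést".encode("utf-8")      # bytes carried by the string value
mangled = downcase_bytewise("tést")   # bytes the identifier could end up with
safe = downcase_ascii_only("tést")
```

In UTF-8, 'é' is the two bytes 0xC3 0xA9. The naive rule reads the lead byte 0xC3 as an uppercase Latin-1 letter and turns it into 0xE3, so the identifier no longer matches the string value and is no longer valid UTF-8, while the ASCII-only variant leaves the bytes unchanged.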

Re: [GENERAL] Re: [GENERAL] Re: [GENERAL] Re: [GENERAL] Different encoding for string values and identifier strings? Or (select 'tést' as tést) returns different values for string and identifier...

From

"Francisco Figueiredo Jr."

Date:

18 March 2011, 22:47:32

Hmmmmm,

I'm using OS X 10.6.6 and I compiled PG myself from source. Is there any configure option or library I can use to get the correct behavior? Is there any runtime setting I can change to alter this tolower() behavior, or maybe skip the call altogether?

Thanks in advance.
