Re: [GENERAL] Re: [GENERAL] Different encoding for string values and identifier strings? Or (select 'tést' as tést) returns different values for string and identifier... - Mailing list pgsql-general

From Francisco Figueiredo Jr.
Subject Re: [GENERAL] Re: [GENERAL] Different encoding for string values and identifier strings? Or (select 'tést' as tést) returns different values for string and identifier...
Msg-id AANLkTimCMgg=2oTjYw37Rc=WPHZv7MLYsCGg3Zhobo2D@mail.gmail.com
In response to Re: [GENERAL] Re: [GENERAL] Different encoding for string values and identifier strings? Or (select 'tést' as tést) returns different values for string and identifier...  ("Francisco Figueiredo Jr." <francisco@npgsql.org>)
Responses Re: [GENERAL] Re: [GENERAL] Different encoding for string values and identifier strings? Or (select 'tést' as tést) returns different values for string and identifier...  ("Francisco Figueiredo Jr." <francisco@npgsql.org>)
List pgsql-general
I'm at my dev machine now.

With the tests I'm doing, I can see the following:

If I use:

select 'seléct' as "seléct";

the column name is returned correctly, as expected.

If I do:

select 'seléct' as seléct;


this is the sequence of bytes I receive from PostgreSQL for the
column name:

byte1 - 115 UTF-8 for s
byte2 - 101 UTF-8 for e
byte3 - 108 UTF-8 for l
byte4 - 227
byte5 - 169
byte6 - 99 UTF-8 for c
byte7 - 116 UTF-8 for t


The problem lies in byte4.
According to [1], the first byte of a UTF-8 sequence defines how many
bytes make up the character. The problem is that 227 is 1110 0011 in
binary, so the UTF-8 decoder expects a 3-byte sequence when there are
actually only 2! :( This seems to be the root of the problem for me.
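
To make the lead-byte rule concrete, here is a minimal Python sketch. It
is only an illustration, not Npgsql or PostgreSQL code, and the helper
name utf8_sequence_length is my own. It reads the length announced by
byte4 and then tries to decode the seven bytes listed above.

```python
def utf8_sequence_length(lead_byte: int) -> int:
    """Return the sequence length announced by a UTF-8 lead byte.

    Hypothetical helper, written only for this illustration.
    """
    if lead_byte & 0b1000_0000 == 0b0000_0000:
        return 1  # 0xxxxxxx: single-byte (ASCII) character
    if lead_byte & 0b1110_0000 == 0b1100_0000:
        return 2  # 110xxxxx: lead byte of a 2-byte sequence
    if lead_byte & 0b1111_0000 == 0b1110_0000:
        return 3  # 1110xxxx: lead byte of a 3-byte sequence
    if lead_byte & 0b1111_1000 == 0b1111_0000:
        return 4  # 11110xxx: lead byte of a 4-byte sequence
    raise ValueError(f"{lead_byte} is not a valid UTF-8 lead byte")


# The bytes received for the unquoted column name.
identifier_bytes = bytes([115, 101, 108, 227, 169, 99, 116])

# byte4 (index 3) announces a 3-byte sequence...
announced = utf8_sequence_length(identifier_bytes[3])

# ...but only one continuation byte (169) follows before 99 ('c'),
# so a strict UTF-8 decoder rejects the input.
try:
    identifier_bytes.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

print(announced, decode_failed)
```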


For the selected value, the correct bytes are returned:

byte1 - 115 UTF-8 for s
byte2 - 101 UTF-8 for e
byte3 - 108 UTF-8 for l
byte4 - 195
byte5 - 169
byte6 - 99 UTF-8 for c
byte7 - 116 UTF-8 for t


Here 195 is 1100 0011, which announces a two-byte sequence, so the
decoder can decode bytes 195 169 to U+00E9, which is the char "é".
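
As a cross-check, this short Python sketch (again only an illustration;
the variable names are mine) decodes the bytes of the value and compares
them with the bytes of the column name. The only difference is byte4,
and the two values differ in a single bit (0x20). I am only noting that
as an observation here, not as an explanation of the cause.

```python
# Bytes received for the selected value and for the unquoted column name.
value_bytes = bytes([115, 101, 108, 195, 169, 99, 116])
identifier_bytes = bytes([115, 101, 108, 227, 169, 99, 116])

# The value decodes cleanly: 195 169 is the 2-byte encoding of U+00E9.
decoded = value_bytes.decode("utf-8")
code_point = ord(decoded[3])

# List every position where the two byte sequences differ.
diff = [
    (index, good, bad)
    for index, (good, bad) in enumerate(zip(value_bytes, identifier_bytes))
    if good != bad
]

# XOR shows which bits changed between 195 and 227.
changed_bits = value_bytes[3] ^ identifier_bytes[3]

print(decoded, hex(code_point), diff, hex(changed_bits))
```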

Do you think this could be related to my machine? I'm using OS X 10.6.6
and I compiled PostgreSQL 9.0.1 from source.

Thanks in advance.




[1] - http://en.wikipedia.org/wiki/UTF-8




On Tue, Mar 15, 2011 at 15:52, Francisco Figueiredo Jr.
<francisco@npgsql.org> wrote:
> Hmmmmmmmm,
>
> What would change the encoding of the identifiers?
>
> Because on my dev machine which unfortunately isn't with me right now
> I can't get the identifier returned correctly :(
>
> I remember that it returns:
>
>  test=*# select 'tést' as tést;
>   tst
>  ------
>   tést
>
> Is there any config I can change at runtime in order to have it
> returned correctly?
>
> Thanks in advance.
>
>
> On Tue, Mar 15, 2011 at 15:45, Andreas Kretschmer
> <akretschmer@spamfence.net> wrote:
>> Francisco Figueiredo Jr. <francisco@npgsql.org> wrote:
>>
>>>
>>> What happens if you remove the double quotes in the column name identifier?
>>
>> the same:
>>
>> test=*# select 'tést' as tést;
>>  tést
>> ------
>>  tést
>> (1 Zeile)
>>
>>
>>
>> Andreas
>> --
>> Really, I'm not out to destroy Microsoft. That will just be a completely
>> unintentional side effect.                              (Linus Torvalds)
>> "If I was god, I would recompile penguin with --enable-fly."   (unknown)
>> Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>>
>
>
>
> --
> Regards,
>
> Francisco Figueiredo Jr.
> Npgsql Lead Developer
> http://www.npgsql.org
> http://fxjr.blogspot.com
> http://twitter.com/franciscojunior
>



--
Regards,

Francisco Figueiredo Jr.
Npgsql Lead Developer
http://www.npgsql.org
http://fxjr.blogspot.com
http://twitter.com/franciscojunior
