Thread: Different encoding for string values and identifier strings? Or (select 'tést' as tést) returns different values for string and identifier...

Hi all!

I received a bug report about problems with identifiers.

If I do the following select:

select 1 as tést

The column name returned contains strange chars.

If I do:

select 'tést' as tést

the select value returned is ok but the column name is not. :(

Is there any configuration I need to change to specify the encoding of
the identifiers?

I'm using UTF-8 for my database.

Thanks in advance.


--
Regards,

Francisco Figueiredo Jr.
Npgsql Lead Developer
http://www.npgsql.org
http://fxjr.blogspot.com
http://twitter.com/franciscojunior

Francisco Figueiredo Jr. <francisco@npgsql.org> wrote:

> Hi all!
>
> I received a bug report about problems with identifiers.
>
> If I do the following select:
>
> select 1 as tést
>
> The column name returned contains strange chars.
>
> If I do:
>
> select 'tést' as tést
>
> the select value returned is ok but the column name is not. :(
>
> Is there any configuration I need to change to specify the encoding of
> the identifiers?

test=# select 'tést' as "tést";
 tést
------
 tést

looks okay for me ...


Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°

This is the query our user sent as one of the problems:

select "zurück" as zurück3_103_ from "Tabelle" tabellenhib0_

And here is the link to our bug report:

http://pgfoundry.org/tracker/index.php?func=detail&aid=1010988&group_id=1000140&atid=590


I didn't test putting double quotes in the column name identifier as
the bug report isn't using it.

What happens if you remove the double quotes in the column name identifier?

Thanks in advance.



On Tue, Mar 15, 2011 at 15:21, Andreas Kretschmer
<akretschmer@spamfence.net> wrote:
> Francisco Figueiredo Jr. <francisco@npgsql.org> wrote:
>
>> Hi all!
>>
>> I received a bug report about problems with identifiers.
>>
>> If I do the following select:
>>
>> select 1 as tést
>>
>> The column name returned contains strange chars.
>>
>> If I do:
>>
>> select 'tést' as tést
>>
>> the select value returned is ok but the column name is not. :(
>>
>> Is there any configuration I need to change to specify the encoding of
>> the identifiers?
>
> test=# select 'tést' as "tést";
>  tést
> ------
>  tést
>
> looks okay for me ...
>
>
> Andreas
> --
> Really, I'm not out to destroy Microsoft. That will just be a completely
> unintentional side effect.                              (Linus Torvalds)
> "If I was god, I would recompile penguin with --enable-fly."   (unknown)
> Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>



--
Regards,

Francisco Figueiredo Jr.
Npgsql Lead Developer
http://www.npgsql.org
http://fxjr.blogspot.com
http://twitter.com/franciscojunior

Francisco Figueiredo Jr. <francisco@npgsql.org> wrote:

>
> What happens if you remove the double quotes in the column name identifier?

the same:

test=*# select 'tést' as tést;
 tést
------
 tést
(1 Zeile)



Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°

Hmmmmmmmm,

What would change the encoding of the identifiers?

Because on my dev machine which unfortunately isn't with me right now
I can't get the identifier returned correctly :(

I remember that it returns:

 test=*# select 'tést' as tést;
  tst
 ------
  tést

Is there any config I can change at runtime in order to have it
returned correctly?

Thanks in advance.


On Tue, Mar 15, 2011 at 15:45, Andreas Kretschmer
<akretschmer@spamfence.net> wrote:
> Francisco Figueiredo Jr. <francisco@npgsql.org> wrote:
>
>>
>> What happens if you remove the double quotes in the column name identifier?
>
> the same:
>
> test=*# select 'tést' as tést;
>  tést
> ------
>  tést
> (1 Zeile)
>
>
>
> Andreas
> --
> Really, I'm not out to destroy Microsoft. That will just be a completely
> unintentional side effect.                              (Linus Torvalds)
> "If I was god, I would recompile penguin with --enable-fly."   (unknown)
> Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>



--
Regards,

Francisco Figueiredo Jr.
Npgsql Lead Developer
http://www.npgsql.org
http://fxjr.blogspot.com
http://twitter.com/franciscojunior

Now, I'm using my dev machine.

With the tests I'm doing, I can see the following:

If I use:

select 'seléct' as "seléct";

column name returns ok as expected.

If I do:

select 'seléct' as seléct;


This is the sequence of bytes I receive from postgresql:

byte1 - 115 UTF-8 for s
byte2 - 101 UTF-8 for e
byte3 - 108 UTF-8 for l
byte4 - 227
byte5 - 169
byte6 - 99 UTF-8 for c
byte7 - 116 UTF-8 for t


The problem lies in the byte4.
According to [1], the first byte defines how many bytes will compose
the UTF-8 char. the problem is that 227 encodes to a binary value of
1110 0011 and so, the UTF-8 decoder will think there are 3 bytes in
sequence when actually there are only 2! :( And this seems to be the
root of the problem for me.


For the select value the correct byte is returned:

byte1 - 115 UTF-8 for s
byte2 - 101 UTF-8 for e
byte3 - 108 UTF-8 for l
byte4 - 195
byte5 - 169
byte6 - 99 UTF-8 for c
byte7 - 116 UTF-8 for t


Where 195 is 1100 0011 which gives two bytes in sequence and the
decoder can decode this to the U+00E9 which is the char "é"

Do you think this can be related to my machine? I'm using OSX 10.6.6
and I compiled postgresql 9.0.1 from source code.

Thanks in advance.




[1] - http://en.wikipedia.org/wiki/UTF-8




On Tue, Mar 15, 2011 at 15:52, Francisco Figueiredo Jr.
<francisco@npgsql.org> wrote:
> Hmmmmmmmm,
>
> What would change the encoding of the identifiers?
>
> Because on my dev machine which unfortunately isn't with me right now
> I can't get the identifier returned correctly :(
>
> I remember that it returns:
>
>  test=*# select 'tést' as tést;
>   tst
>  ------
>   tést
>
> Is there any config I can change at runtime in order to have it
> returned correctly?
>
> Thanks in advance.
>
>
> On Tue, Mar 15, 2011 at 15:45, Andreas Kretschmer
> <akretschmer@spamfence.net> wrote:
>> Francisco Figueiredo Jr. <francisco@npgsql.org> wrote:
>>
>>>
>>> What happens if you remove the double quotes in the column name identifier?
>>
>> the same:
>>
>> test=*# select 'tést' as tést;
>>  tést
>> ------
>>  tést
>> (1 Zeile)
>>
>>
>>
>> Andreas
>> --
>> Really, I'm not out to destroy Microsoft. That will just be a completely
>> unintentional side effect.                              (Linus Torvalds)
>> "If I was god, I would recompile penguin with --enable-fly."   (unknown)
>> Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>>
>
>
>
> --
> Regards,
>
> Francisco Figueiredo Jr.
> Npgsql Lead Developer
> http://www.npgsql.org
> http://fxjr.blogspot.com
> http://twitter.com/franciscojunior
>



--
Regards,

Francisco Figueiredo Jr.
Npgsql Lead Developer
http://www.npgsql.org
http://fxjr.blogspot.com
http://twitter.com/franciscojunior

Any ideas??

Would it be possible that Postgresql would be using another encoding
for the identifiers when they aren't wrapped by double quotes?


On Tue, Mar 15, 2011 at 23:37, Francisco Figueiredo Jr.
<francisco@npgsql.org> wrote:
> Now, I'm using my dev machine.
>
> With the tests I'm doing, I can see the following:
>
> If I use:
>
> select 'seléct' as "seléct";
>
> column name returns ok as expected.
>
> If I do:
>
> select 'seléct' as seléct;
>
>
> This is the sequence of bytes I receive from postgresql:
>
> byte1 - 115 UTF-8 for s
> byte2 - 101 UTF-8 for e
> byte3 - 108 UTF-8 for l
> byte4 - 227
> byte5 - 169
> byte6 - 99 UTF-8 for c
> byte7 - 116 UTF-8 for t
>
>
> The problem lies in the byte4.
> According to [1], the first byte defines how many bytes will compose
> the UTF-8 char. the problem is that 227 encodes to a binary value of
> 1110 0011 and so, the UTF-8 decoder will think there are 3 bytes in
> sequence when actually there are only 2! :( And this seems to be the
> root of the problem for me.
>
>
> For the select value the correct byte is returned:
>
> byte1 - 115 UTF-8 for s
> byte2 - 101 UTF-8 for e
> byte3 - 108 UTF-8 for l
> byte4 - 195
> byte5 - 169
> byte6 - 99 UTF-8 for c
> byte7 - 116 UTF-8 for t
>
>
> Where 195 is 1100 0011 which gives two bytes in sequence and the
> decoder can decode this to the U+00E9 which is the char "é"
>
> Do you think this can be related to my machine? I'm using OSX 10.6.6
> and I compiled postgresql 9.0.1 from source code.
>
> Thanks in advance.
>
>
>
>
> [1] - http://en.wikipedia.org/wiki/UTF-8
>
>
>
>
> On Tue, Mar 15, 2011 at 15:52, Francisco Figueiredo Jr.
> <francisco@npgsql.org> wrote:
>> Hmmmmmmmm,
>>
>> What would change the encoding of the identifiers?
>>
>> Because on my dev machine which unfortunately isn't with me right now
>> I can't get the identifier returned correctly :(
>>
>> I remember that it returns:
>>
>>  test=*# select 'tést' as tést;
>>   tst
>>  ------
>>   tést
>>
>> Is there any config I can change at runtime in order to have it
>> returned correctly?
>>
>> Thanks in advance.
>>
>>
>> On Tue, Mar 15, 2011 at 15:45, Andreas Kretschmer
>> <akretschmer@spamfence.net> wrote:
>>> Francisco Figueiredo Jr. <francisco@npgsql.org> wrote:
>>>
>>>>
>>>> What happens if you remove the double quotes in the column name identifier?
>>>
>>> the same:
>>>
>>> test=*# select 'tést' as tést;
>>>  tést
>>> ------
>>>  tést
>>> (1 Zeile)
>>>
>>>
>>>
>>> Andreas
>>> --
>>> Really, I'm not out to destroy Microsoft. That will just be a completely
>>> unintentional side effect.                              (Linus Torvalds)
>>> "If I was god, I would recompile penguin with --enable-fly."   (unknown)
>>> Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°
>>>
>>> --
>>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>>> To make changes to your subscription:
>>> http://www.postgresql.org/mailpref/pgsql-general
>>>
>>
>>
>>
>> --
>> Regards,
>>
>> Francisco Figueiredo Jr.
>> Npgsql Lead Developer
>> http://www.npgsql.org
>> http://fxjr.blogspot.com
>> http://twitter.com/franciscojunior
>>
>
>
>
> --
> Regards,
>
> Francisco Figueiredo Jr.
> Npgsql Lead Developer
> http://www.npgsql.org
> http://fxjr.blogspot.com
> http://twitter.com/franciscojunior
>



--
Regards,

Francisco Figueiredo Jr.
Npgsql Lead Developer
http://www.npgsql.org
http://fxjr.blogspot.com
http://twitter.com/franciscojunior

"Francisco Figueiredo Jr." <francisco@npgsql.org> writes:
> Would it be possible that Postgresql would be using another encoding
> for the identifiers when they aren't wrapped by double quotes?

No.  I'm betting this is a client-side bug ... but you haven't told us
what the client-side code is.

            regards, tom lane