Thread: Different encoding for string values and identifier strings? Or (select 'tést' as tést) returns different values for string and identifier...
Different encoding for string values and identifier strings? Or (select 'tést' as tést) returns different values for string and identifier...
From
"Francisco Figueiredo Jr."
Date:
Hi all! I received a bug report about problems with identifiers. If I do the following select: select 1 as tést The column name returned contains strange chars. If I do: select 'tést' as tést the select value returned is ok but the column name is not. :( Is there any configuration I need to change to specify the encoding of the identifiers? I'm using UTF-8 for my database. Thanks in advance. -- Regards, Francisco Figueiredo Jr. Npgsql Lead Developer http://www.npgsql.org http://fxjr.blogspot.com http://twitter.com/franciscojunior
Re: Different encoding for string values and identifier strings? Or (select 'tést' as tést) returns different values for string andidentifier...
From
Andreas Kretschmer
Date:
Francisco Figueiredo Jr. <francisco@npgsql.org> wrote: > Hi all! > > I received a bug report about problems with identifiers. > > If I do the following select: > > select 1 as tést > > The column name returned contains strange chars. > > If I do: > > select 'tést' as tést > > the select value returned is ok but the column name is not. :( > > Is there any configuration I need to change to specify the encoding of > the identifiers? test=# select 'tést' as "tést"; tést ------ tést looks okay for me ... Andreas -- Really, I'm not out to destroy Microsoft. That will just be a completely unintentional side effect. (Linus Torvalds) "If I was god, I would recompile penguin with --enable-fly." (unknown) Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°
Re: [GENERAL] Different encoding for string values and identifier strings? Or (select 'tést' as tést) returns different values for string and identifier...
From
"Francisco Figueiredo Jr."
Date:
This is the query our user sent as one of the problems: select "zurück" as zurück3_103_ from "Tabelle" tabellenhib0_ And here is the link to our bug report: http://pgfoundry.org/tracker/index.php?func=detail&aid=1010988&group_id=1000140&atid=590 I didn't test putting double quotes in the column name identifier as the bug report isn't using it. What happens if you remove the double quotes in the column name identifier? Thanks in advance. On Tue, Mar 15, 2011 at 15:21, Andreas Kretschmer <akretschmer@spamfence.net> wrote: > Francisco Figueiredo Jr. <francisco@npgsql.org> wrote: > >> Hi all! >> >> I received a bug report about problems with identifiers. >> >> If I do the following select: >> >> select 1 as tést >> >> The column name returned contains strange chars. >> >> If I do: >> >> select 'tést' as tést >> >> the select value returned is ok but the column name is not. :( >> >> Is there any configuration I need to change to specify the encoding of >> the identifiers? > > test=# select 'tést' as "tést"; > tést > ------ > tést > > looks okay for me ... > > > Andreas > -- > Really, I'm not out to destroy Microsoft. That will just be a completely > unintentional side effect. (Linus Torvalds) > "If I was god, I would recompile penguin with --enable-fly." (unknown) > Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889° > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general > -- Regards, Francisco Figueiredo Jr. Npgsql Lead Developer http://www.npgsql.org http://fxjr.blogspot.com http://twitter.com/franciscojunior
Re: Re: [GENERAL] Different encoding for string values and identifier strings? Or (select 'tést' as tést) returns different values for string andidentifier...
From
Andreas Kretschmer
Date:
Francisco Figueiredo Jr. <francisco@npgsql.org> wrote: > > What happens if you remove the double quotes in the column name identifier? the same: test=*# select 'tést' as tést; tést ------ tést (1 Zeile) Andreas -- Really, I'm not out to destroy Microsoft. That will just be a completely unintentional side effect. (Linus Torvalds) "If I was god, I would recompile penguin with --enable-fly." (unknown) Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°
Re: [GENERAL] Re: [GENERAL] Different encoding for string values and identifier strings? Or (select 'tést' as tést) returns different values for string and identifier...
From
"Francisco Figueiredo Jr."
Date:
Hmmmmmmmm, What would change the encoding of the identifiers? Because on my dev machine which unfortunately isn't with me right now I can't get the identifier returned correctly :( I remember that it returns: test=*# select 'tést' as tést; tst ------ tést Is there any config I can change at runtime in order to have it returned correctly? Thanks in advance. On Tue, Mar 15, 2011 at 15:45, Andreas Kretschmer <akretschmer@spamfence.net> wrote: > Francisco Figueiredo Jr. <francisco@npgsql.org> wrote: > >> >> What happens if you remove the double quotes in the column name identifier? > > the same: > > test=*# select 'tést' as tést; > tést > ------ > tést > (1 Zeile) > > > > Andreas > -- > Really, I'm not out to destroy Microsoft. That will just be a completely > unintentional side effect. (Linus Torvalds) > "If I was god, I would recompile penguin with --enable-fly." (unknown) > Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889° > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general > -- Regards, Francisco Figueiredo Jr. Npgsql Lead Developer http://www.npgsql.org http://fxjr.blogspot.com http://twitter.com/franciscojunior
Re: [GENERAL] Re: [GENERAL] Different encoding for string values and identifier strings? Or (select 'tést' as tést) returns different values for string and identifier...
From
"Francisco Figueiredo Jr."
Date:
Now, I'm using my dev machine. With the tests I'm doing, I can see the following: If I use: select 'seléct' as "seléct"; column name returns ok as expected. If I do: select 'seléct' as seléct; This is the sequence of bytes I receive from postgresql: byte1 - 115 UTF-8 for s byte2 - 101 UTF-8 for e byte3 - 108 UTF-8 for l byte4 - 227 byte5 - 169 byte6 - 99 UTF-8 for c byte7 - 116 UTF-8 for t The problem lies in the byte4. According to [1], the first byte defines how many bytes will compose the UTF-8 char. the problem is that 227 encodes to a binary value of 1110 0011 and so, the UTF-8 decoder will think there are 3 bytes in sequence when actually there are only 2! :( And this seems to be the root of the problem for me. For the select value the correct byte is returned: byte1 - 115 UTF-8 for s byte2 - 101 UTF-8 for e byte3 - 108 UTF-8 for l byte4 - 195 byte5 - 169 byte6 - 99 UTF-8 for c byte7 - 116 UTF-8 for t Where 195 is 1100 0011 which gives two bytes in sequence and the decoder can decode this to the U+00E9 which is the char "é" Do you think this can be related to my machine? I'm using OSX 10.6.6 and I compiled postgresql 9.0.1 from source code. Thanks in advance. [1] - http://en.wikipedia.org/wiki/UTF-8 On Tue, Mar 15, 2011 at 15:52, Francisco Figueiredo Jr. <francisco@npgsql.org> wrote: > Hmmmmmmmm, > > What would change the encoding of the identifiers? > > Because on my dev machine which unfortunately isn't with me right now > I can't get the identifier returned correctly :( > > I remember that it returns: > > test=*# select 'tést' as tést; > tst > ------ > tést > > Is there any config I can change at runtime in order to have it > returned correctly? > > Thanks in advance. > > > On Tue, Mar 15, 2011 at 15:45, Andreas Kretschmer > <akretschmer@spamfence.net> wrote: >> Francisco Figueiredo Jr. <francisco@npgsql.org> wrote: >> >>> >>> What happens if you remove the double quotes in the column name identifier? >> >> the same: >> >> test=*# select 'tést' as tést; >> tést >> ------ >> tést >> (1 Zeile) >> >> >> >> Andreas >> -- >> Really, I'm not out to destroy Microsoft. That will just be a completely >> unintentional side effect. (Linus Torvalds) >> "If I was god, I would recompile penguin with --enable-fly." (unknown) >> Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889° >> >> -- >> Sent via pgsql-general mailing list (pgsql-general@postgresql.org) >> To make changes to your subscription: >> http://www.postgresql.org/mailpref/pgsql-general >> > > > > -- > Regards, > > Francisco Figueiredo Jr. > Npgsql Lead Developer > http://www.npgsql.org > http://fxjr.blogspot.com > http://twitter.com/franciscojunior > -- Regards, Francisco Figueiredo Jr. Npgsql Lead Developer http://www.npgsql.org http://fxjr.blogspot.com http://twitter.com/franciscojunior
Re: [GENERAL] Re: [GENERAL] Different encoding for string values and identifier strings? Or (select 'tést' as tést) returns different values for string and identifier...
From
"Francisco Figueiredo Jr."
Date:
Any ideas?? Would it be possible that Postgresql would be using another encoding for the identifiers when they aren't wrapped by double quotes? On Tue, Mar 15, 2011 at 23:37, Francisco Figueiredo Jr. <francisco@npgsql.org> wrote: > Now, I'm using my dev machine. > > With the tests I'm doing, I can see the following: > > If I use: > > select 'seléct' as "seléct"; > > column name returns ok as expected. > > If I do: > > select 'seléct' as seléct; > > > This is the sequence of bytes I receive from postgresql: > > byte1 - 115 UTF-8 for s > byte2 - 101 UTF-8 for e > byte3 - 108 UTF-8 for l > byte4 - 227 > byte5 - 169 > byte6 - 99 UTF-8 for c > byte7 - 116 UTF-8 for t > > > The problem lies in the byte4. > According to [1], the first byte defines how many bytes will compose > the UTF-8 char. the problem is that 227 encodes to a binary value of > 1110 0011 and so, the UTF-8 decoder will think there are 3 bytes in > sequence when actually there are only 2! :( And this seems to be the > root of the problem for me. > > > For the select value the correct byte is returned: > > byte1 - 115 UTF-8 for s > byte2 - 101 UTF-8 for e > byte3 - 108 UTF-8 for l > byte4 - 195 > byte5 - 169 > byte6 - 99 UTF-8 for c > byte7 - 116 UTF-8 for t > > > Where 195 is 1100 0011 which gives two bytes in sequence and the > decoder can decode this to the U+00E9 which is the char "é" > > Do you think this can be related to my machine? I'm using OSX 10.6.6 > and I compiled postgresql 9.0.1 from source code. > > Thanks in advance. > > > > > [1] - http://en.wikipedia.org/wiki/UTF-8 > > > > > On Tue, Mar 15, 2011 at 15:52, Francisco Figueiredo Jr. > <francisco@npgsql.org> wrote: >> Hmmmmmmmm, >> >> What would change the encoding of the identifiers? >> >> Because on my dev machine which unfortunately isn't with me right now >> I can't get the identifier returned correctly :( >> >> I remember that it returns: >> >> test=*# select 'tést' as tést; >> tst >> ------ >> tést >> >> Is there any config I can change at runtime in order to have it >> returned correctly? >> >> Thanks in advance. >> >> >> On Tue, Mar 15, 2011 at 15:45, Andreas Kretschmer >> <akretschmer@spamfence.net> wrote: >>> Francisco Figueiredo Jr. <francisco@npgsql.org> wrote: >>> >>>> >>>> What happens if you remove the double quotes in the column name identifier? >>> >>> the same: >>> >>> test=*# select 'tést' as tést; >>> tést >>> ------ >>> tést >>> (1 Zeile) >>> >>> >>> >>> Andreas >>> -- >>> Really, I'm not out to destroy Microsoft. That will just be a completely >>> unintentional side effect. (Linus Torvalds) >>> "If I was god, I would recompile penguin with --enable-fly." (unknown) >>> Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889° >>> >>> -- >>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org) >>> To make changes to your subscription: >>> http://www.postgresql.org/mailpref/pgsql-general >>> >> >> >> >> -- >> Regards, >> >> Francisco Figueiredo Jr. >> Npgsql Lead Developer >> http://www.npgsql.org >> http://fxjr.blogspot.com >> http://twitter.com/franciscojunior >> > > > > -- > Regards, > > Francisco Figueiredo Jr. > Npgsql Lead Developer > http://www.npgsql.org > http://fxjr.blogspot.com > http://twitter.com/franciscojunior > -- Regards, Francisco Figueiredo Jr. Npgsql Lead Developer http://www.npgsql.org http://fxjr.blogspot.com http://twitter.com/franciscojunior
Re: Re: [GENERAL] Re: [GENERAL] Different encoding for string values and identifier strings? Or (select 'tést' as tést) returns different values for string and identifier...
From
Tom Lane
Date:
"Francisco Figueiredo Jr." <francisco@npgsql.org> writes: > Would it be possible that Postgresql would be using another encoding > for the identifiers when they aren't wrapped by double quotes? No. I'm betting this is a client-side bug ... but you haven't told us what the client-side code is. regards, tom lane