Thread: characters converted to ??? in postgres
Hi all, Got the following thing : ≠, ≤, and ≥ store in the database as question marks according to one of my developers. I have postgres installed on both MAC OS X and Centos 7 All locale on both point to UTF8 LANG=en_US.UTF-8 LC_CTYPE="en_US.UTF-8" LC_NUMERIC="en_US.UTF-8" LC_TIME="en_US.UTF-8" LC_COLLATE="en_US.UTF-8" LC_MONETARY="en_US.UTF-8" LC_MESSAGES="en_US.UTF-8" LC_PAPER="en_US.UTF-8" LC_NAME="en_US.UTF-8" LC_ADDRESS="en_US.UTF-8" LC_TELEPHONE="en_US.UTF-8" LC_MEASUREMENT="en_US.UTF-8" LC_IDENTIFICATION="en_US.UTF-8" LC_ALL= insert into jt1 values ('≤') ; This I can run either copy/paste which is case 1 (which does reproduce the developer issue) , or have it in an sql scriptwhich case 2 On OS X: - case 1 fails testdb=# insert into jt1 values ('??') ; ERROR: invalid byte sequence for encoding "UTF8": 0xe2 0xa4 0x27 Note that at paste time ≤ changed in ?? - case 2 is fine - echo -n '≤' |hexdump -C 00000000 e2 89 a4 |...| 00000003 On Centos: - Both cases are fine - echo -n '≤' |hexdump -C 00000000 e2 89 a4 |...| 00000003 http://www.fileformat.info/info/unicode/char/2264/index.htm UTF-8 (hex) 0xE2 0x89 0xA4 (e289a4) So to me the representation is fine in all cases. Also in all cases my encoding is UTF8. I am trying to understand in OS X where does the change occur ? What is causing the failure ? In the bigger picture a developer complained about this failure and I am fairly sure this is not a postgres issue but I needto prove it Many thanks for help -- Armand
On 1/11/18 16:34, armand pirvu wrote: > On OS X: > - case 1 fails > testdb=# insert into jt1 values ('??') ; > ERROR: invalid byte sequence for encoding "UTF8": 0xe2 0xa4 0x27 > Note that at paste time ≤ changed in ?? This looks like something is wrong with your libedit library. Try running psql with the -n option. If that helps, then look into building psql with libreadline instead. Because libedit is terrible. > - case 2 is fine > - echo -n '≤' |hexdump -C > 00000000 e2 89 a4 |...| > 00000003 -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi Peter The -n flag worked fine on OS X. I don’t have this issue on Centos. As a side question I wonder why was postgres built withlibedit instead of libreadline , just curious. Back to my developer issue , he is using what he calls a data object in java. Apparently this is the place where this badconversion happens, in other words it passes to the backend the ?? characters Any similar trick I could use on the postgres jdbc driver ? Will report more once I find more from him Many thanks Armand > On Jan 11, 2018, at 4:12 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > > On 1/11/18 16:34, armand pirvu wrote: >> On OS X: >> - case 1 fails >> testdb=# insert into jt1 values ('??') ; >> ERROR: invalid byte sequence for encoding "UTF8": 0xe2 0xa4 0x27 >> Note that at paste time ≤ changed in ?? > > This looks like something is wrong with your libedit library. Try > running psql with the -n option. If that helps, then look into building > psql with libreadline instead. Because libedit is terrible. > >> - case 2 is fine >> - echo -n '≤' |hexdump -C >> 00000000 e2 89 a4 |...| >> 00000003 > > -- > Peter Eisentraut http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 1/11/18 20:52, armand pirvu wrote: > The -n flag worked fine on OS X. I don’t have this issue on Centos. As a side question I wonder why was postgres builtwith libedit instead of libreadline , just curious. You'll have to ask that of whoever built the binaries you are using. There are different sources. libedit is part of the macOS operating system, whereas libreadline has to be obtained separately, so that's probably a reason. > Back to my developer issue , he is using what he calls a data object in java. Apparently this is the place where this badconversion happens, in other words it passes to the backend the ?? characters > Any similar trick I could use on the postgres jdbc driver ? That appears to be a completely separate issue. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Yes it appears to be a separate issue but using OS X is the only way I could reproduce trying to see with my own eyes soto speak I did though a quick java program and that confirmed to me (at least so far) that the whole issue resides in the so calleddata object Thank you so much for help and insight — Armand > On Jan 12, 2018, at 8:10 AM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > > On 1/11/18 20:52, armand pirvu wrote: >> The -n flag worked fine on OS X. I don’t have this issue on Centos. As a side question I wonder why was postgres builtwith libedit instead of libreadline , just curious. > > You'll have to ask that of whoever built the binaries you are using. > There are different sources. > > libedit is part of the macOS operating system, whereas libreadline has > to be obtained separately, so that's probably a reason. > >> Back to my developer issue , he is using what he calls a data object in java. Apparently this is the place where thisbad conversion happens, in other words it passes to the backend the ?? characters >> Any similar trick I could use on the postgres jdbc driver ? > > That appears to be a completely separate issue. > > -- > Peter Eisentraut http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services