Thread: characters converted to ??? in postgres

characters converted to ??? in postgres

From
armand pirvu
Date:
Hi all,


Got the following thing :  ≠, ≤, and ≥ store in the database as question marks according to one of my developers.


I have postgres installed on both MAC OS X and Centos 7

All locale on both point to UTF8

LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

insert into jt1 values ('≤') ;

This I can run either copy/paste which is case 1 (which does reproduce the developer issue) , or have it in an sql
scriptwhich case 2 


On OS X:
- case 1 fails
testdb=# insert into jt1 values ('??') ;
ERROR:  invalid byte sequence for encoding "UTF8": 0xe2 0xa4 0x27
Note that at paste time  ≤ changed in ??
- case 2 is fine
- echo -n '≤' |hexdump -C
00000000  e2 89 a4                                          |...|
00000003

On Centos:
- Both cases are fine
- echo -n '≤' |hexdump -C
00000000  e2 89 a4                                          |...|
00000003

http://www.fileformat.info/info/unicode/char/2264/index.htm
UTF-8 (hex)    0xE2 0x89 0xA4 (e289a4)


So to me the representation is fine in all cases. Also in all cases my encoding is UTF8.
I am trying to understand in OS X where does the change occur ? What is causing the failure ?
In the bigger picture a developer complained about this failure and I am fairly sure this is not a postgres issue but I
needto prove it 


Many thanks for help

-- Armand









Re: characters converted to ??? in postgres

From
Peter Eisentraut
Date:
On 1/11/18 16:34, armand pirvu wrote:
> On OS X:
> - case 1 fails
> testdb=# insert into jt1 values ('??') ;
> ERROR:  invalid byte sequence for encoding "UTF8": 0xe2 0xa4 0x27
> Note that at paste time  ≤ changed in ??

This looks like something is wrong with your libedit library.  Try
running psql with the -n option.  If that helps, then look into building
psql with libreadline instead.  Because libedit is terrible.

> - case 2 is fine
> - echo -n '≤' |hexdump -C
> 00000000  e2 89 a4                                          |...|
> 00000003

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: characters converted to ??? in postgres

From
armand pirvu
Date:
Hi Peter

The -n flag worked fine on OS X. I don’t have this issue on Centos. As a side question I wonder why was postgres built
withlibedit instead of libreadline , just curious. 
Back to my developer issue , he is using what he calls a data object in java. Apparently this is the place where this
badconversion happens, in other words it passes to the backend the ?? characters 
Any similar trick I could use on the postgres jdbc driver ?

Will report more once I find more from him


Many thanks
Armand


> On Jan 11, 2018, at 4:12 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
>
> On 1/11/18 16:34, armand pirvu wrote:
>> On OS X:
>> - case 1 fails
>> testdb=# insert into jt1 values ('??') ;
>> ERROR:  invalid byte sequence for encoding "UTF8": 0xe2 0xa4 0x27
>> Note that at paste time  ≤ changed in ??
>
> This looks like something is wrong with your libedit library.  Try
> running psql with the -n option.  If that helps, then look into building
> psql with libreadline instead.  Because libedit is terrible.
>
>> - case 2 is fine
>> - echo -n '≤' |hexdump -C
>> 00000000  e2 89 a4                                          |...|
>> 00000003
>
> --
> Peter Eisentraut              http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: characters converted to ??? in postgres

From
Peter Eisentraut
Date:
On 1/11/18 20:52, armand pirvu wrote:
> The -n flag worked fine on OS X. I don’t have this issue on Centos. As a side question I wonder why was postgres
builtwith libedit instead of libreadline , just curious.
 

You'll have to ask that of whoever built the binaries you are using.
There are different sources.

libedit is part of the macOS operating system, whereas libreadline has
to be obtained separately, so that's probably a reason.

> Back to my developer issue , he is using what he calls a data object in java. Apparently this is the place where this
badconversion happens, in other words it passes to the backend the ?? characters
 
> Any similar trick I could use on the postgres jdbc driver ? 

That appears to be a completely separate issue.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: characters converted to ??? in postgres

From
armand pirvu
Date:
Yes it appears to be a separate issue but using OS X is the only way I could reproduce trying to see with my own eyes
soto speak 
I did though a quick java program and that confirmed to me (at least so far) that the whole issue resides in the so
calleddata object 

Thank you so much for help and insight

— Armand



> On Jan 12, 2018, at 8:10 AM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
>
> On 1/11/18 20:52, armand pirvu wrote:
>> The -n flag worked fine on OS X. I don’t have this issue on Centos. As a side question I wonder why was postgres
builtwith libedit instead of libreadline , just curious. 
>
> You'll have to ask that of whoever built the binaries you are using.
> There are different sources.
>
> libedit is part of the macOS operating system, whereas libreadline has
> to be obtained separately, so that's probably a reason.
>
>> Back to my developer issue , he is using what he calls a data object in java. Apparently this is the place where
thisbad conversion happens, in other words it passes to the backend the ?? characters 
>> Any similar trick I could use on the postgres jdbc driver ?
>
> That appears to be a completely separate issue.
>
> --
> Peter Eisentraut              http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services