Server/Client Encoding Errors

From: APseudoUtopia

Hey,

I'm having some problems when inserting special characters into a
column. Here's the table:

----------------------------------
                                     Table "public.users_history_ip"
   Column   |            Type             |                           Modifiers
------------+-----------------------------+---------------------------------------------------------------
 id         | bigint                      | not null default nextval('users_history_ip_id_seq'::regclass)
 userid     | integer                     | not null
 ip         | inet                        | not null
 hostname   | character varying(512)      | not null
 geoip_info | character varying(512)      | not null
 start_time | timestamp without time zone | not null
 last_seen  | timestamp without time zone | not null
 type       | ip_history_type             | not null
Indexes:
    "users_history_ip_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
    "users_history_ip_userid_fkey" FOREIGN KEY (userid) REFERENCES users_main(id) ON DELETE CASCADE
----------------------------------

I'm trying to insert information into the geoip_info column. Here's
some of the information that I'm trying to insert, and the errors:

'Portugal, 09, Vila Real De Santo António'
ERROR:  invalid byte sequence for encoding "UTF8": 0xf36e696f

'Norway, 08, Ålesund'
ERROR:  invalid byte sequence for encoding "UTF8": 0xc56c

'Portugal, 04, Vila Nova De Famalicão'
ERROR:  invalid byte sequence for encoding "UTF8": 0xe36f2c

The locale on the server is "C" and the encoding is "UTF8". I thought
the UTF8 encoding would allow characters like these. Why is it
rejecting them?
Note that the GeoIP info is generated automatically by a module, so I
am unable to determine exactly which characters will be returned.

Thanks for the help.

Re: Server/Client Encoding Errors

From: "Albe Laurenz"

APseudoUtopia wrote:
> The locale on the server is "C" and the encoding is "UTF8". I thought
> the UTF8 encoding would allow characters like this? Why is it
> disallowing it?
> Note, the GeoIP info is generated automatically by a module, so I am
> unable to determine exactly what characters will be returned.

The UTF8 encoding allows you to store the characters ó, Å and ã, but
the bytes you send have to be correctly UTF-8 encoded.
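To make the difference concrete, here is a small Python sketch (Python is only an illustration; nothing in the thread uses it) that compares the UTF-8 and LATIN1 encodings of the three characters:

```python
# Compare the UTF-8 and LATIN1 (ISO 8859-1) encodings of the accented
# characters from the failing GeoIP strings.
table = {ch: (ch.encode("utf-8").hex(), ch.encode("latin-1").hex()) for ch in "óÅã"}

for ch, (utf8_hex, latin1_hex) in table.items():
    # Each of these characters takes two bytes in UTF-8 but only one in LATIN1.
    print(f"{ch}  UTF-8: {utf8_hex}  LATIN1: {latin1_hex}")
```

The LATIN1 bytes f3, c5 and e3 are exactly the first bytes reported in the three error messages.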

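Judging from the error messages, your client_encoding is set to UTF8,
but the data you send are encoded in LATIN1 or WIN1252. In those
encodings ó, Å and ã are the single bytes 0xf3, 0xc5 and 0xe3. In UTF-8
such a byte can only start a multi-byte sequence and must be followed by
continuation bytes in the range 0x80-0xbf; here it is followed by plain
ASCII ("n", "l", "o"), so the server rejects the string.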
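You can check this diagnosis offline by feeding the byte sequences from the error messages to a UTF-8 and a LATIN1 decoder. A minimal Python sketch (the helper name `is_valid_utf8` is made up for this example):

```python
# Byte sequences copied from the three PostgreSQL error messages.
bad_sequences = [
    bytes.fromhex("f36e696f"),  # "ónio" in "António"
    bytes.fromhex("c56c"),      # "Ål" in "Ålesund"
    bytes.fromhex("e36f2c"),    # "ão" in "Famalicão", then a comma
]

def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

for seq in bad_sequences:
    # None of these are valid UTF-8, but all of them decode as LATIN1.
    print(seq.hex(), is_valid_utf8(seq), repr(seq.decode("latin-1")))
```

Any byte sequence decodes as LATIN1, so that alone proves nothing. The evidence is that the LATIN1 reading produces exactly the letters you expected to see.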

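If your application sends LATIN1 data, set client_encoding to LATIN1
(for example with SET client_encoding TO 'LATIN1';) so that PostgreSQL
can correctly convert the characters to UTF-8. If the data are really
WIN1252, set client_encoding to WIN1252 instead.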
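With client_encoding set to LATIN1, the server performs a LATIN1 to UTF-8 conversion. If you would rather leave client_encoding at UTF8, you can do the same conversion in the client before the INSERT. A minimal Python sketch, assuming the GeoIP module hands you raw LATIN1 bytes (`latin1_to_utf8` is a hypothetical name):

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode LATIN1 (ISO 8859-1) bytes as UTF-8.

    Every byte value is valid LATIN1, so this never raises, but it will
    silently produce wrong characters if raw is really in another encoding.
    """
    return raw.decode("latin-1").encode("utf-8")

# Assumption: this is what the GeoIP module emits.
raw = "Norway, 08, Ålesund".encode("latin-1")
fixed = latin1_to_utf8(raw)

print(raw.hex())              # contains the single byte c5 for Å
print(fixed.hex())            # contains c385 for Å instead
print(fixed.decode("utf-8"))  # Norway, 08, Ålesund
```

If the module actually produces WIN1252, use Python's "cp1252" codec instead. It differs from LATIN1 only in the 0x80-0x9f range, so the characters in this thread come out the same either way.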

Yours,
Laurenz Albe