Thread: UTF8 problem

UTF8 problem

From
"Kai Otto"
Date:

Hi all,

 

I have created a database using version 9.0 and set the client encoding to UTF* in the file postgresql.conf

Runnig the query:

INSERT INTO "JapaneseTest" ("ID", "name") Values(2, '\x83}\x83C \x83h\x83L\x83\x85\x83\x81\x83\x93\x83g (My Documents)')

 

Results in:

 

ERROR:  invalid byte sequence for encoding "UTF8": 0x83

 

********** Error **********

 

ERROR: invalid byte sequence for encoding "UTF8": 0x83

SQL state: 22021

 

I am running the query in pgAdmin III

 

 

My table looks like:

 

CREATE TABLE "JapaneseTest"

(

  "ID" bigint NOT NULL,

  "name" text,

  CONSTRAINT "JapaneseTest_pkey" PRIMARY KEY ("ID")

)

WITH (

  OIDS=FALSE

);

ALTER TABLE "JapaneseTest" OWNER TO postgres;

GRANT ALL ON TABLE "JapaneseTest" TO public;

GRANT ALL ON TABLE "JapaneseTest" TO postgres;

 

Can anyone help me?

 

Thanks, MeOtto

Re: UTF8 problem

From
"Jean-Yves F. Barbier"
Date:
On Thu, 17 Nov 2011 13:05:39 +0100
"Kai Otto" <Kai@medis.nl> wrote:

> I have created a database using version 9.0 and set the client encoding
> to UTF* in the file postgresql.conf
>
> Runnig the query:
>
> INSERT INTO "JapaneseTest" ("ID", "name") Values(2, '\x83}\x83C
> \x83h\x83L\x83\x85\x83\x81\x83\x93\x83g (My Documents)')

Works perfectly here: Linux, Pg v9.1, pgdamin3 v1.14.0, svr+cli in UTF-8.

What is the DB collation?

> Results in:
> ERROR:  invalid byte sequence for encoding "UTF8": 0x83
> ********** Error **********
> ERROR: invalid byte sequence for encoding "UTF8": 0x83
> SQL state: 22021

== character not in repertoire

> I am running the query in pgAdmin III
> My table looks like:
> CREATE TABLE "JapaneseTest" (
>   "ID" bigint NOT NULL,
>   "name" text,
>   CONSTRAINT "JapaneseTest_pkey" PRIMARY KEY ("ID")
> ) WITH ( OIDS=FALSE );

Are there any *good* reasons to use double quotes everywhere?
(that double complicate your live)

> ALTER TABLE "JapaneseTest" OWNER TO postgres;
> GRANT ALL ON TABLE "JapaneseTest" TO public;
> GRANT ALL ON TABLE "JapaneseTest" TO postgres;

Last line is useless as you already set ownership to user 'postgres'
which gives him the whole control of this table.

--
To be loved is very demoralizing.
        -- Katharine Hepburn

Re: UTF8 problem

From
"Kai Otto"
Date:
Hi,

Two questions:
1. Where can I find that? (I am using Win xp 32 in Europe)
2. I have set the client encoding to UTF8 in the file postgresql.conf.
How do I set the server config? And if this is the server config how do
I set the client config?


> -----Original Message-----
> From: pgsql-novice-owner@postgresql.org [mailto:pgsql-novice-
> owner@postgresql.org] On Behalf Of Jean-Yves F. Barbier
> Sent: Thursday, November 17, 2011 3:31 PM
> To: pgsql-novice@postgresql.org
> Subject: Re: [NOVICE] UTF8 problem
>
> On Thu, 17 Nov 2011 13:05:39 +0100
> "Kai Otto" <Kai@medis.nl> wrote:
>
> > I have created a database using version 9.0 and set the client
> encoding
> > to UTF* in the file postgresql.conf
> >
> > Runnig the query:
> >
> > INSERT INTO "JapaneseTest" ("ID", "name") Values(2, '\x83}\x83C
> > \x83h\x83L\x83\x85\x83\x81\x83\x93\x83g (My Documents)')
>
> Works perfectly here: Linux, Pg v9.1, pgdamin3 v1.14.0, svr+cli in
UTF-
> 8.
>
> What is the DB collation?
>
> > Results in:
> > ERROR:  invalid byte sequence for encoding "UTF8": 0x83
> > ********** Error **********
> > ERROR: invalid byte sequence for encoding "UTF8": 0x83
> > SQL state: 22021
>
> == character not in repertoire
>
> > I am running the query in pgAdmin III
> > My table looks like:
> > CREATE TABLE "JapaneseTest" (
> >   "ID" bigint NOT NULL,
> >   "name" text,
> >   CONSTRAINT "JapaneseTest_pkey" PRIMARY KEY ("ID")
> > ) WITH ( OIDS=FALSE );
>
> Are there any *good* reasons to use double quotes everywhere?
> (that double complicate your live)
>
> > ALTER TABLE "JapaneseTest" OWNER TO postgres;
> > GRANT ALL ON TABLE "JapaneseTest" TO public;
> > GRANT ALL ON TABLE "JapaneseTest" TO postgres;
>
> Last line is useless as you already set ownership to user 'postgres'
> which gives him the whole control of this table.
>
> --
> To be loved is very demoralizing.
>         -- Katharine Hepburn
>
> --
> Sent via pgsql-novice mailing list (pgsql-novice@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-novice

Re: UTF8 problem

From
"Kai Otto"
Date:
I have created the database like this:
CREATE DATABASE postgres
  WITH OWNER = postgres
       ENCODING = 'UTF8'
       TABLESPACE = pg_default
       LC_COLLATE = 'English_United States.1252'
       LC_CTYPE = 'English_United States.1252'
       CONNECTION LIMIT = -1;

ALTER DEFAULT PRIVILEGES
    GRANT INSERT, SELECT, UPDATE, DELETE, TRUNCATE, REFERENCES, TRIGGER
ON TABLES
    TO public;

ALTER DEFAULT PRIVILEGES
    GRANT INSERT, SELECT, UPDATE, DELETE, TRUNCATE, REFERENCES, TRIGGER
ON TABLES
    TO postgres;

Does that help?

> -----Original Message-----
> From: pgsql-novice-owner@postgresql.org [mailto:pgsql-novice-
> owner@postgresql.org] On Behalf Of Jean-Yves F. Barbier
> Sent: Thursday, November 17, 2011 3:31 PM
> To: pgsql-novice@postgresql.org
> Subject: Re: [NOVICE] UTF8 problem
>
> On Thu, 17 Nov 2011 13:05:39 +0100
> "Kai Otto" <Kai@medis.nl> wrote:
>
> > I have created a database using version 9.0 and set the client
> encoding
> > to UTF* in the file postgresql.conf
> >
> > Runnig the query:
> >
> > INSERT INTO "JapaneseTest" ("ID", "name") Values(2, '\x83}\x83C
> > \x83h\x83L\x83\x85\x83\x81\x83\x93\x83g (My Documents)')
>
> Works perfectly here: Linux, Pg v9.1, pgdamin3 v1.14.0, svr+cli in
UTF-
> 8.
>
> What is the DB collation?
>
> > Results in:
> > ERROR:  invalid byte sequence for encoding "UTF8": 0x83
> > ********** Error **********
> > ERROR: invalid byte sequence for encoding "UTF8": 0x83
> > SQL state: 22021
>
> == character not in repertoire
>
> > I am running the query in pgAdmin III
> > My table looks like:
> > CREATE TABLE "JapaneseTest" (
> >   "ID" bigint NOT NULL,
> >   "name" text,
> >   CONSTRAINT "JapaneseTest_pkey" PRIMARY KEY ("ID")
> > ) WITH ( OIDS=FALSE );
>
> Are there any *good* reasons to use double quotes everywhere?
> (that double complicate your live)
>
> > ALTER TABLE "JapaneseTest" OWNER TO postgres;
> > GRANT ALL ON TABLE "JapaneseTest" TO public;
> > GRANT ALL ON TABLE "JapaneseTest" TO postgres;
>
> Last line is useless as you already set ownership to user 'postgres'
> which gives him the whole control of this table.
>
> --
> To be loved is very demoralizing.
>         -- Katharine Hepburn
>
> --
> Sent via pgsql-novice mailing list (pgsql-novice@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-novice

Re: UTF8 problem

From
"Jean-Yves F. Barbier"
Date:
On Thu, 17 Nov 2011 15:52:05 +0100
"Kai Otto" <Kai@medis.nl> wrote:

OOPS: putting post back into the ML.

> I have created the database like this:
> CREATE DATABASE postgres
>   WITH OWNER = postgres
>        ENCODING = 'UTF8'
>        TABLESPACE = pg_default
>        LC_COLLATE = 'English_United States.1252'
>        LC_CTYPE = 'English_United States.1252'
>        CONNECTION LIMIT = -1;

Ok, so your DB is natively working in UTF-8 but collates to cp1252.

I don't use collate (yet?), but it don't seem a good idea to use it directly
at source (svr) - On this part I'm helpless, there are more experienced
people that should tell you the best way to go.

> ALTER DEFAULT PRIVILEGES
>     GRANT INSERT, SELECT, UPDATE, DELETE, TRUNCATE, REFERENCES, TRIGGER
> ON TABLES
>     TO public;

You really don't wanna do that (means anyone w/ CONNECT right to DB will be
able to do whatever on public.* tables).

This is typical of w$ (lack of) security deformation: opening everything,
then closing when in the mood for.
Use the Linux way: close everything and only allow what's needed - meaning:
grant *only* needed privileges on a user basis.

Use also schemas: when you have a large normalized DB it allows you to have
tables with same names and (almost) same content, but in different schemas.

> ALTER DEFAULT PRIVILEGES
>     GRANT INSERT, SELECT, UPDATE, DELETE, TRUNCATE, REFERENCES, TRIGGER
> ON TABLES
>     TO postgres;

Totally useless as user postgres is already THE big chief; you should
carefully read (AND experiment) about Pg rights system (and side effects).

And don't use double quotes unless imperative need.
As Pg is by default case insensitive, it is easier to have (and write!):

CREATE TABLE japanesetest (
   id     bigint   PRIMARY KEY,
   name   text
) WITHOUT OIDS;

INSERT INTO japanesetest (id, name) VALUES (2, '\x83}\x83C
\x83h\x83L\x83\x85\x83\x81\x83\x93\x83g (My Documents)')

And if primary-key-number-designation-when-inserting (that's its Indian
name:) isn't really a concern, use a bigserial instead, and autoincrement
it using .... VALUES( default, .....

--
Interchangeable parts won't.

Re: UTF8 problem

From
"Kai Otto"
Date:
> > I have created the database like this:
> > CREATE DATABASE postgres
> >   WITH OWNER = postgres
> >        ENCODING = 'UTF8'
> >        TABLESPACE = pg_default
> >        LC_COLLATE = 'English_United States.1252'
> >        LC_CTYPE = 'English_United States.1252'
> >        CONNECTION LIMIT = -1;
>
> Ok, so your DB is natively working in UTF-8 but collates to cp1252.
>
> I don't use collate (yet?), but it don't seem a good idea to use it
> directly
> at source (svr) - On this part I'm helpless, there are more
experienced
> people that should tell you the best way to go.
>
> > ALTER DEFAULT PRIVILEGES
> >     GRANT INSERT, SELECT, UPDATE, DELETE, TRUNCATE, REFERENCES,
> TRIGGER
> > ON TABLES
> >     TO public;
>
> You really don't wanna do that (means anyone w/ CONNECT right to DB
> will be
> able to do whatever on public.* tables).
>
> This is typical of w$ (lack of) security deformation: opening
> everything,
> then closing when in the mood for.
> Use the Linux way: close everything and only allow what's needed -
> meaning:
> grant *only* needed privileges on a user basis.
>
> Use also schemas: when you have a large normalized DB it allows you to
> have
> tables with same names and (almost) same content, but in different
> schemas.
>
> > ALTER DEFAULT PRIVILEGES
> >     GRANT INSERT, SELECT, UPDATE, DELETE, TRUNCATE, REFERENCES,
> TRIGGER
> > ON TABLES
> >     TO postgres;
>
> Totally useless as user postgres is already THE big chief; you should
> carefully read (AND experiment) about Pg rights system (and side
> effects).
>

Thanks for the suggestions. Will dive into that ASAP.

> And don't use double quotes unless imperative need.
> As Pg is by default case insensitive, it is easier to have (and
> write!):
>
> CREATE TABLE japanesetest (
>    id     bigint   PRIMARY KEY,
>    name   text
> ) WITHOUT OIDS;
>
> INSERT INTO japanesetest (id, name) VALUES (2, '\x83}\x83C
> \x83h\x83L\x83\x85\x83\x81\x83\x93\x83g (My Documents)')
>
> And if primary-key-number-designation-when-inserting (that's its
Indian
> name:) isn't really a concern, use a bigserial instead, and
> autoincrement
> it using .... VALUES( default, .....
>
> --
> Interchangeable parts won't.

Without double quotes the table is not recognized:
      SELECT * FROM JapaneseTest
Results in:
      ERROR:  relation "japanesetest" does not exist

For the ID column, I am actually using uuid. But for testing this was
easier.

Anyway, thanks for your help.

Re: UTF8 problem

From
"Jean-Yves F. Barbier"
Date:
On Fri, 18 Nov 2011 09:53:31 +0100
"Kai Otto" <Kai@medis.nl> wrote:

...
> > And don't use double quotes unless imperative need.
> > As Pg is by default case insensitive, it is easier to have (and
> > write!):
> >
> > CREATE TABLE japanesetest (
> >    id     bigint   PRIMARY KEY,
> >    name   text
> > ) WITHOUT OIDS;
> >
> > INSERT INTO japanesetest (id, name) VALUES (2, '\x83}\x83C
> > \x83h\x83L\x83\x85\x83\x81\x83\x93\x83g (My Documents)')
> >
> > And if primary-key-number-designation-when-inserting (that's its
> Indian
> > name:) isn't really a concern, use a bigserial instead, and
> > autoincrement
> > it using .... VALUES( default, .....
> >
> > --
> > Interchangeable parts won't.
>
> Without double quotes the table is not recognized:
>       SELECT * FROM JapaneseTest
> Results in:
>       ERROR:  relation "japanesetest" does not exist

Of course it don't work, this is because double-quotes render Pg case sensitive,
this is why I rewrote the whole shebang.
Retest with the (whole) above syntax.

> For the ID column, I am actually using uuid. But for testing this was
> easier.

IF you need a special UUID, then use it; however if it's not mandatory,
use a serial or a big serial - shorter and faster and optimized to process
for Pg instead of a char(36) for UUID.

> Anyway, thanks for your help.



--
X-rated movies are all alike ... the only thing they leave to the
imagination is the plot.