Thread: encoding and LC_COLLATE

encoding and LC_COLLATE

From

LPlateAndy

Date:

14 November 2011, 07:10:55

Hi,

I set up my postgres 9.0 install  6 months ago and generally everything is
fine but a recent data load with an e acute character failed which an
unsupported message which surprised me as we're using UTF-8.

However, i can now see that the listing for the database set up show a
restriction under LC_COLLATE and LC_CTYPE to the UK which would explain the
blocking of this character. Oddly, this is set even if i only specify UTF-8
which i guess means that it is set against the template. I can only assume
that i selected this option on install but have since forgotten.

CREATE DATABASE testing
  WITH OWNER = postgres
       ENCODING = 'UTF8'
       TABLESPACE = pg_default
       LC_COLLATE = 'English_United Kingdom.1252'
       LC_CTYPE = 'English_United Kingdom.1252'
       CONNECTION LIMIT = -1;

Is there any way that i can change this, preferably against the template.

If i try creating a new database by right clicking at the top of the
database tree in pgAdmin i do note that i also have the options of "C" or
"POSIX" but have read elsewhere that these are even more restrictive.

Any ideas - hoping to avoid a complete re-install!

Thanks in advance

Andy


--
View this message in context: http://postgresql.1045698.n5.nabble.com/encoding-and-LC-COLLATE-tp4990415p4990415.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

Re: encoding and LC_COLLATE

From

LPlateAndy

Date:

14 November 2011, 08:21:16

Hi,

In response to my own question i have now read the following:

http://stackoverflow.com/questions/6579621/lc-collate-and-lc-ctype-suport-for-utf-8-in-postgresql

It seems to show that apart from a complete reinstall i should just use a
Locale of C (as i'm on Windows and POSIX wont work).

This is a bit of an issue as i wanted to use a PostGIS template table with
all of the functions assigned but i can only create a new database with
Locale C by setting the template as template_0.

So, any info on being able to reset the value on an existing table would be
much appreciate - otherwise i'll get on to Postgres 9.1 and start reloading
all the data...

Thanks

Andy

--
View this message in context: http://postgresql.1045698.n5.nabble.com/encoding-and-LC-COLLATE-tp4990415p4990605.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

Re: encoding and LC_COLLATE

From

Adrian Klaver

Date:

14 November 2011, 11:14:17

On Monday, November 14, 2011 3:03:32 am LPlateAndy wrote:
> Hi,
>
> I set up my postgres 9.0 install  6 months ago and generally everything is
> fine but a recent data load with an e acute character failed which an
> unsupported message which surprised me as we're using UTF-8.
>
> However, i can now see that the listing for the database set up show a
> restriction under LC_COLLATE and LC_CTYPE to the UK which would explain the
> blocking of this character. Oddly, this is set even if i only specify UTF-8
> which i guess means that it is set against the template. I can only assume
> that i selected this option on install but have since forgotten.
>
> CREATE DATABASE testing
>   WITH OWNER = postgres
>        ENCODING = 'UTF8'
>        TABLESPACE = pg_default
>        LC_COLLATE = 'English_United Kingdom.1252'
>        LC_CTYPE = 'English_United Kingdom.1252'
>        CONNECTION LIMIT = -1;
>
> Is there any way that i can change this, preferably against the template.
>
> If i try creating a new database by right clicking at the top of the
> database tree in pgAdmin i do note that i also have the options of "C" or
> "POSIX" but have read elsewhere that these are even more restrictive.
>
> Any ideas - hoping to avoid a complete re-install!

See:
http://www.postgresql.org/docs/9.0/interactive/multibyte.html#MULTIBYTE-CHARSET-
SUPPORTED

22.2.3. Automatic Character Set Conversion Between Server and Client


I am going to assume that the data load is the COPY you mentioned in the other
post. Before doing the COPY use one of the methods shown in the link above to
set the incoming encoding.  This of course depends on you knowing what the
encoding is of the data you are importing:)

>
> Thanks in advance
>
> Andy
>
>


--
Adrian Klaver
adrian.klaver@gmail.com

Re: encoding and LC_COLLATE

From

LPlateAndy

Date:

14 November 2011, 12:25:26

Hi Adrian,

You’re right, i’m trying to get the copy command to put a load of data into a table. It’s now working fine except for any instances with an e acute

I tried putting “ SET CLIENT_ENCODING TO ‘UTF-8’; ” but still got the error. I guess that just because i’m verifying what’s incoming, it doesn’t mean it’s going to go into a database which doesn’t support it?

Let me know if i’m missing something!

Cheers

Andy

From: Adrian Klaver-3 [via PostgreSQL] [mailto:[hidden email]]
Sent: 14 November 2011 15:15
To: LPlateAndy
Subject: Re: encoding and LC_COLLATE

On Monday, November 14, 2011 3:03:32 am LPlateAndy wrote:

> Hi,
>
> I set up my postgres 9.0 install 6 months ago and generally everything is
> fine but a recent data load with an e acute character failed which an
> unsupported message which surprised me as we're using UTF-8.
>
> However, i can now see that the listing for the database set up show a
> restriction under LC_COLLATE and LC_CTYPE to the UK which would explain the
> blocking of this character. Oddly, this is set even if i only specify UTF-8
> which i guess means that it is set against the template. I can only assume
> that i selected this option on install but have since forgotten.
>
> CREATE DATABASE testing
> WITH OWNER = postgres
> ENCODING = 'UTF8'
> TABLESPACE = pg_default
> LC_COLLATE = 'English_United Kingdom.1252'
> LC_CTYPE = 'English_United Kingdom.1252'
> CONNECTION LIMIT = -1;
>
> Is there any way that i can change this, preferably against the template.
>
> If i try creating a new database by right clicking at the top of the
> database tree in pgAdmin i do note that i also have the options of "C" or
> "POSIX" but have read elsewhere that these are even more restrictive.
>
> Any ideas - hoping to avoid a complete re-install!

See:
click here.
See how NAML generates this email

View this message in context: RE: encoding and LC_COLLATE
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

Re: encoding and LC_COLLATE

From

Adrian Klaver

Date:

14 November 2011, 14:02:59

On Mon, Nov 14, 2011 at 8:25 AM, LPlateAndy <andy@centremaps.co.uk> wrote:

Hi Adrian,

You’re right, i’m trying to get the copy command to put a load of data into a table. It’s now working fine except for any instances with an e acute

I tried putting “ SET CLIENT_ENCODING TO ‘UTF-8’; ” but still got the error. I guess that just because i’m verifying what’s incoming, it doesn’t mean it’s going to go into a database which doesn’t support it?

Let me know if i’m missing something!

First what is the actual error message you are getting?
Second is the data coming in actually UTF8 or some other encoding?

Cheers

Andy

From: Adrian Klaver-3 [via PostgreSQL] [mailto:[hidden email]]
Sent: 14 November 2011 15:15
To: LPlateAndy
Subject: Re: encoding and LC_COLLATE

On Monday, November 14, 2011 3:03:32 am LPlateAndy wrote:

> Hi,
>
> I set up my postgres 9.0 install 6 months ago and generally everything is
> fine but a recent data load with an e acute character failed which an
> unsupported message which surprised me as we're using UTF-8.
>
> However, i can now see that the listing for the database set up show a
> restriction under LC_COLLATE and LC_CTYPE to the UK which would explain the
> blocking of this character. Oddly, this is set even if i only specify UTF-8
> which i guess means that it is set against the template. I can only assume
> that i selected this option on install but have since forgotten.
>
> CREATE DATABASE testing
> WITH OWNER = postgres
> ENCODING = 'UTF8'
> TABLESPACE = pg_default
> LC_COLLATE = 'English_United Kingdom.1252'
> LC_CTYPE = 'English_United Kingdom.1252'
> CONNECTION LIMIT = -1;
>
> Is there any way that i can change this, preferably against the template.
>
> If i try creating a new database by right clicking at the top of the
> database tree in pgAdmin i do note that i also have the options of "C" or
> "POSIX" but have read elsewhere that these are even more restrictive.
>
> Any ideas - hoping to avoid a complete re-install!
See:
click here.
See how NAML generates this email
View this message in context: RE: encoding and LC_COLLATE

Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Adrian Klaver
adrian.klaver@gmail.com

Re: encoding and LC_COLLATE

From

"Mark Watson"

Date:

14 November 2011, 16:27:46

De : pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] De la part de Adrian Klaver
>Envoyé : 14 novembre 2011 13:03
>...
>
>Second is the data coming in actually UTF8 or some other encoding?
>... 

Hi Andy,
I have to agree with Adrian in that the data may be coming in under a
different encoding. An e acute is a valid character in 1252 encoding.
However, if the source computer is using, for example, code page 850, an e
acute is hex(82) whereas the equivalent in 1252 is hex(e9). UTF-8 "doesn't
like" hex(82).
HTH,
Mark

Re: encoding and LC_COLLATE

From

LPlateAndy

Date:

15 November 2011, 12:23:55

Hi Mark (and Adrian),

As as update i've now found the same data fails on my postgres 8 which doesn't seem to have the LC_COLLATE etc setting and is just UTF-8 so i guess there is possibly just something about the way the data is getting passed in.

This is the error message from postgres 9.0 with the LC_COLLATE as previously described:

===============================================

ERROR: invalid byte sequence for encoding "UTF8": 0xe92922
CONTEXT: COPY pointsofinterest, line 2

********** Error **********

ERROR: invalid byte sequence for encoding "UTF8": 0xe92922
SQL state: 22021
Context: COPY pointsofinterest, line 2

===============================================

This is the error message from the postgres 8.1 with just UTF-8 set:

===============================================

ERROR: invalid UTF-8 byte sequence detected near byte 0xe9
CONTEXT: COPY pointsofinterest, line 2, column street_name: "Near Café)"

===============================================

Does that help? Is there an easy way to check exactly what encoding an existing piece of data is in?

Thanks again for your help so far...

Andy

From: Mark Watson-12 [via PostgreSQL] [mailto:[hidden email]]
Sent: 14 November 2011 20:29
To: LPlateAndy
Subject: Re: encoding and LC_COLLATE

De : [hidden email]
[mailto:[hidden email]] De la part de Adrian Klaver
>Envoyé : 14 novembre 2011 13:03
>...
>
>Second is the data coming in actually UTF8 or some other encoding?
>...

Hi Andy,
I have to agree with Adrian in that the data may be coming in under a
different encoding. An e acute is a valid character in 1252 encoding.
However, if the source computer is using, for example, code page 850, an e
acute is hex(82) whereas the equivalent in 1252 is hex(e9). UTF-8 "doesn't
like" hex(82).
HTH,
Mark

--
Sent via pgsql-general mailing list ([hidden email])
To make changes to your subscription:
click here.
NAML

View this message in context: RE: encoding and LC_COLLATE
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

Re: encoding and LC_COLLATE

From

LPlateAndy

Date:

15 November 2011, 12:57:48

Hi Adrian/Mark

Thanks again for your help, i have now got the load working by setting the
encoding to WIN1252. I had been assuming i was setting it to UTF8

SET CLIENT_ENCODING TO 'WIN1252';

Andy

--
View this message in context: http://postgresql.1045698.n5.nabble.com/encoding-and-LC-COLLATE-tp4990415p4994930.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.