Thread: encoding and LC_COLLATE
Hi, I set up my postgres 9.0 install 6 months ago and generally everything is fine but a recent data load with an e acute character failed which an unsupported message which surprised me as we're using UTF-8. However, i can now see that the listing for the database set up show a restriction under LC_COLLATE and LC_CTYPE to the UK which would explain the blocking of this character. Oddly, this is set even if i only specify UTF-8 which i guess means that it is set against the template. I can only assume that i selected this option on install but have since forgotten. CREATE DATABASE testing WITH OWNER = postgres ENCODING = 'UTF8' TABLESPACE = pg_default LC_COLLATE = 'English_United Kingdom.1252' LC_CTYPE = 'English_United Kingdom.1252' CONNECTION LIMIT = -1; Is there any way that i can change this, preferably against the template. If i try creating a new database by right clicking at the top of the database tree in pgAdmin i do note that i also have the options of "C" or "POSIX" but have read elsewhere that these are even more restrictive. Any ideas - hoping to avoid a complete re-install! Thanks in advance Andy -- View this message in context: http://postgresql.1045698.n5.nabble.com/encoding-and-LC-COLLATE-tp4990415p4990415.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.
Hi, In response to my own question i have now read the following: http://stackoverflow.com/questions/6579621/lc-collate-and-lc-ctype-suport-for-utf-8-in-postgresql It seems to show that apart from a complete reinstall i should just use a Locale of C (as i'm on Windows and POSIX wont work). This is a bit of an issue as i wanted to use a PostGIS template table with all of the functions assigned but i can only create a new database with Locale C by setting the template as template_0. So, any info on being able to reset the value on an existing table would be much appreciate - otherwise i'll get on to Postgres 9.1 and start reloading all the data... Thanks Andy -- View this message in context: http://postgresql.1045698.n5.nabble.com/encoding-and-LC-COLLATE-tp4990415p4990605.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.
On Monday, November 14, 2011 3:03:32 am LPlateAndy wrote: > Hi, > > I set up my postgres 9.0 install 6 months ago and generally everything is > fine but a recent data load with an e acute character failed which an > unsupported message which surprised me as we're using UTF-8. > > However, i can now see that the listing for the database set up show a > restriction under LC_COLLATE and LC_CTYPE to the UK which would explain the > blocking of this character. Oddly, this is set even if i only specify UTF-8 > which i guess means that it is set against the template. I can only assume > that i selected this option on install but have since forgotten. > > CREATE DATABASE testing > WITH OWNER = postgres > ENCODING = 'UTF8' > TABLESPACE = pg_default > LC_COLLATE = 'English_United Kingdom.1252' > LC_CTYPE = 'English_United Kingdom.1252' > CONNECTION LIMIT = -1; > > Is there any way that i can change this, preferably against the template. > > If i try creating a new database by right clicking at the top of the > database tree in pgAdmin i do note that i also have the options of "C" or > "POSIX" but have read elsewhere that these are even more restrictive. > > Any ideas - hoping to avoid a complete re-install! See: http://www.postgresql.org/docs/9.0/interactive/multibyte.html#MULTIBYTE-CHARSET- SUPPORTED 22.2.3. Automatic Character Set Conversion Between Server and Client I am going to assume that the data load is the COPY you mentioned in the other post. Before doing the COPY use one of the methods shown in the link above to set the incoming encoding. This of course depends on you knowing what the encoding is of the data you are importing:) > > Thanks in advance > > Andy > > -- Adrian Klaver adrian.klaver@gmail.com
Hi Adrian,
You’re right, i’m trying to get the copy command to put a load of data into a table. It’s now working fine except for any instances with an e acute
I tried putting “ SET CLIENT_ENCODING TO ‘UTF-8’; ” but still got the error. I guess that just because i’m verifying what’s incoming, it doesn’t mean it’s going to go into a database which doesn’t support it?
Let me know if i’m missing something!
Cheers
Andy
From: Adrian Klaver-3 [via PostgreSQL] [mailto:[hidden email]]
Sent: 14 November 2011 15:15
To: LPlateAndy
Subject: Re: encoding and LC_COLLATE
On Monday, November 14, 2011 3:03:32 am LPlateAndy wrote:
> Hi,
>
> I set up my postgres 9.0 install 6 months ago and generally everything is
> fine but a recent data load with an e acute character failed which an
> unsupported message which surprised me as we're using UTF-8.
>
> However, i can now see that the listing for the database set up show a
> restriction under LC_COLLATE and LC_CTYPE to the UK which would explain the
> blocking of this character. Oddly, this is set even if i only specify UTF-8
> which i guess means that it is set against the template. I can only assume
> that i selected this option on install but have since forgotten.
>
> CREATE DATABASE testing
> WITH OWNER = postgres
> ENCODING = 'UTF8'
> TABLESPACE = pg_default
> LC_COLLATE = 'English_United Kingdom.1252'
> LC_CTYPE = 'English_United Kingdom.1252'
> CONNECTION LIMIT = -1;
>
> Is there any way that i can change this, preferably against the template.
>
> If i try creating a new database by right clicking at the top of the
> database tree in pgAdmin i do note that i also have the options of "C" or
> "POSIX" but have read elsewhere that these are even more restrictive.
>
> Any ideas - hoping to avoid a complete re-install!
click here.
See how NAML generates this email
View this message in context: RE: encoding and LC_COLLATE
Sent from the PostgreSQL - general mailing list archive at Nabble.com.
Hi Adrian,
You’re right, i’m trying to get the copy command to put a load of data into a table. It’s now working fine except for any instances with an e acute
I tried putting “ SET CLIENT_ENCODING TO ‘UTF-8’; ” but still got the error. I guess that just because i’m verifying what’s incoming, it doesn’t mean it’s going to go into a database which doesn’t support it?
Let me know if i’m missing something!
First what is the actual error message you are getting?
Second is the data coming in actually UTF8 or some other encoding?
See:
Cheers
Andy
From: Adrian Klaver-3 [via PostgreSQL] [mailto:[hidden email]]
Sent: 14 November 2011 15:15
To: LPlateAndy
Subject: Re: encoding and LC_COLLATE
On Monday, November 14, 2011 3:03:32 am LPlateAndy wrote:
> Hi,
>
> I set up my postgres 9.0 install 6 months ago and generally everything is
> fine but a recent data load with an e acute character failed which an
> unsupported message which surprised me as we're using UTF-8.
>
> However, i can now see that the listing for the database set up show a
> restriction under LC_COLLATE and LC_CTYPE to the UK which would explain the
> blocking of this character. Oddly, this is set even if i only specify UTF-8
> which i guess means that it is set against the template. I can only assume
> that i selected this option on install but have since forgotten.
>
> CREATE DATABASE testing
> WITH OWNER = postgres
> ENCODING = 'UTF8'
> TABLESPACE = pg_default
> LC_COLLATE = 'English_United Kingdom.1252'
> LC_CTYPE = 'English_United Kingdom.1252'
> CONNECTION LIMIT = -1;
>
> Is there any way that i can change this, preferably against the template.
>
> If i try creating a new database by right clicking at the top of the
> database tree in pgAdmin i do note that i also have the options of "C" or
> "POSIX" but have read elsewhere that these are even more restrictive.
>
> Any ideas - hoping to avoid a complete re-install!
click here.
See how NAML generates this email
View this message in context: RE: encoding and LC_COLLATE
--
Adrian Klaver
adrian.klaver@gmail.com
De : pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] De la part de Adrian Klaver >Envoyé : 14 novembre 2011 13:03 >... > >Second is the data coming in actually UTF8 or some other encoding? >... Hi Andy, I have to agree with Adrian in that the data may be coming in under a different encoding. An e acute is a valid character in 1252 encoding. However, if the source computer is using, for example, code page 850, an e acute is hex(82) whereas the equivalent in 1252 is hex(e9). UTF-8 "doesn't like" hex(82). HTH, Mark
Hi Mark (and Adrian),
As as update i've now found the same data fails on my postgres 8 which doesn't seem to have the LC_COLLATE etc setting and is just UTF-8 so i guess there is possibly just something about the way the data is getting passed in.
This is the error message from postgres 9.0 with the LC_COLLATE as previously described:
===============================================
ERROR: invalid byte sequence for encoding "UTF8": 0xe92922
CONTEXT: COPY pointsofinterest, line 2
********** Error **********
ERROR: invalid byte sequence for encoding "UTF8": 0xe92922
SQL state: 22021
Context: COPY pointsofinterest, line 2
===============================================
This is the error message from the postgres 8.1 with just UTF-8 set:
===============================================
ERROR: invalid UTF-8 byte sequence detected near byte 0xe9
CONTEXT: COPY pointsofinterest, line 2, column street_name: "Near Café)"
===============================================
Does that help? Is there an easy way to check exactly what encoding an existing piece of data is in?
Thanks again for your help so far...
Andy
From: Mark Watson-12 [via PostgreSQL] [mailto:[hidden email]]
Sent: 14 November 2011 20:29
To: LPlateAndy
Subject: Re: encoding and LC_COLLATE
De : [hidden email]
[mailto:[hidden email]] De la part de Adrian Klaver
>Envoyé : 14 novembre 2011 13:03
>...
>
>Second is the data coming in actually UTF8 or some other encoding?
>...
Hi Andy,
I have to agree with Adrian in that the data may be coming in under a
different encoding. An e acute is a valid character in 1252 encoding.
However, if the source computer is using, for example, code page 850, an e
acute is hex(82) whereas the equivalent in 1252 is hex(e9). UTF-8 "doesn't
like" hex(82).
HTH,
Mark
--
Sent via pgsql-general mailing list ([hidden email])
To make changes to your subscription:
click here.
NAML
View this message in context: RE: encoding and LC_COLLATE
Sent from the PostgreSQL - general mailing list archive at Nabble.com.
Hi Adrian/Mark Thanks again for your help, i have now got the load working by setting the encoding to WIN1252. I had been assuming i was setting it to UTF8 SET CLIENT_ENCODING TO 'WIN1252'; Andy -- View this message in context: http://postgresql.1045698.n5.nabble.com/encoding-and-LC-COLLATE-tp4990415p4994930.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.