Re: encoding advice requested - Mailing list pgsql-general

From: Rick Schumeyer
Subject: Re: encoding advice requested
Msg-id: 4558758C.3030705@ieee.org
In response to: Re: encoding advice requested ("Albe Laurenz" <all@adv.magwien.gv.at>)
List: pgsql-general
Albe Laurenz wrote:
>>> My database locale is en_US, and by default my databases are UTF8.
>>>
>>> My application code allows the user to paste text into a box and
>>> submit it to the database.  Sometimes the pasted text contains non UTF8
>>> characters, typically the "fancy" forms of quotes and apostrophes.
>>> The database does not appreciate it when the application attempts to
>>> store these characters.
>>>
>>> What is the best option to deal with this problem?
>>>
>>> a) I think I could re-create the database with a LATIN1 encoding.
>>> I'm not real experienced with different encodings, are there any
>>> issues with combining en_US and LATIN1?
>>> b) I can issue a SET CLIENT_ENCODING TO 'LATIN1'; statement every
>>> time I open a connection.  A brief test indicates this will work.
>>>
>> Be aware that "fancy" quotes and apostrophes are not representable in
>> LATIN1; the closest character set that contains them is probably
>> WIN1252. See http://en.wikipedia.org/wiki/Windows-1252, especially
>> characters in the 0x91-0x94 range.
>> Maybe your application implicitly uses this encoding, especially
>> if it runs under Windows, in which case the more appropriate
>> solution to your problem would be to set the client_encoding to
>> WIN1252 while keeping your database in UTF8.
>>
>
> This is good advice!
>
> To add an answer to your second question:
>
> You can use
>
>     ALTER ROLE username SET client_encoding = WIN1252;
>
> to make this encoding the default for this user.
>
> If you want to change the setting for all users connecting
> to this database, you can also use
>
>     ALTER DATABASE mydb SET client_encoding = WIN1252;
>
> Yours,
> Laurenz Albe
>
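The byte values mentioned above can be checked with a short Python sketch. It assumes that Python's codec names `latin-1`, `cp1252` and `utf-8` correspond to PostgreSQL's LATIN1, WIN1252 and UTF8; the helper names `encode_or_none` and `report` are hypothetical. It encodes the four curly quote characters (U+2018, U+2019, U+201C, U+201D) in each encoding and reports which encoding cannot represent them.

```python
# Curly ("fancy") quotes and apostrophes: U+2018, U+2019, U+201C, U+201D
FANCY = "\u2018\u2019\u201c\u201d"

# Assumption: these Python codec names match PostgreSQL's LATIN1, WIN1252, UTF8.
CODECS = ("latin-1", "cp1252", "utf-8")


def encode_or_none(text, codec):
    """Return the encoded bytes, or None if the codec cannot represent the text."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None


def report(text):
    """Map each codec name to the hex string of the encoded text (or None)."""
    result = {}
    for codec in CODECS:
        data = encode_or_none(text, codec)
        result[codec] = data.hex() if data is not None else None
    return result


if __name__ == "__main__":
    for codec, hexbytes in report(FANCY).items():
        print(f"{codec:8} {hexbytes or 'not representable'}")
```

Under these assumptions the sketch reports `latin-1` as unable to represent the characters, `cp1252` as the single bytes `91929394` (the 0x91-0x94 range mentioned above), and `utf-8` as a three-byte sequence per character.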
I will have to try the WIN1252 encoding.

On the client side, my application is a web browser.  On the server
side, it is PHP scripts on a Linux box.  The data is copied from a
browser window (showing another web site) and pasted into an HTML
textarea, which is then submitted.

Given this, would you still suggest the WIN1252 encoding?
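Which client_encoding fits depends on the bytes the browser actually sends. Browsers generally encode a submitted form in the character encoding of the page containing it (unless the form sets `accept-charset`), so the encoding of the page serving the textarea is worth checking. The following Python sketch is an illustration, not something from this thread: it simulates the textarea value being submitted from a UTF-8 page and from a WIN1252 (`cp1252`) page, then decodes the raw bytes the way each client encoding would. It assumes the PHP script passes the bytes to the database unchanged; `submitted_bytes` and `decode_as` are hypothetical helpers.

```python
from urllib.parse import quote, unquote_to_bytes

# Text as pasted into the textarea: the word "quoted" in curly double quotes.
PASTED = "\u201cquoted\u201d"


def submitted_bytes(text, page_charset):
    """Simulate form submission: percent-encode the value in the page's charset,
    then recover the raw bytes a server-side script would receive."""
    percent_encoded = quote(text, encoding=page_charset)
    return unquote_to_bytes(percent_encoded)


def decode_as(raw, client_encoding):
    """Decode raw bytes as the given client encoding; None if they are invalid."""
    try:
        return raw.decode(client_encoding)
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    for page_charset in ("utf-8", "cp1252"):
        raw = submitted_bytes(PASTED, page_charset)
        for client_encoding in ("utf-8", "cp1252"):
            print(page_charset, client_encoding, raw, decode_as(raw, client_encoding))
```

In this sketch, bytes from a UTF-8 page decode cleanly as UTF-8, while bytes from a WIN1252 page (0x93 and 0x94 around the word) are not valid UTF-8, which is the kind of input a UTF8 client encoding would reject. Under these assumptions, setting client_encoding to WIN1252 helps only in the second case.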
