Albe Laurenz wrote:
>>> My database locale is en_US, and by default my databases are UTF8.
>>>
>>> My application code allows the user to paste text into a box and submit
>>> it to the database. Sometimes the pasted text contains non UTF8
>>> characters, typically the "fancy" forms of quotes and apostrophes. The
>>> database does not appreciate it when the application attempts to store
>>> these characters.
>>>
>>> What is the best option to deal with this problem?
>>>
>>> a) I think I could re-create the database with a LATIN1 encoding. I'm
>>> not real experienced with different encodings, are there any issues with
>>> combining en_US and LATIN1?
>>> b) I can issue a SET CLIENT_ENCODING TO 'LATIN1'; statement every time I
>>> open a connection. A brief test indicates this will work.
>>>
>> Be aware that "fancy" quotes and apostrophes are not representable in
>> LATIN1; the closest character set that includes them is probably
>> WIN1252. See http://en.wikipedia.org/wiki/Windows-1252, especially
>> the characters in the 0x91-0x94 range.
>> Maybe your application implicitly uses this encoding, especially
>> if it runs under Windows, in which case the more appropriate
>> solution to your problem would be to set the client_encoding to
>> WIN1252 while keeping your database in UTF8.
>>
>
> This is good advice!
>
> To add an answer to your second question:
>
> You can
> ALTER ROLE username SET client_encoding = WIN1252
> to make this encoding the default for this user.
>
> If you want to change the setting for all users connecting
> to this database, you can also
> ALTER DATABASE mydb SET client_encoding = WIN1252
>
> Yours,
> Laurenz Albe
>
I will have to try the WIN1252 encoding.

On the client side, my application runs in a web browser. On the server
side, it is PHP scripts on a Linux box. The data comes from copying text
out of a browser window (pointing to another web site) and pasting it
into an HTML textarea, which is then submitted.

Given this, would you still suggest the WIN1252 encoding?
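
For reference, here is a small sketch of what the 0x91-0x94 range mentioned
above looks like at the byte level. It is only an illustration in Python,
not code from my application; the helper names (is_valid_utf8, fits_latin1)
are made up for the example, and I am assuming Python's "cp1252" codec is a
fair stand-in for what PostgreSQL calls WIN1252.

```python
# Bytes 0x91-0x94 are the "fancy" single and double quotes in Windows-1252.
raw = bytes([0x91, 0x92, 0x93, 0x94])


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def fits_latin1(text: str) -> bool:
    """Return True if every character of text can be encoded as LATIN1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


# Sent as-is to a UTF8 database, these bytes are not valid UTF-8.
print(is_valid_utf8(raw))

# Interpreted as Windows-1252, they are the curly quote characters.
decoded = raw.decode("cp1252")
print([hex(ord(ch)) for ch in decoded])

# The same characters re-encoded as UTF-8 are valid multi-byte sequences.
utf8 = decoded.encode("utf-8")
print(is_valid_utf8(utf8), len(utf8))

# The curly quotes themselves have no LATIN1 representation.
print(fits_latin1(decoded))
```

If the bytes arriving from the textarea look like the first case, a client
encoding of WIN1252 would seem to fit; if they already look like the UTF-8
form, then presumably something else is going wrong.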