Re: client_encoding issue with SQL_ASCII on 8.3 to 10 upgrade - Mailing list pgsql-general

From Adrian Klaver
Subject Re: client_encoding issue with SQL_ASCII on 8.3 to 10 upgrade
Date
Msg-id d12e4b06-756e-5a61-f6d0-97d9cfeca991@aklaver.com
In response to client_encoding issue with SQL_ASCII on 8.3 to 10 upgrade  (Keith Fiske <keith.fiske@crunchydata.com>)
List pgsql-general
On 04/16/2018 08:16 AM, Keith Fiske wrote:
> Running into an issue with helping a client upgrade from 8.3 to 10 (yes, 
> I know, please keep the out of support comments to a minimum, thanks :).
> 
> The old database was in SQL_ASCII and it needs to stay that way for now 
> unfortunately. The dump and restore itself works fine, but we're now 
> running into issues with some data returning encoding errors unless we 
> specifically set the client_encoding value to SQL_ASCII.
> 
> Looking at the 8.3 database, it has the client_encoding value set to 
> UTF8 and queries seem to work fine. Is this just a bug in the old 8.3 
> not enforcing encoding properly?

AFAIK, SQL_ASCII basically means no encoding:

https://www.postgresql.org/docs/10/static/multibyte.html

"The SQL_ASCII setting behaves considerably differently from the other 
settings. When the server character set is SQL_ASCII, the server 
interprets byte values 0-127 according to the ASCII standard, while byte 
values 128-255 are taken as uninterpreted characters. No encoding 
conversion will be done when the setting is SQL_ASCII. Thus, this 
setting is not so much a declaration that a specific encoding is in use, 
as a declaration of ignorance about the encoding. In most cases, if you 
are working with any non-ASCII data, it is unwise to use the SQL_ASCII 
setting because PostgreSQL will be unable to help you by converting or 
validating non-ASCII characters."
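
To make that concrete, here is a minimal Python sketch (not PostgreSQL code) of what "uninterpreted" means for bytes 128-255. The helper name describe_bytes and the sample string are my own, purely for illustration. It shows that the same text stored by a Latin-1 client and by a UTF-8 client ends up as different bytes, and only one of them passes as UTF8 later:

```python
def describe_bytes(raw: bytes) -> dict:
    """Summarize a byte string the way a SQL_ASCII database effectively sees it:
    bytes 0-127 are ASCII, bytes 128-255 are stored without interpretation."""
    high_bytes = [b for b in raw if b >= 128]
    try:
        raw.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False
    return {
        "ascii_only": not high_bytes,
        "high_byte_count": len(high_bytes),
        "valid_utf8": valid_utf8,
    }


# The same word as written by two hypothetical clients with different encodings.
latin1_bytes = "café".encode("latin-1")  # one high byte: 0xE9
utf8_bytes = "café".encode("utf-8")      # two high bytes: 0xC3 0xA9

print(describe_bytes(latin1_bytes))
print(describe_bytes(utf8_bytes))
```

If the old database collected bytes like the first case, anything that insists on treating them as UTF8 will complain, which would match the errors you describe.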


What client are you working with?

If it is psql, its default behavior changed between 8.3 and 10 (the change came in 9.1):

https://www.postgresql.org/docs/10/static/release-9-1.html#id-1.11.6.121.3

"Have psql set the client encoding from the operating system locale by 
default (Heikki Linnakangas)

This only happens if the PGCLIENTENCODING environment variable is not set."

https://www.postgresql.org/docs/10/static/app-psql.html

"If both standard input and standard output are a terminal, then psql 
sets the client encoding to “auto”, which will detect the appropriate 
client encoding from the locale settings (LC_CTYPE environment variable 
on Unix systems). If this doesn't work out as expected, the client 
encoding can be overridden using the environment variable PGCLIENTENCODING."
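
As a rough model of the precedence described in those two quotes, here is a small Python sketch. It is my reading of the documentation text, not psql's actual source, and the function name psql_client_encoding is made up for illustration:

```python
from typing import Mapping, Optional


def psql_client_encoding(
    env: Mapping[str, str], stdin_is_tty: bool, stdout_is_tty: bool
) -> Optional[str]:
    """Model which client encoding psql (9.1 and later) asks for, per the docs.

    Returns the PGCLIENTENCODING value if set, "auto" (derive from the locale)
    for an interactive terminal session, or None when psql does not set one
    itself and the server-side default applies.
    """
    explicit = env.get("PGCLIENTENCODING")
    if explicit:
        # The environment variable takes precedence over locale detection.
        return explicit
    if stdin_is_tty and stdout_is_tty:
        return "auto"
    return None


print(psql_client_encoding({"PGCLIENTENCODING": "SQL_ASCII"}, True, True))
print(psql_client_encoding({}, True, True))
print(psql_client_encoding({}, True, False))
```

So with a UTF-8 locale on the client machine, an interactive psql session on 10 would ask for UTF8 unless PGCLIENTENCODING is set to something else, for example SQL_ASCII, in the environment before psql starts.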


> 
> The other thing I noticed on the 10 instance was that, while the LOCALE 
> was set to SQL_ASCII, the COLLATE and CTYPE values for the restored 
> databases were en_US.UTF-8. Could this be having an effect? Is there any 
> way to see what these values were on the old 8.3 database? The 
> pg_database catalog does not have these values stored back then.
> 
> -- 
> Keith Fiske
> Senior Database Engineer
> Crunchy Data - http://crunchydata.com


-- 
Adrian Klaver
adrian.klaver@aklaver.com

