On 04/16/2018 08:16 AM, Keith Fiske wrote:
> Running into an issue with helping a client upgrade from 8.3 to 10 (yes,
> I know, please keep the out of support comments to a minimum, thanks :).
>
> The old database was in SQL_ASCII and it needs to stay that way for now
> unfortunately. The dump and restore itself works fine, but we're now
> running into issues with some data returning encoding errors unless we
> specifically set the client_encoding value to SQL_ASCII.
>
> Looking at the 8.3 database, it has the client_encoding value set to
> UTF8 and queries seem to work fine. Is this just a bug in the old 8.3
> not enforcing encoding properly?e
AFAIK, SQL_ASCII basically means no encoding:
https://www.postgresql.org/docs/10/static/multibyte.html
"The SQL_ASCII setting behaves considerably differently from the other
settings. When the server character set is SQL_ASCII, the server
interprets byte values 0-127 according to the ASCII standard, while byte
values 128-255 are taken as uninterpreted characters. No encoding
conversion will be done when the setting is SQL_ASCII. Thus, this
setting is not so much a declaration that a specific encoding is in use,
as a declaration of ignorance about the encoding. In most cases, if you
are working with any non-ASCII data, it is unwise to use the SQL_ASCII
setting because PostgreSQL will be unable to help you by converting or
validating non-ASCII characters."
What client are you working with?
If psql then its behavior has changed between 8.3 and 10:
https://www.postgresql.org/docs/10/static/release-9-1.html#id-1.11.6.121.3
"
Have psql set the client encoding from the operating system locale by
default (Heikki Linnakangas)
This only happens if the PGCLIENTENCODING environment variable is not set.
"
https://www.postgresql.org/docs/10/static/app-psql.html
"If both standard input and standard output are a terminal, then psql
sets the client encoding to “auto”, which will detect the appropriate
client encoding from the locale settings (LC_CTYPE environment variable
on Unix systems). If this doesn't work out as expected, the client
encoding can be overridden using the environment variable PGCLIENTENCODING."
>
> The other thing I noticed on the 10 instance was that, while the LOCALE
> was set to SQL_ASCII, the COLLATE and CTYPE values for the restored
> databases were en_US.UTF-8. Could this be having an affect? Is there any
> way to see what these values were on the old 8.3 database? The
> pg_database catalog does not have these values stored back then.
>
> --
> Keith Fiske
> Senior Database Engineer
> Crunchy Data - http://crunchydata.com
--
Adrian Klaver
adrian.klaver@aklaver.com