Re: client_encoding issue with SQL_ASCII on 8.3 to 10 upgrade - Mailing list pgsql-general

From Tom Lane
Subject Re: client_encoding issue with SQL_ASCII on 8.3 to 10 upgrade
Date
Msg-id 10581.1523894949@sss.pgh.pa.us
Whole thread Raw
In response to client_encoding issue with SQL_ASCII on 8.3 to 10 upgrade  (Keith Fiske <keith.fiske@crunchydata.com>)
Responses Re: client_encoding issue with SQL_ASCII on 8.3 to 10 upgrade
List pgsql-general
Keith Fiske <keith.fiske@crunchydata.com> writes:
> Running into an issue with helping a client upgrade from 8.3 to 10 (yes, I
> know, please keep the out of support comments to a minimum, thanks :).

> The old database was in SQL_ASCII and it needs to stay that way for now
> unfortunately. The dump and restore itself works fine, but we're now
> running into issues with some data returning encoding errors unless we
> specifically set the client_encoding value to SQL_ASCII.

I'm guessing you might be hitting this 9.1 change:

    * Have psql set the client encoding from the operating system locale
      by default (Heikki Linnakangas)

      This only happens if the PGCLIENTENCODING environment variable is
      not set.

I think the previous default was to set client encoding equal to the
server encoding.

> Looking at the 8.3 database, it has the client_encoding value set to UTF8
> and queries seem to work fine. Is this just a bug in the old 8.3 not
> enforcing encoding properly?

Somewhere along the line we made SQL_ASCII -> something else conversions
check that the data was valid per the other encoding, even though no
actual data change happens.

> The other thing I noticed on the 10 instance was that, while the LOCALE was
> set to SQL_ASCII,

You mean encoding, I assume.

> the COLLATE and CTYPE values for the restored databases
> were en_US.UTF-8. Could this be having an affect?

This is not a great idea, no.  You could be getting strange misbehaviors
in e.g. string comparison, because strcoll() will expect UTF8 data and
will likely not cope well with data that isn't valid in that encoding.

If you can't sanitize the encoding of your data, I'd suggest running
with lc_collate and lc_ctype set to "C".

            regards, tom lane


pgsql-general by date:

Previous
From: Adrian Klaver
Date:
Subject: Re: client_encoding issue with SQL_ASCII on 8.3 to 10 upgrade
Next
From: Keith Fiske
Date:
Subject: Re: client_encoding issue with SQL_ASCII on 8.3 to 10 upgrade