Re: client_encoding issue with SQL_ASCII on 8.3 to 10 upgrade - Mailing list pgsql-general

From Keith Fiske
Subject Re: client_encoding issue with SQL_ASCII on 8.3 to 10 upgrade
Date
Msg-id CAODZiv75hfwLbfH0BWKFdq6c5M5NjyhB+oexLWPDvpJuwBLmeg@mail.gmail.com
Whole thread Raw
In response to Re: client_encoding issue with SQL_ASCII on 8.3 to 10 upgrade  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: client_encoding issue with SQL_ASCII on 8.3 to 10 upgrade
List pgsql-general


On Mon, Apr 16, 2018 at 12:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Keith Fiske <keith.fiske@crunchydata.com> writes:
> Running into an issue with helping a client upgrade from 8.3 to 10 (yes, I
> know, please keep the out of support comments to a minimum, thanks :).

> The old database was in SQL_ASCII and it needs to stay that way for now
> unfortunately. The dump and restore itself works fine, but we're now
> running into issues with some data returning encoding errors unless we
> specifically set the client_encoding value to SQL_ASCII.

I'm guessing you might be hitting this 9.1 change:

    * Have psql set the client encoding from the operating system locale
      by default (Heikki Linnakangas)

      This only happens if the PGCLIENTENCODING environment variable is
      not set.

I think the previous default was to set client encoding equal to the
server encoding.

> Looking at the 8.3 database, it has the client_encoding value set to UTF8
> and queries seem to work fine. Is this just a bug in the old 8.3 not
> enforcing encoding properly?

Somewhere along the line we made SQL_ASCII -> something else conversions
check that the data was valid per the other encoding, even though no
actual data change happens.

> The other thing I noticed on the 10 instance was that, while the LOCALE was
> set to SQL_ASCII,

You mean encoding, I assume.

> the COLLATE and CTYPE values for the restored databases
> were en_US.UTF-8. Could this be having an affect?

This is not a great idea, no.  You could be getting strange misbehaviors
in e.g. string comparison, because strcoll() will expect UTF8 data and
will likely not cope well with data that isn't valid in that encoding.

If you can't sanitize the encoding of your data, I'd suggest running
with lc_collate and lc_ctype set to "C".

                        regards, tom lane


Thanks to both of you Adrian & Tom.

It is the 9.1 change to the psql client that seems to be causing this. 

And pg_controldata was able to show that the CTYPE and COLLATE were UTF8 on the old system. If that's the case, do you still think it's a good idea to set the COLLATE and CTYPE to "C"?


--
Keith Fiske
Senior Database Engineer
Crunchy Data - http://crunchydata.com

pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: client_encoding issue with SQL_ASCII on 8.3 to 10 upgrade
Next
From: Tom Lane
Date:
Subject: Re: client_encoding issue with SQL_ASCII on 8.3 to 10 upgrade