Re: [External] postgres 9.5 DB corruption: invalid byte sequence for encoding "UTF8" - Mailing list pgsql-general

From Thomas Tignor
Subject Re: [External] postgres 9.5 DB corruption: invalid byte sequence for encoding "UTF8"
Date
Msg-id 458346306.10942574.1553559949857@mail.yahoo.com
In response to Re: [External] postgres 9.5 DB corruption: invalid byte sequence for encoding "UTF8"  ("Brad Nicholson" <bradn@ca.ibm.com>)
Responses Re: [External] postgres 9.5 DB corruption: invalid byte sequence for encoding "UTF8"  ("Brad Nicholson" <bradn@ca.ibm.com>)
List pgsql-general
Hi Brad,
Thanks for writing. As I mentioned to Vijay, the "source" is a JVM using the postgres v42.0.0 JDBC driver. I do not believe we set any encoding explicitly, so I expect the client encoding is SQL_ASCII. The DB is most definitely UTF8. Our log shows no issue with the input data we've identified (at least at the time it is logged). If the data were somehow corrupted before the insert, wouldn't the server's encoding validation kick in and raise an error? We can certainly test that.
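
As a rough way to pre-check suspect input before running that test against the server, here is a minimal Python sketch of the kind of validation a UTF8 database applies to incoming text. The helper name `is_valid_utf8` is made up for illustration, and Python's strict decoder is only an approximation of the server's own check, so treat a pass here as a hint rather than a guarantee.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 and contains no NUL byte.

    Assumption: this approximates, but is not identical to, the validation
    a UTF8-encoded PostgreSQL database performs on text input.
    """
    # PostgreSQL text values cannot contain a zero byte, so reject it up front.
    if b"\x00" in data:
        return False
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# The same string encoded two ways: only the first is valid UTF-8.
good = "café".encode("utf-8")    # b'caf\xc3\xa9'
bad = "café".encode("latin-1")   # b'caf\xe9'

print(is_valid_utf8(good))
print(is_valid_utf8(bad))
```

If bytes like `bad` can be inserted into a UTF8 database without an error, that would suggest the validation is being bypassed somewhere along the path.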

Tom    :-)


On Monday, March 25, 2019, 3:56:04 PM EDT, Brad Nicholson <bradn@ca.ibm.com> wrote:


Vijaykumar Jain <vjain@opentable.com> wrote on 03/25/2019 03:07:19 PM:


> but why do you think this is db corruption and not just bad input?
> https://github.com/postgres/postgres/blob/master/src/pl/plperl/expected/plperl_lc_1.out



This looked interesting to me in the settings below:



>   client_encoding                | SQL_ASCII          | client



Unless you have set this explicitly, it will use the default encoding for the database.  If it hasn't been explicitly set, then the source database (assuming that that output was from the source) is SQL_ASCII.

Double-check the database encoding for both the source database and the target database.  I'm wondering if you have SQL_ASCII for the source and UTF8 for the target.  If that is the case, the source will accept byte sequences that are invalid in UTF8, and they will then fail to replicate to the target.  That's not a Postgres problem, but an encoding mismatch.
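
If the mismatch turns out to be real, one way to find the offending values is to scan the raw bytes exported from the SQL_ASCII side for sequences that a UTF8 target would refuse. The following is a minimal Python sketch of that idea. The function name `find_invalid_utf8` and the sample bytes are hypothetical, and how the raw bytes are extracted from the source is left out.

```python
from typing import List, Tuple


def find_invalid_utf8(data: bytes) -> List[Tuple[int, bytes]]:
    """Return (offset, bad_bytes) for each span that is not valid UTF-8.

    A SQL_ASCII database stores bytes above 127 without interpreting them,
    so spans like these can exist there but be rejected by a UTF8 database.
    """
    problems = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8", errors="strict")
            break  # the rest of the buffer is valid
        except UnicodeDecodeError as exc:
            # exc.start and exc.end are relative to the slice being decoded.
            start = pos + exc.start
            end = pos + exc.end
            problems.append((start, data[start:end]))
            pos = end  # resume scanning just past the bad span
    return problems


# Hypothetical sample: Latin-1 encoded text stored in a SQL_ASCII database.
sample = b"caf\xe9 au lait"
for offset, bad in find_invalid_utf8(sample):
    print(offset, bad.hex())
```

An empty result means the buffer would decode cleanly as UTF-8. Any entries point at the byte offsets worth inspecting in the source data.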


Brad
