Re: PostgreSQL JDBC Driver ERROR: invalid byte sequence for encoding "UTF8": 0x00 on Select stmt - Mailing list pgsql-admin

From Hotmail
Subject Re: PostgreSQL JDBC Driver ERROR: invalid byte sequence for encoding "UTF8": 0x00 on Select stmt
Date
Msg-id BYAPR06MB4023A238CF0690D8B5F8ED00D7460@BYAPR06MB4023.namprd06.prod.outlook.com
Whole thread Raw
In response to PostgreSQL JDBC Driver ERROR: invalid byte sequence for encoding "UTF8": 0x00 on Select stmt  (Hotmail <crajac66@hotmail.com>)
List pgsql-admin
Hi Samed, 

Thanks for the detailed response. I still have few questions that I would like to check my understanding on. 

First let me provide some additional background information. 
  • We moved a 10TB oracle database to postgres prior to the current one we are working with and oracle_fdw consistently terminated our insert jobs when encountering an invalid byte sequence.  We would identify and then fix the invalid data on the oracle side and then resume the inserts successfully until completion. It seems unlikely the oracle_fdw would consistently identify the invalid 0x00  character in the 10TB database and ignore that same string in this much smaller database (100GB). 

  • We tested loading the current data set  with oracle_fdw several times. No invalid byte sequence errors in testing. This makes us feel that the loaded data from Oracle does not have any invalid byte sequences.

  • After the initial load of the data, the cord is cut to Oracle. The application only talks to the postgres database and does not rely on oracle_fdw for retrieving data. 

Turning to the  application and the errors on the select statements. 
  • My  first question is shouldn’t we  see the invalid byte sequence errors on  insert statements if the client application is attempting to insert incompatible data? (We aren’t seeing any errors on inserts)

  • Is our select statement failing because there is an invalid byte sequence in the actual select statement? It would seem the most likely candidate if this the case would be in the variables being passed in by the where clause predicates.

    e.g. select * from table where <col1>=“string w invalid byte sequence” 

  • Is it a valid assumption that if we can "select * from table” with psql that there aren’t any existing invalid byte sequences in the data stored in postgres on disk?

  • Is there a way to identify the actual variable values that are causing the failure with postgres tracing utilities? In the server log we only see the sql statement with the where clause variables replaced with “?”. (We are working with our developers to get the application logs to reveal this information when the exception occurs.)

Thanks again for any additional insight provided. 

Regards, 

Craig Jackson

On Aug 8, 2020, 5:58 AM -0600, Samed YILDIRIM <samed@reddoc.net>, wrote:
Hi,
 
First of all, 0x00 character is not a valid UTF-8 character. And, PostgreSQL is more strict compare to Oracle about character encodings.
select count(*) from <table_name> where instr(<table_cole>, unistr('\0000')) > 0;
If your character encoding of your Postgres database is UTF-8 and data type of the column is a type for storing character, such as varchar, text etc., answer is no. There is no equivalent in Postgres of the query above. Reason is simple, you can't store character 0x00 by the reason that it is not a valid character string :)
 
Most probably, while copying of data from Oracle to Postgres, oracle_fdw or oracle driver it used skipped character 0x00. So, your data in PostgreSQL is clean in the aspect of character 0x00.
 
The question is what is the source of the errors. Answer is that the application itself is the source of the errors. Your application sends invalid UTF-8 strings to database. Also the application is responsible to invalid UTF-8 strings stored in Oracle database. You should check and fix your application.  
Best regards.
Samed YILDIRIM
 
08.08.2020, 12:59, "Hotmail" <crajac66@hotmail.com>:
Hi All, 

We migrated data from Oracle to postgres using oracle_fdw without seeing any invalid byte sequence for encoding "UTF8": 0x00 errors. The data transfer from Oracle to Postgres using oracle_fdw completed successfully without any errors.

After we started our application using Postgres as the new data source we are consistently seeing "invalid byte sequence for encoding "UTF8": 0x00 errors" once or twice a minute on a basic select statement like this:

Select * from <table> where <col1>=‘foo’ and <col2>=‘foo2’;

Not all queries fail just some. We can select * from the table in question from psql without any invalid byte sequence errors (A problem with the jdbc client maybe?)


We went back to the source Oracle database and re-ran queries to try to find any invalid byte sequences and we could not find any. Again the data was copied with oracle_fdw with no errors. 

We would like to know if there are any queries we can run on the postgres table generating this error to identify the source of the invalid byte sequence errors. The query we used in the Oracle db to identify invalid byte sequences does not translate directly because it uses the instr oracle function. Here’s the oracle query:

select count(*) from <table_name> where instr(<table_cole>, unistr('\0000')) > 0;

Is there an equivalent query we could run in Postgres?

Regards,

Craig Jackson

pgsql-admin by date:

Previous
From: Ian Lawrence Barwick
Date:
Subject: Re: Recovery from WAL archives not advancing timeline?
Next
From: Ian Lawrence Barwick
Date:
Subject: Re: Recovery from WAL archives not advancing timeline?