RE: An I/O error occurred while sending to the backend (PG 13.4) - Mailing list pgsql-performance

From ldh@laurent-hasson.com
Subject RE: An I/O error occurred while sending to the backend (PG 13.4)
Date
Msg-id MN2PR15MB2560956DE25947E5E944C96785049@MN2PR15MB2560.namprd15.prod.outlook.com
In response to Re: An I/O error occurred while sending to the backend (PG 13.4)  (Justin Pryzby <pryzby@telsasoft.com>)
Responses Re: An I/O error occurred while sending to the backend (PG 13.4)  (Ranier Vilela <ranier.vf@gmail.com>)
List pgsql-performance

   >  -----Original Message-----
   >  From: Justin Pryzby <pryzby@telsasoft.com>
   >  Sent: Tuesday, March 1, 2022 14:27
   >  To: ldh@laurent-hasson.com
   >  Cc: pgsql-performance@postgresql.org
   >  Subject: Re: An I/O error occurred while sending to the backend (PG 13.4)
   >  
   >  On Tue, Mar 01, 2022 at 04:28:31PM +0000, ldh@laurent-hasson.com wrote:
   >  > Now, there is an additional component I think... Storage is on an array
   >  > and I am not getting a clear answer as to where it is 😊 Is it possible
   >  > that something is happening at the storage layer? Could that be reported
   >  > as a network issue vs a storage issue for Postgres?
   >  
   >  No.  If there were an error with storage, it'd be reported as a local
   >  error, and the query would fail, rather than failing with client-server
   >  communication.
   >  
   >  > Also, both machines are actually VMs. I forgot to mention that and not
   >  > sure if that's relevant.
   >  
   >  Are they running on the same hypervisor ?  Is that hyperv ?
   >  Lacking other good hypotheses, that does seem relevant.
   >  
   >  --
   >  Justin

The issue happened again last night. I did implement your recommendations, but they didn't prevent it:

tcp_keepalives_idle=9           # TCP_KEEPIDLE, in seconds;
                                # 0 selects the system default
tcp_keepalives_interval=9       # TCP_KEEPINTVL, in seconds;
                                # 0 selects the system default
tcp_keepalives_count=0          # TCP_KEEPCNT;
                                # 0 selects the system default
#tcp_user_timeout = 0           # TCP_USER_TIMEOUT, in milliseconds;
                                # 0 selects the system default
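
If I read the TCP man pages right, leaving tcp_keepalives_count at 0 means the OS default number of probes (typically 9 or 10), so a dead peer would still take on the order of idle + interval * count, i.e. roughly 90 seconds, to detect. Also, my understanding is that these settings only govern the server's end of each socket; the JDBC driver has its own tcpKeepAlive and socketTimeout connection properties for the client side, which I may try next. A rough sketch in plain JDBC (host and credentials are placeholders):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.util.Properties;

    public class KeepaliveConnect {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.setProperty("user", "postgres");
            props.setProperty("password", "secret");    // placeholder
            props.setProperty("tcpKeepAlive", "true");  // SO_KEEPALIVE on the client socket
            props.setProperty("socketTimeout", "300");  // seconds; abort a read that hangs > 5 min
            // "dbhost" is a placeholder for the real DB server
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://dbhost:5432/Pepper", props)) {
                System.out.println("connected, client-side keepalive requested");
            }
        }
    }

The caveat with socketTimeout is that it also kills legitimately long-running statements, so it would have to be set above the slowest expected query.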

On the client application, the exceptions are:

2022/03/03 01:04:56 - Upsert2.0 - ERROR (version 7.1.0.0-12, build 1 from 2017-05-16 17.18.02 by buildguy) : Unexpected error
2022/03/03 01:04:56 - Upsert2.0 - ERROR (version 7.1.0.0-12, build 1 from 2017-05-16 17.18.02 by buildguy) : org.pentaho.di.core.exception.KettleStepException:
2022/03/03 01:04:56 - Upsert2.0 - Error in step, asking everyone to stop because of:
2022/03/03 01:04:56 - Upsert2.0 -
2022/03/03 01:04:56 - Upsert2.0 - Error inserting/updating row
2022/03/03 01:04:56 - Upsert2.0 - An I/O error occurred while sending to the backend.
2022/03/03 01:04:56 - Upsert2.0 -
2022/03/03 01:04:56 - Upsert2.0 -
2022/03/03 01:04:56 - Upsert2.0 -    at org.pentaho.di.trans.steps.insertupdate.InsertUpdate.processRow(InsertUpdate.java:313)
2022/03/03 01:04:56 - Upsert2.0 -    at org.pentaho.di.trans.step.RunThread.run(RunThread.java:62)
2022/03/03 01:04:56 - Upsert2.0 -    at java.lang.Thread.run(Thread.java:745)
2022/03/03 01:04:56 - Upsert2.0 - Caused by: org.pentaho.di.core.exception.KettleDatabaseException:
2022/03/03 01:04:56 - Upsert2.0 - Error inserting/updating row
2022/03/03 01:04:56 - Upsert2.0 - An I/O error occurred while sending to the backend.
2022/03/03 01:04:56 - Upsert2.0 -
2022/03/03 01:04:56 - Upsert2.0 -    at org.pentaho.di.core.database.Database.insertRow(Database.java:1321)
2022/03/03 01:04:56 - Upsert2.0 -    at org.pentaho.di.core.database.Database.insertRow(Database.java:1245)
2022/03/03 01:04:56 - Upsert2.0 -    at org.pentaho.di.core.database.Database.insertRow(Database.java:1233)
2022/03/03 01:04:56 - Upsert2.0 -    at org.pentaho.di.trans.steps.insertupdate.InsertUpdate.lookupValues(InsertUpdate.java:163)
2022/03/03 01:04:56 - Upsert2.0 -    at org.pentaho.di.trans.steps.insertupdate.InsertUpdate.processRow(InsertUpdate.java:299)
2022/03/03 01:04:56 - Upsert2.0 -    ... 2 more
2022/03/03 01:04:56 - Upsert2.0 - Caused by: org.postgresql.util.PSQLException: An I/O error occurred while sending to the backend.
2022/03/03 01:04:56 - Upsert2.0 -    at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:382)
2022/03/03 01:04:56 - Upsert2.0 -    at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:490)
2022/03/03 01:04:56 - Upsert2.0 -    at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:408)
2022/03/03 01:04:56 - Upsert2.0 -    at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:166)
2022/03/03 01:04:56 - Upsert2.0 -    at org.postgresql.jdbc.PgPreparedStatement.executeUpdate(PgPreparedStatement.java:134)
2022/03/03 01:04:56 - Upsert2.0 -    at org.pentaho.di.core.database.Database.insertRow(Database.java:1288)
2022/03/03 01:04:56 - Upsert2.0 -    ... 6 more
2022/03/03 01:04:56 - Upsert2.0 - Caused by: java.net.SocketException: Connection reset
2022/03/03 01:04:56 - Upsert2.0 -    at java.net.SocketInputStream.read(SocketInputStream.java:209)
2022/03/03 01:04:56 - Upsert2.0 -    at java.net.SocketInputStream.read(SocketInputStream.java:141)
2022/03/03 01:04:56 - Upsert2.0 -    at org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:161)
2022/03/03 01:04:56 - Upsert2.0 -    at org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:128)
2022/03/03 01:04:56 - Upsert2.0 -    at org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:113)
2022/03/03 01:04:56 - Upsert2.0 -    at org.postgresql.core.VisibleBufferedInputStream.read(VisibleBufferedInputStream.java:73)
2022/03/03 01:04:56 - Upsert2.0 -    at org.postgresql.core.PGStream.receiveChar(PGStream.java:453)
2022/03/03 01:04:56 - Upsert2.0 -    at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2119)
2022/03/03 01:04:56 - Upsert2.0 -    at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:355)
2022/03/03 01:04:56 - Upsert2.0 -    ... 11 more
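
The part that actually matters is buried in the last "Caused by": the driver saw a plain TCP connection reset while waiting for the server's reply. Since Kettle's logging makes that easy to miss, a trivial helper to pull the deepest cause out of such chains (generic Java, nothing Kettle-specific, just a sketch):

    public class RootCause {
        // Walk the cause chain to the deepest distinct throwable.
        static Throwable rootCause(Throwable t) {
            while (t.getCause() != null && t.getCause() != t) {
                t = t.getCause();
            }
            return t;
        }

        public static void main(String[] args) {
            Exception leaf = new java.net.SocketException("Connection reset");
            Exception mid = new java.sql.SQLException(
                "An I/O error occurred while sending to the backend.", leaf);
            Exception top = new RuntimeException("Error inserting/updating row", mid);
            System.out.println(rootCause(top));  // -> java.net.SocketException: Connection reset
        }
    }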

On the DB:

2022-03-03 01:04:40 EST [21228] LOG:  could not receive data from client: An existing connection was forcibly closed by the remote host.
2022-03-03 01:04:40 EST [21228] LOG:  unexpected EOF on client connection with an open transaction
2022-03-03 01:04:40 EST [21228] LOG:  disconnection: session time: 0:02:07.570 user=postgres database=Pepper host=10.64.17.211 port=63686
2022-03-03 01:04:41 EST [21160] LOG:  could not receive data from client: An existing connection was forcibly closed by the remote host.
2022-03-03 01:04:41 EST [21160] LOG:  unexpected EOF on client connection with an open transaction
2022-03-03 01:04:41 EST [21160] LOG:  disconnection: session time: 0:02:07.730 user=postgres database=Pepper host=10.64.17.211 port=63688

I don't know if it is meaningful, but I see a roughly 15-second gap between the timestamps on the database and in the application, even though the servers' clocks are synchronized properly. I have asked the IT team to look at the VMs and see if anything strange is happening. They are not too happy about installing Wireshark to do more analysis, given the "complexity of the tools and size of the logs" 😊 I keep on pushing.
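
In the meantime, to double-check that the keepalive values above are actually in effect for the ETL's own sessions (they are session-level GUCs, so SHOW reports what that particular connection got), a quick client-side check, again with placeholder host and credentials:

    import java.sql.*;

    public class CheckKeepalives {
        public static void main(String[] args) throws SQLException {
            try (Connection c = DriverManager.getConnection(
                     "jdbc:postgresql://dbhost:5432/Pepper",  // placeholder host
                     "postgres", "secret");                   // placeholder credentials
                 Statement s = c.createStatement()) {
                String[] gucs = {"tcp_keepalives_idle",
                                 "tcp_keepalives_interval",
                                 "tcp_keepalives_count"};
                for (String guc : gucs) {
                    try (ResultSet rs = s.executeQuery("SHOW " + guc)) {
                        rs.next();
                        System.out.println(guc + " = " + rs.getString(1));
                    }
                }
            }
        }
    }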
 

Thank you,
Laurent.

