Thread: Connection terminated but client didn't realise

Connection terminated but client didn't realise

From
David Wheeler
Date:

Hello wise postgresians,

 

We have a query that our system runs nightly to refresh materialised views. This takes some time to execute (~25 minutes) and then it will usually return the results to the application and everything is golden. However occasionally we see something like the below, where the query finishes, but the connection gets unexpectedly closed from Postgres’ perspective. From the application’s perspective the connection is still alive, and it sits there forever waiting for the result.

 

postgresql-9.5-sdb4.log_2019-11-11_224900.gz:2019-11-11 22:59:04.202 AEDT [4034]: [4693187-1] user=server@usn2082000510,db=usn2082000510,app=PostgreSQL JDBC Driver,client=172.30.3.21 CONTEXT:  SQL statement "refresh materialized view cube02.fact_sales"

    postgresql-9.5-sdb4.log_2019-11-11_224900.gz:2019-11-11 22:59:04.202 AEDT [4034]: [4693188-1] user=server@usn2082000510,db=usn2082000510,app=PostgreSQL JDBC Driver,client=172.30.3.21 STATEMENT:  select analytics.refresh($1)

    postgresql-9.5-sdb4.log_2019-11-11_224900.gz:2019-11-11 22:59:04.217 AEDT [4034]: [4693189-1] user=server@usn2082000510,db=usn2082000510,app=PostgreSQL JDBC Driver,client=172.30.3.21 LOG:  duration: 1444211.603 ms  execute S_126: select analytics.refresh($1)

    postgresql-9.5-sdb4.log_2019-11-11_224900.gz:2019-11-11 22:59:04.217 AEDT [4034]: [4693190-1] user=server@usn2082000510,db=usn2082000510,app=PostgreSQL JDBC Driver,client=172.30.3.21 DETAIL:  parameters: $1 = 'f'

    postgresql-9.5-sdb4.log_2019-11-11_224900.gz:2019-11-11 22:59:04.217 AEDT [4034]: [4693191-1] user=server@usn2082000510,db=usn2082000510,app=PostgreSQL JDBC Driver,client=172.30.3.21 LOG:  could not receive data from client: Connection reset by peer

    postgresql-9.5-sdb4.log_2019-11-11_224900.gz:2019-11-11 22:59:04.217 AEDT [4034]: [4693192-1] user=server@usn2082000510,db=usn2082000510,app=PostgreSQL JDBC Driver,client=172.30.3.21 LOG:  unexpected EOF on client connection with an open transaction

    postgresql-9.5-sdb4.log_2019-11-11_224900.gz:2019-11-11 22:59:04.845 AEDT [4034]: [4693193-1] user=server@usn2082000510,db=usn2082000510,app=PostgreSQL JDBC Driver,client=172.30.3.21 LOG:  disconnection: session time: 82:44:13.962 user=server@usn2082000510 database=usn2082000510 host=172.30.3.21 port=52722

 

From the application side, this is the thread that executes the query

 

        at java.net.SocketInputStream.socketRead0(Native Method)

        at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)

        at java.net.SocketInputStream.read(SocketInputStream.java:171)

        at java.net.SocketInputStream.read(SocketInputStream.java:141)

        at org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:140)

        at org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:109)

        at org.postgresql.core.VisibleBufferedInputStream.read(VisibleBufferedInputStream.java:67)

        at org.postgresql.core.PGStream.receiveChar(PGStream.java:288)

        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1963)

        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:300)

        - locked <0x0000000683a1fe70> (a org.postgresql.core.v3.QueryExecutorImpl)

        at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:428)

        at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:354)

        at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:169)

        at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:117)

 

Possibly relevant is that there’s another thread attempting to close this connection, and is waiting for lock on 0x0000000683a1fe70 (this is Glassfish’s connection leak reclaiming)

 

        at org.postgresql.core.QueryExecutorBase.getTransactionState(QueryExecutorBase.java:242)

        - waiting to lock <0x0000000683a1fe70> (a org.postgresql.core.v3.QueryExecutorImpl)

        at org.postgresql.jdbc.PgConnection.rollback(PgConnection.java:780)

        at org.postgresql.ds.PGPooledConnection.close(PGPooledConnection.java:86)

        at com.sun.gjc.spi.ManagedConnectionImpl.destroy(ManagedConnectionImpl.java:433)

 

I haven’t confirmed this but my guess is that this second thread doesn’t kick in until much later in the events, well after 2019-11-11 22:59:04.845.

 

 

postgres version: PostgreSQL 9.5.19 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609, 64-bit

JDBC driver postgresql-42.1.4

 

 

Does anyone have an idea what might be causing the connection to close? Or perhaps why the JDBC thread hasn’t detected that the connection is closed?

 

TIA

 

 

Best regards,

 

David Wheeler

General Manager Bali Office

Bali T +62 361 475 2333   M +62 819 3660 9180

E dwheeler@dgitsystems.com

Jl. Pura Mertasari No. 7, Sunset Road Abian Base

Kuta, Badung – Bali 80361, Indonesia

http://www.dgitsystems.com

 

signature_1079027276

 

 

 

Attachment

Re: Connection terminated but client didn't realise

From
Tom Lane
Date:
David Wheeler <dwheeler@dgitsystems.com> writes:
> We have a query that our system runs nightly to refresh materialised views. This takes some time to execute (~25
minutes)and then it will usually return the results to the application and everything is golden. However occasionally
wesee something like the below, where the query finishes, but the connection gets unexpectedly closed from Postgres’
perspective.From the application’s perspective the connection is still alive, and it sits there forever waiting for the
result.

Is the application remote from the database server?  My gut reaction
to this type of report is "something timed out the network connection",
but there would have to be a router or firewall or the like in between
to make that a tenable explanation.

If that is the issue, you should be able to fix it by making the
server's TCP keepalive settings more aggressive.

            regards, tom lane



Re: Connection terminated but client didn't realise

From
David Wheeler
Date:
>  Is the application remote from the database server?  My gut reaction to this type of report is "something timed out
thenetwork connection", but there would have to be a router or firewall or the like in between to make that a tenable
explanation.
>  If that is the issue, you should be able to fix it by making the server's TCP keepalive settings more aggressive.

Yes the application server is separate from the database server, and the application is running within docker which I
suspectadds some complexity too. I had suspicions about something in the middle closing the connection too, but your
emailhas clarified my thinking a bit. 
 

TCP Keepalive appears to be enabled on the application server and within docker, and the client holds the allegedly
deadconnection for much longer (24h) than the keepalive should take to kill it (<3h), so I think the next step is to
tryto identify the connection at the OS level with netstat to see what state it's in. 
 

Thanks for your help. 


Regards, 
 
David

On 2/12/19, 11:17 pm, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

    David Wheeler <dwheeler@dgitsystems.com> writes:
    > We have a query that our system runs nightly to refresh materialised views. This takes some time to execute (~25
minutes)and then it will usually return the results to the application and everything is golden. However occasionally
wesee something like the below, where the query finishes, but the connection gets unexpectedly closed from Postgres’
perspective.From the application’s perspective the connection is still alive, and it sits there forever waiting for the
result.
    
    Is the application remote from the database server?  My gut reaction
    to this type of report is "something timed out the network connection",
    but there would have to be a router or firewall or the like in between
    to make that a tenable explanation.
    
    If that is the issue, you should be able to fix it by making the
    server's TCP keepalive settings more aggressive.
    
                regards, tom lane